annotate lispref/searching.texi @ 33863:2e449f784ca7
(init_from_display_pos): If POS says we're already after
an overlay string ending at POS, make sure to pop the iterator
because it will be in front of that overlay string. When POS is
ZV, we've thereby also ``processed'' overlay strings at ZV.
author | Gerd Moellmann <gerd@gnu.org> |
---|---|
date | Fri, 24 Nov 2000 19:29:05 +0000 |
parents | c3aecbe98b99 |
children | d3872b19023d |
rev | line source |
---|---|
6552 | 1 @c -*-texinfo-*- |
2 @c This is part of the GNU Emacs Lisp Reference Manual. | |
27189 | 3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999 |
4 @c Free Software Foundation, Inc. | |
6552 | 5 @c See the file elisp.texi for copying conditions. |
6 @setfilename ../info/searching | |
7 @node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top |
6552 | 8 @chapter Searching and Matching |
9 @cindex searching | |
10 | |
11 GNU Emacs provides two ways to search through a buffer for specified | |
12 text: exact string searches and regular expression searches. After a | |
13 regular expression search, you can examine the @dfn{match data} to | |
14 determine which text matched the whole regular expression or various | |
15 portions of it. | |
16 | |
17 @menu | |
18 * String Search:: Search for an exact match. | |
19 * Regular Expressions:: Describing classes of strings. | |
20 * Regexp Search:: Searching for a match for a regexp. | |
12067 | 21 * POSIX Regexps:: Searching POSIX-style for the longest match. |
6552 | 22 * Search and Replace:: Internals of @code{query-replace}. |
23 * Match Data:: Finding out which part of the text matched | |
24 various parts of a regexp, after regexp search. | |
25 * Searching and Case:: Case-independent or case-significant searching. | |
26 * Standard Regexps:: Useful regexps for finding sentences, pages,... | |
27 @end menu | |
28 | |
29 The @samp{skip-chars@dots{}} functions also perform a kind of searching. | |
30 @xref{Skipping Characters}. | |
31 | |
32 @node String Search | |
33 @section Searching for Strings | |
34 @cindex string search | |
35 | |
36 These are the primitive functions for searching through the text in a | |
37 buffer. They are meant for use in programs, but you may call them | |
38 interactively. If you do so, they prompt for the search string; the |
39 arguments @var{limit} and @var{noerror} are @code{nil}, and @var{repeat} |
40 is 1. |
6552 | 41 |
42 These search functions convert the search string to multibyte if the |
43 buffer is multibyte; they convert the search string to unibyte if the |
44 buffer is unibyte. @xref{Text Representations}. |
45 |
6552 | 46 @deffn Command search-forward string &optional limit noerror repeat |
47 This function searches forward from point for an exact match for |
6552 | 48 @var{string}. If successful, it sets point to the end of the occurrence |
49 found, and returns the new value of point. If no match is found, the | |
50 value and side effects depend on @var{noerror} (see below). | |
51 @c Emacs 19 feature | |
52 | |
53 In the following example, point is initially at the beginning of the |
6552 | 54 line. Then @code{(search-forward "fox")} moves point after the last |
55 letter of @samp{fox}: | |
56 | |
57 @example | |
58 @group | |
59 ---------- Buffer: foo ---------- | |
60 @point{}The quick brown fox jumped over the lazy dog. | |
61 ---------- Buffer: foo ---------- | |
62 @end group | |
63 | |
64 @group | |
65 (search-forward "fox") | |
66 @result{} 20 | |
67 | |
68 ---------- Buffer: foo ---------- | |
69 The quick brown fox@point{} jumped over the lazy dog. | |
70 ---------- Buffer: foo ---------- | |
71 @end group | |
72 @end example | |
73 | |
74 The argument @var{limit} specifies the upper bound to the search. (It |
6552 | 75 must be a position in the current buffer.) No match extending after |
76 that position is accepted. If @var{limit} is omitted or @code{nil}, it | |
77 defaults to the end of the accessible portion of the buffer. | |
78 | |
79 @kindex search-failed | |
80 What happens when the search fails depends on the value of |
6552 | 81 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed} |
82 error is signaled. If @var{noerror} is @code{t}, @code{search-forward} | |
83 returns @code{nil} and does nothing. If @var{noerror} is neither | |
84 @code{nil} nor @code{t}, then @code{search-forward} moves point to the | |
85 upper bound and returns @code{nil}. (It would be more consistent now to |
86 return the new position of point in that case, but some existing |
87 programs may depend on a value of @code{nil}.) |
6552 | 88 |
89 If @var{repeat} is supplied (it must be a positive number), then the |
90 search is repeated that many times (each time starting at the end of the |
91 previous time's match). If these successive searches succeed, the |
92 function succeeds, moving point and returning its new value. Otherwise |
93 the search fails. |
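
As a minimal sketch of how @var{limit}, @var{noerror} and @var{repeat}
interact, the following form can be evaluated in a temporary buffer.
The buffer text is invented for illustration, and the positions shown
assume that point starts at the beginning of that text.

@example
@group
(with-temp-buffer
  (insert "one two one two one")
  (goto-char (point-min))
  (list (search-forward "three" nil t)     ; fails; point stays at 1
        (point)
        (search-forward "three" 8 'move)   ; fails; point moves to 8
        (point)
        (progn (goto-char (point-min))
               ;; The second occurrence of "one" ends at position 12.
               (search-forward "one" nil t 2))))
     @result{} (nil 1 nil 8 12)
@end group
@end example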
6552 | 94 @end deffn |
95 | |
96 @deffn Command search-backward string &optional limit noerror repeat | |
97 This function searches backward from point for @var{string}. It is | |
98 just like @code{search-forward} except that it searches backwards and | |
99 leaves point at the beginning of the match. | |
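
For instance, assuming a temporary buffer whose text is invented for
this sketch, a backward search from the end of the buffer returns the
position where the match begins:

@example
@group
(with-temp-buffer
  (insert "The quick brown fox")
  ;; Point is now at the end of the buffer, after the inserted text.
  (search-backward "quick"))
     @result{} 5
@end group
@end example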
100 @end deffn | |
101 | |
102 @deffn Command word-search-forward string &optional limit noerror repeat | |
103 @cindex word search | |
104 This function searches forward from point for a ``word'' match for | |
105 @var{string}. If it finds a match, it sets point to the end of the | |
106 match found, and returns the new value of point. | |
107 @c Emacs 19 feature | |
108 | |
109 Word matching regards @var{string} as a sequence of words, disregarding | |
110 punctuation that separates them. It searches the buffer for the same | |
111 sequence of words. Each word must be distinct in the buffer (searching | |
112 for the word @samp{ball} does not match the word @samp{balls}), but the | |
113 details of punctuation and spacing are ignored (searching for @samp{ball | |
114 boy} does match @samp{ball. Boy!}). | |
115 | |
116 In this example, point is initially at the beginning of the buffer; the | |
117 search leaves it between the @samp{y} and the @samp{!}. | |
118 | |
119 @example | |
120 @group | |
121 ---------- Buffer: foo ---------- | |
122 @point{}He said "Please! Find | |
123 the ball boy!" | |
124 ---------- Buffer: foo ---------- | |
125 @end group | |
126 | |
127 @group | |
128 (word-search-forward "Please find the ball, boy.") | |
129 @result{} 35 | |
130 | |
131 ---------- Buffer: foo ---------- | |
132 He said "Please! Find | |
133 the ball boy@point{}!" | |
134 ---------- Buffer: foo ---------- | |
135 @end group | |
136 @end example | |
137 | |
138 If @var{limit} is non-@code{nil} (it must be a position in the current | |
139 buffer), then it is the upper bound to the search. The match found must | |
140 not extend after that position. | |
141 | |
142 If @var{noerror} is @code{nil}, then @code{word-search-forward} signals | |
143 an error if the search fails. If @var{noerror} is @code{t}, then it | |
144 returns @code{nil} instead of signaling an error. If @var{noerror} is | |
145 neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the | |
146 end of the buffer) and returns @code{nil}. | |
147 | |
148 If @var{repeat} is non-@code{nil}, then the search is repeated that many | |
149 times. Point is positioned at the end of the last match. | |
150 @end deffn | |
151 | |
152 @deffn Command word-search-backward string &optional limit noerror repeat | |
153 This function searches backward from point for a word match to | |
154 @var{string}. This function is just like @code{word-search-forward} | |
155 except that it searches backward and normally leaves point at the | |
156 beginning of the match. | |
157 @end deffn | |
158 | |
159 @node Regular Expressions | |
160 @section Regular Expressions | |
161 @cindex regular expression | |
162 @cindex regexp | |
163 | |
164 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that | |
165 denotes a (possibly infinite) set of strings. Searching for matches for | |
166 a regexp is a very powerful operation. This section explains how to write | |
167 regexps; the following section says how to search for them. | |
168 | |
169 @menu | |
170 * Syntax of Regexps:: Rules for writing regular expressions. | |
171 * Regexp Functions:: Functions for operating on regular expressions. |
6552 | 172 * Regexp Example:: Illustrates regular expression syntax. |
173 @end menu | |
174 | |
175 @node Syntax of Regexps | |
176 @subsection Syntax of Regular Expressions | |
177 | |
178 Regular expressions have a syntax in which a few characters are |
179 special constructs and the rest are @dfn{ordinary}. An ordinary |
180 character is a simple regular expression that matches that character and |
181 nothing else. The special characters are @samp{.}, @samp{*}, @samp{+}, |
182 @samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new |
183 special characters will be defined in the future. Any other character |
184 appearing in a regular expression is ordinary, unless a @samp{\} |
185 precedes it. |
6552 | 186 |
187 For example, @samp{f} is not a special character, so it is ordinary, and |
6552 | 188 therefore @samp{f} is a regular expression that matches the string |
189 @samp{f} and no other string. (It does @emph{not} match the string | |
190 @samp{fg}, but it does match a @emph{part} of that string.) Likewise, |
191 @samp{o} is a regular expression that matches only @samp{o}.@refill |
6552 | 192 |
193 Any two regular expressions @var{a} and @var{b} can be concatenated. The |
194 result is a regular expression that matches a string if @var{a} matches |
6552 | 195 some amount of the beginning of that string and @var{b} matches the rest of |
196 the string.@refill | |
197 | |
198 As a simple example, we can concatenate the regular expressions @samp{f} |
6552 | 199 and @samp{o} to get the regular expression @samp{fo}, which matches only |
200 the string @samp{fo}. Still trivial. To do something more powerful, you | |
201 need to use one of the special regular expression constructs. |
202 |
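The behavior of ordinary characters and of concatenation can be
checked with @code{string-match}, which returns the index (counting
from zero) where the first match starts, or @code{nil} if there is no
match.  The test strings here are chosen only for illustration.

@example
@group
(string-match "f" "fg")       ; matches a part of "fg"
     @result{} 0
(string-match "fo" "fg")      ; "f" matches, but no "o" follows
     @result{} nil
(string-match "fo" "info")
     @result{} 2
@end group
@end example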
203 @menu |
204 * Regexp Special:: Special characters in regular expressions. |
205 * Char Classes:: Character classes used in regular expressions. |
206 * Regexp Backslash:: Backslash-sequences in regular expressions. |
207 @end menu |
208 |
209 @node Regexp Special |
210 @subsubsection Special Characters in Regular Expressions |
211 |
212 Here is a list of the characters that are special in a regular |
213 expression. |
6552 | 214 |
215 @need 800 |
216 @table @asis |
217 @item @samp{.}@: @r{(Period)} |
6552 | 218 @cindex @samp{.} in regexp |
219 is a special character that matches any single character except a newline. | |
220 Using concatenation, we can make regular expressions like @samp{a.b}, which | |
221 matches any three-character string that begins with @samp{a} and ends with | |
222 @samp{b}.@refill | |
223 | |
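A short sketch with @code{string-match} (the test strings are
arbitrary) shows that @samp{.} stands for exactly one character and
that a newline does not qualify:

@example
@group
(string-match "a.b" "xaxb")
     @result{} 1
(string-match "a.b" "a\nb")   ; a newline separates "a" and "b"
     @result{} nil
(string-match "a.b" "ab")     ; nothing for the period to match
     @result{} nil
@end group
@end example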
224 @item @samp{*} |
6552 | 225 @cindex @samp{*} in regexp |
226 is not a construct by itself; it is a postfix operator that means to |
227 match the preceding regular expression repetitively as many times as |
228 possible. Thus, @samp{o*} matches any number of @samp{o}s (including no |
229 @samp{o}s). |
6552 | 230 |
231 @samp{*} always applies to the @emph{smallest} possible preceding | |
232 expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating |
233 @samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on. |
6552 | 234 |
235 The matcher processes a @samp{*} construct by matching, immediately, as |
236 many repetitions as can be found. Then it continues with the rest of |
237 the pattern. If that fails, backtracking occurs, discarding some of the |
238 matches of the @samp{*}-modified construct in the hope that that will |
239 make it possible to match the rest of the pattern. For example, in |
240 matching @samp{ca*ar} against the string @samp{caaar}, the @samp{a*} |
241 first tries to match all three @samp{a}s; but the rest of the pattern is |
6552 | 242 @samp{ar} and there is only @samp{r} left to match, so this try fails. |
243 The next alternative is for @samp{a*} to match only two @samp{a}s. With |
244 this choice, the rest of the regexp matches successfully.@refill |
6552 | 245 |
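The outcome of this backtracking can be observed by wrapping the
@samp{a*} in a @samp{\( @dots{} \)} group and inspecting the match data
afterward.  This is only a sketch; the grouping is added here so that
@code{match-string} can report what the starred part ended up matching.

@example
@group
(let ((s "caaar"))
  (string-match "c\\(a*\\)ar" s)
  ;; Whole match, then the text matched by the starred group.
  (list (match-string 0 s) (match-string 1 s)))
     @result{} ("caaar" "aa")
(progn (string-match "fo*" "foooh") (match-end 0))
     @result{} 4
(progn (string-match "fo*" "f") (match-end 0))
     @result{} 1
@end group
@end example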
246 Nested repetition operators can be extremely slow if they specify |
12067 | 247 backtracking loops. For example, it could take hours for the regular |
248 expression @samp{\(x+y*\)*a} to try to match the sequence |
249 @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}, before it ultimately fails. |
250 The slowness is because Emacs must try each imaginable way of grouping |
251 the 35 @samp{x}s before concluding that none of them can work. To make |
252 sure your regular expressions run fast, check nested repetitions |
253 carefully. |
254 |
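One possible remedy, offered here as a sketch rather than a general
rule, is to rewrite the pattern so that each character can be consumed
in only one way.  Before the final @samp{a}, the expression
@samp{\(x+y*\)*a} accepts either nothing or a run of @samp{x}s and
@samp{y}s that begins with @samp{x}; the expression
@samp{\(x[xy]*\)?a} accepts the same strings without nesting one
repetition inside another.  (The text recorded for the group may
differ between the two forms.)

@example
@group
(string-match "\\(x[xy]*\\)?a" "xxyxya")
     @result{} 0
(string-match "\\(x[xy]*\\)?a" "xxxxz")   ; no "a" anywhere
     @result{} nil
@end group
@end example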
255 @item @samp{+} |
6552 | 256 @cindex @samp{+} in regexp |
257 is a postfix operator, similar to @samp{*} except that it must match |
258 the preceding expression at least once. So, for example, @samp{ca+r} |
6552 | 259 matches the strings @samp{car} and @samp{caaaar} but not the string |
260 @samp{cr}, whereas @samp{ca*r} matches all three strings. | |
261 | |
262 @item @samp{?} |
6552 | 263 @cindex @samp{?} in regexp |
264 is a postfix operator, similar to @samp{*} except that it must match the |
265 preceding expression either once or not at all. For example, |
266 @samp{ca?r} matches @samp{car} or @samp{cr}; nothing else. |
6552 | 267 |
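The three postfix operators @samp{*}, @samp{+} and @samp{?} can be
compared side by side.  In this sketch each regexp is tried against a
short list of strings with @code{string-match}; a result of
@code{nil} means that the string contains no match.

@example
@group
(mapcar (lambda (s) (string-match "ca*r" s)) '("cr" "car" "caaaar"))
     @result{} (0 0 0)
(mapcar (lambda (s) (string-match "ca+r" s)) '("cr" "car" "caaaar"))
     @result{} (nil 0 0)
(mapcar (lambda (s) (string-match "ca?r" s)) '("cr" "car" "caar"))
     @result{} (0 0 nil)
@end group
@end example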
268 @item @samp{*?}, @samp{+?}, @samp{??} |
269 These are ``non-greedy'' variants of the operators @samp{*}, @samp{+} |
270 and @samp{?}. Where those operators match the largest possible |
271 substring (consistent with matching the entire containing expression), |
272 the non-greedy variants match the smallest possible substring |
273 (consistent with matching the entire containing expression). |
274 |
275 For example, the regular expression @samp{c[ad]*a} when applied to the |
276 string @samp{cdaaada} matches the whole string; but the regular |
277 expression @samp{c[ad]*?a}, applied to that same string, matches just |
278 @samp{cda}. (The smallest possible match here for @samp{[ad]*?} that |
279 permits the whole expression to match is @samp{d}.) |
280 |
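The comparison just described can be reproduced with
@code{string-match} and @code{match-string}, assuming an Emacs version
that supports the non-greedy operators:

@example
@group
(let ((s "cdaaada"))
  (list (progn (string-match "c[ad]*a" s) (match-string 0 s))
        (progn (string-match "c[ad]*?a" s) (match-string 0 s))))
     @result{} ("cdaaada" "cda")
@end group
@end example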
281 @item @samp{[ @dots{} ]} |
282 @cindex character alternative (in regexp) |
6552 | 283 @cindex @samp{[} in regexp |
284 @cindex @samp{]} in regexp | |
285 is a @dfn{character alternative}, which begins with @samp{[} and is |
286 terminated by @samp{]}. In the simplest case, the characters between |
287 the two brackets are what this character alternative can match. |
6552 | 288 |
289 Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and |
290 @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s |
291 (including the empty string), from which it follows that @samp{c[ad]*r} |
292 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. |
6552 | 293 |
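As a quick check of these claims (the test strings are arbitrary):

@example
@group
(string-match "[ad]" "bed")
     @result{} 2
(string-match "c[ad]*r" "caddaar")
     @result{} 0
(string-match "c[ad]*r" "cxr")    ; "x" is not in the alternative
     @result{} nil
@end group
@end example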
294 You can also include character ranges in a character alternative, by |
295 writing the starting and ending characters with a @samp{-} between them. |
296 Thus, @samp{[a-z]} matches any lower-case @sc{ascii} letter. Ranges may be |
297 intermixed freely with individual characters, as in @samp{[a-z$%.]}, |
298 which matches any lower-case @sc{ascii} letter or @samp{$}, @samp{%} or |
299 period. |
6552 | 300 |
301 Note that the usual regexp special characters are not special inside a |
24934 | 302 character alternative. A completely different set of characters is |
303 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}. |
304 |
305 To include a @samp{]} in a character alternative, you must make it the |
306 first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. |
307 To include a @samp{-}, write @samp{-} as the first or last character of |
308 the character alternative, or put it after a range. Thus, @samp{[]-]} |
309 matches either @samp{]} or @samp{-}. |
6552 | 310 |
311 To include @samp{^} in a character alternative, put it anywhere but at |
312 the beginning. |
6552 | 313 |
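The placement rules for ranges and for @samp{]}, @samp{-} and @samp{^}
can be tried out as follows.  This is a small sketch; the test strings
contain no letters that case folding could affect.

@example
@group
(string-match "[a-z$%.]" "42%")   ; range mixed with single characters
     @result{} 2
(string-match "[]a]" "x]")        ; "]" placed first is literal
     @result{} 1
(string-match "[]-]" "x-y")       ; "-" placed last is literal
     @result{} 1
(string-match "[a^]" "x^")        ; "^" not placed first is literal
     @result{} 1
@end group
@end example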
32464 | 314 The beginning and end of a range of multibyte characters must be in the |
315 same character set (@pxref{Character Sets}). Thus, @samp{[\x8e0-\x97c]} | |
316 is invalid because character 0x8e0 (@samp{a} with grave accent) is in | |
317 the Emacs character set for Latin-1 but the character 0x97c (@samp{u} | |
318 with diaeresis) is in the Emacs character set for Latin-2. | |
319 | |
320 If a range starts with a unibyte character @var{c} and ends with a | |
321 multibyte character @var{c2}, the range is divided into two parts: one | |
322 is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where | |
323 @var{c1} is the first character of the charset to which @var{c2} | |
324 belongs. | |
325 |
326 You cannot always match all non-@sc{ascii} characters with the regular |
327 expression @samp{[\200-\377]}. This works when searching a unibyte |
328 buffer or string (@pxref{Text Representations}), but not in a multibyte |
329 buffer or string, because many non-@sc{ascii} characters have codes |
330 above octal 0377. However, the regular expression @samp{[^\000-\177]} |
331 does match all non-@sc{ascii} characters (see below regarding @samp{^}), |
332 in both multibyte and unibyte representations, because only the |
333 @sc{ascii} characters are excluded. |
334 |
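A minimal sketch of the complemented form follows.  It assumes an
Emacs in which @code{(string #xe9)} yields a one-character multibyte
string holding a non-@sc{ascii} letter; that choice of character is
made here only for illustration.

@example
@group
(string-match "[^\000-\177]" (concat "abc" (string #xe9)))
     @result{} 3
(string-match "[^\000-\177]" "abc")   ; all ASCII, so no match
     @result{} nil
@end group
@end example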
335 Starting in Emacs 21, a character alternative can also specify named |
336 character classes (@pxref{Char Classes}). This is a POSIX feature whose |
467b88fab665
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
25089
diff
changeset
|
337 syntax is @samp{[:@var{class}:]}. Using a character class is equivalent |
467b88fab665
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
25089
diff
changeset
|
338 to mentioning each of the characters in that class; but the latter is |
467b88fab665
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
25089
diff
changeset
|
339 not feasible in practice, since some classes include thousands of |
467b88fab665
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
25089
diff
changeset
|
340 different characters. |
467b88fab665
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
25089
diff
changeset
|
341 |
@item @samp{[^ @dots{} ]}
@cindex @samp{^} in regexp
@samp{[^} begins a @dfn{complemented character alternative}, which matches any
character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} matches
all characters @emph{except} letters and digits.

@samp{^} is not special in a character alternative unless it is the first
character. The character following the @samp{^} is treated as if it
were first (in other words, @samp{-} and @samp{]} are not special there).

A complemented character alternative can match a newline, unless newline is
mentioned as one of the characters not to match. This is in contrast to
the handling of regexps in programs such as @code{grep}.

@item @samp{^}
@cindex beginning of line in regexp
is a special character that matches the empty string, but only at the
beginning of a line in the text being matched. Otherwise it fails to
match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at
the beginning of a line.

When matching a string instead of a buffer, @samp{^} matches at the
beginning of the string or after a newline character @samp{\n}.

For historical compatibility reasons, @samp{^} can be used only at the
beginning of the regular expression, or after @samp{\(} or @samp{\|}.

@item @samp{$}
@cindex @samp{$} in regexp
@cindex end of line in regexp
is similar to @samp{^} but matches only at the end of a line. Thus,
@samp{x+$} matches a string of one @samp{x} or more at the end of a line.

When matching a string instead of a buffer, @samp{$} matches at the end
of the string or before a newline character @samp{\n}.

For historical compatibility reasons, @samp{$} can be used only at the
end of the regular expression, or before @samp{\)} or @samp{\|}.

@item @samp{\}
@cindex @samp{\} in regexp
has two functions: it quotes the special characters (including
@samp{\}), and it introduces additional special constructs.

Because @samp{\} quotes special characters, @samp{\$} is a regular
expression that matches only @samp{$}, and @samp{\[} is a regular
expression that matches only @samp{[}, and so on.

Note that @samp{\} also has special meaning in the read syntax of Lisp
strings (@pxref{String Type}), and must be quoted with @samp{\}. For
example, the regular expression that matches the @samp{\} character is
@samp{\\}. To write a Lisp string that contains the characters
@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
@samp{\}. Therefore, the read syntax for a regular expression matching
@samp{\} is @code{"\\\\"}.@refill
@end table

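As an illustrative sketch of the special characters described above, the
following forms call @code{string-match} (@pxref{Regexp Search}) on
made-up sample strings. Each value shown is the index at which the match
starts. Remember that every @samp{\} of the regexp is doubled in the
Lisp read syntax.

@example
@group
;; A complemented alternative: find the first character that is
;; neither a letter nor a digit.
(string-match "[^a-z0-9A-Z]" "abc-def")
     @result{} 3
@end group

@group
;; Unlike in grep, a complemented alternative can match a newline.
(string-match "[^a-z]" "ab\ncd")
     @result{} 2
@end group

@group
;; In a string, ^ also matches just after a newline,
(string-match "^foo" "bar\nfoo")
     @result{} 4
;; and $ also matches just before one.
(string-match "x+$" "axx\nb")
     @result{} 1
@end group

@group
;; The regexp \\ matches one backslash; its read syntax is "\\\\".
(string-match "\\\\" "a\\b")
     @result{} 1
@end group
@end example
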
@strong{Please note:} For historical compatibility, special characters
are treated as ordinary ones if they are in contexts where their special
meanings make no sense. For example, @samp{*foo} treats @samp{*} as
ordinary since there is no preceding expression on which the @samp{*}
can act. It is poor practice to depend on this behavior; quote the
special character anyway, regardless of where it appears.@refill

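One way to follow this advice without quoting by hand is the function
@code{regexp-quote}, which returns a regexp that matches its argument
literally. Here is a brief sketch, using made-up sample text:

@example
@group
(regexp-quote "*foo")
     @result{} "\\*foo"
@end group

@group
;; The quoted regexp matches the literal text.
(string-match (regexp-quote "*foo") "a *foo b")
     @result{} 2
@end group
@end example
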
@node Char Classes
@subsubsection Character Classes
@cindex character classes in regexp

Here is a table of the classes you can use in a character alternative,
in Emacs 21, and what they mean:

@table @samp
@item [:ascii:]
This matches any @sc{ascii} (unibyte) character.
@item [:alnum:]
This matches any letter or digit. (At present, for multibyte
characters, it matches anything that has word syntax.)
@item [:alpha:]
This matches any letter. (At present, for multibyte characters, it
matches anything that has word syntax.)
@item [:blank:]
This matches space and tab only.
@item [:cntrl:]
This matches any @sc{ascii} control character.
@item [:digit:]
This matches @samp{0} through @samp{9}. Thus, @samp{[-+[:digit:]]}
matches any digit, as well as @samp{+} and @samp{-}.
@item [:graph:]
This matches graphic characters---everything except @sc{ascii} control
characters, space, and the delete character.
@item [:lower:]
This matches any lower-case letter, as determined by
the current case table (@pxref{Case Tables}).
@item [:nonascii:]
This matches any non-@sc{ascii} (multibyte) character.
@item [:print:]
This matches printing characters---everything except @sc{ascii} control
characters and the delete character.
@item [:punct:]
This matches any punctuation character. (At present, for multibyte
characters, it matches anything that has non-word syntax.)
@item [:space:]
This matches any character that has whitespace syntax
(@pxref{Syntax Class Table}).
@item [:upper:]
This matches any upper-case letter, as determined by
the current case table (@pxref{Case Tables}).
@item [:word:]
This matches any character that has word syntax (@pxref{Syntax Class
Table}).
@item [:xdigit:]
This matches the hexadecimal digits: @samp{0} through @samp{9}, @samp{a}
through @samp{f} and @samp{A} through @samp{F}.
@end table

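Note that a class name must itself appear inside a character
alternative, so the brackets are doubled, as in @samp{[[:digit:]]}.
The following sketch shows a few classes at work; the sample strings are
made up for illustration, and the last form binds
@code{case-fold-search} to @code{nil} on the assumption that letter case
should be significant (@pxref{Searching and Case}).

@example
@group
;; A run of digits and sign characters.
(string-match "[-+[:digit:]]+" "x = -42")
     @result{} 4
@end group

@group
(string-match "[[:xdigit:]]+" "#ff8800")
     @result{} 1
(string-match "[[:blank:]]" "foo\tbar")
     @result{} 3
@end group

@group
;; Make letter case significant while matching.
(let ((case-fold-search nil))
  (string-match "[[:upper:]]" "fooBar"))
     @result{} 3
@end group
@end example
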
@node Regexp Backslash
@subsubsection Backslash Constructs in Regular Expressions

For the most part, @samp{\} followed by any character matches only
that character. However, there are several exceptions: certain
two-character sequences starting with @samp{\} that have special
meanings. (The character after the @samp{\} in such a sequence is
always ordinary when used on its own.) Here is a table of the special
@samp{\} constructs.

@table @samp
@item \|
@cindex @samp{|} in regexp
@cindex regexp alternative
specifies an alternative.
Two regular expressions @var{a} and @var{b} with @samp{\|} in
between form an expression that matches anything that either @var{a} or
@var{b} matches.@refill

Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
but no other string.@refill

@samp{\|} applies to the largest possible surrounding expressions. Only a
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
@samp{\|}.@refill

Full backtracking capability exists to handle multiple uses of
@samp{\|}, if you use the POSIX regular expression functions
(@pxref{POSIX Regexps}).

@item \@{@var{m}\@}
is a postfix operator that repeats the previous pattern exactly @var{m}
times. Thus, @samp{x\@{5\@}} matches the string @samp{xxxxx}
and nothing else. @samp{c[ad]\@{3\@}r} matches strings such as
@samp{caaar}, @samp{cdddr}, @samp{cadar}, and so on.

@item \@{@var{m},@var{n}\@}
is a more general postfix operator that specifies repetition with a
minimum of @var{m} repeats and a maximum of @var{n} repeats. If @var{m}
is omitted, the minimum is 0; if @var{n} is omitted, there is no
maximum.

For example, @samp{c[ad]\@{1,2\@}r} matches the strings @samp{car},
@samp{cdr}, @samp{caar}, @samp{cadr}, @samp{cdar}, and @samp{cddr}, and
nothing else.@*
@samp{\@{0,1\@}} or @samp{\@{,1\@}} is equivalent to @samp{?}. @*
@samp{\@{0,\@}} or @samp{\@{,\@}} is equivalent to @samp{*}. @*
@samp{\@{1,\@}} is equivalent to @samp{+}.

@item \( @dots{} \)
@cindex @samp{(} in regexp
@cindex @samp{)} in regexp
@cindex regexp grouping
is a grouping construct that serves three purposes:

@enumerate
@item
To enclose a set of @samp{\|} alternatives for other operations. Thus,
the regular expression @samp{\(foo\|bar\)x} matches either @samp{foox}
or @samp{barx}.

@item
To enclose a complicated expression for the postfix operators @samp{*},
@samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches
@samp{ba}, @samp{bana}, @samp{banana}, @samp{bananana}, etc., with any
number (zero or more) of @samp{na} strings.

@item
To record a matched substring for future reference with
@samp{\@var{digit}} (see below).
@end enumerate

This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature that was assigned as a
second meaning to the same @samp{\( @dots{} \)} construct because, in
practice, there was usually no conflict between the two meanings. But
occasionally there is a conflict, and that led to the introduction of
shy groups.

@item \(?: @dots{} \)
is the @dfn{shy group} construct. A shy group serves the first two
purposes of an ordinary group (controlling the nesting of other
operators), but it does not get a number, so you cannot refer back to
its value with @samp{\@var{digit}}.

Shy groups are particularly useful for mechanically-constructed regular
expressions because they can be added automatically without altering the
numbering of any ordinary, non-shy groups.

@item \@var{digit}
matches the same text that matched the @var{digit}th occurrence of a
@samp{\( @dots{} \)} construct.

In other words, after the end of a @samp{\( @dots{} \)} construct, the
matcher remembers the beginning and end of the text matched by that
construct. Then, later on in the regular expression, you can use
@samp{\} followed by @var{digit} to match that same text, whatever it
may have been.

The strings matching the first nine @samp{\( @dots{} \)} constructs
appearing in a regular expression are assigned numbers 1 through 9 in
the order that the open parentheses appear in the regular expression.
So you can use @samp{\1} through @samp{\9} to refer to the text matched
by the corresponding @samp{\( @dots{} \)} constructs.

For example, @samp{\(.*\)\1} matches any newline-free string that is
composed of two identical halves. The @samp{\(.*\)} matches the first
half, which may be anything, but the @samp{\1} that follows must match
the same exact text.

@item \w
@cindex @samp{\w} in regexp
matches any word-constituent character. The editor syntax table
determines which characters these are. @xref{Syntax Tables}.

@item \W
@cindex @samp{\W} in regexp
matches any character that is not a word constituent.

@item \s@var{code}
@cindex @samp{\s} in regexp
matches any character whose syntax is @var{code}. Here @var{code} is a
character that represents a syntax code: thus, @samp{w} for word
constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
etc. To represent whitespace syntax, use either @samp{-} or a space
character. @xref{Syntax Class Table}, for a list of syntax codes and
the characters that stand for them.

@item \S@var{code}
@cindex @samp{\S} in regexp
matches any character whose syntax is not @var{code}.
@end table

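The following sketch exercises several of the backslash constructs in
the table above with @code{string-match}; @code{match-string}
(@pxref{Match Data}) then extracts the text of a numbered group. The
sample strings are made up for illustration, and the last two forms
assume the standard syntax table.

@example
@group
;; Grouping limits the scope of \|.
(string-match "\\(foo\\|bar\\)x" "a barx")
     @result{} 2
(match-string 1 "a barx")
     @result{} "bar"
@end group

@group
;; One or two repetitions of a or d, but not three.
(string-match "c[ad]\\@{1,2\\@}r" "cadr")
     @result{} 0
(string-match "c[ad]\\@{1,2\\@}r" "caddr")
     @result{} nil
@end group

@group
;; A back reference: two identical halves.
(string-match "\\(.*\\)\\1" "abcabc")
     @result{} 0
(match-string 1 "abcabc")
     @result{} "abc"
@end group

@group
;; The shy group gets no number, so the digits are group 1.
(string-match "\\(?:foo\\|bar\\)-\\([0-9]+\\)" "bar-42")
     @result{} 0
(match-string 1 "bar-42")
     @result{} "42"
@end group

@group
;; Word-constituent and whitespace syntax.
(string-match "\\w+" "(hello)")
     @result{} 1
(string-match "\\s-+" "foo \tbar")
     @result{} 3
@end group
@end example
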
The following regular expression constructs match the empty string---that is,
they don't use up any characters---but whether they match depends on the
context.

@table @samp
@item \`
@cindex @samp{\`} in regexp
matches the empty string, but only at the beginning
of the buffer or string being matched against.

@item \'
@cindex @samp{\'} in regexp
matches the empty string, but only at the end of
the buffer or string being matched against.

@item \=
@cindex @samp{\=} in regexp
matches the empty string, but only at point.
(This construct is not defined when matching against a string.)

@item \b
@cindex @samp{\b} in regexp
matches the empty string, but only at the beginning or
end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
@samp{foo} as a separate word. @samp{\bballs?\b} matches
@samp{ball} or @samp{balls} as a separate word.@refill

@samp{\b} matches at the beginning or end of the buffer
regardless of what text appears next to it.

@item \B
@cindex @samp{\B} in regexp
matches the empty string, but @emph{not} at the beginning or
end of a word.

@item \<
@cindex @samp{\<} in regexp
matches the empty string, but only at the beginning of a word.
@samp{\<} matches at the beginning of the buffer only if a
word-constituent character follows.

@item \>
@cindex @samp{\>} in regexp
matches the empty string, but only at the end of a word. @samp{\>}
matches at the end of the buffer only if the contents end with a
word-constituent character.
@end table

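As a brief sketch of how these context-dependent constructs behave when
matching against a string, compare the following forms. The sample
strings are made up for illustration, and the word-boundary examples
assume the standard syntax table.

@example
@group
;; \` matches only at the very beginning of the string,
;; whereas ^ also matches after the newline.
(string-match "\\`foo" "bar\nfoo")
     @result{} nil
(string-match "^foo" "bar\nfoo")
     @result{} 4
@end group

@group
;; \' matches only at the very end of the string.
(string-match "foo\\'" "foo\nfoo")
     @result{} 4
@end group

@group
;; The ball inside football is not a separate word.
(string-match "\\bballs?\\b" "football balls")
     @result{} 9
@end group

@group
;; \B requires a position that is not a word boundary.
(string-match "\\Bar" "bar car")
     @result{} 1
@end group
@end example
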
@kindex invalid-regexp
Not every string is a valid regular expression. For example, a string
with unbalanced square brackets is invalid (with a few exceptions, such
as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
an invalid regular expression is passed to any of the search functions,
an @code{invalid-regexp} error is signaled.

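If a regexp comes from user input or is constructed mechanically, one
way to guard against this error is to trap it with
@code{condition-case}. The following is only a sketch; the function
name @code{my-valid-regexp-p} is hypothetical and is not part of Emacs.

@example
@group
;; Hypothetical helper: try REGEXP against the empty string and
;; report whether the regexp compiler accepted it.
(defun my-valid-regexp-p (regexp)
  "Return non-nil if REGEXP is a valid regular expression."
  (condition-case nil
      (progn (string-match regexp "") t)
    (invalid-regexp nil)))
@end group

@group
(my-valid-regexp-p "[a-z")
     @result{} nil
(my-valid-regexp-p "[]]")
     @result{} t
(my-valid-regexp-p "foo\\")
     @result{} nil
@end group
@end example
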
645 @node Regexp Example | |
646 @comment node-name, next, previous, up | |
647 @subsection Complex Regexp Example | |
648 | |
649 Here is a complicated regexp, used by Emacs to recognize the end of a | |
650 sentence together with any whitespace that follows. It is the value of | |
651 the variable @code{sentence-end}. | |
652 | |
653 First, we show the regexp as a string in Lisp syntax to distinguish | |
654 spaces from tab characters. The string constant begins and ends with a | |
655 double-quote. @samp{\"} stands for a double-quote as part of the | |
656 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a | |
657 tab and @samp{\n} for a newline. | |
658 | |
659 @example | |
660 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*" | |
661 @end example | |
662 | |
663 @noindent |
664 In contrast, if you evaluate the variable @code{sentence-end}, you |
6552 | 665 will see the following: |
666 | |
667 @example | |
668 @group | |
669 sentence-end | |
670 @result{} "[.?!][]\"')@}]*\\($\\| $\\|	\\|@ @ \\)[ 	 |
6552 | 671 ]*" |
672 @end group | |
673 @end example | |
674 | |
675 @noindent | |
676 In this output, tab and newline appear as themselves. | |
677 | |
678 This regular expression contains four parts in succession and can be | |
679 deciphered as follows: | |
680 | |
681 @table @code | |
682 @item [.?!] | |
683 The first part of the pattern is a character alternative that matches |
684 any one of three characters: period, question mark, and exclamation |
685 mark. The match must begin with one of these three characters. |
6552 | 686 |
687 @item []\"')@}]* | |
688 The second part of the pattern matches any closing braces and quotation | |
689 marks, zero or more of them, that may follow the period, question mark | |
690 or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in | |
691 a string. The @samp{*} at the end indicates that the immediately | |
692 preceding regular expression (a character alternative, in this case) may be |
6552 | 693 repeated zero or more times. |
694 | |
8469 | 695 @item \\($\\|@ $\\|\t\\|@ @ \\) |
6552 | 696 The third part of the pattern matches the whitespace that follows the |
697 end of a sentence: the end of a line (optionally with a space), or a |
698 tab, or two spaces. The double backslashes mark the parentheses and |
699 vertical bars as regular expression syntax; the parentheses delimit a |
700 group and the vertical bars separate alternatives. The dollar sign is |
701 used to match the end of a line. |
6552 | 702 |
703 @item [ \t\n]* | |
704 Finally, the last part of the pattern matches any additional whitespace | |
705 beyond the minimum needed to end a sentence. | |
706 @end table | |
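
To see the four parts working together, the regexp can be tried on a
short string with @code{string-match}.  The sample string is an
assumption chosen for illustration: a period followed by two spaces
and more text.

@example
@group
(string-match
 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
 "Done.@ @ Next")
     @result{} 4
@end group

@group
;; The match covers the period and both spaces.
(match-end 0)
     @result{} 7
@end group
@end example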
707 | |
708 @node Regexp Functions |
709 @subsection Regular Expression Functions |
710 |
711 These functions operate on regular expressions. |
712 |
713 @defun regexp-quote string |
714 This function returns a regular expression whose only exact match is |
715 @var{string}. Using this regular expression in @code{looking-at} will |
716 succeed only if the next characters in the buffer are @var{string}; |
717 using it in a search function will succeed if the text being searched |
718 contains @var{string}. |
719 |
720 This allows you to request an exact string match or search when calling |
721 a function that wants a regular expression. |
722 |
723 @example |
724 @group |
725 (regexp-quote "^The cat$") |
726 @result{} "\\^The cat\\$" |
727 @end group |
728 @end example |
729 |
730 One use of @code{regexp-quote} is to combine an exact string match with |
731 context described as a regular expression. For example, this searches |
732 for the string that is the value of @var{string}, surrounded by |
733 whitespace: |
734 |
735 @example |
736 @group |
737 (re-search-forward |
738 (concat "\\s-" (regexp-quote string) "\\s-")) |
739 @end group |
740 @end example |
741 @end defun |
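
The effect of @code{regexp-quote} can be seen by comparing a quoted
and an unquoted search; the sample strings here are assumptions made
for the illustration.

@example
@group
;; Unquoted, "." matches any character, so "aXb" matches first.
(string-match "a.b" "aXb a.b")
     @result{} 0
@end group
@group
;; Quoted, only the literal text "a.b" matches.
(string-match (regexp-quote "a.b") "aXb a.b")
     @result{} 4
@end group
@end example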
742 |
743 @defun regexp-opt strings &optional paren |
744 This function returns an efficient regular expression that will match |
745 any of the strings @var{strings}. This is useful when you need to make |
746 matching or searching as fast as possible---for example, for Font Lock |
747 mode. |
748 |
749 If the optional argument @var{paren} is non-@code{nil}, then the |
750 returned regular expression is always enclosed by at least one |
751 parentheses-grouping construct. |
752 |
753 This simplified definition of @code{regexp-opt} produces a |
754 regular expression which is equivalent to the actual value |
755 (but not as efficient): |
756 |
757 @example |
758 (defun regexp-opt (strings &optional paren) |
759 (let ((open-paren (if paren "\\(" "")) |
760 (close-paren (if paren "\\)" ""))) |
761 (concat open-paren |
762 (mapconcat 'regexp-quote strings "\\|") |
763 close-paren))) |
764 @end example |
765 @end defun |
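
A typical use of @code{regexp-opt}, sketched here with an assumed
keyword list, is to build a whole-word matcher.  Passing a
non-@code{nil} @var{paren} guarantees that the alternatives are
grouped, so that @samp{\<} and @samp{\>} apply to every keyword rather
than only to the first and last.

@example
@group
(let ((re (concat "\\<"
                  (regexp-opt '("if" "else" "while") t)
                  "\\>"))
      (text "do while x"))
  ;; Return the matched keyword, or nil if none is found.
  (and (string-match re text)
       (substring text (match-beginning 0) (match-end 0))))
     @result{} "while"
@end group
@end example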
766 |
767 @defun regexp-opt-depth regexp |
768 This function returns the total number of grouping constructs |
769 (parenthesized expressions) in @var{regexp}. |
770 @end defun |
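
For instance, with an arbitrary sample regexp that contains two
parenthesized groups, @code{regexp-opt-depth} gives 2:

@example
@group
(regexp-opt-depth "\\(a\\|b\\)\\(c\\)")
     @result{} 2
@end group
@end example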
771 |
6552 | 772 @node Regexp Search |
773 @section Regular Expression Searching | |
774 @cindex regular expression searching | |
775 @cindex regexp searching | |
776 @cindex searching for regexp | |
777 | |
778 In GNU Emacs, you can search for the next match for a regular |
779 expression either incrementally or not. For incremental search |
780 commands, see @ref{Regexp Search, , Regular Expression Search, emacs, |
781 The GNU Emacs Manual}. Here we describe only the search functions |
782 useful in programs. The principal one is @code{re-search-forward}. |
6552 | 783 |
784 These search functions convert the regular expression to multibyte if |
785 the buffer is multibyte; they convert the regular expression to unibyte |
786 if the buffer is unibyte. @xref{Text Representations}. |
787 |
6552 | 788 @deffn Command re-search-forward regexp &optional limit noerror repeat |
789 This function searches forward in the current buffer for a string of | |
790 text that is matched by the regular expression @var{regexp}. The | |
791 function skips over any amount of text that is not matched by | |
792 @var{regexp}, and leaves point at the end of the first match found. | |
793 It returns the new value of point. | |
794 | |
795 If @var{limit} is non-@code{nil} (it must be a position in the current | |
796 buffer), then it is the upper bound to the search. No match extending | |
797 after that position is accepted. | |
798 | |
799 If @var{repeat} is supplied (it must be a positive number), then the |
800 search is repeated that many times (each time starting at the end of the |
801 previous time's match). If all these successive searches succeed, the |
802 function succeeds, moving point and returning its new value. Otherwise |
803 the function fails. |
804 |
805 What happens when the function fails depends on the value of |
6552 | 806 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed} |
807 error is signaled. If @var{noerror} is @code{t}, | |
808 @code{re-search-forward} does nothing and returns @code{nil}. If | |
809 @var{noerror} is neither @code{nil} nor @code{t}, then | |
810 @code{re-search-forward} moves point to @var{limit} (or the end of the | |
811 buffer) and returns @code{nil}. | |
812 | |
813 In the following example, point is initially before the @samp{T}. | |
814 Evaluating the search call moves point to the end of that line (between | |
815 the @samp{t} of @samp{hat} and the newline). | |
816 | |
817 @example | |
818 @group | |
819 ---------- Buffer: foo ---------- | |
820 I read "@point{}The cat in the hat | |
821 comes back" twice. | |
822 ---------- Buffer: foo ---------- | |
823 @end group | |
824 | |
825 @group | |
826 (re-search-forward "[a-z]+" nil t 5) | |
827 @result{} 27 | |
828 | |
829 ---------- Buffer: foo ---------- | |
830 I read "The cat in the hat@point{} | |
831 comes back" twice. | |
832 ---------- Buffer: foo ---------- | |
833 @end group | |
834 @end example | |
835 @end deffn | |
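
A common idiom in programs is to call @code{re-search-forward} in a
loop with @var{noerror} set to @code{t}, so that the loop simply ends
when no further match exists.  The following sketch, which assumes
some sample text in a temporary buffer, collects every run of digits:

@example
@group
(with-temp-buffer
  (insert "alpha 12 beta 345 gamma 6")
  (goto-char (point-min))
  (let ((numbers nil))
    ;; Each successful search leaves point after the match,
    ;; so the next iteration continues from there.
    (while (re-search-forward "[0-9]+" nil t)
      (setq numbers
            (cons (buffer-substring (match-beginning 0)
                                    (match-end 0))
                  numbers)))
    (nreverse numbers)))
     @result{} ("12" "345" "6")
@end group
@end example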
836 | |
837 @deffn Command re-search-backward regexp &optional limit noerror repeat | |
838 This function searches backward in the current buffer for a string of | |
839 text that is matched by the regular expression @var{regexp}, leaving | |
840 point at the beginning of the first text found. | |
841 | |
8469 | 842 This function is analogous to @code{re-search-forward}, but they are not |
843 simple mirror images. @code{re-search-forward} finds the match whose | |
844 beginning is as close as possible to the starting point. If | |
845 @code{re-search-backward} were a perfect mirror image, it would find the | |
846 match whose end is as close as possible. However, in fact it finds the | |
25089 | 847 match whose beginning is as close as possible. The reason for this is that |
8469 | 848 matching a regular expression at a given spot always works from |
849 beginning to end, and starts at a specified beginning position. | |
6552 | 850 |
851 A true mirror-image of @code{re-search-forward} would require a special | |
852 feature for matching regular expressions from end to beginning. It's |
853 not worth the trouble of implementing that. |
6552 | 854 @end deffn |
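
The asymmetry between @code{re-search-forward} and
@code{re-search-backward} can be observed with a small experiment; the
buffer contents @samp{aaaa} are an assumption for the illustration.
Each inner list holds the beginning and end of the match.

@example
@group
(with-temp-buffer
  (insert "aaaa")
  (list
   ;; Forward from the start: the match extends over all four.
   (progn (goto-char (point-min))
          (re-search-forward "a+")
          (list (match-beginning 0) (match-end 0)))
   ;; Backward from the end: the nearest possible beginning is
   ;; position 4, so only the last character is matched.
   (progn (goto-char (point-max))
          (re-search-backward "a+")
          (list (match-beginning 0) (match-end 0)))))
     @result{} ((1 5) (4 5))
@end group
@end example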
855 | |
856 @defun string-match regexp string &optional start | |
857 This function returns the index of the start of the first match for | |
858 the regular expression @var{regexp} in @var{string}, or @code{nil} if | |
859 there is no match. If @var{start} is non-@code{nil}, the search starts | |
860 at that index in @var{string}. | |
861 | |
862 For example, | |
863 | |
864 @example | |
865 @group | |
866 (string-match | |
867 "quick" "The quick brown fox jumped quickly.") | |
868 @result{} 4 | |
869 @end group | |
870 @group | |
871 (string-match | |
872 "quick" "The quick brown fox jumped quickly." 8) | |
873 @result{} 27 | |
874 @end group | |
875 @end example | |
876 | |
877 @noindent | |
878 The index of the first character of the | |
879 string is 0, the index of the second character is 1, and so on. | |
880 | |
881 After this function returns, the index of the first character beyond | |
882 the match is available as @code{(match-end 0)}. @xref{Match Data}. | |
883 | |
884 @example | |
885 @group | |
886 (string-match | |
887 "quick" "The quick brown fox jumped quickly." 8) | |
888 @result{} 27 | |
889 @end group | |
890 | |
891 @group | |
892 (match-end 0) | |
893 @result{} 32 | |
894 @end group | |
895 @end example | |
896 @end defun | |
897 | |
898 @defun looking-at regexp | |
899 This function determines whether the text in the current buffer directly | |
900 following point matches the regular expression @var{regexp}. ``Directly | |
901 following'' means precisely that: the search is ``anchored'' and it can | |
902 succeed only starting with the first character following point. The | |
903 result is @code{t} if so, @code{nil} otherwise. | |
904 | |
905 This function does not move point, but it updates the match data, which | |
906 you can access using @code{match-beginning} and @code{match-end}. | |
907 @xref{Match Data}. | |
908 | |
909 In this example, point is located directly before the @samp{T}. If it | |
910 were anywhere else, the result would be @code{nil}. | |
911 | |
912 @example | |
913 @group | |
914 ---------- Buffer: foo ---------- | |
915 I read "@point{}The cat in the hat | |
916 comes back" twice. | |
917 ---------- Buffer: foo ---------- | |
918 | |
919 (looking-at "The cat in the hat$") | |
920 @result{} t | |
921 @end group | |
922 @end example | |
923 @end defun | |
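
The anchoring of @code{looking-at} can also be checked in a temporary
buffer.  In this sketch (the buffer text is an assumed sample), the
second call returns @code{nil} even though @samp{cat} occurs later in
the buffer, because the match would not start at point:

@example
@group
(with-temp-buffer
  (insert "The cat in the hat")
  (goto-char (point-min))
  (list (looking-at "The")
        (looking-at "cat")))
     @result{} (t nil)
@end group
@end example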
924 | |
12067 | 925 @node POSIX Regexps |
926 @section POSIX Regular Expression Searching | |
927 | |
928 The usual regular expression functions do backtracking when necessary | |
929 to handle the @samp{\|} and repetition constructs, but they continue | |
930 this only until they find @emph{some} match. Then they succeed and | |
931 report the first match found. | |
932 | |
933 This section describes alternative search functions which perform the | |
934 full backtracking specified by the POSIX standard for regular expression | |
935 matching. They continue backtracking until they have tried all | |
936 possibilities and found all matches, so they can report the longest | |
937 match, as required by POSIX. This is much slower, so use these | |
938 functions only when you really need the longest match. | |
939 | |
940 @defun posix-search-forward regexp &optional limit noerror repeat | |
941 This is like @code{re-search-forward} except that it performs the full | |
942 backtracking specified by the POSIX standard for regular expression | |
943 matching. | |
944 @end defun | |
945 | |
946 @defun posix-search-backward regexp &optional limit noerror repeat | |
947 This is like @code{re-search-backward} except that it performs the full | |
948 backtracking specified by the POSIX standard for regular expression | |
949 matching. | |
950 @end defun | |
951 | |
952 @defun posix-looking-at regexp | |
953 This is like @code{looking-at} except that it performs the full | |
954 backtracking specified by the POSIX standard for regular expression | |
955 matching. | |
956 @end defun | |
957 | |
958 @defun posix-string-match regexp string &optional start | |
959 This is like @code{string-match} except that it performs the full | |
960 backtracking specified by the POSIX standard for regular expression | |
961 matching. | |
962 @end defun | |
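
The difference between the usual and the POSIX functions shows up when
an earlier alternative matches less text than a later one.  In this
sketch (the regexp and string are assumed samples), the end of the
match is collected after each call:

@example
@group
(list (progn (string-match "a\\|ab" "abc")
             (match-end 0))
      (progn (posix-string-match "a\\|ab" "abc")
             (match-end 0)))
     @result{} (1 2)
@end group
@end example

@noindent
The ordinary function stops at the first alternative that succeeds,
while the POSIX function goes on to find the longer match @samp{ab}.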
963 | |
6552 | 964 @ignore |
965 @deffn Command delete-matching-lines regexp | |
966 This function is identical to @code{delete-non-matching-lines}, save | |
967 that it deletes what @code{delete-non-matching-lines} keeps. | |
968 | |
969 In the example below, point is located on the first line of text. | |
970 | |
971 @example | |
972 @group | |
973 ---------- Buffer: foo ---------- | |
974 We hold these truths | |
975 to be self-evident, | |
976 that all men are created | |
977 equal, and that they are | |
978 ---------- Buffer: foo ---------- | |
979 @end group | |
980 | |
981 @group | |
982 (delete-matching-lines "the") | |
983 @result{} nil | |
984 | |
985 ---------- Buffer: foo ---------- | |
986 to be self-evident, | |
987 that all men are created | |
988 ---------- Buffer: foo ---------- | |
989 @end group | |
990 @end example | |
991 @end deffn | |
992 | |
993 @deffn Command flush-lines regexp | |
994 This function is the same as @code{delete-matching-lines}. | |
995 @end deffn | |
996 | |
997 @defun delete-non-matching-lines regexp | |
998 This function deletes all lines following point which don't | |
999 contain a match for the regular expression @var{regexp}. | |
1000 @end defun | |
1001 | |
1002 @deffn Command keep-lines regexp | |
1003 This function is the same as @code{delete-non-matching-lines}. | |
1004 @end deffn | |
1005 | |
1006 @deffn Command how-many regexp | |
1007 This function counts the number of matches for @var{regexp} there are in | |
1008 the current buffer following point. It prints this number in | |
1009 the echo area, returning the string printed. | |
1010 @end deffn | |
1011 | |
1012 @deffn Command count-matches regexp | |
1013 This function is a synonym of @code{how-many}. | |
1014 @end deffn | |
1015 | |
26288 | 1016 @deffn Command list-matching-lines regexp &optional nlines |
6552 | 1017 This function is a synonym of @code{occur}. |
1018 Show all lines following point containing a match for @var{regexp}. | |
1019 Display each line with @var{nlines} lines before and after, | |
1020 or @code{-}@var{nlines} before if @var{nlines} is negative. | |
1021 @var{nlines} defaults to @code{list-matching-lines-default-context-lines}. | |
1022 Interactively it is the prefix arg. | |
1023 | |
1024 The lines are shown in a buffer named @samp{*Occur*}. | |
1025 It serves as a menu to find any of the occurrences in this buffer. | |
24934 | 1026 @kbd{C-h m} (@code{describe-mode}) in that buffer gives help. |
6552 | 1027 @end deffn |
1028 | |
1029 @defopt list-matching-lines-default-context-lines | |
1030 Default value is 0. | |
1031 Default number of context lines to include around a @code{list-matching-lines} | |
1032 match. A negative number means to include that many lines before the match. | |
1033 A positive number means to include that many lines both before and after. | |
1034 @end defopt | |
1035 @end ignore | |
1036 | |
1037 @node Search and Replace | |
1038 @section Search and Replace | |
1039 @cindex replacement | |
1040 | |
1041 @defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map | |
1042 This function is the guts of @code{query-replace} and related commands. | |
1043 It searches for occurrences of @var{from-string} and replaces some or | |
1044 all of them. If @var{query-flag} is @code{nil}, it replaces all | |
1045 occurrences; otherwise, it asks the user what to do about each one. | |
1046 | |
1047 If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is | |
1048 considered a regular expression; otherwise, it must match literally. If | |
1049 @var{delimited-flag} is non-@code{nil}, then only occurrences | |
1050 surrounded by word boundaries are considered. | |
1051 | |
1052 The argument @var{replacements} specifies what to replace occurrences | |
1053 with. If it is a string, that string is used. It can also be a list of | |
1054 strings, to be used in cyclic order. | |
1055 | |
26783 | 1056 If @var{replacements} is a cons cell, @code{(@var{function} |
1057 . @var{data})}, this means to call @var{function} after each match to |
1058 get the replacement text. This function is called with two arguments: |
1059 @var{data}, and the number of replacements already made. |
1060 |
1061 If @var{repeat-count} is non-@code{nil}, it should be an integer. Then |
1062 it specifies how many times to use each of the strings in the |
1063 @var{replacements} list before advancing cyclically to the next one. |
6552 | 1064 |
1065 If @var{from-string} contains upper-case letters, then |
1066 @code{perform-replace} binds @code{case-fold-search} to @code{nil}, and |
1067 it uses the @var{replacements} without altering their case. |
1068 |
6552 | 1069 Normally, the keymap @code{query-replace-map} defines the possible user |
8469 | 1070 responses for queries. The argument @var{map}, if non-@code{nil}, is a |
1071 keymap to use instead of @code{query-replace-map}. | |
6552 | 1072 @end defun |
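
When @code{perform-replace} is called from a program with
@var{query-flag} @code{nil}, it replaces without asking.  Here is a
minimal sketch, assuming sample text in a temporary buffer.
Replacement proceeds from point, so point is first moved to the
beginning of the buffer.

@example
@group
(with-temp-buffer
  (insert "foo and foo")
  (goto-char (point-min))
  ;; Arguments: from-string, replacements, query-flag,
  ;; regexp-flag, delimited-flag.
  (perform-replace "foo" "bar" nil nil nil)
  (buffer-string))
     @result{} "bar and bar"
@end group
@end example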
1073 | |
1074 @defvar query-replace-map | |
1075 This variable holds a special keymap that defines the valid user | |
1076 responses for @code{query-replace} and related functions, as well as | |
1077 @code{y-or-n-p} and @code{map-y-or-n-p}. It is unusual in two ways: | |
1078 | |
1079 @itemize @bullet | |
1080 @item | |
1081 The ``key bindings'' are not commands, just symbols that are meaningful | |
1082 to the functions that use this map. | |
1083 | |
1084 @item | |
1085 Prefix keys are not supported; each key binding must be for a |
1086 single-event key sequence. This is because the functions don't use |
1087 @code{read-key-sequence} to get the input; instead, they read a single |
1088 event and look it up ``by hand.'' |
6552 | 1089 @end itemize |
1090 @end defvar | |
1091 | |
1092 Here are the meaningful ``bindings'' for @code{query-replace-map}. | |
1093 Several of them are meaningful only for @code{query-replace} and | |
1094 friends. | |
1095 | |
1096 @table @code | |
1097 @item act | |
1098 Do take the action being considered---in other words, ``yes.'' | |
1099 | |
1100 @item skip | |
1101 Do not take action for this question---in other words, ``no.'' | |
1102 | |
1103 @item exit | |
8469 | 1104 Answer this question ``no,'' and give up on the entire series of |
1105 questions, assuming that subsequent answers will be ``no.'' | |
6552 | 1106 |
1107 @item act-and-exit | |
8469 | 1108 Answer this question ``yes,'' and give up on the entire series of |
1109 questions, assuming that subsequent answers will be ``no.'' | |
6552 | 1110 |
1111 @item act-and-show | |
1112 Answer this question ``yes,'' but show the results---don't advance yet | |
1113 to the next question. | |
1114 | |
1115 @item automatic | |
1116 Answer this question and all subsequent questions in the series with | |
1117 ``yes,'' without further user interaction. | |
1118 | |
1119 @item backup | |
1120 Move back to the previous place that a question was asked about. | |
1121 | |
1122 @item edit | |
1123 Enter a recursive edit to deal with this question---instead of any | |
1124 other action that would normally be taken. | |
1125 | |
1126 @item delete-and-edit | |
1127 Delete the text being considered, then enter a recursive edit to replace | |
1128 it. | |
1129 | |
1130 @item recenter | |
1131 Redisplay and center the window, then ask the same question again. | |
1132 | |
1133 @item quit | |
1134 Perform a quit right away. Only @code{y-or-n-p} and related functions | |
1135 use this answer. | |
1136 | |
1137 @item help | |
1138 Display some help, then ask again. | |
1139 @end table | |
1140 | |
1141 @node Match Data | |
1142 @section The Match Data | |
1143 @cindex match data | |
1144 | |
25089 | 1145 Emacs keeps track of the start and end positions of the segments of |
6552 | 1146 text found during a regular expression search. This means, for example, |
1147 that you can search for a complex pattern, such as a date in an Rmail | |
1148 message, and then extract parts of the match under control of the | |
1149 pattern. | |
1150 | |
1151 Because the match data normally describe the most recent search only, | |
1152 you must be careful not to do another search inadvertently between the | |
1153 search you wish to refer back to and the use of the match data. If you | |
1154 can't avoid another intervening search, you must save and restore the | |
1155 match data around it, to prevent it from being overwritten. | |
1156 | |
1157 @menu | |
21682 | 1158 * Replacing Match:: Replacing a substring that was matched. |
6552 | 1159 * Simple Match Data:: Accessing single items of match data, |
1160 such as where a particular subexpression started. | |
1161 * Entire Match Data:: Accessing the entire match data at once, as a list. | |
1162 * Saving Match Data:: Saving and restoring the match data. | |
1163 @end menu | |
1164 | |
21682 | 1165 @node Replacing Match |
25751 | 1166 @subsection Replacing the Text that Matched |
21682 | 1167 |
1168 The function @code{replace-match} replaces the text matched by the |
1169 last search with @var{replacement}. |
1170 |
1171 @cindex case in replacements |
1172 @defun replace-match replacement &optional fixedcase literal string subexp |
1173 This function replaces the text in the buffer (or in @var{string}) that |
1174 was matched by the last search. It replaces that text with |
1175 @var{replacement}. |
1176 |
1177 If you did the last search in a buffer, you should specify @code{nil} |
1178 for @var{string}. Then @code{replace-match} does the replacement by |
1179 editing the buffer; it leaves point at the end of the replacement text, |
1180 and returns @code{t}. |
1181 |
1182 If you did the search in a string, pass the same string as @var{string}. |
1183 Then @code{replace-match} does the replacement by constructing and |
1184 returning a new string. |
1185 |
1186 If @var{fixedcase} is non-@code{nil}, then the case of the replacement |
1187 text is not changed; otherwise, the replacement text is converted to a |
1188 different case depending upon the capitalization of the text to be |
1189 replaced. If the original text is all upper case, the replacement text |
1190 is converted to upper case. If the first word of the original text is |
1191 capitalized, then the first word of the replacement text is capitalized. |
1192 If the original text contains just one word, and that word is a capital |
1193 letter, @code{replace-match} considers this a capitalized first word |
1194 rather than all upper case. |
1195 |
1196 If @var{literal} is non-@code{nil}, then @var{replacement} is inserted |
1197 exactly as it is, the only alterations being case changes as needed. |
1198 If it is @code{nil} (the default), then the character @samp{\} is treated |
1199 specially. If a @samp{\} appears in @var{replacement}, then it must be |
1200 part of one of the following sequences: |
1201 |
1202 @table @asis |
1203 @item @samp{\&} |
1204 @cindex @samp{&} in replacement |
1205 @samp{\&} stands for the entire text being replaced. |
1206 |
1207 @item @samp{\@var{n}} |
1208 @cindex @samp{\@var{n}} in replacement |
1209 @samp{\@var{n}}, where @var{n} is a digit, stands for the text that |
1210 matched the @var{n}th subexpression in the original regexp. |
1211 Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}. |
1212 |
1213 @item @samp{\\} |
1214 @cindex @samp{\} in replacement |
1215 @samp{\\} stands for a single @samp{\} in the replacement text. |
1216 @end table |
1217 |
1218 If @var{subexp} is non-@code{nil}, that says to replace just |
1219 subexpression number @var{subexp} of the regexp that was matched, not |
1220 the entire match. For example, after matching @samp{foo \(ba*r\)}, |
1221 calling @code{replace-match} with 1 as @var{subexp} means to replace |
1222 just the text that matched @samp{\(ba*r\)}. |
1223 @end defun |
1224 |
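As a brief illustration of the @var{literal}, @var{string} and
@var{subexp} arguments, here is a minimal sketch that searches in a
string, so that @code{replace-match} returns a new string instead of
editing a buffer. The sample strings are invented for this
illustration, and @var{fixedcase} is @code{t} in both calls so that no
case conversion takes place.

@example
@group
;; @r{Replace only subexpression 1, inserting the replacement literally.}
(let ((str "foo baaar"))
  (string-match "foo \\(ba*r\\)" str)
  (replace-match "qux" t t str 1))
     @result{} "foo qux"
@end group

@group
;; @r{With @var{literal} @code{nil}, a backslash and digit refer to a subexpression.}
(let ((str "2000-11-24"))
  (string-match "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)" str)
  (replace-match "\\3/\\2/\\1" t nil str))
     @result{} "24/11/2000"
@end group
@end example

@noindent
Each search and its replacement sit in the same form, so no other
search can intervene and change the match data between them.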
6552 | 1225 @node Simple Match Data |
1226 @subsection Simple Match Data Access | |
1227 | |
12067 | 1228 This section explains how to use the match data to find out what was |
1229 matched by the last search or match operation. | |
1230 | |
1231 You can ask about the entire matching text, or about a particular | |
1232 parenthetical subexpression of a regular expression. The @var{count} | |
1233 argument in the functions below specifies which. If @var{count} is | |
1234 zero, you are asking about the entire match. If @var{count} is | |
1235 positive, it specifies which subexpression you want. | |
1236 | |
1237 Recall that the subexpressions of a regular expression are those | |
1238 expressions grouped with escaped parentheses, @samp{\(@dots{}\)}. The | |
1239 @var{count}th subexpression is found by counting occurrences of | |
1240 @samp{\(} from the beginning of the whole regular expression. The first | |
1241 subexpression is numbered 1, the second 2, and so on. Only regular | |
1242 expressions can have subexpressions---after a simple string search, the | |
1243 only information available is about the entire match. | |
1244 | |
22138 | 1245 A search that fails may or may not alter the match data. In the |
1246 past, a failing search left the match data unchanged, but this may change in the |
1247 future, so do not rely on the match data after a failed search. |
1248 |
12067 | 1249 @defun match-string count &optional in-string |
1250 This function returns, as a string, the text matched in the last search | |
1251 or match operation. It returns the entire text if @var{count} is zero, | |
1252 or just the portion corresponding to the @var{count}th parenthetical | |
1253 subexpression, if @var{count} is positive. If @var{count} is out of | |
12098 | 1254 range, or if that subexpression didn't match anything, the value is |
1255 @code{nil}. | |
12067 | 1256 |
1257 If the last such operation was done against a string with | |
1258 @code{string-match}, then you should pass the same string as the | |
21682 | 1259 argument @var{in-string}. After a buffer search or match, |
12067 | 1260 you should omit @var{in-string} or pass @code{nil} for it; but you |
1261 should make sure that the current buffer when you call | |
1262 @code{match-string} is the one in which you did the searching or | |
1263 matching. | |
1264 @end defun | |
6552 | 1265 |
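Since a failed search may or may not alter the match data, it is safest
to consult the match data only when the search function has returned
non-@code{nil}. The following sketch shows one way to write that
pattern; the function name @code{my-first-number} is hypothetical,
chosen only for this example.

@example
@group
(defun my-first-number (string)
  "Return the first run of digits in STRING, or nil if there is none."
  (if (string-match "[0-9]+" string)
      (match-string 0 string)))
@end group

@group
(my-first-number "revision 33863 of searching.texi")
     @result{} "33863"
(my-first-number "no digits here")
     @result{} nil
@end group
@end example
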
26288 | 1266 @defun match-string-no-properties count &optional in-string |
21007 | 1267 This function is like @code{match-string} except that the result |
1268 has no text properties. |
1269 @end defun |
1270 |
6552 | 1271 @defun match-beginning count |
1272 This function returns the position of the start of text matched by the | |
1273 last regular expression searched for, or a subexpression of it. | |
1274 | |
8469 | 1275 If @var{count} is zero, then the value is the position of the start of |
12125 | 1276 the entire match. Otherwise, @var{count} specifies a subexpression in |
21682 | 1277 the regular expression, and the value of the function is the starting |
12067 | 1278 position of the match for that subexpression. |
6552 | 1279 |
12067 | 1280 The value is @code{nil} for a subexpression inside a @samp{\|} |
1281 alternative that wasn't used in the match. | |
6552 | 1282 @end defun |
1283 | |
1284 @defun match-end count | |
12067 | 1285 This function is like @code{match-beginning} except that it returns the |
1286 position of the end of the match, rather than the position of the | |
1287 beginning. | |
6552 | 1288 @end defun |
1289 | |
1290 Here is an example of using the match data, with a comment showing the | |
1291 positions within the text: | |
1292 | |
1293 @example | |
1294 @group | |
1295 (string-match "\\(qu\\)\\(ick\\)" | |
1296               "The quick fox jumped quickly.") | |
1297               ;0123456789 | |
1298      @result{} 4 | |
1299 @end group | |
1300 | |
1301 @group | |
12067 | 1302 (match-string 0 "The quick fox jumped quickly.") |
1303 @result{} "quick" | |
1304 (match-string 1 "The quick fox jumped quickly.") | |
1305 @result{} "qu" | |
1306 (match-string 2 "The quick fox jumped quickly.") | |
1307 @result{} "ick" | |
1308 @end group | |
1309 | |
1310 @group | |
6552 | 1311 (match-beginning 1) ; @r{The beginning of the match} |
1312 @result{} 4 ; @r{with @samp{qu} is at index 4.} | |
1313 @end group | |
1314 | |
1315 @group | |
1316 (match-beginning 2) ; @r{The beginning of the match} | |
1317 @result{} 6 ; @r{with @samp{ick} is at index 6.} | |
1318 @end group | |
1319 | |
1320 @group | |
1321 (match-end 1) ; @r{The end of the match} | |
1322 @result{} 6 ; @r{with @samp{qu} is at index 6.} | |
1323 | |
1324 (match-end 2) ; @r{The end of the match} | |
1325 @result{} 9 ; @r{with @samp{ick} is at index 9.} | |
1326 @end group | |
1327 @end example | |
1328 | |
1329 Here is another example. Point is initially located at the beginning | |
1330 of the line. Searching moves point to between the space and the word | |
1331 @samp{in}. The beginning of the entire match is at the 9th character of | |
1332 the buffer (@samp{T}), and the beginning of the match for the first | |
1333 subexpression is at the 13th character (@samp{c}). | |
1334 | |
1335 @example | |
1336 @group | |
1337 (list | |
1338   (re-search-forward "The \\(cat \\)") | |
1339   (match-beginning 0) | |
1340   (match-beginning 1)) | |
8469 | 1341     @result{} (17 9 13) |
6552 | 1342 @end group |
1343 | |
1344 @group | |
1345 ---------- Buffer: foo ---------- | |
1346 I read "The cat @point{}in the hat comes back" twice. | |
1347         ^   ^ | |
1348         9  13 | |
1349 ---------- Buffer: foo ---------- | |
1350 @end group | |
1351 @end example | |
1352 | |
1353 @noindent | |
1354 (In this case, the index returned is a buffer position; the first | |
1355 character of the buffer counts as 1.) | |
1356 | |
1357 @node Entire Match Data | |
1358 @subsection Accessing the Entire Match Data | |
1359 | |
1360 The functions @code{match-data} and @code{set-match-data} read or | |
1361 write the entire match data, all at once. | |
1362 | |
1363 @defun match-data | |
1364 This function returns a newly constructed list containing all the | |
1365 information on what text the last search matched. Element zero is the | |
1366 position of the beginning of the match for the whole expression; element | |
1367 one is the position of the end of the match for the expression. The | |
1368 next two elements are the positions of the beginning and end of the | |
1369 match for the first subexpression, and so on. In general, element | |
27193 | 1370 @ifnottex |
6552 | 1371 number 2@var{n} |
27193 | 1372 @end ifnottex |
6552 | 1373 @tex |
1374 number {\mathsurround=0pt $2n$} | |
1375 @end tex | |
1376 corresponds to @code{(match-beginning @var{n})}; and | |
1377 element | |
27193 | 1378 @ifnottex |
6552 | 1379 number 2@var{n} + 1 |
27193 | 1380 @end ifnottex |
6552 | 1381 @tex |
1382 number {\mathsurround=0pt $2n+1$} | |
1383 @end tex | |
1384 corresponds to @code{(match-end @var{n})}. | |
1385 | |
1386 All the elements are markers or @code{nil} if matching was done on a | |
1387 buffer, and all are integers or @code{nil} if matching was done on a | |
21682 | 1388 string with @code{string-match}. |
6552 | 1389 |
1390 As always, there must be no possibility of intervening searches between | |
1391 the call to a search function and the call to @code{match-data} that is | |
1392 intended to access the match data for that search. | |
1393 | |
1394 @example | |
1395 @group | |
1396 (match-data) | |
1397 @result{} (#<marker at 9 in foo> | |
1398 #<marker at 17 in foo> | |
1399 #<marker at 13 in foo> | |
1400 #<marker at 17 in foo>) | |
1401 @end group | |
1402 @end example | |
1403 @end defun | |
1404 | |
1405 @defun set-match-data match-list | |
1406 This function sets the match data from the elements of @var{match-list}, | |
1407 which should be a list that was the value of a previous call to | |
1408 @code{match-data}. | |
1409 | |
1410 If @var{match-list} refers to a buffer that doesn't exist, you don't get | |
1411 an error; that sets the match data in a meaningless but harmless way. | |
1412 | |
1413 @findex store-match-data | |
21682 | 1414 @code{store-match-data} is a semi-obsolete alias for @code{set-match-data}. |
6552 | 1415 @end defun |
1416 | |
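For comparison with the buffer case, here is a sketch of the list that
@code{match-data} returns after a @code{string-match}, where the
elements are integers rather than markers. The call is made in the same
form as the search so that no other search can intervene.

@example
@group
(progn
  (string-match "\\(qu\\)\\(ick\\)"
                "The quick fox jumped quickly.")
  (match-data))
     @result{} (4 9 4 6 6 9)
@end group
@end example

@noindent
Elements 0 and 1 delimit the whole match, elements 2 and 3 delimit
@samp{qu}, and elements 4 and 5 delimit @samp{ick}.
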
1417 @node Saving Match Data | |
1418 @subsection Saving and Restoring the Match Data | |
1419 | |
10038 | 1420 When you call a function that may do a search, you may need to save |
1421 and restore the match data around that call, if you want to preserve the |
1422 match data from an earlier search for later use. Here is an example |
1423 that shows the problem that arises if you fail to save the match data: |
6552 | 1424 |
1425 @example | |
1426 @group | |
1427 (re-search-forward "The \\(cat \\)") | |
1428 @result{} 48 | |
1429 (foo) ; @r{Perhaps @code{foo} does} | |
1430 ; @r{more searching.} | |
1431 (match-end 0) | |
1432 @result{} 61 ; @r{Unexpected result---not 48!} | |
1433 @end group | |
1434 @end example | |
1435 | |
10038 | 1436 You can save and restore the match data with @code{save-match-data}: |
6552 | 1437 |
12098 | 1438 @defmac save-match-data body@dots{} |
22252 | 1439 This macro executes @var{body}, saving and restoring the match |
10038 | 1440 data around it. |
12098 | 1441 @end defmac |
6552 | 1442 |
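As a sketch of typical use, the following form performs a second search
inside @code{save-match-data}, so that the match data of the first
search are still available afterward. The sample string is only
illustrative.

@example
@group
(let ((str "The quick fox jumped quickly."))
  (string-match "\\(qu\\)\\(ick\\)" str)
  (save-match-data
    (string-match "fox" str))    ; @r{Changes the match data only temporarily.}
  (match-string 0 str))
     @result{} "quick"
@end group
@end example
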
21682 | 1443 You could use @code{set-match-data} together with @code{match-data} to |
1444 imitate the effect of the macro @code{save-match-data}. Here is |
1445 how: |
6552 | 1446 |
1447 @example | |
1448 @group | |
1449 (let ((data (match-data))) | |
1450   (unwind-protect | |
21007 | 1451       @dots{} ; @r{Ok to change the original match data.} |
6552 | 1452     (set-match-data data))) |
1453 @end group | |
1454 @end example | |
1455 | |
10038 | 1456 Emacs automatically saves and restores the match data when it runs |
1457 process filter functions (@pxref{Filter Functions}) and process |
1458 sentinels (@pxref{Sentinels}). |
1459 |
6552 | 1460 @ignore |
1461 Here is a function which restores the match data provided the buffer | |
1462 associated with it still exists. | |
1463 | |
1464 @smallexample | |
1465 @group | |
1466 (defun restore-match-data (data) | |
1467 @c It is incorrect to split the first line of a doc string. | |
1468 @c If there's a problem here, it should be solved in some other way. | |
1469 "Restore the match data DATA unless the buffer is missing." | |
1470 (catch 'foo | |
1471 (let ((d data)) | |
1472 @end group | |
1473 (while d | |
1474 (and (car d) | |
1475 (null (marker-buffer (car d))) | |
1476 @group | |
1477 ;; @file{match-data} @r{buffer is deleted.} | |
1478 (throw 'foo nil)) | |
1479 (setq d (cdr d))) | |
1480 (set-match-data data)))) | |
1481 @end group | |
1482 @end smallexample | |
1483 @end ignore | |
1484 | |
1485 @node Searching and Case | |
1486 @section Searching and Case | |
1487 @cindex searching and case | |
1488 | |
1489 By default, searches in Emacs ignore the case of the text they are | |
1490 searching through; if you specify searching for @samp{FOO}, then | |
21007 | 1491 @samp{Foo} or @samp{foo} is also considered a match. This applies to |
1492 regular expressions, too; thus, @samp{[aB]} would match @samp{a} or |
1493 @samp{A} or @samp{b} or @samp{B}. |
6552 | 1494 |
1495 If you do not want this feature, set the variable | |
1496 @code{case-fold-search} to @code{nil}. Then all letters must match | |
8469 | 1497 exactly, including case. This is a buffer-local variable; altering the |
1498 variable affects only the current buffer. (@xref{Intro to | |
6552 | 1499 Buffer-Local}.) Alternatively, you may change the value of |
1500 @code{default-case-fold-search}, which is the default value of | |
1501 @code{case-fold-search} for buffers that do not override it. | |
1502 | |
1503 Note that the user-level incremental search feature handles case | |
1504 distinctions differently. When given a lower case letter, it looks for | |
1505 a match of either case, but when given an upper case letter, it looks | |
1506 for an upper case letter only. But this has nothing to do with the | |
21007 | 1507 searching functions used in Lisp code. |
6552 | 1508 |
1509 @defopt case-replace | |
8469 | 1510 This variable determines whether the replacement functions should |
1511 preserve case. If the variable is @code{nil}, that means to use the | |
1512 replacement text verbatim. A non-@code{nil} value means to convert the | |
1513 case of the replacement text according to the text being replaced. | |
1514 | |
25751 | 1515 This variable is used by passing it as an argument to the function |
1516 @code{replace-match}. @xref{Replacing Match}. |
6552 | 1517 @end defopt |
1518 | |
1519 @defopt case-fold-search | |
1520 This buffer-local variable determines whether searches should ignore | |
1521 case. If the variable is @code{nil} they do not ignore case; otherwise | |
1522 they do ignore case. | |
1523 @end defopt | |
1524 | |
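In Lisp code, one way to control case sensitivity for a single search
is to bind @code{case-fold-search} with @code{let} around the search,
rather than to set the variable. A minimal sketch, using invented
sample strings:

@example
@group
(let ((case-fold-search nil))
  (string-match "FOO" "a foo"))
     @result{} nil
@end group

@group
(let ((case-fold-search t))
  (string-match "FOO" "a foo"))
     @result{} 2
@end group
@end example
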
1525 @defvar default-case-fold-search | |
1526 The value of this variable is the default value for | |
1527 @code{case-fold-search} in buffers that do not override it. This is the | |
1528 same as @code{(default-value 'case-fold-search)}. | |
1529 @end defvar | |
1530 | |
1531 @node Standard Regexps | |
1532 @section Standard Regular Expressions Used in Editing | |
1533 @cindex regexps used standardly in editing | |
1534 @cindex standard regexps used in editing | |
1535 | |
1536 This section describes some variables that hold regular expressions | |
1537 used for certain purposes in editing: | |
1538 | |
1539 @defvar page-delimiter | |
21682 | 1540 This is the regular expression describing line-beginnings that separate |
1541 pages. The default value is @code{"^\014"} (i.e., @code{"^^L"} or |
1542 @code{"^\C-l"}); this matches a line that starts with a formfeed |
1543 character. |
6552 | 1544 @end defvar |
1545 | |
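As an illustration of how such a variable can be used from Lisp, here is
a sketch that counts the matches for @code{page-delimiter} in the
accessible portion of the current buffer. The function name
@code{my-count-page-delimiters} is hypothetical, and the loop assumes
that the regular expression never matches the empty string, which holds
for the default value.

@example
@group
(defun my-count-page-delimiters ()
  "Return the number of matches for `page-delimiter' in this buffer."
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      (while (re-search-forward page-delimiter nil t)
        (setq count (1+ count)))
      count)))
@end group
@end example
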
12067 | 1546 The following two regular expressions should @emph{not} assume the |
1547 match always starts at the beginning of a line; they should not use | |
1548 @samp{^} to anchor the match. Most often, the paragraph commands do | |
1549 check for a match only at the beginning of a line, which means that | |
12098 | 1550 @samp{^} would be superfluous. When there is a nonzero left margin, |
1551 they accept matches that start after the left margin. In that case, a | |
1552 @samp{^} would be incorrect. However, a @samp{^} is harmless in modes | |
1553 where a left margin is never used. | |
12067 | 1554 |
6552 | 1555 @defvar paragraph-separate |
1556 This is the regular expression for recognizing the beginning of a line | |
1557 that separates paragraphs. (If you change this, you may have to | |
8469 | 1558 change @code{paragraph-start} also.) The default value is |
12067 | 1559 @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of |
1560 spaces, tabs, and form feeds (after its left margin). | |
6552 | 1561 @end defvar |
1562 | |
1563 @defvar paragraph-start | |
1564 This is the regular expression for recognizing the beginning of a line | |
1565 that starts @emph{or} separates paragraphs. The default value is | |
12067 | 1566 @w{@code{"[@ \t\n\f]"}}, which matches a line starting with a space, tab, |
1567 newline, or form feed (after its left margin). | |
6552 | 1568 @end defvar |
1569 | |
1570 @defvar sentence-end | |
1571 This is the regular expression describing the end of a sentence. (All | |
1572 paragraph boundaries also end sentences, regardless.) The default value | |
1573 is: | |
1574 | |
1575 @example | |
8469 | 1576 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*" |
6552 | 1577 @end example |
1578 | |
8469 | 1579 This means a period, question mark or exclamation mark, followed by any number of |
1580 closing quotes, brackets, parentheses or braces, then by the end of a line (possibly | |
1581 after one space), a tab or two spaces, and finally by any number of spaces, tabs or newlines. | |
6552 | 1582 |
1583 For a detailed explanation of this regular expression, see @ref{Regexp | |
1584 Example}. | |
1585 @end defvar |