annotate lispref/searching.texi @ 49645:4e94855c037e
Change dates for the entries concerning the 2.0.29 Tramp commit such
that they all reflect the commit date, instead of the date of the
individual changes.
This is deemed better than keeping the original change date because
it makes sure that the ChangeLog dates have more or less sequential
order.
author | Kai Großjohann <kgrossjo@eu.uu.net> |
---|---|
date | Fri, 07 Feb 2003 17:53:05 +0000 |
parents | 23a1cea22d13 |
children | 4556482b5d22 d7ddb3e565de |
rev | line source |
---|---|
6552 | 1 @c -*-texinfo-*- |
2 @c This is part of the GNU Emacs Lisp Reference Manual. | |
27189 | 3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999 |
49600 | 4 @c Free Software Foundation, Inc. |
6552 | 5 @c See the file elisp.texi for copying conditions. |
6 @setfilename ../info/searching | |
21007 | 7 @node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top |
6552 | 8 @chapter Searching and Matching |
9 @cindex searching | |
10 | |
11 GNU Emacs provides two ways to search through a buffer for specified | |
12 text: exact string searches and regular expression searches. After a | |
13 regular expression search, you can examine the @dfn{match data} to | |
14 determine which text matched the whole regular expression or various | |
15 portions of it. | |
16 | |
17 @menu | |
18 * String Search:: Search for an exact match. | |
19 * Regular Expressions:: Describing classes of strings. | |
20 * Regexp Search:: Searching for a match for a regexp. | |
12067 | 21 * POSIX Regexps:: Searching POSIX-style for the longest match. |
6552 | 22 * Search and Replace:: Internals of @code{query-replace}. |
23 * Match Data:: Finding out which part of the text matched | |
24 various parts of a regexp, after regexp search. | |
25 * Searching and Case:: Case-independent or case-significant searching. | |
26 * Standard Regexps:: Useful regexps for finding sentences, pages,... | |
27 @end menu | |
28 | |
29 The @samp{skip-chars@dots{}} functions also perform a kind of searching. | |
30 @xref{Skipping Characters}. | |
31 | |
32 @node String Search | |
33 @section Searching for Strings | |
34 @cindex string search | |
35 | |
36 These are the primitive functions for searching through the text in a | |
37 buffer. They are meant for use in programs, but you may call them | |
25751 | 38 interactively. If you do so, they prompt for the search string; the |
39 arguments @var{limit} and @var{noerror} are @code{nil}, and @var{repeat} | |
40 is 1. | |
6552 | 41 |
21007 | 42 These search functions convert the search string to multibyte if the |
43 buffer is multibyte; they convert the search string to unibyte if the | |
44 buffer is unibyte. @xref{Text Representations}. | |
45 | |
6552 | 46 @deffn Command search-forward string &optional limit noerror repeat |
21007 | 47 This function searches forward from point for an exact match for |
6552 | 48 @var{string}. If successful, it sets point to the end of the occurrence |
49 found, and returns the new value of point. If no match is found, the | |
50 value and side effects depend on @var{noerror} (see below). | |
51 @c Emacs 19 feature | |
52 | |
21007 | 53 In the following example, point is initially at the beginning of the |
6552 | 54 line. Then @code{(search-forward "fox")} moves point after the last |
55 letter of @samp{fox}: | |
56 | |
57 @example | |
58 @group | |
59 ---------- Buffer: foo ---------- | |
60 @point{}The quick brown fox jumped over the lazy dog. | |
61 ---------- Buffer: foo ---------- | |
62 @end group | |
63 | |
64 @group | |
65 (search-forward "fox") | |
66 @result{} 20 | |
67 | |
68 ---------- Buffer: foo ---------- | |
69 The quick brown fox@point{} jumped over the lazy dog. | |
70 ---------- Buffer: foo ---------- | |
71 @end group | |
72 @end example | |
73 | |
21007 | 74 The argument @var{limit} specifies the upper bound to the search. (It |
6552 | 75 must be a position in the current buffer.) No match extending after |
76 that position is accepted. If @var{limit} is omitted or @code{nil}, it | |
77 defaults to the end of the accessible portion of the buffer. | |
78 | |
79 @kindex search-failed | |
21007 | 80 What happens when the search fails depends on the value of |
6552 | 81 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed} |
82 error is signaled. If @var{noerror} is @code{t}, @code{search-forward} | |
83 returns @code{nil} and does nothing. If @var{noerror} is neither | |
84 @code{nil} nor @code{t}, then @code{search-forward} moves point to the | |
21007 | 85 upper bound and returns @code{nil}. (It would be more consistent now to |
86 return the new position of point in that case, but some existing | |
87 programs may depend on a value of @code{nil}.) | |
6552 | 88 |
8427 | 89 If @var{repeat} is supplied (it must be a positive number), then the |
90 search is repeated that many times (each time starting at the end of the | |
91 previous time's match). If these successive searches succeed, the | |
92 function succeeds, moving point and returning its new value. Otherwise | |
41939 | 93 the search fails, leaving point where it started. |
6552 | 94 @end deffn |
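
The three @var{noerror} behaviors described above can be sketched as
follows.  This is an illustration only: it assumes a scratch buffer
created with @code{with-temp-buffer}, and leaves @var{limit} as
@code{nil} so that the upper bound is the end of the buffer.

@example
@group
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  (goto-char (point-min))
  (list (search-forward "fox" nil t)      ; match: returns new point
        (search-forward "cat" nil t)      ; no match: nil, point stays
        (point)
        (search-forward "cat" nil 'move)  ; no match: point goes to bound
        (= (point) (point-max))))
     @result{} (20 nil 20 nil t)
@end group
@end example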
95 | |
96 @deffn Command search-backward string &optional limit noerror repeat | |
97 This function searches backward from point for @var{string}. It is | |
98 just like @code{search-forward} except that it searches backwards and | |
99 leaves point at the beginning of the match. | |
100 @end deffn | |
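
As a small sketch (again assuming a temporary buffer created with
@code{with-temp-buffer}), searching backward from the end of the buffer
leaves point just before the match and returns that position:

@example
@group
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  ;; After the insertion, point is at the end of the buffer.
  (search-backward "fox"))
     @result{} 17
@end group
@end example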
101 | |
102 @deffn Command word-search-forward string &optional limit noerror repeat | |
103 @cindex word search | |
104 This function searches forward from point for a ``word'' match for | |
105 @var{string}. If it finds a match, it sets point to the end of the | |
106 match found, and returns the new value of point. | |
107 @c Emacs 19 feature | |
108 | |
109 Word matching regards @var{string} as a sequence of words, disregarding | |
110 punctuation that separates them. It searches the buffer for the same | |
111 sequence of words. Each word must be distinct in the buffer (searching | |
112 for the word @samp{ball} does not match the word @samp{balls}), but the | |
113 details of punctuation and spacing are ignored (searching for @samp{ball | |
114 boy} does match @samp{ball. Boy!}). | |
115 | |
116 In this example, point is initially at the beginning of the buffer; the | |
117 search leaves it between the @samp{y} and the @samp{!}. | |
118 | |
119 @example | |
120 @group | |
121 ---------- Buffer: foo ---------- | |
122 @point{}He said "Please! Find | |
123 the ball boy!" | |
124 ---------- Buffer: foo ---------- | |
125 @end group | |
126 | |
127 @group | |
128 (word-search-forward "Please find the ball, boy.") | |
129 @result{} 35 | |
130 | |
131 ---------- Buffer: foo ---------- | |
132 He said "Please! Find | |
133 the ball boy@point{}!" | |
134 ---------- Buffer: foo ---------- | |
135 @end group | |
136 @end example | |
137 | |
138 If @var{limit} is non-@code{nil} (it must be a position in the current | |
139 buffer), then it is the upper bound to the search. The match found must | |
140 not extend after that position. | |
141 | |
142 If @var{noerror} is @code{nil}, then @code{word-search-forward} signals | |
143 an error if the search fails. If @var{noerror} is @code{t}, then it | |
144 returns @code{nil} instead of signaling an error. If @var{noerror} is | |
145 neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the | |
146 end of the buffer) and returns @code{nil}. | |
147 | |
148 If @var{repeat} is non-@code{nil}, then the search is repeated that many | |
149 times. Point is positioned at the end of the last match. | |
150 @end deffn | |
151 | |
152 @deffn Command word-search-backward string &optional limit noerror repeat | |
153 This function searches backward from point for a word match to | |
154 @var{string}. This function is just like @code{word-search-forward} | |
155 except that it searches backward and normally leaves point at the | |
156 beginning of the match. | |
157 @end deffn | |
158 | |
159 @node Regular Expressions | |
160 @section Regular Expressions | |
161 @cindex regular expression | |
162 @cindex regexp | |
163 | |
164 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that | |
165 denotes a (possibly infinite) set of strings. Searching for matches for | |
166 a regexp is a very powerful operation. This section explains how to write | |
167 regexps; the following section says how to search for them. | |
168 | |
169 @menu | |
170 * Syntax of Regexps:: Rules for writing regular expressions. | |
25751 | 171 * Regexp Functions:: Functions for operating on regular expressions. |
6552 | 172 * Regexp Example:: Illustrates regular expression syntax. |
173 @end menu | |
174 | |
175 @node Syntax of Regexps | |
176 @subsection Syntax of Regular Expressions | |
177 | |
8427 | 178 Regular expressions have a syntax in which a few characters are |
179 special constructs and the rest are @dfn{ordinary}. An ordinary | |
180 character is a simple regular expression that matches that character and | |
181 nothing else. The special characters are @samp{.}, @samp{*}, @samp{+}, | |
182 @samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new | |
183 special characters will be defined in the future. Any other character | |
184 appearing in a regular expression is ordinary, unless a @samp{\} | |
185 precedes it. | |
6552 | 186 |
25751 | 187 For example, @samp{f} is not a special character, so it is ordinary, and |
6552 | 188 therefore @samp{f} is a regular expression that matches the string |
189 @samp{f} and no other string. (It does @emph{not} match the string | |
25751 | 190 @samp{fg}, but it does match a @emph{part} of that string.) Likewise, |
191 @samp{o} is a regular expression that matches only @samp{o}.@refill | |
6552 | 192 |
25751 | 193 Any two regular expressions @var{a} and @var{b} can be concatenated. The |
8427 | 194 result is a regular expression that matches a string if @var{a} matches |
6552 | 195 some amount of the beginning of that string and @var{b} matches the rest of |
196 the string.@refill | |
197 | |
25751 | 198 As a simple example, we can concatenate the regular expressions @samp{f} |
6552 | 199 and @samp{o} to get the regular expression @samp{fo}, which matches only |
200 the string @samp{fo}. Still trivial. To do something more powerful, you | |
25751 | 201 need to use one of the special regular expression constructs. |
202 | |
203 @menu | |
204 * Regexp Special:: Special characters in regular expressions. | |
205 * Char Classes:: Character classes used in regular expressions. | |
206 * Regexp Backslash:: Backslash-sequences in regular expressions. | |
207 @end menu | |
208 | |
209 @node Regexp Special | |
210 @subsubsection Special Characters in Regular Expressions | |
211 | |
212 Here is a list of the characters that are special in a regular | |
213 expression. | |
6552 | 214 |
22274 | 215 @need 800 |
21682 | 216 @table @asis |
217 @item @samp{.}@: @r{(Period)} | |
6552 | 218 @cindex @samp{.} in regexp |
219 is a special character that matches any single character except a newline. | |
220 Using concatenation, we can make regular expressions like @samp{a.b}, which | |
221 matches any three-character string that begins with @samp{a} and ends with | |
222 @samp{b}.@refill | |
223 | |
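For instance, using @code{string-match} (which this chapter documents
under regexp searching), one would expect the following.  The test
strings are made up for illustration.

@example
@group
(string-match "a.b" "axb")
     @result{} 0
(string-match "a.b" "a\nb")   ; the period does not match a newline
     @result{} nil
@end group
@end example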
21682 | 224 @item @samp{*} |
6552 | 225 @cindex @samp{*} in regexp |
17884 | 226 is not a construct by itself; it is a postfix operator that means to |
227 match the preceding regular expression repetitively as many times as | |
228 possible. Thus, @samp{o*} matches any number of @samp{o}s (including no | |
229 @samp{o}s). | |
6552 | 230 |
231 @samp{*} always applies to the @emph{smallest} possible preceding | |
17884 | 232 expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating |
233 @samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on. | |
6552 | 234 |
21007 | 235 The matcher processes a @samp{*} construct by matching, immediately, as |
236 many repetitions as can be found. Then it continues with the rest of | |
237 the pattern. If that fails, backtracking occurs, discarding some of the | |
238 matches of the @samp{*}-modified construct in the hope that that will | |
239 make it possible to match the rest of the pattern. For example, in | |
240 matching @samp{ca*ar} against the string @samp{caaar}, the @samp{a*} | |
241 first tries to match all three @samp{a}s; but the rest of the pattern is | |
6552 | 242 @samp{ar} and there is only @samp{r} left to match, so this try fails. |
21007 | 243 The next alternative is for @samp{a*} to match only two @samp{a}s. With |
244 this choice, the rest of the regexp matches successfully.@refill | |
6552 | 245 |
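The backtracking is invisible to the caller; only the final match is
reported.  A sketch of the @samp{ca*ar} case, using @code{string-match}
and @code{match-end} (both documented elsewhere in this chapter):

@example
@group
(string-match "ca*ar" "caaar")
     @result{} 0
(match-end 0)                 ; the match covers all five characters
     @result{} 5
@end group
@end example
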
11651 | 246 Nested repetition operators can be extremely slow if they specify |
12067 | 247 backtracking loops. For example, it could take hours for the regular |
22138 | 248 expression @samp{\(x+y*\)*a} to try to match the sequence |
249 @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}, before it ultimately fails. |
250 The slowness is because Emacs must try each imaginable way of grouping | |
22252 | 251 the 35 @samp{x}s before concluding that none of them can work. To make |
22138 | 252 sure your regular expressions run fast, check nested repetitions |
253 carefully. | |
11651 | 254 |
21682 | 255 @item @samp{+} |
6552 | 256 @cindex @samp{+} in regexp |
17884 | 257 is a postfix operator, similar to @samp{*} except that it must match |
258 the preceding expression at least once. So, for example, @samp{ca+r} | |
6552 | 259 matches the strings @samp{car} and @samp{caaaar} but not the string |
260 @samp{cr}, whereas @samp{ca*r} matches all three strings. | |
261 | |
21682 | 262 @item @samp{?} |
6552 | 263 @cindex @samp{?} in regexp |
21007 | 264 is a postfix operator, similar to @samp{*} except that it must match the |
17884 | 265 preceding expression either once or not at all. For example, |
266 @samp{ca?r} matches @samp{car} or @samp{cr}; nothing else. | |
6552 | 267 |
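The difference among the three postfix operators can be seen side by
side.  In this sketch, each sublist holds the results of
@code{string-match} for @samp{ca*r}, @samp{ca+r} and @samp{ca?r}, in
that order, against one test string:

@example
@group
(mapcar (lambda (s)
          (list (string-match "ca*r" s)
                (string-match "ca+r" s)
                (string-match "ca?r" s)))
        '("cr" "car" "caaaar"))
     @result{} ((0 nil 0) (0 0 0) (0 0 nil))
@end group
@end example
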
27095 | 268 @item @samp{*?}, @samp{+?}, @samp{??} |
269 These are ``non-greedy'' variants of the operators @samp{*}, @samp{+} | |
270 and @samp{?}. Where those operators match the largest possible | |
271 substring (consistent with matching the entire containing expression), | |
272 the non-greedy variants match the smallest possible substring | |
273 (consistent with matching the entire containing expression). | |
274 | |
275 For example, the regular expression @samp{c[ad]*a} when applied to the | |
276 string @samp{cdaaada} matches the whole string; but the regular | |
277 expression @samp{c[ad]*?a}, applied to that same string, matches just | |
278 @samp{cda}. (The smallest possible match here for @samp{[ad]*?} that | |
279 permits the whole expression to match is @samp{d}.) | |
280 |
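The same comparison can be written as a short sketch, using
@code{string-match} and @code{match-string} (documented elsewhere in
this manual) to extract the text each regexp matched:

@example
@group
(let ((s "cdaaada"))
  (list (progn (string-match "c[ad]*a" s)  (match-string 0 s))
        (progn (string-match "c[ad]*?a" s) (match-string 0 s))))
     @result{} ("cdaaada" "cda")
@end group
@end example
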
21682 | 281 @item @samp{[ @dots{} ]} |
282 @cindex character alternative (in regexp) | |
6552 | 283 @cindex @samp{[} in regexp |
284 @cindex @samp{]} in regexp | |
21682 | 285 is a @dfn{character alternative}, which begins with @samp{[} and is |
286 terminated by @samp{]}. In the simplest case, the characters between | |
287 the two brackets are what this character alternative can match. | |
6552 | 288 |
17884 | 289 Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and |
290 @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s | |
291 (including the empty string), from which it follows that @samp{c[ad]*r} | |
292 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. | |
6552 | 293 |
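A quick way to check this is sketched below.  The regexp is wrapped in
the anchors @samp{\`} and @samp{\'} (backslash constructs, not special
characters) so that the whole string must match; the last test
string, @samp{cbr}, is an added counterexample.

@example
@group
(mapcar (lambda (s) (string-match "\\`c[ad]*r\\'" s))
        '("cr" "car" "cdr" "caddaar" "cbr"))
     @result{} (0 0 0 0 nil)
@end group
@end example
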
21682 | 294 You can also include character ranges in a character alternative, by |
295 writing the starting and ending characters with a @samp{-} between them. | |
25751 | 296 Thus, @samp{[a-z]} matches any lower-case @sc{ascii} letter. Ranges may be |
17884 | 297 intermixed freely with individual characters, as in @samp{[a-z$%.]}, |
25751 | 298 which matches any lower-case @sc{ascii} letter or @samp{$}, @samp{%} or |
17884 | 299 period. |
6552 | 300 |
17884 | 301 Note that the usual regexp special characters are not special inside a |
24934 | 302 character alternative. A completely different set of characters is |
21682 | 303 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}. |
17884 | 304 |
21682 | 305 To include a @samp{]} in a character alternative, you must make it the |
306 first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. | |
307 To include a @samp{-}, write @samp{-} as the first or last character of | |
308 the character alternative, or put it after a range. Thus, @samp{[]-]} | |
309 matches both @samp{]} and @samp{-}. | |
6552 | 310 |
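As a sketch of these placement rules, with made-up test strings, each
call below returns the index of the first character that the
alternative matches:

@example
@group
(string-match "[]a]" "x]")    ; ] placed first is an ordinary member
     @result{} 1
(string-match "[]-]" "a-b")   ; - placed last is an ordinary member
     @result{} 1
@end group
@end example
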
21682 | 311 To include @samp{^} in a character alternative, put it anywhere but at |
312 the beginning. | |
6552 | 313 |
37842 | 314 The beginning and end of a range of multibyte characters must be in |
315 the same character set (@pxref{Character Sets}). Thus, | |
316 @code{"[\x8e0-\x97c]"} is invalid because character 0x8e0 (@samp{a} | |
317 with grave accent) is in the Emacs character set for Latin-1 but the | |
318 character 0x97c (@samp{u} with diaeresis) is in the Emacs character | |
319 set for Latin-2. (We use Lisp string syntax to write that example, | |
320 and a few others in the next few paragraphs, in order to include hex | |
321 escape sequences in them.) | |
32464 | 322 |
323 If a range starts with a unibyte character @var{c} and ends with a | |
324 multibyte character @var{c2}, the range is divided into two parts: one | |
325 is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where | |
326 @var{c1} is the first character of the charset to which @var{c2} | |
327 belongs. | |
49600 | 328 |
25751 | 329 You cannot always match all non-@sc{ascii} characters with the regular |
37842 | 330 expression @code{"[\200-\377]"}. This works when searching a unibyte |
25751 | 331 buffer or string (@pxref{Text Representations}), but not in a multibyte |
332 buffer or string, because many non-@sc{ascii} characters have codes | |
37842 | 333 above octal 0377. However, the regular expression @code{"[^\000-\177]"} |
25751 | 334 does match all non-@sc{ascii} characters (see below regarding @samp{^}), |
335 in both multibyte and unibyte representations, because only the | |
336 @sc{ascii} characters are excluded. | |
337 | |
338 Starting in Emacs 21, a character alternative can also specify named | |
339 character classes (@pxref{Char Classes}). This is a POSIX feature whose | |
340 syntax is @samp{[:@var{class}:]}. Using a character class is equivalent | |
341 to mentioning each of the characters in that class; but the latter is | |
parents:
25089
diff
changeset
|
342 not feasible in practice, since some classes include thousands of |
467b88fab665
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
25089
diff
changeset
|
343 different characters. |
467b88fab665
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
25089
diff
changeset
|
344 |
21682
90da2489c498
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
21007
diff
changeset
|
@item @samp{[^ @dots{} ]}
@cindex @samp{^} in regexp
@samp{[^} begins a @dfn{complemented character alternative}, which matches any
character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} matches
all characters @emph{except} letters and digits.

@samp{^} is not special in a character alternative unless it is the first
character. The character following the @samp{^} is treated as if it
were first (in other words, @samp{-} and @samp{]} are not special there).

A complemented character alternative can match a newline, unless newline is
mentioned as one of the characters not to match. This is in contrast to
the handling of regexps in programs such as @code{grep}.
@item @samp{^}
@cindex beginning of line in regexp
is a special character that matches the empty string, but only at the
beginning of a line in the text being matched. Otherwise it fails to
match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at
the beginning of a line.

When matching a string instead of a buffer, @samp{^} matches at the
beginning of the string or after a newline character.

For historical compatibility reasons, @samp{^} can be used only at the
beginning of the regular expression, or after @samp{\(} or @samp{\|}.
@item @samp{$}
@cindex @samp{$} in regexp
@cindex end of line in regexp
is similar to @samp{^} but matches only at the end of a line. Thus,
@samp{x+$} matches a string of one @samp{x} or more at the end of a line.

When matching a string instead of a buffer, @samp{$} matches at the end
of the string or before a newline character.

For historical compatibility reasons, @samp{$} can be used only at the
end of the regular expression, or before @samp{\)} or @samp{\|}.
@item @samp{\}
@cindex @samp{\} in regexp
has two functions: it quotes the special characters (including
@samp{\}), and it introduces additional special constructs.

Because @samp{\} quotes special characters, @samp{\$} is a regular
expression that matches only @samp{$}, and @samp{\[} is a regular
expression that matches only @samp{[}, and so on.

Note that @samp{\} also has special meaning in the read syntax of Lisp
strings (@pxref{String Type}), and must be quoted with @samp{\}. For
example, the regular expression that matches the @samp{\} character is
@samp{\\}. To write a Lisp string that contains the characters
@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
@samp{\}. Therefore, the read syntax for a regular expression matching
@samp{\} is @code{"\\\\"}.@refill
@end table
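
As an illustrative sketch (the sample strings are invented for this
example and are not part of the original text), the following calls show
a complemented character alternative and the behavior of @samp{^} and
@samp{$} when matching a string rather than a buffer. Each value is the
index at which @code{string-match} finds the match.

@example
;; First character that is not an ASCII letter or digit.
(string-match "[^a-z0-9A-Z]" "abc-def")
     @result{} 3
;; Unlike grep, a complemented alternative can match a newline.
(string-match "[^a-z]" "ab\ncd")
     @result{} 2
;; In a string, ^ also matches just after a newline character.
(string-match "^foo" "bar\nfoo")
     @result{} 4
;; In a string, $ also matches just before a newline character.
(string-match "x+$" "axx\nb")
     @result{} 1
@end example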
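
The doubling of backslashes is easiest to see in a few concrete calls.
In this sketch, offered only as an illustration with made-up sample
strings, each regexp is written in Lisp string syntax, so every regexp
backslash appears twice in the source text:

@example
;; The regexp \\ (written "\\\\" in Lisp) matches one backslash.
(string-match "\\\\" "a\\b")
     @result{} 1
;; The regexp \$ (written "\\$" in Lisp) matches a literal dollar sign.
(string-match "\\$" "cost: $5")
     @result{} 6
;; The regexp \[ (written "\\[" in Lisp) matches a literal open bracket.
(string-match "\\[" "x[0]")
     @result{} 1
@end example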
@strong{Please note:} For historical compatibility, special characters
are treated as ordinary ones if they are in contexts where their special
meanings make no sense. For example, @samp{*foo} treats @samp{*} as
ordinary since there is no preceding expression on which the @samp{*}
can act. It is poor practice to depend on this behavior; quote the
special character anyway, regardless of where it appears.@refill
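
A short sketch of that advice, using invented sample strings: the
unquoted and quoted forms find the same match here, but only the quoted
form says what it means without relying on context. The standard
function @code{regexp-quote} can add the quoting when the text to match
is computed at run time.

@example
;; A leading * has nothing to act on, so it is treated as ordinary.
(string-match "*foo" "a*foo")
     @result{} 1
;; Quoting it explicitly gives the same result and is clearer.
(string-match "\\*foo" "a*foo")
     @result{} 1
;; regexp-quote returns a regexp that matches its argument literally.
(regexp-quote "*foo")
     @result{} "\\*foo"
@end example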
@node Char Classes
@subsubsection Character Classes
@cindex character classes in regexp

Here is a table of the classes you can use in a character alternative,
in Emacs 21, and what they mean:

@table @samp
@item [:ascii:]
This matches any @sc{ascii} (unibyte) character.
@item [:alnum:]
This matches any letter or digit. (At present, for multibyte
characters, it matches anything that has word syntax.)
@item [:alpha:]
This matches any letter. (At present, for multibyte characters, it
matches anything that has word syntax.)
@item [:blank:]
This matches space and tab only.
@item [:cntrl:]
This matches any @sc{ascii} control character.
@item [:digit:]
This matches @samp{0} through @samp{9}. Thus, @samp{[-+[:digit:]]}
matches any digit, as well as @samp{+} and @samp{-}.
@item [:graph:]
This matches graphic characters---everything except @sc{ascii} control
characters, space, and the delete character.
@item [:lower:]
This matches any lower-case letter, as determined by
the current case table (@pxref{Case Tables}).
@item [:nonascii:]
This matches any non-@sc{ascii} (multibyte) character.
@item [:print:]
This matches printing characters---everything except @sc{ascii} control
characters and the delete character.
@item [:punct:]
This matches any punctuation character. (At present, for multibyte
characters, it matches anything that has non-word syntax.)
@item [:space:]
This matches any character that has whitespace syntax
(@pxref{Syntax Class Table}).
@item [:upper:]
This matches any upper-case letter, as determined by
the current case table (@pxref{Case Tables}).
@item [:word:]
This matches any character that has word syntax (@pxref{Syntax Class
Table}).
@item [:xdigit:]
This matches the hexadecimal digits: @samp{0} through @samp{9}, @samp{a}
through @samp{f} and @samp{A} through @samp{F}.
@end table
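
The following sketch, with sample strings invented for illustration,
shows a few of these classes in use. Note that a class name is written
inside the brackets of a character alternative, so the regexp for
``any digit'' is @samp{[[:digit:]]}, with two pairs of brackets.

@example
;; [:digit:] combined with + and - in one character alternative.
(string-match "[-+[:digit:]]+" "x = -42")
     @result{} 4
(match-string 0 "x = -42")
     @result{} "-42"
;; [:xdigit:] matches hexadecimal digits of either case.
(string-match "[[:xdigit:]]+" "#1aF9")
     @result{} 1
(match-string 0 "#1aF9")
     @result{} "1aF9"
;; [:blank:] matches only space and tab.
(string-match "[[:blank:]]" "a\tb")
     @result{} 1
@end example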
@node Regexp Backslash
@subsubsection Backslash Constructs in Regular Expressions

For the most part, @samp{\} followed by any character matches only
that character. However, there are several exceptions: certain
two-character sequences starting with @samp{\} that have special
meanings. (The character after the @samp{\} in such a sequence is
always ordinary when used on its own.) Here is a table of the special
@samp{\} constructs.

@table @samp
@item \|
@cindex @samp{|} in regexp
@cindex regexp alternative
specifies an alternative.
Two regular expressions @var{a} and @var{b} with @samp{\|} in
between form an expression that matches anything that either @var{a} or
@var{b} matches.@refill

Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
but no other string.@refill

@samp{\|} applies to the largest possible surrounding expressions. Only a
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
@samp{\|}.@refill

Full backtracking capability exists to handle multiple uses of
@samp{\|}, if you use the POSIX regular expression functions
(@pxref{POSIX Regexps}).
@item \@{@var{m}\@}
is a postfix operator that repeats the previous pattern exactly @var{m}
times. Thus, @samp{x\@{5\@}} matches the string @samp{xxxxx}
and nothing else. @samp{c[ad]\@{3\@}r} matches strings such as
@samp{caaar}, @samp{cdddr}, @samp{cadar}, and so on.
@item \@{@var{m},@var{n}\@}
is a more general postfix operator that specifies repetition with a
minimum of @var{m} repeats and a maximum of @var{n} repeats. If @var{m}
is omitted, the minimum is 0; if @var{n} is omitted, there is no
maximum.

For example, @samp{c[ad]\@{1,2\@}r} matches the strings @samp{car},
@samp{cdr}, @samp{caar}, @samp{cadr}, @samp{cdar}, and @samp{cddr}, and
nothing else.@*
@samp{\@{0,1\@}} or @samp{\@{,1\@}} is equivalent to @samp{?}. @*
@samp{\@{0,\@}} or @samp{\@{,\@}} is equivalent to @samp{*}. @*
@samp{\@{1,\@}} is equivalent to @samp{+}.
@item \( @dots{} \)
@cindex @samp{(} in regexp
@cindex @samp{)} in regexp
@cindex regexp grouping
is a grouping construct that serves three purposes:

@enumerate
@item
To enclose a set of @samp{\|} alternatives for other operations. Thus,
the regular expression @samp{\(foo\|bar\)x} matches either @samp{foox}
or @samp{barx}.

@item
To enclose a complicated expression for the postfix operators @samp{*},
@samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches
@samp{ba}, @samp{bana}, @samp{banana}, @samp{bananana}, etc., with any
number (zero or more) of @samp{na} strings.

@item
To record a matched substring for future reference with
@samp{\@var{digit}} (see below).
@end enumerate

This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature that was assigned as a
second meaning to the same @samp{\( @dots{} \)} construct because, in
practice, there was usually no conflict between the two meanings. But
occasionally there is a conflict, and that led to the introduction of
shy groups.
@item \(?: @dots{} \)
is the @dfn{shy group} construct. A shy group serves the first two
purposes of an ordinary group (controlling the nesting of other
operators), but it does not get a number, so you cannot refer back to
its value with @samp{\@var{digit}}.

Shy groups are particularly useful for mechanically-constructed regular
expressions because they can be added automatically without altering the
numbering of any ordinary, non-shy groups.
@item \@var{digit}
matches the same text that matched the @var{digit}th occurrence of a
grouping (@samp{\( @dots{} \)}) construct.

In other words, after the end of a group, the matcher remembers the
beginning and end of the text matched by that group. Later on in the
regular expression you can use @samp{\} followed by @var{digit} to
match that same text, whatever it may have been.

The strings matching the first nine grouping constructs appearing in
the entire regular expression passed to a search or matching function
are assigned numbers 1 through 9 in the order that the open
parentheses appear in the regular expression. So you can use
@samp{\1} through @samp{\9} to refer to the text matched by the
corresponding grouping constructs.

For example, @samp{\(.*\)\1} matches any newline-free string that is
composed of two identical halves. The @samp{\(.*\)} matches the first
half, which may be anything, but the @samp{\1} that follows must match
the same exact text.

If a particular grouping construct in the regular expression was never
matched---for instance, if it appears inside of an alternative that
wasn't used, or inside of a repetition that repeated zero times---then
the corresponding @samp{\@var{digit}} construct never matches
anything. To use an artificial example, @samp{\(foo\(b*\)\|lose\)\2}
cannot match @samp{lose}: the second alternative inside the larger
group matches it, but then @samp{\2} is undefined and can't match
anything. But it can match @samp{foobb}, because the first
alternative matches @samp{foob} and @samp{\2} matches @samp{b}.
6552 | 580 @item \w |
581 @cindex @samp{\w} in regexp | |
582 matches any word-constituent character. The editor syntax table | |
583 determines which characters these are. @xref{Syntax Tables}. | |
584 | |
585 @item \W | |
586 @cindex @samp{\W} in regexp | |
8427 | 587 matches any character that is not a word constituent. |
6552 | 588 |
589 @item \s@var{code} | |
590 @cindex @samp{\s} in regexp | |
591 matches any character whose syntax is @var{code}. Here @var{code} is a | |
8427 | 592 character that represents a syntax code: thus, @samp{w} for word |
6552 | 593 constituent, @samp{-} for whitespace, @samp{(} for open parenthesis, |
21007 | 594 etc. To represent whitespace syntax, use either @samp{-} or a space |
595 character. @xref{Syntax Class Table}, for a list of syntax codes and | |
596 the characters that stand for them. | |
6552 | 597 |
598 @item \S@var{code} | |
599 @cindex @samp{\S} in regexp | |
600 matches any character whose syntax is not @var{code}. | |
35796 | 601 |
602 @item \c@var{c} | |
603 matches any character whose category is @var{c}. Here @var{c} is a | |
604 character that represents a category: thus, @samp{c} for Chinese | |
605 characters or @samp{g} for Greek characters in the standard category | |
606 table. | |
607 | |
608 @item \C@var{c} | |
609 matches any character whose category is not @var{c}. | |
6552 | 610 @end table |
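
As a rough illustration of a few of the backslash constructs above, the
following sketch tries them with @code{string-match}. It assumes the
standard syntax table is in effect, so that letters are word
constituents and the space character has whitespace syntax; the sample
strings are made up for this illustration.

@example
@group
;; Group 2 took part in the match, so \2 can match its text.
(string-match "\\(foo\\(b*\\)\\|lose\\)\\2" "foobb")
     @result{} 0
@end group

@group
;; The second alternative leaves group 2 unset, so \2 fails.
(string-match "\\(foo\\(b*\\)\\|lose\\)\\2" "lose")
     @result{} nil
@end group

@group
;; \W and \s- both find the space at index 3.
(string-match "\\W" "foo bar")
     @result{} 3
(string-match "\\s-" "foo bar")
     @result{} 3
@end group
@end example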
611 | |
8427 | 612 The following regular expression constructs match the empty string---that is, |
6552 | 613 they don't use up any characters---but whether they match depends on the |
614 context. | |
615 | |
21682 | 616 @table @samp |
6552 | 617 @item \` |
618 @cindex @samp{\`} in regexp | |
619 matches the empty string, but only at the beginning | |
620 of the buffer or string being matched against. | |
621 | |
622 @item \' | |
623 @cindex @samp{\'} in regexp | |
624 matches the empty string, but only at the end of | |
625 the buffer or string being matched against. | |
626 | |
627 @item \= | |
628 @cindex @samp{\=} in regexp | |
629 matches the empty string, but only at point. | |
630 (This construct is not defined when matching against a string.) | |
631 | |
632 @item \b | |
633 @cindex @samp{\b} in regexp | |
634 matches the empty string, but only at the beginning or | |
635 end of a word. Thus, @samp{\bfoo\b} matches any occurrence of | |
636 @samp{foo} as a separate word. @samp{\bballs?\b} matches | |
637 @samp{ball} or @samp{balls} as a separate word.@refill | |
638 | |
17884 | 639 @samp{\b} matches at the beginning or end of the buffer |
640 regardless of what text appears next to it. | |
641 | |
6552 | 642 @item \B |
643 @cindex @samp{\B} in regexp | |
644 matches the empty string, but @emph{not} at the beginning or | |
645 end of a word. | |
646 | |
647 @item \< | |
648 @cindex @samp{\<} in regexp | |
649 matches the empty string, but only at the beginning of a word. | |
17884 | 650 @samp{\<} matches at the beginning of the buffer only if a |
651 word-constituent character follows. | |
6552 | 652 |
653 @item \> | |
654 @cindex @samp{\>} in regexp | |
17884 | 655 matches the empty string, but only at the end of a word. @samp{\>} |
656 matches at the end of the buffer only if the contents end with a | |
657 word-constituent character. | |
6552 | 658 @end table |
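
The context-dependent constructs can be tried out on strings as well as
buffers (except @samp{\=}, which needs point). The following sketch
assumes the standard syntax table; the sample strings are invented for
illustration.

@example
@group
;; \b requires a word boundary on both sides of foo.
(string-match "\\bfoo\\b" "a foo b")
     @result{} 2
(string-match "\\bfoo\\b" "afoob")
     @result{} nil
@end group

@group
;; \` and \' anchor to the whole string, not to each line.
(string-match "\\`foo" "foo\nfoo")
     @result{} 0
(string-match "foo\\'" "foo\nfoo")
     @result{} 4
@end group

@group
;; \< matches only where a word starts.
(string-match "\\<bar" "foobar bar")
     @result{} 7
@end group
@end example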
659 | |
660 @kindex invalid-regexp | |
661 Not every string is a valid regular expression. For example, a string | |
662 with unbalanced square brackets is invalid (with a few exceptions, such | |
8427 | 663 as @samp{[]]}), and so is a string that ends with a single @samp{\}. If |
6552 | 664 an invalid regular expression is passed to any of the search functions, |
665 an @code{invalid-regexp} error is signaled. | |
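
A program that receives a regexp from the user may want to catch this
error rather than let it propagate. A minimal sketch, using
@code{condition-case} with sample regexps chosen for illustration:

@example
@group
;; A trailing backslash makes the regexp invalid; the handler
;; returns the error symbol instead of signaling.
(condition-case err
    (string-match "foo\\" "foo")
  (invalid-regexp (car err)))
     @result{} invalid-regexp
@end group

@group
;; The exception mentioned above: this bracket expression is valid.
(condition-case err
    (string-match "[]]" "a]")
  (invalid-regexp (car err)))
     @result{} 1
@end group
@end example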
666 | |
667 @node Regexp Example | |
668 @comment node-name, next, previous, up | |
669 @subsection Complex Regexp Example | |
670 | |
671 Here is a complicated regexp, used by Emacs to recognize the end of a | |
672 sentence together with any whitespace that follows. It is the value of | |
49600 | 673 the variable @code{sentence-end}. |
6552 | 674 |
675 First, we show the regexp as a string in Lisp syntax to distinguish | |
676 spaces from tab characters. The string constant begins and ends with a | |
677 double-quote. @samp{\"} stands for a double-quote as part of the | |
678 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a | |
679 tab and @samp{\n} for a newline. | |
680 | |
681 @example | |
682 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*" | |
683 @end example | |
684 | |
21682 | 685 @noindent |
686 In contrast, if you evaluate the variable @code{sentence-end}, you | |
6552 | 687 will see the following: |
688 | |
689 @example | |
690 @group | |
691 sentence-end | |
49600 | 692 @result{} "[.?!][]\"')@}]*\\($\\| $\\|	\\|  \\)[ 	 |
6552 | 693 ]*" |
694 @end group | |
695 @end example | |
696 | |
697 @noindent | |
698 In this output, tab and newline appear as themselves. | |
699 | |
700 This regular expression contains four parts in succession and can be | |
701 deciphered as follows: | |
702 | |
703 @table @code | |
704 @item [.?!] | |
21682 | 705 The first part of the pattern is a character alternative that matches |
706 any one of three characters: period, question mark, and exclamation | |
707 mark. The match must begin with one of these three characters. | |
6552 | 708 |
709 @item []\"')@}]* | |
710 The second part of the pattern matches any closing braces and quotation | |
711 marks, zero or more of them, that may follow the period, question mark | |
712 or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in | |
713 a string. The @samp{*} at the end indicates that the immediately | |
21682 | 714 preceding regular expression (a character alternative, in this case) may be |
6552 | 715 repeated zero or more times. |
716 | |
8469 | 717 @item \\($\\|@ $\\|\t\\|@ @ \\) |
6552 | 718 The third part of the pattern matches the whitespace that follows the |
21007 | 719 end of a sentence: the end of a line (optionally with a space), or a |
720 tab, or two spaces. The double backslashes mark the parentheses and | |
721 vertical bars as regular expression syntax; the parentheses delimit a | |
722 group and the vertical bars separate alternatives. The dollar sign is | |
723 used to match the end of a line. | |
6552 | 724 |
725 @item [ \t\n]* | |
726 Finally, the last part of the pattern matches any additional whitespace | |
727 beyond the minimum needed to end a sentence. | |
728 @end table | |
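
To see the four parts working together, the regexp can be applied to a
string. This is only a sketch: the sample text is invented, and the
regexp is written out literally rather than taken from the variable.

@example
@group
(let ((text "It works.  Next sentence."))
  (list (string-match
         "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
         text)
        (match-end 0)))
     @result{} (8 11)
@end group
@end example

@noindent
The match starts at the period (index 8) and ends after the two spaces
that follow it (index 11).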
729 | |
25751 | 730 @node Regexp Functions |
731 @subsection Regular Expression Functions | |
732 | |
733 These functions operate on regular expressions. | |
734 | |
735 @defun regexp-quote string | |
736 This function returns a regular expression whose only exact match is | |
737 @var{string}. Using this regular expression in @code{looking-at} will | |
738 succeed only if the next characters in the buffer are @var{string}; | |
739 using it in a search function will succeed if the text being searched | |
740 contains @var{string}. | |
741 | |
742 This allows you to request an exact string match or search when calling | |
743 a function that wants a regular expression. | |
744 | |
745 @example | |
746 @group | |
747 (regexp-quote "^The cat$") | |
748      @result{} "\\^The cat\\$" | |
749 @end group | |
750 @end example | |
751 | |
752 One use of @code{regexp-quote} is to combine an exact string match with | |
753 context described as a regular expression. For example, this searches | |
754 for the string that is the value of @var{string}, surrounded by | |
755 whitespace: | |
756 | |
757 @example | |
758 @group | |
759 (re-search-forward | |
760  (concat "\\s-" (regexp-quote string) "\\s-")) | |
761 @end group | |
762 @end example | |
763 @end defun | |
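
The effect of quoting is easiest to see when the string contains a
regexp special character. In this sketch (the strings are invented for
illustration), an unquoted @samp{.} matches any character, whereas the
quoted form matches only a literal period:

@example
@group
(string-match "1.5" "105 1.5")
     @result{} 0
(string-match (regexp-quote "1.5") "105 1.5")
     @result{} 4
@end group
@end example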
764 | |
765 @defun regexp-opt strings &optional paren | |
766 This function returns an efficient regular expression that will match | |
767 any of the strings in the list @var{strings}. This is useful when you need to make | |
768 matching or searching as fast as possible---for example, for Font Lock | |
769 mode. | |
770 | |
771 If the optional argument @var{paren} is non-@code{nil}, then the | |
772 returned regular expression is always enclosed by at least one | |
773 parentheses-grouping construct. | |
774 | |
775 This simplified definition of @code{regexp-opt} produces a | |
776 regular expression which is equivalent to the actual value | |
777 (but not as efficient): | |
778 | |
779 @example | |
780 (defun regexp-opt (strings &optional paren) | |
781   (let ((open-paren (if paren "\\(" "")) | |
782         (close-paren (if paren "\\)" ""))) | |
783     (concat open-paren | |
784             (mapconcat 'regexp-quote strings "\\|") | |
785             close-paren))) | |
786 @end example | |
787 @end defun | |
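
A possible use, sketched with an invented word list: passing a
non-@code{nil} @var{paren} wraps the whole result in a group, so the
word that matched is available as subexpression 1. The exact regexp
that @code{regexp-opt} returns may differ between Emacs versions, so it
is not shown here.

@example
@group
(let ((re (regexp-opt '("cat" "car" "dog") t))
      (text "my dog"))
  (list (string-match re text)
        (match-string 1 text)))
     @result{} (3 "dog")
@end group
@end example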
788 | |
789 @defun regexp-opt-depth regexp | |
790 This function returns the total number of grouping constructs | |
791 (parenthesized expressions) in @var{regexp}. | |
792 @end defun | |
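
For instance, with a hand-written regexp chosen for illustration that
contains three groups, one of them nested:

@example
@group
(regexp-opt-depth "\\(a\\)\\(b\\(c\\)\\)")
     @result{} 3
@end group
@end example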
793 | |
6552 | 794 @node Regexp Search |
795 @section Regular Expression Searching | |
796 @cindex regular expression searching | |
797 @cindex regexp searching | |
798 @cindex searching for regexp | |
799 | |
21682 | 800 In GNU Emacs, you can search for the next match for a regular |
801 expression either incrementally or not. For incremental search | |
802 commands, see @ref{Regexp Search, , Regular Expression Search, emacs, | |
803 The GNU Emacs Manual}. Here we describe only the search functions | |
804 useful in programs. The principal one is @code{re-search-forward}. | |
6552 | 805 |
21007 | 806 These search functions convert the regular expression to multibyte if |
807 the buffer is multibyte; they convert the regular expression to unibyte | |
808 if the buffer is unibyte. @xref{Text Representations}. | |
809 | |
6552 | 810 @deffn Command re-search-forward regexp &optional limit noerror repeat |
811 This function searches forward in the current buffer for a string of | |
812 text that is matched by the regular expression @var{regexp}. The | |
813 function skips over any amount of text that is not matched by | |
814 @var{regexp}, and leaves point at the end of the first match found. | |
815 It returns the new value of point. | |
816 | |
817 If @var{limit} is non-@code{nil} (it must be a position in the current | |
818 buffer), then it is the upper bound to the search. No match extending | |
819 after that position is accepted. | |
820 | |
21007 | 821 If @var{repeat} is supplied (it must be a positive number), then the |
822 search is repeated that many times (each time starting at the end of the | |
823 previous time's match). If all these successive searches succeed, the | |
824 function succeeds, moving point and returning its new value. Otherwise | |
825 the function fails. | |
826 | |
827 What happens when the function fails depends on the value of | |
6552 | 828 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed} |
829 error is signaled. If @var{noerror} is @code{t}, | |
830 @code{re-search-forward} does nothing and returns @code{nil}. If | |
831 @var{noerror} is neither @code{nil} nor @code{t}, then | |
832 @code{re-search-forward} moves point to @var{limit} (or the end of the | |
833 buffer) and returns @code{nil}. | |
834 | |
835 In the following example, point is initially before the @samp{T}. | |
836 Evaluating the search call moves point to the end of that line (between | |
837 the @samp{t} of @samp{hat} and the newline). | |
838 | |
839 @example | |
840 @group | |
841 ---------- Buffer: foo ---------- | |
842 I read "@point{}The cat in the hat | |
843 comes back" twice. | |
844 ---------- Buffer: foo ---------- | |
845 @end group | |
846 | |
847 @group | |
848 (re-search-forward "[a-z]+" nil t 5) | |
849 @result{} 27 | |
850 | |
851 ---------- Buffer: foo ---------- | |
852 I read "The cat in the hat@point{} | |
853 comes back" twice. | |
854 ---------- Buffer: foo ---------- | |
855 @end group | |
856 @end example | |
857 @end deffn | |
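
In programs, @code{re-search-forward} is commonly called in a loop with
@var{noerror} set to @code{t}, so that the loop ends when no further
match is found. The following sketch collects every run of digits in a
temporary buffer; the buffer contents are invented for illustration.

@example
@group
(with-temp-buffer
  (insert "a1 b22 c333")
  (goto-char (point-min))
  (let ((found '()))
    ;; Each successful search leaves point after the match.
    (while (re-search-forward "[0-9]+" nil t)
      (setq found (cons (match-string 0) found)))
    (nreverse found)))
     @result{} ("1" "22" "333")
@end group

@group
;; With NOERROR neither nil nor t, a failed search
;; moves point to the end of the buffer and returns nil.
(with-temp-buffer
  (insert "abc")
  (goto-char (point-min))
  (list (re-search-forward "x" nil 'move) (point)))
     @result{} (nil 4)
@end group
@end example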
858 | |
859 @deffn Command re-search-backward regexp &optional limit noerror repeat | |
860 This function searches backward in the current buffer for a string of | |
861 text that is matched by the regular expression @var{regexp}, leaving | |
862 point at the beginning of the first text found. | |
863 | |
8469 | 864 This function is analogous to @code{re-search-forward}, but they are not |
865 simple mirror images. @code{re-search-forward} finds the match whose | |
866 beginning is as close as possible to the starting point. If | |
867 @code{re-search-backward} were a perfect mirror image, it would find the | |
868 match whose end is as close as possible. However, in fact it finds the | |
25089 | 869 match whose beginning is as close as possible. The reason for this is that |
8469 | 870 matching a regular expression at a given spot always works from |
871 beginning to end, and starts at a specified beginning position. | |
6552 | 872 |
873 A true mirror-image of @code{re-search-forward} would require a special | |
21682 | 874 feature for matching regular expressions from end to beginning. It's |
875 not worth the trouble of implementing that. | |
6552 | 876 @end deffn |
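
The asymmetry can be observed with a repetition. In this sketch, which
uses an invented buffer containing four @samp{a} characters, a forward
search from the start of the buffer would match all four, but a
backward search from the end stops at the nearest starting position
that yields a match and so matches only the last one:

@example
@group
(with-temp-buffer
  (insert "aaaa")
  ;; Point is now at the end of the buffer, position 5.
  (re-search-backward "a+")
  (list (point) (match-string 0)))
     @result{} (4 "a")
@end group
@end example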
877 | |
878 @defun string-match regexp string &optional start | |
879 This function returns the index of the start of the first match for | |
880 the regular expression @var{regexp} in @var{string}, or @code{nil} if | |
881 there is no match. If @var{start} is non-@code{nil}, the search starts | |
882 at that index in @var{string}. | |
883 | |
884 For example, | |
885 | |
886 @example | |
887 @group | |
888 (string-match | |
889  "quick" "The quick brown fox jumped quickly.") | |
890      @result{} 4 | |
891 @end group | |
892 @group | |
893 (string-match | |
894  "quick" "The quick brown fox jumped quickly." 8) | |
895      @result{} 27 | |
896 @end group | |
897 @end example | |
898 | |
899 @noindent | |
900 The index of the first character of the | |
901 string is 0, the index of the second character is 1, and so on. | |
902 | |
903 After this function returns, the index of the first character beyond | |
904 the match is available as @code{(match-end 0)}. @xref{Match Data}. | |
905 | |
906 @example | |
907 @group | |
908 (string-match | |
909  "quick" "The quick brown fox jumped quickly." 8) | |
910      @result{} 27 | |
911 @end group | |
912 | |
913 @group | |
914 (match-end 0) | |
915      @result{} 32 | |
916 @end group | |
917 @end example | |
918 @end defun | |
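
Combining the @var{start} argument with @code{match-end} gives a simple
way to visit every match in a string. A minimal sketch, reusing the
sample sentence from the examples above:

@example
@group
(let ((text "The quick brown fox jumped quickly.")
      (start 0)
      (positions '()))
  ;; Resume each search where the previous match ended.
  (while (string-match "quick" text start)
    (setq positions (cons (match-beginning 0) positions))
    (setq start (match-end 0)))
  (nreverse positions))
     @result{} (4 27)
@end group
@end example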
919 | |
920 @defun looking-at regexp | |
921 This function determines whether the text in the current buffer directly | |
922 following point matches the regular expression @var{regexp}. ``Directly | |
923 following'' means precisely that: the search is ``anchored'' and it can | |
924 succeed only starting with the first character following point. The | |
925 result is @code{t} if so, @code{nil} otherwise. | |
926 | |
927 This function does not move point, but it updates the match data, which | |
928 you can access using @code{match-beginning} and @code{match-end}. | |
929 @xref{Match Data}. | |
930 | |
931 In this example, point is located directly before the @samp{T}. If it | |
932 were anywhere else, the result would be @code{nil}. | |
933 | |
934 @example | |
935 @group | |
936 ---------- Buffer: foo ---------- | |
937 I read "@point{}The cat in the hat | |
938 comes back" twice. | |
939 ---------- Buffer: foo ---------- | |
940 | |
941 (looking-at "The cat in the hat$") | |
942 @result{} t | |
943 @end group | |
944 @end example | |
945 @end defun | |
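
The anchoring can also be checked in a temporary buffer. In this sketch
(the buffer contents are invented for illustration), @samp{bar} occurs
later in the buffer, but not directly after point, so the second call
returns @code{nil}:

@example
@group
(with-temp-buffer
  (insert "foo bar")
  (goto-char (point-min))
  (list (looking-at "foo") (looking-at "bar")))
     @result{} (t nil)
@end group
@end example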
946 | |
12067 | 947 @node POSIX Regexps |
948 @section POSIX Regular Expression Searching | |
949 | |
950 The usual regular expression functions do backtracking when necessary | |
951 to handle the @samp{\|} and repetition constructs, but they continue | |
952 this only until they find @emph{some} match. Then they succeed and | |
953 report the first match found. | |
954 | |
955 This section describes alternative search functions which perform the | |
956 full backtracking specified by the POSIX standard for regular expression | |
957 matching. They continue backtracking until they have tried all | |
958 possibilities and found all matches, so they can report the longest | |
959 match, as required by POSIX. This is much slower, so use these | |
960 functions only when you really need the longest match. | |
961 | |
962 @defun posix-search-forward regexp &optional limit noerror repeat | |
963 This is like @code{re-search-forward} except that it performs the full | |
964 backtracking specified by the POSIX standard for regular expression | |
965 matching. | |
966 @end defun | |
967 | |
968 @defun posix-search-backward regexp &optional limit noerror repeat | |
969 This is like @code{re-search-backward} except that it performs the full | |
970 backtracking specified by the POSIX standard for regular expression | |
971 matching. | |
972 @end defun | |
973 | |
974 @defun posix-looking-at regexp | |
975 This is like @code{looking-at} except that it performs the full | |
976 backtracking specified by the POSIX standard for regular expression | |
977 matching. | |
978 @end defun | |
979 | |
980 @defun posix-string-match regexp string &optional start | |
981 This is like @code{string-match} except that it performs the full | |
982 backtracking specified by the POSIX standard for regular expression | |
983 matching. | |
984 @end defun | |
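
The difference shows up when an earlier alternative matches less text
than a later one. In this sketch, with a regexp and string chosen for
illustration, the usual function stops at the first alternative that
succeeds, while the POSIX function goes on to find the longer match:

@example
@group
(progn
  (string-match "a\\|ab" "abc")
  (match-end 0))
     @result{} 1
@end group

@group
(progn
  (posix-string-match "a\\|ab" "abc")
  (match-end 0))
     @result{} 2
@end group
@end example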
985 | |
6552 | 986 @ignore |
987 @deffn Command delete-matching-lines regexp | |
988 This function is identical to @code{delete-non-matching-lines}, save | |
989 that it deletes what @code{delete-non-matching-lines} keeps. | |
990 | |
991 In the example below, point is located on the first line of text. | |
992 | |
993 @example | |
994 @group | |
995 ---------- Buffer: foo ---------- | |
996 We hold these truths | |
997 to be self-evident, | |
998 that all men are created | |
999 equal, and that they are | |
1000 ---------- Buffer: foo ---------- | |
1001 @end group | |
1002 | |
1003 @group | |
1004 (delete-matching-lines "the") | |
1005 @result{} nil | |
1006 | |
1007 ---------- Buffer: foo ---------- | |
1008 to be self-evident, | |
1009 that all men are created | |
1010 ---------- Buffer: foo ---------- | |
1011 @end group | |
1012 @end example | |
1013 @end deffn | |
1014 | |
1015 @deffn Command flush-lines regexp | |
1016 This function is the same as @code{delete-matching-lines}. | |
1017 @end deffn | |
1018 | |
1019 @defun delete-non-matching-lines regexp | |
1020 This function deletes all lines following point which don't | |
1021 contain a match for the regular expression @var{regexp}. | |
1022 @end defun | |
1023 | |
1024 @deffn Command keep-lines regexp | |
1025 This function is the same as @code{delete-non-matching-lines}. | |
1026 @end deffn | |
1027 | |
1028 @deffn Command how-many regexp | |
1029 This function counts the number of matches for @var{regexp} there are in | |
1030 the current buffer following point. It prints this number in | |
1031 the echo area, returning the string printed. | |
1032 @end deffn | |
1033 | |
1034 @deffn Command count-matches regexp | |
1035 This function is a synonym of @code{how-many}. | |
1036 @end deffn | |
1037 | |
26288 | 1038 @deffn Command list-matching-lines regexp &optional nlines |
6552 | 1039 This function is a synonym of @code{occur}. |
1040 Show all lines following point containing a match for @var{regexp}. | |
1041 Display each line with @var{nlines} lines before and after, | |
1042 or @code{-}@var{nlines} before if @var{nlines} is negative. | |
1043 @var{nlines} defaults to @code{list-matching-lines-default-context-lines}. | |
1044 Interactively it is the prefix arg. | |
1045 | |
1046 The lines are shown in a buffer named @samp{*Occur*}. | |
1047 It serves as a menu to find any of the occurrences in this buffer. | |
24934 | 1048 @kbd{C-h m} (@code{describe-mode}) in that buffer gives help. |
6552 | 1049 @end deffn |
1050 | |
1051 @defopt list-matching-lines-default-context-lines | |
1052 Default value is 0. | |
1053 Default number of context lines to include around a @code{list-matching-lines} | |
1054 match. A negative number means to include that many lines before the match. | |
1055 A positive number means to include that many lines both before and after. | |
1056 @end defopt | |
1057 @end ignore | |
1058 | |
1059 @node Search and Replace | |
1060 @section Search and Replace | |
1061 @cindex replacement | |
1062 | |
47435 | 1063 @defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map start end |
38927 | 1064 This function is the guts of @code{query-replace} and related |
1065 commands. It searches for occurrences of @var{from-string} in the | |
1066 text between positions @var{start} and @var{end} and replaces some or | |
47435 | 1067 all of them. If @var{start} is @code{nil} (or omitted), point is used |
1068 instead, and the buffer's end is used for @var{end}. | |
38927 | 1069 |
1070 If @var{query-flag} is @code{nil}, it replaces all | |
6552 | 1071 occurrences; otherwise, it asks the user what to do about each one. |
1072 | |
1073 If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is | |
1074 considered a regular expression; otherwise, it must match literally. If | |
1075 @var{delimited-flag} is non-@code{nil}, then only replacements | |
1076 surrounded by word boundaries are considered. | |
1077 | |
1078 The argument @var{replacements} specifies what to replace occurrences | |
1079 with. If it is a string, that string is used. It can also be a list of | |
1080 strings, to be used in cyclic order. | |
1081 | |
If @var{replacements} is a cons cell, @code{(@var{function}
. @var{data})}, this means to call @var{function} after each match to
get the replacement text. This function is called with two arguments:
@var{data}, and the number of replacements already made.

If @var{repeat-count} is non-@code{nil}, it should be an integer. Then
it specifies how many times to use each of the strings in the
@var{replacements} list before advancing cyclically to the next one.

If @var{from-string} contains upper-case letters, then
@code{perform-replace} binds @code{case-fold-search} to @code{nil}, and
it uses the @var{replacements} without altering their case.

Normally, the keymap @code{query-replace-map} defines the possible user
responses for queries. The argument @var{map}, if non-@code{nil}, is a
keymap to use instead of @code{query-replace-map}.
@end defun

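As a rough illustration of @code{perform-replace} (a sketch, not taken
from the Emacs sources), the following call replaces every occurrence of
a literal string from point to the end of the buffer without querying.
The temporary buffer and its contents are made up for the example. Lisp
programs that need no user interaction more commonly loop over
@code{search-forward} and @code{replace-match} instead.

@example
@group
(with-temp-buffer
  (insert "foo bar foo")
  (goto-char (point-min))
  ;; @r{The query, regexp and delimited flags are all @code{nil}.}
  (perform-replace "foo" "baz" nil nil nil)
  (buffer-string))
     @result{} "baz bar baz"
@end group
@end example
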
@defvar query-replace-map
This variable holds a special keymap that defines the valid user
responses for @code{query-replace} and related functions, as well as
@code{y-or-n-p} and @code{map-y-or-n-p}. It is unusual in two ways:

@itemize @bullet
@item
The ``key bindings'' are not commands, just symbols that are meaningful
to the functions that use this map.

@item
Prefix keys are not supported; each key binding must be for a
single-event key sequence. This is because the functions don't use
@code{read-key-sequence} to get the input; instead, they read a single
event and look it up ``by hand.''
@end itemize
@end defvar

Here are the meaningful ``bindings'' for @code{query-replace-map}.
Several of them are meaningful only for @code{query-replace} and
friends.

@table @code
@item act
Do take the action being considered---in other words, ``yes.''

@item skip
Do not take action for this question---in other words, ``no.''

@item exit
Answer this question ``no,'' and give up on the entire series of
questions, assuming that the answers will be ``no.''

@item act-and-exit
Answer this question ``yes,'' and give up on the entire series of
questions, assuming that subsequent answers will be ``no.''

@item act-and-show
Answer this question ``yes,'' but show the results---don't advance yet
to the next question.

@item automatic
Answer this question and all subsequent questions in the series with
``yes,'' without further user interaction.

@item backup
Move back to the previous place that a question was asked about.

@item edit
Enter a recursive edit to deal with this question---instead of any
other action that would normally be taken.

@item delete-and-edit
Delete the text being considered, then enter a recursive edit to replace
it.

@item recenter
Redisplay and center the window, then ask the same question again.

@item quit
Perform a quit right away. Only @code{y-or-n-p} and related functions
use this answer.

@item help
Display some help, then ask again.
@end table

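Because the ``bindings'' are plain symbols, a program can inspect them
with @code{lookup-key}. The results shown here assume the default
contents of @code{query-replace-map}; a user or another package may have
changed them.

@example
@group
(lookup-key query-replace-map "y")
     @result{} act
(lookup-key query-replace-map "n")
     @result{} skip
(lookup-key query-replace-map "!")
     @result{} automatic
@end group
@end example
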
@node Match Data
@section The Match Data
@cindex match data

Emacs keeps track of the start and end positions of the segments of
text found during a regular expression search. This means, for example,
that you can search for a complex pattern, such as a date in an Rmail
message, and then extract parts of the match under control of the
pattern.

Because the match data normally describe the most recent search only,
you must be careful not to do another search inadvertently between the
search you wish to refer back to and the use of the match data. If you
can't avoid another intervening search, you must save and restore the
match data around it, to prevent it from being overwritten.

@menu
* Replacing Match::       Replacing a substring that was matched.
* Simple Match Data::     Accessing single items of match data,
                            such as where a particular subexpression started.
* Entire Match Data::     Accessing the entire match data at once, as a list.
* Saving Match Data::     Saving and restoring the match data.
@end menu

@node Replacing Match
@subsection Replacing the Text that Matched

The function @code{replace-match} replaces the text matched by the last
search with replacement text that you supply.

@cindex case in replacements
@defun replace-match replacement &optional fixedcase literal string subexp
This function replaces the text in the buffer (or in @var{string}) that
was matched by the last search. It replaces that text with
@var{replacement}.

If you did the last search in a buffer, you should specify @code{nil}
for @var{string}. Then @code{replace-match} does the replacement by
editing the buffer; it leaves point at the end of the replacement text,
and returns @code{t}.

If you did the search in a string, pass the same string as @var{string}.
Then @code{replace-match} does the replacement by constructing and
returning a new string.

If @var{fixedcase} is non-@code{nil}, then @code{replace-match} uses
the replacement text without case conversion; otherwise, it converts
the replacement text depending upon the capitalization of the text to
be replaced. If the original text is all upper case, this converts
the replacement text to upper case. If all words of the original text
are capitalized, this capitalizes all the words of the replacement
text. If all the words are one-letter and they are all upper case,
they are treated as capitalized words rather than all-upper-case
words.

If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
exactly as it is, the only alterations being case changes as needed.
If it is @code{nil} (the default), then the character @samp{\} is treated
specially. If a @samp{\} appears in @var{replacement}, then it must be
part of one of the following sequences:

@table @asis
@item @samp{\&}
@cindex @samp{&} in replacement
@samp{\&} stands for the entire text being replaced.

@item @samp{\@var{n}}
@cindex @samp{\@var{n}} in replacement
@samp{\@var{n}}, where @var{n} is a digit, stands for the text that
matched the @var{n}th subexpression in the original regexp.
Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.

@item @samp{\\}
@cindex @samp{\} in replacement
@samp{\\} stands for a single @samp{\} in the replacement text.
@end table

These substitutions occur after case conversion, if any,
so the strings they substitute are never case-converted.

If @var{subexp} is non-@code{nil}, that says to replace just
subexpression number @var{subexp} of the regexp that was matched, not
the entire match. For example, after matching @samp{foo \(ba*r\)},
calling @code{replace-match} with 1 as @var{subexp} means to replace
just the text that matched @samp{\(ba*r\)}.
@end defun

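The following sketch shows the arguments of @code{replace-match} at work
on strings. The sample strings are invented for illustration, and the
last example binds @code{case-fold-search} explicitly so that its result
does not depend on the user's settings.

@example
@group
;; @r{@samp{\1} and @samp{\2} are expanded because @var{literal} is @code{nil}.}
(let ((str "John Smith"))
  (string-match "\\([A-Za-z]+\\) \\([A-Za-z]+\\)" str)
  (replace-match "\\2, \\1" t nil str))
     @result{} "Smith, John"
@end group

@group
;; @r{Replace only subexpression 1, taking the replacement literally.}
(let ((str "foo baaar"))
  (string-match "foo \\(ba*r\\)" str)
  (replace-match "X" t t str 1))
     @result{} "foo X"
@end group

@group
;; @r{With @var{fixedcase} @code{nil}, an all-upper-case match}
;; @r{converts the replacement text to upper case.}
(let ((case-fold-search t)
      (str "HELLO world"))
  (string-match "hello" str)
  (replace-match "goodbye" nil nil str))
     @result{} "GOODBYE world"
@end group
@end example
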
@node Simple Match Data
@subsection Simple Match Data Access

This section explains how to use the match data to find out what was
matched by the last search or match operation.

You can ask about the entire matching text, or about a particular
parenthetical subexpression of a regular expression. The @var{count}
argument in the functions below specifies which. If @var{count} is
zero, you are asking about the entire match. If @var{count} is
positive, it specifies which subexpression you want.

Recall that the subexpressions of a regular expression are those
expressions grouped with escaped parentheses, @samp{\(@dots{}\)}. The
@var{count}th subexpression is found by counting occurrences of
@samp{\(} from the beginning of the whole regular expression. The first
subexpression is numbered 1, the second 2, and so on. Only regular
expressions can have subexpressions---after a simple string search, the
only information available is about the entire match.

A search which fails may or may not alter the match data. In the
past, a failing search did not do this, but we may change it in the
future.

@defun match-string count &optional in-string
This function returns, as a string, the text matched in the last search
or match operation. It returns the entire text if @var{count} is zero,
or just the portion corresponding to the @var{count}th parenthetical
subexpression, if @var{count} is positive.

If the last such operation was done against a string with
@code{string-match}, then you should pass the same string as the
argument @var{in-string}. After a buffer search or match,
you should omit @var{in-string} or pass @code{nil} for it; but you
should make sure that the current buffer when you call
@code{match-string} is the one in which you did the searching or
matching.

The value is @code{nil} if @var{count} is out of range, or for a
subexpression inside a @samp{\|} alternative that wasn't used or a
repetition that repeated zero times.
@end defun

@defun match-string-no-properties count &optional in-string
This function is like @code{match-string} except that the result
has no text properties.
@end defun

@defun match-beginning count
This function returns the position of the start of text matched by the
last regular expression searched for, or a subexpression of it.

If @var{count} is zero, then the value is the position of the start of
the entire match. Otherwise, @var{count} specifies a subexpression in
the regular expression, and the value of the function is the starting
position of the match for that subexpression.

The value is @code{nil} for a subexpression inside a @samp{\|}
alternative that wasn't used or a repetition that repeated zero times.
@end defun

@defun match-end count
This function is like @code{match-beginning} except that it returns the
position of the end of the match, rather than the position of the
beginning.
@end defun

Here is an example of using the match data, with a comment showing the
positions within the text:

@example
@group
(string-match "\\(qu\\)\\(ick\\)"
              "The quick fox jumped quickly.")
              ;0123456789
     @result{} 4
@end group

@group
(match-string 0 "The quick fox jumped quickly.")
     @result{} "quick"
(match-string 1 "The quick fox jumped quickly.")
     @result{} "qu"
(match-string 2 "The quick fox jumped quickly.")
     @result{} "ick"
@end group

@group
(match-beginning 1)       ; @r{The beginning of the match}
     @result{} 4                 ;   @r{with @samp{qu} is at index 4.}
@end group

@group
(match-beginning 2)       ; @r{The beginning of the match}
     @result{} 6                 ;   @r{with @samp{ick} is at index 6.}
@end group

@group
(match-end 1)             ; @r{The end of the match}
     @result{} 6                 ;   @r{with @samp{qu} is at index 6.}

(match-end 2)             ; @r{The end of the match}
     @result{} 9                 ;   @r{with @samp{ick} is at index 9.}
@end group
@end example

Here is another example. Point is initially located at the beginning
of the line. Searching moves point to between the space and the word
@samp{in}. The beginning of the entire match is at the 9th character of
the buffer (@samp{T}), and the beginning of the match for the first
subexpression is at the 13th character (@samp{c}).

@example
@group
(list
  (re-search-forward "The \\(cat \\)")
  (match-beginning 0)
  (match-beginning 1))
    @result{} (17 9 13)
@end group

@group
---------- Buffer: foo ----------
I read "The cat @point{}in the hat comes back" twice.
        ^   ^
        9  13
---------- Buffer: foo ----------
@end group
@end example

@noindent
(In this case, the index returned is a buffer position; the first
character of the buffer counts as 1.)

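To illustrate the @code{nil} value described for @code{match-string} and
@code{match-beginning}, here is a small sketch (the regexp and string
are arbitrary) in which the second alternative of @samp{\|} is not used:

@example
@group
(let ((str "cat"))
  (string-match "\\(cat\\)\\|\\(dog\\)" str)
  (list (match-string 1 str)
        (match-string 2 str)
        (match-beginning 2)))
     @result{} ("cat" nil nil)
@end group
@end example
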
@node Entire Match Data
@subsection Accessing the Entire Match Data

The functions @code{match-data} and @code{set-match-data} read or
write the entire match data, all at once.

@defun match-data
This function returns a newly constructed list containing all the
information on what text the last search matched. Element zero is the
position of the beginning of the match for the whole expression; element
one is the position of the end of the match for the expression. The
next two elements are the positions of the beginning and end of the
match for the first subexpression, and so on. In general, element
@ifnottex
number 2@var{n}
@end ifnottex
@tex
number {\mathsurround=0pt $2n$}
@end tex
corresponds to @code{(match-beginning @var{n})}; and
element
@ifnottex
number 2@var{n} + 1
@end ifnottex
@tex
number {\mathsurround=0pt $2n+1$}
@end tex
corresponds to @code{(match-end @var{n})}.

All the elements are markers or @code{nil} if matching was done on a
buffer, and all are integers or @code{nil} if matching was done on a
string with @code{string-match}.

As always, there must be no possibility of intervening searches between
the call to a search function and the call to @code{match-data} that is
intended to access the match data for that search.

@example
@group
(match-data)
     @result{} (#<marker at 9 in foo>
          #<marker at 17 in foo>
          #<marker at 13 in foo>
          #<marker at 17 in foo>)
@end group
@end example
@end defun

@defun set-match-data match-list
This function sets the match data from the elements of @var{match-list},
which should be a list that was the value of a previous call to
@code{match-data}. (More precisely, anything that has the same format
will work.)

If @var{match-list} refers to a buffer that doesn't exist, you don't get
an error; that sets the match data in a meaningless but harmless way.

@findex store-match-data
@code{store-match-data} is a semi-obsolete alias for @code{set-match-data}.
@end defun

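After a @code{string-match}, the list returned by @code{match-data}
holds integers rather than markers. The sketch below (with a sample
string chosen for illustration) shows the list for a regexp with two
subexpressions, and then installs a hand-written list of the same format
with @code{set-match-data}:

@example
@group
(progn
  (string-match "\\(qu\\)\\(ick\\)" "The quick fox")
  (match-data))
     @result{} (4 9 4 6 6 9)
@end group

@group
(progn
  (set-match-data '(0 3 1 2))
  (list (match-beginning 0) (match-end 0)
        (match-beginning 1) (match-end 1)))
     @result{} (0 3 1 2)
@end group
@end example
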
@node Saving Match Data
@subsection Saving and Restoring the Match Data

When you call a function that may do a search, you may need to save
and restore the match data around that call, if you want to preserve the
match data from an earlier search for later use. Here is an example
that shows the problem that arises if you fail to save the match data:

@example
@group
(re-search-forward "The \\(cat \\)")
     @result{} 48
(foo)                   ; @r{Perhaps @code{foo} does}
                        ;   @r{more searching.}
(match-end 0)
     @result{} 61              ; @r{Unexpected result---not 48!}
@end group
@end example

You can save and restore the match data with @code{save-match-data}:

@defmac save-match-data body@dots{}
This macro executes @var{body}, saving and restoring the match
data around it.
@end defmac

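For instance, in this sketch (the strings are arbitrary) the inner
@code{string-match} would normally overwrite the match data, but because
it runs inside @code{save-match-data}, the data from the outer search
are still available afterward:

@example
@group
(progn
  (string-match "quick" "The quick fox")
  (save-match-data
    (string-match "fox" "The quick fox"))
  (match-beginning 0))
     @result{} 4
@end group
@end example
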
21682
90da2489c498
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
21007
diff
changeset
|
1475 You could use @code{set-match-data} together with @code{match-data} to |
90da2489c498
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
21007
diff
changeset
|
1476 imitate the effect of the special form @code{save-match-data}. Here is |
90da2489c498
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
21007
diff
changeset
|
1477 how: |
6552 | 1478 |
1479 @example | |
1480 @group | |
1481 (let ((data (match-data))) | |
1482 (unwind-protect | |
21007
66d807bdc5b4
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
17886
diff
changeset
|
1483 @dots{} ; @r{Ok to change the original match data.} |
6552 | 1484 (set-match-data data))) |
1485 @end group | |
1486 @end example | |
1487 | |
10038 | 1488 Emacs automatically saves and restores the match data when it runs |
1489 process filter functions (@pxref{Filter Functions}) and process |
1490 sentinels (@pxref{Sentinels}). |
1491 |
6552 | 1492 @ignore |
1493 Here is a function which restores the match data provided the buffer | |
1494 associated with it still exists. | |
1495 | |
1496 @smallexample | |
1497 @group | |
1498 (defun restore-match-data (data) | |
1499 @c It is incorrect to split the first line of a doc string. | |
1500 @c If there's a problem here, it should be solved in some other way. | |
1501 "Restore the match data DATA unless the buffer is missing." | |
1502 (catch 'foo | |
1503 (let ((d data)) | |
1504 @end group | |
1505 (while d | |
1506 (and (car d) | |
1507 (null (marker-buffer (car d))) | |
1508 @group | |
1509 ;; @file{match-data} @r{buffer is deleted.} | |
1510 (throw 'foo nil)) | |
1511 (setq d (cdr d))) | |
1512 (set-match-data data)))) | |
1513 @end group | |
1514 @end smallexample | |
1515 @end ignore | |
1516 | |
1517 @node Searching and Case | |
1518 @section Searching and Case | |
1519 @cindex searching and case | |
1520 | |
1521 By default, searches in Emacs ignore the case of the text they are | |
1522 searching through; if you specify searching for @samp{FOO}, then | |
21007 | 1523 @samp{Foo} or @samp{foo} is also considered a match. This applies to |
1524 regular expressions, too; thus, @samp{[aB]} would match @samp{a} or |
1525 @samp{A} or @samp{b} or @samp{B}. |
6552 | 1526 |
1527 If you do not want this feature, set the variable | |
1528 @code{case-fold-search} to @code{nil}. Then all letters must match | |
8469 | 1529 exactly, including case. This is a buffer-local variable; altering the |
1530 variable affects only the current buffer. (@xref{Intro to | |
6552 | 1531 Buffer-Local}.) Alternatively, you may change the value of |
1532 @code{default-case-fold-search}, which is the default value of | |
1533 @code{case-fold-search} for buffers that do not override it. | |
1534 | |
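In Lisp code, one common way to control case sensitivity for a single
search is to bind @code{case-fold-search} with @code{let}, which leaves
the buffer's own setting untouched.  A minimal sketch, using
@code{string-match} on arbitrary sample strings:

@example
@group
(let ((case-fold-search nil))    ; @r{Case must match exactly.}
  (string-match "FOO" "a foo"))
     @result{} nil
(let ((case-fold-search t))      ; @r{Case is ignored.}
  (string-match "FOO" "a foo"))
     @result{} 2
@end group
@end example
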
1535 Note that the user-level incremental search feature handles case | |
1536 distinctions differently. When given a lower case letter, it looks for | |
1537 a match of either case, but when given an upper case letter, it looks | |
1538 for an upper case letter only. But this has nothing to do with the | |
21007 | 1539 searching functions used in Lisp code. |
6552 | 1540 |
1541 @defopt case-replace | |
8469 | 1542 This variable determines whether the replacement functions should |
1543 preserve case. If the variable is @code{nil}, that means to use the | |
1544 replacement text verbatim. A non-@code{nil} value means to convert the | |
1545 case of the replacement text according to the text being replaced. | |
1546 | |
25751
467b88fab665
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
25089
diff
changeset
|
1547 This variable is used by passing it as an argument to the function |
1548 @code{replace-match}. @xref{Replacing Match}. |
6552 | 1549 @end defopt |
1550 | |
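The following sketch shows one way a replacement loop might pass
@code{case-replace} along.  It assumes that the second argument of
@code{replace-match}, @var{fixedcase}, disables case conversion when
non-@code{nil}, so the value passed is the negation of
@code{case-replace}.  The regexp and the replacement text are arbitrary
examples:

@example
@group
(let ((case-fold-search t))
  (while (re-search-forward "foo" nil t)
    ;; @r{Convert case only if @code{case-replace} is non-@code{nil}.}
    (replace-match "bar" (not case-replace))))
@end group
@end example
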
1551 @defopt case-fold-search | |
1552 This buffer-local variable determines whether searches should ignore | |
1553 case. If the variable is @code{nil} they do not ignore case; otherwise | |
1554 they do ignore case. | |
1555 @end defopt | |
1556 | |
1557 @defvar default-case-fold-search | |
1558 The value of this variable is the default value for | |
1559 @code{case-fold-search} in buffers that do not override it. This is the | |
1560 same as @code{(default-value 'case-fold-search)}. | |
1561 @end defvar | |
1562 | |
1563 @node Standard Regexps | |
1564 @section Standard Regular Expressions Used in Editing | |
1565 @cindex regexps used standardly in editing | |
1566 @cindex standard regexps used in editing | |
1567 | |
1568 This section describes some variables that hold regular expressions | |
1569 used for certain purposes in editing: | |
1570 | |
1571 @defvar page-delimiter | |
21682
90da2489c498
*** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents:
21007
diff
changeset
|
1572 This is the regular expression describing line-beginnings that separate |
1573 pages. The default value is @code{"^\014"} (i.e., @code{"^^L"} or |
1574 @code{"^\C-l"}); this matches a line that starts with a formfeed |
1575 character. |
6552 | 1576 @end defvar |
1577 | |
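For instance, a Lisp program could count the pages in the current
buffer by searching for @code{page-delimiter}.  This is only a sketch;
it assumes the regexp never matches the empty string, which holds for
the default value:

@example
@group
(save-excursion
  (goto-char (point-min))
  (let ((pages 1))
    ;; @r{Each delimiter found starts one more page.}
    (while (re-search-forward page-delimiter nil t)
      (setq pages (1+ pages)))
    pages))
@end group
@end example
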
12067 | 1578 The following two regular expressions should @emph{not} assume the |
1579 match always starts at the beginning of a line; they should not use | |
1580 @samp{^} to anchor the match. Most often, the paragraph commands do | |
1581 check for a match only at the beginning of a line, which means that | |
12098 | 1582 @samp{^} would be superfluous. When there is a nonzero left margin, |
1583 they accept matches that start after the left margin. In that case, a | |
1584 @samp{^} would be incorrect. However, a @samp{^} is harmless in modes | |
1585 where a left margin is never used. | |
12067 | 1586 |
6552 | 1587 @defvar paragraph-separate |
1588 This is the regular expression for recognizing the beginning of a line | |
1589 that separates paragraphs. (If you change this, you may have to | |
8469 | 1590 change @code{paragraph-start} also.) The default value is |
12067 | 1591 @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of |
1592 spaces, tabs, and form feeds (after its left margin). | |
6552 | 1593 @end defvar |
1594 | |
1595 @defvar paragraph-start | |
1596 This is the regular expression for recognizing the beginning of a line | |
1597 that starts @emph{or} separates paragraphs. The default value is | |
12067 | 1598 @w{@code{"[@ \t\n\f]"}}, which matches a line starting with a space, tab, |
1599 newline, or form feed (after its left margin). | |
6552 | 1600 @end defvar |
1601 | |
1602 @defvar sentence-end | |
1603 This is the regular expression describing the end of a sentence. (All | |
1604 paragraph boundaries also end sentences, regardless.) The default value | |
1605 is: | |
1606 | |
1607 @example | |
8469 | 1608 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*" |
6552 | 1609 @end example |
1610 | |
8469 | 1611 This means a period, question mark or exclamation mark, followed |
1612 optionally by any number of closing quotes, brackets, parentheses or | |
1613 braces, followed by tabs, spaces or newlines. | |
6552 | 1614 |
1615 For a detailed explanation of this regular expression, see @ref{Regexp | |
1616 Example}. | |
1617 @end defvar |