comparison lispref/searching.texi @ 6552:3b84ed22f747

Initial revision
author Richard M. Stallman <rms@gnu.org>
date Mon, 28 Mar 1994 05:41:05 +0000
parents
children 075343a6b32b
6551:99ca8123a3ca 6552:3b84ed22f747
1 @c -*-texinfo-*-
2 @c This is part of the GNU Emacs Lisp Reference Manual.
3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
4 @c See the file elisp.texi for copying conditions.
5 @setfilename ../info/searching
6 @node Searching and Matching, Syntax Tables, Text, Top
7 @chapter Searching and Matching
8 @cindex searching
9
10 GNU Emacs provides two ways to search through a buffer for specified
11 text: exact string searches and regular expression searches. After a
12 regular expression search, you can examine the @dfn{match data} to
13 determine which text matched the whole regular expression or various
14 portions of it.
15
16 @menu
17 * String Search:: Search for an exact match.
18 * Regular Expressions:: Describing classes of strings.
19 * Regexp Search:: Searching for a match for a regexp.
20 * Search and Replace:: Internals of @code{query-replace}.
21 * Match Data:: Finding out which part of the text matched
22 various parts of a regexp, after regexp search.
23 * Searching and Case:: Case-independent or case-significant searching.
24 * Standard Regexps:: Useful regexps for finding sentences, pages,...
25 @end menu
26
27 The @samp{skip-chars@dots{}} functions also perform a kind of searching.
28 @xref{Skipping Characters}.
29
30 @node String Search
31 @section Searching for Strings
32 @cindex string search
33
34 These are the primitive functions for searching through the text in a
35 buffer. They are meant for use in programs, but you may call them
36 interactively. If you do so, they prompt for the search string;
37 @var{limit} and @var{noerror} are set to @code{nil}, and @var{repeat}
38 is set to 1.
39
40 @deffn Command search-forward string &optional limit noerror repeat
41 This function searches forward from point for an exact match for
42 @var{string}. If successful, it sets point to the end of the occurrence
43 found, and returns the new value of point. If no match is found, the
44 value and side effects depend on @var{noerror} (see below).
45 @c Emacs 19 feature
46
47 In the following example, point is initially at the beginning of the
48 line. Then @code{(search-forward "fox")} moves point after the last
49 letter of @samp{fox}:
50
51 @example
52 @group
53 ---------- Buffer: foo ----------
54 @point{}The quick brown fox jumped over the lazy dog.
55 ---------- Buffer: foo ----------
56 @end group
57
58 @group
59 (search-forward "fox")
60 @result{} 20
61
62 ---------- Buffer: foo ----------
63 The quick brown fox@point{} jumped over the lazy dog.
64 ---------- Buffer: foo ----------
65 @end group
66 @end example
67
68 The argument @var{limit} specifies the upper bound to the search. (It
69 must be a position in the current buffer.) No match extending after
70 that position is accepted. If @var{limit} is omitted or @code{nil}, it
71 defaults to the end of the accessible portion of the buffer.
72
73 @kindex search-failed
74 What happens when the search fails depends on the value of
75 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
76 error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
77 returns @code{nil} and does nothing. If @var{noerror} is neither
78 @code{nil} nor @code{t}, then @code{search-forward} moves point to the
79 upper bound and returns @code{nil}. (It would be more consistent now
80 to return the new position of point in that case, but some programs
81 may depend on a value of @code{nil}.)
82
83 If @var{repeat} is non-@code{nil}, then the search is repeated that
84 many times. Point is positioned at the end of the last match.
85 @end deffn
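The three @var{noerror} conventions described above can be compared side by side. The following is a minimal sketch, not part of Emacs: the helper name @code{my-search-noerror-demo} and the symbol @code{move} are illustrative assumptions (any value other than @code{nil} or @code{t} behaves the same way).

```elisp
;; Hypothetical helper (not part of Emacs): show what a failed
;; `search-forward' does under each NOERROR convention.
(defun my-search-noerror-demo ()
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    (goto-char (point-min))
    (list
     ;; NOERROR is t: return nil and leave point where it was.
     (search-forward "cat" nil t)
     (point)
     ;; NOERROR is neither nil nor t: return nil, but move point to
     ;; the upper bound of the search (here LIMIT, position 10).
     (search-forward "cat" 10 'move)
     (point)
     ;; NOERROR is nil: a `search-failed' error is signaled; catch it.
     (condition-case nil
         (search-forward "cat")
       (search-failed 'failed)))))

(my-search-noerror-demo)
;; => (nil 1 nil 10 failed)
```

Passing @code{t} is usually the convenient choice inside a loop, since the @code{nil} return value can serve directly as the loop's termination test.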
86
87 @deffn Command search-backward string &optional limit noerror repeat
88 This function searches backward from point for @var{string}. It is
89 just like @code{search-forward} except that it searches backwards and
90 leaves point at the beginning of the match.
91 @end deffn
92
93 @deffn Command word-search-forward string &optional limit noerror repeat
94 @cindex word search
95 This function searches forward from point for a ``word'' match for
96 @var{string}. If it finds a match, it sets point to the end of the
97 match found, and returns the new value of point.
98 @c Emacs 19 feature
99
100 Word matching regards @var{string} as a sequence of words, disregarding
101 punctuation that separates them. It searches the buffer for the same
102 sequence of words. Each word must be distinct in the buffer (searching
103 for the word @samp{ball} does not match the word @samp{balls}), but the
104 details of punctuation and spacing are ignored (searching for @samp{ball
105 boy} does match @samp{ball. Boy!}).
106
107 In this example, point is initially at the beginning of the buffer; the
108 search leaves it between the @samp{y} and the @samp{!}.
109
110 @example
111 @group
112 ---------- Buffer: foo ----------
113 @point{}He said "Please! Find
114 the ball boy!"
115 ---------- Buffer: foo ----------
116 @end group
117
118 @group
119 (word-search-forward "Please find the ball, boy.")
120 @result{} 35
121
122 ---------- Buffer: foo ----------
123 He said "Please! Find
124 the ball boy@point{}!"
125 ---------- Buffer: foo ----------
126 @end group
127 @end example
128
129 If @var{limit} is non-@code{nil} (it must be a position in the current
130 buffer), then it is the upper bound to the search. The match found must
131 not extend after that position.
132
133 If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
134 an error if the search fails. If @var{noerror} is @code{t}, then it
135 returns @code{nil} instead of signaling an error. If @var{noerror} is
136 neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
137 end of the buffer) and returns @code{nil}.
138
139 If @var{repeat} is non-@code{nil}, then the search is repeated that many
140 times. Point is positioned at the end of the last match.
141 @end deffn
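As a rough illustration of the word-matching rules above, the sketch below runs the two cases mentioned in the text in temporary buffers. The helper name @code{my-word-search-demo} is hypothetical, and @code{case-fold-search} is bound to @code{t} explicitly so that @samp{boy} can match @samp{Boy}.

```elisp
;; Hypothetical helper (not part of Emacs): word search ignores
;; punctuation and spacing, but each word must match as a whole word.
(defun my-word-search-demo ()
  (let ((case-fold-search t))
    (list
     ;; "ball boy" matches "ball. Boy" despite the punctuation.
     (with-temp-buffer
       (insert "ball. Boy!")
       (goto-char (point-min))
       (word-search-forward "ball boy" nil t))
     ;; "ball" does not match the longer word "balls".
     (with-temp-buffer
       (insert "balls")
       (goto-char (point-min))
       (word-search-forward "ball" nil t)))))

(my-word-search-demo)
;; => (10 nil)
```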
142
143 @deffn Command word-search-backward string &optional limit noerror repeat
144 This function searches backward from point for a word match to
145 @var{string}. This function is just like @code{word-search-forward}
146 except that it searches backward and normally leaves point at the
147 beginning of the match.
148 @end deffn
149
150 @node Regular Expressions
151 @section Regular Expressions
152 @cindex regular expression
153 @cindex regexp
154
155 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
156 denotes a (possibly infinite) set of strings. Searching for matches for
157 a regexp is a very powerful operation. This section explains how to write
158 regexps; the following section says how to search for them.
159
160 @menu
161 * Syntax of Regexps:: Rules for writing regular expressions.
162 * Regexp Example:: Illustrates regular expression syntax.
163 @end menu
164
165 @node Syntax of Regexps
166 @subsection Syntax of Regular Expressions
167
168 Regular expressions have a syntax in which a few characters are special
169 constructs and the rest are @dfn{ordinary}. An ordinary character is a
170 simple regular expression which matches that character and nothing else.
171 The special characters are @samp{$}, @samp{^}, @samp{.}, @samp{*},
172 @samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}; no new special
173 characters will be defined in the future. Any other character appearing
174 in a regular expression is ordinary, unless a @samp{\} precedes it.
175
176 For example, @samp{f} is not a special character, so it is ordinary, and
177 therefore @samp{f} is a regular expression that matches the string
178 @samp{f} and no other string. (It does @emph{not} match the string
179 @samp{ff}.) Likewise, @samp{o} is a regular expression that matches
180 only @samp{o}.@refill
181
182 Any two regular expressions @var{a} and @var{b} can be concatenated. The
183 result is a regular expression which matches a string if @var{a} matches
184 some amount of the beginning of that string and @var{b} matches the rest of
185 the string.@refill
186
187 As a simple example, we can concatenate the regular expressions @samp{f}
188 and @samp{o} to get the regular expression @samp{fo}, which matches only
189 the string @samp{fo}. Still trivial. To do something more powerful, you
190 need to use one of the special characters. Here is a list of them:
191
192 @need 1200
193 @table @kbd
194 @item .@: @r{(Period)}
195 @cindex @samp{.} in regexp
196 is a special character that matches any single character except a newline.
197 Using concatenation, we can make regular expressions like @samp{a.b}, which
198 matches any three-character string that begins with @samp{a} and ends with
199 @samp{b}.@refill
200
201 @item *
202 @cindex @samp{*} in regexp
203 is not a construct by itself; it is a suffix operator that means to
204 repeat the preceding regular expression as many times as possible. In
205 @samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches
206 one @samp{f} followed by any number of @samp{o}s. The case of zero
207 @samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill
208
209 @samp{*} always applies to the @emph{smallest} possible preceding
210 expression. Thus, @samp{fo*} has a repeating @samp{o}, not a
211 repeating @samp{fo}.@refill
212
213 The matcher processes a @samp{*} construct by matching, immediately,
214 as many repetitions as can be found. Then it continues with the rest
215 of the pattern. If that fails, backtracking occurs, discarding some
216 of the matches of the @samp{*}-modified construct in case that makes
217 it possible to match the rest of the pattern. For example, in matching
218 @samp{ca*ar} against the string @samp{caaar}, the @samp{a*} first
219 tries to match all three @samp{a}s; but the rest of the pattern is
220 @samp{ar} and there is only @samp{r} left to match, so this try fails.
221 The next alternative is for @samp{a*} to match only two @samp{a}s.
222 With this choice, the rest of the regexp matches successfully.@refill
223
224 @item +
225 @cindex @samp{+} in regexp
226 is a suffix operator similar to @samp{*} except that the preceding
227 expression must match at least once. So, for example, @samp{ca+r}
228 matches the strings @samp{car} and @samp{caaaar} but not the string
229 @samp{cr}, whereas @samp{ca*r} matches all three strings.
230
231 @item ?
232 @cindex @samp{?} in regexp
233 is a suffix operator similar to @samp{*} except that the preceding
234 expression can match either once or not at all. For example,
235 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
236 else.
237
238 @item [ @dots{} ]
239 @cindex character set (in regexp)
240 @cindex @samp{[} in regexp
241 @cindex @samp{]} in regexp
242 @samp{[} begins a @dfn{character set}, which is terminated by a
243 @samp{]}. In the simplest case, the characters between the two brackets
244 form the set. Thus, @samp{[ad]} matches either one @samp{a} or one
245 @samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
246 and @samp{d}s (including the empty string), from which it follows that
247 @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
248 @samp{caddaar}, etc.@refill
249
250 The usual regular expression special characters are not special inside a
251 character set. A completely different set of special characters exists
252 inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
253
254 @samp{-} is used for ranges of characters. To write a range, write two
255 characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any
256 lower case letter. Ranges may be intermixed freely with individual
257 characters, as in @samp{[a-z$%.]}, which matches any lower case letter
258 or @samp{$}, @samp{%} or a period.@refill
259
260 To include a @samp{]} in a character set, make it the first character.
261 For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include a
262 @samp{-}, write @samp{-} as the first character in the set, or put it
263 immediately after a range. (You can replace one individual character
264 @var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
265 @samp{-}). There is no way to write a set containing just @samp{-} and
266 @samp{]}.
267
268 To include @samp{^} in a set, put it anywhere but at the beginning of
269 the set.
270
271 @item [^ @dots{} ]
272 @cindex @samp{^} in regexp
273 @samp{[^} begins a @dfn{complement character set}, which matches any
274 character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
275 matches all characters @emph{except} letters and digits.@refill
276
277 @samp{^} is not special in a character set unless it is the first
278 character. The character following the @samp{^} is treated as if it
279 were first (thus, @samp{-} and @samp{]} are not special there).
280
281 Note that a complement character set can match a newline, unless
282 newline is mentioned as one of the characters not to match.
283
284 @item ^
285 @cindex @samp{^} in regexp
286 @cindex beginning of line in regexp
287 is a special character that matches the empty string, but only at
288 the beginning of a line in the text being matched. Otherwise it fails
289 to match anything. Thus, @samp{^foo} matches a @samp{foo} which occurs
290 at the beginning of a line.
291
292 When matching a string, @samp{^} matches at the beginning of the string
293 or after a newline character @samp{\n}.
294
295 @item $
296 @cindex @samp{$} in regexp
297 is similar to @samp{^} but matches only at the end of a line. Thus,
298 @samp{x+$} matches a string of one @samp{x} or more at the end of a line.
299
300 When matching a string, @samp{$} matches at the end of the string
301 or before a newline character @samp{\n}.
302
303 @item \
304 @cindex @samp{\} in regexp
305 has two functions: it quotes the special characters (including
306 @samp{\}), and it introduces additional special constructs.
307
308 Because @samp{\} quotes special characters, @samp{\$} is a regular
309 expression which matches only @samp{$}, and @samp{\[} is a regular
310 expression which matches only @samp{[}, and so on.
311
312 Note that @samp{\} also has special meaning in the read syntax of Lisp
313 strings (@pxref{String Type}), and must be quoted with @samp{\}. For
314 example, the regular expression that matches the @samp{\} character is
315 @samp{\\}. To write a Lisp string that contains the characters
316 @samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
317 @samp{\}. Therefore, the read syntax for a regular expression matching
318 @samp{\} is @code{"\\\\"}.@refill
319 @end table
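Several of the special characters in the table above can be tried out with @code{string-match}, which is described later in this chapter. This is only a sketch; the helper name @code{my-regexp-basics-demo} is hypothetical. Note how each regexp backslash is doubled in Lisp string syntax.

```elisp
;; Hypothetical helper (not part of Emacs): exercise a few of the
;; special characters on small strings.
(defun my-regexp-basics-demo ()
  (list
   ;; `*' allows zero repetitions: "fo*" matches "f" at index 0.
   (string-match "fo*" "f")
   ;; Backtracking: "a*" gives back one "a" so that "ar" can match;
   ;; the whole five-character string is matched.
   (progn (string-match "ca*ar" "caaar") (match-end 0))
   ;; A character set under `*'.
   (string-match "c[ad]*r" "caddaar")
   ;; `]' placed first in a set is an ordinary member of the set.
   (string-match "[]a]" "x]")
   ;; The regexp \\ (written "\\\\" in Lisp) matches one backslash;
   ;; the target string here is the three characters a, \, b.
   (string-match "\\\\" "a\\b")))

(my-regexp-basics-demo)
;; => (0 5 0 1 1)
```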
320
321 @strong{Please note:} for historical compatibility, special characters
322 are treated as ordinary ones if they are in contexts where their special
323 meanings make no sense. For example, @samp{*foo} treats @samp{*} as
324 ordinary since there is no preceding expression on which the @samp{*}
325 can act. It is poor practice to depend on this behavior; better to
326 quote the special character anyway, regardless of where it
327 appears.@refill
328
329 For the most part, @samp{\} followed by any character matches only
330 that character. However, there are several exceptions: characters
331 which, when preceded by @samp{\}, are special constructs. Such
332 characters are always ordinary when encountered on their own. Here
333 is a table of @samp{\} constructs:
334
335 @table @kbd
336 @item \|
337 @cindex @samp{|} in regexp
338 @cindex regexp alternative
339 specifies an alternative.
340 Two regular expressions @var{a} and @var{b} with @samp{\|} in
341 between form an expression that matches anything that either @var{a} or
342 @var{b} matches.@refill
343
344 Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
345 but no other string.@refill
346
347 @samp{\|} applies to the largest possible surrounding expressions. Only a
348 surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
349 @samp{\|}.@refill
350
351 Full backtracking capability exists to handle multiple uses of @samp{\|}.
352
353 @item \( @dots{} \)
354 @cindex @samp{(} in regexp
355 @cindex @samp{)} in regexp
356 @cindex regexp grouping
357 is a grouping construct that serves three purposes:
358
359 @enumerate
360 @item
361 To enclose a set of @samp{\|} alternatives for other operations.
362 Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.
363
364 @item
365 To enclose an expression for a suffix operator such as @samp{*} to act
366 on. Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
367 number (zero or more) of @samp{na} strings.@refill
368
369 @item
370 To record a matched substring for future reference.
371 @end enumerate
372
373 This last application is not a consequence of the idea of a
374 parenthetical grouping; it is a separate feature which happens to be
375 assigned as a second meaning to the same @samp{\( @dots{} \)} construct
376 because there is no conflict in practice between the two meanings.
377 Here is an explanation of this feature:
378
379 @item \@var{digit}
380 matches the same text which matched the @var{digit}th occurrence of a
381 @samp{\( @dots{} \)} construct.
382
383 In other words, after the end of a @samp{\( @dots{} \)} construct, the
384 matcher remembers the beginning and end of the text matched by that
385 construct. Then, later on in the regular expression, you can use
386 @samp{\} followed by @var{digit} to match that same text, whatever it
387 may have been.
388
389 The strings matching the first nine @samp{\( @dots{} \)} constructs
390 appearing in a regular expression are assigned numbers 1 through 9 in
391 the order that the open parentheses appear in the regular expression.
392 So you can use @samp{\1} through @samp{\9} to refer to the text matched
393 by the corresponding @samp{\( @dots{} \)} constructs.
394
395 For example, @samp{\(.*\)\1} matches any newline-free string that is
396 composed of two identical halves. The @samp{\(.*\)} matches the first
397 half, which may be anything, but the @samp{\1} that follows must match
398 the same exact text.
399
400 @item \w
401 @cindex @samp{\w} in regexp
402 matches any word-constituent character. The editor syntax table
403 determines which characters these are. @xref{Syntax Tables}.
404
405 @item \W
406 @cindex @samp{\W} in regexp
407 matches any character that is not a word-constituent.
408
409 @item \s@var{code}
410 @cindex @samp{\s} in regexp
411 matches any character whose syntax is @var{code}. Here @var{code} is a
412 character which represents a syntax code: thus, @samp{w} for word
413 constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
414 etc. @xref{Syntax Tables}, for a list of syntax codes and the
415 characters that stand for them.
416
417 @item \S@var{code}
418 @cindex @samp{\S} in regexp
419 matches any character whose syntax is not @var{code}.
420 @end table
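The grouping, alternation and back-reference constructs above can be checked with a few small calls. The following sketch uses a hypothetical helper name, @code{my-group-demo}, and relies on @code{string-match} and @code{match-end}, both described later in this chapter.

```elisp
;; Hypothetical helper (not part of Emacs): alternatives, grouping
;; and back references.
(defun my-group-demo ()
  (list
   ;; The group limits the scope of \|, so "x" follows either branch.
   (string-match "\\(foo\\|bar\\)x" "barx")
   ;; Without a group, \| takes the largest surrounding expressions:
   ;; the alternatives are "foo" and "barx", so "bar" does not match.
   (string-match "foo\\|barx" "bar")
   ;; A group under `*': "ba" followed by three copies of "na".
   (progn (string-match "ba\\(na\\)*" "bananana") (match-end 0))
   ;; \1 must match the same text as the first group: "abc" twice.
   (progn (string-match "\\(.*\\)\\1" "abcabc")
          (list (match-end 0) (match-end 1)))))

(my-group-demo)
;; => (0 nil 8 (6 3))
```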
421
422 These regular expression constructs match the empty string---that is,
423 they don't use up any characters---but whether they match depends on the
424 context.
425
426 @table @kbd
427 @item \`
428 @cindex @samp{\`} in regexp
429 matches the empty string, but only at the beginning
430 of the buffer or string being matched against.
431
432 @item \'
433 @cindex @samp{\'} in regexp
434 matches the empty string, but only at the end of
435 the buffer or string being matched against.
436
437 @item \=
438 @cindex @samp{\=} in regexp
439 matches the empty string, but only at point.
440 (This construct is not defined when matching against a string.)
441
442 @item \b
443 @cindex @samp{\b} in regexp
444 matches the empty string, but only at the beginning or
445 end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
446 @samp{foo} as a separate word. @samp{\bballs?\b} matches
447 @samp{ball} or @samp{balls} as a separate word.@refill
448
449 @item \B
450 @cindex @samp{\B} in regexp
451 matches the empty string, but @emph{not} at the beginning or
452 end of a word.
453
454 @item \<
455 @cindex @samp{\<} in regexp
456 matches the empty string, but only at the beginning of a word.
457
458 @item \>
459 @cindex @samp{\>} in regexp
460 matches the empty string, but only at the end of a word.
461 @end table
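Here is a short sketch of some of these context-dependent constructs. The helper name @code{my-empty-match-demo} is hypothetical; the standard syntax table is selected explicitly because word boundaries depend on the syntax table in effect.

```elisp
;; Hypothetical helper (not part of Emacs): constructs that match the
;; empty string only in certain contexts.
(defun my-empty-match-demo ()
  (with-syntax-table (standard-syntax-table)
    (list
     ;; \b on both sides: "balls" as a separate word, at index 5.
     (string-match "\\bballs?\\b" "play balls now")
     ;; "ball" is not a separate word inside "balls".
     (string-match "\\bball\\b" "balls")
     ;; \< requires the beginning of a word, so the "now" inside
     ;; "snow" is skipped.
     (string-match "\\<now" "snow now")
     ;; \` anchors at the beginning of the string.
     (string-match "\\`foo" "a foo")
     ;; \' anchors at the end of the string.
     (string-match "foo\\'" "a foo"))))

(my-empty-match-demo)
;; => (5 nil 5 nil 2)
```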
462
463 @kindex invalid-regexp
464 Not every string is a valid regular expression. For example, a string
465 with unbalanced square brackets is invalid (with a few exceptions, such
466 as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
467 an invalid regular expression is passed to any of the search functions,
468 an @code{invalid-regexp} error is signaled.
469
470 @defun regexp-quote string
471 This function returns a regular expression string that matches exactly
472 @var{string} and nothing else. This allows you to request an exact
473 string match when calling a function that wants a regular expression.
474
475 @example
476 @group
477 (regexp-quote "^The cat$")
478 @result{} "\\^The cat\\$"
479 @end group
480 @end example
481
482 One use of @code{regexp-quote} is to combine an exact string match with
483 context described as a regular expression. For example, this searches
484 for the string which is the value of @code{string}, surrounded by
485 whitespace:
486
487 @example
488 @group
489 (re-search-forward
490 (concat "\\s " (regexp-quote string) "\\s "))
491 @end group
492 @end example
493 @end defun
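The effect of quoting can be seen by matching with and without @code{regexp-quote}. This is a small sketch; the helper name @code{my-regexp-quote-demo} is hypothetical.

```elisp
;; Hypothetical helper (not part of Emacs): compare a raw regexp with
;; its quoted form.
(defun my-regexp-quote-demo ()
  (list
   ;; Unquoted, the period matches any character, including "x".
   (string-match "a.b" "axb")
   ;; Quoted, the period matches only a literal period.
   (string-match (regexp-quote "a.b") "axb")
   (string-match (regexp-quote "a.b") "xa.b")))

(my-regexp-quote-demo)
;; => (0 nil 1)
```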
494
495 @node Regexp Example
496 @comment node-name, next, previous, up
497 @subsection Complex Regexp Example
498
499 Here is a complicated regexp, used by Emacs to recognize the end of a
500 sentence together with any whitespace that follows. It is the value of
501 the variable @code{sentence-end}.
502
503 First, we show the regexp as a string in Lisp syntax to distinguish
504 spaces from tab characters. The string constant begins and ends with a
505 double-quote. @samp{\"} stands for a double-quote as part of the
506 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
507 tab and @samp{\n} for a newline.
508
509 @example
510 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
511 @end example
512
513 In contrast, if you evaluate the variable @code{sentence-end}, you
514 will see the following:
515
516 @example
517 @group
518 sentence-end
519 @result{}
520 "[.?!][]\"')@}]*\\($\\| $\\| \\| \\)[
521 ]*"
522 @end group
523 @end example
524
525 @noindent
526 In this output, tab and newline appear as themselves.
527
528 This regular expression contains four parts in succession and can be
529 deciphered as follows:
530
531 @table @code
532 @item [.?!]
533 The first part of the pattern consists of three characters, a period, a
534 question mark and an exclamation mark, within square brackets. The
535 match must begin with one of these three characters.
536
537 @item []\"')@}]*
538 The second part of the pattern matches any closing braces and quotation
539 marks, zero or more of them, that may follow the period, question mark
540 or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
541 a string. The @samp{*} at the end indicates that the immediately
542 preceding regular expression (a character set, in this case) may be
543 repeated zero or more times.
544
545 @item \\($\\|@ $\\|\t\\|@ @ \\)
546 The third part of the pattern matches the whitespace that follows the
547 end of a sentence: the end of a line (possibly after one space), or a tab, or two spaces. The
548 double backslashes mark the parentheses and vertical bars as regular
549 expression syntax; the parentheses mark the group and the vertical bars
550 separate alternatives. The dollar sign is used to match the end of a
551 line.
552
553 @item [ \t\n]*
554 Finally, the last part of the pattern matches any additional whitespace
555 beyond the minimum needed to end a sentence.
556 @end table
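To see the four parts working together, the sketch below applies the regexp to a few sample strings. It binds a local copy of the string, with two spaces in the last alternative as the description says, rather than consulting the variable @code{sentence-end}, whose value may differ in a given Emacs. The helper name @code{my-sentence-end-demo} is hypothetical.

```elisp
;; Hypothetical helper (not part of Emacs): try the sentence-end
;; regexp from the text on sample strings.
(defun my-sentence-end-demo ()
  (let ((re "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"))
    (list
     ;; A period followed by two spaces ends a sentence.
     (progn (string-match re "Hello world.  Next")
            (list (match-beginning 0) (match-end 0)))
     ;; Closing quote and parenthesis are absorbed by the second part.
     (progn (string-match re "Stop!\")  Go")
            (list (match-beginning 0) (match-end 0)))
     ;; A period followed by a single space within a line does not
     ;; count as the end of a sentence.
     (string-match re "e.g. this"))))

(my-sentence-end-demo)
;; => ((11 14) (4 9) nil)
```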
557
558 @node Regexp Search
559 @section Regular Expression Searching
560 @cindex regular expression searching
561 @cindex regexp searching
562 @cindex searching for regexp
563
564 In GNU Emacs, you can search for the next match for a regexp either
565 incrementally or not. For incremental search commands, see @ref{Regexp
566 Search, , Regular Expression Search, emacs, The GNU Emacs Manual}. Here
567 we describe only the search functions useful in programs. The principal
568 one is @code{re-search-forward}.
569
570 @deffn Command re-search-forward regexp &optional limit noerror repeat
571 This function searches forward in the current buffer for a string of
572 text that is matched by the regular expression @var{regexp}. The
573 function skips over any amount of text that is not matched by
574 @var{regexp}, and leaves point at the end of the first match found.
575 It returns the new value of point.
576
577 If @var{limit} is non-@code{nil} (it must be a position in the current
578 buffer), then it is the upper bound to the search. No match extending
579 after that position is accepted.
580
581 What happens when the search fails depends on the value of
582 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
583 error is signaled. If @var{noerror} is @code{t},
584 @code{re-search-forward} does nothing and returns @code{nil}. If
585 @var{noerror} is neither @code{nil} nor @code{t}, then
586 @code{re-search-forward} moves point to @var{limit} (or the end of the
587 buffer) and returns @code{nil}.
588
589 If @var{repeat} is supplied (it must be a positive number), then the
590 search is repeated that many times (each time starting at the end of the
591 previous time's match). If these successive searches succeed, the
592 function succeeds, moving point and returning its new value. Otherwise
593 the search fails.
594
595 In the following example, point is initially before the @samp{T}.
596 Evaluating the search call moves point to the end of that line (between
597 the @samp{t} of @samp{hat} and the newline).
598
599 @example
600 @group
601 ---------- Buffer: foo ----------
602 I read "@point{}The cat in the hat
603 comes back" twice.
604 ---------- Buffer: foo ----------
605 @end group
606
607 @group
608 (re-search-forward "[a-z]+" nil t 5)
609 @result{} 27
610
611 ---------- Buffer: foo ----------
612 I read "The cat in the hat@point{}
613 comes back" twice.
614 ---------- Buffer: foo ----------
615 @end group
616 @end example
617 @end deffn
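A common way to use @code{re-search-forward} in programs is to call it in a loop with @var{noerror} set to @code{t}, so that the @code{nil} return value ends the loop. The following is a minimal sketch; the helper name @code{my-collect-matches} is hypothetical, it uses @code{match-string} to extract each match, and it assumes that @var{regexp} never matches the empty string (otherwise the loop would not advance).

```elisp
;; Hypothetical helper (not part of Emacs): return a list of all
;; matches for REGEXP in TEXT, in order of appearance.
;; Assumes REGEXP cannot match the empty string.
(defun my-collect-matches (regexp text)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search nil)        ; make the search case-sensitive
          (matches '()))
      ;; Each successful search leaves point at the end of the match,
      ;; so the next search starts there.
      (while (re-search-forward regexp nil t)
        (push (match-string 0) matches))
      (nreverse matches))))

(my-collect-matches "[a-z]+" "The cat in the hat")
;; => ("he" "cat" "in" "the" "hat")
```

Because the search is case-sensitive here, the capital @samp{T} is skipped and the first match is @samp{he}.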
618
619 @deffn Command re-search-backward regexp &optional limit noerror repeat
620 This function searches backward in the current buffer for a string of
621 text that is matched by the regular expression @var{regexp}, leaving
622 point at the beginning of the first text found.
623
624 This function is analogous to @code{re-search-forward}, but they are
625 not simple mirror images. @code{re-search-forward} finds the match
626 whose beginning is as close as possible. If @code{re-search-backward}
627 were a perfect mirror image, it would find the match whose end is as
628 close as possible. However, in fact it finds the match whose beginning
629 is as close as possible. The reason is that matching a regular
630 expression at a given spot always works from beginning to end, and is
631 done at a specified beginning position.
632
633 A true mirror-image of @code{re-search-forward} would require a special
634 feature for matching regexps from end to beginning. It's not worth the
635 trouble of implementing that.
636 @end deffn
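The asymmetry described above shows up clearly with a repetition operator. In this sketch (the helper name @code{my-backward-asymmetry-demo} is hypothetical), the forward search matches the whole run of @samp{a}s, while the backward search stops at the nearest position where a match can begin and so matches only one character.

```elisp
;; Hypothetical helper (not part of Emacs): compare forward and
;; backward regexp search on the same text.
(defun my-backward-asymmetry-demo ()
  (with-temp-buffer
    (insert "xaaaa")
    (list
     ;; Forward from the start: the match begins at the first "a" and
     ;; extends as far as possible.
     (progn (goto-char (point-min))
            (re-search-forward "a+")
            (match-string 0))
     ;; Backward from the end: the match begins as close to point as
     ;; possible, so it covers only the last "a".
     (progn (goto-char (point-max))
            (re-search-backward "a+")
            (list (match-string 0) (point))))))

(my-backward-asymmetry-demo)
;; => ("aaaa" ("a" 5))
```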
637
638 @defun string-match regexp string &optional start
639 This function returns the index of the start of the first match for
640 the regular expression @var{regexp} in @var{string}, or @code{nil} if
641 there is no match. If @var{start} is non-@code{nil}, the search starts
642 at that index in @var{string}.
643
644 For example,
645
646 @example
647 @group
648 (string-match
649 "quick" "The quick brown fox jumped quickly.")
650 @result{} 4
651 @end group
652 @group
653 (string-match
654 "quick" "The quick brown fox jumped quickly." 8)
655 @result{} 27
656 @end group
657 @end example
658
659 @noindent
660 The index of the first character of the
661 string is 0, the index of the second character is 1, and so on.
662
663 After this function returns, the index of the first character beyond
664 the match is available as @code{(match-end 0)}. @xref{Match Data}.
665
666 @example
667 @group
668 (string-match
669 "quick" "The quick brown fox jumped quickly." 8)
670 @result{} 27
671 @end group
672
673 @group
674 (match-end 0)
675 @result{} 32
676 @end group
677 @end example
678 @end defun
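The @var{start} argument and @code{(match-end 0)} combine naturally into a loop that finds every match in a string. This is a sketch only: the helper name @code{my-match-indices} is hypothetical, and it assumes that @var{regexp} never matches the empty string, since an empty match would leave the start index unchanged.

```elisp
;; Hypothetical helper (not part of Emacs): return the starting index
;; of each successive match for REGEXP in STRING.
;; Assumes REGEXP cannot match the empty string.
(defun my-match-indices (regexp string)
  (let ((start 0)
        (indices '()))
    (while (string-match regexp string start)
      (push (match-beginning 0) indices)
      ;; Resume just past the end of the match that was found.
      (setq start (match-end 0)))
    (nreverse indices)))

(my-match-indices "quick" "The quick brown fox jumped quickly.")
;; => (4 27)
```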
679
680 @defun looking-at regexp
681 This function determines whether the text in the current buffer directly
682 following point matches the regular expression @var{regexp}. ``Directly
683 following'' means precisely that: the search is ``anchored'' and it can
684 succeed only starting with the first character following point. The
685 result is @code{t} if so, @code{nil} otherwise.
686
687 This function does not move point, but it updates the match data, which
688 you can access using @code{match-beginning} and @code{match-end}.
689 @xref{Match Data}.
690
691 In this example, point is located directly before the @samp{T}. If it
692 were anywhere else, the result would be @code{nil}.
693
694 @example
695 @group
696 ---------- Buffer: foo ----------
697 I read "@point{}The cat in the hat
698 comes back" twice.
699 ---------- Buffer: foo ----------
700
701 (looking-at "The cat in the hat$")
702 @result{} t
703 @end group
704 @end example
705 @end defun
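The anchoring behavior and the match data update can both be observed in a temporary buffer. The following is a minimal sketch with a hypothetical helper name, @code{my-looking-at-demo}.

```elisp
;; Hypothetical helper (not part of Emacs): `looking-at' is anchored
;; at point, leaves point alone, and sets the match data.
(defun my-looking-at-demo ()
  (with-temp-buffer
    (insert "The cat in the hat\ncomes back")
    (goto-char (point-min))
    (list
     ;; The text directly after point matches.
     (looking-at "The cat in the hat$")
     ;; "cat" occurs later on the line, but not directly at point.
     (looking-at "cat")
     ;; Point has not moved.
     (point)
     ;; The match data describe the group: "cat" starts at position 5.
     (progn (looking-at "The \\(cat\\)")
            (match-beginning 1)))))

(my-looking-at-demo)
;; => (t nil 1 5)
```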
706
707 @ignore
708 @deffn Command delete-matching-lines regexp
709 This function is identical to @code{delete-non-matching-lines}, save
710 that it deletes what @code{delete-non-matching-lines} keeps.
711
712 In the example below, point is located on the first line of text.
713
714 @example
715 @group
716 ---------- Buffer: foo ----------
717 We hold these truths
718 to be self-evident,
719 that all men are created
720 equal, and that they are
721 ---------- Buffer: foo ----------
722 @end group
723
724 @group
725 (delete-matching-lines "the")
726 @result{} nil
727
728 ---------- Buffer: foo ----------
729 to be self-evident,
730 that all men are created
731 ---------- Buffer: foo ----------
732 @end group
733 @end example
734 @end deffn
735
736 @deffn Command flush-lines regexp
737 This function is the same as @code{delete-matching-lines}.
738 @end deffn
739
740 @defun delete-non-matching-lines regexp
741 This function deletes all lines following point which don't
742 contain a match for the regular expression @var{regexp}.
743 @end defun
744
745 @deffn Command keep-lines regexp
746 This function is the same as @code{delete-non-matching-lines}.
747 @end deffn
748
749 @deffn Command how-many regexp
750 This function counts the number of matches for @var{regexp} there are in
751 the current buffer following point. It prints this number in
752 the echo area, returning the string printed.
753 @end deffn
754
755 @deffn Command count-matches regexp
756 This function is a synonym of @code{how-many}.
757 @end deffn
758
759 @deffn Command list-matching-lines regexp nlines
760 This function is a synonym of @code{occur}.
761 Show all lines following point containing a match for @var{regexp}.
762 Display each line with @var{nlines} lines before and after,
763 or @code{-}@var{nlines} before if @var{nlines} is negative.
764 @var{nlines} defaults to @code{list-matching-lines-default-context-lines}.
765 Interactively it is the prefix arg.
766
767 The lines are shown in a buffer named @samp{*Occur*}.
768 It serves as a menu to find any of the occurrences in this buffer.
769 @kbd{C-h m} (@code{describe-mode}) in that buffer gives help.
770 @end deffn
771
772 @defopt list-matching-lines-default-context-lines
773 Default value is 0.
774 Default number of context lines to include around a @code{list-matching-lines}
775 match. A negative number means to include that many lines before the match.
776 A positive number means to include that many lines both before and after.
777 @end defopt
778 @end ignore
779
780 @node Search and Replace
781 @section Search and Replace
782 @cindex replacement
783
784 @defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map
785 This function is the guts of @code{query-replace} and related commands.
786 It searches for occurrences of @var{from-string} and replaces some or
787 all of them. If @var{query-flag} is @code{nil}, it replaces all
788 occurrences; otherwise, it asks the user what to do about each one.
789
790 If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
791 considered a regular expression; otherwise, it must match literally. If
792 @var{delimited-flag} is non-@code{nil}, then only occurrences
793 surrounded by word boundaries are considered.
794
795 The argument @var{replacements} specifies what to replace occurrences
796 with. If it is a string, that string is used. It can also be a list of
797 strings, to be used in cyclic order.
798
799 If @var{repeat-count} is non-@code{nil}, it should be an integer, the
800 number of occurrences to consider. In this case, @code{perform-replace}
801 returns after considering that many occurrences.
802
803 Normally, the keymap @code{query-replace-map} defines the possible user
804 responses. The argument @var{map}, if non-@code{nil}, is a keymap to
805 use instead of @code{query-replace-map}.
806 @end defun
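
As a minimal sketch of a noninteractive call (the helper name
@code{my-replace-all-after-point} is hypothetical, not part of Emacs),
the following replaces every literal occurrence after point without
asking, by passing @code{nil} for @var{query-flag}, @var{regexp-flag}
and @var{delimited-flag}:

@example
@group
(defun my-replace-all-after-point (from to)
  "Replace each literal occurrence of FROM after point with TO."
  ;; query-flag, regexp-flag and delimited-flag are all nil:
  ;; no questions, literal matching, no word-boundary requirement.
  (perform-replace from to nil nil nil))
@end group
@end example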
807
808 @defvar query-replace-map
809 This variable holds a special keymap that defines the valid user
810 responses for @code{query-replace} and related functions, as well as
811 @code{y-or-n-p} and @code{map-y-or-n-p}. It is unusual in two ways:
812
813 @itemize @bullet
814 @item
815 The ``key bindings'' are not commands, just symbols that are meaningful
816 to the functions that use this map.
817
818 @item
819 Prefix keys are not supported; each key binding must be for a single-event
820 key sequence. This is because the functions don't use @code{read-key-sequence}
821 to get the input; instead, they read a single event and look it up ``by hand.''
822 @end itemize
823 @end defvar
824
825 Here are the meaningful ``bindings'' for @code{query-replace-map}.
826 Several of them are meaningful only for @code{query-replace} and
827 friends.
828
829 @table @code
830 @item act
831 Do take the action being considered---in other words, ``yes.''
832
833 @item skip
834 Do not take action for this question---in other words, ``no.''
835
836 @item exit
837 Answer this question ``no,'' and don't ask any more.
838
839 @item act-and-exit
840 Answer this question ``yes,'' and don't ask any more.
841
842 @item act-and-show
843 Answer this question ``yes,'' but show the results---don't advance yet
844 to the next question.
845
846 @item automatic
847 Answer this question and all subsequent questions in the series with
848 ``yes,'' without further user interaction.
849
850 @item backup
851 Move back to the previous place that a question was asked about.
852
853 @item edit
854 Enter a recursive edit to deal with this question---instead of any
855 other action that would normally be taken.
856
857 @item delete-and-edit
858 Delete the text being considered, then enter a recursive edit to replace
859 it.
860
861 @item recenter
862 Redisplay and center the window, then ask the same question again.
863
864 @item quit
865 Perform a quit right away. Only @code{y-or-n-p} and related functions
866 use this answer.
867
868 @item help
869 Display some help, then ask again.
870 @end table
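
To see which ``binding'' a given key has, you can look the key up in the
map directly. This sketch assumes the default bindings, in which
@kbd{y} means @code{act} and @kbd{n} means @code{skip}; a customized map
may answer differently:

@example
@group
(lookup-key query-replace-map "y")
     @result{} act
(lookup-key query-replace-map "n")
     @result{} skip
@end group
@end example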
871
872 @node Match Data
873 @section The Match Data
874 @cindex match data
875
876 Emacs keeps track of the positions of the start and end of segments of
877 text found during a regular expression search. This means, for example,
878 that you can search for a complex pattern, such as a date in an Rmail
879 message, and then extract parts of the match under control of the
880 pattern.
881
882 Because the match data normally describe the most recent search only,
883 you must be careful not to do another search inadvertently between the
884 search you wish to refer back to and the use of the match data. If you
885 can't avoid another intervening search, you must save and restore the
886 match data around it, to prevent it from being overwritten.
887
888 @menu
889 * Simple Match Data:: Accessing single items of match data,
890 such as where a particular subexpression started.
891 * Replacing Match:: Replacing a substring that was matched.
892 * Entire Match Data:: Accessing the entire match data at once, as a list.
893 * Saving Match Data:: Saving and restoring the match data.
894 @end menu
895
896 @node Simple Match Data
897 @subsection Simple Match Data Access
898
899 This section explains how to use the match data to find the starting
900 point or ending point of the text that was matched by a particular
901 search, or by a particular parenthetical subexpression of a regular
902 expression.
903
904 @defun match-beginning count
905 This function returns the position of the start of text matched by the
906 last regular expression searched for, or a subexpression of it.
907
908 The argument @var{count}, a number, specifies a subexpression whose
909 start position is the value. If @var{count} is zero, then the value is
910 the position of the start of the text matched by the whole regexp. If @var{count} is
911 greater than zero, then the value is the position of the beginning of
912 the text matched by the @var{count}th subexpression.
913
914 Subexpressions of a regular expression are those expressions grouped
915 inside of parentheses, @samp{\(@dots{}\)}. The @var{count}th
916 subexpression is found by counting occurrences of @samp{\(} from the
917 beginning of the whole regular expression. The first subexpression is
918 numbered 1, the second 2, and so on.
919
920 The value is @code{nil} for a parenthetical grouping inside of a
921 @samp{\|} alternative that wasn't used in the match.
922 @end defun
923
924 @defun match-end count
925 This function returns the position of the end of the text that matched
926 the last regular expression searched for, or a subexpression of it.
927 This function is otherwise similar to @code{match-beginning}.
928 @end defun
929
930 Here is an example of using the match data, with a comment showing the
931 positions within the text:
932
933 @example
934 @group
935 (string-match "\\(qu\\)\\(ick\\)"
936 "The quick fox jumped quickly.")
937 ;0123456789
938 @result{} 4
939 @end group
940
941 @group
942 (match-beginning 1) ; @r{The beginning of the match}
943 @result{} 4 ; @r{with @samp{qu} is at index 4.}
944 @end group
945
946 @group
947 (match-beginning 2) ; @r{The beginning of the match}
948 @result{} 6 ; @r{with @samp{ick} is at index 6.}
949 @end group
950
951 @group
952 (match-end 1) ; @r{The end of the match}
953 @result{} 6 ; @r{with @samp{qu} is at index 6.}
954
955 (match-end 2) ; @r{The end of the match}
956 @result{} 9 ; @r{with @samp{ick} is at index 9.}
957 @end group
958 @end example
959
960 Here is another example. Point is initially located at the beginning
961 of the line. Searching moves point to between the space and the word
962 @samp{in}. The beginning of the entire match is at the 9th character of
963 the buffer (@samp{T}), and the beginning of the match for the first
964 subexpression is at the 13th character (@samp{c}).
965
966 @example
967 @group
968 (list
969 (re-search-forward "The \\(cat \\)")
970 (match-beginning 0)
971 (match-beginning 1))
972 @result{} (17 9 13)
973 @end group
974
975 @group
976 ---------- Buffer: foo ----------
977 I read "The cat @point{}in the hat comes back" twice.
978 ^ ^
979 9 13
980 ---------- Buffer: foo ----------
981 @end group
982 @end example
983
984 @noindent
985 (In this case, the index returned is a buffer position; the first
986 character of the buffer counts as 1.)
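
Because @code{string-match} records string indices, the values of
@code{match-beginning} and @code{match-end} can be passed straight to
@code{substring} to extract the matched text. Here is a minimal sketch;
the function name @code{my-matched-substring} is made up for
illustration:

@example
@group
(defun my-matched-substring (regexp string count)
  "Return the part of STRING matched by subexpression COUNT of REGEXP.
Return nil if REGEXP does not match or the subexpression was unused."
  (if (and (string-match regexp string)
           (match-beginning count))
      (substring string (match-beginning count) (match-end count))))
@end group

@group
(my-matched-substring "\\(qu\\)\\(ick\\)"
                      "The quick fox jumped quickly." 2)
     @result{} "ick"
@end group
@end example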
987
988 @node Replacing Match
989 @subsection Replacing the Text That Matched
990
991 This function replaces the text matched by the last search with
992 @var{replacement}.
993
994 @cindex case in replacements
995 @defun replace-match replacement &optional fixedcase literal
996 This function replaces the buffer text matched by the last search, with
997 @var{replacement}. It applies only to buffers; you can't use
998 @code{replace-match} to replace a substring found with
999 @code{string-match}.
1000
1001 If @var{fixedcase} is non-@code{nil}, then the case of the replacement
1002 text is not changed; otherwise, the replacement text is converted to a
1003 different case depending upon the capitalization of the text to be
1004 replaced. If the original text is all upper case, the replacement text
1005 is converted to upper case, except when all of the words in the original
1006 text are only one character long. In that event, the replacement text
1007 is capitalized. If @emph{any} of the words in the original text is
1008 capitalized, then all of the words in the replacement text are
1009 capitalized.
1010
1011 If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
1012 exactly as it is, the only alterations being case changes as needed.
1013 If it is @code{nil} (the default), then the character @samp{\} is treated
1014 specially. If a @samp{\} appears in @var{replacement}, then it must be
1015 part of one of the following sequences:
1016
1017 @table @asis
1018 @item @samp{\&}
1019 @cindex @samp{&} in replacement
1020 @samp{\&} stands for the entire text being replaced.
1021
1022 @item @samp{\@var{n}}
1023 @cindex @samp{\@var{n}} in replacement
1024 @samp{\@var{n}} stands for the text that matched the @var{n}th
1025 subexpression in the original regexp. Subexpressions are those
1026 expressions grouped inside of @samp{\(@dots{}\)}. @var{n} is a digit.
1027
1028 @item @samp{\\}
1029 @cindex @samp{\} in replacement
1030 @samp{\\} stands for a single @samp{\} in the replacement text.
1031 @end table
1032
1033 @code{replace-match} leaves point at the end of the replacement text,
1034 and returns @code{t}.
1035 @end defun
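
As an illustration of the @samp{\@var{n}} sequences, this sketch swaps
the two halves of the next hyphenated pair after point. The function
name is hypothetical; @var{fixedcase} is @code{t} so that no case
conversion is attempted:

@example
@group
(defun my-swap-hyphenated-pair ()
  "Swap the two words of the next WORD-WORD pair after point.
Return non-nil if a pair was found."
  (if (re-search-forward "\\(\\w+\\)-\\(\\w+\\)" nil t)
      (progn
        ;; \2 and \1 stand for the second and first subexpressions.
        (replace-match "\\2-\\1" t)
        t)))
@end group
@end example

@noindent
With point at the beginning of a buffer containing @samp{alpha-beta},
calling this function leaves the buffer containing @samp{beta-alpha}.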
1036
1037 @node Entire Match Data
1038 @subsection Accessing the Entire Match Data
1039
1040 The functions @code{match-data} and @code{set-match-data} read or
1041 write the entire match data, all at once.
1042
1043 @defun match-data
1044 This function returns a newly constructed list containing all the
1045 information on what text the last search matched. Element zero is the
1046 position of the beginning of the match for the whole expression; element
1047 one is the position of the end of the match for the expression. The
1048 next two elements are the positions of the beginning and end of the
1049 match for the first subexpression, and so on. In general, element
1050 @ifinfo
1051 number 2@var{n}
1052 @end ifinfo
1053 @tex
1054 number {\mathsurround=0pt $2n$}
1055 @end tex
1056 corresponds to @code{(match-beginning @var{n})}; and
1057 element
1058 @ifinfo
1059 number 2@var{n} + 1
1060 @end ifinfo
1061 @tex
1062 number {\mathsurround=0pt $2n+1$}
1063 @end tex
1064 corresponds to @code{(match-end @var{n})}.
1065
1066 All the elements are markers or @code{nil} if matching was done on a
1067 buffer, and all are integers or @code{nil} if matching was done on a
1068 string with @code{string-match}. (In Emacs 18 and earlier versions,
1069 markers were used even for matching on a string, except in the case
1070 of the integer 0.)
1071
1072 As always, there must be no possibility of intervening searches between
1073 the call to a search function and the call to @code{match-data} that is
1074 intended to access the match data for that search.
1075
1076 @example
1077 @group
1078 (match-data)
1079 @result{} (#<marker at 9 in foo>
1080 #<marker at 17 in foo>
1081 #<marker at 13 in foo>
1082 #<marker at 17 in foo>)
1083 @end group
1084 @end example
1085 @end defun
1086
1087 @defun set-match-data match-list
1088 This function sets the match data from the elements of @var{match-list},
1089 which should be a list that was the value of a previous call to
1090 @code{match-data}.
1091
1092 If @var{match-list} refers to a buffer that doesn't exist, you don't get
1093 an error; that sets the match data in a meaningless but harmless way.
1094
1095 @findex store-match-data
1096 @code{store-match-data} is an alias for @code{set-match-data}.
1097 @end defun
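
The following sketch shows the two functions working together on a
string match, where the elements are integers. The function name
@code{my-match-data-demo} is invented for this example. It records the
match data of one search, lets a second search overwrite them, and then
reinstates the recorded list:

@example
@group
(defun my-match-data-demo ()
  "Return a list (SAVED START END) illustrating `set-match-data'."
  (let ((text "The quick fox jumped quickly.")
        saved start)
    (string-match "\\(qu\\)\\(ick\\)" text)
    (setq saved (match-data))          ; @r{Data for the first search.}
    (string-match "fox" text)          ; @r{Overwrites the match data.}
    (setq start (match-beginning 0))
    (set-match-data saved)             ; @r{Reinstate the first search.}
    (list saved start (match-end 2))))
@end group

@group
(my-match-data-demo)
     @result{} ((4 9 4 6 6 9) 10 9)
@end group
@end example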
1098
1099 @node Saving Match Data
1100 @subsection Saving and Restoring the Match Data
1101
1102 All asynchronous process functions (filters and sentinels) and
1103 functions that use @code{recursive-edit} should save and restore the
1104 match data if they do a search or if they let the user type arbitrary
1105 commands. Saving the match data is useful in other cases as
1106 well---whenever you want to access the match data resulting from an
1107 earlier search, notwithstanding another intervening search.
1108
1109 This example shows the problem that can arise if you fail to
1110 attend to this requirement:
1111
1112 @example
1113 @group
1114 (re-search-forward "The \\(cat \\)")
1115 @result{} 48
1116 (foo) ; @r{Perhaps @code{foo} does}
1117 ; @r{more searching.}
1118 (match-end 0)
1119 @result{} 61 ; @r{Unexpected result---not 48!}
1120 @end group
1121 @end example
1122
1123 In Emacs versions 19 and later, you can save and restore the match
1124 data with @code{save-match-data}:
1125
1126 @defspec save-match-data body@dots{}
1127 This special form executes @var{body}, saving and restoring the match
1128 data around it. This is useful if you wish to do a search without
1129 altering the match data that resulted from an earlier search.
1130 @end defspec
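
Here is a small sketch of the effect, using string searches so that the
positions are string indices. The inner search would normally replace
the match data, but because it runs inside @code{save-match-data}, the
data from the outer search are still available afterward:

@example
@group
(let ((text "The quick fox jumped quickly."))
  (string-match "\\(qu\\)\\(ick\\)" text)
  (save-match-data
    ;; @r{This search changes the match data only inside the form.}
    (string-match "fox" text))
  (match-beginning 0))
     @result{} 4
@end group
@end example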
1131
1132 You can use @code{set-match-data} together with @code{match-data} to
1133 imitate the effect of the special form @code{save-match-data}. This is
1134 useful for writing code that can run in Emacs 18. Here is how:
1135
1136 @example
1137 @group
1138 (let ((data (match-data)))
1139 (unwind-protect
1140 @dots{} ; @r{May change the original match data.}
1141 (set-match-data data)))
1142 @end group
1143 @end example
1144
1145 @ignore
1146 Here is a function which restores the match data provided the buffer
1147 associated with it still exists.
1148
1149 @smallexample
1150 @group
1151 (defun restore-match-data (data)
1152 @c It is incorrect to split the first line of a doc string.
1153 @c If there's a problem here, it should be solved in some other way.
1154 "Restore the match data DATA unless the buffer is missing."
1155 (catch 'foo
1156 (let ((d data))
1157 @end group
1158 (while d
1159 (and (car d)
1160 (null (marker-buffer (car d)))
1161 @group
1162 ;; @file{match-data} @r{buffer is deleted.}
1163 (throw 'foo nil))
1164 (setq d (cdr d)))
1165 (set-match-data data))))
1166 @end group
1167 @end smallexample
1168 @end ignore
1169
1170 @node Searching and Case
1171 @section Searching and Case
1172 @cindex searching and case
1173
1174 By default, searches in Emacs ignore the case of the text they are
1175 searching through; if you specify searching for @samp{FOO}, then
1176 @samp{Foo} or @samp{foo} is also considered a match. Regexps, and in
1177 particular character sets, are included: thus, @samp{[aB]} would match
1178 @samp{a} or @samp{A} or @samp{b} or @samp{B}.
1179
1180 If you do not want this feature, set the variable
1181 @code{case-fold-search} to @code{nil}. Then all letters must match
1182 exactly, including case. This is a per-buffer-local variable; altering
1183 the variable affects only the current buffer. (@xref{Intro to
1184 Buffer-Local}.) Alternatively, you may change the value of
1185 @code{default-case-fold-search}, which is the default value of
1186 @code{case-fold-search} for buffers that do not override it.
1187
1188 Note that the user-level incremental search feature handles case
1189 distinctions differently. When given a lower case letter, it looks for
1190 a match of either case, but when given an upper case letter, it looks
1191 for an upper case letter only. But this has nothing to do with the
1192 searching functions used in Lisp code.
1193
1194 @defopt case-replace
1195 This variable determines whether @code{query-replace} should preserve
1196 case in replacements. If the variable is @code{nil}, then
1197 @code{replace-match} should not try to convert case.
1198 @end defopt
1199
1200 @defopt case-fold-search
1201 This buffer-local variable determines whether searches should ignore
1202 case. If the variable is @code{nil}, they do not ignore case; otherwise
1203 they do ignore case.
1204 @end defopt
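
Since the search functions consult @code{case-fold-search} when the
search is performed, a program can choose the behavior for a single
search by binding the variable with @code{let}, as in this sketch:

@example
@group
(let ((case-fold-search nil))
  (string-match "FOO" "a foo"))
     @result{} nil
@end group

@group
(let ((case-fold-search t))
  (string-match "FOO" "a foo"))
     @result{} 2
@end group
@end example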
1205
1206 @defvar default-case-fold-search
1207 The value of this variable is the default value for
1208 @code{case-fold-search} in buffers that do not override it. This is the
1209 same as @code{(default-value 'case-fold-search)}.
1210 @end defvar
1211
1212 @node Standard Regexps
1213 @section Standard Regular Expressions Used in Editing
1214 @cindex regexps used standardly in editing
1215 @cindex standard regexps used in editing
1216
1217 This section describes some variables that hold regular expressions
1218 used for certain purposes in editing:
1219
1220 @defvar page-delimiter
1221 This is the regexp describing line-beginnings that separate pages. The
1222 default value is @code{"^\014"} (i.e., @code{"^^L"} or @code{"^\C-l"}).
1223 @end defvar
1224
1225 @defvar paragraph-separate
1226 This is the regular expression for recognizing the beginning of a line
1227 that separates paragraphs. (If you change this, you may have to
1228 change @code{paragraph-start} also.) The default value is @code{"^[
1229 \t\f]*$"}, which is a line that consists entirely of spaces, tabs, and
1230 form feeds.
1231 @end defvar
1232
1233 @defvar paragraph-start
1234 This is the regular expression for recognizing the beginning of a line
1235 that starts @emph{or} separates paragraphs. The default value is
1236 @code{"^[ \t\n\f]"}, which matches a line starting with a space, tab,
1237 newline, or form feed.
1238 @end defvar
1239
1240 @defvar sentence-end
1241 This is the regular expression describing the end of a sentence. (All
1242 paragraph boundaries also end sentences, regardless.) The default value
1243 is:
1244
1245 @example
1246 "[.?!][]\"')@}]*\\($\\|\t\\|  \\)[ \t\n]*"
1247 @end example
1248
1249 This means a period, question mark or exclamation mark, optionally followed
1250 by closing brackets, quotes, parentheses or braces, and then by tabs, spaces or new lines.
1251
1252 For a detailed explanation of this regular expression, see @ref{Regexp
1253 Example}.
1254 @end defvar