emacs: lispref/searching.texi annotate

author	Richard M. Stallman <rms@gnu.org>
date	Fri, 20 Jan 1995 23:36:07 +0000 (1995-01-20)
parents	6b8e51b286c6
children	f43818d3bbd8

@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/searching
@node Searching and Matching, Syntax Tables, Text, Top
@chapter Searching and Matching
@cindex searching

GNU Emacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches. After a
regular expression search, you can examine the @dfn{match data} to
determine which text matched the whole regular expression or various
portions of it.

@menu
* String Search::         Search for an exact match.
* Regular Expressions::   Describing classes of strings.
* Regexp Search::         Searching for a match for a regexp.
* Search and Replace::    Internals of @code{query-replace}.
* Match Data::            Finding out which part of the text matched
                            various parts of a regexp, after regexp search.
* Searching and Case::    Case-independent or case-significant searching.
* Standard Regexps::      Useful regexps for finding sentences, pages,...
@end menu

The @samp{skip-chars@dots{}} functions also perform a kind of searching.
@xref{Skipping Characters}.

@node String Search
@section Searching for Strings
@cindex string search

These are the primitive functions for searching through the text in a
buffer. They are meant for use in programs, but you may call them
interactively. If you do so, they prompt for the search string;
@var{limit} and @var{noerror} are set to @code{nil}, and @var{repeat}
is set to 1.

@deffn Command search-forward string &optional limit noerror repeat
This function searches forward from point for an exact match for
@var{string}. If successful, it sets point to the end of the occurrence
found, and returns the new value of point. If no match is found, the
value and side effects depend on @var{noerror} (see below).
@c Emacs 19 feature

In the following example, point is initially at the beginning of the
line. Then @code{(search-forward "fox")} moves point after the last
letter of @samp{fox}:

@example
@group
---------- Buffer: foo ----------
@point{}The quick brown fox jumped over the lazy dog.
---------- Buffer: foo ----------
@end group

@group
(search-forward "fox")
     @result{} 20

---------- Buffer: foo ----------
The quick brown fox@point{} jumped over the lazy dog.
---------- Buffer: foo ----------
@end group
@end example

The argument @var{limit} specifies the upper bound to the search. (It
must be a position in the current buffer.) No match extending after
that position is accepted. If @var{limit} is omitted or @code{nil}, it
defaults to the end of the accessible portion of the buffer.

@kindex search-failed
What happens when the search fails depends on the value of
@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
returns @code{nil} and does nothing. If @var{noerror} is neither
@code{nil} nor @code{t}, then @code{search-forward} moves point to the
upper bound and returns @code{nil}. (It would be more consistent now
to return the new position of point in that case, but some programs
may depend on a value of @code{nil}.)
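
As a sketch of the default case, where @var{noerror} is @code{nil},
the following catches the @code{search-failed} error with
@code{condition-case}. The buffer text is made up for illustration,
and @code{with-temp-buffer} is assumed to be available (it is not
described in this section):

@example
@group
(with-temp-buffer
  (insert "The quick brown dog.")
  (goto-char (point-min))
  (condition-case err
      (search-forward "fox")
    ;; ERR holds the error data; its car is the error symbol.
    (search-failed (car err))))
     @result{} search-failed
@end group
@end example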

If @var{repeat} is supplied (it must be a positive number), then the
search is repeated that many times (each time starting at the end of the
previous time's match). If these successive searches succeed, the
function succeeds, moving point and returning its new value. Otherwise
the search fails.
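
The following sketch exercises @var{limit}, @var{noerror}, and
@var{repeat} together. The buffer text and the symbol @code{move}
(standing for any value that is neither @code{nil} nor @code{t}) are
chosen only for illustration, and @code{with-temp-buffer} is assumed to
be available:

@example
@group
(with-temp-buffer
  (insert "fox one, fox two, fox three")
  (goto-char (point-min))
  (list
   ;; REPEAT is 2: point ends up after the second "fox".
   (search-forward "fox" nil nil 2)
   ;; The third "fox" extends past position 15, so with
   ;; NOERROR t the search returns nil and point stays put.
   (search-forward "fox" 15 t)
   (point)
   ;; With NOERROR neither nil nor t, point moves to LIMIT.
   (search-forward "fox" 15 'move)
   (point)))
     @result{} (13 nil 13 nil 15)
@end group
@end example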
@end deffn

@deffn Command search-backward string &optional limit noerror repeat
This function searches backward from point for @var{string}. It is
just like @code{search-forward} except that it searches backwards and
leaves point at the beginning of the match.
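
For instance, in this sketch point starts at the end of a made-up
buffer, and the value returned is the position of the @samp{f} in
@samp{fox}. (@code{with-temp-buffer} is assumed to be available.)

@example
@group
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  ;; After the insertion, point is at the end of the buffer.
  (search-backward "fox"))
     @result{} 17
@end group
@end example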
@end deffn

@deffn Command word-search-forward string &optional limit noerror repeat
@cindex word search
This function searches forward from point for a ``word'' match for
@var{string}. If it finds a match, it sets point to the end of the
match found, and returns the new value of point.
@c Emacs 19 feature

Word matching regards @var{string} as a sequence of words, disregarding
punctuation that separates them. It searches the buffer for the same
sequence of words. Each word must be distinct in the buffer (searching
for the word @samp{ball} does not match the word @samp{balls}), but the
details of punctuation and spacing are ignored (searching for @samp{ball
boy} does match @samp{ball. Boy!}).

In this example, point is initially at the beginning of the buffer; the
search leaves it between the @samp{y} and the @samp{!}.

@example
@group
---------- Buffer: foo ----------
@point{}He said "Please! Find
the ball boy!"
---------- Buffer: foo ----------
@end group

@group
(word-search-forward "Please find the ball, boy.")
     @result{} 35

---------- Buffer: foo ----------
He said "Please! Find
the ball boy@point{}!"
---------- Buffer: foo ----------
@end group
@end example

If @var{limit} is non-@code{nil} (it must be a position in the current
buffer), then it is the upper bound to the search. The match found must
not extend after that position.

If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
an error if the search fails. If @var{noerror} is @code{t}, then it
returns @code{nil} instead of signaling an error. If @var{noerror} is
neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
end of the buffer) and returns @code{nil}.

If @var{repeat} is non-@code{nil}, then the search is repeated that many
times. Point is positioned at the end of the last match.
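
The next sketch contrasts a successful word search with a failing one,
passing @code{t} for @var{noerror} so that failure yields @code{nil}.
The buffer text is invented; @code{with-temp-buffer} is assumed to be
available, and @code{case-fold-search} is bound to @code{t} explicitly
so that @samp{boy} can match @samp{Boy}:

@example
@group
(with-temp-buffer
  (insert "Two balls. One ball. Boy!")
  (let ((case-fold-search t))
    (list
     ;; "ball boy" matches "ball. Boy"; "balls" is skipped.
     (progn (goto-char (point-min))
            (word-search-forward "ball boy" nil t))
     ;; No "balls" is followed by the word "boy".
     (progn (goto-char (point-min))
            (word-search-forward "balls boy" nil t)))))
     @result{} (25 nil)
@end group
@end example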
@end deffn

@deffn Command word-search-backward string &optional limit noerror repeat
This function searches backward from point for a word match to
@var{string}. This function is just like @code{word-search-forward}
except that it searches backward and normally leaves point at the
beginning of the match.
@end deffn

@node Regular Expressions
@section Regular Expressions
@cindex regular expression
@cindex regexp

A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
denotes a (possibly infinite) set of strings. Searching for matches for
a regexp is a very powerful operation. This section explains how to write
regexps; the following section says how to search for them.

@menu
* Syntax of Regexps::     Rules for writing regular expressions.
* Regexp Example::        Illustrates regular expression syntax.
@end menu

@node Syntax of Regexps
@subsection Syntax of Regular Expressions

Regular expressions have a syntax in which a few characters are
special constructs and the rest are @dfn{ordinary}. An ordinary
character is a simple regular expression that matches that character and
nothing else. The special characters are @samp{.}, @samp{*}, @samp{+},
@samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new
special characters will be defined in the future. Any other character
appearing in a regular expression is ordinary, unless a @samp{\}
precedes it.

For example, @samp{f} is not a special character, so it is ordinary, and
therefore @samp{f} is a regular expression that matches the string
@samp{f} and no other string. (It does @emph{not} match the string
@samp{ff}.) Likewise, @samp{o} is a regular expression that matches
only @samp{o}.@refill

Any two regular expressions @var{a} and @var{b} can be concatenated. The
result is a regular expression that matches a string if @var{a} matches
some amount of the beginning of that string and @var{b} matches the rest of
the string.@refill
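
One way to try such whole-string matches from Lisp is sketched below.
It relies on two things not covered in this passage:
@code{string-match}, which returns the index where a match starts or
@code{nil}, and the constructs @samp{\`} and @samp{\'}, which restrict
a match to the start and end of the string. Without them,
@code{string-match} would also find a match inside a longer string.

@example
@group
(list (string-match "\\`fo\\'" "fo")    ; the whole string is "fo"
      (string-match "\\`fo\\'" "foo")   ; one character too many
      (string-match "\\`f\\'" "ff"))    ; "f" does not match "ff"
     @result{} (0 nil nil)
@end group
@end example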

As a simple example, we can concatenate the regular expressions @samp{f}
and @samp{o} to get the regular expression @samp{fo}, which matches only
the string @samp{fo}. Still trivial. To do something more powerful, you
need to use one of the special characters. Here is a list of them:

@need 1200
@table @kbd
@item .@: @r{(Period)}
@cindex @samp{.} in regexp
is a special character that matches any single character except a newline.
Using concatenation, we can make regular expressions like @samp{a.b}, which
matches any three-character string that begins with @samp{a} and ends with
@samp{b}.@refill
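
A quick check of @samp{a.b} against a few made-up strings might look
like this. The sketch assumes @code{string-match} and the
@samp{\`} and @samp{\'} constructs (which confine the match to the whole
string), none of which this passage describes:

@example
@group
(mapcar (lambda (s)
          ;; t if the whole of S matches a.b, else nil.
          (and (string-match "\\`a.b\\'" s) t))
        '("axb" "a-b" "ab" "a\nb"))
     @result{} (t t nil nil)
@end group
@end example

The last string fails because its middle character is a newline.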

@item *
@cindex @samp{*} in regexp
is not a construct by itself; it is a suffix operator that means to
repeat the preceding regular expression as many times as possible. In
@samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches
one @samp{f} followed by any number of @samp{o}s. The case of zero
@samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill

@samp{*} always applies to the @emph{smallest} possible preceding
expression. Thus, @samp{fo*} has a repeating @samp{o}, not a
repeating @samp{fo}.@refill

The matcher processes a @samp{*} construct by matching, immediately,
as many repetitions as can be found. Then it continues with the rest
of the pattern. If that fails, backtracking occurs, discarding some
of the matches of the @samp{*}-modified construct in case that makes
it possible to match the rest of the pattern. For example, in matching
@samp{ca*ar} against the string @samp{caaar}, the @samp{a*} first
tries to match all three @samp{a}s; but the rest of the pattern is
@samp{ar} and there is only @samp{r} left to match, so this try fails.
The next alternative is for @samp{a*} to match only two @samp{a}s.
With this choice, the rest of the regexp matches successfully.@refill
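
To see which text a @samp{*} construct ends up matching, one can
inspect the match afterward. In this sketch,
@code{my-matched-text} is a hypothetical helper, not part of Emacs; it
is built on @code{string-match} and @code{match-string}, which are
assumed to be available and are not described in this passage:

@example
@group
(defun my-matched-text (regexp string)
  "Return the part of STRING matched by REGEXP, or nil."
  (and (string-match regexp string)
       (match-string 0 string)))
@end group

@group
(list (my-matched-text "fo*" "fooob")    ; as many o's as possible
      (my-matched-text "fo*" "fb")       ; zero o's is allowed
      (my-matched-text "fo*" "fofo")     ; only the o repeats
      (my-matched-text "ca*ar" "caaar")) ; succeeds by backtracking
     @result{} ("fooo" "f" "fo" "caaar")
@end group
@end example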

@item +
@cindex @samp{+} in regexp
is a suffix operator similar to @samp{*} except that the preceding
expression must match at least once. So, for example, @samp{ca+r}
matches the strings @samp{car} and @samp{caaaar} but not the string
@samp{cr}, whereas @samp{ca*r} matches all three strings.

@item ?
@cindex @samp{?} in regexp
is a suffix operator similar to @samp{*} except that the preceding
expression can match either once or not at all. For example,
@samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
else.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	241
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	242 @item [ @dots{} ]
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	243 @cindex character set (in regexp)
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	244 @cindex @samp{[} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	245 @cindex @samp{]} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	246 @samp{[} begins a @dfn{character set}, which is terminated by a
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	247 @samp{]}. In the simplest case, the characters between the two brackets
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	248 form the set. Thus, @samp{[ad]} matches either one @samp{a} or one
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	249 @samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	250 and @samp{d}s (including the empty string), from which it follows that
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	251 @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	252 @samp{caddaar}, etc.@refill
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	253
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	254 The usual regular expression special characters are not special inside a
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	255 character set. A completely different set of special characters exists
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	256 inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	257
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	258 @samp{-} is used for ranges of characters. To write a range, write two
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	259 characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	260 lower case letter. Ranges may be intermixed freely with individual
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	261 characters, as in @samp{[a-z$%.]}, which matches any lower case letter
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	262 or @samp{$}, @samp{%}, or a period.@refill
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	263
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	264 To include a @samp{]} in a character set, make it the first character.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	265 For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include a
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	266 @samp{-}, write @samp{-} as the first character in the set, or put it
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	267 immediately after a range. (You can replace one individual character
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	268 @var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	269 @samp{-}.) There is no way to write a set containing just @samp{-} and
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	270 @samp{]}.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	271
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	272 To include @samp{^} in a set, put it anywhere but at the beginning of
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	273 the set.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	274
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	275 @item [^ @dots{} ]
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	276 @cindex @samp{^} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	277 @samp{[^} begins a @dfn{complement character set}, which matches any
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	278 character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	279 matches all characters @emph{except} letters and digits.@refill
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	280
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	281 @samp{^} is not special in a character set unless it is the first
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	282 character. The character following the @samp{^} is treated as if it
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	283 were first (thus, @samp{-} and @samp{]} are not special there).
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	284
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	285 Note that a complement character set can match a newline, unless
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	286 newline is mentioned as one of the characters not to match.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	287
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	288 @item ^
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	289 @cindex @samp{^} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	290 @cindex beginning of line in regexp
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	291 is a special character that matches the empty string, but only at the
bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	292 beginning of a line in the text being matched. Otherwise it fails to
bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	293 match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at
bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	294 the beginning of a line.
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	295
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	296 When matching a string instead of a buffer, @samp{^} matches at the
bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	297 beginning of the string or after a newline character @samp{\n}.
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	298
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	299 @item $
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	300 @cindex @samp{$} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	301 is similar to @samp{^} but matches only at the end of a line. Thus,
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	302 @samp{x+$} matches a string of one @samp{x} or more at the end of a line.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	303
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	304 When matching a string instead of a buffer, @samp{$} matches at the end
bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	305 of the string or before a newline character @samp{\n}.
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	306
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	307 @item \
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	308 @cindex @samp{\} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	309 has two functions: it quotes the special characters (including
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	310 @samp{\}), and it introduces additional special constructs.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	311
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	312 Because @samp{\} quotes special characters, @samp{\$} is a regular
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	313 expression that matches only @samp{$}, and @samp{\[} is a regular
bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	314 expression that matches only @samp{[}, and so on.
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	315
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	316 Note that @samp{\} also has special meaning in the read syntax of Lisp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	317 strings (@pxref{String Type}), and must be quoted with @samp{\}. For
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	318 example, the regular expression that matches the @samp{\} character is
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	319 @samp{\\}. To write a Lisp string that contains the characters
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	320 @samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	321 @samp{\}. Therefore, the read syntax for a regular expression matching
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	322 @samp{\} is @code{"\\\\"}.@refill
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	323 @end table
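
The special characters in the table above can be tried out with
@code{string-match}, which returns the index at which a match starts, or
@code{nil} if there is no match.  The calls below are only an
illustrative sketch: the sample strings are invented for this example,
and @samp{\`} and @samp{\'} (described later in this section) are used
to force the regexp to match the whole string.  Each regexp backslash is
doubled because of Lisp string read syntax.

@example
@group
(string-match "\\`ca+r\\'" "caaaar")
     @result{} 0
(string-match "\\`ca+r\\'" "cr")
     @result{} nil
(string-match "\\`ca?r\\'" "cr")
     @result{} 0
(string-match "\\`ca?r\\'" "caar")
     @result{} nil
(string-match "\\`c[ad]*r\\'" "caddaar")
     @result{} 0
@end group

@group
(string-match "[]a]" "x]")
     @result{} 1
(string-match "[^a-z0-9A-Z]" "abc def")
     @result{} 3
(string-match "^foo" "bar\nfoo")
     @result{} 4
(string-match "x+$" "axx\nb")
     @result{} 1
@end group
@end example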
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	324
7735 7db892210924 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7086 diff changeset	325 @strong{Please note:} For historical compatibility, special characters
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	326 are treated as ordinary ones if they are in contexts where their special
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	327 meanings make no sense.  For example, @samp{*foo} treats @samp{*} as
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	328 ordinary since there is no preceding expression on which the @samp{*}
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	329 can act. It is poor practice to depend on this behavior; quote the
bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	330 special character anyway, regardless of where it appears.@refill
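
As a sketch of this advice, the following calls quote each special
character explicitly rather than relying on context.  The sample strings
are made up for illustration; note that every regexp backslash is
written twice in Lisp read syntax.

@example
@group
(string-match "\\*foo" "a*foo")
     @result{} 1
(string-match "\\$" "cost: $5")
     @result{} 6
(string-match "\\[" "a[1]")
     @result{} 1
;; @r{The Lisp string "a\\b" holds the three characters a, \, b.}
(string-match "\\\\" "a\\b")
     @result{} 1
@end group
@end example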
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	331
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	332 For the most part, @samp{\} followed by any character matches only
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	333 that character. However, there are several exceptions: characters
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	334 that, when preceded by @samp{\}, are special constructs. Such
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	335 characters are always ordinary when encountered on their own. Here
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	336 is a table of @samp{\} constructs:
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	337
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	338 @table @kbd
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	339 @item \|
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	340 @cindex @samp{\|} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	341 @cindex regexp alternative
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	342 specifies an alternative.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	343 Two regular expressions @var{a} and @var{b} with @samp{\|} in
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	344 between form an expression that matches anything that either @var{a} or
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	345 @var{b} matches.@refill
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	346
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	347 Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	348 but no other string.@refill
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	349
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	350 @samp{\|} applies to the largest possible surrounding expressions.  Only a
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	351 surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	352 @samp{\|}.@refill
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	353
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	354 Full backtracking capability exists to handle multiple uses of @samp{\|}.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	355
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	356 @item \( @dots{} \)
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	357 @cindex @samp{(} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	358 @cindex @samp{)} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	359 @cindex regexp grouping
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	360 is a grouping construct that serves three purposes:
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	361
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	362 @enumerate
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	363 @item
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	364 To enclose a set of @samp{\|} alternatives for other operations.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	365 Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	366
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	367 @item
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	368 To enclose an expression for a suffix operator such as @samp{*} to act
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	369 on. Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	370 (zero or more) number of @samp{na} strings.@refill
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	371
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	372 @item
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	373 To record a matched substring for future reference.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	374 @end enumerate
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	375
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	376 This last application is not a consequence of the idea of a
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	377 parenthetical grouping; it is a separate feature that happens to be
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	378 assigned as a second meaning to the same @samp{\( @dots{} \)} construct
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	379 because there is no conflict in practice between the two meanings.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	380 Here is an explanation of this feature:
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	381
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	382 @item \@var{digit}
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	383 matches the same text that matched the @var{digit}th occurrence of a
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	384 @samp{\( @dots{} \)} construct.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	385
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	386 In other words, after the end of a @samp{\( @dots{} \)} construct, the
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	387 matcher remembers the beginning and end of the text matched by that
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	388 construct. Then, later on in the regular expression, you can use
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	389 @samp{\} followed by @var{digit} to match that same text, whatever it
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	390 may have been.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	391
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	392 The strings matching the first nine @samp{\( @dots{} \)} constructs
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	393 appearing in a regular expression are assigned numbers 1 through 9 in
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	394 the order that the open parentheses appear in the regular expression.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	395 So you can use @samp{\1} through @samp{\9} to refer to the text matched
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	396 by the corresponding @samp{\( @dots{} \)} constructs.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	397
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	398 For example, @samp{\(.*\)\1} matches any newline-free string that is
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	399 composed of two identical halves. The @samp{\(.*\)} matches the first
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	400 half, which may be anything, but the @samp{\1} that follows must match
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	401 exactly the same text.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	402
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	403 @item \w
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	404 @cindex @samp{\w} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	405 matches any word-constituent character. The editor syntax table
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	406 determines which characters these are. @xref{Syntax Tables}.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	407
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	408 @item \W
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	409 @cindex @samp{\W} in regexp
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	410 matches any character that is not a word constituent.
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	411
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	412 @item \s@var{code}
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	413 @cindex @samp{\s} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	414 matches any character whose syntax is @var{code}. Here @var{code} is a
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	415 character that represents a syntax code: thus, @samp{w} for word
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	416 constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	417 etc. @xref{Syntax Tables}, for a list of syntax codes and the
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	418 characters that stand for them.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	419
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	420 @item \S@var{code}
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	421 @cindex @samp{\S} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	422 matches any character whose syntax is not @var{code}.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	423 @end table
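
The backslash constructs in the table above might be exercised as
follows.  This is a sketch rather than a definitive reference: the
sample strings are invented, and the results for @samp{\w}, @samp{\W}
and @samp{\s-} assume the standard syntax table, in which letters are
word constituents and space is whitespace.

@example
@group
(string-match "foo\\|bar" "a bar")
     @result{} 2
(string-match "\\`\\(foo\\|bar\\)x\\'" "barx")
     @result{} 0
(string-match "\\`ba\\(na\\)*\\'" "bananana")
     @result{} 0
@end group

@group
;; @r{Two identical halves: group 1 ends after "abc".}
(string-match "\\`\\(.*\\)\\1\\'" "abcabc")
     @result{} 0
(match-end 1)
     @result{} 3
(string-match "\\`\\(.*\\)\\1\\'" "abcab")
     @result{} nil
@end group

@group
(string-match "\\w+" "  hello")
     @result{} 2
(string-match "\\W" "ab,cd")
     @result{} 2
(string-match "\\s-" "ab cd")
     @result{} 2
@end group
@end example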
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	424
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	425 The following regular expression constructs match the empty string---that is,
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	426 they don't use up any characters---but whether they match depends on the
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	427 context.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	428
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	429 @table @kbd
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	430 @item \`
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	431 @cindex @samp{\`} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	432 matches the empty string, but only at the beginning
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	433 of the buffer or string being matched against.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	434
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	435 @item \'
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	436 @cindex @samp{\'} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	437 matches the empty string, but only at the end of
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	438 the buffer or string being matched against.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	439
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	440 @item \=
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	441 @cindex @samp{\=} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	442 matches the empty string, but only at point.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	443 (This construct is not defined when matching against a string.)
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	444
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	445 @item \b
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	446 @cindex @samp{\b} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	447 matches the empty string, but only at the beginning or
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	448 end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	449 @samp{foo} as a separate word. @samp{\bballs?\b} matches
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	450 @samp{ball} or @samp{balls} as a separate word.@refill
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	451
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	452 @item \B
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	453 @cindex @samp{\B} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	454 matches the empty string, but @emph{not} at the beginning or
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	455 end of a word.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	456
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	457 @item \<
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	458 @cindex @samp{\<} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	459 matches the empty string, but only at the beginning of a word.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	460
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	461 @item \>
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	462 @cindex @samp{\>} in regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	463 matches the empty string, but only at the end of a word.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	464 @end table
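
A few of these empty-string constructs are sketched below with
@code{string-match}.  The sample strings are made up for illustration,
and the word-boundary results assume the standard syntax table.
@samp{\=} is omitted because it is not defined when matching against a
string.

@example
@group
;; @r{@samp{\`} and @samp{\'} ignore the embedded newline.}
(string-match "\\`foo" "foo\nfoo")
     @result{} 0
(string-match "foo\\'" "foo\nfoo")
     @result{} 4
@end group

@group
(string-match "\\bfoo\\b" "food foo")
     @result{} 5
(string-match "\\bballs?\\b" "a balls b")
     @result{} 2
(string-match "\\Boo" "foo")
     @result{} 1
(string-match "\\<bar" "foobar bar")
     @result{} 7
(string-match "foo\\>" "foobar foo")
     @result{} 7
@end group
@end example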
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	465
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	466 @kindex invalid-regexp
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	467 Not every string is a valid regular expression. For example, a string
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	468 with unbalanced square brackets is invalid (with a few exceptions, such
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	469 as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	470 an invalid regular expression is passed to any of the search functions,
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	471 an @code{invalid-regexp} error is signaled.
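
One possible way to guard against an invalid regular expression is to
wrap the call in @code{condition-case}.  The following sketch uses
@code{string-match} with made-up arguments; the symbol @code{caught} is
just a placeholder return value chosen for this example.

@example
@group
;; @r{Unbalanced square bracket.}
(condition-case err
    (string-match "[a-z" "abc")
  (invalid-regexp (car err)))
     @result{} invalid-regexp
@end group

@group
;; @r{The regexp ends with a single @samp{\}.}
(condition-case nil
    (string-match "foo\\" "foo")
  (invalid-regexp 'caught))
     @result{} caught
@end group

@group
;; @r{@samp{[]]} is valid: a set containing only @samp{]}.}
(string-match "[]]" "a]")
     @result{} 1
@end group
@end example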
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	472
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	473 @defun regexp-quote string
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	474 This function returns a regular expression string that matches exactly
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	475 @var{string} and nothing else. This allows you to request an exact
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	476 string match when calling a function that wants a regular expression.
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	477
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	478 @example
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	479 @group
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	480 (regexp-quote "^The cat$")
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	481 @result{} "\\^The cat\\$"
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	482 @end group
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	483 @end example
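
The effect of the quoting can be seen by matching the result against
sample strings.  In the following sketch (the strings are made up for
illustration), the period in @code{"a.c"} is quoted, so it matches only
a literal period and not an arbitrary character:

@example
@group
;; The quoted period does not match the "b" in "abc".
(string-match (regexp-quote "a.c") "abc")
     @result{} nil
;; It does match the literal text "a.c", here at index 1.
(string-match (regexp-quote "a.c") "xa.c")
     @result{} 1
@end group
@end example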
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	484
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	485 One use of @code{regexp-quote} is to combine an exact string match with
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	486 context described as a regular expression. For example, this searches
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	487 for the string that is the value of @code{string}, surrounded by
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	488 whitespace:
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	489
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	490 @example
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	491 @group
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	492 (re-search-forward
8427 bc548090f760 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 7735 diff changeset	493 (concat "\\s-" (regexp-quote string) "\\s-"))
6552 3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	494 @end group
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	495 @end example
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	496 @end defun
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	497
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	498 @node Regexp Example
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	499 @comment node-name, next, previous, up
3b84ed22f747 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	500 @subsection Complex Regexp Example

6552

3b84ed22f747 Initial revision