comparison lispref/searching.texi @ 22252:40089afa2b1d

*** empty log message ***
author Richard M. Stallman <rms@gnu.org>
date Tue, 26 May 1998 18:56:56 +0000
parents d4ac295a98b3
children f0cd03a7dac9
22251:5989fa41cda6 22252:40089afa2b1d
232 Nested repetition operators can be extremely slow if they specify 232 Nested repetition operators can be extremely slow if they specify
233 backtracking loops. For example, it could take hours for the regular 233 backtracking loops. For example, it could take hours for the regular
234 expression @samp{\(x+y*\)*a} to try to match the sequence 234 expression @samp{\(x+y*\)*a} to try to match the sequence
235 @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}, before it ultimately fails. 235 @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}, before it ultimately fails.
236 The slowness is because Emacs must try each imaginable way of grouping 236 The slowness is because Emacs must try each imaginable way of grouping
237 the 35 @samp{x}'s before concluding that none of them can work. To make 237 the 35 @samp{x}s before concluding that none of them can work. To make
238 sure your regular expressions run fast, check nested repetitions 238 sure your regular expressions run fast, check nested repetitions
239 carefully. 239 carefully.
240 240
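Reviewer sketch (not part of either revision): the backtracking warning above can be tried on a small scale in Emacs Lisp. The names `demo-nested-regexp`, `demo-flat-regexp` and `demo-match-span` are hypothetical. The rewritten regexp is an assumption of this note: it appears to match the same text as the original, although group 1 records something different. The run of x's is kept short for the nested form so the failing match stays quick.

```elisp
;; Hypothetical names; only the built-in functions are standard.
(defconst demo-nested-regexp "\\(x+y*\\)*a"
  "The nested-repetition regexp quoted in the text.")

(defconst demo-flat-regexp "\\(x[xy]*\\)?a"
  "Assumed rewrite with no repetition nested inside another repetition.")

(defun demo-match-span (regexp string)
  "Return (START . END) of the first match of REGEXP in STRING, or nil."
  (let ((case-fold-search nil))
    (when (string-match regexp string)
      (cons (match-beginning 0) (match-end 0)))))

;; Neither subject contains an `a', so both searches must fail.
;; Each extra x can multiply the work done by the nested form.
(demo-match-span demo-nested-regexp (concat (make-string 12 ?x) "z"))
(demo-match-span demo-flat-regexp (concat (make-string 35 ?x) "z"))

;; On a subject that does match, both forms report the same span.
(demo-match-span demo-nested-regexp "xxyxa")
(demo-match-span demo-flat-regexp "xxyxa")
```

Timing the first call with progressively longer runs of x's is one way to see how quickly the cost grows on a given Emacs build.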
241 @item @samp{+} 241 @item @samp{+}
242 @cindex @samp{+} in regexp 242 @cindex @samp{+} in regexp
264 (including the empty string), from which it follows that @samp{c[ad]*r} 264 (including the empty string), from which it follows that @samp{c[ad]*r}
265 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. 265 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
266 266
267 You can also include character ranges in a character alternative, by 267 You can also include character ranges in a character alternative, by
268 writing the starting and ending characters with a @samp{-} between them. 268 writing the starting and ending characters with a @samp{-} between them.
269 Thus, @samp{[a-z]} matches any lower-case ASCII letter. Ranges may be 269 Thus, @samp{[a-z]} matches any lower-case @sc{ASCII} letter. Ranges may be
270 intermixed freely with individual characters, as in @samp{[a-z$%.]}, 270 intermixed freely with individual characters, as in @samp{[a-z$%.]},
271 which matches any lower case ASCII letter or @samp{$}, @samp{%} or 271 which matches any lower case @sc{ASCII} letter or @samp{$}, @samp{%} or
272 period. 272 period.
273 273
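Reviewer sketch (not part of either revision): a few calls that exercise the character alternatives described above. The function name `demo-range-examples` is hypothetical. `case-fold-search` is bound to nil because, with case folding on, `[a-z]` would also match upper-case letters.

```elisp
(defun demo-range-examples ()
  "Return match positions for the character alternatives in the text."
  (let ((case-fold-search nil))
    (list
     ;; `%' is the first character that is in [a-z$%.].
     (string-match "[a-z$%.]" "ABC%")
     ;; No lower-case letter, `$', `%' or period: no match.
     (string-match "[a-z$%.]" "ABC")
     ;; c[ad]*r, anchored to the whole string.
     (string-match "\\`c[ad]*r\\'" "caddaar")
     (string-match "\\`c[ad]*r\\'" "cr"))))

(demo-range-examples)
```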
274 You cannot always match all non-@sc{ASCII} characters with the regular 274 You cannot always match all non-@sc{ASCII} characters with the regular
275 expression @samp{[\200-\377]}. This works when searching a unibyte 275 expression @samp{[\200-\377]}. This works when searching a unibyte
276 buffer or string (@pxref{Text Representations}), but not in a multibyte 276 buffer or string (@pxref{Text Representations}), but not in a multibyte
278 above octal 0377. However, the regular expression @samp{[^\000-\177]} 278 above octal 0377. However, the regular expression @samp{[^\000-\177]}
279 does match all non-@sc{ASCII} characters, in both multibyte and unibyte 279 does match all non-@sc{ASCII} characters, in both multibyte and unibyte
280 representations, because only the @sc{ASCII} characters are excluded. 280 representations, because only the @sc{ASCII} characters are excluded.
281 281
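Reviewer sketch (not part of either revision): the complemented range from the paragraph above, applied to a multibyte string. The function name `demo-first-non-ascii` is hypothetical, and the sample strings are made up for illustration.

```elisp
(defun demo-first-non-ascii (string)
  "Return the index of the first non-ASCII character in STRING, or nil."
  ;; Only the ASCII characters (octal 000 through 177) are excluded.
  (string-match "[^\000-\177]" string))

(demo-first-non-ascii "abc\u00e9")   ; e with acute accent at index 3
(demo-first-non-ascii "abc")         ; pure ASCII
```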
282 The beginning and end of a range must be in the same character set 282 The beginning and end of a range must be in the same character set
283 (@pxref{Character Sets}). Thus, @samp{[a-\x8c0]} is invalid because 283 (@pxref{Character Sets}). Thus, @samp{[a-\x8e0]} is invalid because
284 @samp{a} is in the @sc{ASCII} character set but the character 0x8c0 284 @samp{a} is in the @sc{ASCII} character set but the character 0x8e0
285 (@samp{A} with grave accent) is in the Emacs character set for Latin-1. 285 (@samp{a} with grave accent) is in the Emacs character set for Latin-1.
286 286
287 Note that the usual regexp special characters are not special inside a 287 Note that the usual regexp special characters are not special inside a
288 character alternative. A completely different set of characters are 288 character alternative. A completely different set of characters are
289 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}. 289 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
290 290
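Reviewer sketch (not part of either revision): inside a character alternative, characters such as `.`, `*`, `+` and `$` stand for themselves. The function name `demo-literal-specials` is hypothetical.

```elisp
(defun demo-literal-specials ()
  "Return match positions showing that regexp specials are literal in [...]."
  (list
   ;; `+' is matched literally at index 1.
   (string-match "[.*+$]" "a+b")
   ;; `.' inside brackets does not mean any character, so nothing matches.
   (string-match "[.*+$]" "abc")
   ;; Outside brackets, `.' matches the first character.
   (string-match "." "abc")))

(demo-literal-specials)
```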
1284 @end example 1284 @end example
1285 1285
1286 You can save and restore the match data with @code{save-match-data}: 1286 You can save and restore the match data with @code{save-match-data}:
1287 1287
1288 @defmac save-match-data body@dots{} 1288 @defmac save-match-data body@dots{}
1289 This special form executes @var{body}, saving and restoring the match 1289 This macro executes @var{body}, saving and restoring the match
1290 data around it. 1290 data around it.
1291 @end defmac 1291 @end defmac
1292 1292
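Reviewer sketch (not part of either revision): a small check that the match data from an outer search is still in place after a nested search wrapped in `save-match-data`. The function name `demo-outer-match-survives` and the sample strings are hypothetical.

```elisp
(defun demo-outer-match-survives ()
  "Return the outer match bounds after a nested search in `save-match-data'."
  ;; The outer search matches "bbb", so the match data is 2..5.
  (string-match "b+" "aabbbcc")
  ;; The nested search would otherwise leave the match data at 2..3.
  (save-match-data
    (string-match "c" "abc"))
  (list (match-beginning 0) (match-end 0)))

(demo-outer-match-survives)
```

Without the `save-match-data` wrapper, the same function would report the bounds of the nested match.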
1293 You could use @code{set-match-data} together with @code{match-data} to 1293 You could use @code{set-match-data} together with @code{match-data} to
1294 imitate the effect of the special form @code{save-match-data}. Here is 1294 imitate the effect of the special form @code{save-match-data}. Here is