comparison lispref/searching.texi @ 22252:40089afa2b1d
*** empty log message ***
author | Richard M. Stallman <rms@gnu.org> |
---|---|
date | Tue, 26 May 1998 18:56:56 +0000 |
parents | d4ac295a98b3 |
children | f0cd03a7dac9 |
22251:5989fa41cda6 | 22252:40089afa2b1d |
---|---|
232 Nested repetition operators can be extremely slow if they specify | 232 Nested repetition operators can be extremely slow if they specify |
233 backtracking loops. For example, it could take hours for the regular | 233 backtracking loops. For example, it could take hours for the regular |
234 expression @samp{\(x+y*\)*a} to try to match the sequence | 234 expression @samp{\(x+y*\)*a} to try to match the sequence |
235 @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}, before it ultimately fails. | 235 @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}, before it ultimately fails. |
236 The slowness is because Emacs must try each imaginable way of grouping | 236 The slowness is because Emacs must try each imaginable way of grouping |
237 the 35 @samp{x}'s before concluding that none of them can work. To make | 237 the 35 @samp{x}s before concluding that none of them can work. To make |
238 sure your regular expressions run fast, check nested repetitions | 238 sure your regular expressions run fast, check nested repetitions |
239 carefully. | 239 carefully. |
240 | 240 |
241 @item @samp{+} | 241 @item @samp{+} |
242 @cindex @samp{+} in regexp | 242 @cindex @samp{+} in regexp |
264 (including the empty string), from which it follows that @samp{c[ad]*r} | 264 (including the empty string), from which it follows that @samp{c[ad]*r} |
265 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. | 265 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. |
266 | 266 |
267 You can also include character ranges in a character alternative, by | 267 You can also include character ranges in a character alternative, by |
268 writing the starting and ending characters with a @samp{-} between them. | 268 writing the starting and ending characters with a @samp{-} between them. |
269 Thus, @samp{[a-z]} matches any lower-case ASCII letter. Ranges may be | 269 Thus, @samp{[a-z]} matches any lower-case @sc{ASCII} letter. Ranges may be |
270 intermixed freely with individual characters, as in @samp{[a-z$%.]}, | 270 intermixed freely with individual characters, as in @samp{[a-z$%.]}, |
271 which matches any lower case ASCII letter or @samp{$}, @samp{%} or | 271 which matches any lower case @sc{ASCII} letter or @samp{$}, @samp{%} or |
272 period. | 272 period. |
273 | 273 |
274 You cannot always match all non-@sc{ASCII} characters with the regular | 274 You cannot always match all non-@sc{ASCII} characters with the regular |
275 expression @samp{[\200-\377]}. This works when searching a unibyte | 275 expression @samp{[\200-\377]}. This works when searching a unibyte |
276 buffer or string (@pxref{Text Representations}), but not in a multibyte | 276 buffer or string (@pxref{Text Representations}), but not in a multibyte |
278 above octal 0377. However, the regular expression @samp{[^\000-\177]} | 278 above octal 0377. However, the regular expression @samp{[^\000-\177]} |
279 does match all non-@sc{ASCII} characters, in both multibyte and unibyte | 279 does match all non-@sc{ASCII} characters, in both multibyte and unibyte |
280 representations, because only the @sc{ASCII} characters are excluded. | 280 representations, because only the @sc{ASCII} characters are excluded. |
281 | 281 |
282 The beginning and end of a range must be in the same character set | 282 The beginning and end of a range must be in the same character set |
283 (@pxref{Character Sets}). Thus, @samp{[a-\x8c0]} is invalid because | 283 (@pxref{Character Sets}). Thus, @samp{[a-\x8e0]} is invalid because |
284 @samp{a} is in the @sc{ASCII} character set but the character 0x8c0 | 284 @samp{a} is in the @sc{ASCII} character set but the character 0x8e0 |
285 (@samp{A} with grave accent) is in the Emacs character set for Latin-1. | 285 (@samp{a} with grave accent) is in the Emacs character set for Latin-1. |
286 | 286 |
287 Note that the usual regexp special characters are not special inside a | 287 Note that the usual regexp special characters are not special inside a |
288 character alternative. A completely different set of characters are | 288 character alternative. A completely different set of characters are |
289 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}. | 289 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}. |
290 | 290 |
1284 @end example | 1284 @end example |
1285 | 1285 |
1286 You can save and restore the match data with @code{save-match-data}: | 1286 You can save and restore the match data with @code{save-match-data}: |
1287 | 1287 |
1288 @defmac save-match-data body@dots{} | 1288 @defmac save-match-data body@dots{} |
1289 This special form executes @var{body}, saving and restoring the match | 1289 This macro executes @var{body}, saving and restoring the match |
1290 data around it. | 1290 data around it. |
1291 @end defmac | 1291 @end defmac |
1292 | 1292 |
1293 You could use @code{set-match-data} together with @code{match-data} to | 1293 You could use @code{set-match-data} together with @code{match-data} to |
1294 imitate the effect of the special form @code{save-match-data}. Here is | 1294 imitate the effect of the special form @code{save-match-data}. Here is |