Mercurial > emacs
changeset 108931:b3a058f166f0
Better doc fix for Bug#6283.
searching.texi (Regexp Special): Remove obsolete information
about matching non-ASCII characters, and suggest using char
classes (Bug#6283).
author | Chong Yidong <cyd@stupidchicken.com> |
---|---|
date | Wed, 02 Jun 2010 13:26:31 -0400 |
parents | 964c7b675743 |
children | 21c602686f38 |
files | doc/lispref/ChangeLog doc/lispref/searching.texi |
diffstat | 2 files changed, 13 insertions(+), 18 deletions(-) |
--- a/doc/lispref/ChangeLog	Wed Jun 02 13:14:01 2010 -0400
+++ b/doc/lispref/ChangeLog	Wed Jun 02 13:26:31 2010 -0400
@@ -1,7 +1,8 @@
 2010-06-02  Chong Yidong  <cyd@stupidchicken.com>
 
-	* searching.texi (Regexp Special): Replace "octal 377"
-	with "#o377" (Bug#6283).
+	* searching.texi (Regexp Special): Remove obsolete information
+	about matching non-ASCII characters, and suggest using char
+	classes (Bug#6283).
 
 2010-05-30  Juanma Barranquero  <lekktu@gmail.com>
--- a/doc/lispref/searching.texi	Wed Jun 02 13:14:01 2010 -0400
+++ b/doc/lispref/searching.texi	Wed Jun 02 13:26:31 2010 -0400
@@ -362,7 +362,7 @@
 Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
 @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
-(including the empty string), from which it follows that @samp{c[ad]*r}
+(including the empty string).  It follows that @samp{c[ad]*r}
 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
 
 You can also include character ranges in a character alternative, by
@@ -400,21 +400,11 @@
 @var{c1} is the first character of the charset to which @var{c2}
 belongs.
 
-You cannot always match all non-@acronym{ASCII} characters with the
-regular expression @code{"[\200-\377]"}.  This works when searching a
-unibyte buffer or string (@pxref{Text Representations}), but not in a
-multibyte buffer or string, because many non-@acronym{ASCII}
-characters have codes above @code{#o377}.  However, the regular
-expression @code{"[^\000-\177]"} does match all non-@acronym{ASCII}
-characters (see below regarding @samp{^}), in both multibyte and
-unibyte representations, because only the @acronym{ASCII} characters
-are excluded.
-
-A character alternative can also specify named
-character classes (@pxref{Char Classes}).  This is a POSIX feature whose
-syntax is @samp{[:@var{class}:]}.  Using a character class is equivalent
-to mentioning each of the characters in that class; but the latter is
-not feasible in practice, since some classes include thousands of
+A character alternative can also specify named character classes
+(@pxref{Char Classes}).  This is a POSIX feature whose syntax is
+@samp{[:@var{class}:]}.  Using a character class is equivalent to
+mentioning each of the characters in that class; but the latter is not
+feasible in practice, since some classes include thousands of
 different characters.
 
 @item @samp{[^ @dots{} ]}
@@ -432,6 +422,10 @@
 mentioned as one of the characters not to match.  This is in contrast to
 the handling of regexps in programs such as @code{grep}.
 
+You can specify named character classes, just like in character
+alternatives.  For instance, @samp{[^[:ascii:]]} matches any
+non-@acronym{ASCII} character.  @xref{Char Classes}.
+
 @item @samp{^}
 @cindex beginning of line in regexp
 When matching a buffer, @samp{^} matches the empty string, but only at the
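The advice this changeset adds (use a named class such as @samp{[^[:ascii:]]} instead of byte ranges like @code{"[\200-\377]"}) can be tried directly in Emacs Lisp. The following is a minimal sketch, not code from the Emacs sources. `first-non-ascii` is a hypothetical helper name introduced only for illustration, and the sketch assumes the file is read as UTF-8, so that the literals below are multibyte strings.

```elisp
;; Hypothetical helper (illustration only): index of the first
;; non-ASCII character in STRING, using a named class inside a
;; complemented character alternative.
(defun first-non-ascii (string)
  "Return the index of the first non-ASCII character in STRING, or nil."
  (string-match "[^[:ascii:]]" string))

(first-non-ascii "café")   ; => 3  (the "é")
(first-non-ascii "plain")  ; => nil

;; Named classes also work inside an ordinary character alternative,
;; and several classes may be combined: this matches a run of
;; letters or digits.
(string-match "[[:alpha:][:digit:]]+" "  abc123 ")  ; => 2

;; The same regexp works in a (multibyte) buffer; buffer positions
;; start at 1, so the match begins at position 3.
(with-temp-buffer
  (insert "naïve text")
  (goto-char (point-min))
  (when (re-search-forward "[^[:ascii:]]" nil t)
    (match-beginning 0)))  ; => 3
```

Because the class is resolved per character rather than per byte, the same regexp behaves identically on unibyte and multibyte text. That is why the changeset drops the old range-based explanation.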