comparison lispref/searching.texi @ 57403:5353c1a56ee3

(Regexp Example): Update description of how Emacs currently recognizes the end of a sentence. (Standard Regexps): Update definition of the variable `sentence-end'. Add definition of the function `sentence-end'.
author Luc Teirlinck <teirllm@auburn.edu>
date Sat, 09 Oct 2004 18:35:38 +0000
parents a325c378e9bb
children 30f22485a11e ff0e824afa37
57402:c50e857202e2 57403:5353c1a56ee3
1 @c -*-texinfo-*- 1 @c -*-texinfo-*-
2 @c This is part of the GNU Emacs Lisp Reference Manual. 2 @c This is part of the GNU Emacs Lisp Reference Manual.
3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999 3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2004
4 @c Free Software Foundation, Inc. 4 @c Free Software Foundation, Inc.
5 @c See the file elisp.texi for copying conditions. 5 @c See the file elisp.texi for copying conditions.
6 @setfilename ../info/searching 6 @setfilename ../info/searching
7 @node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top 7 @node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top
8 @chapter Searching and Matching 8 @chapter Searching and Matching
692 @comment node-name, next, previous, up 692 @comment node-name, next, previous, up
693 @subsection Complex Regexp Example 693 @subsection Complex Regexp Example
694 694
695 Here is a complicated regexp which was formerly used by Emacs to 695 Here is a complicated regexp which was formerly used by Emacs to
696 recognize the end of a sentence together with any whitespace that 696 recognize the end of a sentence together with any whitespace that
697 follows. It was used as the variable @code{sentence-end}. (Its value 697 follows. (Nowadays Emacs uses a similar but more complex default
698 nowadays contains alternatives for @samp{.}, @samp{?} and @samp{!} in 698 regexp constructed by the function @code{sentence-end}.
699 other character sets.) 699 @xref{Standard Regexps}.)
700 700
701 First, we show the regexp as a string in Lisp syntax to distinguish 701 First, we show the regexp as a string in Lisp syntax to distinguish
702 spaces from tab characters. The string constant begins and ends with a 702 spaces from tab characters. The string constant begins and ends with a
703 double-quote. @samp{\"} stands for a double-quote as part of the 703 double-quote. @samp{\"} stands for a double-quote as part of the
704 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a 704 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
728 @table @code 728 @table @code
729 @item [.?!] 729 @item [.?!]
730 The first part of the pattern is a character alternative that matches 730 The first part of the pattern is a character alternative that matches
731 any one of three characters: period, question mark, and exclamation 731 any one of three characters: period, question mark, and exclamation
732 mark. The match must begin with one of these three characters. (This 732 mark. The match must begin with one of these three characters. (This
733 is the one point where the new value of @code{sentence-end} differs 733 is one point where the new default regexp used by Emacs differs from
734 from the old. The new value also lists sentence ending 734 the old. The new value also allows some non-@acronym{ASCII}
735 non-@acronym{ASCII} characters.) 735 characters that end a sentence without any following whitespace.)
736 736
737 @item []\"')@}]* 737 @item []\"')@}]*
738 The second part of the pattern matches any closing braces and quotation 738 The second part of the pattern matches any closing braces and quotation
739 marks, zero or more of them, that may follow the period, question mark 739 marks, zero or more of them, that may follow the period, question mark
740 or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in 740 or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
1696 @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only 1696 @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
1697 whitespace or starting with a form feed (after its left margin). 1697 whitespace or starting with a form feed (after its left margin).
1698 @end defvar 1698 @end defvar
1699 1699
1700 @defvar sentence-end 1700 @defvar sentence-end
1701 This is the regular expression describing the end of a sentence. (All 1701 If non-@code{nil}, the value should be a regular expression describing
1702 paragraph boundaries also end sentences, regardless.) The (slightly 1702 the end of a sentence, including the whitespace following the
1703 simplified) default value is: 1703 sentence. (All paragraph boundaries also end sentences, regardless.)
1704 1704
1705 @example 1705 If the value is @code{nil}, the default, then the function
1706 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*" 1706 @code{sentence-end} has to construct the regexp. That is why you
1707 @end example 1707 should always call the function @code{sentence-end} to obtain the
1708 1708 regexp to be used to recognize the end of a sentence.
1709 This means a period, question mark or exclamation mark (the actual
1710 default value also lists their alternatives in other character sets),
1711 followed optionally by closing parenthetical characters, followed by
1712 tabs, spaces or new lines.
1713
1714 For a detailed explanation of this regular expression, see @ref{Regexp
1715 Example}.
1716 @end defvar 1709 @end defvar
1710
1711 @defun sentence-end
1712 This function returns the value of the variable @code{sentence-end},
1713 if non-@code{nil}. Otherwise it returns a default value based on the
1714 values of the variables @code{sentence-end-double-space}
1715 (@pxref{Definition of sentence-end-double-space}),
1716 @code{sentence-end-without-period} and
1717 @code{sentence-end-without-space}.
1718 @end defun
1717 1719
1718 @ignore 1720 @ignore
1719 arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f 1721 arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f
1720 @end ignore 1722 @end ignore
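
The revised text above advises calling the function `sentence-end` instead of reading the variable of the same name, since the variable may now be nil. A minimal sketch of that usage follows; the helper name `my-next-sentence-end-position` is hypothetical and does not appear in the manual or in Emacs.

```elisp
;; Hypothetical helper, for illustration only; not part of Emacs.
(defun my-next-sentence-end-position ()
  "Return the position after the next sentence-end match, or nil if none.
The match includes the whitespace that follows the sentence."
  (save-excursion
    ;; Call the function `sentence-end': it returns the variable's value
    ;; when that is non-nil, and otherwise a default regexp built from
    ;; the sentence-end-* variables named in the @defun above.
    (when (re-search-forward (sentence-end) nil t)
      (point))))
```

Passing the value of the variable `sentence-end` directly to `re-search-forward` would signal an error whenever the variable is nil, which this changeset describes as the default.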