--- a/lispref/strings.texi	Mon Oct 27 15:48:12 2003 +0000
+++ b/lispref/strings.texi	Mon Oct 27 15:54:13 2003 +0000
@@ -172,7 +172,7 @@
 @samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1.
 Therefore, @samp{e} and @samp{f} are included, and @samp{g} is excluded.

-When @code{nil} is used as an index, it stands for the length of the
+When @code{nil} is used for @var{end}, it stands for the length of the
 string.  Thus,

 @example
@@ -208,10 +208,11 @@
      @result{} [b (c)]
 @end example

-A @code{wrong-type-argument} error is signaled if either @var{start} or
-@var{end} is not an integer or @code{nil}.  An @code{args-out-of-range}
-error is signaled if @var{start} indicates a character following
-@var{end}, or if either integer is out of range for @var{string}.
+A @code{wrong-type-argument} error is signaled if @var{start} is not
+an integer or if @var{end} is neither an integer nor @code{nil}.  An
+@code{args-out-of-range} error is signaled if @var{start} indicates a
+character following @var{end}, or if either integer is out of range
+for @var{string}.

 Contrast this function with @code{buffer-substring} (@pxref{Buffer
 Contents}), which returns a string containing a portion of the text in
@@ -219,9 +220,12 @@
 beginning of a buffer is at index 1.
 @end defun

-@defun substring-no-properties string start &optional end
-This works like @code{substring} but discards all text properties
-from the value.
+@defun substring-no-properties string &optional start end
+This works like @code{substring} but discards all text properties from
+the value.  Also, @var{start} may be omitted or @code{nil}, which is
+equivalent to 0.  Thus, @w{@code{(substring-no-properties
+@var{string})}} returns a copy of @var{string}, with all text
+properties removed.
 @end defun

 @defun concat &rest sequences
@@ -264,7 +268,7 @@
 Lists}.
 @end defun

-@defun split-string string separators omit-nulls
+@defun split-string string &optional separators omit-nulls
 This function splits @var{string} into substrings at matches for the
 regular expression @var{separators}.  Each match for @var{separators}
 defines a splitting point; the substrings between the splitting points
@@ -285,7 +289,7 @@

 @example
 (split-string "  two words ")
-@result{} ("two" "words")
+     @result{} ("two" "words")
 @end example

 The result is not @samp{("" "two" "words" "")}, which would rarely be
@@ -294,33 +298,62 @@

 @example
 (split-string "  two words " split-string-default-separators)
-@result{} ("" "two" "words" "")
+     @result{} ("" "two" "words" "")
 @end example

 More examples:

 @example
 (split-string "Soup is good food" "o")
-@result{} ("S" "up is g" "" "d f" "" "d")
+     @result{} ("S" "up is g" "" "d f" "" "d")
 (split-string "Soup is good food" "o" t)
-@result{} ("S" "up is g" "d f" "d")
+     @result{} ("S" "up is g" "d f" "d")
 (split-string "Soup is good food" "o+")
-@result{} ("S" "up is g" "d f" "d")
+     @result{} ("S" "up is g" "d f" "d")
+@end example
+
+Empty matches do count, except that @code{split-string} will not look
+for a final empty match when it has already reached the end of the
+string using a non-empty match or when @var{string} is empty:
+
+@example
+(split-string "aooob" "o*")
+     @result{} ("" "a" "" "b" "")
+(split-string "ooaboo" "o*")
+     @result{} ("" "" "a" "b" "")
+(split-string "" "")
+     @result{} ("")
 @end example

-Empty matches do count, when not adjacent to another match:
+However, when @var{separators} can match the empty string,
+@var{omit-nulls} is usually @code{t}, so that the subtleties in the
+three previous examples are rarely relevant:

 @example
-(split-string "Soup is good food" "o*")
-@result{}("S" "u" "p" " " "i" "s" " " "g" "d" " " "f" "d")
-(split-string "Nice doggy!" "")
-@result{}("N" "i" "c" "e" " " "d" "o" "g" "g" "y" "!")
+(split-string "Soup is good food" "o*" t)
+     @result{} ("S" "u" "p" " " "i" "s" " " "g" "d" " " "f" "d")
+(split-string "Nice doggy!" "" t)
+     @result{} ("N" "i" "c" "e" " " "d" "o" "g" "g" "y" "!")
+(split-string "" "" t)
+     @result{} nil
+@end example
+
+Somewhat odd, but predictable, behavior can occur for certain
+``non-greedy'' values of @var{separators} that can prefer empty
+matches over non-empty matches.  Again, such values rarely occur in
+practice:
+
+@example
+(split-string "ooo" "o*" t)
+     @result{} nil
+(split-string "ooo" "\\|o+" t)
+     @result{} ("o" "o" "o")
 @end example
 @end defun

 @defvar split-string-default-separators
 The default value of @var{separators} for @code{split-string}, initially
-@samp{"[ \f\t\n\r\v]+"}.
+@w{@samp{"[ \f\t\n\r\v]+"}}.
 @end defvar

 @node Modifying Strings
@@ -367,7 +400,8 @@

 @defun string= string1 string2
 This function returns @code{t} if the characters of the two strings
-match exactly.
+match exactly.  Symbols are also allowed as arguments, in which case
+their print names are used.
 Case is always significant, regardless of @code{case-fold-search}.

 @example
@@ -441,6 +475,9 @@
      @result{} nil
 @end group
 @end example
+
+Symbols are also allowed as arguments, in which case their print names
+are used.
 @end defun

 @defun string-lessp string1 string2
@@ -545,8 +582,10 @@
 @example
 (number-to-string 256)
      @result{} "256"
+@group
 (number-to-string -23)
      @result{} "-23"
+@end group
 (number-to-string -23.5)
      @result{} "-23.5"
 @end example
@@ -560,20 +599,22 @@
 @defun string-to-number string &optional base
 @cindex string to number
 This function returns the numeric value of the characters in
-@var{string}.  If @var{base} is non-@code{nil}, integers are converted
-in that base.  If @var{base} is @code{nil}, then base ten is used.
-Floating point conversion always uses base ten; we have not implemented
-other radices for floating point numbers, because that would be much
-more work and does not seem useful.  If @var{string} looks like an
-integer but its value is too large to fit into a Lisp integer,
+@var{string}.  If @var{base} is non-@code{nil}, it must be an integer
+between 2 and 16 (inclusive), and integers are converted in that base.
+If @var{base} is @code{nil}, then base ten is used.  Floating point
+conversion only works in base ten; we have not implemented other
+radices for floating point numbers, because that would be much more
+work and does not seem useful.  If @var{string} looks like an integer
+but its value is too large to fit into a Lisp integer,
 @code{string-to-number} returns a floating point result.

-The parsing skips spaces and tabs at the beginning of @var{string}, then
-reads as much of @var{string} as it can interpret as a number.  (On some
-systems it ignores other whitespace at the beginning, not just spaces
-and tabs.)  If the first character after the ignored whitespace is
-neither a digit, nor a plus or minus sign, nor the leading dot of a
-floating point number, this function returns 0.
+The parsing skips spaces and tabs at the beginning of @var{string},
+then reads as much of @var{string} as it can interpret as a number in
+the given base.  (On some systems it ignores other whitespace at the
+beginning, not just spaces and tabs.)  If the first character after
+the ignored whitespace is neither a digit in the given base, nor a
+plus or minus sign, nor the leading dot of a floating point number,
+this function returns 0.

 @example
 (string-to-number "256")
@@ -675,16 +716,12 @@
 copied into the output.  The text properties of the @samp{%s} itself
 are also copied, but those of the object take priority.

-If there is no corresponding object, the empty string is used.
-
 @item %S
 Replace the specification with the printed representation of the object,
 made with quoting (that is, using @code{prin1}---@pxref{Output
 Functions}).  Thus, strings are enclosed in @samp{"} characters, and
 @samp{\} characters appear where necessary before special characters.

-If there is no corresponding object, the empty string is used.
-
 @item %o
 @cindex integer to octal
 Replace the specification with the base-eight representation of an
@@ -747,12 +784,17 @@
 @cindex padding
   All the specification characters allow an optional numeric prefix
 between the @samp{%} and the character.  The optional numeric prefix
-defines the minimum width for the object.  If the printed representation
-of the object contains fewer characters than this, then it is padded.
-The padding is on the left if the prefix is positive (or starts with
-zero) and on the right if the prefix is negative.  The padding character
-is normally a space, but if the numeric prefix starts with a zero, zeros
-are used for padding.  Here are some examples of padding:
+defines the minimum width for the object.  If the printed
+representation of the object contains fewer characters than this, then
+it is padded.  The padding is on the left if the prefix is positive
+(or starts with zero) and on the right if the prefix is negative.  The
+padding character is normally a space, but if the numeric prefix
+starts with a zero, zeros are used for padding.  Some of these
+conventions are ignored for specification characters for which they do
+not make sense.  That is, @samp{%s}, @samp{%S} and @samp{%c} accept a
+numeric prefix starting with 0, but still pad with @emph{spaces} on the
+left.  Also, @samp{%%} accepts a numeric prefix, but ignores it.  Here
+are some examples of padding:

 @example
 (format "%06d is padded on the left with zeros" 123)
@@ -872,11 +914,15 @@
 has the same result as @code{upcase}.

 @example
+@group
 (capitalize "The cat in the hat")
      @result{} "The Cat In The Hat"
+@end group

+@group
 (capitalize "THE 77TH-HATTED CAT")
      @result{} "The 77th-Hatted Cat"
+@end group

 @group
 (capitalize ?x)
@@ -885,16 +931,20 @@
 @end example
 @end defun

-@defun upcase-initials string
-This function capitalizes the initials of the words in @var{string},
-without altering any letters other than the initials.  It returns a new
-string whose contents are a copy of @var{string}, in which each word has
+@defun upcase-initials string-or-char
+If @var{string-or-char} is a string, this function capitalizes the
+initials of the words in @var{string-or-char}, without altering any
+letters other than the initials.  It returns a new string whose
+contents are a copy of @var{string-or-char}, in which each word has
 had its initial letter converted to upper case.

 The definition of a word is any sequence of consecutive characters that
 are assigned to the word constituent syntax class in the current syntax
 table (@pxref{Syntax Class Table}).

+When the argument to @code{upcase-initials} is a character,
+@code{upcase-initials} has the same result as @code{upcase}.
+
 @example
 @group
 (upcase-initials "The CAT in the hAt")