diff lispref/strings.texi @ 21007:66d807bdc5b4
*** empty log message ***
author: Richard M. Stallman <rms@gnu.org>
date: Sat, 28 Feb 1998 01:53:53 +0000
parents: a4a1d7df2e7f
children: 90da2489c498
--- a/lispref/strings.texi	Sat Feb 28 01:49:58 1998 +0000
+++ b/lispref/strings.texi	Sat Feb 28 01:53:53 1998 +0000
@@ -1,6 +1,6 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc. 
+@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998 Free Software Foundation, Inc. 
 @c See the file elisp.texi for copying conditions.
 @setfilename ../info/strings
 @node Strings and Characters, Lists, Numbers, Top
@@ -25,6 +25,7 @@
 * Basics: String Basics.      Basic properties of strings and characters.
 * Predicates for Strings::    Testing whether an object is a string or char.
 * Creating Strings::          Functions to allocate new strings.
+* Modifying Strings::         Altering the contents of an existing string.
 * Text Comparison::           Comparing characters or strings.
 * String Conversion::         Converting characters or strings and vice versa.
 * Formatting Strings::        @code{format}: Emacs's analog of @code{printf}.
@@ -40,12 +41,10 @@
 whether an integer was intended as a character or not is determined only
 by how it is used.  Thus, strings really contain integers.
 
-  The length of a string (like any array) is fixed and independent of
-the string contents, and cannot be altered.  Strings in Lisp are
-@emph{not} terminated by a distinguished character code.  (By contrast,
-strings in C are terminated by a character with @sc{ASCII} code 0.)
-This means that any character, including the null character (@sc{ASCII}
-code 0), is a valid element of a string.@refill
+  The length of a string (like any array) is fixed, and cannot be
+altered once the string exists.  Strings in Lisp are @emph{not}
+terminated by a distinguished character code.  (By contrast, strings in
+C are terminated by a character with @sc{ASCII} code 0.)
 
   Since strings are considered arrays, you can operate on them with the
 general array functions.  (@xref{Sequences Arrays Vectors}.)  For
@@ -53,10 +52,13 @@
 using the functions @code{aref} and @code{aset} (@pxref{Array
 Functions}).
 
-  Each character in a string is stored in a single byte.  Therefore,
-numbers not in the range 0 to 255 are truncated when stored into a
-string.  This means that a string takes up much less memory than a
-vector of the same length.
+  There are two text representations for non-@sc{ASCII} characters in
+Emacs strings (and in buffers): unibyte and multibyte (@pxref{Text
+Representations}).  @sc{ASCII} characters always occupy one byte in a
+string; in fact, there is no real difference between the two
+representations for a string which is all @sc{ASCII}.  For most Lisp
+programming, you don't need to be concerned with these two
+representations.
 
   Sometimes key sequences are represented as strings.  When a string is
 a key sequence, string elements in the range 128 to 255 represent meta
@@ -66,9 +68,10 @@
   Strings cannot hold characters that have the hyper, super or alt
 modifiers; they can hold @sc{ASCII} control characters, but no other
 control characters.  They do not distinguish case in @sc{ASCII} control
-characters.  @xref{Character Type}, for more information about
-representation of meta and other modifiers for keyboard input
-characters.
+characters.  If you want to store such characters in a sequence, such as
+a key sequence, you must use a vector instead of a string.
+@xref{Character Type}, for more information about representation of meta
+and other modifiers for keyboard input characters.
 
   Strings are useful for holding regular expressions.  You can also
 match regular expressions against strings (@pxref{Regexp Search}).  The
@@ -84,6 +87,8 @@
   @xref{Text}, for information about functions that display strings or
 copy them into buffers.  @xref{Character Type}, and @ref{String Type},
 for information about the syntax of characters and strings.
+@xref{Non-ASCII Characters}, for functions to convert between text
+representations and encode and decode character codes.
 
 @node Predicates for Strings
 @section The Predicates for Strings
@@ -123,6 +128,16 @@
 @code{make-list} (@pxref{Building Lists}).
 @end defun
 
+@tindex string
+@defun string &rest characters
+This returns a string containing the characters @var{characters}.
+
+@example
+(string ?a ?b ?c)
+     @result{} "abc"
+@end example
+@end defun
+
 @defun substring string start &optional end
 This function returns a new string which consists of those characters
 from @var{string} in the range from (and including) the character at the
@@ -191,6 +206,9 @@
 error is signaled if @var{start} indicates a character following
 @var{end}, or if either integer is out of range for @var{string}.
 
+@code{substring} actually allows vectors as well as strings for
+the first argument.
+
 Contrast this function with @code{buffer-substring} (@pxref{Buffer
 Contents}), which returns a string containing a portion of the text in
 the current buffer.  The beginning of a string is at index 0, but the
@@ -251,6 +269,66 @@
 Lists}.
 @end defun
 
+@tindex split-string
+@defun split-string string &optional separators
+Split @var{string} into substrings in between matches for the regular
+expression @var{separators}.  Each match for @var{separators} defines a
+splitting point; the substrings between the splitting points are made
+into a list, which is the value.  If @var{separators} is @code{nil} (or
+omitted), the default is @code{"[ \f\t\n\r\v]+"}.
+
+For example,
+
+@example
+(split-string "Soup is good food" "o")
+@result{} ("S" "up is g" "" "d f" "" "d")
+(split-string "Soup is good food" "o+")
+@result{} ("S" "up is g" "d f" "d")
+@end example
+
+When there is a match adjacent to the beginning or end of the string,
+this does not cause a null string to appear at the beginning or end
+of the list:
+
+@example
+(split-string "out to moo" "o+")
+@result{} ("ut t" " m")
+@end example
+
+Empty matches do count, when not adjacent to another match:
+
+@example
+(split-string "Soup is good food" "o*")
+@result{} ("S" "u" "p" " " "i" "s" " " "g" "d" " " "f" "d")
+(split-string "Nice doggy!" "")
+@result{} ("N" "i" "c" "e" " " "d" "o" "g" "g" "y" "!")
+@end example
+@end defun
+
+@node Modifying Strings
+@section Modifying Strings
+
+  The most basic way to alter the contents of an existing string is with
+@code{aset} (@pxref{Array Functions}).  @code{(aset @var{string}
+@var{idx} @var{char})} stores @var{char} into @var{string} at index
+@var{idx}.  Each character occupies one or more bytes, and if @var{char}
+needs a different number of bytes from the character already present at
+that index, @code{aset} signals an error.
+
+  A more powerful function is @code{store-substring}:
+
+@tindex store-substring
+@defun store-substring string idx obj
+This function alters part of the contents of the string @var{string}, by
+storing @var{obj} starting at index @var{idx}.  The argument @var{obj}
+may be either a character or a (smaller) string.
+
+Since it is impossible to change the length of an existing string, it is
+an error if @var{obj} doesn't fit within @var{string}'s actual length,
+or if it requires a different number of bytes from the characters
+currently present at that point in @var{string}.
+@end defun
+
 @need 2000
 @node Text Comparison
 @section Comparison of Characters and Strings
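Note (outside the patch text): a minimal sketch of two additions in the hunk above, @code{split-string} and in-place modification with @code{aset}. It assumes a current Emacs rather than the 1998 build this changeset targets. Only calls whose results should agree with the documented behavior are shown; matches at the very beginning or end of the string are left out on purpose, because their handling may differ between versions.

```elisp
;; Each match for the separator regexp is a splitting point.
(split-string "Soup is good food" "o")
;; => ("S" "up is g" "" "d f" "" "d")

;; "o+" treats a run of o's as one separator, so no empty strings appear.
(split-string "Soup is good food" "o+")
;; => ("S" "up is g" "d f" "d")

;; A nil separator splits on runs of whitespace.
(split-string "two   words" nil)
;; => ("two" "words")

;; aset replaces one character in place; the length of the string stays fixed.
;; copy-sequence yields a fresh string, so no string literal is modified.
(let ((s (copy-sequence "cat")))
  (aset s 0 ?b)
  s)
;; => "bat"
```

Both characters in the @code{aset} call are ASCII, so the byte-count restriction described in the new section does not come into play here.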
@@ -264,10 +342,9 @@
 @example
 (char-equal ?x ?x)
      @result{} t
-(char-to-string (+ 256 ?x))
-     @result{} "x"
-(char-equal ?x  (+ 256 ?x))
-     @result{} t
+(let ((case-fold-search nil))
+  (char-equal ?x ?X))
+     @result{} nil
 @end example
 @end defun
 
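Note (outside the patch text): the replacement example in the hunk above binds @code{case-fold-search} to @code{nil}. A small sketch of both settings side by side, assuming ASCII letters and a current Emacs, makes the contrast explicit:

```elisp
;; With case folding on, char-equal ignores the case difference.
(let ((case-fold-search t))
  (char-equal ?x ?X))
;; => t

;; With case folding off, the two characters are distinct.
(let ((case-fold-search nil))
  (char-equal ?x ?X))
;; => nil
```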
@@ -284,9 +361,13 @@
      @result{} nil
 @end example
 
-The function @code{string=} ignores the text properties of the
-two strings.  To compare strings in a way that compares their text
-properties also, use @code{equal} (@pxref{Equality Predicates}).
+The function @code{string=} ignores the text properties of the two
+strings.  When @code{equal} (@pxref{Equality Predicates}) compares two
+strings, it uses @code{string=}.
+
+If the arguments contain non-@sc{ASCII} characters, and one is unibyte
+while the other is multibyte, then they cannot be equal.  @xref{Text
+Representations}.
 @end defun
 
 @defun string-equal string1 string2
@@ -308,7 +389,8 @@
 mind that lower case letters have higher numeric values in the
 @sc{ASCII} character set than their upper case counterparts; numbers and
 many punctuation characters have a lower numeric value than upper case
-letters.
+letters.  A unibyte non-@sc{ASCII} character is always less than any
+multibyte non-@sc{ASCII} character (@pxref{Text Representations}).
 
 @example
 @group
@@ -360,7 +442,9 @@
 strings and integers.  @code{format} and @code{prin1-to-string}
 (@pxref{Output Functions}) can also convert Lisp objects into strings.
 @code{read-from-string} (@pxref{Input Functions}) can ``convert'' a
-string representation of a Lisp object into an object.
+string representation of a Lisp object into an object.  The functions
+@code{string-make-multibyte} and @code{string-make-unibyte} convert the
+text representation of a string (@pxref{Converting Representations}).
 
   @xref{Documentation}, for functions that produce textual descriptions
 of text characters and general input events
@@ -433,15 +517,20 @@
 See also the function @code{format} in @ref{Formatting Strings}.
 @end defun
 
-@defun string-to-number string
+@defun string-to-number string &optional base
 @cindex string to number
 This function returns the numeric value of the characters in
-@var{string}, read in base ten.  It skips spaces and tabs at the
-beginning of @var{string}, then reads as much of @var{string} as it can
-interpret as a number.  (On some systems it ignores other whitespace at
-the beginning, not just spaces and tabs.)  If the first character after
-the ignored whitespace is not a digit or a minus sign, this function
-returns 0.
+@var{string}.  If @var{base} is non-@code{nil}, integers are converted
+in that base.  If @var{base} is @code{nil}, then base ten is used.
+Floating point conversion always uses base ten; we have not implemented
+other radices for floating point numbers, because that would be much
+more work and does not seem useful.
+
+The parsing skips spaces and tabs at the beginning of @var{string}, then
+reads as much of @var{string} as it can interpret as a number.  (On some
+systems it ignores other whitespace at the beginning, not just spaces
+and tabs.)  If the first character after the ignored whitespace is not a
+digit or a minus sign, this function returns 0.
 
 @example
 (string-to-number "256")
@@ -458,6 +547,21 @@
 @code{string-to-int} is an obsolete alias for this function.
 @end defun
 
+  Here are some other functions that can convert to or from a string:
+
+@table @code
+@item concat
+@code{concat} can convert a vector or a list into a string.
+@xref{Creating Strings}.
+
+@item vconcat
+@code{vconcat} can convert a string into a vector.  @xref{Vector
+Functions}.
+
+@item append
+@code{append} can convert a string into a list.  @xref{Building Lists}.
+@end table
+
 @node Formatting Strings
 @comment  node-name,  next,  previous,  up
 @section Formatting Strings
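Note (outside the patch text): a minimal sketch of the @var{base} argument to @code{string-to-number} and of the three conversion functions listed in the new table in the hunk above. The input strings are illustrative only, and the results assume a current Emacs.

```elisp
;; BASE applies to integer conversion; omitting it means base ten.
(string-to-number "256")      ; => 256
(string-to-number "ff" 16)    ; => 255
(string-to-number "101" 2)    ; => 5

;; A first character that cannot start a number yields 0.
(string-to-number "X256")     ; => 0

;; concat turns a vector or a list of characters into a string.
(concat [?a ?b ?c])           ; => "abc"
(concat '(?a ?b ?c))          ; => "abc"

;; vconcat and append go the other way: string to vector, string to list.
(vconcat "abc")               ; => [97 98 99]
(append "abc" nil)            ; => (97 98 99)
```

The vector and list results contain integers because characters are represented as integers, as the introduction to this chapter states.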
@@ -514,16 +618,18 @@
 @table @samp
 @item %s
 Replace the specification with the printed representation of the object,
-made without quoting.  Thus, strings are represented by their contents
-alone, with no @samp{"} characters, and symbols appear without @samp{\}
-characters.
+made without quoting (that is, using @code{princ}, not
+@code{print}---@pxref{Output Functions}).  Thus, strings are represented
+by their contents alone, with no @samp{"} characters, and symbols appear
+without @samp{\} characters.
 
 If there is no corresponding object, the empty string is used.
 
 @item %S
 Replace the specification with the printed representation of the object,
-made with quoting.  Thus, strings are enclosed in @samp{"} characters,
-and @samp{\} characters appear where necessary before special characters.
+made with quoting (that is, using @code{prin1}---@pxref{Output
+Functions}).  Thus, strings are enclosed in @samp{"} characters, and
+@samp{\} characters appear where necessary before special characters.
 
 If there is no corresponding object, the empty string is used.
 
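Note (outside the patch text): the hunk above ties @samp{%s} to @code{princ} and @samp{%S} to @code{prin1}. The sketch below shows the difference on a string, a symbol whose name contains a space, and a list; the sample objects are illustrative only.

```elisp
;; %s: no quoting. The result holds just the five characters of the string.
(format "%s" "hello")                 ; => "hello"

;; %S: quoting. The result includes the surrounding double quotes.
(format "%S" "hello")                 ; => "\"hello\""

;; A symbol whose name contains a space needs a backslash only under %S.
(format "%s" 'foo\ bar)               ; => "foo bar"
(format "%S" 'foo\ bar)               ; => "foo\\ bar"

;; %S agrees with prin1-to-string; %s agrees with its NOESCAPE variant.
(equal (format "%S" '(1 "two" three))
       (prin1-to-string '(1 "two" three)))      ; => t
(equal (format "%s" '(1 "two" three))
       (prin1-to-string '(1 "two" three) t))    ; => t
```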
@@ -593,7 +699,7 @@
 The padding is on the left if the prefix is positive (or starts with
 zero) and on the right if the prefix is negative.  The padding character
 is normally a space, but if the numeric prefix starts with a zero, zeros
-are used for padding.
+are used for padding.  Here are some examples of padding:
 
 @example
 (format "%06d is padded on the left with zeros" 123)
@@ -728,41 +834,48 @@
 table}.  A case table specifies the mapping between upper case and lower
 case letters.  It affects both the string and character case conversion
 functions (see the previous section) and those that apply to text in the
-buffer (@pxref{Case Changes}).  You need a case table if you are using a
-language which has letters other than the standard @sc{ASCII} letters.
+buffer (@pxref{Case Changes}).
 
-  A case table is a list of this form:
+  A case table is a char-table whose subtype is @code{case-table}.  This
+char-table maps each character into the corresponding lower case
+character.  It has three extra slots, which are related tables:
 
-@example
-(@var{downcase} @var{upcase} @var{canonicalize} @var{equivalences})
-@end example
+@table @var
+@item upcase
+The upcase table maps each character into the corresponding upper
+case character.
+@item canonicalize
+The canonicalize table maps all of a set of case-related characters
+into some one of them.
+@item equivalences
+The equivalences table maps each of a set of case-related characters
+into the next one in that set.
+@end table
 
-@noindent
-where each element is either @code{nil} or a string of length 256.  The
-element @var{downcase} says how to map each character to its lower-case
-equivalent.  The element @var{upcase} maps each character to its
-upper-case equivalent.  If lower and upper case characters are in
-one-to-one correspondence, use @code{nil} for @var{upcase}; then Emacs
-deduces the upcase table from @var{downcase}.
+  In simple cases, all you need to specify is the mapping to lower-case;
+the three related tables will be calculated automatically from that one.
 
   For some languages, upper and lower case letters are not in one-to-one
 correspondence.  There may be two different lower case letters with the
 same upper case equivalent.  In these cases, you need to specify the
-maps for both directions.
-
-  The element @var{canonicalize} maps each character to a canonical
-equivalent; any two characters that are related by case-conversion have
-the same canonical equivalent character.
+maps for both lower case and upper case.
 
-  The element @var{equivalences} is a map that cyclicly permutes each
-equivalence class (of characters with the same canonical equivalent).
-(For ordinary @sc{ASCII}, this would map @samp{a} into @samp{A} and
-@samp{A} into @samp{a}, and likewise for each set of equivalent
-characters.)
+  The extra table @var{canonicalize} maps each character to a canonical
+equivalent; any two characters that are related by case-conversion have
+the same canonical equivalent character.  For example, since @samp{a}
+and @samp{A} are related by case-conversion, they should have the same
+canonical equivalent character (which should be either @samp{a} for both
+of them, or @samp{A} for both of them).
+
+  The extra table @var{equivalences} is a map that cyclically permutes
+each equivalence class (of characters with the same canonical
+equivalent).  (For ordinary @sc{ASCII}, this would map @samp{a} into
+@samp{A} and @samp{A} into @samp{a}, and likewise for each set of
+equivalent characters.)
 
   When you construct a case table, you can provide @code{nil} for
-@var{canonicalize}; then Emacs fills in this string from @var{upcase}
-and @var{downcase}.  You can also provide @code{nil} for
+@var{canonicalize}; then Emacs fills in this string from the lower case
+and upper case mappings.  You can also provide @code{nil} for
 @var{equivalences}; then Emacs fills in this string from
 @var{canonicalize}.  In a case table that is actually in use, those
 components are non-@code{nil}.  Do not try to specify @var{equivalences}
@@ -797,22 +910,21 @@
 @end defun
 
   The following three functions are convenient subroutines for packages
-that define non-@sc{ASCII} character sets.  They modify a string
-@var{downcase-table} provided as an argument; this should be a string to
-be used as the @var{downcase} part of a case table.  They also modify
-the standard syntax table.  @xref{Syntax Tables}.
+that define non-@sc{ASCII} character sets.  They modify the specified
+case table @var{case-table}; they also modify the standard syntax table.
+@xref{Syntax Tables}.
 
-@defun set-case-syntax-pair uc lc downcase-table
+@defun set-case-syntax-pair uc lc case-table
 This function specifies a pair of corresponding letters, one upper case
 and one lower case.
 @end defun
 
-@defun set-case-syntax-delims l r downcase-table
+@defun set-case-syntax-delims l r case-table
 This function makes characters @var{l} and @var{r} a matching pair of
 case-invariant delimiters.
 @end defun
 
-@defun set-case-syntax char syntax downcase-table
+@defun set-case-syntax char syntax case-table
 This function makes @var{char} case-invariant, with syntax
 @var{syntax}.
 @end defun
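Note (outside the patch text): this changeset redefines a case table as a char-table with subtype @code{case-table} whose main entries map each character to lower case. A read-only sketch against the standard case table, assuming a current Emacs, shows that shape without modifying any table:

```elisp
(let ((table (standard-case-table)))
  (list
   ;; The case table is a char-table with the subtype named in the text.
   (char-table-p table)            ; t
   (char-table-subtype table)      ; case-table
   (case-table-p table)            ; t
   ;; The main table maps a character to its lower-case equivalent.
   (aref table ?A)))               ; 97, which is ?a
;; => (t case-table t 97)

;; The case conversion functions consult the same mappings.
(downcase ?A)   ; => 97  (?a)
(upcase ?a)     ; => 65  (?A)
```

The three extra slots (upcase, canonicalize, equivalences) are not inspected here, since the text says they may be filled in automatically.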
@@ -821,8 +933,3 @@
 This command displays a description of the contents of the current
 buffer's case table.
 @end deffn
-
-@cindex ISO Latin 1
-@pindex iso-syntax
-You can load the library @file{iso-syntax} to set up the standard syntax
-table and define a case table for the 8-bit ISO Latin 1 character set.