changeset 53217:80af4875c661
(Non-ASCII in Strings): Clarify description of when a string is
unibyte or multibyte.
(Bool-Vector Type): Update examples.
(Equality Predicates): Correctly describe when two strings are `equal'.
author | Luc Teirlinck <teirllm@auburn.edu> |
---|---|
date | Mon, 01 Dec 2003 03:57:00 +0000 |
parents | 4f30a35fdb55 |
children | 917e6aba04d3 |
files | lispref/objects.texi |
diffstat | 1 files changed, 44 insertions(+), 28 deletions(-) |
--- a/lispref/objects.texi Mon Dec 01 02:29:01 2003 +0000
+++ b/lispref/objects.texi Mon Dec 01 03:57:00 2003 +0000
@@ -226,11 +226,12 @@
 common to work with @emph{strings}, which are sequences composed of
 characters. @xref{String Type}.
 
- Characters in strings, buffers, and files are currently limited to the
-range of 0 to 524287---nineteen bits. But not all values in that range
-are valid character codes. Codes 0 through 127 are @acronym{ASCII} codes; the
-rest are non-@acronym{ASCII} (@pxref{Non-ASCII Characters}). Characters that represent
-keyboard input have a much wider range, to encode modifier keys such as
+ Characters in strings, buffers, and files are currently limited to
+the range of 0 to 524287---nineteen bits. But not all values in that
+range are valid character codes. Codes 0 through 127 are
+@acronym{ASCII} codes; the rest are non-@acronym{ASCII}
+(@pxref{Non-ASCII Characters}). Characters that represent keyboard
+input have a much wider range, to encode modifier keys such as
 Control, Meta and Shift.
 
 @cindex read syntax for characters
@@ -375,11 +376,11 @@
 @ifnottex
 2**7
 @end ifnottex
-bit attached to an @acronym{ASCII} character indicates a meta character; thus, the
-meta characters that can fit in a string have codes in the range from
-128 to 255, and are the meta versions of the ordinary @acronym{ASCII}
-characters. (In Emacs versions 18 and older, this convention was used
-for characters outside of strings as well.)
+bit attached to an @acronym{ASCII} character indicates a meta
+character; thus, the meta characters that can fit in a string have
+codes in the range from 128 to 255, and are the meta versions of the
+ordinary @acronym{ASCII} characters. (In Emacs versions 18 and older,
+this convention was used for characters outside of strings as well.)
 
 The read syntax for meta characters uses @samp{\M-}. For example,
 @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with
@@ -416,8 +417,8 @@
 @kbd{Alt-Hyper-Meta-x}. (Note that @samp{\s} with no following @samp{-}
 represents the space character.)
 @tex
-Numerically, the
-bit values are @math{2^{22}} for alt, @math{2^{23}} for super and @math{2^{24}} for hyper.
+Numerically, the bit values are @math{2^{22}} for alt, @math{2^{23}}
+for super and @math{2^{24}} for hyper.
 @end tex
 @ifnottex
 Numerically, the
@@ -938,10 +939,13 @@
 constant is just like backslash-newline; it does not contribute any
 character to the string, but it does terminate the preceding hex escape.
 
- Using a multibyte hex escape forces the string to multibyte. You can
-represent a unibyte non-@acronym{ASCII} character with its character code,
-which must be in the range from 128 (0200 octal) to 255 (0377 octal).
-This forces a unibyte string.
+ You can represent a unibyte non-@acronym{ASCII} character with its
+character code, which must be in the range from 128 (0200 octal) to
+255 (0377 octal). If you write all such character codes in octal and
+the string contains no other characters forcing it to be multibyte,
+this produces a unibyte string. However, using any hex escape in a
+string (even for an @acronym{ASCII} character) forces the string to be
+multibyte.
 
 @xref{Text Representations}, for more information about the two
 text representations.
@@ -963,9 +967,9 @@
 
 Properly speaking, strings cannot hold meta characters; but when a
 string is to be used as a key sequence, there is a special convention
-that provides a way to represent meta versions of @acronym{ASCII} characters in a
-string. If you use the @samp{\M-} syntax to indicate a meta character
-in a string constant, this sets the
+that provides a way to represent meta versions of @acronym{ASCII}
+characters in a string. If you use the @samp{\M-} syntax to indicate
+a meta character in a string constant, this sets the
 @tex
 @math{2^{7}}
 @end tex
@@ -1082,16 +1086,25 @@
 as a bitmap---each ``character'' in the string contains 8 bits, which
 specify the next 8 elements of the bool-vector (1 stands for @code{t},
 and 0 for @code{nil}). The least significant bits of the character
-correspond to the lowest indices in the bool-vector. If the length is not a
-multiple of 8, the printed representation shows extra elements, but
-these extras really make no difference.
+correspond to the lowest indices in the bool-vector.
 
 @example
 (make-bool-vector 3 t)
- @result{} #&3"\007"
+ @result{} #&3"^G"
 (make-bool-vector 3 nil)
- @result{} #&3"\0"
-;; @r{These are equal since only the first 3 bits are used.}
+ @result{} #&3"^@@"
+@end example
+
+@noindent
+These results make sense, because the binary code for @samp{C-g} is
+111 and @samp{C-@@} is the character with code 0.
+
+ If the length is not a multiple of 8, the printed representation
+shows extra elements, but these extras really make no difference. For
+instance, in the next example, the two bool-vectors are equal, because
+only the first 3 bits are used:
+
+@example
 (equal #&3"\377" #&3"\007")
 @result{} t
 @end example
@@ -1875,9 +1888,12 @@
 @end example
 
 Comparison of strings is case-sensitive, but does not take account of
-text properties---it compares only the characters in the strings.
-A unibyte string never equals a multibyte string unless the
-contents are entirely @acronym{ASCII} (@pxref{Text Representations}).
+text properties---it compares only the characters in the strings. For
+technical reasons, a unibyte string and a multibyte string are
+@code{equal} if and only if they contain the same sequence of
+character codes and all these codes are either in the range 0 through
+127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}).
+(@pxref{Text Representations}).
 
 @example
 @group
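
The bool-vector hunk above describes a bitmap in which each character carries 8 elements and the least significant bit corresponds to the lowest index. As a rough illustration only, the following Python sketch models that layout; it is not Emacs's implementation, and the function names pack_bool_vector and bool_vectors_equal are hypothetical, introduced here for the example.

```python
def pack_bool_vector(bits):
    """Pack booleans into bytes, lowest index in the least significant bit.

    Hypothetical helper modelling the printed bool-vector layout described
    in the manual text; not taken from the Emacs sources.
    """
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def bool_vectors_equal(length, data_a, data_b):
    """Compare two packed vectors, ignoring any bits at index >= length."""
    def used_bits(data):
        return [(data[i // 8] >> (i % 8)) & 1 for i in range(length)]
    return used_bits(data_a) == used_bits(data_b)


if __name__ == "__main__":
    # Three true elements give binary 111, the code of C-g (7).
    print(pack_bool_vector([True, True, True]))
    # Three false elements give code 0, the character C-@.
    print(pack_bool_vector([False, False, False]))
    # Only the first 3 bits count, so \377 and \007 compare equal.
    print(bool_vectors_equal(3, b"\xff", b"\x07"))
```

The comparison helper mirrors the last example in that hunk: with a length of 3, the extra five bits of "\377" are ignored, so it matches "\007".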