comparison lispref/objects.texi @ 52978:1a5c50faf357

Replace @sc{foo} with @acronym{FOO}.
author Eli Zaretskii <eliz@gnu.org>
date Sun, 02 Nov 2003 06:29:59 +0000
parents 8b6f25832ac6
children 80af4875c661
52977:8af8c70252c1 52978:1a5c50faf357
214 214
215 @xref{Numbers}, for more information. 215 @xref{Numbers}, for more information.
216 216
217 @node Character Type 217 @node Character Type
218 @subsection Character Type 218 @subsection Character Type
219 @cindex @sc{ascii} character codes 219 @cindex @acronym{ASCII} character codes
220 220
221 A @dfn{character} in Emacs Lisp is nothing more than an integer. In 221 A @dfn{character} in Emacs Lisp is nothing more than an integer. In
222 other words, characters are represented by their character codes. For 222 other words, characters are represented by their character codes. For
223 example, the character @kbd{A} is represented as the @w{integer 65}. 223 example, the character @kbd{A} is represented as the @w{integer 65}.
224 224
226 common to work with @emph{strings}, which are sequences composed of 226 common to work with @emph{strings}, which are sequences composed of
227 characters. @xref{String Type}. 227 characters. @xref{String Type}.
228 228
229 Characters in strings, buffers, and files are currently limited to the 229 Characters in strings, buffers, and files are currently limited to the
230 range of 0 to 524287---nineteen bits. But not all values in that range 230 range of 0 to 524287---nineteen bits. But not all values in that range
231 are valid character codes. Codes 0 through 127 are @sc{ascii} codes; the 231 are valid character codes. Codes 0 through 127 are @acronym{ASCII} codes; the
232 rest are non-@sc{ascii} (@pxref{Non-ASCII Characters}). Characters that represent 232 rest are non-@acronym{ASCII} (@pxref{Non-ASCII Characters}). Characters that represent
233 keyboard input have a much wider range, to encode modifier keys such as 233 keyboard input have a much wider range, to encode modifier keys such as
234 Control, Meta and Shift. 234 Control, Meta and Shift.
235 235
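As a rough illustration of the range arithmetic in the paragraph above, here is a small Python sketch. It is not Emacs code, and the names `MAX_CHAR` and `classify_code` are hypothetical. It only checks that nineteen bits give codes 0 through 524287 and that codes 0 through 127 form the ASCII subset; it does not model which codes in the range are actually valid.

```python
# Illustrative sketch only; these names are hypothetical, not Emacs APIs.
CHAR_BITS = 19                     # "nineteen bits", per the text
MAX_CHAR = (1 << CHAR_BITS) - 1    # largest code that fits in nineteen bits


def classify_code(code):
    """Classify an integer by the ranges the text describes."""
    if not 0 <= code <= MAX_CHAR:
        return "out of range"
    if code <= 127:
        return "ascii"
    # In range, but the text notes that not every such value is valid.
    return "non-ascii (possibly invalid)"


print(MAX_CHAR)
print(classify_code(ord("A")))
```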
236 @cindex read syntax for characters 236 @cindex read syntax for characters
237 @cindex printed representation for characters 237 @cindex printed representation for characters
321 @example 321 @example
322 ?\^I @result{} 9 ?\C-I @result{} 9 322 ?\^I @result{} 9 ?\C-I @result{} 9
323 @end example 323 @end example
324 324
325 In strings and buffers, the only control characters allowed are those 325 In strings and buffers, the only control characters allowed are those
326 that exist in @sc{ascii}; but for keyboard input purposes, you can turn 326 that exist in @acronym{ASCII}; but for keyboard input purposes, you can turn
327 any character into a control character with @samp{C-}. The character 327 any character into a control character with @samp{C-}. The character
328 codes for these non-@sc{ascii} control characters include the 328 codes for these non-@acronym{ASCII} control characters include the
329 @tex 329 @tex
330 @math{2^{26}} 330 @math{2^{26}}
331 @end tex 331 @end tex
332 @ifnottex 332 @ifnottex
333 2**26 333 2**26
334 @end ifnottex 334 @end ifnottex
335 bit as well as the code for the corresponding non-control 335 bit as well as the code for the corresponding non-control
336 character. Ordinary terminals have no way of generating non-@sc{ascii} 336 character. Ordinary terminals have no way of generating non-@acronym{ASCII}
337 control characters, but you can generate them straightforwardly using X 337 control characters, but you can generate them straightforwardly using X
338 and other window systems. 338 and other window systems.
339 339
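The bit arithmetic described above can be sketched in Python as follows. This is only a model of the encoding, not Emacs code; `CONTROL_BIT` and `non_ascii_control` are hypothetical names, and the sketch assumes the input character has no control equivalent in ASCII (such as `%`).

```python
# Illustrative sketch only; these names are hypothetical, not Emacs APIs.
CONTROL_BIT = 1 << 26   # the 2**26 bit named in the text


def non_ascii_control(ch):
    """Return the code for the control version of CH.

    Assumes CH has no ASCII control equivalent, so the result is the
    code of the non-control character with the 2**26 bit added.
    """
    return CONTROL_BIT | ord(ch)


code = non_ascii_control("%")
print(code)
# The original character code is recoverable by clearing the bit.
print(code & ~CONTROL_BIT)
```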
340 For historical reasons, Emacs treats the @key{DEL} character as 340 For historical reasons, Emacs treats the @key{DEL} character as
341 the control equivalent of @kbd{?}: 341 the control equivalent of @kbd{?}:
373 @math{2^{7}} 373 @math{2^{7}}
374 @end tex 374 @end tex
375 @ifnottex 375 @ifnottex
376 2**7 376 2**7
377 @end ifnottex 377 @end ifnottex
378 bit attached to an @sc{ascii} character indicates a meta character; thus, the 378 bit attached to an @acronym{ASCII} character indicates a meta character; thus, the
379 meta characters that can fit in a string have codes in the range from 379 meta characters that can fit in a string have codes in the range from
380 128 to 255, and are the meta versions of the ordinary @sc{ascii} 380 128 to 255, and are the meta versions of the ordinary @acronym{ASCII}
381 characters. (In Emacs versions 18 and older, this convention was used 381 characters. (In Emacs versions 18 and older, this convention was used
382 for characters outside of strings as well.) 382 for characters outside of strings as well.)
383 383
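To make the 128 to 255 range concrete, the following Python sketch models the in-string meta convention described above. It is an assumption-laden illustration rather than Emacs code; `STRING_META_BIT` and `meta_in_string` are hypothetical names.

```python
# Illustrative sketch only; these names are hypothetical, not Emacs APIs.
STRING_META_BIT = 1 << 7   # the 2**7 bit named in the text


def meta_in_string(ch):
    """Return the in-string code for the meta version of an ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("only ASCII characters have in-string meta versions")
    return code | STRING_META_BIT


print(meta_in_string("A"))
# Every ASCII code maps into the range 128..255.
print(all(128 <= meta_in_string(chr(c)) <= 255 for c in range(128)))
```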
384 The read syntax for meta characters uses @samp{\M-}. For example, 384 The read syntax for meta characters uses @samp{\M-}. For example,
385 @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with 385 @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with
387 syntax for a character. Thus, you can write @kbd{M-A} as @samp{?\M-A}, 387 syntax for a character. Thus, you can write @kbd{M-A} as @samp{?\M-A},
388 or as @samp{?\M-\101}. Likewise, you can write @kbd{C-M-b} as 388 or as @samp{?\M-\101}. Likewise, you can write @kbd{C-M-b} as
389 @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}. 389 @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}.
390 390
391 The case of a graphic character is indicated by its character code; 391 The case of a graphic character is indicated by its character code;
392 for example, @sc{ascii} distinguishes between the characters @samp{a} 392 for example, @acronym{ASCII} distinguishes between the characters @samp{a}
393 and @samp{A}. But @sc{ascii} has no way to represent whether a control 393 and @samp{A}. But @acronym{ASCII} has no way to represent whether a control
394 character is upper case or lower case. Emacs uses the 394 character is upper case or lower case. Emacs uses the
395 @tex 395 @tex
396 @math{2^{25}} 396 @math{2^{25}}
397 @end tex 397 @end tex
398 @ifnottex 398 @ifnottex
430 Finally, the most general read syntax for a character represents the 430 Finally, the most general read syntax for a character represents the
431 character code in either octal or hex. To use octal, write a question 431 character code in either octal or hex. To use octal, write a question
432 mark followed by a backslash and the octal character code (up to three 432 mark followed by a backslash and the octal character code (up to three
433 octal digits); thus, @samp{?\101} for the character @kbd{A}, 433 octal digits); thus, @samp{?\101} for the character @kbd{A},
434 @samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the 434 @samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
435 character @kbd{C-b}. Although this syntax can represent any @sc{ascii} 435 character @kbd{C-b}. Although this syntax can represent any @acronym{ASCII}
436 character, it is preferred only when the precise octal value is more 436 character, it is preferred only when the precise octal value is more
437 important than the @sc{ascii} representation. 437 important than the @acronym{ASCII} representation.
438 438
439 @example 439 @example
440 @group 440 @group
441 ?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10 441 ?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10
442 ?\101 @result{} 65 ?A @result{} 65 442 ?\101 @result{} 65 ?A @result{} 65
913 in documentation strings, 913 in documentation strings,
914 but the newline is ignored if escaped." 914 but the newline is ignored if escaped."
915 @end example 915 @end example
916 916
917 @node Non-ASCII in Strings 917 @node Non-ASCII in Strings
918 @subsubsection Non-@sc{ascii} Characters in Strings 918 @subsubsection Non-@acronym{ASCII} Characters in Strings
919 919
920 You can include a non-@sc{ascii} international character in a string 920 You can include a non-@acronym{ASCII} international character in a string
921 constant by writing it literally. There are two text representations 921 constant by writing it literally. There are two text representations
922 for non-@sc{ascii} characters in Emacs strings (and in buffers): unibyte 922 for non-@acronym{ASCII} characters in Emacs strings (and in buffers): unibyte
923 and multibyte. If the string constant is read from a multibyte source, 923 and multibyte. If the string constant is read from a multibyte source,
924 such as a multibyte buffer or string, or a file that would be visited as 924 such as a multibyte buffer or string, or a file that would be visited as
925 multibyte, then the character is read as a multibyte character, and that 925 multibyte, then the character is read as a multibyte character, and that
926 makes the string multibyte. If the string constant is read from a 926 makes the string multibyte. If the string constant is read from a
927 unibyte source, then the character is read as unibyte and that makes the 927 unibyte source, then the character is read as unibyte and that makes the
928 string unibyte. 928 string unibyte.
929 929
930 You can also represent a multibyte non-@sc{ascii} character with its 930 You can also represent a multibyte non-@acronym{ASCII} character with its
931 character code: use a hex escape, @samp{\x@var{nnnnnnn}}, with as many 931 character code: use a hex escape, @samp{\x@var{nnnnnnn}}, with as many
932 digits as necessary. (Multibyte non-@sc{ascii} character codes are all 932 digits as necessary. (Multibyte non-@acronym{ASCII} character codes are all
933 greater than 256.) Any character which is not a valid hex digit 933 greater than 256.) Any character which is not a valid hex digit
934 terminates this construct. If the next character in the string could be 934 terminates this construct. If the next character in the string could be
935 interpreted as a hex digit, write @w{@samp{\ }} (backslash and space) to 935 interpreted as a hex digit, write @w{@samp{\ }} (backslash and space) to
936 terminate the hex escape---for example, @w{@samp{\x8e0\ }} represents 936 terminate the hex escape---for example, @w{@samp{\x8e0\ }} represents
937 one character, @samp{a} with grave accent. @w{@samp{\ }} in a string 937 one character, @samp{a} with grave accent. @w{@samp{\ }} in a string
938 constant is just like backslash-newline; it does not contribute any 938 constant is just like backslash-newline; it does not contribute any
939 character to the string, but it does terminate the preceding hex escape. 939 character to the string, but it does terminate the preceding hex escape.
940 940
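The termination rule above can be sketched as a small parser. This Python model is only an illustration of the rule as stated in the text, not the Emacs reader; `read_hex_escape` is a hypothetical name, and the function returns integer codes rather than characters because the codes here are Emacs-internal, not Unicode.

```python
# Illustrative sketch only; this is a model of the rule, not the Emacs reader.
HEX_DIGITS = set("0123456789abcdefABCDEF")


def read_hex_escape(text, start=0):
    """Parse a backslash-x escape beginning at text[start].

    Returns (character code, index of the first unconsumed character).
    """
    if text[start:start + 2] != "\\x":
        raise ValueError("not a hex escape")
    pos = start + 2
    end = pos
    # Consume as many hex digits as are present.
    while end < len(text) and text[end] in HEX_DIGITS:
        end += 1
    if end == pos:
        raise ValueError("no hex digits after \\x")
    code = int(text[pos:end], 16)
    # Backslash-space ends the escape and contributes no character.
    if text[end:end + 2] == "\\ ":
        end += 2
    return code, end


# With the terminator, only "8e0" is read as the character code.
print(read_hex_escape(r"\x8e0\ abc"))
# Without it, the following "abc" is absorbed as more hex digits.
print(read_hex_escape(r"\x8e0abc"))
```

The second call shows why the terminator matters: since `a`, `b`, and `c` are valid hex digits, the escape would otherwise swallow them.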
941 Using a multibyte hex escape forces the string to multibyte. You can 941 Using a multibyte hex escape forces the string to multibyte. You can
942 represent a unibyte non-@sc{ascii} character with its character code, 942 represent a unibyte non-@acronym{ASCII} character with its character code,
943 which must be in the range from 128 (0200 octal) to 255 (0377 octal). 943 which must be in the range from 128 (0200 octal) to 255 (0377 octal).
944 This forces a unibyte string. 944 This forces a unibyte string.
945 945
946 @xref{Text Representations}, for more information about the two 946 @xref{Text Representations}, for more information about the two
947 text representations. 947 text representations.
956 them, like this: @code{"\t, \C-a"}. @xref{Character Type}, for a 956 them, like this: @code{"\t, \C-a"}. @xref{Character Type}, for a
957 description of the read syntax for characters. 957 description of the read syntax for characters.
958 958
959 However, not all of the characters you can write with backslash 959 However, not all of the characters you can write with backslash
960 escape-sequences are valid in strings. The only control characters that 960 escape-sequences are valid in strings. The only control characters that
961 a string can hold are the @sc{ascii} control characters. Strings do not 961 a string can hold are the @acronym{ASCII} control characters. Strings do not
962 distinguish case in @sc{ascii} control characters. 962 distinguish case in @acronym{ASCII} control characters.
963 963
964 Properly speaking, strings cannot hold meta characters; but when a 964 Properly speaking, strings cannot hold meta characters; but when a
965 string is to be used as a key sequence, there is a special convention 965 string is to be used as a key sequence, there is a special convention
966 that provides a way to represent meta versions of @sc{ascii} characters in a 966 that provides a way to represent meta versions of @acronym{ASCII} characters in a
967 string. If you use the @samp{\M-} syntax to indicate a meta character 967 string. If you use the @samp{\M-} syntax to indicate a meta character
968 in a string constant, this sets the 968 in a string constant, this sets the
969 @tex 969 @tex
970 @math{2^{7}} 970 @math{2^{7}}
971 @end tex 971 @end tex
1875 @end example 1875 @end example
1876 1876
1877 Comparison of strings is case-sensitive, but does not take account of 1877 Comparison of strings is case-sensitive, but does not take account of
1878 text properties---it compares only the characters in the strings. 1878 text properties---it compares only the characters in the strings.
1879 A unibyte string never equals a multibyte string unless the 1879 A unibyte string never equals a multibyte string unless the
1880 contents are entirely @sc{ascii} (@pxref{Text Representations}). 1880 contents are entirely @acronym{ASCII} (@pxref{Text Representations}).
1881 1881
1882 @example 1882 @example
1883 @group 1883 @group
1884 (equal "asdf" "ASDF") 1884 (equal "asdf" "ASDF")
1885 @result{} nil 1885 @result{} nil