comparison lispref/objects.texi @ 52978:1a5c50faf357
Replace @sc{foo} with @acronym{FOO}.
author | Eli Zaretskii <eliz@gnu.org> |
---|---|
date | Sun, 02 Nov 2003 06:29:59 +0000 |
parents | 8b6f25832ac6 |
children | 80af4875c661 |
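
The substitution named in the commit message is mechanical: each `@sc{...}` argument is upper-cased and wrapped in `@acronym{...}`. This page does not say how the change was actually made, so the following is only an illustrative Python sketch of an equivalent rewrite. The function name `sc_to_acronym` and the regular expression are hypothetical, and the pattern assumes the `@sc` argument contains no nested braces.

```python
import re

# Matches @sc{...} whose argument has no nested braces, e.g. @sc{ascii}.
# (Assumption: that is the only form that needs rewriting.)
SC_PATTERN = re.compile(r"@sc\{([^{}]*)\}")


def sc_to_acronym(texinfo: str) -> str:
    """Rewrite every @sc{foo} in the given text as @acronym{FOO}."""
    return SC_PATTERN.sub(
        lambda m: "@acronym{" + m.group(1).upper() + "}", texinfo
    )


if __name__ == "__main__":
    old = "rest are non-@sc{ascii} (@pxref{Non-ASCII Characters})."
    print(sc_to_acronym(old))
```

Applied to the left-hand column of any changed row in the comparison, the function should give the right-hand column; text without `@sc{...}` passes through unchanged.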
comparison
52977:8af8c70252c1 | 52978:1a5c50faf357 |
---|---|
214 | 214 |
215 @xref{Numbers}, for more information. | 215 @xref{Numbers}, for more information. |
216 | 216 |
217 @node Character Type | 217 @node Character Type |
218 @subsection Character Type | 218 @subsection Character Type |
219 @cindex @sc{ascii} character codes | 219 @cindex @acronym{ASCII} character codes |
220 | 220 |
221 A @dfn{character} in Emacs Lisp is nothing more than an integer. In | 221 A @dfn{character} in Emacs Lisp is nothing more than an integer. In |
222 other words, characters are represented by their character codes. For | 222 other words, characters are represented by their character codes. For |
223 example, the character @kbd{A} is represented as the @w{integer 65}. | 223 example, the character @kbd{A} is represented as the @w{integer 65}. |
224 | 224 |
226 common to work with @emph{strings}, which are sequences composed of | 226 common to work with @emph{strings}, which are sequences composed of |
227 characters. @xref{String Type}. | 227 characters. @xref{String Type}. |
228 | 228 |
229 Characters in strings, buffers, and files are currently limited to the | 229 Characters in strings, buffers, and files are currently limited to the |
230 range of 0 to 524287---nineteen bits. But not all values in that range | 230 range of 0 to 524287---nineteen bits. But not all values in that range |
231 are valid character codes. Codes 0 through 127 are @sc{ascii} codes; the | 231 are valid character codes. Codes 0 through 127 are @acronym{ASCII} codes; the |
232 rest are non-@sc{ascii} (@pxref{Non-ASCII Characters}). Characters that represent | 232 rest are non-@acronym{ASCII} (@pxref{Non-ASCII Characters}). Characters that represent |
233 keyboard input have a much wider range, to encode modifier keys such as | 233 keyboard input have a much wider range, to encode modifier keys such as |
234 Control, Meta and Shift. | 234 Control, Meta and Shift. |
235 | 235 |
236 @cindex read syntax for characters | 236 @cindex read syntax for characters |
237 @cindex printed representation for characters | 237 @cindex printed representation for characters |
321 @example | 321 @example |
322 ?\^I @result{} 9 ?\C-I @result{} 9 | 322 ?\^I @result{} 9 ?\C-I @result{} 9 |
323 @end example | 323 @end example |
324 | 324 |
325 In strings and buffers, the only control characters allowed are those | 325 In strings and buffers, the only control characters allowed are those |
326 that exist in @sc{ascii}; but for keyboard input purposes, you can turn | 326 that exist in @acronym{ASCII}; but for keyboard input purposes, you can turn |
327 any character into a control character with @samp{C-}. The character | 327 any character into a control character with @samp{C-}. The character |
328 codes for these non-@sc{ascii} control characters include the | 328 codes for these non-@acronym{ASCII} control characters include the |
329 @tex | 329 @tex |
330 @math{2^{26}} | 330 @math{2^{26}} |
331 @end tex | 331 @end tex |
332 @ifnottex | 332 @ifnottex |
333 2**26 | 333 2**26 |
334 @end ifnottex | 334 @end ifnottex |
335 bit as well as the code for the corresponding non-control | 335 bit as well as the code for the corresponding non-control |
336 character. Ordinary terminals have no way of generating non-@sc{ascii} | 336 character. Ordinary terminals have no way of generating non-@acronym{ASCII} |
337 control characters, but you can generate them straightforwardly using X | 337 control characters, but you can generate them straightforwardly using X |
338 and other window systems. | 338 and other window systems. |
339 | 339 |
340 For historical reasons, Emacs treats the @key{DEL} character as | 340 For historical reasons, Emacs treats the @key{DEL} character as |
341 the control equivalent of @kbd{?}: | 341 the control equivalent of @kbd{?}: |
373 @math{2^{7}} | 373 @math{2^{7}} |
374 @end tex | 374 @end tex |
375 @ifnottex | 375 @ifnottex |
376 2**7 | 376 2**7 |
377 @end ifnottex | 377 @end ifnottex |
378 bit attached to an @sc{ascii} character indicates a meta character; thus, the | 378 bit attached to an @acronym{ASCII} character indicates a meta character; thus, the |
379 meta characters that can fit in a string have codes in the range from | 379 meta characters that can fit in a string have codes in the range from |
380 128 to 255, and are the meta versions of the ordinary @sc{ascii} | 380 128 to 255, and are the meta versions of the ordinary @acronym{ASCII} |
381 characters. (In Emacs versions 18 and older, this convention was used | 381 characters. (In Emacs versions 18 and older, this convention was used |
382 for characters outside of strings as well.) | 382 for characters outside of strings as well.) |
383 | 383 |
384 The read syntax for meta characters uses @samp{\M-}. For example, | 384 The read syntax for meta characters uses @samp{\M-}. For example, |
385 @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with | 385 @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with |
387 syntax for a character. Thus, you can write @kbd{M-A} as @samp{?\M-A}, | 387 syntax for a character. Thus, you can write @kbd{M-A} as @samp{?\M-A}, |
388 or as @samp{?\M-\101}. Likewise, you can write @kbd{C-M-b} as | 388 or as @samp{?\M-\101}. Likewise, you can write @kbd{C-M-b} as |
389 @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}. | 389 @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}. |
390 | 390 |
391 The case of a graphic character is indicated by its character code; | 391 The case of a graphic character is indicated by its character code; |
392 for example, @sc{ascii} distinguishes between the characters @samp{a} | 392 for example, @acronym{ASCII} distinguishes between the characters @samp{a} |
393 and @samp{A}. But @sc{ascii} has no way to represent whether a control | 393 and @samp{A}. But @acronym{ASCII} has no way to represent whether a control |
394 character is upper case or lower case. Emacs uses the | 394 character is upper case or lower case. Emacs uses the |
395 @tex | 395 @tex |
396 @math{2^{25}} | 396 @math{2^{25}} |
397 @end tex | 397 @end tex |
398 @ifnottex | 398 @ifnottex |
430 Finally, the most general read syntax for a character represents the | 430 Finally, the most general read syntax for a character represents the |
431 character code in either octal or hex. To use octal, write a question | 431 character code in either octal or hex. To use octal, write a question |
432 mark followed by a backslash and the octal character code (up to three | 432 mark followed by a backslash and the octal character code (up to three |
433 octal digits); thus, @samp{?\101} for the character @kbd{A}, | 433 octal digits); thus, @samp{?\101} for the character @kbd{A}, |
434 @samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the | 434 @samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the |
435 character @kbd{C-b}. Although this syntax can represent any @sc{ascii} | 435 character @kbd{C-b}. Although this syntax can represent any @acronym{ASCII} |
436 character, it is preferred only when the precise octal value is more | 436 character, it is preferred only when the precise octal value is more |
437 important than the @sc{ascii} representation. | 437 important than the @acronym{ASCII} representation. |
438 | 438 |
439 @example | 439 @example |
440 @group | 440 @group |
441 ?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10 | 441 ?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10 |
442 ?\101 @result{} 65 ?A @result{} 65 | 442 ?\101 @result{} 65 ?A @result{} 65 |
913 in documentation strings, | 913 in documentation strings, |
914 but the newline is ignored if escaped." | 914 but the newline is ignored if escaped." |
915 @end example | 915 @end example |
916 | 916 |
917 @node Non-ASCII in Strings | 917 @node Non-ASCII in Strings |
918 @subsubsection Non-@sc{ascii} Characters in Strings | 918 @subsubsection Non-@acronym{ASCII} Characters in Strings |
919 | 919 |
920 You can include a non-@sc{ascii} international character in a string | 920 You can include a non-@acronym{ASCII} international character in a string |
921 constant by writing it literally. There are two text representations | 921 constant by writing it literally. There are two text representations |
922 for non-@sc{ascii} characters in Emacs strings (and in buffers): unibyte | 922 for non-@acronym{ASCII} characters in Emacs strings (and in buffers): unibyte |
923 and multibyte. If the string constant is read from a multibyte source, | 923 and multibyte. If the string constant is read from a multibyte source, |
924 such as a multibyte buffer or string, or a file that would be visited as | 924 such as a multibyte buffer or string, or a file that would be visited as |
925 multibyte, then the character is read as a multibyte character, and that | 925 multibyte, then the character is read as a multibyte character, and that |
926 makes the string multibyte. If the string constant is read from a | 926 makes the string multibyte. If the string constant is read from a |
927 unibyte source, then the character is read as unibyte and that makes the | 927 unibyte source, then the character is read as unibyte and that makes the |
928 string unibyte. | 928 string unibyte. |
929 | 929 |
930 You can also represent a multibyte non-@sc{ascii} character with its | 930 You can also represent a multibyte non-@acronym{ASCII} character with its |
931 character code: use a hex escape, @samp{\x@var{nnnnnnn}}, with as many | 931 character code: use a hex escape, @samp{\x@var{nnnnnnn}}, with as many |
932 digits as necessary. (Multibyte non-@sc{ascii} character codes are all | 932 digits as necessary. (Multibyte non-@acronym{ASCII} character codes are all |
933 greater than 256.) Any character which is not a valid hex digit | 933 greater than 256.) Any character which is not a valid hex digit |
934 terminates this construct. If the next character in the string could be | 934 terminates this construct. If the next character in the string could be |
935 interpreted as a hex digit, write @w{@samp{\ }} (backslash and space) to | 935 interpreted as a hex digit, write @w{@samp{\ }} (backslash and space) to |
936 terminate the hex escape---for example, @w{@samp{\x8e0\ }} represents | 936 terminate the hex escape---for example, @w{@samp{\x8e0\ }} represents |
937 one character, @samp{a} with grave accent. @w{@samp{\ }} in a string | 937 one character, @samp{a} with grave accent. @w{@samp{\ }} in a string |
938 constant is just like backslash-newline; it does not contribute any | 938 constant is just like backslash-newline; it does not contribute any |
939 character to the string, but it does terminate the preceding hex escape. | 939 character to the string, but it does terminate the preceding hex escape. |
940 | 940 |
941 Using a multibyte hex escape forces the string to multibyte. You can | 941 Using a multibyte hex escape forces the string to multibyte. You can |
942 represent a unibyte non-@sc{ascii} character with its character code, | 942 represent a unibyte non-@acronym{ASCII} character with its character code, |
943 which must be in the range from 128 (0200 octal) to 255 (0377 octal). | 943 which must be in the range from 128 (0200 octal) to 255 (0377 octal). |
944 This forces a unibyte string. | 944 This forces a unibyte string. |
945 | 945 |
946 @xref{Text Representations}, for more information about the two | 946 @xref{Text Representations}, for more information about the two |
947 text representations. | 947 text representations. |
956 them, like this: @code{"\t, \C-a"}. @xref{Character Type}, for a | 956 them, like this: @code{"\t, \C-a"}. @xref{Character Type}, for a |
957 description of the read syntax for characters. | 957 description of the read syntax for characters. |
958 | 958 |
959 However, not all of the characters you can write with backslash | 959 However, not all of the characters you can write with backslash |
960 escape-sequences are valid in strings. The only control characters that | 960 escape-sequences are valid in strings. The only control characters that |
961 a string can hold are the @sc{ascii} control characters. Strings do not | 961 a string can hold are the @acronym{ASCII} control characters. Strings do not |
962 distinguish case in @sc{ascii} control characters. | 962 distinguish case in @acronym{ASCII} control characters. |
963 | 963 |
964 Properly speaking, strings cannot hold meta characters; but when a | 964 Properly speaking, strings cannot hold meta characters; but when a |
965 string is to be used as a key sequence, there is a special convention | 965 string is to be used as a key sequence, there is a special convention |
966 that provides a way to represent meta versions of @sc{ascii} characters in a | 966 that provides a way to represent meta versions of @acronym{ASCII} characters in a |
967 string. If you use the @samp{\M-} syntax to indicate a meta character | 967 string. If you use the @samp{\M-} syntax to indicate a meta character |
968 in a string constant, this sets the | 968 in a string constant, this sets the |
969 @tex | 969 @tex |
970 @math{2^{7}} | 970 @math{2^{7}} |
971 @end tex | 971 @end tex |
1875 @end example | 1875 @end example |
1876 | 1876 |
1877 Comparison of strings is case-sensitive, but does not take account of | 1877 Comparison of strings is case-sensitive, but does not take account of |
1878 text properties---it compares only the characters in the strings. | 1878 text properties---it compares only the characters in the strings. |
1879 A unibyte string never equals a multibyte string unless the | 1879 A unibyte string never equals a multibyte string unless the |
1880 contents are entirely @sc{ascii} (@pxref{Text Representations}). | 1880 contents are entirely @acronym{ASCII} (@pxref{Text Representations}). |
1881 | 1881 |
1882 @example | 1882 @example |
1883 @group | 1883 @group |
1884 (equal "asdf" "ASDF") | 1884 (equal "asdf" "ASDF") |
1885 @result{} nil | 1885 @result{} nil |