comparison lispref/objects.texi @ 72859:c5744ceda9ba

(Character Type): Node split. Add xref to Describing Characters. (Basic Char Syntax, General Escape Syntax) (Ctl-Char Syntax, Meta-Char Syntax): New subnodes.
author Richard M. Stallman <rms@gnu.org>
date Thu, 14 Sep 2006 01:43:18 +0000
parents a02949a3a808
children 6d19c76d81c5 a1a25ac6c88a
72858:a9629d84bf9f 72859:c5744ceda9ba
225 225
226 A @dfn{character} in Emacs Lisp is nothing more than an integer. In 226 A @dfn{character} in Emacs Lisp is nothing more than an integer. In
227 other words, characters are represented by their character codes. For 227 other words, characters are represented by their character codes. For
228 example, the character @kbd{A} is represented as the @w{integer 65}. 228 example, the character @kbd{A} is represented as the @w{integer 65}.
229 229
230 Individual characters are not often used in programs. It is far more 230 Individual characters are used occasionally in programs, but it is
231 common to work with @emph{strings}, which are sequences composed of 231 more common to work with @emph{strings}, which are sequences composed
232 characters. @xref{String Type}. 232 of characters. @xref{String Type}.
233 233
234 Characters in strings, buffers, and files are currently limited to 234 Characters in strings, buffers, and files are currently limited to
235 the range of 0 to 524287---nineteen bits. But not all values in that 235 the range of 0 to 524287---nineteen bits. But not all values in that
236 range are valid character codes. Codes 0 through 127 are 236 range are valid character codes. Codes 0 through 127 are
237 @acronym{ASCII} codes; the rest are non-@acronym{ASCII} 237 @acronym{ASCII} codes; the rest are non-@acronym{ASCII}
238 (@pxref{Non-ASCII Characters}). Characters that represent keyboard 238 (@pxref{Non-ASCII Characters}). Characters that represent keyboard
239 input have a much wider range, to encode modifier keys such as 239 input have a much wider range, to encode modifier keys such as
240 Control, Meta and Shift. 240 Control, Meta and Shift.
241 241
242 There are special functions for producing a human-readable textual
243 description of a character for the sake of messages. @xref{Describing
244 Characters}.
245
246 @menu
247 * Basic Char Syntax::
248 * General Escape Syntax::
249 * Ctl-Char Syntax::
250 * Meta-Char Syntax::
251 * Other Char Bits::
252 @end menu
253
254 @node Basic Char Syntax
255 @subsubsection Basic Char Syntax
242 @cindex read syntax for characters 256 @cindex read syntax for characters
243 @cindex printed representation for characters 257 @cindex printed representation for characters
244 @cindex syntax for characters 258 @cindex syntax for characters
245 @cindex @samp{?} in character constant 259 @cindex @samp{?} in character constant
246 @cindex question mark in character constant 260 @cindex question mark in character constant
247 Since characters are really integers, the printed representation of a 261
248 character is a decimal number. This is also a possible read syntax for 262 Since characters are really integers, the printed representation of
249 a character, but writing characters that way in Lisp programs is a very 263 a character is a decimal number. This is also a possible read syntax
250 bad idea. You should @emph{always} use the special read syntax formats 264 for a character, but writing characters that way in Lisp programs is
251 that Emacs Lisp provides for characters. These syntax formats start 265 not clear programming. You should @emph{always} use the special read
252 with a question mark. 266 syntax formats that Emacs Lisp provides for characters. These syntax
267 formats start with a question mark.
253 268
254 The usual read syntax for alphanumeric characters is a question mark 269 The usual read syntax for alphanumeric characters is a question mark
255 followed by the character; thus, @samp{?A} for the character 270 followed by the character; thus, @samp{?A} for the character
256 @kbd{A}, @samp{?B} for the character @kbd{B}, and @samp{?a} for the 271 @kbd{A}, @samp{?B} for the character @kbd{B}, and @samp{?a} for the
257 character @kbd{a}. 272 character @kbd{a}.
313 @dfn{escape sequences}, because backslash plays the role of an 328 @dfn{escape sequences}, because backslash plays the role of an
314 ``escape character''; this terminology has nothing to do with the 329 ``escape character''; this terminology has nothing to do with the
315 character @key{ESC}. @samp{\s} is meant for use in character 330 character @key{ESC}. @samp{\s} is meant for use in character
316 constants; in string constants, just write the space. 331 constants; in string constants, just write the space.
317 332
333 A backslash is allowed, and harmless, preceding any character without
334 a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}.
335 There is no reason to add a backslash before most characters. However,
336 you should add a backslash before any of the characters
337 @samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing
338 Lisp code. You can also add a backslash before whitespace characters such as
339 space, tab, newline and formfeed. However, it is cleaner to use one of
340 the easily readable escape sequences, such as @samp{\t} or @samp{\s},
341 instead of an actual whitespace character such as a tab or a space.
342 (If you do write backslash followed by a space, you should write
343 an extra space after the character constant to separate it from the
344 following text.)
345
346 @node General Escape Syntax
347 @subsubsection General Escape Syntax
348
349 In addition to the specific escape sequences for special important
350 control characters, Emacs provides general categories of escape syntax
351 that you can use to specify non-ASCII text characters.
352
353 @cindex unicode character escape
354 For instance, you can specify characters by their Unicode values.
355 @code{?\u@var{nnnn}} represents a character that maps to the Unicode
356 code point @samp{U+@var{nnnn}}. There is a slightly different syntax
357 for specifying characters with code points above @code{#xFFFF};
358 @code{\U00@var{nnnnnn}} represents the character whose Unicode code
359 point is @samp{U+@var{nnnnnn}}, if such a character is supported by
360 Emacs. If the corresponding character is not supported, Emacs signals
361 an error.
362
363 This peculiar and inconvenient syntax was adopted for compatibility
364 with other programming languages. Unlike some other languages, Emacs
365 Lisp supports this syntax in only character literals and strings.
366
367 @cindex @samp{\} in character constant
368 @cindex backslash in character constant
369 @cindex octal character code
370 The most general read syntax for a character represents the
371 character code in either octal or hex. To use octal, write a question
372 mark followed by a backslash and the octal character code (up to three
373 octal digits); thus, @samp{?\101} for the character @kbd{A},
374 @samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
375 character @kbd{C-b}. Although this syntax can represent any
376 @acronym{ASCII} character, it is preferred only when the precise octal
377 value is more important than the @acronym{ASCII} representation.
378
379 @example
380 @group
381 ?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10
382 ?\101 @result{} 65 ?A @result{} 65
383 @end group
384 @end example
385
386 To use hex, write a question mark followed by a backslash, @samp{x},
387 and the hexadecimal character code. You can use any number of hex
388 digits, so you can represent any character code in this way.
389 Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
390 character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character
391 @iftex
392 @samp{@`a}.
393 @end iftex
394 @ifnottex
395 @samp{a} with grave accent.
396 @end ifnottex
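The octal and hex examples in the new General Escape Syntax node can be cross-checked with a little arithmetic. The sketch below uses Python purely as a calculator (it is not part of the manual), and the helper name `char_code` is hypothetical. It converts only the digit strings that follow the backslash and does not model the Emacs Lisp reader.

```python
def char_code(digits: str, base: int) -> int:
    """Return the integer character code written as `digits` in `base`."""
    return int(digits, base)

# Digit strings taken from the examples in the text above.
octal_a = char_code("101", 8)    # digits of ?\101, the character A
octal_nl = char_code("012", 8)   # digits of ?\012, same code as ?\n
octal_ctl_b = char_code("002", 8)  # digits of ?\002, the character C-b
hex_a = char_code("41", 16)      # digits of ?\x41, the character A
hex_ctl_a = char_code("1", 16)   # digits of ?\x1, the character C-a
```

Both notations name the same integers: `?\101` and `?\x41` are two spellings of the character code 65.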
397
398 @node Ctl-Char Syntax
399 @subsubsection Control-Character Syntax
400
318 @cindex control characters 401 @cindex control characters
319 Control characters may be represented using yet another read syntax. 402 Control characters can be represented using yet another read syntax.
320 This consists of a question mark followed by a backslash, caret, and the 403 This consists of a question mark followed by a backslash, caret, and the
321 corresponding non-control character, in either upper or lower case. For 404 corresponding non-control character, in either upper or lower case. For
322 example, both @samp{?\^I} and @samp{?\^i} are valid read syntax for the 405 example, both @samp{?\^I} and @samp{?\^i} are valid read syntax for the
323 character @kbd{C-i}, the character whose value is 9. 406 character @kbd{C-i}, the character whose value is 9.
324 407
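The statement that both @samp{?\^I} and @samp{?\^i} denote the character with value 9 follows from the usual ASCII convention for control characters. The sketch below assumes that convention (upcase the character, then flip the bit with value 64); the function name `ascii_control` is hypothetical, Python is used only for illustration, and the sketch does not cover characters that have no ASCII control equivalent.

```python
def ascii_control(ch: str) -> int:
    """Return the ASCII control code for a single character.

    Assumes the standard ASCII convention: upcase the character,
    then flip the bit with value 64. Only characters whose upcased
    code lies in 63..95 ('?' through '_') have an ASCII control code.
    """
    code = ord(ch.upper())
    if not 63 <= code <= 95:
        raise ValueError(f"{ch!r} has no ASCII control equivalent")
    return code ^ 64

ctl_upper_i = ascii_control("I")  # corresponds to ?\^I
ctl_lower_i = ascii_control("i")  # corresponds to ?\^i
ctl_j = ascii_control("j")        # corresponds to C-j
```

Upper and lower case give the same result because the letter is upcased before the bit is flipped.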
361 we recommend the @samp{^} syntax; for control characters in keyboard 444 we recommend the @samp{^} syntax; for control characters in keyboard
362 input, we prefer the @samp{C-} syntax. Which one you use does not 445 input, we prefer the @samp{C-} syntax. Which one you use does not
363 affect the meaning of the program, but may guide the understanding of 446 affect the meaning of the program, but may guide the understanding of
364 people who read it. 447 people who read it.
365 448
449 @node Meta-Char Syntax
450 @subsubsection Meta-Character Syntax
451
366 @cindex meta characters 452 @cindex meta characters
367 A @dfn{meta character} is a character typed with the @key{META} 453 A @dfn{meta character} is a character typed with the @key{META}
368 modifier key. The integer that represents such a character has the 454 modifier key. The integer that represents such a character has the
369 @tex 455 @tex
370 @math{2^{27}} 456 @math{2^{27}}
392 @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with 478 @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with
393 octal character codes (see below), with @samp{\C-}, or with any other 479 octal character codes (see below), with @samp{\C-}, or with any other
394 syntax for a character. Thus, you can write @kbd{M-A} as @samp{?\M-A}, 480 syntax for a character. Thus, you can write @kbd{M-A} as @samp{?\M-A},
395 or as @samp{?\M-\101}. Likewise, you can write @kbd{C-M-b} as 481 or as @samp{?\M-\101}. Likewise, you can write @kbd{C-M-b} as
396 @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}. 482 @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}.
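The equivalences listed for @samp{\M-} can be checked numerically. The sketch below assumes only what the node states, namely that a meta character has the 2**27 bit set; the names `META_BIT` and `meta` are hypothetical and Python is used only as a calculator.

```python
META_BIT = 2 ** 27  # modifier bit for Meta, as stated in the text

def meta(code: int) -> int:
    """Return `code` with the meta modifier bit set."""
    return code | META_BIT

m_upper_a = meta(ord("A"))    # ?\M-A
m_octal_a = meta(0o101)       # ?\M-\101, the same character
c_m_b = meta(0o002)           # ?\M-\002, i.e. C-M-b
```

Because the modifier is a single bit on top of the base code, every spelling of the base character yields the same meta character.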
483
484 @node Other Char Bits
485 @subsubsection Other Character Modifier Bits
397 486
398 The case of a graphic character is indicated by its character code; 487 The case of a graphic character is indicated by its character code;
399 for example, @acronym{ASCII} distinguishes between the characters @samp{a} 488 for example, @acronym{ASCII} distinguishes between the characters @samp{a}
400 and @samp{A}. But @acronym{ASCII} has no way to represent whether a control 489 and @samp{A}. But @acronym{ASCII} has no way to represent whether a control
401 character is upper case or lower case. Emacs uses the 490 character is upper case or lower case. Emacs uses the
429 @ifnottex 518 @ifnottex
430 Numerically, the 519 Numerically, the
431 bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper. 520 bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper.
432 @end ifnottex 521 @end ifnottex
433 522
434 @cindex unicode character escape
435 Emacs provides a syntax for specifying characters by their Unicode
436 code points. @code{?\u@var{nnnn}} represents a character that maps to
437 the Unicode code point @samp{U+@var{nnnn}}. There is a slightly
438 different syntax for specifying characters with code points above
439 @code{#xFFFF}; @code{\U00@var{nnnnnn}} represents the character whose
440 Unicode code point is @samp{U+@var{nnnnnn}}, if such a character
441 is supported by Emacs. If the corresponding character is not
442 supported, Emacs signals an error.
443
444 This peculiar and inconvenient syntax was adopted for compatibility
445 with other programming languages. Unlike some other languages, Emacs
446 Lisp supports this syntax in only character literals and strings.
447
448 @cindex @samp{\} in character constant
449 @cindex backslash in character constant
450 @cindex octal character code
451 Finally, the most general read syntax for a character represents the
452 character code in either octal or hex. To use octal, write a question
453 mark followed by a backslash and the octal character code (up to three
454 octal digits); thus, @samp{?\101} for the character @kbd{A},
455 @samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
456 character @kbd{C-b}. Although this syntax can represent any @acronym{ASCII}
457 character, it is preferred only when the precise octal value is more
458 important than the @acronym{ASCII} representation.
459
460 @example
461 @group
462 ?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10
463 ?\101 @result{} 65 ?A @result{} 65
464 @end group
465 @end example
466
467 To use hex, write a question mark followed by a backslash, @samp{x},
468 and the hexadecimal character code. You can use any number of hex
469 digits, so you can represent any character code in this way.
470 Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
471 character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character
472 @iftex
473 @samp{@`a}.
474 @end iftex
475 @ifnottex
476 @samp{a} with grave accent.
477 @end ifnottex
478
479 A backslash is allowed, and harmless, preceding any character without
480 a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}.
481 There is no reason to add a backslash before most characters. However,
482 you should add a backslash before any of the characters
483 @samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing
484 Lisp code. You can also add a backslash before whitespace characters such as
485 space, tab, newline and formfeed. However, it is cleaner to use one of
486 the easily readable escape sequences, such as @samp{\t} or @samp{\s},
487 instead of an actual whitespace character such as a tab or a space.
488 (If you do write backslash followed by a space, you should write
489 an extra space after the character constant to separate it from the
490 following text.)
491
492 @node Symbol Type 523 @node Symbol Type
493 @subsection Symbol Type 524 @subsection Symbol Type
494 525
495 A @dfn{symbol} in GNU Emacs Lisp is an object with a name. The 526 A @dfn{symbol} in GNU Emacs Lisp is an object with a name. The
496 symbol name serves as the printed representation of the symbol. In 527 symbol name serves as the printed representation of the symbol. In