changeset 72859:c5744ceda9ba

(Character Type): Node split. Add xref to Describing Characters. (Basic Char Syntax, General Escape Syntax) (Ctl-Char Syntax, Meta-Char Syntax): New subnodes.
author Richard M. Stallman <rms@gnu.org>
date Thu, 14 Sep 2006 01:43:18 +0000
parents a9629d84bf9f
children cc89b0870fa8
files lispref/objects.texi
diffstat 1 files changed, 99 insertions(+), 68 deletions(-)
--- a/lispref/objects.texi	Thu Sep 14 01:21:06 2006 +0000
+++ b/lispref/objects.texi	Thu Sep 14 01:43:18 2006 +0000
@@ -227,9 +227,9 @@
 other words, characters are represented by their character codes.  For
 example, the character @kbd{A} is represented as the @w{integer 65}.
 
-  Individual characters are not often used in programs.  It is far more
-common to work with @emph{strings}, which are sequences composed of
-characters.  @xref{String Type}.
+  Individual characters are used occasionally in programs, but it is
+more common to work with @emph{strings}, which are sequences composed
+of characters.  @xref{String Type}.
 
   Characters in strings, buffers, and files are currently limited to
 the range of 0 to 524287---nineteen bits.  But not all values in that
@@ -239,17 +239,32 @@
 input have a much wider range, to encode modifier keys such as
 Control, Meta and Shift.
 
+  There are special functions for producing a human-readable textual
+description of a character for the sake of messages.  @xref{Describing
+Characters}.
+
+@menu
+* Basic Char Syntax::
+* General Escape Syntax::
+* Ctl-Char Syntax::
+* Meta-Char Syntax::
+* Other Char Bits::
+@end menu
+
+@node Basic Char Syntax
+@subsubsection Basic Char Syntax
 @cindex read syntax for characters
 @cindex printed representation for characters
 @cindex syntax for characters
 @cindex @samp{?} in character constant
 @cindex question mark in character constant
-  Since characters are really integers, the printed representation of a
-character is a decimal number.  This is also a possible read syntax for
-a character, but writing characters that way in Lisp programs is a very
-bad idea.  You should @emph{always} use the special read syntax formats
-that Emacs Lisp provides for characters.  These syntax formats start
-with a question mark.
+
+  Since characters are really integers, the printed representation of
+a character is a decimal number.  This is also a possible read syntax
+for a character, but writing characters that way in Lisp programs is
+not clear programming.  You should @emph{always} use the special read
+syntax formats that Emacs Lisp provides for characters.  These syntax
+formats start with a question mark.
 
   The usual read syntax for alphanumeric characters is a question mark
 followed by the character; thus, @samp{?A} for the character
@@ -315,8 +330,76 @@
 character @key{ESC}.  @samp{\s} is meant for use in character
 constants; in string constants, just write the space.
 
+  A backslash is allowed, and harmless, preceding any character without
+a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}.
+There is no reason to add a backslash before most characters.  However,
+you should add a backslash before any of the characters
+@samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing
+Lisp code.  You can also add a backslash before whitespace characters such as
+space, tab, newline and formfeed.  However, it is cleaner to use one of
+the easily readable escape sequences, such as @samp{\t} or @samp{\s},
+instead of an actual whitespace character such as a tab or a space.
+(If you do write backslash followed by a space, you should write
+an extra space after the character constant to separate it from the
+following text.)
+
+@node General Escape Syntax
+@subsubsection General Escape Syntax
+
+  In addition to the specific escape sequences for special important
+control characters, Emacs provides general categories of escape syntax
+that you can use to specify non-ASCII text characters.
+
+@cindex unicode character escape
+  For instance, you can specify characters by their Unicode values.
+@code{?\u@var{nnnn}} represents a character that maps to the Unicode
+code point @samp{U+@var{nnnn}}.  There is a slightly different syntax
+for specifying characters with code points above @code{#xFFFF};
+@code{\U00@var{nnnnnn}} represents the character whose Unicode code
+point is @samp{U+@var{nnnnnn}}, if such a character is supported by
+Emacs.  If the corresponding character is not supported, Emacs signals
+an error.
+
+  This peculiar and inconvenient syntax was adopted for compatibility
+with other programming languages.  Unlike some other languages, Emacs
+Lisp supports this syntax only in character literals and strings.
+
+@cindex @samp{\} in character constant
+@cindex backslash in character constant
+@cindex octal character code
+  The most general read syntax for a character represents the
+character code in either octal or hex.  To use octal, write a question
+mark followed by a backslash and the octal character code (up to three
+octal digits); thus, @samp{?\101} for the character @kbd{A},
+@samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
+character @kbd{C-b}.  Although this syntax can represent any
+@acronym{ASCII} character, it is preferred only when the precise octal
+value is more important than the @acronym{ASCII} representation.
+
+@example
+@group
+?\012 @result{} 10         ?\n @result{} 10         ?\C-j @result{} 10
+?\101 @result{} 65         ?A @result{} 65
+@end group
+@end example
+
+  To use hex, write a question mark followed by a backslash, @samp{x},
+and the hexadecimal character code.  You can use any number of hex
+digits, so you can represent any character code in this way.
+Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
+character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character
+@iftex
+@samp{@`a}.
+@end iftex
+@ifnottex
+@samp{a} with grave accent.
+@end ifnottex
+
+@node Ctl-Char Syntax
+@subsubsection Control-Character Syntax
+
 @cindex control characters
-  Control characters may be represented using yet another read syntax.
+  Control characters can be represented using yet another read syntax.
 This consists of a question mark followed by a backslash, caret, and the
 corresponding non-control character, in either upper or lower case.  For
 example, both @samp{?\^I} and @samp{?\^i} are valid read syntax for the
@@ -363,6 +446,9 @@
 affect the meaning of the program, but may guide the understanding of
 people who read it.
 
+@node Meta-Char Syntax
+@subsubsection Meta-Character Syntax
+
 @cindex meta characters
   A @dfn{meta character} is a character typed with the @key{META}
 modifier key.  The integer that represents such a character has the
@@ -395,6 +481,9 @@
 or as @samp{?\M-\101}.  Likewise, you can write @kbd{C-M-b} as
 @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}.
 
+@node Other Char Bits
+@subsubsection Other Character Modifier Bits
+
   The case of a graphic character is indicated by its character code;
 for example, @acronym{ASCII} distinguishes between the characters @samp{a}
 and @samp{A}.  But @acronym{ASCII} has no way to represent whether a control
@@ -431,64 +520,6 @@
 bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper.
 @end ifnottex
 
-@cindex unicode character escape
-  Emacs provides a syntax for specifying characters by their Unicode
-code points.  @code{?\u@var{nnnn}} represents a character that maps to
-the Unicode code point @samp{U+@var{nnnn}}.  There is a slightly
-different syntax for specifying characters with code points above
-@code{#xFFFF}; @code{\U00@var{nnnnnn}} represents the character whose
-Unicode code point is @samp{U+@var{nnnnnn}}, if such a character
-is supported by Emacs.  If the corresponding character is not
-supported, Emacs signals an error.
-
-  This peculiar and inconvenient syntax was adopted for compatibility
-with other programming languages.  Unlike some other languages, Emacs
-Lisp supports this syntax in only character literals and strings.
-
-@cindex @samp{\} in character constant
-@cindex backslash in character constant
-@cindex octal character code
-  Finally, the most general read syntax for a character represents the
-character code in either octal or hex.  To use octal, write a question
-mark followed by a backslash and the octal character code (up to three
-octal digits); thus, @samp{?\101} for the character @kbd{A},
-@samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
-character @kbd{C-b}.  Although this syntax can represent any @acronym{ASCII}
-character, it is preferred only when the precise octal value is more
-important than the @acronym{ASCII} representation.
-
-@example
-@group
-?\012 @result{} 10         ?\n @result{} 10         ?\C-j @result{} 10
-?\101 @result{} 65         ?A @result{} 65
-@end group
-@end example
-
-  To use hex, write a question mark followed by a backslash, @samp{x},
-and the hexadecimal character code.  You can use any number of hex
-digits, so you can represent any character code in this way.
-Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
-character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character
-@iftex
-@samp{@`a}.
-@end iftex
-@ifnottex
-@samp{a} with grave accent.
-@end ifnottex
-
-  A backslash is allowed, and harmless, preceding any character without
-a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}.
-There is no reason to add a backslash before most characters.  However,
-you should add a backslash before any of the characters
-@samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing
-Lisp code.  You can also add a backslash before whitespace characters such as
-space, tab, newline and formfeed.  However, it is cleaner to use one of
-the easily readable escape sequences, such as @samp{\t} or @samp{\s},
-instead of an actual whitespace character such as a tab or a space.
-(If you do write backslash followed by a space, you should write
-an extra space after the character constant to separate it from the
-following text.)
-
 @node Symbol Type
 @subsection Symbol Type