emacs: doc/lispref/nonascii.texi comparison

comparison doc/lispref/nonascii.texi @ 106711:b87d77f96245

Consistently use hex notation to represent character codes.

* nonascii.texi (Text Representations, Character Codes)
(Converting Representations, Explicit Encoding)
(Translation of Characters): Use hex notation consistently.
(Character Sets): Fix map-charset-chars doc (Bug#5197).

author	Chong Yidong <cyd@stupidchicken.com>
date	Sat, 02 Jan 2010 13:55:19 -0500
parents	810bd90737d5
children	1d1d5d9bd884
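The changeset replaces decimal character codes with hexadecimal ones and keeps some decimal values in parentheses. The Python sketch below is a quick sanity check of those pairings. Python serves only as an illustration language here; Emacs Lisp itself writes these literals as `#x10FFFF`. It asserts that each hex boundary from the "+" lines equals the decimal figure used in the "-" lines. The dictionary keys are descriptive labels made up for this example.

```python
# Boundary character codes that the revised nonascii.texi writes in hex,
# paired with the decimal values the old text used (both taken from the diff).
# The keys are illustrative labels, not Emacs terminology.
BOUNDARIES = {
    "last unibyte character code": (0xFF, 255),
    "last ASCII character": (0x7F, 127),
    "first non-ASCII character": (0x80, 128),
    "last Unicode codepoint": (0x10FFFF, 1114111),
    "first non-unified character": (0x110000, 1114112),
    "last non-ASCII character": (0x3FFF7F, 4194175),
    "first eight-bit raw byte": (0x3FFF80, 4194176),
    "last multibyte character code": (0x3FFFFF, 4194303),
}

for name, (hex_value, decimal_value) in BOUNDARIES.items():
    # Each hex literal must equal the decimal figure it replaces.
    assert hex_value == decimal_value, name
    print(f"{name:32} #x{hex_value:X} = {decimal_value}")

# The largest code fits in 22 bits, matching "a 22-bit integer number".
assert 0x3FFFFF == 2**22 - 1
```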

-:a96887ed3368
+:b87d77f96245
 @cindex Unicode
 To support this multitude of characters and scripts, Emacs closely
 follows the @dfn{Unicode Standard}.  The Unicode Standard assigns a
 unique number, called a @dfn{codepoint}, to each and every character.
 The range of codepoints defined by Unicode, or the Unicode
-@dfn{codespace}, is @code{0..10FFFF} (in hex), inclusive.  Emacs
+@dfn{codespace}, is @code{0..#x10FFFF} (in hexadecimal notation),
-extends this range with codepoints in the range @code{110000..3FFFFF},
+inclusive.  Emacs extends this range with codepoints in the range
-which it uses for representing characters that are not unified with
+@code{#x110000..#x3FFFFF}, which it uses for representing characters
-Unicode and raw 8-bit bytes that cannot be interpreted as characters
+that are not unified with Unicode and @dfn{raw 8-bit bytes} that
-(the latter occupy the range @code{3FFF80..3FFFFF}).  Thus, a
+cannot be interpreted as characters.  Thus, a character codepoint in
-character codepoint in Emacs is a 22-bit integer number.
+Emacs is a 22-bit integer number.
 @cindex internal representation of characters
 @cindex characters, representation in buffers and strings
 @cindex multibyte text
 To conserve memory, Emacs does not hold fixed-length 22-bit numbers
 This function returns a multibyte string containing the same sequence
 of characters as @var{string}.  If @var{string} is a multibyte string,
 it is returned unchanged.  The function assumes that @var{string}
 includes only @acronym{ASCII} characters and raw 8-bit bytes; the
 latter are converted to their multibyte representation corresponding
-to the codepoints in the @code{3FFF80..3FFFFF} area (@pxref{Text
+to the codepoints @code{#x3FFF80} through @code{#x3FFFFF}, inclusive
-Representations, codepoints}).
+(@pxref{Text Representations, codepoints}).
 @end defun
 @defun string-to-unibyte string
 This function returns a unibyte string containing the same sequence of
 characters as @var{string}.  It signals an error if @var{string}
 @section Character Codes
 @cindex character codes
 The unibyte and multibyte text representations use different
 character codes.  The valid character codes for unibyte representation
-range from 0 to 255---the values that can fit in one byte.  The valid
+range from 0 to @code{#xFF} (255)---the values that can fit in one
-character codes for multibyte representation range from 0 to 4194303
+byte.  The valid character codes for multibyte representation range
-(#x3FFFFF).  In this code space, values 0 through 127 are for
+from 0 to @code{#x3FFFFF}.  In this code space, values 0 through
-@acronym{ASCII} characters, and values 128 through 4194175 (#x3FFF7F)
+@code{#x7F} (127) are for @acronym{ASCII} characters, and values
-are for non-@acronym{ASCII} characters.  Values 0 through 1114111
+@code{#x80} (128) through @code{#x3FFF7F} (4194175) are for
-(#10FFFF) correspond to Unicode characters of the same codepoint;
+non-@acronym{ASCII} characters.
-values 1114112 (#110000) through 4194175 (#x3FFF7F) represent
-characters that are not unified with Unicode; and values 4194176
+Emacs character codes are a superset of the Unicode standard.
-(#x3FFF80) through 4194303 (#x3FFFFF) represent eight-bit raw bytes.
+Values 0 through @code{#x10FFFF} (1114111) correspond to Unicode
+characters of the same codepoint; values @code{#x110000} (1114112)
+through @code{#x3FFF7F} (4194175) represent characters that are not
+unified with Unicode; and values @code{#x3FFF80} (4194176) through
+@code{#x3FFFFF} (4194303) represent eight-bit raw bytes.
 @defun characterp charcode
 This returns @code{t} if @var{charcode} is a valid character, and
 @code{nil} otherwise.
 @cindex @code{emacs}, a charset
 @cindex @code{unicode}, a charset
 @cindex @code{eight-bit}, a charset
 Emacs defines several special character sets.  The character set
 @code{unicode} includes all the characters whose Emacs code points are
-in the range @code{0..10FFFF}.  The character set @code{emacs}
+in the range @code{0..#x10FFFF}.  The character set @code{emacs}
 includes all @acronym{ASCII} and non-@acronym{ASCII} characters.
 Finally, the @code{eight-bit} charset includes the 8-bit raw bytes;
 Emacs uses it to represent raw bytes encountered in text.
 @defun charsetp object
 @end defun
 The following function comes in handy for applying a certain
 function to all or part of the characters in a charset:
-@defun map-charset-chars function charset &optional arg from to
+@defun map-charset-chars function charset &optional arg from-code to-code
 Call @var{function} for characters in @var{charset}.  @var{function}
 is called with two arguments.  The first one is a cons cell
 @code{(@var{from} .  @var{to})}, where @var{from} and @var{to}
 indicate a range of characters contained in charset.  The second
-argument is the optional argument @var{arg}.
+argument passed to @var{function} is @var{arg}.
 By default, the range of codepoints passed to @var{function} includes
 all the characters in @var{charset}, but optional arguments
 @var{from-code} and @var{to-code} limit that to the range of
 characters between these two codepoints of @var{charset}.  If either
 This variable automatically becomes buffer-local when set.
 @end defvar
 @defun make-translation-table-from-vector vec
 This function returns a translation table made from @var{vec} that is
-an array of 256 elements to map byte values 0 through 255 to
+an array of 256 elements to map bytes (values 0 through #xFF) to
 characters.  Elements may be @code{nil} for untranslated bytes.  The
 returned table has a translation table for reverse mapping in the
 first extra slot, and the value @code{1} in the second extra slot.
 This function provides an easy way to make a private coding system
 The result of encoding, and the input to decoding, are not ordinary
 text.  They logically consist of a series of byte values; that is, a
 series of @acronym{ASCII} and eight-bit characters.  In unibyte
 buffers and strings, these characters have codes in the range 0
-through 255.  In a multibyte buffer or string, eight-bit characters
+through #xFF (255).  In a multibyte buffer or string, eight-bit
-have character codes higher than 255 (@pxref{Text Representations}),
+characters have character codes higher than #xFF (@pxref{Text
-but Emacs transparently converts them to their single-byte values when
+Representations}), but Emacs transparently converts them to their
-you encode or decode such text.
+single-byte values when you encode or decode such text.
 The usual way to read a file into a buffer as a sequence of bytes, so
 you can decode the contents explicitly, is with
 @code{insert-file-contents-literally} (@pxref{Reading from Files});
 alternatively, specify a non-@code{nil} @var{rawfile} argument when
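The revised "Character Codes" and "Converting Representations" text describes how the multibyte code space is partitioned and where raw 8-bit bytes end up. The following Python model restates those ranges. It is a sketch, not Emacs code, and the function and constant names are hypothetical; inside Emacs, `characterp` and the charset machinery do this work. It assumes that raw bytes `#x80..#xFF` map in order onto `#x3FFF80..#x3FFFFF`, that is, code = byte + `#x3FFF00`, which is consistent with the ranges in the diff.

```python
# Hypothetical model of the Emacs multibyte character-code layout described
# in nonascii.texi. Not Emacs's implementation; names are illustrative.
MAX_ASCII = 0x7F
MAX_UNICODE = 0x10FFFF            # end of the Unicode codespace
FIRST_RAW_BYTE_CHAR = 0x3FFF80    # first code used for eight-bit raw bytes
MAX_CHAR = 0x3FFFFF               # largest valid multibyte character code
RAW_BYTE_OFFSET = FIRST_RAW_BYTE_CHAR - 0x80  # == 0x3FFF00 (assumed in-order mapping)


def classify_char_code(code: int) -> str:
    """Return which part of the multibyte code space CODE falls in."""
    if not 0 <= code <= MAX_CHAR:
        return "invalid"          # characterp would return nil
    if code <= MAX_ASCII:
        return "ascii"
    if code <= MAX_UNICODE:
        return "unicode"          # non-ASCII Unicode codepoint
    if code < FIRST_RAW_BYTE_CHAR:
        return "non-unified"      # Emacs-only, not unified with Unicode
    return "raw-byte"             # eight-bit raw byte


def raw_byte_to_char(byte: int) -> int:
    """Map a raw 8-bit byte (#x80..#xFF) to its multibyte character code."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError(f"not a raw 8-bit byte: {byte:#x}")
    return byte + RAW_BYTE_OFFSET


def char_to_raw_byte(code: int) -> int:
    """Inverse of raw_byte_to_char, as happens when such text is encoded."""
    if classify_char_code(code) != "raw-byte":
        raise ValueError(f"not a raw-byte character: {code:#x}")
    return code - RAW_BYTE_OFFSET


print(classify_char_code(0x41))       # ascii
print(classify_char_code(0x3042))     # unicode
print(hex(raw_byte_to_char(0xFF)))    # 0x3fffff
```

Keeping the offset in a single constant makes the round trip between a raw byte and its multibyte code explicit. In a unibyte string, the same byte simply keeps its value in the range 0 to `#xFF`.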
