emacs: lispref/nonascii.texi annotate

author	Richard M. Stallman <rms@gnu.org>
date	Wed, 14 Feb 2001 15:29:59 +0000 (2001-02-14)
parents	e1d9a16467ae
children	8f8df4d24f48

rev	line source
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	1 @c --texinfo--
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	2 @c This is part of the GNU Emacs Lisp Reference Manual.
27189 d2e5f1b7d8e2 Update copyrights. Gerd Moellmann <gerd@gnu.org> parents: 27187 diff changeset	3 @c Copyright (C) 1998, 1999 Free Software Foundation, Inc.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	4 @c See the file elisp.texi for copying conditions.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	5 @setfilename ../info/characters
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	6 @node Non-ASCII Characters, Searching and Matching, Text, Top
27374 0f5edee5242b * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27362 diff changeset	7 @chapter Non-@sc{ascii} Characters
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	8 @cindex multibyte characters
27374 0f5edee5242b * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27362 diff changeset	9 @cindex non-@sc{ascii} characters
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	10
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	11 This chapter covers the special issues relating to non-@sc{ascii}
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	12 characters and how they are stored in strings and buffers.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	13
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	14 @menu
28635 cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	15 * Text Representations:: Unibyte and multibyte representations.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	16 * Converting Representations:: Converting unibyte to multibyte and vice versa.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	17 * Selecting a Representation:: Treating a byte sequence as unibyte or multibyte.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	18 * Character Codes:: How unibyte and multibyte relate to
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	19 codes of individual characters.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	20 * Character Sets:: The space of possible character codes
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	21 is divided into various character sets.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	22 * Chars and Bytes:: More information about multibyte encodings.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	23 * Splitting Characters:: Converting a character to its byte sequence.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	24 * Scanning Charsets:: Which character sets are used in a buffer?
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	25 * Translation of Characters:: Translation tables are used for conversion.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	26 * Coding Systems:: Coding systems are conversions for saving files.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	27 * Input Methods:: Input methods allow users to enter various
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	28 non-ASCII characters without special keyboards.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	29 * Locales:: Interacting with the POSIX locale.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	30 @end menu
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	31
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	32 @node Text Representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	33 @section Text Representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	34 @cindex text representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	35
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	36 Emacs has two @dfn{text representations}---two ways to represent text
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	37 in a string or buffer. These are called @dfn{unibyte} and
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	38 @dfn{multibyte}. Each string, and each buffer, uses one of these two
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	39 representations. For most purposes, you can ignore the issue of
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	40 representations, because Emacs converts text between them as
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	41 appropriate. Occasionally in Lisp programming you will need to pay
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	42 attention to the difference.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	43
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	44 @cindex unibyte text
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	45 In unibyte representation, each character occupies one byte and
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	46 therefore the possible character codes range from 0 to 255. Codes 0
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	47 through 127 are @sc{ascii} characters; the codes from 128 through 255
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	48 are used for one non-@sc{ascii} character set (you can choose which
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	49 character set by setting the variable @code{nonascii-insert-offset}).
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	50
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	51 @cindex leading code
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	52 @cindex multibyte text
22252 40089afa2b1d * empty log message * Richard M. Stallman <rms@gnu.org> parents: 22138 diff changeset	53 @cindex trailing codes
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	54 In multibyte representation, a character may occupy more than one
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	55 byte, and as a result, the full range of Emacs character codes can be
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	56 stored. The first byte of a multibyte character is always in the range
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	57 128 through 159 (octal 0200 through 0237). These values are called
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	58 @dfn{leading codes}. The second and subsequent bytes of a multibyte
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	59 character are always in the range 160 through 255 (octal 0240 through
22252 40089afa2b1d * empty log message * Richard M. Stallman <rms@gnu.org> parents: 22138 diff changeset	60 0377); these values are @dfn{trailing codes}.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	61
28877 607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	62 Some sequences of bytes are not valid in multibyte text: for example,
32523 4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	63 a single isolated byte in the range 128 through 159 is not allowed. But
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	64 character codes 128 through 159 can appear in multibyte text,
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	65 represented as two-byte sequences. All the character codes 128 through
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	66 255 are possible (though slightly abnormal) in multibyte text; they
28877 607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	67 appear in multibyte buffers and strings when you do explicit encoding
607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	68 and decoding (@pxref{Explicit Encoding}).
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	69
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	70 In a buffer, the buffer-local value of the variable
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	71 @code{enable-multibyte-characters} specifies the representation used.
24952 a6db4671c7a0 * empty log message * Karl Heuer <kwzh@gnu.org> parents: 24951 diff changeset	72 The representation for a string is determined and recorded in the string
a6db4671c7a0 * empty log message * Karl Heuer <kwzh@gnu.org> parents: 24951 diff changeset	73 when the string is constructed.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	74
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	75 @defvar enable-multibyte-characters
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	76 This variable specifies the current buffer's text representation.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	77 If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	78 it contains unibyte text.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	79
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	80 You cannot set this variable directly; instead, use the function
90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	81 @code{set-buffer-multibyte} to change a buffer's representation.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	82 @end defvar
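
As a minimal sketch of the rule above, the following form creates a
temporary buffer, changes its representation with
@code{set-buffer-multibyte}, and then reads the variable.  The use of
@code{with-temp-buffer} is only for illustration; it is not required by
anything described here.

@example
@group
(with-temp-buffer
  ;; Change the representation through the function,
  ;; not by setting the variable.
  (set-buffer-multibyte nil)
  enable-multibyte-characters)
     @result{} nil
@end group
@end example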
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	83
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	84 @defvar default-enable-multibyte-characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	85 This variable's value is entirely equivalent to @code{(default-value
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	86 'enable-multibyte-characters)}, and setting this variable changes that
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	87 default value. Setting the local binding of
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	88 @code{enable-multibyte-characters} in a specific buffer is not allowed,
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	89 but changing the default value is supported, and it is a reasonable
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	90 thing to do, because it has no effect on existing buffers.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	91
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	92 The @samp{--unibyte} command line option does its job by setting the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	93 default value to @code{nil} early in startup.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	94 @end defvar
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	95
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	96 @defun position-bytes position
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	97 @tindex position-bytes
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	98 Return the byte-position corresponding to buffer position @var{position}
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	99 in the current buffer.
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	100 @end defun
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	101
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	102 @defun byte-to-position byte-position
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	103 @tindex byte-to-position
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	104 Return the buffer position corresponding to byte-position
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	105 @var{byte-position} in the current buffer.
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	106 @end defun
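
In a buffer that contains only @sc{ascii} text, each character occupies
one byte, so @code{position-bytes} and @code{byte-to-position} should
agree with ordinary buffer positions.  A small sketch, assuming a fresh
temporary buffer:

@example
@group
(with-temp-buffer
  (insert "abc")
  ;; Only one-byte characters precede position 3.
  (list (position-bytes 3)
        (byte-to-position 3)))
     @result{} (3 3)
@end group
@end example

When the buffer holds characters that occupy more than one byte, the
byte position can be larger than the corresponding buffer position.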
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	107
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	108 @defun multibyte-string-p string
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	109 Return @code{t} if @var{string} is a multibyte string, @code{nil} otherwise.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	110 @end defun
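
The following sketch shows @code{multibyte-string-p} on two strings.
It assumes the character codes described in this chapter: the second
string is built from a Latin 1 character whose code is computed with
the offset formula @code{(- (make-char 'latin-iso8859-1) 128)} given
under @ref{Converting Representations}.

@example
@group
;; A string literal holding only ASCII text is unibyte.
(multibyte-string-p "abc")
     @result{} nil
@end group

@group
;; A string built from a character code above 255 is multibyte.
(multibyte-string-p
 (string (+ 233 (- (make-char 'latin-iso8859-1) 128))))
     @result{} t
@end group
@end example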
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	111
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	112 @node Converting Representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	113 @section Converting Text Representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	114
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	115 Emacs can convert unibyte text to multibyte; it can also convert
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	116 multibyte text to unibyte, though this conversion loses information. In
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	117 general these conversions happen when inserting text into a buffer, or
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	118 when putting text from several strings together in one string. You can
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	119 also explicitly convert a string's contents to either representation.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	120
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	121 Emacs chooses the representation for a string based on the text that
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	122 it is constructed from. The general rule is to convert unibyte text to
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	123 multibyte text when combining it with other multibyte text, because the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	124 multibyte representation is more general and can hold whatever
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	125 characters the unibyte text has.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	126
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	127 When inserting text into a buffer, Emacs converts the text to the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	128 buffer's representation, as specified by
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	129 @code{enable-multibyte-characters} in that buffer. In particular, when
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	130 you insert multibyte text into a unibyte buffer, Emacs converts the text
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	131 to unibyte, even though this conversion cannot in general preserve all
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	132 the characters that might be in the multibyte text. The other natural
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	133 alternative, to convert the buffer contents to multibyte, is not
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	134 acceptable because the buffer's representation is a choice made by the
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	135 user that cannot be overridden automatically.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	136
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	137 Converting unibyte text to multibyte text leaves @sc{ascii} characters
32523 4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	138 unchanged, and likewise character codes 128 through 159. It converts
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	139 the non-@sc{ascii} codes 160 through 255 by adding the value
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	140 @code{nonascii-insert-offset} to each character code. By setting this
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	141 variable, you specify which character set the unibyte characters
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	142 correspond to (@pxref{Character Sets}). For example, if
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	143 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	144 'latin-iso8859-1) 128)}, then the unibyte non-@sc{ascii} characters
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	145 correspond to Latin 1. If it is 2688, which is @code{(- (make-char
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	146 'greek-iso8859-7) 128)}, then they correspond to Greek letters.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	147
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	148 Converting multibyte text to unibyte is simpler: it discards all but
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	149 the low 8 bits of each character code. If @code{nonascii-insert-offset}
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	150 has a reasonable value, corresponding to the beginning of some character
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	151 set, this conversion is the inverse of the other: converting unibyte
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	152 text to multibyte and back to unibyte reproduces the original unibyte
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	153 text.
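
The round trip can be checked with plain arithmetic.  This is only a
sketch, assuming the Latin 1 offset of 2048 stated above and the
unibyte code 233 as an arbitrary sample value:

@example
@group
(let* ((offset (- (make-char 'latin-iso8859-1) 128)) ; 2048, per the text
       (unibyte-code 233)
       (multibyte-code (+ unibyte-code offset)))
  ;; Converting back keeps only the low 8 bits.
  (list multibyte-code (logand multibyte-code 255)))
     @result{} (2281 233)
@end group
@end example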
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	154
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	155 @defvar nonascii-insert-offset
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	156 This variable specifies the amount to add to a non-@sc{ascii} character
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	157 when converting unibyte text to multibyte. It also applies when
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	158 @code{self-insert-command} inserts a character in the unibyte
29339 d831c2ad9313 Fix xref Dave Love <fx@gnu.org> parents: 29265 diff changeset	159 non-@sc{ascii} range, 128 through 255. However, the functions
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	160 @code{insert} and @code{insert-char} do not perform this conversion.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	161
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	162 The right value to use to select character set @var{cs} is @code{(-
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	163 (make-char @var{cs}) 128)}. If the value of
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	164 @code{nonascii-insert-offset} is zero, then conversion actually uses the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	165 value for the Latin 1 character set, rather than zero.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	166 @end defvar
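
For instance, a program might bind the variable temporarily around a
conversion.  The sketch below uses the Greek offset formula from the
text; the one-byte sample string @code{"\341"} is an arbitrary choice
made for illustration, and the form assumes that
@code{nonascii-translation-table} is @code{nil}.

@example
@group
(let ((nonascii-insert-offset
       (- (make-char 'greek-iso8859-7) 128)))
  ;; Convert a unibyte string holding the single code 225 (octal 341).
  (string-make-multibyte "\341"))
@end group
@end example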
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	167
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	168 @defvar nonascii-translation-table
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	169 This variable provides a more general alternative to
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	170 @code{nonascii-insert-offset}. You can use it to specify independently
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	171 how to translate each code in the range of 128 through 255 into a
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	172 multibyte character. The value should be a char-table, or @code{nil}.
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	174 @end defvar
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	175
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	176 @defun string-make-unibyte string
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	177 This function converts the text of @var{string} to unibyte
22252 40089afa2b1d * empty log message * Richard M. Stallman <rms@gnu.org> parents: 22138 diff changeset	178 representation, if it isn't already, and returns the result. If
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	179 @var{string} is a unibyte string, it is returned unchanged.
33912 67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	180 Multibyte character codes are converted to unibyte
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	181 by using just the low 8 bits.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	182 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	183
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	184 @defun string-make-multibyte string
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	185 This function converts the text of @var{string} to multibyte
22252 40089afa2b1d * empty log message * Richard M. Stallman <rms@gnu.org> parents: 22138 diff changeset	186 representation, if it isn't already, and returns the result. If
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	187 @var{string} is a multibyte string, it is returned unchanged.
33912 67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	188 The function @code{unibyte-char-to-multibyte} is used to convert
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	189 each unibyte character to a multibyte character.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	190 @end defun
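
As a brief sketch of these two functions, the form below builds a
two-character multibyte string and converts it to unibyte.  The Latin 1
character code is computed with the offset formula given earlier in
this section; the variable names are chosen only for this example.

@example
@group
(let* ((offset (- (make-char 'latin-iso8859-1) 128))
       (multi (string ?a (+ 233 offset))))
  ;; The conversion works character by character,
  ;; so the length stays the same.
  (list (multibyte-string-p multi)
        (multibyte-string-p (string-make-unibyte multi))
        (length (string-make-unibyte multi))))
     @result{} (t nil 2)
@end group
@end example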
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	191
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	192 @node Selecting a Representation
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	193 @section Selecting a Representation
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	194
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	195 Sometimes it is useful to examine an existing buffer or string as
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	196 multibyte when it was unibyte, or vice versa.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	197
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	198 @defun set-buffer-multibyte multibyte
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	199 Set the representation type of the current buffer. If @var{multibyte}
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	200 is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	201 is @code{nil}, the buffer becomes unibyte.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	202
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	203 This function leaves the buffer contents unchanged when viewed as a
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	204 sequence of bytes. As a consequence, it can change the contents viewed
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	205 as characters; a sequence of two bytes which is treated as one character
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	206 in multibyte representation will count as two characters in unibyte
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	207 representation. Character codes 128 through 159 are an exception. They
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	208 are represented by one byte in a unibyte buffer, but when the buffer is
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	209 set to multibyte, they are converted to two-byte sequences, and vice
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	210 versa.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	211
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	212 This function sets @code{enable-multibyte-characters} to record which
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	213 representation is in use. It also adjusts various data in the buffer
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	214 (including overlays, text properties and markers) so that they cover the
90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	215 same text as they did before.
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	216
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	217 You cannot use @code{set-buffer-multibyte} on an indirect buffer,
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	218 because indirect buffers always inherit the representation of the
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	219 base buffer.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	220 @end defun
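
The effect on character counts can be seen in a small sketch.  It
assumes a temporary buffer and a Latin 1 character whose code is
computed with the offset formula from @ref{Converting Representations}.

@example
@group
(with-temp-buffer
  (set-buffer-multibyte t)
  (insert (+ 233 (- (make-char 'latin-iso8859-1) 128)))
  (let ((chars-before (buffer-size)))
    (set-buffer-multibyte nil)
    ;; Same bytes, now counted one character per byte.
    (list chars-before (> (buffer-size) chars-before))))
     @result{} (1 t)
@end group
@end example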
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	221
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	222 @defun string-as-unibyte string
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	223 This function returns a string with the same bytes as @var{string} but
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	224 treating each byte as a character. This means that the value may have
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	225 more characters than @var{string} has.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	226
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	227 If @var{string} is already a unibyte string, then the value is
33912 67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	228 @var{string} itself. Otherwise it is a newly created string, with no
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	229 text properties. If @var{string} is multibyte, any characters it
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	230 contains of charset @code{eight-bit-control} or @code{eight-bit-graphic}
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	231 are converted to the corresponding single byte.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	232 @end defun
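
As a rough sketch of the effect described above (not taken from the
original text), the following compares the length of a one-character
multibyte string with the length of its unibyte counterpart. It assumes
that @code{char-to-string} yields a multibyte string for the
@code{latin-iso8859-1} character used in the examples later in this
chapter; the second number should be at least as large as the first.

@example
;; Build a one-character string, then reinterpret its bytes.
(let ((s (char-to-string (make-char 'latin-iso8859-1 72))))
  (list (length s)
        (length (string-as-unibyte s))))
@end example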
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	233
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	234 @defun string-as-multibyte string
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	235 This function returns a string with the same bytes as @var{string} but
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	236 treating each multibyte sequence as one character. This means that the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	237 value may have fewer characters than @var{string} has.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	238
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	239 If @var{string} is already a multibyte string, then the value is
33912 67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	240 @var{string} itself. Otherwise it is a newly created string, with no
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	241 text properties. If @var{string} is unibyte and contains any individual
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	242 8-bit bytes (i.e.@: not part of a multibyte form), they are converted to
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	243 the corresponding multibyte character of charset @code{eight-bit-control}
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	244 or @code{eight-bit-graphic}.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	245 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	246
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	247 @node Character Codes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	248 @section Character Codes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	249 @cindex character codes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	250
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	251 The unibyte and multibyte text representations use different character
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	252 codes. The valid character codes for unibyte representation range from
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	253 0 to 255---the values that can fit in one byte. The valid character
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	254 codes for multibyte representation range from 0 to 524287, but not all
28877 607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	255 values in that range are valid. The values 128 through 255 are not
32523 4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	256 entirely proper in multibyte text, but they can occur if you do explicit
28877 607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	257 encoding and decoding (@pxref{Explicit Encoding}). Some other character
607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	258 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes
32523 4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	259 0 through 127 are completely legitimate in both representations.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	260
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	261 @defun char-valid-p charcode &optional genericp
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	262 This returns @code{t} if @var{charcode} is valid for either one of the two
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	263 text representations.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	264
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	265 @example
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	266 (char-valid-p 65)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	267 @result{} t
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	268 (char-valid-p 256)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	269 @result{} nil
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	270 (char-valid-p 2248)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	271 @result{} t
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	272 @end example
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	273
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	274 If the optional argument @var{genericp} is non-@code{nil}, this function
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	275 also returns @code{t} if @var{charcode} is a generic character
29339 d831c2ad9313 Fix xref Dave Love <fx@gnu.org> parents: 29265 diff changeset	276 (@pxref{Splitting Characters}).
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	277 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	278
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	279 @node Character Sets
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	280 @section Character Sets
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	281 @cindex character sets
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	282
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	283 Emacs classifies characters into various @dfn{character sets}, each of
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	284 which has a name that is a symbol. Each character belongs to one and
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	285 only one character set.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	286
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	287 In general, there is one character set for each distinct script. For
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	288 example, @code{latin-iso8859-1} is one character set,
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	289 @code{greek-iso8859-7} is another, and @code{ascii} is another. An
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	290 Emacs character set can hold at most 9025 characters; therefore, in some
90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	291 cases, characters that would logically be grouped together are split
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	292 into several character sets. For example, one set of Chinese
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	293 characters, generally known as Big 5, is divided into two Emacs
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	294 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	295
28900 ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	296 @sc{ascii} characters are in character set @code{ascii}. The
ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	297 non-@sc{ascii} characters 128 through 159 are in character set
ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	298 @code{eight-bit-control}, and codes 160 through 255 are in character set
ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	299 @code{eight-bit-graphic}.
ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	300
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	301 @defun charsetp object
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	302 Returns @code{t} if @var{object} is a symbol that names a character set,
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	303 @code{nil} otherwise.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	304 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	305
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	306 @defun charset-list
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	307 This function returns a list of all defined character set names.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	308 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	309
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	310 @defun char-charset character
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	311 This function returns the name of the character set that @var{character}
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	312 belongs to.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	313 @end defun
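
As a brief illustration (a sketch, not part of the original text), the
calls below use only character codes that appear elsewhere in this
chapter. The results shown follow from the code ranges given above and
from the @code{split-char} example for code 2248 (@pxref{Splitting
Characters}).

@example
(char-charset 65)
     @result{} ascii
(char-charset 128)
     @result{} eight-bit-control
(char-charset 2248)
     @result{} latin-iso8859-1
@end example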
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	314
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	315 @defun charset-plist charset
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	316 @tindex charset-plist
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	317 This function returns the charset property list of the character set
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	318 @var{charset}. Although @var{charset} is a symbol, this is not the same
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	319 as the property list of that symbol. Charset properties are used for
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	320 special purposes within Emacs; for example,
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	321 @code{preferred-coding-system} helps determine which coding system to
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	322 use to encode characters in a charset.
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	323 @end defun
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	324
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	325 @node Chars and Bytes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	326 @section Characters and Bytes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	327 @cindex bytes and characters
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	328
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	329 @cindex introduction sequence
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	330 @cindex dimension (of character set)
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	331 In multibyte representation, each character occupies one or more
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	332 bytes. Each character set has an @dfn{introduction sequence}, which is
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	333 normally one or two bytes long. (Exception: the @sc{ascii} character
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	334 set and the @code{eight-bit-graphic} character set have a zero-length
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	335 introduction sequence.) The introduction sequence is the beginning of
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	336 the byte sequence for any character in the character set. The rest of
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	337 the character's bytes distinguish it from the other characters in the
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	338 same character set. Depending on the character set, there are either
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	339 one or two distinguishing bytes; the number of such bytes is called the
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	340 @dfn{dimension} of the character set.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	341
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	342 @defun charset-dimension charset
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	343 This function returns the dimension of @var{charset}; at present, the
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	344 dimension is always 1 or 2.
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	345 @end defun
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	346
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	347 @defun charset-bytes charset
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	348 @tindex charset-bytes
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	349 This function returns the number of bytes used to represent a character
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	350 in character set @var{charset}.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	351 @end defun
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	352
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	353 This is the simplest way to determine the byte length of a character
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	354 set's introduction sequence:
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	355
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	356 @example
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	357 (- (charset-bytes @var{charset})
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	358 (charset-dimension @var{charset}))
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	359 @end example
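
The expression above can be wrapped in a small helper. This is only a
sketch; the name @code{my-charset-intro-length} is hypothetical and is
not part of Emacs.

@example
;; Hypothetical helper: byte length of CHARSET's introduction
;; sequence, i.e. total bytes minus distinguishing bytes.
(defun my-charset-intro-length (charset)
  (- (charset-bytes charset)
     (charset-dimension charset)))

;; The text above says `ascii' has a zero-length introduction
;; sequence, so this call would be expected to return 0.
(my-charset-intro-length 'ascii)
@end example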
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	360
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	361 @node Splitting Characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	362 @section Splitting Characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	363
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	364 The functions in this section convert between characters and the byte
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	365 values used to represent them. For most purposes, there is no need to
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	366 be concerned with the sequence of bytes used to represent a character,
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	367 because Emacs translates automatically when necessary.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	368
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	369 @defun split-char character
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	370 This function returns a list containing the name of the character set of
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	371 @var{character}, followed by one or two byte values (integers) which
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	372 identify @var{character} within that character set. The number of byte
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	373 values is the character set's dimension.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	374
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	375 @example
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	376 (split-char 2248)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	377 @result{} (latin-iso8859-1 72)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	378 (split-char 65)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	379 @result{} (ascii 65)
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	380 (split-char 128)
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	381 @result{} (eight-bit-control 128)
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	382 @end example
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	383 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	384
34811 c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	385 @defun make-char charset &optional code1 code2
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	386 This function returns the character in character set @var{charset} whose
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	387 position codes are @var{code1} and @var{code2}. This is roughly the
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	388 inverse of @code{split-char}. Normally, you should specify either one
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	389 or both of @var{code1} and @var{code2} according to the dimension of
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	390 @var{charset}. For example,
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	391
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	392 @example
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	393 (make-char 'latin-iso8859-1 72)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	394 @result{} 2248
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	395 @end example
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	396 @end defun
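
Taken together, the examples for @code{split-char} and @code{make-char}
suggest a round trip: applying @code{make-char} to the list returned by
@code{split-char} gives back the original character code. A minimal
sketch, using only the values shown in those examples:

@example
;; (split-char 2248) is (latin-iso8859-1 72), so this is the same
;; as calling (make-char 'latin-iso8859-1 72).
(apply 'make-char (split-char 2248))
     @result{} 2248
@end example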
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	397
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	398 @cindex generic characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	399 If you call @code{make-char} with neither @var{code1} nor @var{code2}, the result is
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	400 a @dfn{generic character} which stands for @var{charset}. A generic
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	401 character is an integer, but it is @emph{not} valid for insertion in the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	402 buffer as a character. It can be used in @code{char-table-range} to
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	403 refer to the whole character set (@pxref{Char-Tables}).
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	404 @code{char-valid-p} returns @code{nil} for a generic character unless
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	405 its @var{genericp} argument is non-@code{nil}. For example:
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	406
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	407 @example
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	408 (make-char 'latin-iso8859-1)
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	409 @result{} 2176
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	410 (char-valid-p 2176)
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	411 @result{} nil
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	412 (char-valid-p 2176 t)
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	413 @result{} t
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	414 (split-char 2176)
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	415 @result{} (latin-iso8859-1 0)
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	416 @end example
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	417
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	418 The character sets @code{ascii}, @code{eight-bit-control}, and
34811 c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	419 @code{eight-bit-graphic} don't have corresponding generic characters. If
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	420 @var{charset} is one of them and you don't supply @var{code1},
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	421 @code{make-char} returns the character code corresponding to the
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	422 smallest code in @var{charset}.
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	423
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	424 @node Scanning Charsets
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	425 @section Scanning for Character Sets
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	426
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	427 Sometimes it is useful to find out which character sets appear in a
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	428 part of a buffer or a string. One use for this is in determining which
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	429 coding systems (@pxref{Coding Systems}) are capable of representing all
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	430 of the text in question.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	431
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	432 @defun find-charset-region beg end &optional translation
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	433 This function returns a list of the character sets that appear in the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	434 current buffer between positions @var{beg} and @var{end}.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	435
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	436 The optional argument @var{translation} specifies a translation table to
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	437 be used in scanning the text (@pxref{Translation of Characters}). If it
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	438 is non-@code{nil}, then each character in the region is translated
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	439 through this table, and the value returned describes the translated
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	440 characters instead of the characters actually in the buffer.
28887 0778eff185b6 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28877 diff changeset	441 @end defun
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	442
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	443 @defun find-charset-string string &optional translation
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	444 This function returns a list of the character sets that appear in the
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	445 string @var{string}. It is just like @code{find-charset-region}, except
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	446 that it applies to the contents of @var{string} instead of part of the
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	447 current buffer.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	448 @end defun
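
For instance, one might use these functions to test whether some text
contains only @sc{ascii} characters. The following is a sketch under
stated assumptions: @code{my-ascii-only-p} is a hypothetical name, and
the test assumes that text consisting solely of @sc{ascii} characters
yields exactly the list @code{(ascii)}, while empty text yields
@code{nil}.

@example
;; Hypothetical helper: non-nil if STRING has no characters
;; outside the `ascii' character set.
(defun my-ascii-only-p (string)
  (let ((charsets (find-charset-string string)))
    (or (null charsets)
        (equal charsets '(ascii)))))

;; The same kind of check for the whole current buffer would
;; start from this call instead.
(find-charset-region (point-min) (point-max))
@end example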
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	449
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	450 @node Translation of Characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	451 @section Translation of Characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	452 @cindex character translation tables
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	453 @cindex translation tables
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	454
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	455 A @dfn{translation table} specifies a mapping of characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	456 into characters. These tables are used in encoding and decoding, and
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	457 for other purposes. Some coding systems specify their own particular
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	458 translation tables; there are also default translation tables which
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	459 apply to all other coding systems.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	460
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	461 @defun make-translation-table &rest translations
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	462 This function returns a translation table based on the argument
35752 e1d9a16467ae * empty log message * Dave Love <fx@gnu.org> parents: 35493 diff changeset	463 @var{translations}. Each element of @var{translations} should be an
e1d9a16467ae * empty log message * Dave Love <fx@gnu.org> parents: 35493 diff changeset	464 alist whose elements have the form @code{(@var{from} . @var{to})}; each
e1d9a16467ae * empty log message * Dave Love <fx@gnu.org> parents: 35493 diff changeset	465 such element says to translate the character @var{from} into @var{to}.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	466
35493 679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	467 The arguments and the forms in each argument are processed in order,
679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	468 and if a previous form already translates @var{to} to some other
679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	469 character, say @var{to-alt}, @var{from} is also translated to
679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	470 @var{to-alt}.
679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	471
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	472 You can also map one whole character set into another character set with
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	473 the same dimension. To do this, you specify a generic character (which
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	474 designates a character set) for @var{from} (@pxref{Splitting Characters}).
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	475 In this case, @var{to} should also be a generic character, for another
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	476 character set of the same dimension. Then the translation table
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	477 translates each character of @var{from}'s character set into the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	478 corresponding character of @var{to}'s character set.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	479 @end defun
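
As a minimal sketch of the ordering rule for @code{make-translation-table},
the following form passes two alists. The variable name @code{table} is
only illustrative. Because the first alist already translates @code{?b}
to @code{?c}, the later entry for @code{?a} should end up translating to
@code{?c} as well (character code 99).

@example
@group
(let ((table (make-translation-table '((?b . ?c)) '((?a . ?b)))))
  ;; Look up the translations of ?b and ?a in the new table.
  (list (aref table ?b) (aref table ?a)))
     @result{} (99 99)
@end group
@end example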
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	480
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	481 In decoding, the translation table's translations are applied to the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	482 characters that result from ordinary decoding. If a coding system has
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	483 property @code{character-translation-table-for-decode}, that specifies
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	484 the translation table to use. Otherwise, if
23433 a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	485 @code{standard-translation-table-for-decode} is non-@code{nil}, decoding
a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	486 uses that table.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	487
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	488 In encoding, the translation table's translations are applied to the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	489 characters in the buffer, and the result of translation is actually
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	490 encoded. If a coding system has property
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	491 @code{character-translation-table-for-encode}, that specifies the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	492 translation table to use. Otherwise the variable
23433 a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	493 @code{standard-translation-table-for-encode} specifies the translation
a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	494 table.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	495
23433 a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	496 @defvar standard-translation-table-for-decode
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	497 This is the default translation table for decoding, for
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	498 coding systems that don't specify any other translation table.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	499 @end defvar
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	500

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1 @c -*-texinfo-*-

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

2 @c This is part of the GNU Emacs Lisp Reference Manual.

27189

d2e5f1b7d8e2 Update copyrights.

Gerd Moellmann <gerd@gnu.org>

parents: 27187

diff changeset

3 @c Copyright (C) 1998, 1999 Free Software Foundation, Inc.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

4 @c See the file elisp.texi for copying conditions.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

5 @setfilename ../info/characters

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

6 @node Non-ASCII Characters, Searching and Matching, Text, Top

27374

0f5edee5242b *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27362

diff changeset

7 @chapter Non-@sc{ascii} Characters

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

8 @cindex multibyte characters

27374

0f5edee5242b *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27362

diff changeset

9 @cindex non-@sc{ascii} characters

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

10

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

11 This chapter covers the special issues relating to non-@sc{ascii}

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

12 characters and how they are stored in strings and buffers.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

13

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

14 @menu

28635

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

15 * Text Representations:: Unibyte and multibyte representations

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

16 * Converting Representations:: Converting unibyte to multibyte and vice versa.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

17 * Selecting a Representation:: Treating a byte sequence as unibyte or multi.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

18 * Character Codes:: How unibyte and multibyte relate to

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

19 codes of individual characters.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

20 * Character Sets:: The space of possible character codes

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

21 is divided into various character sets.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

22 * Chars and Bytes:: More information about multibyte encodings.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

23 * Splitting Characters:: Converting a character to its byte sequence.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

24 * Scanning Charsets:: Which character sets are used in a buffer?

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

25 * Translation of Characters:: Translation tables are used for conversion.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

26 * Coding Systems:: Coding systems are conversions for saving files.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

27 * Input Methods:: Input methods allow users to enter various

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

28 non-ASCII characters without special keyboards.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

29 * Locales:: Interacting with the POSIX locale.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

30 @end menu

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

31

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

32 @node Text Representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

33 @section Text Representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

34 @cindex text representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

35

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

36 Emacs has two @dfn{text representations}---two ways to represent text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

37 in a string or buffer. These are called @dfn{unibyte} and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

38 @dfn{multibyte}. Each string, and each buffer, uses one of these two

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

39 representations. For most purposes, you can ignore the issue of

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

40 representations, because Emacs converts text between them as

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

41 appropriate. Occasionally in Lisp programming you will need to pay

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

42 attention to the difference.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

43

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

44 @cindex unibyte text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

45 In unibyte representation, each character occupies one byte and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

46 therefore the possible character codes range from 0 to 255. Codes 0

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

47 through 127 are @sc{ascii} characters; the codes from 128 through 255

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

48 are used for one non-@sc{ascii} character set (you can choose which

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

49 character set by setting the variable @code{nonascii-insert-offset}).

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

50

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

51 @cindex leading code

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

52 @cindex multibyte text

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

53 @cindex trailing codes

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

54 In multibyte representation, a character may occupy more than one

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

55 byte, and as a result, the full range of Emacs character codes can be

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

56 stored. The first byte of a multibyte character is always in the range

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

57 128 through 159 (octal 0200 through 0237). These values are called

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

58 @dfn{leading codes}. The second and subsequent bytes of a multibyte

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

59 character are always in the range 160 through 255 (octal 0240 through

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

60 0377); these values are @dfn{trailing codes}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

61

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

62 Some sequences of bytes are not valid in multibyte text: for example,

32523

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

63 a single isolated byte in the range 128 through 159 is not allowed. But

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

64 character codes 128 through 159 can appear in multibyte text,

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

65 represented as two-byte sequences. All the character codes 128 through

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

66 255 are possible (though slightly abnormal) in multibyte text; they

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

67 appear in multibyte buffers and strings when you do explicit encoding

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

68 and decoding (@pxref{Explicit Encoding}).

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

69

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

70 In a buffer, the buffer-local value of the variable

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

71 @code{enable-multibyte-characters} specifies the representation used.

24952

a6db4671c7a0 *** empty log message ***

Karl Heuer <kwzh@gnu.org>

parents: 24951

diff changeset

72 The representation for a string is determined and recorded in the string

a6db4671c7a0 *** empty log message ***

Karl Heuer <kwzh@gnu.org>

parents: 24951

diff changeset

73 when the string is constructed.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

74

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

75 @defvar enable-multibyte-characters

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

76 This variable specifies the current buffer's text representation.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

77 If it is non-@code{nil}, the buffer contains multibyte text; otherwise,

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

78 it contains unibyte text.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

79

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

80 You cannot set this variable directly; instead, use the function

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

81 @code{set-buffer-multibyte} to change a buffer's representation.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

82 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

83

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

84 @defvar default-enable-multibyte-characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

85 This variable's value is entirely equivalent to @code{(default-value

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

86 'enable-multibyte-characters)}, and setting this variable changes that

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

87 default value. Setting the local binding of

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

88 @code{enable-multibyte-characters} in a specific buffer is not allowed,

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

89 but changing the default value is supported, and it is a reasonable

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

90 thing to do, because it has no effect on existing buffers.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

91

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

92 The @samp{--unibyte} command line option does its job by setting the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

93 default value to @code{nil} early in startup.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

94 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

95

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

96 @defun position-bytes position

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

97 @tindex position-bytes

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

98 Return the byte-position corresponding to buffer position @var{position}

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

99 in the current buffer.

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

100 @end defun

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

101

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

102 @defun byte-to-position byte-position

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

103 @tindex byte-to-position

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

104 Return the buffer position corresponding to byte-position

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

105 @var{byte-position} in the current buffer.

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

106 @end defun
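
As a small sketch of how @code{position-bytes} and
@code{byte-to-position} relate, the following form converts a buffer
position to a byte position and back. It assumes a temporary buffer
holding only @sc{ascii} text, where each character occupies one byte, so
the two kinds of position coincide.

@example
@group
(with-temp-buffer
  (insert "abc")
  (let ((pos (point-max)))      ; position just after the "c"
    (list (position-bytes pos)
          (byte-to-position (position-bytes pos)))))
     @result{} (4 4)
@end group
@end example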

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

107

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

108 @defun multibyte-string-p string

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

109 Return @code{t} if @var{string} is a multibyte string.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

110 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

111

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

112 @node Converting Representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

113 @section Converting Text Representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

114

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

115 Emacs can convert unibyte text to multibyte; it can also convert

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

116 multibyte text to unibyte, though this conversion loses information. In

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

117 general these conversions happen when inserting text into a buffer, or

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

118 when putting text from several strings together in one string. You can

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

119 also explicitly convert a string's contents to either representation.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

120

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

121 Emacs chooses the representation for a string based on the text that

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

122 it is constructed from. The general rule is to convert unibyte text to

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

123 multibyte text when combining it with other multibyte text, because the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

124 multibyte representation is more general and can hold whatever

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

125 characters the unibyte text has.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

126

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

127 When inserting text into a buffer, Emacs converts the text to the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

128 buffer's representation, as specified by

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

129 @code{enable-multibyte-characters} in that buffer. In particular, when

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

130 you insert multibyte text into a unibyte buffer, Emacs converts the text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

131 to unibyte, even though this conversion cannot in general preserve all

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

132 the characters that might be in the multibyte text. The other natural

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

133 alternative, to convert the buffer contents to multibyte, is not

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

134 acceptable because the buffer's representation is a choice made by the

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

135 user that cannot be overridden automatically.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

136

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

137 Converting unibyte text to multibyte text leaves @sc{ascii} characters

32523

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

138 unchanged, and likewise character codes 128 through 159. It converts

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

139 the non-@sc{ascii} codes 160 through 255 by adding the value

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

140 @code{nonascii-insert-offset} to each character code. By setting this

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

141 variable, you specify which character set the unibyte characters

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

142 correspond to (@pxref{Character Sets}). For example, if

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

143 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

144 'latin-iso8859-1) 128)}, then the unibyte non-@sc{ascii} characters

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

145 correspond to Latin 1. If it is 2688, which is @code{(- (make-char

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

146 'greek-iso8859-7) 128)}, then they correspond to Greek letters.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

147

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

148 Converting multibyte text to unibyte is simpler: it discards all but

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

149 the low 8 bits of each character code. If @code{nonascii-insert-offset}

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

150 has a reasonable value, corresponding to the beginning of some character

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

151 set, this conversion is the inverse of the other: converting unibyte

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

152 text to multibyte and back to unibyte reproduces the original unibyte

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

153 text.
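
The round trip can be checked with plain arithmetic. This sketch does
not call the conversion functions themselves; it only assumes the offset
2048 from the Latin 1 example above and applies the two rules as stated
to the unibyte code 160.

@example
@group
(let ((offset 2048))            ; assumed Latin 1 offset
  ;; Unibyte to multibyte adds the offset; multibyte to unibyte
  ;; keeps only the low 8 bits of the resulting code.
  (logand (+ 160 offset) 255))
     @result{} 160
@end group
@end example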

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

154

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

155 @defvar nonascii-insert-offset

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

156 This variable specifies the amount to add to a non-@sc{ascii} character

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

157 when converting unibyte text to multibyte. It also applies when

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

158 @code{self-insert-command} inserts a character in the unibyte

29339

d831c2ad9313 Fix xref

Dave Love <fx@gnu.org>

parents: 29265

diff changeset

159 non-@sc{ascii} range, 128 through 255. However, the functions

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

160 @code{insert} and @code{insert-char} do not perform this conversion.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

161

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

162 The right value to use to select character set @var{cs} is @code{(-

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

163 (make-char @var{cs}) 128)}. If the value of

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

164 @code{nonascii-insert-offset} is zero, then conversion actually uses the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

165 value for the Latin 1 character set, rather than zero.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

166 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

167

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

168 @defvar nonascii-translation-table

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

169 This variable provides a more general alternative to

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

170 @code{nonascii-insert-offset}. You can use it to specify independently

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

171 how to translate each code in the range of 128 through 255 into a

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

172 multibyte character. The value should be a char-table, or @code{nil}.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

174 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

175

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

176 @defun string-make-unibyte string

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

177 This function converts the text of @var{string} to unibyte

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

178 representation, if it isn't already, and returns the result. If

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

179 @var{string} is a unibyte string, it is returned unchanged.

33912

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

180 Multibyte character codes are converted to unibyte

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

181 by using just the low 8 bits.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

182 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

183

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

184 @defun string-make-multibyte string

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

185 This function converts the text of @var{string} to multibyte

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

186 representation, if it isn't already, and returns the result. If

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

187 @var{string} is a multibyte string, it is returned unchanged.

33912

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

188 The function @code{unibyte-char-to-multibyte} is used to convert

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

189 each unibyte character to a multibyte character.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

190 @end defun
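
The following sketch contrasts the two conversions on a short string.
The variable names and the sample byte are illustrative only, and the
characters produced depend on the conversion settings described above,
so no return value is shown.

@example
(let* ((s "caf\351")                  ; unibyte; 233 is non-ASCII
       (m (string-make-multibyte s))  ; each byte becomes a character
       (u (string-make-unibyte m)))   ; low 8 bits of each character
  ;; Report which representation each converted string uses.
  (list (multibyte-string-p m)
        (multibyte-string-p u)))
@end example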

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

191

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

192 @node Selecting a Representation

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

193 @section Selecting a Representation

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

194

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

195 Sometimes it is useful to examine an existing buffer or string as

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

196 multibyte when it was unibyte, or vice versa.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

197

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

198 @defun set-buffer-multibyte multibyte

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

199 Set the representation type of the current buffer. If @var{multibyte}

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

200 is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

201 is @code{nil}, the buffer becomes unibyte.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

202

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

203 This function leaves the buffer contents unchanged when viewed as a

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

204 sequence of bytes. As a consequence, it can change the contents viewed

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

205 as characters; a sequence of two bytes which is treated as one character

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

206 in multibyte representation will count as two characters in unibyte

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

207 representation. Character codes 128 through 159 are an exception. They

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

208 are represented by one byte in a unibyte buffer, but when the buffer is

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

209 set to multibyte, they are converted to two-byte sequences, and vice

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

210 versa.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

211

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

212 This function sets @code{enable-multibyte-characters} to record which

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

213 representation is in use. It also adjusts various data in the buffer

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

214 (including overlays, text properties and markers) so that they cover the

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

215 same text as they did before.

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

216

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

217 You cannot use @code{set-buffer-multibyte} on an indirect buffer,

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

218 because indirect buffers always inherit the representation of the

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

219 base buffer.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

220 @end defun
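
As a minimal sketch of the effect on @code{enable-multibyte-characters},
the following form switches a temporary buffer to unibyte and then reads
the variable.  A temporary buffer is used here only so that no existing
buffer is altered.

@example
(with-temp-buffer
  (set-buffer-multibyte nil)
  enable-multibyte-characters)
     @result{} nil
@end example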

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

221

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

222 @defun string-as-unibyte string

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

223 This function returns a string with the same bytes as @var{string} but

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

224 treating each byte as a character. This means that the value may have

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

225 more characters than @var{string} has.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

226

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

227 If @var{string} is already a unibyte string, then the value is

33912

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

228 @var{string} itself. Otherwise it is a newly created string, with no

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

229 text properties. If @var{string} is multibyte, any characters it

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

230 contains of charset @code{eight-bit-control} or @code{eight-bit-graphic}

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

231 are converted to the corresponding single byte.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

232 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

233

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

234 @defun string-as-multibyte string

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

235 This function returns a string with the same bytes as @var{string} but

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

236 treating each multibyte sequence as one character. This means that the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

237 value may have fewer characters than @var{string} has.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

238

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

239 If @var{string} is already a multibyte string, then the value is

33912

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

240 @var{string} itself. Otherwise it is a newly created string, with no

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

241 text properties. If @var{string} is unibyte and contains any individual

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

242 8-bit bytes (i.e.@: not part of a multibyte form), they are converted to

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

243 the corresponding multibyte character of charset @code{eight-bit-control}

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

244 or @code{eight-bit-graphic}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

245 @end defun
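
To see the difference between counting characters and counting bytes,
one can compare the length of a multibyte string with the length of its
unibyte view.  This is only a sketch: the sample character is taken from
the @code{make-char} example later in this chapter, and the exact
numbers depend on the internal representation, so none are shown.  The
second element counts bytes and therefore should not be smaller than the
first.

@example
(let ((s (string (make-char 'latin-iso8859-1 72))))
  (list (length s)                     ; characters in S
        (length (string-as-unibyte s)) ; one character per byte
        (string-bytes s)))             ; bytes in S
@end example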

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

246

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

247 @node Character Codes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

248 @section Character Codes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

249 @cindex character codes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

250

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

251 The unibyte and multibyte text representations use different character

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

252 codes. The valid character codes for unibyte representation range from

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

253 0 to 255---the values that can fit in one byte. The character

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

254 codes for multibyte representation range from 0 to 524287, but not all

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

255 values in that range are valid. The values 128 through 255 are not

32523

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

256 entirely proper in multibyte text, but they can occur if you do explicit

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

257 encoding and decoding (@pxref{Explicit Encoding}). Some other character

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

258 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes

32523

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

259 0 through 127 are completely legitimate in both representations.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

260

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

261 @defun char-valid-p charcode &optional genericp

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

262 This function returns @code{t} if @var{charcode} is valid for either one of the two

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

263 text representations.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

264

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

265 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

266 (char-valid-p 65)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

267 @result{} t

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

268 (char-valid-p 256)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

269 @result{} nil

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

270 (char-valid-p 2248)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

271 @result{} t

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

272 @end example

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

273

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

274 If the optional argument @var{genericp} is non-@code{nil}, this function

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

275 returns @code{t} if @var{charcode} is a generic character

29339

d831c2ad9313 Fix xref

Dave Love <fx@gnu.org>

parents: 29265

diff changeset

276 (@pxref{Splitting Characters}).

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

277 @end defun
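
The effect of @var{genericp} can be explored with a generic character
obtained from @code{make-char} called with only a character set name.
This is a sketch under the assumption that the functions behave as
documented in this chapter; the character set is an arbitrary choice.
According to the description above, the second call is the one expected
to accept the generic character.

@example
;; A generic character standing for the whole charset.
(char-valid-p (make-char 'latin-iso8859-1))
;; The same test, with generic characters allowed.
(char-valid-p (make-char 'latin-iso8859-1) t)
@end example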

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

278

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

279 @node Character Sets

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

280 @section Character Sets

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

281 @cindex character sets

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

282

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

283 Emacs classifies characters into various @dfn{character sets}, each of

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

284 which has a name that is a symbol. Each character belongs to one and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

285 only one character set.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

286

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

287 In general, there is one character set for each distinct script. For

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

288 example, @code{latin-iso8859-1} is one character set,

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

289 @code{greek-iso8859-7} is another, and @code{ascii} is another. An

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

290 Emacs character set can hold at most 9025 characters; therefore, in some

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

291 cases, characters that would logically be grouped together are split

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

292 into several character sets. For example, one set of Chinese

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

293 characters, generally known as Big 5, is divided into two Emacs

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

294 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

295

28900

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

296 @sc{ascii} characters are in character set @code{ascii}. The

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

297 non-@sc{ascii} characters 128 through 159 are in character set

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

298 @code{eight-bit-control}, and codes 160 through 255 are in character set

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

299 @code{eight-bit-graphic}.

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

300

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

301 @defun charsetp object

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

302 Returns @code{t} if @var{object} is a symbol that names a character set,

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

303 @code{nil} otherwise.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

304 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

305

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

306 @defun charset-list

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

307 This function returns a list of all defined character set names.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

308 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

309

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

310 @defun char-charset character

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

311 This function returns the name of the character set that @var{character}

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

312 belongs to.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

313 @end defun
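
For instance, in the Emacs version this chapter describes, these
functions might be used as follows.  The symbol @code{no-such-charset}
is a made-up name assumed not to be defined as a character set; the code
2248 is the Latin-1 character used in other examples in this chapter.

@example
(charsetp 'ascii)
     @result{} t
(charsetp 'no-such-charset)
     @result{} nil
(char-charset ?a)
     @result{} ascii
(char-charset 2248)
     @result{} latin-iso8859-1
@end example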

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

314

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

315 @defun charset-plist charset

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

316 @tindex charset-plist

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

317 This function returns the charset property list of the character set

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

318 @var{charset}. Although @var{charset} is a symbol, this is not the same

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

319 as the property list of that symbol. Charset properties are used for

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

320 special purposes within Emacs; for example,

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

321 @code{preferred-coding-system} helps determine which coding system to

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

322 use to encode characters in a charset.

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

323 @end defun

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

324

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

325 @node Chars and Bytes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

326 @section Characters and Bytes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

327 @cindex bytes and characters

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

328

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

329 @cindex introduction sequence

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

330 @cindex dimension (of character set)

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

331 In multibyte representation, each character occupies one or more

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

332 bytes. Each character set has an @dfn{introduction sequence}, which is

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

333 normally one or two bytes long. (Exception: the @sc{ascii} character

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

334 set and the @code{eight-bit-graphic} character set have a zero-length

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

335 introduction sequence.) The introduction sequence is the beginning of

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

336 the byte sequence for any character in the character set. The rest of

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

337 the character's bytes distinguish it from the other characters in the

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

338 same character set. Depending on the character set, there are either

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

339 one or two distinguishing bytes; the number of such bytes is called the

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

340 @dfn{dimension} of the character set.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

341

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

342 @defun charset-dimension charset

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

343 This function returns the dimension of @var{charset}; at present, the

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

344 dimension is always 1 or 2.

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

345 @end defun

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

346

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

347 @defun charset-bytes charset

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

348 @tindex charset-bytes

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

349 This function returns the number of bytes used to represent a character

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

350 in character set @var{charset}.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

351 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

352

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

353 This is the simplest way to determine the byte length of a character

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

354 set's introduction sequence:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

355

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

356 @example

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

357 (- (charset-bytes @var{charset})

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

358 (charset-dimension @var{charset}))

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

359 @end example
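
The expression above can be wrapped in a small function.  The name
@code{my-charset-intro-length} is hypothetical and not part of Emacs;
this is only a sketch of one way to package the computation.

@example
;; my-charset-intro-length is a hypothetical helper, not part of Emacs.
(defun my-charset-intro-length (charset)
  "Return the length in bytes of CHARSET's introduction sequence."
  (- (charset-bytes charset)
     (charset-dimension charset)))

(my-charset-intro-length 'ascii)
@end example

Given the exception noted earlier for the @sc{ascii} character set, the
call shown should yield zero.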

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

360

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

361 @node Splitting Characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

362 @section Splitting Characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

363

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

364 The functions in this section convert between characters and the byte

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

365 values used to represent them. For most purposes, there is no need to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

366 be concerned with the sequence of bytes used to represent a character,

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

367 because Emacs translates automatically when necessary.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

368

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

369 @defun split-char character

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

370 Return a list containing the name of the character set of

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

371 @var{character}, followed by one or two byte values (integers) which

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

372 identify @var{character} within that character set. The number of byte

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

373 values is the character set's dimension.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

374

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

375 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

376 (split-char 2248)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

377 @result{} (latin-iso8859-1 72)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

378 (split-char 65)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

379 @result{} (ascii 65)

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

380 (split-char 128)

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

381 @result{} (eight-bit-control 128)

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

382 @end example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

383 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

384

34811

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

385 @defun make-char charset &optional code1 code2

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

386 This function returns the character in character set @var{charset} whose

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

387 position codes are @var{code1} and @var{code2}. This is roughly the

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

388 inverse of @code{split-char}. Normally, you should specify either one

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

389 or both of @var{code1} and @var{code2} according to the dimension of

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

390 @var{charset}. For example,

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

391

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

392 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

393 (make-char 'latin-iso8859-1 72)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

394 @result{} 2248

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

395 @end example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

396 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

397

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

398 @cindex generic characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

399 If you call @code{make-char} with neither @var{code1} nor @var{code2}, the result is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

400 a @dfn{generic character} which stands for @var{charset}. A generic

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

401 character is an integer, but it is @emph{not} valid for insertion in the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

402 buffer as a character. It can be used in @code{char-table-range} to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

403 refer to the whole character set (@pxref{Char-Tables}).

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

404 @code{char-valid-p} returns @code{nil} for generic characters.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

405 For example:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

406

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

407 @example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

408 (make-char 'latin-iso8859-1)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

409 @result{} 2176

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

410 (char-valid-p 2176)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

411 @result{} nil

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

412 (char-valid-p 2176 t)

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

413 @result{} t

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

414 (split-char 2176)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

415 @result{} (latin-iso8859-1 0)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

416 @end example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

417

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

418 The character sets @sc{ascii}, @sc{eight-bit-control}, and

34811

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

419 @sc{eight-bit-graphic} don't have corresponding generic characters. If

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

420 @var{charset} is one of them and you don't supply @var{code1},

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

421 @code{make-char} returns the character code corresponding to the

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

422 smallest code in @var{charset}.

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

423

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

424 @node Scanning Charsets

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

425 @section Scanning for Character Sets

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

426

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

427 Sometimes it is useful to find out which character sets appear in a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

428 part of a buffer or a string. One use for this is in determining which

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

429 coding systems (@pxref{Coding Systems}) are capable of representing all

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

430 of the text in question.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

431

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

432 @defun find-charset-region beg end &optional translation

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

433 This function returns a list of the character sets that appear in the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

434 current buffer between positions @var{beg} and @var{end}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

435

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

436 The optional argument @var{translation} specifies a translation table to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

437 be used in scanning the text (@pxref{Translation of Characters}). If it

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

438 is non-@code{nil}, then each character in the region is translated

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

439 through this table, and the value returned describes the translated

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

440 characters instead of the characters actually in the buffer.
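
For instance, here is a minimal sketch that assumes a temporary buffer
holding only @sc{ascii} text; the buffer contents are chosen purely for
illustration:

@example
;; Scan a whole temporary buffer that contains only ASCII text.
(with-temp-buffer
  (insert "plain text")
  (find-charset-region (point-min) (point-max)))
     @result{} (ascii)
@end example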

28887

0778eff185b6 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28877

diff changeset

441 @end defun

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

442

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

443 @defun find-charset-string string &optional translation

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

444 This function returns a list of the character sets that appear in the

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

445 string @var{string}. It is just like @code{find-charset-region}, except

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

446 that it applies to the contents of @var{string} instead of part of the

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

447 current buffer.
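
As an illustrative sketch, a string made up only of @sc{ascii}
characters yields a one-element list.  The helper
@code{my-ascii-only-p} is a hypothetical name introduced for this
example; it is not an Emacs function:

@example
(find-charset-string "hello")
     @result{} (ascii)

;; Hypothetical helper: return non-nil if STRING contains no
;; characters outside the ascii character set.
(defun my-ascii-only-p (string)
  (null (delq 'ascii (find-charset-string string))))

(my-ascii-only-p "hello")
     @result{} t
@end example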

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

448 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

449

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

450 @node Translation of Characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

451 @section Translation of Characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

452 @cindex character translation tables

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

453 @cindex translation tables

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

454

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

455 A @dfn{translation table} specifies a mapping of characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

456 into characters. These tables are used in encoding and decoding, and

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

457 for other purposes. Some coding systems specify their own particular

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

458 translation tables; there are also default translation tables which

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

459 apply to all other coding systems.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

460

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

461 @defun make-translation-table &rest translations

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

462 This function returns a translation table based on the argument

35752

e1d9a16467ae *** empty log message ***

Dave Love <fx@gnu.org>

parents: 35493

diff changeset

463 @var{translations}. Each element of @var{translations} should be a

e1d9a16467ae *** empty log message ***

Dave Love <fx@gnu.org>

parents: 35493

diff changeset

464 list of elements of the form @code{(@var{from} . @var{to})}; this says

e1d9a16467ae *** empty log message ***

Dave Love <fx@gnu.org>

parents: 35493

diff changeset

465 to translate the character @var{from} into @var{to}.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

466

35493

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

467 The arguments and the forms in each argument are processed in order,

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

468 and if a previous form already translates @var{to} to some other

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

469 character, say @var{to-alt}, @var{from} is also translated to

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

470 @var{to-alt}.
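
The following sketch illustrates this ordering rule.  The variable name
@code{my-table} and the choice of characters are assumptions made only
for the example, and it relies on a translation table being a
char-table, so that @code{aref} can look up the translation of a
character:

@example
(setq my-table
      (make-translation-table
       '((?a . ?b))      ; first: a is translated to b
       '((?c . ?a))))    ; later: c to a, and therefore to b

(eq (aref my-table ?a) ?b)
     @result{} t
(eq (aref my-table ?c) ?b)
     @result{} t
;; A character with no translation specified maps to nil.
(aref my-table ?z)
     @result{} nil
@end example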

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

471

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

472 You can also map one whole character set into another character set with

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

473 the same dimension. To do this, you specify a generic character (which

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

474 designates a character set) for @var{from} (@pxref{Splitting Characters}).

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

475 In this case, @var{to} should also be a generic character, for another

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

476 character set of the same dimension. Then the translation table

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

477 translates each character of @var{from}'s character set into the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

478 corresponding character of @var{to}'s character set.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

479 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

480

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

481 In decoding, the translation table's translations are applied to the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

482 characters that result from ordinary decoding. If a coding system has

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

483 property @code{character-translation-table-for-decode}, that specifies

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

484 the translation table to use. Otherwise, if

23433

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

485 @code{standard-translation-table-for-decode} is non-@code{nil}, decoding

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

486 uses that table.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

487

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

488 In encoding, the translation table's translations are applied to the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

489 characters in the buffer, and the result of translation is actually

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

490 encoded. If a coding system has property

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

491 @code{character-translation-table-for-encode}, that specifies the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

492 translation table to use. Otherwise the variable

23433

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

493 @code{standard-translation-table-for-encode} specifies the translation

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

494 table.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

495

23433

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

496 @defvar standard-translation-table-for-decode

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

497 This is the default translation table for decoding, for

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

498 coding systems that don't specify any other translation table.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

499 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

500

23433

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

501 @defvar standard-translation-table-for-encode

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

502 This is the default translation table for encoding, for

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

503 coding systems that don't specify any other translation table.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

504 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

505

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

506 @node Coding Systems

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

507 @section Coding Systems

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

508

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

509 @cindex coding system

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

510 When Emacs reads or writes a file, and when Emacs sends text to a

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

511 subprocess or receives text from a subprocess, it normally performs

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

512 character code conversion and end-of-line conversion as specified

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

513 by a particular @dfn{coding system}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

514

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

515 How to define a coding system is an arcane matter, and is not

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

516 documented here.

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

517

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

518 @menu

28635

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

519 * Coding System Basics:: Basic concepts.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

520 * Encoding and I/O:: How file I/O functions handle coding systems.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

521 * Lisp and Coding Systems:: Functions to operate on coding system names.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

522 * User-Chosen Coding Systems:: Asking the user to choose a coding system.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

523 * Default Coding Systems:: Controlling the default choices.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

524 * Specifying Coding Systems:: Requesting a particular coding system

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

525 for a single file operation.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

526 * Explicit Encoding:: Encoding or decoding text without doing I/O.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

527 * Terminal I/O Encoding:: Use of encoding for terminal I/O.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

528 * MS-DOS File Types:: How DOS "text" and "binary" files

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

529 relate to coding systems.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

530 @end menu

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

531

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

532 @node Coding System Basics

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

533 @subsection Basic Concepts of Coding Systems

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

534

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

535 @cindex character code conversion

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

536 @dfn{Character code conversion} involves conversion between the encoding

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

537 used inside Emacs and some other encoding. Emacs supports many

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

538 different encodings, in that it can convert to and from them. For

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

539 example, it can convert text to or from encodings such as Latin 1, Latin

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

540 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

541 cases, Emacs supports several alternative encodings for the same

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

542 characters; for example, there are three coding systems for the Cyrillic

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

543 (Russian) alphabet: ISO, Alternativnyj, and KOI8.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

544

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

545 Most coding systems specify a particular character code for

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

546 conversion, but some of them leave the choice unspecified---to be chosen

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

547 heuristically for each file, based on the data.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

548

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

549 @cindex end of line conversion

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

550 @dfn{End of line conversion} handles three different conventions used

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

551 on various systems for representing end of line in files. The Unix

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

552 convention is to use the linefeed character (also called newline). The

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

553 DOS convention is to use a carriage-return and a linefeed at the end of

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

554 a line. The Mac convention is to use just carriage-return.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

555

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

556 @cindex base coding system

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

557 @cindex variant coding system

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

558 @dfn{Base coding systems} such as @code{latin-1} leave the end-of-line

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

559 conversion unspecified, to be chosen based on the data. @dfn{Variant

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

560 coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

561 @code{latin-1-mac} specify the end-of-line conversion explicitly as

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

562 well. Most base coding systems have three corresponding variants whose

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

563 names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
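
As a rough sketch of what the variants do, the examples below apply
@code{decode-coding-string} and @code{encode-coding-string}
(@pxref{Explicit Encoding}) to short @sc{ascii} strings chosen only for
illustration:

@example
;; DOS convention: carriage-return plus linefeed becomes newline.
(decode-coding-string "one\r\ntwo" 'latin-1-dos)
     @result{} "one\ntwo"
;; Mac convention: a lone carriage-return becomes newline.
(decode-coding-string "one\rtwo" 'latin-1-mac)
     @result{} "one\ntwo"
;; Unix convention: the linefeed is left alone.
(decode-coding-string "one\ntwo" 'latin-1-unix)
     @result{} "one\ntwo"
;; Encoding performs the reverse conversion.
(encode-coding-string "one\ntwo" 'latin-1-dos)
     @result{} "one\r\ntwo"
@end example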

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

564

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

565 The coding system @code{raw-text} is special in that it prevents

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

566 character code conversion, and causes the buffer visited with that

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

567 coding system to be a unibyte buffer. It does not specify the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

568 end-of-line conversion, allowing that to be determined as usual by the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

569 data, and has the usual three variants which specify the end-of-line

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

570 conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

571 it specifies no conversion of either character codes or end-of-line.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

572

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

573 The coding system @code{emacs-mule} specifies that the data is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

574 represented in the internal Emacs encoding. This is like

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

575 @code{raw-text} in that no code conversion happens, but different in

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

576 that the result is multibyte data.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

577

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

578 @defun coding-system-get coding-system property

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

579 This function returns the specified property of the coding system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

580 @var{coding-system}. Most coding system properties exist for internal

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

581 purposes, but one that you might find useful is @code{mime-charset}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

582 That property's value is the name used in MIME for the character coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

583 which this coding system can read and write. Examples:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

584

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

585 @example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

586 (coding-system-get 'iso-latin-1 'mime-charset)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

587 @result{} iso-8859-1

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

588 (coding-system-get 'iso-2022-cn 'mime-charset)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

589 @result{} iso-2022-cn

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

590 (coding-system-get 'cyrillic-koi8 'mime-charset)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

591 @result{} koi8-r

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

592 @end example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

593

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

594 The value of the @code{mime-charset} property is also defined

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

595 as an alias for the coding system.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

596 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

597

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

598 @node Encoding and I/O

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

599 @subsection Encoding and I/O

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

600

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

601 The principal purpose of coding systems is for use in reading and

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

602 writing files. The function @code{insert-file-contents} uses

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

603 a coding system for decoding the file data, and @code{write-region}

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

604 uses one to encode the buffer contents.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

605

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

606 You can specify the coding system to use either explicitly

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

607 (@pxref{Specifying Coding Systems}), or implicitly using the defaulting

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

608 mechanism (@pxref{Default Coding Systems}). But these methods may not

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

609 completely specify what to do. For example, they may choose a coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

610 system such as @code{undecided} which leaves the character code

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

611 conversion to be determined from the data. In these cases, the I/O

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

612 operation finishes the job of choosing a coding system. Very often

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

613 you will want to find out afterwards which coding system was chosen.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

614

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

615 @defvar buffer-file-coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

616 This variable records the coding system that was used for visiting the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

617 current buffer. It is used for saving the buffer, and for writing part

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

618 of the buffer with @code{write-region}. When those operations ask the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

619 user to specify a different coding system,

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

620 @code{buffer-file-coding-system} is updated to the coding system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

621 specified.

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

622

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

623 However, @code{buffer-file-coding-system} does not affect sending text

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

624 to a subprocess.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

625 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

626

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

627 @defvar save-buffer-coding-system

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

628 This variable specifies the coding system for saving the buffer (by

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

629 overriding @code{buffer-file-coding-system}). Note that it is not used

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

630 for @code{write-region}.

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

631

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

632 When a command to save the buffer starts out to use

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

633 @code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

634 and that coding system cannot handle

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

635 the actual text in the buffer, the command asks the user to choose

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

636 another coding system. After that happens, the command also updates

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

637 @code{buffer-file-coding-system} to represent the coding system that the

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

638 user specified.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

639 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

640

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

641 @defvar last-coding-system-used

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

642 I/O operations for files and subprocesses set this variable to the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

643 coding system name that was used. The explicit encoding and decoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

644 functions (@pxref{Explicit Encoding}) set it too.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

645

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

646 @strong{Warning:} Since receiving subprocess output sets this variable,

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

647 it can change whenever Emacs waits; therefore, you should copy the

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

648 value shortly after the function call that stores the value you are

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

649 interested in.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

650 @end defvar
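
The warning above suggests a simple pattern: read @code{last-coding-system-used} immediately after the I/O call whose result you care about. Here is a minimal sketch; the helper name @code{my-file-coding-system} is hypothetical and not part of Emacs.

```elisp
(defun my-file-coding-system (file)
  "Return the coding system that was used to decode FILE.
This is a hypothetical helper for illustration, not an Emacs function."
  (with-temp-buffer
    (insert-file-contents file)
    ;; Read the variable right away: later I/O, including subprocess
    ;; output that arrives while Emacs waits, may overwrite it.
    last-coding-system-used))
```

The value is returned before any other form has a chance to wait or perform I/O, which is the point of the warning.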

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

651

23110

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

652 The variable @code{selection-coding-system} specifies how to encode

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

653 selections for the window system. @xref{Window System Selections}.

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

654

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

655 @node Lisp and Coding Systems

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

656 @subsection Coding Systems in Lisp

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

657

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

658 Here are the Lisp facilities for working with coding systems:

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

659

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

660 @defun coding-system-list &optional base-only

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

661 This function returns a list of all coding system names (symbols). If

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

662 @var{base-only} is non-@code{nil}, the value includes only the

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

663 base coding systems. Otherwise, it includes alias and variant coding

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

664 systems as well.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

665 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

666

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

667 @defun coding-system-p object

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

668 This function returns @code{t} if @var{object} is a coding system

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

669 name.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

670 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

671

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

672 @defun check-coding-system coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

673 This function checks the validity of @var{coding-system}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

674 If it is valid, it returns @var{coding-system}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

675 Otherwise it signals an error with condition @code{coding-system-error}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

676 @end defun
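
As an illustration of the two validity checks, the following sketch wraps @code{check-coding-system} so that an invalid name yields @code{nil} instead of an error. The helper @code{my-coding-system-or-nil} and the name @code{no-such-coding} are made up for this example.

```elisp
(defun my-coding-system-or-nil (name)
  "Return NAME if it is a valid coding system, otherwise nil.
Hypothetical helper for illustration, not part of Emacs."
  (condition-case nil
      (check-coding-system name)
    ;; An invalid name signals the `coding-system-error' condition.
    (coding-system-error nil)))

(coding-system-p 'iso-latin-1)             ; non-nil
(my-coding-system-or-nil 'iso-latin-1)     ; returns iso-latin-1
(my-coding-system-or-nil 'no-such-coding)  ; returns nil
```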

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

677

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

678 @defun coding-system-change-eol-conversion coding-system eol-type

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

679 This function returns a coding system which is like @var{coding-system}

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

680 except for its eol conversion, which is specified by @var{eol-type}.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

681 @var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

682 @code{nil}. If it is @code{nil}, the returned coding system determines

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

683 the end-of-line conversion from the data.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

684 @end defun
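
For example, a program that must write DOS-style line ends without changing the text conversion might do something like the following. This is a sketch only; the result shown assumes the usual @code{-unix}, @code{-dos} and @code{-mac} variant names.

```elisp
;; Keep the Latin-1 text conversion, but force CRLF line ends.
(coding-system-change-eol-conversion 'iso-latin-1 'dos)
;; => iso-latin-1-dos
```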

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

685

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

686 @defun coding-system-change-text-conversion eol-coding text-coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

687 This function returns a coding system which uses the end-of-line

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

688 conversion of @var{eol-coding}, and the text conversion of

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

689 @var{text-coding}. If @var{text-coding} is @code{nil}, it returns

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

690 @code{undecided}, or one of its variants according to @var{eol-coding}.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

691 @end defun
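
A short sketch of the opposite operation: keep the end-of-line convention of one coding system and take the text conversion from another. The choice of coding systems here is arbitrary.

```elisp
;; The first argument supplies the end-of-line conversion (DOS here);
;; the second supplies the text conversion.
(coding-system-change-text-conversion 'iso-latin-1-dos 'iso-2022-jp)
;; => iso-2022-jp-dos
```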

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

692

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

693 @defun find-coding-systems-region from to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

694 This function returns a list of coding systems that could be used to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

695 encode the text between @var{from} and @var{to}.  All coding systems in

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

696 the list can safely encode any multibyte characters in that portion of

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

697 the text.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

698

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

699 If the text contains no multibyte characters, the function returns the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

700 list @code{(undecided)}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

701 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

702

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

703 @defun find-coding-systems-string string

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

704 This function returns a list of coding systems that could be used to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

705 encode the text of @var{string}. All coding systems in the list can

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

706 safely encode any multibyte characters in @var{string}. If the text

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

707 contains no multibyte characters, this returns the list

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

708 @code{(undecided)}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

709 @end defun
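
The @code{(undecided)} return value can be used to test whether a string constrains the choice of coding system at all. A minimal sketch follows; the helper name @code{my-string-restricts-coding-p} is hypothetical.

```elisp
(defun my-string-restricts-coding-p (string)
  "Return non-nil if STRING restricts the choice of coding system.
Hypothetical helper for illustration, not part of Emacs."
  (not (equal (find-coding-systems-string string) '(undecided))))

(find-coding-systems-string "hello")    ; returns (undecided)
(my-string-restricts-coding-p "hello")  ; returns nil
```

For a string that does contain multibyte characters, the exact list returned depends on which coding systems your Emacs defines, so no sample output is shown for that case.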

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

710

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

711 @defun find-coding-systems-for-charsets charsets

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

712 This function returns a list of coding systems that could be used to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

713 encode all the character sets in the list @var{charsets}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

714 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

715

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

716 @defun detect-coding-region start end &optional highest

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

717 This function chooses a plausible coding system for decoding the text

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

718 from @var{start} to @var{end}. This text should be a byte sequence

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

719 (@pxref{Explicit Encoding}).

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

720

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

721 Normally this function returns a list of coding systems that could

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

722 handle decoding the text that was scanned. They are listed in order of

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

723 decreasing priority. But if @var{highest} is non-@code{nil}, then the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

724 return value is just one coding system, the one that is highest in

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

725 priority.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

726

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

727 If the region contains only @sc{ascii} characters, the value

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

728 is @code{undecided} or @code{(undecided)}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

729 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

730

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

731 @defun detect-coding-string string &optional highest

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

732 This function is like @code{detect-coding-region} except that it

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

733 operates on the contents of @var{string} instead of bytes in the buffer.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

734 @end defun
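
The effect of @var{highest} can be seen with a pure @sc{ascii} string. The results shown follow the description of @code{detect-coding-region} above; strings containing other bytes give results that depend on the coding system priorities in effect.

```elisp
;; Without HIGHEST, the value is a list in order of decreasing priority.
(detect-coding-string "hello")
;; => (undecided)

;; With HIGHEST non-nil, only the top candidate is returned.
(detect-coding-string "hello" t)
;; => undecided
```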

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

735

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

736 @xref{Process Information}, for how to examine or set the coding

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

737 systems used for I/O to a subprocess.

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

738

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

739 @node User-Chosen Coding Systems

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

740 @subsection User-Chosen Coding Systems

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

741

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

742 @defun select-safe-coding-system from to &optional preferred-coding-system

22267

dfac7398266b *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22252

diff changeset

743 This function selects a coding system for encoding the text between

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

744 @var{from} and @var{to}, asking the user to choose if necessary.

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

745

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

746 The optional argument @var{preferred-coding-system} specifies a coding

22267

dfac7398266b *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22252

diff changeset

747 system to try first. If that one can handle the text in the specified

dfac7398266b *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22252

diff changeset

748 region, then it is used. If this argument is omitted, the current

dfac7398266b *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22252

diff changeset

749 buffer's value of @code{buffer-file-coding-system} is tried first.

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

750

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

751 If the region contains some multibyte characters that the preferred

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

752 coding system cannot encode, this function asks the user to choose from

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

753 a list of coding systems which can encode the text, and returns the

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

754 user's choice.

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

755

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

756 As a special case, if @var{from} is a string, the string is the

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

757 target text, and @var{to} is ignored.

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

758 @end defun

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

759

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

760 Here are two functions you can use to let the user specify a coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

761 system, with completion. @xref{Completion}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

762

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

763 @defun read-coding-system prompt &optional default

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

764 This function reads a coding system using the minibuffer, prompting with

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

765 string @var{prompt}, and returns the coding system name as a symbol. If

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

766 the user enters null input, @var{default} specifies which coding system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

767 to return. It should be a symbol or a string.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

768 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

769

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

770 @defun read-non-nil-coding-system prompt

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

771 This function reads a coding system using the minibuffer, prompting with

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

772 string @var{prompt}, and returns the coding system name as a symbol. If

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

773 the user tries to enter null input, it asks the user to try again.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

774 @xref{Coding Systems}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

775 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

776

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

777 @node Default Coding Systems

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

778 @subsection Default Coding Systems

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

779

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

780 This section describes variables that specify the default coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

781 system for certain files or when running certain subprograms, and the

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

782 function that I/O operations use to access them.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

783

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

784 The idea of these variables is that you set them once and for all to the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

785 defaults you want, and then do not change them again. To specify a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

786 particular coding system for a particular operation in a Lisp program,

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

787 don't change these variables; instead, override them using

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

788 @code{coding-system-for-read} and @code{coding-system-for-write}

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

789 (@pxref{Specifying Coding Systems}).

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

790

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

791 @defvar file-coding-system-alist

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

792 This variable is an alist that specifies the coding systems to use for

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

793 reading and writing particular files. Each element has the form

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

794 @code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

795 expression that matches certain file names. The element applies to file

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

796 names that match @var{pattern}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

797

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

798 The @sc{cdr} of the element, @var{coding}, should be either a coding

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

799 system, a cons cell containing two coding systems, or a function name (a

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

800 symbol with a function definition). If @var{coding} is a coding system,

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

801 that coding system is used for both reading the file and writing it. If

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

802 @var{coding} is a cons cell containing two coding systems, its @sc{car}

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

803 specifies the coding system for decoding, and its @sc{cdr} specifies the

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

804 coding system for encoding.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

805

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

806 If @var{coding} is a function name, the function must return a coding

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

807 system or a cons cell containing two coding systems. This value is used

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

808 as described above.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

809 @end defvar
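
As an illustration of the element forms described above, here is a minimal sketch. The file-name patterns and the choice of coding systems are assumptions made up for this example, not defaults that Emacs provides:

@example
;; @r{Hypothetical: read and write files ending in .l1 as Latin-1.}
(add-to-list 'file-coding-system-alist
             '("\\.l1\\'" . latin-1))

;; @r{Hypothetical: decode with one coding system, encode with another.}
(add-to-list 'file-coding-system-alist
             '("\\.dosin\\'" . (latin-1-dos . latin-1-unix)))
@end example

In the second element the @sc{cdr} is itself a cons cell, so its @sc{car} is used for decoding and its @sc{cdr} for encoding.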

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

810

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

811 @defvar process-coding-system-alist

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

812 This variable is an alist specifying which coding systems to use for a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

813 subprocess, depending on which program is running in the subprocess. It

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

814 works like @code{file-coding-system-alist}, except that @var{pattern} is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

815 matched against the program name used to start the subprocess. The coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

816 system or systems specified in this alist are used to initialize the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

817 coding systems used for I/O to the subprocess, but you can specify

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

818 other coding systems later using @code{set-process-coding-system}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

819 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

820

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

821 @strong{Warning:} Coding systems such as @code{undecided}, which

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

822 determine the coding system from the data, do not work entirely reliably

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

823 with asynchronous subprocess output. This is because Emacs handles

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

824 asynchronous subprocess output in batches, as it arrives. If the coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

825 system leaves the character code conversion unspecified, or leaves the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

826 end-of-line conversion unspecified, Emacs must try to detect the proper

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

827 conversion from one batch at a time, and this does not always work.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

828

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

829 Therefore, with an asynchronous subprocess, if at all possible, use a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

830 coding system which determines both the character code conversion and

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

831 the end of line conversion---that is, one like @code{latin-1-unix},

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

832 rather than @code{undecided} or @code{latin-1}.
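
For instance, a sketch along the following lines fixes both conversions at the moment the asynchronous subprocess is started, using the variables @code{coding-system-for-read} and @code{coding-system-for-write}. The process name, buffer name, and program shown here are hypothetical:

@example
;; @r{Specify the character code and end-of-line conversions explicitly.}
(let ((coding-system-for-read 'latin-1-unix)
      (coding-system-for-write 'latin-1-unix))
  (start-process "lister" "*lister*" "ls" "-l"))
@end example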

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

833

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

834 @defvar network-coding-system-alist

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

835 This variable is an alist that specifies the coding system to use for

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

836 network streams. It works much like @code{file-coding-system-alist},

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

837 with the difference that the @var{pattern} in an element may be either a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

838 port number or a regular expression. If it is a regular expression, it

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

839 is matched against the network service name used to open the network

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

840 stream.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

841 @end defvar
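
The two kinds of @var{pattern} might be used as in this sketch. The port number, the service name, and the coding systems are assumptions chosen only for illustration:

@example
;; @r{Hypothetical: match a network stream by port number.}
(add-to-list 'network-coding-system-alist
             '(119 . latin-1-unix))

;; @r{Hypothetical: match the service name by regular expression.}
(add-to-list 'network-coding-system-alist
             '("\\`nntp\\'" . (latin-1-unix . latin-1-unix)))
@end example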

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

842

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

843 @defvar default-process-coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

844 This variable specifies the coding systems to use for subprocess (and

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

845 network stream) input and output, when nothing else specifies what to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

846 do.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

847

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

848 The value should be a cons cell of the form @code{(@var{input-coding}

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

849 . @var{output-coding})}. Here @var{input-coding} applies to input from

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

850 the subprocess, and @var{output-coding} applies to output to it.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

851 @end defvar
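
As a sketch of the form of the value, the following sets both coding systems to @code{latin-1-unix}. That choice is an assumption for the sake of the example; the appropriate value depends on your environment:

@example
;; @r{The car applies to input from the subprocess, the cdr to output.}
(setq default-process-coding-system
      '(latin-1-unix . latin-1-unix))
@end example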

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

852

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

853 @defun find-operation-coding-system operation &rest arguments

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

854 This function returns the coding system to use (by default) for

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

855 performing @var{operation} with @var{arguments}. The value has this

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

856 form:

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

857

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

858 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

859 (@var{decoding-system} . @var{encoding-system})

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

860 @end example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

861

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

862 The first element, @var{decoding-system}, is the coding system to use

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

863 for decoding (in case @var{operation} does decoding), and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

864 @var{encoding-system} is the coding system for encoding (in case

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

865 @var{operation} does encoding).

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

866

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

867 The argument @var{operation} should be a symbol, one of

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

868 @code{insert-file-contents}, @code{write-region}, @code{call-process},

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

869 @code{call-process-region}, @code{start-process}, or

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

870 @code{open-network-stream}. These are the names of the Emacs I/O primitives

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

871 that can do coding system conversion.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

872

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

873 The remaining arguments should be the same arguments that might be given

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

874 to that I/O primitive. Depending on the primitive, one of those

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

875 arguments is selected as the @dfn{target}. For example, if

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

876 @var{operation} does file I/O, whichever argument specifies the file

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

877 name is the target. For subprocess primitives, the program name is the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

878 target. For @code{open-network-stream}, the target is the service name

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

879 or port number.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

880

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

881 This function looks up the target in @code{file-coding-system-alist},

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

882 @code{process-coding-system-alist}, or

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

883 @code{network-coding-system-alist}, depending on @var{operation}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

884 @xref{Default Coding Systems}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

885 @end defun
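
A call might look like the following sketch. The file name is hypothetical, and the result depends on the current contents of the alists, so no value is shown:

@example
;; @r{Ask which coding system would be used by default to decode this file.}
(car (find-operation-coding-system
      'insert-file-contents "/tmp/notes.txt"))
@end example

Here the file name is the first of the arguments that @code{insert-file-contents} itself would receive, so it serves as the target.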

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

886

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

887 @node Specifying Coding Systems

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

888 @subsection Specifying a Coding System for One Operation

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

889

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

890 You can specify the coding system for a specific operation by binding

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

891 the variables @code{coding-system-for-read} and/or

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

892 @code{coding-system-for-write}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

893

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

894 @defvar coding-system-for-read

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

895 If this variable is non-@code{nil}, it specifies the coding system to

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

896 use for reading a file, or for input from a synchronous subprocess.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

897

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

898 It also applies to any asynchronous subprocess or network stream, but in

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

899 a different way: the value of @code{coding-system-for-read} when you

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

900 start the subprocess or open the network stream specifies the input

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

901 decoding method for that subprocess or network stream. It remains in

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

902 use for that subprocess or network stream unless and until overridden.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

903

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

904 The right way to use this variable is to bind it with @code{let} for a

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

905 specific I/O operation. Its global value is normally @code{nil}, and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

906 you should not globally set it to any other value. Here is an example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

907 of the right way to use the variable:

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

908

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

909 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

910 ;; @r{Read the file with no character code conversion.}

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

911 ;; @r{Assume @sc{crlf} represents end-of-line.}

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

912 (let ((coding-system-for-read 'emacs-mule-dos))

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

913   (insert-file-contents filename))

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

914 @end example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

915

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

916 When its value is non-@code{nil}, @code{coding-system-for-read} takes

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

917 precedence over all other methods of specifying a coding system to use for

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

918 input, including @code{file-coding-system-alist},

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

919 @code{process-coding-system-alist} and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

920 @code{network-coding-system-alist}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

921 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

922

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

923 @defvar coding-system-for-write

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

924 This works much like @code{coding-system-for-read}, except that it

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

925 applies to output rather than input. It affects writing to files,

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

926 as well as sending output to subprocesses and network connections.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

927

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

928 When a single operation does both input and output, as do

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

929 @code{call-process-region} and @code{start-process}, both

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

930 @code{coding-system-for-read} and @code{coding-system-for-write}

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

931 affect it.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

932 @end defvar
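
By analogy with the example given for @code{coding-system-for-read}, a binding for output might be sketched like this. The variable @code{filename} is assumed to hold the name of the file to write, and the coding system is chosen only for illustration:

@example
;; @r{Write the buffer as Latin-1 with @sc{crlf} as end-of-line.}
(let ((coding-system-for-write 'latin-1-dos))
  (write-region (point-min) (point-max) filename))
@end example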

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

933

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

934 @defvar inhibit-eol-conversion

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

935 When this variable is non-@code{nil}, no end-of-line conversion is done,

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

936 no matter which coding system is specified. This applies to all the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

937 Emacs I/O and subprocess primitives, and to the explicit encoding and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

938 decoding functions (@pxref{Explicit Encoding}).

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

939 @end defvar
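
Like the other variables in this section, this one is probably best bound with @code{let} around a single operation. A minimal sketch, assuming @code{filename} holds the name of the file to read:

@example
;; @r{Decode the characters as Latin-1, but do no end-of-line conversion.}
(let ((coding-system-for-read 'latin-1)
      (inhibit-eol-conversion t))
  (insert-file-contents filename))
@end example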

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

940

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

941 @node Explicit Encoding

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

942 @subsection Explicit Encoding and Decoding

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

943 @cindex encoding text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

944 @cindex decoding text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

945

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

946 All the operations that transfer text in and out of Emacs have the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

947 ability to use a coding system to encode or decode the text.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

948 You can also explicitly encode and decode text using the functions

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

949 in this section.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

950

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

951 The result of encoding, and the input to decoding, are not ordinary

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

952 text. They logically consist of a series of byte values; that is, a

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

953 series of characters whose codes are in the range 0 through 255. In a

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

954 multibyte buffer or string, character codes 128 through 159 are

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

955 represented by multibyte sequences, but this is invisible to Lisp

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

956 programs.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

957

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

958 The usual way to read a file into a buffer as a sequence of bytes, so

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

959 you can decode the contents explicitly, is with

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

960 @code{insert-file-contents-literally} (@pxref{Reading from Files});

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

961 alternatively, specify a non-@code{nil} @var{rawfile} argument when

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

962 visiting a file with @code{find-file-noselect}. These methods result in

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

963 a unibyte buffer.
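
The first of these methods could be combined with explicit decoding roughly as follows. This is a sketch only: @code{filename} is assumed to hold the file name, and the choice of @code{latin-1} is an assumption about the file's contents:

@example
(with-temp-buffer
  ;; @r{Insert the bytes of the file without decoding them.}
  (insert-file-contents-literally filename)
  ;; @r{Decode the bytes explicitly.}
  (decode-coding-region (point-min) (point-max) 'latin-1)
  (buffer-string))
@end example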

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

964

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

965 The usual way to use the byte sequence that results from explicitly

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

966 encoding text is to copy it to a file or process---for example, to write

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

967 it with @code{write-region} (@pxref{Writing to Files}), and suppress

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

968 encoding by binding @code{coding-system-for-write} to

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

969 @code{no-conversion}.
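
The following sketch shows one way to do this. It assumes @code{filename} holds the name of the output file and uses @code{latin-1} purely as an example; note that the explicit encoding replaces the text in the current buffer:

@example
;; @r{Encode the buffer text explicitly.}
(encode-coding-region (point-min) (point-max) 'latin-1)
;; @r{Write the resulting bytes without encoding them a second time.}
(let ((coding-system-for-write 'no-conversion))
  (write-region (point-min) (point-max) filename))
@end example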

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

970

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

971 Here are the functions to perform explicit encoding or decoding. The

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

972 encoding functions produce sequences of bytes; the decoding functions

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

973 are meant to operate on sequences of bytes. All of these functions

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

974 discard text properties.

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

975

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

976 @defun encode-coding-region start end coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

977 This function encodes the text from @var{start} to @var{end} according

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

978 to coding system @var{coding-system}. The encoded text replaces the

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

979 original text in the buffer. The result of encoding is logically a

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

980 sequence of bytes, but the buffer remains multibyte if it was multibyte

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

981 before.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

982 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

983

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

984 @defun encode-coding-string string coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

985 This function encodes the text in @var{string} according to coding

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

986 system @var{coding-system}. It returns a new string containing the

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

987 encoded text. The result of encoding is a unibyte string.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

988 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

989

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

990 @defun decode-coding-region start end coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

991 This function decodes the text from @var{start} to @var{end} according

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

992 to coding system @var{coding-system}. The decoded text replaces the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

993 original text in the buffer. To make explicit decoding useful, the text

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

994 before decoding ought to be a sequence of byte values, but both

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

995 multibyte and unibyte buffers are acceptable.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

996 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

997

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

998 @defun decode-coding-string string coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

999 This function decodes the text in @var{string} according to coding

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1000 system @var{coding-system}. It returns a new string containing the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1001 decoded text. To make explicit decoding useful, the contents of

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1002 @var{string} ought to be a sequence of byte values, but a multibyte

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1003 string is acceptable.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1004 @end defun
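
The two string functions can be combined as in this sketch, which encodes a string and then decodes the resulting bytes again. The sample text and the coding system are arbitrary choices made for the example:

@example
(let* ((text "plain text")
       ;; @r{The encoded form is a sequence of bytes.}
       (bytes (encode-coding-string text 'latin-1)))
  ;; @r{Decoding the bytes yields text again.}
  (decode-coding-string bytes 'latin-1))
@end example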

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1005

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1006 @node Terminal I/O Encoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1007 @subsection Terminal I/O Encoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1008

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1009 Emacs can decode keyboard input using a coding system, and encode

23110

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1010 terminal output. This is useful for terminals that transmit or display

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1011 text using a particular encoding such as Latin-1. Emacs does not set

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1012 @code{last-coding-system-used} for encoding or decoding for the

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1013 terminal.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1014

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1015 @defun keyboard-coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1016 This function returns the coding system that is in use for decoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1017 keyboard input---or @code{nil} if no coding system is to be used.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1018 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1019

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1020 @defun set-keyboard-coding-system coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1021 This function specifies @var{coding-system} as the coding system to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1022 use for decoding keyboard input. If @var{coding-system} is @code{nil},

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1023 that means keyboard input is not decoded.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1024 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1025

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1026 @defun terminal-coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1027 This function returns the coding system that is in use for encoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1028 terminal output---or @code{nil} for no encoding.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1029 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1030

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1031 @defun set-terminal-coding-system coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1032 This function specifies @var{coding-system} as the coding system to use

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1033 for encoding terminal output. If @var{coding-system} is @code{nil},

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1034 that means terminal output is not encoded.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1035 @end defun
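
As a minimal sketch of how the four functions above fit together,
assuming a text terminal that both sends and displays Latin-1 (the
choice of @code{iso-latin-1} is only an illustration, not a
recommendation):

@example
;; Report the coding systems currently used for the terminal;
;; either value may be nil, meaning no conversion.
(message "keyboard: %S, terminal: %S"
         (keyboard-coding-system)
         (terminal-coding-system))

;; Assumption: the terminal uses Latin-1 in both directions.
(set-keyboard-coding-system 'iso-latin-1)
(set-terminal-coding-system 'iso-latin-1)
@end example

Passing @code{nil} instead of a coding system name turns the
corresponding conversion off again.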

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1036

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1037 @node MS-DOS File Types

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1038 @subsection MS-DOS File Types

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1039 @cindex DOS file types

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1040 @cindex MS-DOS file types

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1041 @cindex Windows file types

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1042 @cindex file types on MS-DOS and Windows

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1043 @cindex text files and binary files

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1044 @cindex binary files and text files

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1045

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1046 On MS-DOS and Microsoft Windows, Emacs guesses the appropriate

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1047 end-of-line conversion for a file by looking at the file's name. This

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1048 feature classifies files as @dfn{text files} and @dfn{binary files}. By

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1049 ``binary file'' we mean a file of literal byte values that are not

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1050 necessarily meant to be characters; Emacs does no end-of-line conversion

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1051 and no character code conversion for them. On the other hand, the bytes

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1052 in a text file are intended to represent characters; when you create a

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1053 new file whose name implies that it is a text file, Emacs uses DOS

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1054 end-of-line conversion.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1055

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1056 @defvar buffer-file-type

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1057 This variable, automatically buffer-local in each buffer, records the

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1058 file type of the buffer's visited file. When a buffer does not specify

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1059 a coding system with @code{buffer-file-coding-system}, this variable is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1060 used to determine which coding system to use when writing the contents

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1061 of the buffer. It should be @code{nil} for text, @code{t} for binary.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1062 If it is @code{t}, the coding system is @code{no-conversion}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1063 Otherwise, @code{undecided-dos} is used.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1064

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1065 Normally this variable is set by visiting a file; it is set to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1066 @code{nil} if the file was visited without any actual conversion.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1067 @end defvar

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1068

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1069 @defopt file-name-buffer-file-type-alist

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1070 This variable holds an alist for recognizing text and binary files.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1071 Each element has the form @code{(@var{regexp} . @var{type})}, where

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1072 @var{regexp} is matched against the file name, and @var{type} may be

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1073 @code{nil} for text, @code{t} for binary, or a function to call to

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1074 compute which. If it is a function, then it is called with a single

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1075 argument (the file name) and should return @code{t} or @code{nil}.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1076

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1077 When running on MS-DOS or MS-Windows, Emacs checks this alist to decide

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1078 which coding system to use when reading a file. For a text file,

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1079 @code{undecided-dos} is used. For a binary file, @code{no-conversion}

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1080 is used.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1081

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1082 If no element in this alist matches a given file name, then

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1083 @code{default-buffer-file-type} says how to treat the file.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1084 @end defopt

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1085

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1086 @defopt default-buffer-file-type

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1087 This variable says how to handle files for which

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1088 @code{file-name-buffer-file-type-alist} says nothing about the type.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1089

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1090 If this variable is non-@code{nil}, then these files are treated as

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1091 binary: the coding system @code{no-conversion} is used. Otherwise,

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1092 nothing special is done for them---the coding system is deduced solely

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1093 from the file contents, in the usual Emacs fashion.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1094 @end defopt
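
The following sketch shows one way to extend the alist described
above.  The @file{.dat} and @file{.log} extensions are hypothetical
examples, and the @code{boundp} test reflects the assumption that the
variable is defined only on MS-DOS and MS-Windows:

@example
;; Do nothing on systems that do not define the alist.
(when (boundp 'file-name-buffer-file-type-alist)
  ;; Treat *.dat files as binary (type t) ...
  (add-to-list 'file-name-buffer-file-type-alist
               '("\\.dat\\'" . t))
  ;; ... and *.log files as text (type nil).
  (add-to-list 'file-name-buffer-file-type-alist
               '("\\.log\\'" . nil)))
@end example

File names that match no element are still handled according to
@code{default-buffer-file-type}.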

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1095

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1096 @node Input Methods

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1097 @section Input Methods

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1098 @cindex input methods

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1099

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1100 @dfn{Input methods} provide convenient ways of entering non-@sc{ascii}

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1101 characters from the keyboard. Unlike coding systems, which translate

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1102 non-@sc{ascii} characters to and from encodings meant to be read by

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1103 programs, input methods provide human-friendly commands. (@xref{Input

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1104 Methods,,, emacs, The GNU Emacs Manual}, for information on how users

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1105 use input methods to enter text.) How to define input methods is not

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1106 yet documented in this manual, but here we describe how to use them.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1107

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1108 Each input method has a name, which is currently a string;

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1109 in the future, symbols may also be usable as input method names.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1110

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1111 @defvar current-input-method

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1112 This variable holds the name of the input method now active in the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1113 current buffer. (It automatically becomes local in each buffer when set

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1114 in any fashion.) It is @code{nil} if no input method is active in the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1115 buffer now.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1116 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1117

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1118 @defvar default-input-method

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1119 This variable holds the default input method for commands that choose an

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1120 input method. Unlike @code{current-input-method}, this variable is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1121 normally global.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1122 @end defvar

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1123

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1124 @defun set-input-method input-method

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1125 This function activates input method @var{input-method} for the current

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1126 buffer. It also sets @code{default-input-method} to @var{input-method}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1127 If @var{input-method} is @code{nil}, this function deactivates any input

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1128 method for the current buffer.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1129 @end defun
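
A short sketch of activating and deactivating an input method from
Lisp.  The name @code{"latin-1-prefix"} is used only as an example and
is assumed to be among the input methods available in your Emacs:

@example
;; Activate an input method in the current buffer.  As described
;; above, this also sets default-input-method.
(set-input-method "latin-1-prefix")

;; The variable now holds the name of the active method.
current-input-method

;; Deactivate any input method in the current buffer.
(set-input-method nil)
@end example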

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1130

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1131 @defun read-input-method-name prompt &optional default inhibit-null

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1132 This function reads an input method name with the minibuffer, prompting

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1133 with @var{prompt}. If @var{default} is non-@code{nil}, that is returned

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1134 if the user enters empty input.  However, if

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1135 @var{inhibit-null} is non-@code{nil}, empty input signals an error.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1136

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1137 The returned value is a string.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1138 @end defun
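
For instance, a command could ask the user for an input method and
then activate it.  This is only a sketch; the prompt text and the
default name @code{"latin-1-prefix"} are illustrative assumptions:

@example
;; Read a name with the minibuffer; empty input yields the
;; default, because inhibit-null is omitted.
(let ((name (read-input-method-name "Input method: "
                                    "latin-1-prefix")))
  ;; name is a string, suitable for set-input-method.
  (set-input-method name))
@end example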

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1139

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1140 @defvar input-method-alist

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1141 This variable defines all the supported input methods.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1142 Each element defines one input method, and should have the form:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1143

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1144 @example

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1145 (@var{input-method} @var{language-env} @var{activate-func}

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1146 @var{title} @var{description} @var{args}...)

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1147 @end example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1148

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1149 Here @var{input-method} is the input method name, a string;

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1150 @var{language-env} is another string, the name of the language

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1151 environment this input method is recommended for. (That serves only for

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1152 documentation purposes.)

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1153

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1154 @var{activate-func} is a function to call to activate this method. The

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1155 @var{args}, if any, are passed as arguments to @var{activate-func}. All

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1156 told, the arguments to @var{activate-func} are @var{input-method} and

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1157 the @var{args}.

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1158

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1159 @var{title} is a string to display in the mode line while this method is

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1160 active. @var{description} is a string describing this method and what

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1161 it is good for.

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1162 @end defvar
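
Since each element starts with the input method name, an entry can be
looked up with @code{assoc} and its fields extracted by position.  A
minimal sketch, assuming an input method named
@code{"latin-1-prefix"} is defined:

@example
(let ((entry (assoc "latin-1-prefix" input-method-alist)))
  ;; entry is nil if no such input method is defined.
  (when entry
    ;; Positions follow the element form shown above:
    ;; 0 = name, 1 = language environment, 3 = title.
    (message "%S: language %S, mode line title %S"
             (nth 0 entry) (nth 1 entry) (nth 3 entry))))
@end example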

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1163

23110

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1164 The fundamental interface to input methods is through the

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1165 variable @code{input-method-function}. @xref{Reading One Event}.

26696

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1166

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1167 @node Locales

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1168 @section Locales

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1169 @cindex locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1170

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1171 POSIX defines a concept of ``locales'', which determine the language

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1172 to use in language-related features. These Emacs variables control

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1173 how Emacs interacts with these features.

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1174

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1175 @defvar locale-coding-system

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1176 @tindex locale-coding-system

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1177 This variable specifies the coding system to use for decoding system

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1178 error messages, for encoding the format argument to

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1179 @code{format-time-string}, and for decoding the return value of

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1180 @code{format-time-string}.

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1181 @end defvar

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1182

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1183 @defvar system-messages-locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1184 @tindex system-messages-locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1185 This variable specifies the locale to use for generating system error

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1186 messages. Changing the locale can cause messages to come out in a

27362

ce0641caaa76 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27189

diff changeset

1187 different language or in a different orthography. If the variable is

26696

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1188 @code{nil}, the locale is specified by environment variables in the

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1189 usual POSIX fashion.

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1190 @end defvar

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1191

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1192 @defvar system-time-locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1193 @tindex system-time-locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1194 This variable specifies the locale to use for formatting time values.

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1195 Changing the locale can cause time values to appear according to the

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1196 conventions of a different language. If the variable is @code{nil}, the

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1197 locale is specified by environment variables in the usual POSIX fashion.

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1198 @end defvar
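
Because these are ordinary variables, they can be bound temporarily
with @code{let}.  The following sketch formats the current weekday
name in the @code{"C"} locale regardless of the environment; the
choice of locale and format string is only an illustration:

@example
;; The binding is undone when the let form exits, so later
;; calls use the locale from the environment again.
(let ((system-time-locale "C"))
  (format-time-string "%A"))
@end example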

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1199

annotate lispref/nonascii.texi @ 36082:28eec8406e22