emacs: lispref/nonascii.texi annotate

author	Richard M. Stallman <rms@gnu.org>
date	Sun, 01 Sep 2002 13:27:47 +0000 (2002-09-01)
parents	ccaf0199f9dc
children	23a1cea22d13

rev	line source
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	1 @c --texinfo--
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	2 @c This is part of the GNU Emacs Lisp Reference Manual.
27189 d2e5f1b7d8e2 Update copyrights. Gerd Moellmann <gerd@gnu.org> parents: 27187 diff changeset	3 @c Copyright (C) 1998, 1999 Free Software Foundation, Inc.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	4 @c See the file elisp.texi for copying conditions.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	5 @setfilename ../info/characters
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	6 @node Non-ASCII Characters, Searching and Matching, Text, Top
27374 0f5edee5242b * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27362 diff changeset	7 @chapter Non-@sc{ascii} Characters
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	8 @cindex multibyte characters
27374 0f5edee5242b * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27362 diff changeset	9 @cindex non-@sc{ascii} characters
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	10
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	11 This chapter covers the special issues relating to non-@sc{ascii}
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	12 characters and how they are stored in strings and buffers.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	13
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	14 @menu
28635 cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	15 * Text Representations:: Unibyte and multibyte representations.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	16 * Converting Representations:: Converting unibyte to multibyte and vice versa.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	17 * Selecting a Representation:: Treating a byte sequence as unibyte or multi.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	18 * Character Codes:: How unibyte and multibyte relate to
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	19 codes of individual characters.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	20 * Character Sets:: The space of possible character codes
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	21 is divided into various character sets.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	22 * Chars and Bytes:: More information about multibyte encodings.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	23 * Splitting Characters:: Converting a character to its byte sequence.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	24 * Scanning Charsets:: Which character sets are used in a buffer?
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	25 * Translation of Characters:: Translation tables are used for conversion.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	26 * Coding Systems:: Coding systems are conversions for saving files.
cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	27 * Input Methods:: Input methods allow users to enter various
40834 9552d64e0367 Fix typo. Richard M. Stallman <rms@gnu.org> parents: 39221 diff changeset	28 non-ASCII characters without special keyboards.
28635 cda2b6ed6aec * empty log message * Richard M. Stallman <rms@gnu.org> parents: 27374 diff changeset	29 * Locales:: Interacting with the POSIX locale.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	30 @end menu
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	31
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	32 @node Text Representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	33 @section Text Representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	34 @cindex text representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	35
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	36 Emacs has two @dfn{text representations}---two ways to represent text
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	37 in a string or buffer. These are called @dfn{unibyte} and
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	38 @dfn{multibyte}. Each string, and each buffer, uses one of these two
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	39 representations. For most purposes, you can ignore the issue of
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	40 representations, because Emacs converts text between them as
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	41 appropriate. Occasionally in Lisp programming you will need to pay
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	42 attention to the difference.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	43
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	44 @cindex unibyte text
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	45 In unibyte representation, each character occupies one byte and
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	46 therefore the possible character codes range from 0 to 255. Codes 0
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	47 through 127 are @sc{ascii} characters; the codes from 128 through 255
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	48 are used for one non-@sc{ascii} character set (you can choose which
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	49 character set by setting the variable @code{nonascii-insert-offset}).
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	50
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	51 @cindex leading code
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	52 @cindex multibyte text
22252 40089afa2b1d * empty log message * Richard M. Stallman <rms@gnu.org> parents: 22138 diff changeset	53 @cindex trailing codes
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	54 In multibyte representation, a character may occupy more than one
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	55 byte, and as a result, the full range of Emacs character codes can be
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	56 stored. The first byte of a multibyte character is always in the range
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	57 128 through 159 (octal 0200 through 0237). These values are called
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	58 @dfn{leading codes}. The second and subsequent bytes of a multibyte
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	59 character are always in the range 160 through 255 (octal 0240 through
22252 40089afa2b1d * empty log message * Richard M. Stallman <rms@gnu.org> parents: 22138 diff changeset	60 0377); these values are @dfn{trailing codes}.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	61
28877 607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	62 Some sequences of bytes are not valid in multibyte text: for example,
32523 4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	63 a single isolated byte in the range 128 through 159 is not allowed. But
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	64 character codes 128 through 159 can appear in multibyte text,
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	65 represented as two-byte sequences. All the character codes 128 through
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	66 255 are possible (though slightly abnormal) in multibyte text; they
28877 607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	67 appear in multibyte buffers and strings when you do explicit encoding
607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	68 and decoding (@pxref{Explicit Encoding}).
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	69
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	70 In a buffer, the buffer-local value of the variable
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	71 @code{enable-multibyte-characters} specifies the representation used.
24952 a6db4671c7a0 * empty log message * Karl Heuer <kwzh@gnu.org> parents: 24951 diff changeset	72 The representation for a string is determined and recorded in the string
a6db4671c7a0 * empty log message * Karl Heuer <kwzh@gnu.org> parents: 24951 diff changeset	73 when the string is constructed.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	74
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	75 @defvar enable-multibyte-characters
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	76 This variable specifies the current buffer's text representation.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	77 If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	78 it contains unibyte text.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	79
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	80 You cannot set this variable directly; instead, use the function
90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	81 @code{set-buffer-multibyte} to change a buffer's representation.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	82 @end defvar
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	83
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	84 @defvar default-enable-multibyte-characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	85 This variable's value is entirely equivalent to @code{(default-value
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	86 'enable-multibyte-characters)}, and setting this variable changes that
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	87 default value. Setting the local binding of
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	88 @code{enable-multibyte-characters} in a specific buffer is not allowed,
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	89 but changing the default value is supported, and it is a reasonable
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	90 thing to do, because it has no effect on existing buffers.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	91
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	92 The @samp{--unibyte} command line option does its job by setting the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	93 default value to @code{nil} early in startup.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	94 @end defvar
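
As a minimal sketch of the behavior described above (an illustration,
not taken from the Emacs sources), the following forms read the default
value and then change it.  Only buffers created afterward are affected.

@example
;; Read the default representation used for new buffers.
(default-value 'enable-multibyte-characters)

;; Make buffers created from now on unibyte by default.
;; Existing buffers keep their current representation.
(setq default-enable-multibyte-characters nil)
@end example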
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	95
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	96 @defun position-bytes position
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	97 @tindex position-bytes
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	98 Return the byte-position corresponding to buffer position @var{position}
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	99 in the current buffer.
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	100 @end defun
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	101
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	102 @defun byte-to-position byte-position
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	103 @tindex byte-to-position
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	104 Return the buffer position corresponding to byte-position
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	105 @var{byte-position} in the current buffer.
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	106 @end defun
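
The two functions above can be combined to measure text in bytes rather
than in characters.  The following sketch uses a hypothetical helper
name, @code{my-region-byte-length}, which is not part of Emacs, and
assumes that @var{start} is not greater than @var{end}:

@example
(defun my-region-byte-length (start end)
  "Return the number of bytes occupied by the text from START to END."
  (- (position-bytes end) (position-bytes start)))
@end example

In a unibyte buffer the result equals @code{(- end start)}; in a
multibyte buffer it can be larger.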
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	107
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	108 @defun multibyte-string-p string
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	109 Return @code{t} if @var{string} is a multibyte string.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	110 @end defun
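
A program that needs to know which representations it is dealing with
can test the buffer and the string separately.  This is only a sketch;
the function name @code{my-text-representations} is hypothetical:

@example
(defun my-text-representations (string)
  "Return a list (BUFFER-REP STRING-REP) of representation symbols."
  (list (if enable-multibyte-characters 'multibyte 'unibyte)
        (if (multibyte-string-p string) 'multibyte 'unibyte)))
@end example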
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	111
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	112 @node Converting Representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	113 @section Converting Text Representations
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	114
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	115 Emacs can convert unibyte text to multibyte; it can also convert
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	116 multibyte text to unibyte, though this conversion loses information. In
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	117 general these conversions happen when inserting text into a buffer, or
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	118 when putting text from several strings together in one string. You can
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	119 also explicitly convert a string's contents to either representation.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	120
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	121 Emacs chooses the representation for a string based on the text that
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	122 it is constructed from. The general rule is to convert unibyte text to
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	123 multibyte text when combining it with other multibyte text, because the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	124 multibyte representation is more general and can hold whatever
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	125 characters the unibyte text has.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	126
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	127 When inserting text into a buffer, Emacs converts the text to the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	128 buffer's representation, as specified by
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	129 @code{enable-multibyte-characters} in that buffer. In particular, when
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	130 you insert multibyte text into a unibyte buffer, Emacs converts the text
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	131 to unibyte, even though this conversion cannot in general preserve all
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	132 the characters that might be in the multibyte text. The other natural
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	133 alternative, to convert the buffer contents to multibyte, is not
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	134 acceptable because the buffer's representation is a choice made by the
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	135 user that cannot be overridden automatically.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	136
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	137 Converting unibyte text to multibyte text leaves @sc{ascii} characters
32523 4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	138 unchanged, and likewise character codes 128 through 159. It converts
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	139 the non-@sc{ascii} codes 160 through 255 by adding the value
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	140 @code{nonascii-insert-offset} to each character code. By setting this
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	141 variable, you specify which character set the unibyte characters
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	142 correspond to (@pxref{Character Sets}). For example, if
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	143 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	144 'latin-iso8859-1) 128)}, then the unibyte non-@sc{ascii} characters
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	145 correspond to Latin 1. If it is 2688, which is @code{(- (make-char
4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	146 'greek-iso8859-7) 128)}, then they correspond to Greek letters.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	147
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	148 Converting multibyte text to unibyte is simpler: it discards all but
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	149 the low 8 bits of each character code. If @code{nonascii-insert-offset}
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	150 has a reasonable value, corresponding to the beginning of some character
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	151 set, this conversion is the inverse of the other: converting unibyte
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	152 text to multibyte and back to unibyte reproduces the original unibyte
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	153 text.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	154
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	155 @defvar nonascii-insert-offset
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	156 This variable specifies the amount to add to a non-@sc{ascii} character
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	157 when converting unibyte text to multibyte. It also applies when
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	158 @code{self-insert-command} inserts a character in the unibyte
29339 d831c2ad9313 Fix xref Dave Love <fx@gnu.org> parents: 29265 diff changeset	159 non-@sc{ascii} range, 128 through 255. However, the functions
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	160 @code{insert} and @code{insert-char} do not perform this conversion.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	161
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	162 The right value to use to select character set @var{cs} is @code{(-
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	163 (make-char @var{cs}) 128)}. If the value of
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	164 @code{nonascii-insert-offset} is zero, then conversion actually uses the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	165 value for the Latin 1 character set, rather than zero.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	166 @end defvar
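
For instance, applying the formula above to the two character sets
mentioned earlier in this section gives the following settings.  Which
one is appropriate depends on the unibyte text at hand; this is an
illustration, not a recommendation.

@example
;; Treat unibyte codes 160 through 255 as Latin 1 characters.
(setq nonascii-insert-offset (- (make-char 'latin-iso8859-1) 128))

;; Treat them as Greek letters instead.
(setq nonascii-insert-offset (- (make-char 'greek-iso8859-7) 128))
@end example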
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	167
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	168 @defvar nonascii-translation-table
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	169 This variable provides a more general alternative to
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	170 @code{nonascii-insert-offset}. You can use it to specify independently
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	171 how to translate each code in the range of 128 through 255 into a
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	172 multibyte character. The value should be a char-table, or @code{nil}.
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	174 @end defvar
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	175
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	176 @defun string-make-unibyte string
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	177 This function converts the text of @var{string} to unibyte
22252 40089afa2b1d * empty log message * Richard M. Stallman <rms@gnu.org> parents: 22138 diff changeset	178 representation, if it isn't already, and returns the result. If
45652 ccaf0199f9dc (Converting Representations): Update the description of what Eli Zaretskii <eliz@gnu.org> parents: 43634 diff changeset	179 @var{string} is a unibyte string, it is returned unchanged. Multibyte
ccaf0199f9dc (Converting Representations): Update the description of what Eli Zaretskii <eliz@gnu.org> parents: 43634 diff changeset	180 character codes are converted to unibyte according to
ccaf0199f9dc (Converting Representations): Update the description of what Eli Zaretskii <eliz@gnu.org> parents: 43634 diff changeset	181 @code{nonascii-translation-table} or, if that is @code{nil}, using
ccaf0199f9dc (Converting Representations): Update the description of what Eli Zaretskii <eliz@gnu.org> parents: 43634 diff changeset	182 @code{nonascii-insert-offset}. If the lookup in the translation table
ccaf0199f9dc (Converting Representations): Update the description of what Eli Zaretskii <eliz@gnu.org> parents: 43634 diff changeset	183 fails, this function takes just the low 8 bits of each character.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	184 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	185
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	186 @defun string-make-multibyte string
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	187 This function converts the text of @var{string} to multibyte
22252 40089afa2b1d * empty log message * Richard M. Stallman <rms@gnu.org> parents: 22138 diff changeset	188 representation, if it isn't already, and returns the result. If
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	189 @var{string} is a multibyte string, it is returned unchanged.
33912 67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	190 The function @code{unibyte-char-to-multibyte} is used to convert
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	191 each unibyte character to a multibyte character.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	192 @end defun
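
The round trip described earlier in this section can be sketched with
these two functions.  The helper name @code{my-unibyte-round-trip} is
hypothetical, and the sketch assumes that Latin 1 is the intended
character set and that no translation table should be consulted:

@example
(defun my-unibyte-round-trip (string)
  "Convert unibyte STRING to multibyte and back to unibyte."
  (let ((nonascii-translation-table nil)
        (nonascii-insert-offset
         (- (make-char 'latin-iso8859-1) 128)))
    (string-make-unibyte (string-make-multibyte string))))
@end example

With a reasonable offset such as this one, the value should contain the
same characters as the original unibyte string.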
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	193
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	194 @node Selecting a Representation
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	195 @section Selecting a Representation
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	196
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	197 Sometimes it is useful to examine an existing buffer or string as
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	198 multibyte when it was unibyte, or vice versa.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	199
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	200 @defun set-buffer-multibyte multibyte
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	201 Set the representation type of the current buffer. If @var{multibyte}
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	202 is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	203 is @code{nil}, the buffer becomes unibyte.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	204
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	205 This function leaves the buffer contents unchanged when viewed as a
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	206 sequence of bytes. As a consequence, it can change the contents viewed
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	207 as characters; a sequence of two bytes which is treated as one character
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	208 in multibyte representation will count as two characters in unibyte
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	209 representation. Character codes 128 through 159 are an exception. They
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	210 are represented by one byte in a unibyte buffer, but when the buffer is
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	211 set to multibyte, they are converted to two-byte sequences, and vice
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	212 versa.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	213
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	214 This function sets @code{enable-multibyte-characters} to record which
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	215 representation is in use. It also adjusts various data in the buffer
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	216 (including overlays, text properties and markers) so that they cover the
90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	217 same text as they did before.
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	218
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	219 You cannot use @code{set-buffer-multibyte} on an indirect buffer,
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	220 because indirect buffers always inherit the representation of the
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	221 base buffer.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	222 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	223
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	224 @defun string-as-unibyte string
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	225 This function returns a string with the same bytes as @var{string} but
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	226 treating each byte as a character. This means that the value may have
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	227 more characters than @var{string} has.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	228
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	229 If @var{string} is already a unibyte string, then the value is
33912 67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	230 @var{string} itself. Otherwise it is a newly created string, with no
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	231 text properties. If @var{string} is multibyte, any characters it
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	232 contains of charset @code{eight-bit-control} or @code{eight-bit-graphic}
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	233 are converted to the corresponding single byte.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	234 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	235
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	236 @defun string-as-multibyte string
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	237 This function returns a string with the same bytes as @var{string} but
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	238 treating each multibyte sequence as one character. This means that the
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	239 value may have fewer characters than @var{string} has.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	240
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	241 If @var{string} is already a multibyte string, then the value is
33912 67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	242 @var{string} itself. Otherwise it is a newly created string, with no
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	243 text properties. If @var{string} is unibyte and contains any individual
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	244 8-bit bytes (i.e.@: not part of a multibyte form), they are converted to
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	245 the corresponding multibyte character of charset @code{eight-bit-control}
67b6bdbd95c6 8-bit tweaks Dave Love <fx@gnu.org> parents: 32523 diff changeset	246 or @code{eight-bit-graphic}.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	247 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	248
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	249 @node Character Codes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	250 @section Character Codes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	251 @cindex character codes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	252
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	253 The unibyte and multibyte text representations use different character
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	254 codes. The valid character codes for unibyte representation range from
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	255 0 to 255---the values that can fit in one byte. Character
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	256 codes for multibyte representation range from 0 to 524287, but not all
28877 607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	257 values in that range are valid. The values 128 through 255 are not
32523 4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	258 entirely proper in multibyte text, but they can occur if you do explicit
28877 607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	259 encoding and decoding (@pxref{Explicit Encoding}). Some other character
607e317d50b5 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28635 diff changeset	260 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes
32523 4881cd839f12 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 29339 diff changeset	261 0 through 127 are completely legitimate in both representations.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	262
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	263 @defun char-valid-p charcode &optional genericp
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	264 This function returns @code{t} if @var{charcode} is valid for either of the two
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	265 text representations.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	266
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	267 @example
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	268 (char-valid-p 65)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	269 @result{} t
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	270 (char-valid-p 256)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	271 @result{} nil
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	272 (char-valid-p 2248)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	273 @result{} t
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	274 @end example
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	275
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	276 If the optional argument @var{genericp} is non-@code{nil}, this function
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	277 returns @code{t} if @var{charcode} is a generic character
29339 d831c2ad9313 Fix xref Dave Love <fx@gnu.org> parents: 29265 diff changeset	278 (@pxref{Splitting Characters}).
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	279 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	280
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	281 @node Character Sets
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	282 @section Character Sets
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	283 @cindex character sets
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	284
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	285 Emacs classifies characters into various @dfn{character sets}, each of
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	286 which has a name that is a symbol. Each character belongs to one and
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	287 only one character set.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	288
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	289 In general, there is one character set for each distinct script. For
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	290 example, @code{latin-iso8859-1} is one character set,
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	291 @code{greek-iso8859-7} is another, and @code{ascii} is another. An
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	292 Emacs character set can hold at most 9025 characters; therefore, in some
90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	293 cases, characters that would logically be grouped together are split
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	294 into several character sets. For example, one set of Chinese
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	295 characters, generally known as Big 5, is divided into two Emacs
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	296 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	297
28900 ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	298 @sc{ascii} characters are in character set @code{ascii}. The
ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	299 non-@sc{ascii} characters 128 through 159 are in character set
ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	300 @code{eight-bit-control}, and codes 160 through 255 are in character set
ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	301 @code{eight-bit-graphic}.
ac620ff5fd5d * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28887 diff changeset	302
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	303 @defun charsetp object
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	304 This function returns @code{t} if @var{object} is a symbol that names a character set,
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	305 @code{nil} otherwise.
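
For instance, using character set names mentioned in this chapter (an
illustrative sketch; the string case is included only to show that an
argument which is not a symbol is rejected):

@example
(charsetp 'ascii)
     @result{} t
(charsetp 'latin-iso8859-1)
     @result{} t
(charsetp "ascii")
     @result{} nil
@end example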
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	306 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	307
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	308 @defun charset-list
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	309 This function returns a list of all defined character set names.
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	310 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	311
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	312 @defun char-charset character
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	313 This function returns the name of the character set that @var{character}
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	314 belongs to.
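
For instance, with the character codes used in the @code{split-char}
examples (@pxref{Splitting Characters}):

@example
(char-charset ?A)
     @result{} ascii
(char-charset 2248)
     @result{} latin-iso8859-1
(char-charset 128)
     @result{} eight-bit-control
@end example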
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	315 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	316
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	317 @defun charset-plist charset
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	318 @tindex charset-plist
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	319 This function returns the charset property list of the character set
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	320 @var{charset}. Although @var{charset} is a symbol, this is not the same
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	321 as the property list of that symbol. Charset properties are used for
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	322 special purposes within Emacs; for example,
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	323 @code{preferred-coding-system} helps determine which coding system to
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	324 use to encode characters in a charset.
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	325 @end defun
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	326
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	327 @node Chars and Bytes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	328 @section Characters and Bytes
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	329 @cindex bytes and characters
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	330
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	331 @cindex introduction sequence
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	332 @cindex dimension (of character set)
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	333 In multibyte representation, each character occupies one or more
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	334 bytes. Each character set has an @dfn{introduction sequence}, which is
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	335 normally one or two bytes long. (Exception: the @sc{ascii} character
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	336 set and the @code{eight-bit-graphic} character set have a zero-length
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	337 introduction sequence.) The introduction sequence is the beginning of
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	338 the byte sequence for any character in the character set. The rest of
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	339 the character's bytes distinguish it from the other characters in the
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	340 same character set. Depending on the character set, there are either
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	341 one or two distinguishing bytes; the number of such bytes is called the
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	342 @dfn{dimension} of the character set.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	343
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	344 @defun charset-dimension charset
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	345 This function returns the dimension of @var{charset}; at present, the
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	346 dimension is always 1 or 2.
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	347 @end defun
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	348
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	349 @defun charset-bytes charset
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	350 @tindex charset-bytes
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	351 This function returns the number of bytes used to represent a character
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	352 in character set @var{charset}.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	353 @end defun
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	354
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	355 This is the simplest way to determine the byte length of a character
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	356 set's introduction sequence:
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	357
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	358 @example
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	359 (- (charset-bytes @var{charset})
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	360 (charset-dimension @var{charset}))
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	361 @end example
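
If this computation is needed in several places, it can be wrapped in a
small helper.  The following is only a sketch; the name
@code{my-charset-intro-length} is hypothetical and is not part of Emacs.

@example
;; Hypothetical helper, not an Emacs primitive.
(defun my-charset-intro-length (charset)
  "Return the length in bytes of CHARSET's introduction sequence."
  (- (charset-bytes charset)
     (charset-dimension charset)))

;; The ascii character set has a zero-length introduction sequence.
(my-charset-intro-length 'ascii)
     @result{} 0
@end example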
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	362
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	363 @node Splitting Characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	364 @section Splitting Characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	365
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	366 The functions in this section convert between characters and the byte
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	367 values used to represent them. For most purposes, there is no need to
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	368 be concerned with the sequence of bytes used to represent a character,
21682 90da2489c498 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21006 diff changeset	369 because Emacs translates automatically when necessary.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	370
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	371 @defun split-char character
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	372 This function returns a list containing the name of the character set of
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	373 @var{character}, followed by one or two byte values (integers) which
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	374 identify @var{character} within that character set. The number of byte
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	375 values is the character set's dimension.
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	376
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	377 @example
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	378 (split-char 2248)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	379 @result{} (latin-iso8859-1 72)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	380 (split-char 65)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	381 @result{} (ascii 65)
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	382 (split-char 128)
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	383 @result{} (eight-bit-control 128)
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	384 @end example
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	385 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	386
34811 c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	387 @defun make-char charset &optional code1 code2
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	388 This function returns the character in character set @var{charset} whose
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	389 position codes are @var{code1} and @var{code2}. This is roughly the
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	390 inverse of @code{split-char}. Normally, you should specify either one
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	391 or both of @var{code1} and @var{code2} according to the dimension of
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	392 @var{charset}. For example,
21006 00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	393
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	394 @example
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	395 (make-char 'latin-iso8859-1 72)
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	396 @result{} 2248
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	397 @end example
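
Since @code{split-char} returns the character set followed by the
position codes, its value can be handed back to @code{make-char} with
@code{apply}.  A minimal sketch, using the character from the example
above:

@example
(apply 'make-char (split-char 2248))
     @result{} 2248
@end example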
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	398 @end defun
00022857f529 Initial revision Richard M. Stallman <rms@gnu.org> parents: diff changeset	399
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	400 @cindex generic characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	401 If you call @code{make-char} with neither @var{code1} nor @var{code2}, the result is
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	402 a @dfn{generic character} which stands for @var{charset}. A generic
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	403 character is an integer, but it is @emph{not} valid for insertion in the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	404 buffer as a character. It can be used in @code{char-table-range} to
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	405 refer to the whole character set (@pxref{Char-Tables}).
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	406 @code{char-valid-p} returns @code{nil} for generic characters unless its @var{genericp} argument is non-@code{nil}.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	407 For example:
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	408
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	409 @example
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	410 (make-char 'latin-iso8859-1)
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	411 @result{} 2176
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	412 (char-valid-p 2176)
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	413 @result{} nil
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	414 (char-valid-p 2176 t)
69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	415 @result{} t
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	416 (split-char 2176)
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	417 @result{} (latin-iso8859-1 0)
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	418 @end example
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	419
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	420 The character sets @code{ascii}, @code{eight-bit-control}, and
34811 c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	421 @code{eight-bit-graphic} don't have corresponding generic characters. If
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	422 @var{charset} is one of them and you don't supply @var{code1},
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	423 @code{make-char} returns the character code corresponding to the
c2170032744b make-char change Dave Love <fx@gnu.org> parents: 33912 diff changeset	424 smallest code in @var{charset}.
29265 69f20c18d6eb * empty log message * Kenichi Handa <handa@m17n.org> parents: 28900 diff changeset	425
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	426 @node Scanning Charsets
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	427 @section Scanning for Character Sets
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	428
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	429 Sometimes it is useful to find out which character sets appear in a
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	430 part of a buffer or a string. One use for this is in determining which
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	431 coding systems (@pxref{Coding Systems}) are capable of representing all
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	432 of the text in question.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	433
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	434 @defun find-charset-region beg end &optional translation
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	435 This function returns a list of the character sets that appear in the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	436 current buffer between positions @var{beg} and @var{end}.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	437
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	438 The optional argument @var{translation} specifies a translation table to
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	439 be used in scanning the text (@pxref{Translation of Characters}). If it
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	440 is non-@code{nil}, then each character in the region is translated
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	441 through this table, and the value returned describes the translated
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	442 characters instead of the characters actually in the buffer.
28887 0778eff185b6 * empty log message * Gerd Moellmann <gerd@gnu.org> parents: 28877 diff changeset	443 @end defun
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	444
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	445 @defun find-charset-string string &optional translation
24951 7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	446 This function returns a list of the character sets that appear in the
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	447 string @var{string}. It is just like @code{find-charset-region}, except
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	448 that it applies to the contents of @var{string} instead of part of the
7451b1458af1 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 23433 diff changeset	449 current buffer.
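
For instance, a string that contains only @sc{ascii} characters yields
a one-element list (an illustrative sketch):

@example
(find-charset-string "abc")
     @result{} (ascii)
@end example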
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	450 @end defun
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	451
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	452 @node Translation of Characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	453 @section Translation of Characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	454 @cindex character translation tables
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	455 @cindex translation tables
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	456
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	457 A @dfn{translation table} specifies a mapping of characters
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	458 into characters. These tables are used in encoding and decoding, and
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	459 for other purposes. Some coding systems specify their own particular
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	460 translation tables; there are also default translation tables which
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	461 apply to all other coding systems.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	462
25751 467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	463 @defun make-translation-table &rest translations
467b88fab665 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 24952 diff changeset	464 This function returns a translation table based on the argument
35752 e1d9a16467ae * empty log message * Dave Love <fx@gnu.org> parents: 35493 diff changeset	465 @var{translations}. Each element of @var{translations} should be a
e1d9a16467ae * empty log message * Dave Love <fx@gnu.org> parents: 35493 diff changeset	466 list of elements of the form @code{(@var{from} . @var{to})}; this says
e1d9a16467ae * empty log message * Dave Love <fx@gnu.org> parents: 35493 diff changeset	467 to translate the character @var{from} into @var{to}.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	468
35493 679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	469 The arguments and the forms in each argument are processed in order,
679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	470 and if a previous form already translates @var{to} to some other
679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	471 character, say @var{to-alt}, @var{from} is also translated to
679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	472 @var{to-alt}.
679a73dad19a make-translation-table addition Dave Love <fx@gnu.org> parents: 34811 diff changeset	473
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	474 You can also map one whole character set into another character set with
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	475 the same dimension. To do this, you specify a generic character (which
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	476 designates a character set) for @var{from} (@pxref{Splitting Characters}).
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	477 In this case, @var{to} should also be a generic character, for another
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	478 character set of the same dimension. Then the translation table
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	479 translates each character of @var{from}'s character set into the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	480 corresponding character of @var{to}'s character set.
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	481 @end defun
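As a minimal sketch of the behavior described above, the following builds two small tables and looks up entries with @code{aref}, which works because a translation table is a char-table. The variable names @code{demo-translation-table} and @code{demo-chained-table} and the characters chosen are hypothetical, for illustration only.

```elisp
;; Hypothetical example tables; the names and characters are
;; illustrative assumptions, not part of Emacs.

;; Two argument lists, each holding (FROM . TO) pairs.
(defvar demo-translation-table
  (make-translation-table '((?a . ?b)) '((?x . ?y)))
  "Sketch: translate `a' into `b' and `x' into `y'.")

;; Forms are processed in order.  `b' already translates to `c'
;; when (?a . ?b) is seen, so `a' ends up translated to `c' too.
(defvar demo-chained-table
  (make-translation-table '((?b . ?c) (?a . ?b)))
  "Sketch: shows the TO-ALT rule for previously translated targets.")

;; A translation table is a char-table, so `aref' reads a mapping.
(aref demo-translation-table ?a)   ; the code of `b'
(aref demo-translation-table ?q)   ; nil, no translation recorded
(aref demo-chained-table ?a)       ; the code of `c'
```

Characters with no entry yield @code{nil} here, meaning the table leaves them untranslated.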
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	482
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	483 In decoding, the translation table's translations are applied to the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	484 characters that result from ordinary decoding. If a coding system has
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	485 property @code{character-translation-table-for-decode}, that specifies
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	486 the translation table to use. Otherwise, if
23433 a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	487 @code{standard-translation-table-for-decode} is non-@code{nil}, decoding
a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	488 uses that table.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	489
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	490 In encoding, the translation table's translations are applied to the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	491 characters in the buffer, and the result of translation is actually
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	492 encoded. If a coding system has property
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	493 @code{character-translation-table-for-encode}, that specifies the
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	494 translation table to use. Otherwise, the variable
23433 a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	495 @code{standard-translation-table-for-encode} specifies the translation
a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	496 table.
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	497
23433 a53274056f20 Fix names of standard-translation-table-for-decode(encode). Richard M. Stallman <rms@gnu.org> parents: 23110 diff changeset	498 @defvar standard-translation-table-for-decode
22138 d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	499 This is the default translation table for decoding, for
d4ac295a98b3 * empty log message * Richard M. Stallman <rms@gnu.org> parents: 21682 diff changeset	500 coding systems that don't specify any other translation table.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1 @c -*-texinfo-*-

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

2 @c This is part of the GNU Emacs Lisp Reference Manual.

27189

d2e5f1b7d8e2 Update copyrights.

Gerd Moellmann <gerd@gnu.org>

parents: 27187

diff changeset

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

4 @c See the file elisp.texi for copying conditions.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

5 @setfilename ../info/characters

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

6 @node Non-ASCII Characters, Searching and Matching, Text, Top

27374

0f5edee5242b *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27362

diff changeset

7 @chapter Non-@sc{ascii} Characters

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

8 @cindex multibyte characters

27374

0f5edee5242b *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27362

diff changeset

9 @cindex non-@sc{ascii} characters

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

10

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

11 This chapter covers the special issues relating to non-@sc{ascii}

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

12 characters and how they are stored in strings and buffers.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

13

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

14 @menu

28635

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

15 * Text Representations:: Unibyte and multibyte representations

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

16 * Converting Representations:: Converting unibyte to multibyte and vice versa.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

17 * Selecting a Representation:: Treating a byte sequence as unibyte or multi.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

18 * Character Codes:: How unibyte and multibyte relate to

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

19 codes of individual characters.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

20 * Character Sets:: The space of possible characters codes

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

21 is divided into various character sets.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

22 * Chars and Bytes:: More information about multibyte encodings.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

23 * Splitting Characters:: Converting a character to its byte sequence.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

24 * Scanning Charsets:: Which character sets are used in a buffer?

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

25 * Translation of Characters:: Translation tables are used for conversion.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

26 * Coding Systems:: Coding systems are conversions for saving files.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

27 * Input Methods:: Input methods allow users to enter various

40834

9552d64e0367 Fix typo.

Richard M. Stallman <rms@gnu.org>

parents: 39221

diff changeset

28 non-ASCII characters without special keyboards.

28635

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

29 * Locales:: Interacting with the POSIX locale.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

30 @end menu

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

31

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

32 @node Text Representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

33 @section Text Representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

34 @cindex text representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

35

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

36 Emacs has two @dfn{text representations}---two ways to represent text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

37 in a string or buffer. These are called @dfn{unibyte} and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

38 @dfn{multibyte}. Each string, and each buffer, uses one of these two

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

39 representations. For most purposes, you can ignore the issue of

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

40 representations, because Emacs converts text between them as

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

41 appropriate. Occasionally in Lisp programming you will need to pay

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

42 attention to the difference.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

43

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

44 @cindex unibyte text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

45 In unibyte representation, each character occupies one byte and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

46 therefore the possible character codes range from 0 to 255. Codes 0

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

47 through 127 are @sc{ascii} characters; the codes from 128 through 255

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

48 are used for one non-@sc{ascii} character set (you can choose which

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

49 character set by setting the variable @code{nonascii-insert-offset}).

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

50

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

51 @cindex leading code

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

52 @cindex multibyte text

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

53 @cindex trailing codes

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

54 In multibyte representation, a character may occupy more than one

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

55 byte, and as a result, the full range of Emacs character codes can be

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

56 stored. The first byte of a multibyte character is always in the range

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

57 128 through 159 (octal 0200 through 0237). These values are called

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

58 @dfn{leading codes}. The second and subsequent bytes of a multibyte

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

59 character are always in the range 160 through 255 (octal 0240 through

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

60 0377); these values are @dfn{trailing codes}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

61

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

62 Some sequences of bytes are not valid in multibyte text: for example,

32523

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

63 a single isolated byte in the range 128 through 159 is not allowed. But

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

64 character codes 128 through 159 can appear in multibyte text,

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

65 represented as two-byte sequences. All the character codes 128 through

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

66 255 are possible (though slightly abnormal) in multibyte text; they

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

67 appear in multibyte buffers and strings when you do explicit encoding

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

68 and decoding (@pxref{Explicit Encoding}).

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

69

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

70 In a buffer, the buffer-local value of the variable

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

71 @code{enable-multibyte-characters} specifies the representation used.

24952

a6db4671c7a0 *** empty log message ***

Karl Heuer <kwzh@gnu.org>

parents: 24951

diff changeset

72 The representation for a string is determined and recorded in the string

a6db4671c7a0 *** empty log message ***

Karl Heuer <kwzh@gnu.org>

parents: 24951

diff changeset

73 when the string is constructed.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

74

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

75 @defvar enable-multibyte-characters

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

76 This variable specifies the current buffer's text representation.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

77 If it is non-@code{nil}, the buffer contains multibyte text; otherwise,

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

78 it contains unibyte text.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

79

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

80 You cannot set this variable directly; instead, use the function

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

81 @code{set-buffer-multibyte} to change a buffer's representation.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

82 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

83

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

84 @defvar default-enable-multibyte-characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

85 This variable's value is entirely equivalent to @code{(default-value

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

86 'enable-multibyte-characters)}, and setting this variable changes that

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

87 default value. Setting the local binding of

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

88 @code{enable-multibyte-characters} in a specific buffer is not allowed,

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

89 but changing the default value is supported, and it is a reasonable

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

90 thing to do, because it has no effect on existing buffers.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

91

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

92 The @samp{--unibyte} command line option does its job by setting the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

93 default value to @code{nil} early in startup.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

94 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

95

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

96 @defun position-bytes position

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

97 @tindex position-bytes

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

98 Return the byte-position corresponding to buffer position @var{position}

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

99 in the current buffer.

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

100 @end defun

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

101

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

102 @defun byte-to-position byte-position

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

103 @tindex byte-to-position

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

104 Return the buffer position corresponding to byte-position

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

105 @var{byte-position} in the current buffer.

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

106 @end defun

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

107

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

108 @defun multibyte-string-p string

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

109 Return @code{t} if @var{string} is a multibyte string.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

110 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

111

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

112 @node Converting Representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

113 @section Converting Text Representations

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

114

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

115 Emacs can convert unibyte text to multibyte; it can also convert

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

116 multibyte text to unibyte, though this conversion loses information. In

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

117 general these conversions happen when inserting text into a buffer, or

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

118 when putting text from several strings together in one string. You can

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

119 also explicitly convert a string's contents to either representation.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

120

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

121 Emacs chooses the representation for a string based on the text that

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

122 it is constructed from. The general rule is to convert unibyte text to

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

123 multibyte text when combining it with other multibyte text, because the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

124 multibyte representation is more general and can hold whatever

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

125 characters the unibyte text has.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

126

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

127 When inserting text into a buffer, Emacs converts the text to the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

128 buffer's representation, as specified by

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

129 @code{enable-multibyte-characters} in that buffer. In particular, when

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

130 you insert multibyte text into a unibyte buffer, Emacs converts the text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

131 to unibyte, even though this conversion cannot in general preserve all

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

132 the characters that might be in the multibyte text. The other natural

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

133 alternative, to convert the buffer contents to multibyte, is not

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

134 acceptable because the buffer's representation is a choice made by the

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

135 user that cannot be overridden automatically.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

136

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

137 Converting unibyte text to multibyte text leaves @sc{ascii} characters

32523

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

138 unchanged, and likewise character codes 128 through 159. It converts

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

139 the non-@sc{ascii} codes 160 through 255 by adding the value

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

140 @code{nonascii-insert-offset} to each character code. By setting this

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

141 variable, you specify which character set the unibyte characters

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

142 correspond to (@pxref{Character Sets}). For example, if

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

143 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

144 'latin-iso8859-1) 128)}, then the unibyte non-@sc{ascii} characters

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

145 correspond to Latin 1. If it is 2688, which is @code{(- (make-char

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

146 'greek-iso8859-7) 128)}, then they correspond to Greek letters.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

147

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

148 Converting multibyte text to unibyte is simpler: it discards all but

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

149 the low 8 bits of each character code. If @code{nonascii-insert-offset}

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

150 has a reasonable value, corresponding to the beginning of some character

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

151 set, this conversion is the inverse of the other: converting unibyte

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

152 text to multibyte and back to unibyte reproduces the original unibyte

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

153 text.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

154

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

155 @defvar nonascii-insert-offset

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

156 This variable specifies the amount to add to a non-@sc{ascii} character

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

157 when converting unibyte text to multibyte. It also applies when

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

158 @code{self-insert-command} inserts a character in the unibyte

29339

d831c2ad9313 Fix xref

Dave Love <fx@gnu.org>

parents: 29265

diff changeset

159 non-@sc{ascii} range, 128 through 255. However, the functions

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

160 @code{insert} and @code{insert-char} do not perform this conversion.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

161

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

162 The right value to use to select character set @var{cs} is @code{(-

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

163 (make-char @var{cs}) 128)}. If the value of

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

164 @code{nonascii-insert-offset} is zero, then conversion actually uses the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

165 value for the Latin 1 character set, rather than zero.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

166 @end defvar
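
  As an illustrative sketch, the offset for a given character set can be
computed from the formula given for @code{nonascii-insert-offset}
rather than written as a literal number.  The choice of
@code{latin-iso8859-1} here is only an assumption for the example.

@example
;; Illustrative only: select Latin-1 for unibyte-to-multibyte
;; conversion.  This is the formula (- (make-char CS) 128)
;; with CS = latin-iso8859-1.
(setq nonascii-insert-offset
      (- (make-char 'latin-iso8859-1) 128))
@end example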

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

167

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

168 @defvar nonascii-translation-table

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

169 This variable provides a more general alternative to

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

170 @code{nonascii-insert-offset}. You can use it to specify independently

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

171 how to translate each code in the range of 128 through 255 into a

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

172 multibyte character. The value should be a char-table, or @code{nil}.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

174 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

175

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

176 @defun string-make-unibyte string

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

177 This function converts the text of @var{string} to unibyte

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

178 representation, if it isn't already, and returns the result. If

45652

ccaf0199f9dc (Converting Representations): Update the description of what

Eli Zaretskii <eliz@gnu.org>

parents: 43634

diff changeset

179 @var{string} is a unibyte string, it is returned unchanged. Multibyte

ccaf0199f9dc (Converting Representations): Update the description of what

Eli Zaretskii <eliz@gnu.org>

parents: 43634

diff changeset

180 character codes are converted to unibyte according to

ccaf0199f9dc (Converting Representations): Update the description of what

Eli Zaretskii <eliz@gnu.org>

parents: 43634

diff changeset

181 @code{nonascii-translation-table} or, if that is @code{nil}, using

ccaf0199f9dc (Converting Representations): Update the description of what

Eli Zaretskii <eliz@gnu.org>

parents: 43634

diff changeset

182 @code{nonascii-insert-offset}. If the lookup in the translation table

ccaf0199f9dc (Converting Representations): Update the description of what

Eli Zaretskii <eliz@gnu.org>

parents: 43634

diff changeset

183 fails, this function takes just the low 8 bits of each character.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

184 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

185

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

186 @defun string-make-multibyte string

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

187 This function converts the text of @var{string} to multibyte

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

188 representation, if it isn't already, and returns the result. If

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

189 @var{string} is a multibyte string, it is returned unchanged.

33912

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

190 The function @code{unibyte-char-to-multibyte} is used to convert

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

191 each unibyte character to a multibyte character.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

192 @end defun
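
  The following sketch illustrates a round trip through
@code{string-make-multibyte} and @code{string-make-unibyte}.  The
sample string and the variable names are assumptions made for
illustration; the results depend on the current values of
@code{nonascii-translation-table} and @code{nonascii-insert-offset},
so none are shown.

@example
;; "\351" is read as a unibyte string holding the single byte 233.
(let* ((uni "\351")
       (multi (string-make-multibyte uni))
       (back (string-make-unibyte multi)))
  ;; Report the representation of each string, and whether the
  ;; round trip reproduced the original unibyte text.
  (list (multibyte-string-p uni)
        (multibyte-string-p multi)
        (string= uni back)))
@end example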

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

193

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

194 @node Selecting a Representation

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

195 @section Selecting a Representation

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

196

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

197 Sometimes it is useful to examine an existing buffer or string as

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

198 multibyte when it was unibyte, or vice versa.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

199

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

200 @defun set-buffer-multibyte multibyte

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

201 Set the representation type of the current buffer. If @var{multibyte}

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

202 is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

203 is @code{nil}, the buffer becomes unibyte.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

204

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

205 This function leaves the buffer contents unchanged when viewed as a

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

206 sequence of bytes. As a consequence, it can change the contents viewed

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

207 as characters; a sequence of two bytes which is treated as one character

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

208 in multibyte representation will count as two characters in unibyte

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

209 representation. Character codes 128 through 159 are an exception. They

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

210 are represented by one byte in a unibyte buffer, but when the buffer is

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

211 set to multibyte, they are converted to two-byte sequences, and vice

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

212 versa.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

213

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

214 This function sets @code{enable-multibyte-characters} to record which

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

215 representation is in use. It also adjusts various data in the buffer

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

216 (including overlays, text properties and markers) so that they cover the

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

217 same text as they did before.

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

218

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

219 You cannot use @code{set-buffer-multibyte} on an indirect buffer,

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

220 because indirect buffers always inherit the representation of the

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

221 base buffer.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

222 @end defun
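
  As a minimal sketch of @code{set-buffer-multibyte}, the following
switches a temporary buffer to unibyte and back, and returns the value
of @code{enable-multibyte-characters} recorded at each step.  The
buffer contents are an arbitrary example.

@example
(with-temp-buffer
  (set-buffer-multibyte nil)      ; make the buffer unibyte
  (insert "abc")
  (let ((before enable-multibyte-characters))
    (set-buffer-multibyte t)      ; same bytes, viewed as multibyte
    ;; The representation recorded before and after the switch.
    (list before enable-multibyte-characters)))
     @result{} (nil t)
@end example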

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

223

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

224 @defun string-as-unibyte string

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

225 This function returns a string with the same bytes as @var{string} but

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

226 treating each byte as a character. This means that the value may have

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

227 more characters than @var{string} has.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

228

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

229 If @var{string} is already a unibyte string, then the value is

33912

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

230 @var{string} itself. Otherwise it is a newly created string, with no

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

231 text properties. If @var{string} is multibyte, any characters it

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

232 contains of charset @code{eight-bit-control} or @code{eight-bit-graphic}

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

233 are converted to the corresponding single byte.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

234 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

235

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

236 @defun string-as-multibyte string

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

237 This function returns a string with the same bytes as @var{string} but

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

238 treating each multibyte sequence as one character. This means that the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

239 value may have fewer characters than @var{string} has.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

240

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

241 If @var{string} is already a multibyte string, then the value is

33912

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

242 @var{string} itself. Otherwise it is a newly created string, with no

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

243 text properties. If @var{string} is unibyte and contains any individual

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

244 8-bit bytes (i.e.@: not part of a multibyte form), they are converted to

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

245 the corresponding multibyte character of charset @code{eight-bit-control}

67b6bdbd95c6 8-bit tweaks

Dave Love <fx@gnu.org>

parents: 32523

diff changeset

246 or @code{eight-bit-graphic}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

247 @end defun
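
  The following sketch, under the assumption that the sample character
occupies more than one byte in multibyte representation, compares the
two views of the same bytes given by @code{string-as-unibyte} and
@code{string-as-multibyte}.  The variable names are made up for
illustration, and the actual counts depend on the internal
representation, so no results are shown.

@example
;; A multibyte string holding one non-ASCII character.
(let* ((multi (string (make-char 'latin-iso8859-1 72)))
       (uni (string-as-unibyte multi)))
  ;; Compare the character counts of the two views, then check
  ;; whether reinterpreting the bytes as multibyte restores the
  ;; original text.
  (list (length multi)
        (length uni)
        (string= multi (string-as-multibyte uni))))
@end example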

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

248

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

249 @node Character Codes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

250 @section Character Codes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

251 @cindex character codes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

252

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

253 The unibyte and multibyte text representations use different character

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

254 codes. The valid character codes for unibyte representation range from

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

255 0 to 255---the values that can fit in one byte. The character

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

256 codes for multibyte representation range from 0 to 524287, but not all

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

257 values in that range are valid. The values 128 through 255 are not

32523

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

258 entirely proper in multibyte text, but they can occur if you do explicit

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

259 encoding and decoding (@pxref{Explicit Encoding}). Some other character

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

260 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes

32523

4881cd839f12 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 29339

diff changeset

261 0 through 127 are completely legitimate in both representations.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

262

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

263 @defun char-valid-p charcode &optional genericp

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

264 This function returns @code{t} if @var{charcode} is valid for either one of the two

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

265 text representations.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

266

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

267 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

268 (char-valid-p 65)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

269 @result{} t

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

270 (char-valid-p 256)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

271 @result{} nil

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

272 (char-valid-p 2248)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

273 @result{} t

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

274 @end example

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

275

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

276 If the optional argument @var{genericp} is non-@code{nil}, this function

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

277 also returns @code{t} if @var{charcode} is a generic character

29339

d831c2ad9313 Fix xref

Dave Love <fx@gnu.org>

parents: 29265

diff changeset

278 (@pxref{Splitting Characters}).

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

279 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

280

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

281 @node Character Sets

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

282 @section Character Sets

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

283 @cindex character sets

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

284

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

285 Emacs classifies characters into various @dfn{character sets}, each of

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

286 which has a name that is a symbol. Each character belongs to one and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

287 only one character set.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

288

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

289 In general, there is one character set for each distinct script. For

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

290 example, @code{latin-iso8859-1} is one character set,

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

291 @code{greek-iso8859-7} is another, and @code{ascii} is another. An

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

292 Emacs character set can hold at most 9025 characters; therefore, in some

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

293 cases, characters that would logically be grouped together are split

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

294 into several character sets. For example, one set of Chinese

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

295 characters, generally known as Big 5, is divided into two Emacs

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

296 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

297

28900

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

298 @sc{ascii} characters are in character set @code{ascii}. The

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

299 non-@sc{ascii} characters 128 through 159 are in character set

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

300 @code{eight-bit-control}, and codes 160 through 255 are in character set

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

301 @code{eight-bit-graphic}.

ac620ff5fd5d *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28887

diff changeset

302

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

303 @defun charsetp object

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

304 Returns @code{t} if @var{object} is a symbol that names a character set,

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

305 @code{nil} otherwise.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

306 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

307

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

308 @defun charset-list

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

309 This function returns a list of all defined character set names.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

310 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

311

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

312 @defun char-charset character

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

313 This function returns the name of the character set that @var{character}

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

314 belongs to.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

315 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

316

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

317 @defun charset-plist charset

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

318 @tindex charset-plist

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

319 This function returns the charset property list of the character set

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

320 @var{charset}. Although @var{charset} is a symbol, this is not the same

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

321 as the property list of that symbol. Charset properties are used for

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

322 special purposes within Emacs; for example,

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

323 @code{preferred-coding-system} helps determine which coding system to

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

324 use to encode characters in a charset.

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

325 @end defun
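
  As a brief sketch, the character set functions described in this
section can be combined as follows.  Results are shown only where they
follow directly from the text; the property looked up in the last form
is the one named in the description of @code{charset-plist}, and its
value, if any, depends on the character set and on the Emacs version.

@example
(charsetp 'ascii)
     @result{} t
(char-charset ?A)
     @result{} ascii
;; Test membership in the list of defined character sets; the
;; value is non-nil if `ascii' is present.
(memq 'ascii (charset-list))
;; Look up one charset property.
(plist-get (charset-plist 'latin-iso8859-1)
           'preferred-coding-system)
@end example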

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

326

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

327 @node Chars and Bytes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

328 @section Characters and Bytes

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

329 @cindex bytes and characters

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

330

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

331 @cindex introduction sequence

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

332 @cindex dimension (of character set)

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

333 In multibyte representation, each character occupies one or more

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

334 bytes. Each character set has an @dfn{introduction sequence}, which is

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

335 normally one or two bytes long. (Exception: the @sc{ascii} character

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

336 set and the @code{eight-bit-graphic} character set have a zero-length

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

337 introduction sequence.) The introduction sequence is the beginning of

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

338 the byte sequence for any character in the character set. The rest of

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

339 the character's bytes distinguish it from the other characters in the

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

340 same character set. Depending on the character set, there are either

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

341 one or two distinguishing bytes; the number of such bytes is called the

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

342 @dfn{dimension} of the character set.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

343

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

344 @defun charset-dimension charset

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

345 This function returns the dimension of @var{charset}; at present, the

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

346 dimension is always 1 or 2.

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

347 @end defun

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

348

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

349 @defun charset-bytes charset

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

350 @tindex charset-bytes

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

351 This function returns the number of bytes used to represent a character

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

352 in character set @var{charset}.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

353 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

354

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

355 This is the simplest way to determine the byte length of a character

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

356 set's introduction sequence:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

357

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

358 @example

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

359 (- (charset-bytes @var{charset})

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

360 (charset-dimension @var{charset}))

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

361 @end example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

362

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

363 @node Splitting Characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

364 @section Splitting Characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

365

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

366 The functions in this section convert between characters and the byte

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

367 values used to represent them. For most purposes, there is no need to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

368 be concerned with the sequence of bytes used to represent a character,

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

369 because Emacs translates automatically when necessary.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

370

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

371 @defun split-char character

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

372 Return a list containing the name of the character set of

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

373 @var{character}, followed by one or two byte values (integers) which

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

374 identify @var{character} within that character set. The number of byte

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

375 values is the character set's dimension.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

376

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

377 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

378 (split-char 2248)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

379 @result{} (latin-iso8859-1 72)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

380 (split-char 65)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

381 @result{} (ascii 65)

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

382 (split-char 128)

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

383 @result{} (eight-bit-control 128)

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

384 @end example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

385 @end defun
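As a rough sketch of how the returned list can be taken apart, the
helper below (the name @code{my-char-parts} is hypothetical, not part of
Emacs) separates the character set name from the position codes and
checks that the number of codes agrees with what the existing function
@code{charset-dimension} reports:

@example
;; Hypothetical helper: describe CHARACTER as
;; (CHARSET CODES DIMENSION-MATCHES).
(defun my-char-parts (character)
  (let* ((parts (split-char character))
         (charset (car parts))     ; character set name
         (codes (cdr parts)))      ; one or two position codes
    (list charset codes
          (= (length codes) (charset-dimension charset)))))

(my-char-parts ?A)
     @result{} (ascii (65) t)
@end example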

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

386

34811

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

387 @defun make-char charset &optional code1 code2

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

388 This function returns the character in character set @var{charset} whose

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

389 position codes are @var{code1} and @var{code2}. This is roughly the

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

390 inverse of @code{split-char}. Normally, you should specify either one

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

391 or both of @var{code1} and @var{code2} according to the dimension of

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

392 @var{charset}. For example,

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

393

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

394 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

395 (make-char 'latin-iso8859-1 72)

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

396 @result{} 2248

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

397 @end example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

398 @end defun
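Because @code{split-char} returns the character set name followed by
the position codes, its result can be passed straight back to
@code{make-char}.  The following is only an illustration for ordinary
characters; @code{my-char-round-trip} is a hypothetical name, not part
of Emacs:

@example
;; Hypothetical helper: rebuild CHARACTER from the list
;; that split-char returns for it.
(defun my-char-round-trip (character)
  (apply #'make-char (split-char character)))

(my-char-round-trip ?A)
     @result{} 65
@end example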

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

399

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

400 @cindex generic characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

401 If you call @code{make-char} with neither @var{code1} nor @var{code2}, the result is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

402 a @dfn{generic character} which stands for @var{charset}. A generic

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

403 character is an integer, but it is @emph{not} valid for insertion in the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

404 buffer as a character. It can be used in @code{char-table-range} to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

405 refer to the whole character set (@pxref{Char-Tables}).

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

406 @code{char-valid-p} returns @code{nil} for generic characters.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

407 For example:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

408

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

409 @example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

410 (make-char 'latin-iso8859-1)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

411 @result{} 2176

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

412 (char-valid-p 2176)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

413 @result{} nil

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

414 (char-valid-p 2176 t)

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

415 @result{} t

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

416 (split-char 2176)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

417 @result{} (latin-iso8859-1 0)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

418 @end example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

419

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

420 The character sets @sc{ascii}, @sc{eight-bit-control}, and

34811

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

421 @sc{eight-bit-graphic} don't have corresponding generic characters. If

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

422 @var{charset} is one of them and you don't supply @var{code1},

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

423 @code{make-char} returns the character code corresponding to the

c2170032744b make-char change

Dave Love <fx@gnu.org>

parents: 33912

diff changeset

424 smallest code in @var{charset}.

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

425

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

426 @node Scanning Charsets

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

427 @section Scanning for Character Sets

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

428

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

429 Sometimes it is useful to find out which character sets appear in a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

430 part of a buffer or a string. One use for this is in determining which

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

431 coding systems (@pxref{Coding Systems}) are capable of representing all

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

432 of the text in question.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

433

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

434 @defun find-charset-region beg end &optional translation

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

435 This function returns a list of the character sets that appear in the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

436 current buffer between positions @var{beg} and @var{end}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

437

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

438 The optional argument @var{translation} specifies a translation table to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

439 be used in scanning the text (@pxref{Translation of Characters}). If it

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

440 is non-@code{nil}, then each character in the region is translated

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

441 through this table, and the value returned describes the translated

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

442 characters instead of the characters actually in the buffer.

28887

0778eff185b6 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28877

diff changeset

443 @end defun

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

444

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

445 @defun find-charset-string string &optional translation

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

446 This function returns a list of the character sets that appear in the

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

447 string @var{string}. It is just like @code{find-charset-region}, except

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

448 that it applies to the contents of @var{string} instead of part of the

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

449 current buffer.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

450 @end defun
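One simple use of these functions is to test whether some text needs
anything beyond @sc{ascii}.  The sketch below assumes that pure
@sc{ascii} text yields the list @code{(ascii)}; the helper name
@code{my-ascii-only-p} is hypothetical:

@example
;; Hypothetical helper: non-nil if STRING contains no character
;; outside the ascii character set.
(defun my-ascii-only-p (string)
  (let ((charsets (find-charset-string string)))
    (or (null charsets)             ; no character sets found at all
        (equal charsets '(ascii)))))

(my-ascii-only-p "plain text")
     @result{} t
@end example

The same test can be applied to part of a buffer by calling
@code{find-charset-region} in place of @code{find-charset-string}.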

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

451

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

452 @node Translation of Characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

453 @section Translation of Characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

454 @cindex character translation tables

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

455 @cindex translation tables

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

456

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

457 A @dfn{translation table} specifies a mapping of characters

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

458 into characters. These tables are used in encoding and decoding, and

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

459 for other purposes. Some coding systems specify their own particular

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

460 translation tables; there are also default translation tables which

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

461 apply to all other coding systems.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

462

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

463 @defun make-translation-table &rest translations

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

464 This function returns a translation table based on the argument

35752

e1d9a16467ae *** empty log message ***

Dave Love <fx@gnu.org>

parents: 35493

diff changeset

465 @var{translations}. Each element of @var{translations} should be a

e1d9a16467ae *** empty log message ***

Dave Love <fx@gnu.org>

parents: 35493

diff changeset

466 list of pairs of the form @code{(@var{from} . @var{to})}; each pair says

e1d9a16467ae *** empty log message ***

Dave Love <fx@gnu.org>

parents: 35493

diff changeset

467 to translate the character @var{from} into @var{to}.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

468

35493

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

469 The arguments and the forms in each argument are processed in order,

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

470 and if a previous form already translates @var{to} to some other

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

471 character, say @var{to-alt}, @var{from} is also translated to

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

472 @var{to-alt}.

679a73dad19a make-translation-table addition

Dave Love <fx@gnu.org>

parents: 34811

diff changeset

473

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

474 You can also map one whole character set into another character set with

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

475 the same dimension. To do this, you specify a generic character (which

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

476 designates a character set) for @var{from} (@pxref{Splitting Characters}).

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

477 In this case, @var{to} should also be a generic character, for another

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

478 character set of the same dimension. Then the translation table

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

479 translates each character of @var{from}'s character set into the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

480 corresponding character of @var{to}'s character set.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

481 @end defun
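The following sketch builds two small tables and looks up the
translation of @samp{a} in each with @code{aref}, on the assumption
that a translation table can be indexed by character like any other
char-table.  The second table illustrates the ordering rule: since the
first argument already translates @samp{b} to @samp{c}, the later pair
for @samp{a} should send @samp{a} to @samp{c} as well.

@example
(let ((simple (make-translation-table '((?a . ?b))))
      ;; First argument: b -> c.  Second argument: a -> b,
      ;; which the ordering rule turns into a -> c.
      (chained (make-translation-table '((?b . ?c))
                                       '((?a . ?b)))))
  (list (aref simple ?a) (aref chained ?a)))
     @result{} (98 99)
@end example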

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

482

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

483 In decoding, the translation table's translations are applied to the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

484 characters that result from ordinary decoding. If a coding system has

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

485 property @code{character-translation-table-for-decode}, that specifies

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

486 the translation table to use. Otherwise, if

23433

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

487 @code{standard-translation-table-for-decode} is non-@code{nil}, decoding

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

488 uses that table.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

489

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

490 In encoding, the translation table's translations are applied to the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

491 characters in the buffer, and the result of translation is actually

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

492 encoded. If a coding system has property

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

493 @code{character-translation-table-for-encode}, that specifies the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

494 translation table to use. Otherwise the variable

23433

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

495 @code{standard-translation-table-for-encode} specifies the translation

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

496 table.
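
The order of lookup described above can be written out as a short
sketch.  The helper name @code{my-decode-translation-table} is
hypothetical, and the property name is taken from the text above
rather than checked against any particular Emacs version; the encoding
case would be the same with the @code{-for-encode} names.

@example
;; Hypothetical helper: the table that decoding with CODING-SYSTEM
;; would consult.  The coding system's own property comes first;
;; otherwise fall back on the standard table, which may be nil.
(defun my-decode-translation-table (coding-system)
  (or (coding-system-get coding-system
                         'character-translation-table-for-decode)
      standard-translation-table-for-decode))
@end example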

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

497

23433

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

498 @defvar standard-translation-table-for-decode

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

499 This is the default translation table for decoding, for

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

500 coding systems that don't specify any other translation table.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

501 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

502

23433

a53274056f20 Fix names of standard-translation-table-for-decode(encode).

Richard M. Stallman <rms@gnu.org>

parents: 23110

diff changeset

503 @defvar standard-translation-table-for-encode

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

504 This is the default translation table for encoding, for

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

505 coding systems that don't specify any other translation table.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

506 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

507

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

508 @node Coding Systems

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

509 @section Coding Systems

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

510

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

511 @cindex coding system

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

512 When Emacs reads or writes a file, and when Emacs sends text to a

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

513 subprocess or receives text from a subprocess, it normally performs

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

514 character code conversion and end-of-line conversion as specified

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

515 by a particular @dfn{coding system}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

516

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

517 How to define a coding system is an arcane matter, and is not

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

518 documented here.

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

519

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

520 @menu

28635

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

521 * Coding System Basics:: Basic concepts.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

522 * Encoding and I/O:: How file I/O functions handle coding systems.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

523 * Lisp and Coding Systems:: Functions to operate on coding system names.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

524 * User-Chosen Coding Systems:: Asking the user to choose a coding system.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

525 * Default Coding Systems:: Controlling the default choices.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

526 * Specifying Coding Systems:: Requesting a particular coding system

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

527 for a single file operation.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

528 * Explicit Encoding:: Encoding or decoding text without doing I/O.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

529 * Terminal I/O Encoding:: Use of encoding for terminal I/O.

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

530 * MS-DOS File Types:: How DOS "text" and "binary" files

cda2b6ed6aec *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27374

diff changeset

531 relate to coding systems.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

532 @end menu

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

533

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

534 @node Coding System Basics

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

535 @subsection Basic Concepts of Coding Systems

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

536

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

537 @cindex character code conversion

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

538 @dfn{Character code conversion} involves conversion between the encoding

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

539 used inside Emacs and some other encoding. Emacs supports many

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

540 different encodings, in that it can convert to and from them. For

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

541 example, it can convert text to or from encodings such as Latin 1, Latin

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

542 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

543 cases, Emacs supports several alternative encodings for the same

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

544 characters; for example, there are three coding systems for the Cyrillic

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

545 (Russian) alphabet: ISO, Alternativnyj, and KOI8.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

546

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

547 Most coding systems specify a particular character code for

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

548 conversion, but some of them leave the choice unspecified---to be chosen

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

549 heuristically for each file, based on the data.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

550

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

551 @cindex end of line conversion

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

552 @dfn{End of line conversion} handles three different conventions used

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

553 on various systems for representing end of line in files. The Unix

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

554 convention is to use the linefeed character (also called newline). The

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

555 DOS convention is to use a carriage-return and a linefeed at the end of

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

556 a line. The Mac convention is to use just carriage-return.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

557

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

558 @cindex base coding system

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

559 @cindex variant coding system

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

560 @dfn{Base coding systems} such as @code{latin-1} leave the end-of-line

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

561 conversion unspecified, to be chosen based on the data. @dfn{Variant

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

562 coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

563 @code{latin-1-mac} specify the end-of-line conversion explicitly as

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

564 well. Most base coding systems have three corresponding variants whose

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

565 names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
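
As an illustration, the existing function
@code{coding-system-eol-type} reports which convention a variant coding
system specifies (0 for Unix, 1 for DOS, 2 for Mac), and encoding a
string with the DOS variant should turn each newline into a
carriage-return linefeed pair.  This is a sketch that assumes the
@code{latin-1} variants named above are defined:

@example
(coding-system-eol-type 'latin-1-unix)
     @result{} 0
(coding-system-eol-type 'latin-1-dos)
     @result{} 1
(coding-system-eol-type 'latin-1-mac)
     @result{} 2

;; The DOS variant writes carriage-return plus linefeed.
(string= (encode-coding-string "a\nb" 'latin-1-dos) "a\r\nb")
     @result{} t
@end example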

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

566

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

567 The coding system @code{raw-text} is special in that it prevents

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

568 character code conversion, and causes the buffer visited with that

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

569 coding system to be a unibyte buffer. It does not specify the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

570 end-of-line conversion, allowing that to be determined as usual by the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

571 data, and has the usual three variants which specify the end-of-line

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

572 conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

573 it specifies no conversion of either character codes or end-of-line.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

574

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

575 The coding system @code{emacs-mule} specifies that the data is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

576 represented in the internal Emacs encoding. This is like

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

577 @code{raw-text} in that no code conversion happens, but different in

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

578 that the result is multibyte data.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

579

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

580 @defun coding-system-get coding-system property

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

581 This function returns the specified property of the coding system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

582 @var{coding-system}. Most coding system properties exist for internal

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

583 purposes, but one that you might find useful is @code{mime-charset}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

584 That property's value is the name used in MIME for the character coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

585 which this coding system can read and write. Examples:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

586

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

587 @example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

588 (coding-system-get 'iso-latin-1 'mime-charset)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

589 @result{} iso-8859-1

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

590 (coding-system-get 'iso-2022-cn 'mime-charset)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

591 @result{} iso-2022-cn

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

592 (coding-system-get 'cyrillic-koi8 'mime-charset)

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

593 @result{} koi8-r

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

594 @end example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

595

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

596 The value of the @code{mime-charset} property is also defined

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

597 as an alias for the coding system.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

598 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

599

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

600 @node Encoding and I/O

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

601 @subsection Encoding and I/O

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

602

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

603 The principal purpose of coding systems is for use in reading and

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

604 writing files. The function @code{insert-file-contents} uses

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

605 a coding system for decoding the file data, and @code{write-region}

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

606 uses one to encode the buffer contents.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

607

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

608 You can specify the coding system to use either explicitly

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

609 (@pxref{Specifying Coding Systems}), or implicitly using the defaulting

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

610 mechanism (@pxref{Default Coding Systems}). But these methods may not

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

611 completely specify what to do. For example, they may choose a coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

612 system such as @code{undecided} which leaves the character code

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

613 conversion to be determined from the data. In these cases, the I/O

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

614 operation finishes the job of choosing a coding system. Very often

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

615 you will want to find out afterwards which coding system was chosen.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

616

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

617 @defvar buffer-file-coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

618 This variable records the coding system that was used for visiting the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

619 current buffer. It is used for saving the buffer, and for writing part

43632

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

620 of the buffer with @code{write-region}. If the text to be written

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

621 cannot be safely encoded using the coding system specified by this

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

622 variable, these operations select an alternative encoding by calling

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

623 the function @code{select-safe-coding-system} (@pxref{User-Chosen

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

624 Coding Systems}). If selecting a different encoding requires asking

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

625 the user to specify a coding system, @code{buffer-file-coding-system}

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

626 is updated to the newly selected coding system.

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

627

43632

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

628 @code{buffer-file-coding-system} does @emph{not} affect sending text

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

629 to a subprocess.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

630 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

631

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

632 @defvar save-buffer-coding-system

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

633 This variable specifies the coding system for saving the buffer (by

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

634 overriding @code{buffer-file-coding-system}). Note that it is not used

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

635 for @code{write-region}.

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

636

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

637 When a command to save the buffer starts out using

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

638 @code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

639 and that coding system cannot handle

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

640 the actual text in the buffer, the command asks the user to choose

43632

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

641 another coding system (by calling @code{select-safe-coding-system}).

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

642 After that happens, the command also updates

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

643 @code{buffer-file-coding-system} to represent the coding system that

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

644 the user specified.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

645 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

646

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

647 @defvar last-coding-system-used

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

648 I/O operations for files and subprocesses set this variable to the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

649 coding system name that was used. The explicit encoding and decoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

650 functions (@pxref{Explicit Encoding}) set it too.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

651

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

652 @strong{Warning:} Since receiving subprocess output sets this variable,

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

653 it can change whenever Emacs waits; therefore, you should copy the

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

654 value shortly after the function call that stores the value you are

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

655 interested in.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

656 @end defvar
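
As a minimal sketch of the warning above, the following helper copies
@code{last-coding-system-used} immediately after the call that sets it.
The name @code{my-insert-file-noting-coding} is hypothetical, chosen
only for this illustration; it is not part of Emacs.

@example
(defun my-insert-file-noting-coding (file)
  "Insert FILE at point; return the coding system used to decode it."
  (insert-file-contents file)
  ;; Copy the value at once, before Emacs has a chance to wait
  ;; and let subprocess output overwrite the variable.
  (let ((coding last-coding-system-used))
    coding))
@end example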

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

657

23110

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

658 The variable @code{selection-coding-system} specifies how to encode

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

659 selections for the window system. @xref{Window System Selections}.

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

660

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

661 @node Lisp and Coding Systems

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

662 @subsection Coding Systems in Lisp

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

663

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

664 Here are the Lisp facilities for working with coding systems:

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

665

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

666 @defun coding-system-list &optional base-only

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

667 This function returns a list of all coding system names (symbols). If

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

668 @var{base-only} is non-@code{nil}, the value includes only the

29265

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

669 base coding systems. Otherwise, it includes alias and variant coding

69f20c18d6eb *** empty log message ***

Kenichi Handa <handa@m17n.org>

parents: 28900

diff changeset

670 systems as well.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

671 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

672

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

673 @defun coding-system-p object

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

674 This function returns @code{t} if @var{object} is a coding system

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

675 name.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

676 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

677

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

678 @defun check-coding-system coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

679 This function checks the validity of @var{coding-system}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

680 If that is valid, it returns @var{coding-system}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

681 Otherwise it signals an error with condition @code{coding-system-error}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

682 @end defun
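
For instance, here is a sketch of validating a symbol before using it
as a coding system. The symbol @code{no-such-coding-system} is a
made-up name, assumed here not to be defined.

@example
(coding-system-p 'iso-latin-1)
     @result{} t
;; Trap the error instead of letting it propagate.
(condition-case nil
    (check-coding-system 'no-such-coding-system)
  (coding-system-error 'invalid))
     @result{} invalid
@end example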

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

683

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

684 @defun coding-system-change-eol-conversion coding-system eol-type

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

685 This function returns a coding system which is like @var{coding-system}

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

686 except for its eol conversion, which is specified by @var{eol-type}.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

687 @var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

688 @code{nil}. If it is @code{nil}, the returned coding system determines

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

689 the end-of-line conversion from the data.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

690 @end defun
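
A short illustration, assuming the usual naming convention in which
the end-of-line variants of a coding system carry the suffixes
@samp{-unix}, @samp{-dos} and @samp{-mac}:

@example
(coding-system-change-eol-conversion 'iso-latin-1 'dos)
     @result{} iso-latin-1-dos
@end example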

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

691

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

692 @defun coding-system-change-text-conversion eol-coding text-coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

693 This function returns a coding system which uses the end-of-line

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

694 conversion of @var{eol-coding}, and the text conversion of

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

695 @var{text-coding}. If @var{text-coding} is @code{nil}, it returns

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

696 @code{undecided}, or one of its variants according to @var{eol-coding}.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

697 @end defun

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

698

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

699 @defun find-coding-systems-region from to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

700 This function returns a list of coding systems that could be used to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

701 encode the text between @var{from} and @var{to}. All coding systems in

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

702 the list can safely encode any multibyte characters in that portion of

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

703 the text.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

704

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

705 If the text contains no multibyte characters, the function returns the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

706 list @code{(undecided)}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

707 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

708

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

709 @defun find-coding-systems-string string

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

710 This function returns a list of coding systems that could be used to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

711 encode the text of @var{string}. All coding systems in the list can

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

712 safely encode any multibyte characters in @var{string}. If the text

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

713 contains no multibyte characters, this returns the list

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

714 @code{(undecided)}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

715 @end defun
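
For example, going by the description above, a string that contains
no multibyte characters should produce the one-element list:

@example
(find-coding-systems-string "abc")
     @result{} (undecided)
@end example

For text that does contain multibyte characters, the list depends on
which coding systems are defined in your Emacs, so no value is shown
for that case.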

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

716

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

717 @defun find-coding-systems-for-charsets charsets

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

718 This function returns a list of coding systems that could be used to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

719 encode all the character sets in the list @var{charsets}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

720 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

721

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

722 @defun detect-coding-region start end &optional highest

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

723 This function chooses a plausible coding system for decoding the text

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

724 from @var{start} to @var{end}. This text should be a byte sequence

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

725 (@pxref{Explicit Encoding}).

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

726

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

727 Normally this function returns a list of coding systems that could

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

728 handle decoding the text that was scanned. They are listed in order of

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

729 decreasing priority. But if @var{highest} is non-@code{nil}, then the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

730 return value is just one coding system, the one that is highest in

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

731 priority.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

732

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

733 If the region contains only @sc{ascii} characters, the value

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

734 is @code{undecided} or @code{(undecided)}.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

735 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

736

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

737 @defun detect-coding-string string &optional highest

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

738 This function is like @code{detect-coding-region} except that it

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

739 operates on the contents of @var{string} instead of bytes in the buffer.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

740 @end defun
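
The effect of @var{highest} can be sketched with an all-@sc{ascii}
string which, as described above for @code{detect-coding-region},
should be reported as @code{undecided}:

@example
;; Without HIGHEST, the value is a list of candidates.
(detect-coding-string "abc")
     @result{} (undecided)
;; With HIGHEST non-nil, the value is a single coding system.
(detect-coding-string "abc" t)
     @result{} undecided
@end example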

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

741

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

742 @xref{Process Information}, for how to examine or set the coding

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

743 systems used for I/O to a subprocess.

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

744

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

745 @node User-Chosen Coding Systems

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

746 @subsection User-Chosen Coding Systems

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

747

43632

faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation

Eli Zaretskii <eliz@gnu.org>

parents: 40855

diff changeset

748 @cindex select safe coding system

39204

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

749 @defun select-safe-coding-system from to &optional default-coding-system accept-default-p

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

750 This function selects a coding system for encoding specified text,

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

751 asking the user to choose if necessary. Normally the specified text

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

752 is the text in the current buffer between @var{from} and @var{to},

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

753 defaulting to the whole buffer if they are @code{nil}. If @var{from}

40855

0dddc8f93861 Minor cleanup.

Richard M. Stallman <rms@gnu.org>

parents: 40834

diff changeset

754 is a string, the string specifies the text to encode, and @var{to} is

0dddc8f93861 Minor cleanup.

Richard M. Stallman <rms@gnu.org>

parents: 40834

diff changeset

755 ignored.

39204

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

756

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

757 If @var{default-coding-system} is non-@code{nil}, that is the first

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

758 coding system to try; if that can handle the text,

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

759 @code{select-safe-coding-system} returns that coding system. It can

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

760 also be a list of coding systems; then the function tries each of them

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

761 one by one. After trying all of them, it next tries the user's most

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

762 preferred coding system (@pxref{Recognize Coding,

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

763 prefer-coding-system, the description of @code{prefer-coding-system},

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

764 emacs, GNU Emacs Manual}), and after that the current buffer's value

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

765 of @code{buffer-file-coding-system} (if it is not @code{undecided}).

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

766

39204

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

767 If one of those coding systems can safely encode all the specified

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

768 text, @code{select-safe-coding-system} chooses it and returns it.

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

769 Otherwise, it asks the user to choose from a list of coding systems

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

770 which can encode all the text, and returns the user's choice.

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

771

39204

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

772 The optional argument @var{accept-default-p}, if non-@code{nil},

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

773 should be a function to determine whether the coding system selected

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

774 without user interaction is acceptable. If this function returns

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

775 @code{nil}, the silently selected coding system is rejected, and the

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

776 user is asked to select a coding system from a list of possible

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

777 candidates.

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

778

39204

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

779 @vindex select-safe-coding-system-accept-default-p

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

780 If the variable @code{select-safe-coding-system-accept-default-p} is

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

781 non-@code{nil}, its value overrides the value of

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

782 @var{accept-default-p}.

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

783 @end defun
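
Here is a minimal sketch of calling this function on a string rather
than on buffer text. The string and the choice of
@code{iso-latin-1} as the first candidate are arbitrary, for
illustration only.

@example
;; FROM is a string, so TO is ignored; iso-latin-1 is tried first.
(select-safe-coding-system "plain text" nil 'iso-latin-1)
@end example

Since this text is plain @sc{ascii}, the first candidate should be
able to handle it, so the user is not expected to be prompted.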

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

784

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

785 Here are two functions you can use to let the user specify a coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

786 system, with completion. @xref{Completion}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

787

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

788 @defun read-coding-system prompt &optional default

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

789 This function reads a coding system using the minibuffer, prompting with

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

790 string @var{prompt}, and returns the coding system name as a symbol. If

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

791 the user enters null input, @var{default} specifies which coding system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

792 to return. It should be a symbol or a string.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

793 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

794

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

795 @defun read-non-nil-coding-system prompt

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

796 This function reads a coding system using the minibuffer, prompting with

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

797 string @var{prompt}, and returns the coding system name as a symbol. If

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

798 the user tries to enter null input, it asks the user to try again.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

799 @xref{Coding Systems}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

800 @end defun
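As a minimal sketch of how these two functions might be called, consider
the following.  The helper names, the prompt strings and the
@code{latin-1} default are assumptions made for illustration; they are
not part of Emacs.

@example
;; @r{Hypothetical helper: empty input returns the default.}
(defun my-ask-coding-system ()
  (read-coding-system "Coding system (default latin-1): "
                      'latin-1))

;; @r{Hypothetical helper: empty input is rejected and the}
;; @r{user is asked again.}
(defun my-ask-coding-system-strictly ()
  (read-non-nil-coding-system "Coding system: "))
@end example

Both helpers return the chosen coding system name as a symbol.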

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

801

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

802 @node Default Coding Systems

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

803 @subsection Default Coding Systems

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

804

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

805 This section describes variables that specify the default coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

806 system for certain files or when running certain subprograms, and the

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

807 function that I/O operations use to access them.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

808

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

809 The idea of these variables is that you set them once and for all to the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

810 defaults you want, and then do not change them again. To specify a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

811 particular coding system for a particular operation in a Lisp program,

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

812 don't change these variables; instead, override them using

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

813 @code{coding-system-for-read} and @code{coding-system-for-write}

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

814 (@pxref{Specifying Coding Systems}).

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

815

39204

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

816 @defvar auto-coding-regexp-alist

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

817 This variable is an alist of text patterns and corresponding coding

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

818 systems. Each element has the form @code{(@var{regexp}

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

819 . @var{coding-system})}; a file whose first few kilobytes match

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

820 @var{regexp} is decoded with @var{coding-system} when its contents are

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

821 read into a buffer. The settings in this alist take priority over

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

822 @code{coding:} tags in the files and the contents of

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

823 @code{file-coding-system-alist} (see below). The default value is set

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

824 so that Emacs automatically recognizes mail files in Babyl format and

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

825 reads them with no code conversions.

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

826 @end defvar

8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document

Eli Zaretskii <eliz@gnu.org>

parents: 35752

diff changeset

827

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

828 @defvar file-coding-system-alist

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

829 This variable is an alist that specifies the coding systems to use for

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

830 reading and writing particular files. Each element has the form

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

831 @code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

832 expression that matches certain file names. The element applies to file

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

833 names that match @var{pattern}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

834

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

835 The @sc{cdr} of the element, @var{coding}, should be either a coding

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

836 system, a cons cell containing two coding systems, or a function name (a

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

837 symbol with a function definition). If @var{coding} is a coding system,

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

838 that coding system is used for both reading the file and writing it. If

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

839 @var{coding} is a cons cell containing two coding systems, its @sc{car}

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

840 specifies the coding system for decoding, and its @sc{cdr} specifies the

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

841 coding system for encoding.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

842

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

843 If @var{coding} is a function name, the function must return a coding

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

844 system or a cons cell containing two coding systems. This value is used

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

845 as described above.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

846 @end defvar
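The two non-function forms of @var{coding} could be written as in the
sketch below.  The file name extensions are hypothetical, chosen only to
keep the example from colliding with real entries.

@example
;; @r{One coding system, used for both reading and writing.}
(add-to-list 'file-coding-system-alist
             '("\\.example\\'" . latin-1-unix))

;; @r{A cons cell: decode with the @sc{car}, encode with the @sc{cdr}.}
(add-to-list 'file-coding-system-alist
             '("\\.example2\\'" . (latin-1-dos . latin-1-unix)))
@end example

Entries like these normally belong in an init file, since the variable
is meant to be set once rather than changed around each operation.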

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

847

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

848 @defvar process-coding-system-alist

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

849 This variable is an alist specifying which coding systems to use for a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

850 subprocess, depending on which program is running in the subprocess. It

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

851 works like @code{file-coding-system-alist}, except that @var{pattern} is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

852 matched against the program name used to start the subprocess. The coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

853 system or systems specified in this alist are used to initialize the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

854 coding systems used for I/O to the subprocess, but you can specify

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

855 other coding systems later using @code{set-process-coding-system}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

856 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

857

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

858 @strong{Warning:} Coding systems such as @code{undecided}, which

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

859 determine the coding system from the data, do not work entirely reliably

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

860 with asynchronous subprocess output. This is because Emacs handles

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

861 asynchronous subprocess output in batches, as it arrives. If the coding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

862 system leaves the character code conversion unspecified, or leaves the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

863 end-of-line conversion unspecified, Emacs must try to detect the proper

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

864 conversion from one batch at a time, and this does not always work.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

865

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

866 Therefore, with an asynchronous subprocess, if at all possible, use a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

867 coding system which determines both the character code conversion and

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

868 the end-of-line conversion---that is, one like @code{latin-1-unix},

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

869 rather than @code{undecided} or @code{latin-1}.
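An entry in @code{process-coding-system-alist} that follows this advice
might look like the sketch below.  The program name @samp{my-filter} is
hypothetical.

@example
;; @r{Fully specified coding systems for both directions,}
;; @r{so nothing has to be detected batch by batch.}
(add-to-list 'process-coding-system-alist
             '("my-filter" . (latin-1-unix . latin-1-unix)))
@end example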

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

870

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

871 @defvar network-coding-system-alist

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

872 This variable is an alist that specifies the coding system to use for

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

873 network streams. It works much like @code{file-coding-system-alist},

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

874 with the difference that the @var{pattern} in an element may be either a

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

875 port number or a regular expression. If it is a regular expression, it

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

876 is matched against the network service name used to open the network

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

877 stream.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

878 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

879

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

880 @defvar default-process-coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

881 This variable specifies the coding systems to use for subprocess (and

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

882 network stream) input and output, when nothing else specifies what to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

883 do.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

884

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

885 The value should be a cons cell of the form @code{(@var{input-coding}

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

886 . @var{output-coding})}. Here @var{input-coding} applies to input from

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

887 the subprocess, and @var{output-coding} applies to output to it.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

888 @end defvar
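A setting of this variable in an init file could look like the
following sketch; the choice of @code{latin-1-unix} is only an
assumption for illustration.

@example
;; @r{The @sc{car} decodes input from the subprocess;}
;; @r{the @sc{cdr} encodes output sent to it.}
(setq default-process-coding-system
      '(latin-1-unix . latin-1-unix))
@end example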

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

889

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

890 @defun find-operation-coding-system operation &rest arguments

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

891 This function returns the coding system to use (by default) for

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

892 performing @var{operation} with @var{arguments}. The value has this

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

893 form:

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

894

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

895 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

896 (@var{decoding-system} . @var{encoding-system})

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

897 @end example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

898

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

899 The first element, @var{decoding-system}, is the coding system to use

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

900 for decoding (in case @var{operation} does decoding), and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

901 @var{encoding-system} is the coding system for encoding (in case

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

902 @var{operation} does encoding).

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

903

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

904 The argument @var{operation} should be a symbol, one of

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

905 @code{insert-file-contents}, @code{write-region}, @code{call-process},

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

906 @code{call-process-region}, @code{start-process}, or

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

907 @code{open-network-stream}. These are the names of the Emacs I/O primitives

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

908 that can do coding system conversion.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

909

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

910 The remaining arguments should be the same arguments that might be given

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

911 to that I/O primitive. Depending on the primitive, one of those

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

912 arguments is selected as the @dfn{target}. For example, if

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

913 @var{operation} does file I/O, whichever argument specifies the file

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

914 name is the target. For subprocess primitives, the program name is the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

915 target. For @code{open-network-stream}, the target is the service name

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

916 or port number.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

917

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

918 This function looks up the target in @code{file-coding-system-alist},

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

919 @code{process-coding-system-alist}, or

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

920 @code{network-coding-system-alist}, depending on @var{operation}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

921 @xref{Default Coding Systems}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

922 @end defun
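The calls below sketch how the arguments line up with those of each
primitive.  The file name, process name, host and port are made-up
values, and the results depend on the current contents of the alists,
so none are shown.  A call may return @code{nil} when no entry matches.

@example
;; @r{File I/O: same argument order as the primitive itself.}
(find-operation-coding-system
 'insert-file-contents "/tmp/notes.txt")
(find-operation-coding-system
 'write-region nil nil "/tmp/notes.txt")

;; @r{Subprocess primitive, with the arguments of @code{start-process}.}
(find-operation-coding-system
 'start-process "proc" nil "cat")

;; @r{Network stream: the service or port is the last argument.}
(find-operation-coding-system
 'open-network-stream "conn" nil "localhost" 80)
@end example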

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

923

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

924 @node Specifying Coding Systems

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

925 @subsection Specifying a Coding System for One Operation

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

926

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

927 You can specify the coding system for a specific operation by binding

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

928 the variables @code{coding-system-for-read} and/or

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

929 @code{coding-system-for-write}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

930

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

931 @defvar coding-system-for-read

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

932 If this variable is non-@code{nil}, it specifies the coding system to

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

933 use for reading a file, or for input from a synchronous subprocess.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

934

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

935 It also applies to any asynchronous subprocess or network stream, but in

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

936 a different way: the value of @code{coding-system-for-read} when you

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

937 start the subprocess or open the network stream specifies the input

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

938 decoding method for that subprocess or network stream. It remains in

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

939 use for that subprocess or network stream unless and until overridden.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

940

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

941 The right way to use this variable is to bind it with @code{let} for a

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

942 specific I/O operation. Its global value is normally @code{nil}, and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

943 you should not globally set it to any other value. Here is an example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

944 of the right way to use the variable:

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

945

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

946 @example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

947 ;; @r{Read the file with no character code conversion.}

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

948 ;; @r{Assume @sc{crlf} represents end-of-line.}

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

949 (let ((coding-system-for-read 'emacs-mule-dos))

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

950 (insert-file-contents filename))

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

951 @end example

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

952

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

953 When its value is non-@code{nil}, @code{coding-system-for-read} takes

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

954 precedence over all other methods of specifying a coding system to use for

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

955 input, including @code{file-coding-system-alist},

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

956 @code{process-coding-system-alist} and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

957 @code{network-coding-system-alist}.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

958 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

959

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

960 @defvar coding-system-for-write

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

961 This works much like @code{coding-system-for-read}, except that it

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

962 applies to output rather than input. It affects writing to files,

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

963 as well as sending output to subprocesses and network connections.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

964

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

965 When a single operation does both input and output, as do

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

966 @code{call-process-region} and @code{start-process}, both

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

967 @code{coding-system-for-read} and @code{coding-system-for-write}

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

968 affect it.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

969 @end defvar
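A counterpart for output might look like this sketch.  The function
name @code{my-write-latin-1} is hypothetical.

@example
;; @r{Write the current buffer as Latin-1 with @sc{lf} line ends,}
;; @r{whatever coding system the buffer would otherwise use.}
(defun my-write-latin-1 (filename)
  (let ((coding-system-for-write 'latin-1-unix))
    (write-region (point-min) (point-max) filename)))
@end example

Because the variable is bound with @code{let}, the override ends when
the function returns.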

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

970

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

971 @defvar inhibit-eol-conversion

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

972 When this variable is non-@code{nil}, no end-of-line conversion is done,

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

973 no matter which coding system is specified. This applies to all the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

974 Emacs I/O and subprocess primitives, and to the explicit encoding and

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

975 decoding functions (@pxref{Explicit Encoding}).

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

976 @end defvar

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

977

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

978 @node Explicit Encoding

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

979 @subsection Explicit Encoding and Decoding

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

980 @cindex encoding text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

981 @cindex decoding text

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

982

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

983 All the operations that transfer text in and out of Emacs have the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

984 ability to use a coding system to encode or decode the text.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

985 You can also explicitly encode and decode text using the functions

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

986 in this section.

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

987

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

988 The result of encoding, and the input to decoding, are not ordinary

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

989 text. They logically consist of a series of byte values; that is, a

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

990 series of characters whose codes are in the range 0 through 255. In a

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

991 multibyte buffer or string, character codes 128 through 159 are

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

992 represented by multibyte sequences, but this is invisible to Lisp

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

993 programs.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

994

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

995 The usual way to read a file into a buffer as a sequence of bytes, so

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

996 you can decode the contents explicitly, is with

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

997 @code{insert-file-contents-literally} (@pxref{Reading from Files});

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

998 alternatively, specify a non-@code{nil} @var{rawfile} argument when

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

999 visiting a file with @code{find-file-noselect}. These methods result in

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1000 a unibyte buffer.

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

1001

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1002 The usual way to use the byte sequence that results from explicitly

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1003 encoding text is to copy it to a file or process---for example, to write

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1004 it with @code{write-region} (@pxref{Writing to Files}), and suppress

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1005 encoding by binding @code{coding-system-for-write} to

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1006 @code{no-conversion}.
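As a minimal sketch of the idea above, the following writes an already encoded byte string to a file without encoding it a second time. The file name and the sample coding system @code{iso-latin-1-dos} are assumptions chosen only for illustration.

```elisp
;; Sketch only: the file name below is hypothetical.
(let ((bytes (encode-coding-string "one\ntwo" 'iso-latin-1-dos))
      ;; Suppress any further encoding when the bytes are written.
      (coding-system-for-write 'no-conversion))
  ;; When START is a string, `write-region' writes that string
  ;; rather than text from the current buffer.
  (write-region bytes nil "/tmp/encoded-example.txt"))
```

Binding @code{coding-system-for-write} with @code{let} keeps the override local to this one write.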

24951

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

1007

7451b1458af1 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 23433

diff changeset

1008 Here are the functions to perform explicit encoding or decoding. The

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1009 encoding functions produce sequences of bytes; the decoding functions

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1010 are meant to operate on sequences of bytes. All of these functions

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1011 discard text properties.

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1012

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1013 @defun encode-coding-region start end coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1014 This function encodes the text from @var{start} to @var{end} according

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1015 to coding system @var{coding-system}. The encoded text replaces the

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1016 original text in the buffer. The result of encoding is logically a

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1017 sequence of bytes, but the buffer remains multibyte if it was multibyte

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1018 before.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1019 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1020

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1021 @defun encode-coding-string string coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1022 This function encodes the text in @var{string} according to coding

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1023 system @var{coding-system}. It returns a new string containing the

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1024 encoded text. The result of encoding is a unibyte string.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1025 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1026

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1027 @defun decode-coding-region start end coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1028 This function decodes the text from @var{start} to @var{end} according

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1029 to coding system @var{coding-system}. The decoded text replaces the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1030 original text in the buffer. To make explicit decoding useful, the text

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1031 before decoding ought to be a sequence of byte values, but both

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1032 multibyte and unibyte buffers are acceptable.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1033 @end defun

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1034

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1035 @defun decode-coding-string string coding-system

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1036 This function decodes the text in @var{string} according to coding

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1037 system @var{coding-system}. It returns a new string containing the

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1038 decoded text. To make explicit decoding useful, the contents of

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1039 @var{string} ought to be a sequence of byte values, but a multibyte

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1040 string is acceptable.

21006

00022857f529 Initial revision

Richard M. Stallman <rms@gnu.org>

parents:

diff changeset

1041 @end defun
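The string functions above can be combined into a round trip. This is a sketch under stated assumptions: @code{iso-latin-1-dos} is just a sample coding system, and the variable names are illustrative.

```elisp
;; Sketch: encode a string explicitly, then decode it again.
(let* ((text  "one\ntwo")
       (bytes (encode-coding-string text 'iso-latin-1-dos))
       (back  (decode-coding-string bytes 'iso-latin-1-dos)))
  (list (multibyte-string-p bytes)  ; the encoded result is unibyte
        (length bytes)              ; counts bytes, including any added CR
        (string= text back)))       ; decoding should recover the text
```

The region variants work the same way but replace the text in the buffer instead of returning a new string.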

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1042

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1043 @node Terminal I/O Encoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1044 @subsection Terminal I/O Encoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1045

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1046 Emacs can decode keyboard input using a coding system, and encode

23110

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1047 terminal output. This is useful for terminals that transmit or display

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1048 text using a particular encoding such as Latin-1. Emacs does not set

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1049 @code{last-coding-system-used} for encoding or decoding for the

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1050 terminal.

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1051

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1052 @defun keyboard-coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1053 This function returns the coding system that is in use for decoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1054 keyboard input---or @code{nil} if no coding system is to be used.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1055 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1056

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1057 @defun set-keyboard-coding-system coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1058 This function specifies @var{coding-system} as the coding system to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1059 use for decoding keyboard input. If @var{coding-system} is @code{nil},

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1060 that means do not decode keyboard input.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1061 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1062

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1063 @defun terminal-coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1064 This function returns the coding system that is in use for encoding

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1065 terminal output---or @code{nil} for no encoding.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1066 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1067

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1068 @defun set-terminal-coding-system coding-system

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1069 This function specifies @var{coding-system} as the coding system to use

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1070 for encoding terminal output. If @var{coding-system} is @code{nil},

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1071 that means do not encode terminal output.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1072 @end defun
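One plausible way to use these four functions together is to save the current settings, switch temporarily, and restore them afterward. The choice of @code{latin-1} here is only an example.

```elisp
;; Sketch: switch keyboard and terminal coding temporarily.
(let ((old-keyboard (keyboard-coding-system))
      (old-terminal (terminal-coding-system)))
  (unwind-protect
      (progn
        (set-keyboard-coding-system 'latin-1)
        (set-terminal-coding-system 'latin-1)
        ;; ... interact with a Latin-1 terminal here ...
        )
    ;; Restore the previous settings even if an error occurred.
    (set-keyboard-coding-system old-keyboard)
    (set-terminal-coding-system old-terminal)))
```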

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1073

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1074 @node MS-DOS File Types

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1075 @subsection MS-DOS File Types

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1076 @cindex DOS file types

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1077 @cindex MS-DOS file types

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1078 @cindex Windows file types

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1079 @cindex file types on MS-DOS and Windows

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1080 @cindex text files and binary files

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1081 @cindex binary files and text files

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1082

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1083 On MS-DOS and Microsoft Windows, Emacs guesses the appropriate

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1084 end-of-line conversion for a file by looking at the file's name. This

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1085 feature classifies files as @dfn{text files} and @dfn{binary files}. By

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1086 ``binary file'' we mean a file of literal byte values that are not

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1087 necessarily meant to be characters; Emacs does no end-of-line conversion

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1088 and no character code conversion for them. On the other hand, the bytes

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1089 in a text file are intended to represent characters; when you create a

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1090 new file whose name implies that it is a text file, Emacs uses DOS

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1091 end-of-line conversion.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1092

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1093 @defvar buffer-file-type

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1094 This variable, automatically buffer-local in each buffer, records the

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1095 file type of the buffer's visited file. When a buffer does not specify

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1096 a coding system with @code{buffer-file-coding-system}, this variable is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1097 used to determine which coding system to use when writing the contents

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1098 of the buffer. It should be @code{nil} for text, @code{t} for binary.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1099 If it is @code{t}, the coding system is @code{no-conversion}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1100 Otherwise, @code{undecided-dos} is used.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1101

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1102 Normally this variable is set by visiting a file; it is set to

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1103 @code{nil} if the file was visited without any actual conversion.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1104 @end defvar

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1105

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1106 @defopt file-name-buffer-file-type-alist

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1107 This variable holds an alist for recognizing text and binary files.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1108 Each element has the form @code{(@var{regexp} . @var{type})}, where

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1109 @var{regexp} is matched against the file name, and @var{type} may be

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1110 @code{nil} for text, @code{t} for binary, or a function to call to

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1111 compute which. If it is a function, then it is called with a single

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1112 argument (the file name) and should return @code{t} or @code{nil}.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1113

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1114 When running on MS-DOS or MS-Windows, Emacs checks this alist to decide

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1115 which coding system to use when reading a file. For a text file,

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1116 @code{undecided-dos} is used. For a binary file, @code{no-conversion}

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1117 is used.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1118

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1119 If no element in this alist matches a given file name, then

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1120 @code{default-buffer-file-type} says how to treat the file.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1121 @end defopt
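As an illustration of the element format, the sketch below adds two entries to the alist. The file name patterns are hypothetical, and the @code{boundp} test is there because the variable is described here only for MS-DOS and MS-Windows.

```elisp
;; Sketch: classify *.dat files as binary and *.log files as text.
(when (boundp 'file-name-buffer-file-type-alist)
  (setq file-name-buffer-file-type-alist
        (append '(("\\.dat\\'" . t)      ; t means binary
                  ("\\.log\\'" . nil))   ; nil means text
                file-name-buffer-file-type-alist)))
```

The new entries go at the front of the alist; this sketch assumes that the first matching element is the one that takes effect.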

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1122

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1123 @defopt default-buffer-file-type

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1124 This variable says how to handle files for which

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1125 @code{file-name-buffer-file-type-alist} says nothing about the type.

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1126

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1127 If this variable is non-@code{nil}, then these files are treated as

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1128 binary: the coding system @code{no-conversion} is used. Otherwise,

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1129 nothing special is done for them---the coding system is deduced solely

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1130 from the file contents, in the usual Emacs fashion.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1131 @end defopt

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1132

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1133 @node Input Methods

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1134 @section Input Methods

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1135 @cindex input methods

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1136

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1137 @dfn{Input methods} provide convenient ways of entering non-@sc{ascii}

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1138 characters from the keyboard. Unlike coding systems, which translate

25751

467b88fab665 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 24952

diff changeset

1139 non-@sc{ascii} characters to and from encodings meant to be read by

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1140 programs, input methods provide human-friendly commands. (@xref{Input

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1141 Methods,,, emacs, The GNU Emacs Manual}, for information on how users

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1142 use input methods to enter text.) How to define input methods is not

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1143 yet documented in this manual, but here we describe how to use them.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1144

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1145 Each input method has a name, which is currently a string;

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1146 in the future, symbols may also be usable as input method names.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1147

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1148 @defvar current-input-method

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1149 This variable holds the name of the input method now active in the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1150 current buffer. (It automatically becomes local in each buffer when set

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1151 in any fashion.) It is @code{nil} if no input method is active in the

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1152 buffer now.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1153 @end defvar

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1154

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1155 @defvar default-input-method

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1156 This variable holds the default input method for commands that choose an

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1157 input method. Unlike @code{current-input-method}, this variable is

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1158 normally global.

21682

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1159 @end defvar

90da2489c498 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21006

diff changeset

1160

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1161 @defun set-input-method input-method

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1162 This function activates input method @var{input-method} for the current

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1163 buffer. It also sets @code{default-input-method} to @var{input-method}.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1164 If @var{input-method} is @code{nil}, this function deactivates any input

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1165 method for the current buffer.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1166 @end defun

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1167

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1168 @defun read-input-method-name prompt &optional default inhibit-null

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1169 This function reads an input method name with the minibuffer, prompting

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1170 with @var{prompt}. If @var{default} is non-@code{nil}, it is returned

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1171 if the user enters empty input. However, if

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1172 @var{inhibit-null} is non-@code{nil}, empty input signals an error.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1173

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1174 The returned value is a string.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1175 @end defun
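A command might combine @code{read-input-method-name} with @code{set-input-method} roughly as follows. This is a sketch; the prompt text is an arbitrary choice.

```elisp
;; Sketch: ask for an input method name, offering the default one,
;; and activate it in the current buffer.
(let ((name (read-input-method-name "Input method: "
                                    default-input-method)))
  ;; NAME is a string, or nil if there was no default and the user
  ;; entered nothing; nil deactivates any input method.
  (set-input-method name))
```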

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1176

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1177 @defvar input-method-alist

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1178 This variable defines all the supported input methods.

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1179 Each element defines one input method, and should have the form:

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1180

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1181 @example

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1182 (@var{input-method} @var{language-env} @var{activate-func}

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1183 @var{title} @var{description} @var{args}...)

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1184 @end example

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1185

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1186 Here @var{input-method} is the input method name, a string;

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1187 @var{language-env} is another string, the name of the language

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1188 environment this input method is recommended for. (That serves only for

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1189 documentation purposes.)

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1190

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1191 @var{activate-func} is a function to call to activate this method. The

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1192 @var{args}, if any, are passed as arguments to @var{activate-func}. All

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1193 told, the arguments to @var{activate-func} are @var{input-method} and

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1194 the @var{args}.

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1195

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1196 @var{title} is a string to display in the mode line while this method is

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1197 active. @var{description} is a string describing this method and what

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1198 it is good for.

22252

40089afa2b1d *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22138

diff changeset

1199 @end defvar
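Given the element format described above, the fields of one entry can be picked out by position. The method name @code{"latin-1-prefix"} is used only as a sample and may not be defined in every installation.

```elisp
;; Sketch: look up one input method and extract some of its fields.
(let ((entry (assoc "latin-1-prefix" input-method-alist)))
  (when entry
    (list (nth 1 entry)     ; language environment name
          (nth 3 entry)     ; title shown in the mode line
          (nth 4 entry))))  ; description string
```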

22138

d4ac295a98b3 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 21682

diff changeset

1200

23110

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1201 The fundamental interface to input methods is through the

0d84817a4973 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 22267

diff changeset

1202 variable @code{input-method-function}. @xref{Reading One Event}.

26696

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1203

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1204 @node Locales

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1205 @section Locales

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1206 @cindex locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1207

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1208 POSIX defines the concept of ``locales'', which determine the language

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1209 to use in language-related features. These Emacs variables control

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1210 how Emacs interacts with these features.

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1211

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1212 @defvar locale-coding-system

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1213 @tindex locale-coding-system

43634

f55024232f5d (Locales): locale-coding-system is used for decoding keyboard input on X.

Eli Zaretskii <eliz@gnu.org>

parents: 43632

diff changeset

1214 @cindex keyboard input decoding on X

26696

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1215 This variable specifies the coding system to use for decoding system

43634

f55024232f5d (Locales): locale-coding-system is used for decoding keyboard input on X.

Eli Zaretskii <eliz@gnu.org>

parents: 43632

diff changeset

1216 error messages and---on the X Window System only---keyboard input, for

f55024232f5d (Locales): locale-coding-system is used for decoding keyboard input on X.

Eli Zaretskii <eliz@gnu.org>

parents: 43632

diff changeset

1217 encoding the format argument to @code{format-time-string}, and for

f55024232f5d (Locales): locale-coding-system is used for decoding keyboard input on X.

Eli Zaretskii <eliz@gnu.org>

parents: 43632

diff changeset

1218 decoding the return value of @code{format-time-string}.

26696

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1219 @end defvar

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1220

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1221 @defvar system-messages-locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1222 @tindex system-messages-locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1223 This variable specifies the locale to use for generating system error

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1224 messages. Changing the locale can cause messages to come out in a

27362

ce0641caaa76 *** empty log message ***

Richard M. Stallman <rms@gnu.org>

parents: 27189

diff changeset

1225 different language or in a different orthography. If the variable is

26696

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1226 @code{nil}, the locale is specified by environment variables in the

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1227 usual POSIX fashion.

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1228 @end defvar

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1229

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1230 @defvar system-time-locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1231 @tindex system-time-locale

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1232 This variable specifies the locale to use for formatting time values.

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1233 Changing the locale can cause times to be formatted according to the

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1234 conventions of a different language. If the variable is @code{nil}, the

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1235 locale is specified by environment variables in the usual POSIX fashion.

ef5e7bbe6f19 Current version from /gd/gnu/elisp.

Dave Love <fx@gnu.org>

parents: 25751

diff changeset

1236 @end defvar
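As a rough illustration, a program can bind @code{system-time-locale}
temporarily around a call to @code{format-time-string}.  The locale name
@code{"de_DE"} is an assumption; it takes effect only if that locale is
installed on the system, so no result is shown here.

@example
;; Bind the time locale only for the duration of this call.
;; "de_DE" is an assumed locale name and may not exist on your system.
(let ((system-time-locale "de_DE"))
  ;; %A (weekday name) and %B (month name) depend on the locale.
  (format-time-string "%A, %d %B %Y"))
@end example

Using @code{let} rather than @code{setq} keeps the change local, so
other code that formats time values still sees the user's own setting.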

28877

607e317d50b5 *** empty log message ***

Gerd Moellmann <gerd@gnu.org>

parents: 28635

diff changeset

1237


annotate lispref/nonascii.texi @ 47181:e17812b1a993