annotate lispref/nonascii.texi @ 25427:dde5fcbfa2af

(Info-tagify): Don't insert more than one newline before the tag table. (Info-tagify): Start by widening. Match node headers that don't list the file name, and more kinds of page separations. Strip properties during tagification. Use start of node header line as tag's position. Fix the "done" message. (Info-validate): Save and restore match data around narrowing down.
author Richard M. Stallman <rms@gnu.org>
date Sun, 29 Aug 1999 19:19:00 +0000
parents a6db4671c7a0
children 467b88fab665
@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1998 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/characters
@node Non-ASCII Characters, Searching and Matching, Text, Top
@chapter Non-ASCII Characters
@cindex multibyte characters
@cindex non-ASCII characters

This chapter covers the special issues relating to non-@sc{ASCII}
characters and how they are stored in strings and buffers.

@menu
* Text Representations::
* Converting Representations::
* Selecting a Representation::
* Character Codes::
* Character Sets::
* Chars and Bytes::
* Splitting Characters::
* Scanning Charsets::
* Translation of Characters::
* Coding Systems::
* Input Methods::
@end menu

@node Text Representations
@section Text Representations
@cindex text representations

Emacs has two @dfn{text representations}---two ways to represent text
in a string or buffer. These are called @dfn{unibyte} and
@dfn{multibyte}. Each string, and each buffer, uses one of these two
representations. For most purposes, you can ignore the issue of
representations, because Emacs converts text between them as
appropriate. Occasionally in Lisp programming you will need to pay
attention to the difference.

@cindex unibyte text
In unibyte representation, each character occupies one byte and
therefore the possible character codes range from 0 to 255. Codes 0
through 127 are @sc{ASCII} characters; the codes from 128 through 255
are used for one non-@sc{ASCII} character set (you can choose which
character set by setting the variable @code{nonascii-insert-offset}).

@cindex leading code
@cindex multibyte text
@cindex trailing codes
In multibyte representation, a character may occupy more than one
byte, and as a result, the full range of Emacs character codes can be
stored. The first byte of a multibyte character is always in the range
128 through 159 (octal 0200 through 0237). These values are called
@dfn{leading codes}. The second and subsequent bytes of a multibyte
character are always in the range 160 through 255 (octal 0240 through
0377); these values are @dfn{trailing codes}.

Some sequences of bytes do not form meaningful multibyte characters:
for example, a single isolated byte in the range 128 through 255 is
never meaningful. Such byte sequences are not entirely valid: they
never appear in proper multibyte text (since that consists of a
sequence of @emph{characters}), but they can appear as part of ``raw
bytes'' (@pxref{Explicit Encoding}).

In a buffer, the buffer-local value of the variable
@code{enable-multibyte-characters} specifies the representation used.
The representation for a string is determined and recorded in the string
when the string is constructed.

@defvar enable-multibyte-characters
@tindex enable-multibyte-characters
This variable specifies the current buffer's text representation.
If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
it contains unibyte text.

You cannot set this variable directly; instead, use the function
@code{set-buffer-multibyte} to change a buffer's representation.
@end defvar

@defvar default-enable-multibyte-characters
@tindex default-enable-multibyte-characters
This variable's value is always equal to @code{(default-value
'enable-multibyte-characters)}, and setting this variable changes that
default value. Setting the local binding of
@code{enable-multibyte-characters} in a specific buffer is not allowed,
but changing the default value is supported, and doing so is
reasonable because it has no effect on existing buffers.

The @samp{--unibyte} command-line option works by setting the
default value to @code{nil} early in startup.
@end defvar
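
As a sketch of how the default value can be used, the following form
binds @code{default-enable-multibyte-characters} to @code{nil} only
while a temporary buffer is created, so that buffer should start out
unibyte while existing buffers keep their representation. It assumes
that @code{with-temp-buffer} creates its buffer from the current
default value.

@example
;; Temporarily make new buffers unibyte by default.
(let ((default-enable-multibyte-characters nil))
  (with-temp-buffer
    ;; The buffer-local value in the freshly created buffer.
    enable-multibyte-characters))
@end example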

@defun position-bytes position
@tindex position-bytes
Return the byte-position corresponding to buffer position @var{position}
in the current buffer.
@end defun

@defun byte-to-position byte-position
@tindex byte-to-position
Return the buffer position corresponding to byte-position
@var{byte-position} in the current buffer.
@end defun
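
For example, byte positions and character positions diverge as soon as
a buffer contains a non-@sc{ASCII} character. This sketch assumes the
temporary buffer is multibyte (the usual default) and uses
@code{decode-coding-string} with the @code{iso-latin-1} coding system
merely as a convenient way to obtain one Latin-1 character; neither is
described in this section.

@example
(with-temp-buffer
  ;; Three ASCII characters followed by one Latin-1 character.
  (insert "abc" (decode-coding-string "\351" 'iso-latin-1))
  (let ((chars (- (point-max) (point-min)))
        (bytes (- (position-bytes (point-max))
                  (position-bytes (point-min)))))
    (list chars bytes
          ;; byte-to-position undoes position-bytes.
          (= (byte-to-position (position-bytes (point-max)))
             (point-max)))))   ; @result{} (4 5 t)
@end example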

@defun multibyte-string-p string
@tindex multibyte-string-p
Return @code{t} if @var{string} is a multibyte string.
@end defun
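
Here is a short sketch of checking representations from Lisp. It
relies on an @sc{ASCII}-only string constant being read as a unibyte
string, and it uses @code{decode-coding-string}, which is not described
in this section, to produce a multibyte string.

@example
(multibyte-string-p "abc")                         ; @result{} nil
(multibyte-string-p
 (decode-coding-string "\351" 'iso-latin-1))       ; @result{} t
;; The representation of the current buffer:
enable-multibyte-characters
@end example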

@node Converting Representations
@section Converting Text Representations

Emacs can convert unibyte text to multibyte; it can also convert
multibyte text to unibyte, though this conversion can lose information.
In general, these conversions happen when inserting text into a buffer,
or when putting text from several strings together in one string. You
can also explicitly convert a string's contents to either
representation.

Emacs chooses the representation for a string based on the text that
it is constructed from. The general rule is to convert unibyte text to
multibyte text when combining it with other multibyte text, because the
multibyte representation is more general and can hold whatever
characters the unibyte text has.
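
The general rule can be observed with @code{concat}. In this sketch,
which borrows @code{decode-coding-string} to obtain a multibyte string,
combining two unibyte strings gives a unibyte result, while combining a
unibyte string with a multibyte one gives a multibyte result.

@example
(let ((u "abc")                                        ; unibyte
      (m (decode-coding-string "\351" 'iso-latin-1)))  ; multibyte
  (list (multibyte-string-p (concat u u))
        (multibyte-string-p (concat u m))))            ; @result{} (nil t)
@end example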

When inserting text into a buffer, Emacs converts the text to the
buffer's representation, as specified by
@code{enable-multibyte-characters} in that buffer. In particular, when
you insert multibyte text into a unibyte buffer, Emacs converts the text
to unibyte, even though this conversion cannot in general preserve all
the characters that might be in the multibyte text. The other natural
alternative, to convert the buffer contents to multibyte, is not
acceptable because the buffer's representation is a choice made by the
user that cannot be overridden automatically.
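
To see this conversion on insertion, the following sketch makes a
temporary buffer unibyte with @code{set-buffer-multibyte}
(@pxref{Selecting a Representation}) and then inserts a one-character
multibyte string, obtained here with @code{decode-coding-string}. Since
each character of a unibyte buffer is one byte, the buffer should end up
holding exactly one character.

@example
(with-temp-buffer
  (set-buffer-multibyte nil)
  ;; Insert multibyte text into the unibyte buffer.
  (insert (decode-coding-string "\351" 'iso-latin-1))
  (list enable-multibyte-characters (buffer-size)))   ; @result{} (nil 1)
@end example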

Converting unibyte text to multibyte text leaves @sc{ASCII} characters
and codes 128 through 159 unchanged. It converts the non-@sc{ASCII}
codes 160 through 255 by adding the value @code{nonascii-insert-offset}
to each character code. By setting this variable, you specify which
character set the unibyte characters correspond to (@pxref{Character
Sets}). For example, if @code{nonascii-insert-offset} is 2048, which is
@code{(- (make-char 'latin-iso8859-1) 128)}, then the unibyte
non-@sc{ASCII} characters correspond to Latin 1. If it is 2688, which
is @code{(- (make-char 'greek-iso8859-7) 128)}, then they correspond to
Greek letters.
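
The arithmetic described above can be modeled in a few lines of Lisp.
The function @code{my-unibyte-code-to-multibyte} below is a
hypothetical helper written for this illustration, not part of Emacs;
it takes the offset as an argument instead of reading
@code{nonascii-insert-offset}, and it ignores
@code{nonascii-translation-table}.

@example
;; Hypothetical helper: a model of the unibyte-to-multibyte rule.
(defun my-unibyte-code-to-multibyte (code offset)
  "Return the multibyte code for unibyte CODE, given OFFSET."
  (if (< code 160)
      code              ; ASCII and 128..159 are left unchanged
    (+ code offset)))   ; 160..255 are shifted by OFFSET

(my-unibyte-code-to-multibyte ?a 2048)    ; @result{} 97
(my-unibyte-code-to-multibyte 233 2048)   ; @result{} 2281 (Latin 1)
(my-unibyte-code-to-multibyte 160 2688)   ; @result{} 2848 (Greek)
@end example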

Converting multibyte text to unibyte is simpler: it performs a bitwise
AND (@code{logand}) of each character code with 255, keeping only the
low 8 bits. If @code{nonascii-insert-offset} has a reasonable value,
corresponding to the beginning of some character set, this conversion
is the inverse of the other: converting unibyte text to multibyte and
back to unibyte reproduces the original unibyte text.
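
The reverse direction can be modeled the same way. This sketch uses
hypothetical helper names (they are not Emacs functions) and checks the
round trip for every unibyte code with the Latin 1 offset 2048. Because
2048 is a multiple of 256, masking with 255 removes exactly the offset
that the forward conversion added.

@example
;; Hypothetical helpers modeling the two conversions.
(defun my-to-multibyte (code offset)
  (if (< code 160) code (+ code offset)))
(defun my-to-unibyte (code)
  (logand code 255))             ; keep only the low 8 bits

;; Check every code from 0 through 255.
(let ((code 0) (ok t))
  (while (< code 256)
    (unless (= (my-to-unibyte (my-to-multibyte code 2048)) code)
      (setq ok nil))
    (setq code (1+ code)))
  ok)                            ; @result{} t
@end example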

@defvar nonascii-insert-offset
@tindex nonascii-insert-offset
This variable specifies the amount to add to a non-@sc{ASCII} character
when converting unibyte text to multibyte. It also applies when
@code{self-insert-command} inserts a character in the unibyte
non-@sc{ASCII} range, 128 through 255. However, the function
@code{insert-char} does not perform this conversion.

The right value to use to select character set @var{cs} is @code{(-
(make-char @var{cs}) 128)}. If the value of
@code{nonascii-insert-offset} is zero, then conversion actually uses the
value for the Latin 1 character set, rather than zero.
@end defvar
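
For example, to make unibyte codes 160 through 255 stand for Latin 2
(ISO 8859-2) characters during a single conversion, you could bind the
variable around a call to @code{string-make-multibyte}. This is only a
sketch: the choice of @code{latin-iso8859-2} and of the sample byte
@samp{\341} are illustrations.

@example
(let ((nonascii-insert-offset
       (- (make-char 'latin-iso8859-2) 128)))
  ;; "\341" is a unibyte string holding the single byte 225.
  (string-make-multibyte "\341"))
@end example

Binding the variable with @code{let} rather than setting it globally
keeps the change from affecting unrelated conversions elsewhere.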

@defvar nonascii-translation-table
@tindex nonascii-translation-table
This variable provides a more general alternative to
@code{nonascii-insert-offset}. You can use it to specify independently
how to translate each code in the range of 128 through 255 into a
multibyte character. The value should be a vector, or @code{nil}.
If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
@end defvar

@defun string-make-unibyte string
@tindex string-make-unibyte
This function converts the text of @var{string} to unibyte
representation, if it isn't already, and returns the result. If
@var{string} is a unibyte string, it is returned unchanged.
@end defun

@defun string-make-multibyte string
@tindex string-make-multibyte
This function converts the text of @var{string} to multibyte
representation, if it isn't already, and returns the result. If
@var{string} is a multibyte string, it is returned unchanged.
@end defun
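
The following sketch converts a string back and forth with these two
functions. It assumes the default settings of
@code{nonascii-insert-offset} and @code{nonascii-translation-table},
and obtains its multibyte input from @code{decode-coding-string}. Each
character of the multibyte string becomes one byte of the unibyte
result, so the two lengths agree.

@example
(let* ((m (decode-coding-string "caf\351" 'iso-latin-1)) ; multibyte
       (u (string-make-unibyte m)))                      ; unibyte
  (list (multibyte-string-p m)
        (multibyte-string-p u)
        (length m)
        (length u)
        (multibyte-string-p (string-make-multibyte u))))
     ; @result{} (t nil 4 4 t)
@end example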

@node Selecting a Representation
@section Selecting a Representation

Sometimes it is useful to examine an existing buffer or string as
multibyte when it was unibyte, or vice versa.

@defun set-buffer-multibyte multibyte
@tindex set-buffer-multibyte
Set the representation type of the current buffer. If @var{multibyte}
is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
is @code{nil}, the buffer becomes unibyte.

This function leaves the buffer contents unchanged when viewed as a
sequence of bytes. As a consequence, it can change the contents viewed
as characters; a sequence of two bytes that is treated as one character
in multibyte representation counts as two characters in unibyte
representation.

This function sets @code{enable-multibyte-characters} to record which
representation is in use. It also adjusts various data in the buffer
(including overlays, text properties and markers) so that they cover the
same text as they did before.

You cannot use @code{set-buffer-multibyte} on an indirect buffer,
because indirect buffers always inherit the representation of the
base buffer.
@end defun
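
This sketch shows the effect described above on a temporary buffer
holding three @sc{ASCII} characters and one Latin-1 character (obtained,
as an assumption of the example, with @code{decode-coding-string}).
After switching to unibyte, the two bytes of the Latin-1 character
count as two characters; switching back restores the original count.

@example
(with-temp-buffer
  (set-buffer-multibyte t)
  (insert "abc" (decode-coding-string "\351" 'iso-latin-1))
  (let ((as-multibyte (buffer-size)))
    (set-buffer-multibyte nil)          ; same bytes, one char per byte
    (let ((as-unibyte (buffer-size)))
      (set-buffer-multibyte t)          ; regroup the bytes again
      (list as-multibyte as-unibyte (buffer-size)))))
     ; @result{} (4 5 4)
@end example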

@defun string-as-unibyte string
@tindex string-as-unibyte
This function returns a string with the same bytes as @var{string}, but
treats each byte as a separate character.  This means that the value may
have more characters than @var{string} has.

If @var{string} is already a unibyte string, then the value is
@var{string} itself.
@end defun
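
For example, the following sketch converts a one-character multibyte string.  It assumes that a @code{latin-iso8859-1} character occupies two bytes in multibyte text, a one-byte introduction sequence plus one distinguishing byte (@pxref{Chars and Bytes}).  The second form checks that a string which is already unibyte comes back unchanged:

@example
;; @r{One character, but two bytes, so two unibyte characters.}
(let ((s (string (make-char 'latin-iso8859-1 72))))
  (list (length s) (length (string-as-unibyte s))))
     @result{} (1 2)

;; @r{An already-unibyte string is returned as is.}
(let ((u (string-as-unibyte "abc")))
  (eq u (string-as-unibyte u)))
     @result{} t
@end example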

@defun string-as-multibyte string
@tindex string-as-multibyte
This function returns a string with the same bytes as @var{string}, but
treats each multibyte sequence as one character.  This means that the
value may have fewer characters than @var{string} has.

If @var{string} is already a multibyte string, then the value is
@var{string} itself.
@end defun
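
Since @code{string-as-unibyte} and @code{string-as-multibyte} reinterpret the same bytes in opposite directions, a round trip through both normally restores a string of ordinary multibyte characters.  This sketch assumes that a @code{latin-iso8859-1} character takes two bytes in multibyte text:

@example
(let* ((s (string (make-char 'latin-iso8859-1 72)))
       (u (string-as-unibyte s))     ; @r{one character per byte}
       (m (string-as-multibyte u)))  ; @r{bytes regrouped into characters}
  (list (length u) (length m) (string= s m)))
     @result{} (2 1 t)
@end example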

@node Character Codes
@section Character Codes
@cindex character codes

The unibyte and multibyte text representations use different character
codes.  The valid character codes for unibyte representation range from
0 to 255---the values that can fit in one byte.  The valid character
codes for multibyte representation range from 0 to 524287, but not all
values in that range are valid.  In particular, the values 128 through
255 are not legitimate in multibyte text (though they can occur in ``raw
bytes''; @pxref{Explicit Encoding}).  Only the @sc{ASCII} codes 0
through 127 are fully legitimate in both representations.

@defun char-valid-p charcode
This function returns @code{t} if @var{charcode} is valid for either one
of the two text representations, and @code{nil} otherwise.

@example
(char-valid-p 65)
     @result{} t
(char-valid-p 256)
     @result{} nil
(char-valid-p 2248)
     @result{} t
@end example
@end defun

@node Character Sets
@section Character Sets
@cindex character sets

Emacs classifies characters into various @dfn{character sets}, each of
which has a symbol as its name.  Each character belongs to one and only
one character set.

In general, there is one character set for each distinct script.  For
example, @code{latin-iso8859-1} is one character set,
@code{greek-iso8859-7} is another, and @code{ascii} is another.  An
Emacs character set can hold at most 9025 characters; therefore, in some
cases, characters that would logically be grouped together are split
into several character sets.  For example, one set of Chinese
characters, generally known as Big 5, is divided into two Emacs
character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.

@defun charsetp object
@tindex charsetp
This function returns @code{t} if @var{object} is a symbol that names a
character set, and @code{nil} otherwise.
@end defun

@defun charset-list
@tindex charset-list
This function returns a list of all defined character set names.
@end defun

@defun char-charset character
@tindex char-charset
This function returns the name of the character set that @var{character}
belongs to.
@end defun
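
The following examples apply these functions to the character code 2248, which appears elsewhere in this chapter as a @code{latin-iso8859-1} character.  The symbol @code{no-such-charset} is a made-up name that is assumed not to be defined as a character set:

@example
(charsetp 'latin-iso8859-1)
     @result{} t
(charsetp 'no-such-charset)
     @result{} nil
(char-charset 65)
     @result{} ascii
(char-charset 2248)
     @result{} latin-iso8859-1
@end example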

@node Chars and Bytes
@section Characters and Bytes
@cindex bytes and characters

@cindex introduction sequence
@cindex dimension (of character set)
In multibyte representation, each character occupies one or more
bytes.  Each character set has an @dfn{introduction sequence}, which is
normally one or two bytes long.  (Exception: the @sc{ASCII} character
set has a zero-length introduction sequence.)  The introduction sequence
is the beginning of the byte sequence for any character in the character
set.  The rest of the character's bytes distinguish it from the other
characters in the same character set.  Depending on the character set,
there are either one or two distinguishing bytes; the number of such
bytes is called the @dfn{dimension} of the character set.

@defun charset-dimension charset
@tindex charset-dimension
This function returns the dimension of @var{charset}; at present, the
dimension is always 1 or 2.
@end defun

@defun charset-bytes charset
@tindex charset-bytes
This function returns the number of bytes used to represent a character
in character set @var{charset}.
@end defun

Since each character's bytes consist of the introduction sequence
followed by as many distinguishing bytes as the dimension, this is the
simplest way to determine the byte length of a character set's
introduction sequence:

@example
(- (charset-bytes @var{charset})
   (charset-dimension @var{charset}))
@end example
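
Applied to two character sets mentioned in this chapter, the expression works out as follows.  The @code{ascii} result follows from the zero-length introduction sequence described above; the @code{latin-iso8859-1} result assumes that this character set has a one-byte introduction sequence, so that each of its characters takes two bytes.  The helper name @code{my-charset-intro-length} is hypothetical:

@example
;; @r{Hypothetical helper; not defined by Emacs.}
(defun my-charset-intro-length (charset)
  "Return the length in bytes of CHARSET's introduction sequence."
  (- (charset-bytes charset) (charset-dimension charset)))

(my-charset-intro-length 'ascii)
     @result{} 0
(my-charset-intro-length 'latin-iso8859-1)
     @result{} 1
@end example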

@node Splitting Characters
@section Splitting Characters

The functions in this section convert between characters and the byte
values used to represent them.  For most purposes, there is no need to
be concerned with the sequence of bytes used to represent a character,
because Emacs translates automatically when necessary.

@defun split-char character
@tindex split-char
This function returns a list containing the name of the character set of
@var{character}, followed by one or two byte values (integers) which
identify @var{character} within that character set.  The number of byte
values is the character set's dimension.

@example
(split-char 2248)
     @result{} (latin-iso8859-1 72)
(split-char 65)
     @result{} (ascii 65)
@end example

Unibyte non-@sc{ASCII} characters are considered part of
the @code{ascii} character set:

@example
(split-char 192)
     @result{} (ascii 192)
@end example
@end defun
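
Because the number of byte values equals the dimension, the list returned by @code{split-char} can be checked against @code{charset-dimension} (@pxref{Chars and Bytes}).  This sketch uses the character 2248 from the example above:

@example
(let ((parts (split-char 2248)))
  (= (length (cdr parts))
     (charset-dimension (car parts))))
     @result{} t
@end example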

@defun make-char charset &rest byte-values
@tindex make-char
This function returns the character in character set @var{charset}
identified by @var{byte-values}.  This is roughly the inverse of
@code{split-char}.  Normally, you should specify either one or two
@var{byte-values}, according to the dimension of @var{charset}.  For
example,

@example
(make-char 'latin-iso8859-1 72)
     @result{} 2248
@end example
@end defun
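
Since @code{make-char} is roughly the inverse of @code{split-char}, applying it to the list that @code{split-char} returns should give back an ordinary character such as 2248.  Treat this as a check for ordinary characters only; the description above promises no more than a rough inverse:

@example
(let ((c (make-char 'latin-iso8859-1 72)))
  (= c (apply 'make-char (split-char c))))
     @result{} t
@end example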

@cindex generic characters
If you call @code{make-char} with no @var{byte-values}, the result is
a @dfn{generic character} which stands for @var{charset}.  A generic
character is an integer, but it is @emph{not} valid for insertion in the
buffer as a character.  It can be used in @code{char-table-range} to
refer to the whole character set (@pxref{Char-Tables}).
@code{char-valid-p} returns @code{nil} for generic characters.
For example:

@example
(make-char 'latin-iso8859-1)
     @result{} 2176
(char-valid-p 2176)
     @result{} nil
(split-char 2176)
     @result{} (latin-iso8859-1 0)
@end example

@node Scanning Charsets
@section Scanning for Character Sets

Sometimes it is useful to find out which character sets appear in a
part of a buffer or a string.  One use for this is in determining which
coding systems (@pxref{Coding Systems}) are capable of representing all
of the text in question.

@defun find-charset-region beg end &optional translation
@tindex find-charset-region
This function returns a list of the character sets that appear in the
current buffer between positions @var{beg} and @var{end}.

The optional argument @var{translation} specifies a translation table to
be used in scanning the text (@pxref{Translation of Characters}).  If it
is non-@code{nil}, then each character in the region is translated
through this table, and the value returned describes the translated
characters instead of the characters actually in the buffer.

In two peculiar cases, the value includes the symbol @code{unknown}:

@itemize @bullet
@item
When a unibyte buffer contains non-@sc{ASCII} characters.

@item
When a multibyte buffer contains invalid byte sequences (raw bytes).
@xref{Explicit Encoding}.
@end itemize
@end defun
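
For instance, a buffer holding only @sc{ASCII} text yields a one-element list.  This sketch assumes that @code{with-temp-buffer} creates a multibyte buffer, which is the usual default:

@example
(with-temp-buffer
  (insert "abc")
  (find-charset-region (point-min) (point-max)))
     @result{} (ascii)
@end example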

@defun find-charset-string string &optional translation
@tindex find-charset-string
This function returns a list of the character sets that appear in the
string @var{string}.  It is just like @code{find-charset-region}, except
that it applies to the contents of @var{string} instead of part of the
current buffer.
@end defun
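
One way to apply this, in line with the use mentioned at the start of this section, is a predicate that tests whether a string needs anything beyond @sc{ASCII}.  The name @code{my-ascii-only-p} is hypothetical; any result other than @code{ascii} alone, including @code{unknown}, counts as non-@sc{ASCII}:

@example
;; @r{Hypothetical helper; not defined by Emacs.}
(defun my-ascii-only-p (string)
  "Return non-nil if STRING contains only ASCII characters."
  ;; @r{Copy the list so that @code{delq} cannot alter anything shared.}
  (null (delq 'ascii (copy-sequence (find-charset-string string)))))

(my-ascii-only-p "abc")
     @result{} t
(my-ascii-only-p (string (make-char 'latin-iso8859-1 72)))
     @result{} nil
@end example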

@node Translation of Characters
@section Translation of Characters
@cindex character translation tables
@cindex translation tables

A @dfn{translation table} specifies a mapping of characters
into characters.  These tables are used in encoding and decoding, and
for other purposes.  Some coding systems specify their own particular
translation tables; there are also default translation tables which
apply to all other coding systems.

@defun make-translation-table translations
This function returns a translation table based on the arguments
@var{translations}.  Each argument---each element of
@var{translations}---should be a list of the form @code{(@var{from}
. @var{to})}; this says to translate the character @var{from} into
@var{to}.

You can also map one whole character set into another character set with
the same dimension.  To do this, you specify a generic character (which
designates a character set) for @var{from} (@pxref{Splitting Characters}).
In this case, @var{to} should also be a generic character, for another
character set of the same dimension.  Then the translation table
translates each character of @var{from}'s character set into the
corresponding character of @var{to}'s character set.
@end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
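
Here is a minimal sketch of building a small table and looking up entries in it. It assumes that each argument is a list of @code{(@var{from} . @var{to})} pairs, which is how current Emacs versions interpret the arguments. It also assumes that a translation table is a char-table, so @code{aref} retrieves the translation of a character, or @code{nil} if there is none. The variable @code{my-ab-table} is hypothetical.

@example
;; my-ab-table is a hypothetical variable name.
(setq my-ab-table
      (make-translation-table '((?a . ?A) (?b . ?B))))
(aref my-ab-table ?a)
     @result{} 65        ; that is, ?A
(aref my-ab-table ?z)
     @result{} nil       ; ?z is not translated
@end example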

In decoding, the translation table's translations are applied to the
characters that result from ordinary decoding. If a coding system has
property @code{character-translation-table-for-decode}, that specifies
the translation table to use. Otherwise, if
@code{standard-translation-table-for-decode} is non-@code{nil}, decoding
uses that table.

In encoding, the translation table's translations are applied to the
characters in the buffer, and the result of translation is actually
encoded. If a coding system has property
@code{character-translation-table-for-encode}, that specifies the
translation table to use. Otherwise the variable
@code{standard-translation-table-for-encode} specifies the translation
table.

@defvar standard-translation-table-for-decode
This is the default translation table for decoding, for
coding systems that don't specify any other translation table.
@end defvar

@defvar standard-translation-table-for-encode
This is the default translation table for encoding, for
coding systems that don't specify any other translation table.
@end defvar
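
If you need a default translation only temporarily, one approach is to bind @code{standard-translation-table-for-decode} around a single decoding operation. In this sketch, mapping a Latin-1 @samp{e} with acute accent to a plain @samp{e} is purely illustrative. The sketch assumes that @code{iso-latin-1} has no translation table of its own, so the standard table applies. Under those assumptions, the call should return the four-character string @samp{cafe}.

@example
;; Illustrative only: translate e-acute into plain e while decoding.
(let* ((e-acute (aref (decode-coding-string "\351" 'iso-latin-1) 0))
       (standard-translation-table-for-decode
        (make-translation-table (list (cons e-acute ?e)))))
  (decode-coding-string "caf\351" 'iso-latin-1))
@end example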

@node Coding Systems
@section Coding Systems

@cindex coding system
When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified
by a particular @dfn{coding system}.

How to define a coding system is an arcane matter, not yet documented.

@menu
* Coding System Basics::
* Encoding and I/O::
* Lisp and Coding Systems::
* User-Chosen Coding Systems::
* Default Coding Systems::
* Specifying Coding Systems::
* Explicit Encoding::
* Terminal I/O Encoding::
* MS-DOS File Types::
@end menu

@node Coding System Basics
@subsection Basic Concepts of Coding Systems

@cindex character code conversion
@dfn{Character code conversion} involves conversion between the encoding
used inside Emacs and some other encoding. Emacs supports many
different encodings, in that it can convert to and from them. For
example, it can convert text to or from encodings such as Latin 1,
Latin 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022.
In some cases, Emacs supports several alternative encodings for the same
characters; for example, there are three coding systems for the Cyrillic
(Russian) alphabet: ISO, Alternativnyj, and KOI8.
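
Here is a small illustration. It assumes the @code{iso-latin-1} coding system and the explicit conversion functions @code{decode-coding-string} and @code{encode-coding-string} (@pxref{Explicit Encoding}). The byte 233 (octal 351) is @samp{e} with an acute accent in Latin 1. Decoding turns the four bytes into four Emacs characters, and encoding converts them back into the original bytes. The variable @code{my-text} is hypothetical.

@example
;; my-text is a hypothetical variable name.
(setq my-text (decode-coding-string "caf\351" 'iso-latin-1))
(length my-text)
     @result{} 4
(string= (encode-coding-string my-text 'iso-latin-1) "caf\351")
     @result{} t
@end example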

Most coding systems specify a particular character code for
conversion, but some of them leave this unspecified---to be chosen
heuristically based on the data.

@cindex end of line conversion
@dfn{End of line conversion} handles three different conventions used
on various systems for representing end of line in files. The Unix
convention is to use the linefeed character (also called newline). The
DOS convention is to use the two-character sequence carriage-return
linefeed at the end of a line. The Mac convention is to use just
carriage-return.
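
To see the three conventions side by side, you can encode the same two-line string with each variant of one coding system and compare the result with the expected bytes. This sketch assumes @code{encode-coding-string} (@pxref{Explicit Encoding}) and the @code{iso-latin-1} variants. In the Lisp strings, @samp{\n} is linefeed and @samp{\r} is carriage-return.

@example
(string= (encode-coding-string "a\nb" 'iso-latin-1-unix) "a\nb")
     @result{} t
(string= (encode-coding-string "a\nb" 'iso-latin-1-dos) "a\r\nb")
     @result{} t
(string= (encode-coding-string "a\nb" 'iso-latin-1-mac) "a\rb")
     @result{} t
@end example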

@cindex base coding system
@cindex variant coding system
@dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
conversion unspecified, to be chosen based on the data. @dfn{Variant
coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
@code{latin-1-mac} specify the end-of-line conversion explicitly as
well. Most base coding systems have three corresponding variants whose
names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
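
Since variant names follow this pattern, you can build them from a base name. This sketch constructs the three variant names of @code{latin-1} with @code{concat} and @code{intern}, and it tests each one with @code{coding-system-p} (@pxref{Lisp and Coding Systems}):

@example
;; Build latin-1-unix, latin-1-dos and latin-1-mac, then test them.
(mapcar (lambda (suffix)
          (coding-system-p (intern (concat "latin-1" suffix))))
        '("-unix" "-dos" "-mac"))
     @result{} (t t t)
@end example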

The coding system @code{raw-text} is special in that it prevents
character code conversion, and causes the buffer visited with that
coding system to be a unibyte buffer. It does not specify the
end-of-line conversion, allowing that to be determined as usual by the
data, and has the usual three variants which specify the end-of-line
conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:
it specifies no conversion of either character codes or end-of-line.
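
For example, decoding the same DOS-style bytes with @code{raw-text-dos} and with @code{no-conversion} shows that only the former performs end-of-line conversion. This sketch assumes @code{decode-coding-string} (@pxref{Explicit Encoding}):

@example
;; raw-text-dos turns carriage-return linefeed into a newline.
(string= (decode-coding-string "a\r\nb" 'raw-text-dos) "a\nb")
     @result{} t
;; no-conversion leaves the carriage-return in place.
(string= (decode-coding-string "a\r\nb" 'no-conversion) "a\r\nb")
     @result{} t
@end example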

The coding system @code{emacs-mule} specifies that the data is
represented in the internal Emacs encoding. This is like
@code{raw-text} in that no code conversion happens, but different in
that the result is multibyte data.

@defun coding-system-get coding-system property
@tindex coding-system-get
This function returns the specified property of the coding system
@var{coding-system}. Most coding system properties exist for internal
purposes, but one that you might find useful is @code{mime-charset}.
That property's value is the name used in MIME for the character coding
which this coding system can read and write. Examples:

@example
(coding-system-get 'iso-latin-1 'mime-charset)
     @result{} iso-8859-1
(coding-system-get 'iso-2022-cn 'mime-charset)
     @result{} iso-2022-cn
(coding-system-get 'cyrillic-koi8 'mime-charset)
     @result{} koi8-r
@end example

The value of the @code{mime-charset} property is also defined
as an alias for the coding system.
@end defun

@node Encoding and I/O
@subsection Encoding and I/O

The principal purpose of coding systems is for use in reading and
writing files. The function @code{insert-file-contents} uses
a coding system for decoding the file data, and @code{write-region}
uses one to encode the buffer contents.

You can specify the coding system to use either explicitly
(@pxref{Specifying Coding Systems}), or implicitly using the defaulting
mechanism (@pxref{Default Coding Systems}). But these methods may not
completely specify what to do. For example, they may choose a coding
system such as @code{undecided}, which leaves the character code
conversion to be determined from the data. In these cases, the I/O
operation finishes the job of choosing a coding system. Very often
you will want to find out afterwards which coding system was chosen.
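
Here is a minimal sketch of letting an I/O operation choose a coding system, then finding out what it chose. It writes a short file with DOS line ends by binding @code{coding-system-for-write} (@pxref{Specifying Coding Systems}). It then reads the file back with @code{insert-file-contents} and returns @code{last-coding-system-used}, which is described below. The file name is hypothetical. The result is not shown because it depends on your default coding systems; with DOS line ends in the file, it would normally be one of the @samp{-dos} variants.

@example
;; The file name coding-demo.txt is hypothetical.
(let ((file (expand-file-name "coding-demo.txt"
                              temporary-file-directory)))
  ;; Write two lines with DOS-style line ends.
  (let ((coding-system-for-write 'iso-latin-1-dos))
    (write-region "one\ntwo\n" nil file))
  ;; Read the file back, letting Emacs detect the coding
  ;; system, and return the one it chose.
  (prog1 (with-temp-buffer
           (insert-file-contents file)
           last-coding-system-used)
    (delete-file file)))
@end example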

@defvar buffer-file-coding-system
@tindex buffer-file-coding-system
This variable records the coding system that was used for visiting the
current buffer. It is used for saving the buffer, and for writing part
of the buffer with @code{write-region}. When those operations ask the
user to specify a different coding system,
@code{buffer-file-coding-system} is updated to the coding system
specified.

However, @code{buffer-file-coding-system} does not affect sending text
to a subprocess.
@end defvar
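
For example, a Lisp program could make the next save use DOS line ends while keeping the same character code conversion. It would do this by updating the variable with @code{coding-system-change-eol-conversion} (@pxref{Lisp and Coding Systems}). This sketch does so in a temporary buffer whose value starts out as @code{iso-latin-1-unix}. Interactively, users would normally run @code{set-buffer-file-coding-system} instead.

@example
(with-temp-buffer
  (setq buffer-file-coding-system 'iso-latin-1-unix)
  ;; Keep the text conversion; switch to CR LF line ends.
  (setq buffer-file-coding-system
        (coding-system-change-eol-conversion
         buffer-file-coding-system 'dos))
  buffer-file-coding-system)
     @result{} iso-latin-1-dos
@end example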

@defvar save-buffer-coding-system
@tindex save-buffer-coding-system
This variable specifies the coding system for saving the buffer---but it
is not used for @code{write-region}. When saving the buffer asks the
user to specify a different coding system, and
@code{save-buffer-coding-system} was used, then it is updated to the
coding system that was specified.
@end defvar

@defvar last-coding-system-used
@tindex last-coding-system-used
I/O operations for files and subprocesses set this variable to the
coding system name that was used. The explicit encoding and decoding
functions (@pxref{Explicit Encoding}) set it too.

@strong{Warning:} Since receiving subprocess output sets this variable,
it can change whenever Emacs waits; therefore, you should copy the
value shortly after the function call which stores the value you are
interested in.
@end defvar
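
Here is a sketch of following that advice. It decodes a string with an explicit coding system, then copies @code{last-coding-system-used} in the very next binding of a @code{let*}, before anything else can change it.

@example
(let* ((text (decode-coding-string "a\r\nb" 'iso-latin-1-dos))
       ;; Copy the value right away, before Emacs can wait
       ;; for subprocess output and change it.
       (used last-coding-system-used))
  (list (length text) used))
     @result{} (3 iso-latin-1-dos)
@end example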

The variable @code{selection-coding-system} specifies how to encode
selections for the window system. @xref{Window System Selections}.

@node Lisp and Coding Systems
@subsection Coding Systems in Lisp

Here are the Lisp facilities for working with coding systems:

@defun coding-system-list &optional base-only
@tindex coding-system-list
This function returns a list of all coding system names (symbols). If
@var{base-only} is non-@code{nil}, the value includes only the
base coding systems. Otherwise, it includes variant coding systems as well.
@end defun
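
For example, a base coding system such as @code{iso-latin-1} appears in the list when @var{base-only} is non-@code{nil}, while its variant @code{iso-latin-1-dos} does not:

@example
(and (memq 'iso-latin-1 (coding-system-list t)) t)
     @result{} t
(memq 'iso-latin-1-dos (coding-system-list t))
     @result{} nil
@end example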

@defun coding-system-p object
@tindex coding-system-p
This function returns @code{t} if @var{object} is a coding system
name or @code{nil}.
@end defun
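
Some example calls follow. As the second call shows, Emacs also returns @code{t} for @code{nil}. The symbol in the last call is a hypothetical name that is not a coding system.

@example
(coding-system-p 'iso-latin-1)
     @result{} t
(coding-system-p nil)
     @result{} t
(coding-system-p 'no-such-coding-system)   ; hypothetical name
     @result{} nil
@end example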

@defun check-coding-system coding-system
@tindex check-coding-system
This function checks the validity of @var{coding-system}.
If it is valid, it returns @var{coding-system}.
Otherwise it signals an error with condition @code{coding-system-error}.
@end defun
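
Since an invalid name signals @code{coding-system-error}, a program can catch that condition with @code{condition-case} (@pxref{Handling Errors}) and report its own message. In this sketch, the name @code{no-such-coding-system} is hypothetical. The sketch also assumes that the signal's data is the offending name, as it is in current Emacs versions.

@example
(check-coding-system 'iso-latin-1)
     @result{} iso-latin-1
(condition-case err
    (check-coding-system 'no-such-coding-system)
  (coding-system-error
   (format "Invalid coding system: %s" (nth 1 err))))
     @result{} "Invalid coding system: no-such-coding-system"
@end example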

@defun coding-system-change-eol-conversion coding-system eol-type
@tindex coding-system-change-eol-conversion
This function returns a coding system which is like @var{coding-system}
except for its end-of-line conversion, which is specified by
@var{eol-type}. @var{eol-type} should be @code{unix}, @code{dos},
@code{mac}, or @code{nil}. If it is @code{nil}, the returned coding
system determines the end-of-line conversion from the data.
@end defun
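
Some example calls follow, assuming the usual variants of @code{iso-latin-1}:

@example
(coding-system-change-eol-conversion 'iso-latin-1 'dos)
     @result{} iso-latin-1-dos
(coding-system-change-eol-conversion 'iso-latin-1-dos 'mac)
     @result{} iso-latin-1-mac
(coding-system-change-eol-conversion 'iso-latin-1-dos nil)
     @result{} iso-latin-1
@end example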

@defun coding-system-change-text-conversion eol-coding text-coding
@tindex coding-system-change-text-conversion
This function returns a coding system which uses the end-of-line
conversion of @var{eol-coding}, and the text conversion of
@var{text-coding}. If @var{text-coding} is @code{nil}, it returns
@code{undecided}, or one of its variants according to @var{eol-coding}.
@end defun
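
In the first call below, the result keeps the DOS line ends of @code{iso-latin-1-dos} but uses the KOI8 text conversion of @code{cyrillic-koi8*. The second call shows the @code{nil} case. These calls assume that the standard variants of those coding systems are defined.

@example
(coding-system-change-text-conversion 'iso-latin-1-dos 'cyrillic-koi8)
     @result{} cyrillic-koi8-dos
(coding-system-change-text-conversion 'iso-latin-1-unix nil)
     @result{} undecided-unix
@end example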
675
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
676 @defun find-coding-systems-region from to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
677 @tindex find-coding-systems-region
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
678 This function returns a list of coding systems that could be used to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
679 encode a text between @var{from} and @var{to}. All coding systems in
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
680 the list can safely encode any multibyte characters in that portion of
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
681 the text.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
682
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
683 If the text contains no multibyte characters, the function returns the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
684 list @code{(undecided)}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
685 @end defun
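
As a sketch, assuming @code{latin-1} is defined: plain ASCII text needs no particular coding system. Text with an accented letter yields a genuine list of candidates. The exact contents of that list depend on which coding systems your Emacs defines, so no specific value is shown:

@example
(with-temp-buffer
  (insert "plain text")
  (find-coding-systems-region (point-min) (point-max)))
     ;; @result{} (undecided)

(with-temp-buffer
  ;; @r{Decode byte 233 as Latin-1 to get an e with acute accent.}
  (insert (decode-coding-string "caf\351" 'latin-1))
  (find-coding-systems-region (point-min) (point-max)))
     ;; @r{@result{} a list of coding systems that can encode it}
@end example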

@defun find-coding-systems-string string
@tindex find-coding-systems-string
This function returns a list of coding systems that could be used to
encode the text of @var{string}. All coding systems in the list can
safely encode any multibyte characters in @var{string}. If the text
contains no multibyte characters, this returns the list
@code{(undecided)}.
@end defun
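
One use of this function is to check whether a particular coding system is safe for a string before encoding it. The helper @code{my-coding-system-safe-p} below is hypothetical, not part of Emacs. It also assumes that the returned list contains base coding systems, which is why it compares against @code{coding-system-base}:

@example
(defun my-coding-system-safe-p (string coding)
  "Return non-nil if CODING can safely encode STRING.
This is a hypothetical helper, not part of Emacs."
  (let ((safe (find-coding-systems-string string)))
    (or (equal safe '(undecided))        ; @r{no multibyte characters}
        (memq (coding-system-base coding) safe))))

(my-coding-system-safe-p "abc" 'latin-1-unix)
     ;; @result{} t
@end example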

@defun find-coding-systems-for-charsets charsets
@tindex find-coding-systems-for-charsets
This function returns a list of coding systems that could be used to
encode all the character sets in the list @var{charsets}.
@end defun
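
You can obtain @var{charsets} from text with @code{find-charset-string}, which returns the character sets a string uses. For a pure ASCII string, the only character set is @code{ascii}, and (as with the other functions above) the result is @code{(undecided)}:

@example
(find-charset-string "plain text")
     ;; @result{} (ascii)
(find-coding-systems-for-charsets '(ascii))
     ;; @result{} (undecided)
@end example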

@defun detect-coding-region start end &optional highest
@tindex detect-coding-region
This function chooses a plausible coding system for decoding the text
from @var{start} to @var{end}. This text should be ``raw bytes''
(@pxref{Explicit Encoding}).

Normally this function returns a list of coding systems that could
handle decoding the text that was scanned. They are listed in order of
decreasing priority. But if @var{highest} is non-@code{nil}, then the
return value is just one coding system, the one that is highest in
priority.

If the region contains only @sc{ASCII} characters, the value
is @code{undecided} or @code{(undecided)}.
@end defun
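
As an illustration, here the raw bytes are placed in a unibyte temporary buffer. Because the text is pure ASCII, the result is a variant of @code{undecided}. With @var{highest} non-@code{nil}, you get a single symbol instead of a list. The example then checks the detected line ends with @code{coding-system-eol-type}, where 1 means @sc{crlf}:

@example
(with-temp-buffer
  (set-buffer-multibyte nil)          ; @r{treat the text as raw bytes}
  (insert "line one\r\nline two\r\n")
  (let ((coding (detect-coding-region (point-min) (point-max) t)))
    (list (coding-system-base coding)
          (coding-system-eol-type coding))))
     ;; @result{} (undecided 1)
@end example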

@defun detect-coding-string string &optional highest
@tindex detect-coding-string
This function is like @code{detect-coding-region} except that it
operates on the contents of @var{string} instead of bytes in the buffer.
@end defun
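
For example, a short ASCII string with no line ends gives @code{undecided}. It comes back as a one-element list, or as a bare symbol when @var{highest} is non-@code{nil}:

@example
(detect-coding-string "abc")
     ;; @result{} (undecided)
(detect-coding-string "abc" t)
     ;; @result{} undecided
@end example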

@xref{Process Information}, for how to examine or set the coding
systems used for I/O to a subprocess.

@node User-Chosen Coding Systems
@subsection User-Chosen Coding Systems

@tindex select-safe-coding-system
@defun select-safe-coding-system from to &optional preferred-coding-system
This function selects a coding system for encoding the text between
@var{from} and @var{to}, asking the user to choose if necessary.

The optional argument @var{preferred-coding-system} specifies a coding
system to try first. If that one can handle the text in the specified
region, then it is used. If this argument is omitted, the current
buffer's value of @code{buffer-file-coding-system} is tried first.

If the region contains some multibyte characters that the preferred
coding system cannot encode, this function asks the user to choose from
a list of coding systems which can encode the text, and returns the
user's choice.

One other kludgy feature: if @var{from} is a string, the string is the
target text, and @var{to} is ignored.
@end defun
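
For example, when the preferred coding system can encode the text, the function returns it without asking the user. Here @var{from} is a string, so @var{to} (given as @code{nil}) is ignored. @code{latin-1-unix} is just an example of a preferred coding system:

@example
(select-safe-coding-system "plain text" nil 'latin-1-unix)
     ;; @result{} latin-1-unix
@end example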

Here are two functions you can use to let the user specify a coding
system, with completion. @xref{Completion}.

@defun read-coding-system prompt &optional default
@tindex read-coding-system
This function reads a coding system using the minibuffer, prompting with
string @var{prompt}, and returns the coding system name as a symbol. If
the user enters null input, @var{default} specifies which coding system
to return. It should be a symbol or a string.
@end defun

@defun read-non-nil-coding-system prompt
@tindex read-non-nil-coding-system
This function reads a coding system using the minibuffer, prompting with
string @var{prompt}, and returns the coding system name as a symbol. If
the user tries to enter null input, it asks the user to try again.
@xref{Coding Systems}.
@end defun

@node Default Coding Systems
@subsection Default Coding Systems

This section describes variables that specify the default coding
system for certain files or when running certain subprograms, and the
function that I/O operations use to access them.

The idea of these variables is that you set them once and for all to the
defaults you want, and then do not change them again. To specify a
particular coding system for a particular operation in a Lisp program,
don't change these variables; instead, override them using
@code{coding-system-for-read} and @code{coding-system-for-write}
(@pxref{Specifying Coding Systems}).

@defvar file-coding-system-alist
@tindex file-coding-system-alist
This variable is an alist that specifies the coding systems to use for
reading and writing particular files. Each element has the form
@code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
expression that matches certain file names. The element applies to file
names that match @var{pattern}.

The @sc{cdr} of the element, @var{coding}, should be either a coding
system, a cons cell containing two coding systems, or a function symbol.
If @var{coding} is a coding system, that coding system is used for both
reading the file and writing it. If @var{coding} is a cons cell
containing two coding systems, its @sc{car} specifies the coding system
for decoding, and its @sc{cdr} specifies the coding system for encoding.

If @var{coding} is a function symbol, the function must return a coding
system or a cons cell containing two coding systems. This value is used
as described above.
@end defvar
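
For instance, you could arrange for files whose names end in @file{.dat} to be read without conversion and written as Latin-1 with Unix line ends. The @file{.dat} suffix and the file name are only illustrations. In an init file you would add such an element once. Here it is bound with @code{let} so the example has no lasting effect, and @code{find-operation-coding-system} (described below) reports which pair would be used:

@example
(let ((file-coding-system-alist
       (cons '("\\.dat\\'" . (raw-text . latin-1-unix))
             file-coding-system-alist)))
  (find-operation-coding-system 'insert-file-contents
                                "/tmp/sample.dat"))
     ;; @result{} (raw-text . latin-1-unix)
@end example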

@defvar process-coding-system-alist
@tindex process-coding-system-alist
This variable is an alist specifying which coding systems to use for a
subprocess, depending on which program is running in the subprocess. It
works like @code{file-coding-system-alist}, except that @var{pattern} is
matched against the program name used to start the subprocess. The coding
system or systems specified in this alist are used to initialize the
coding systems used for I/O to the subprocess, but you can specify
other coding systems later using @code{set-process-coding-system}.
@end defvar
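
The following sketch shows that the pattern is compared with the program, not the process name. The arguments follow @code{start-process}, whose third argument is the program. The names @samp{my-proc}, @samp{cat} and @samp{sort} are arbitrary examples:

@example
(let ((process-coding-system-alist
       '(("\\`cat\\'" . latin-1-unix))))
  (list
   ;; @r{Process name @samp{my-proc}, program @samp{cat}: matches.}
   (find-operation-coding-system 'start-process "my-proc" nil "cat")
   ;; @r{Process name @samp{cat}, program @samp{sort}: no match.}
   (find-operation-coding-system 'start-process "cat" nil "sort")))
     ;; @result{} ((latin-1-unix . latin-1-unix) nil)
@end example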

@strong{Warning:} Coding systems such as @code{undecided} which
determine the coding system from the data do not work entirely reliably
with asynchronous subprocess output. This is because Emacs handles
asynchronous subprocess output in batches, as it arrives. If the coding
system leaves the character code conversion unspecified, or leaves the
end-of-line conversion unspecified, Emacs must try to detect the proper
conversion from one batch at a time, and this does not always work.

Therefore, with an asynchronous subprocess, if at all possible, use a
coding system which determines both the character code conversion and
the end-of-line conversion---that is, one like @code{latin-1-unix},
rather than @code{undecided} or @code{latin-1}.
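
To check whether a coding system follows this advice, you can test both halves with @code{coding-system-base} and @code{coding-system-eol-type}. The predicate below is a hypothetical helper, not part of Emacs. As a simplification, it treats only @code{undecided} as a detecting text conversion. It treats a vector returned by @code{coding-system-eol-type} as undetermined line ends:

@example
(defun my-fully-specified-coding-system-p (coding)
  "Return non-nil if CODING fixes both text and end-of-line conversion.
This is a hypothetical helper, not part of Emacs."
  (and (not (eq (coding-system-base coding) 'undecided))
       (integerp (coding-system-eol-type coding))))

(my-fully-specified-coding-system-p 'latin-1-unix)   ;; @result{} t
(my-fully-specified-coding-system-p 'latin-1)        ;; @result{} nil
(my-fully-specified-coding-system-p 'undecided-unix) ;; @result{} nil
@end example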

@defvar network-coding-system-alist
@tindex network-coding-system-alist
This variable is an alist that specifies the coding system to use for
network streams. It works much like @code{file-coding-system-alist},
with the difference that the @var{pattern} in an element may be either a
port number or a regular expression. If it is a regular expression, it
is matched against the network service name used to open the network
stream.
@end defvar
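
The fourth argument of @code{open-network-stream} is the service, and that is what these patterns are compared with. In this sketch, port 8080 and the service name @samp{nntp} are arbitrary examples of the two kinds of @var{pattern}:

@example
(let ((network-coding-system-alist
       '((8080 . latin-1-unix)
         ("\\`nntp\\'" . raw-text-unix))))
  (list
   (find-operation-coding-system
    'open-network-stream "conn" nil "localhost" 8080)
   (find-operation-coding-system
    'open-network-stream "news" nil "localhost" "nntp")))
     ;; @result{} ((latin-1-unix . latin-1-unix)
     ;;     (raw-text-unix . raw-text-unix))
@end example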

@defvar default-process-coding-system
@tindex default-process-coding-system
This variable specifies the coding systems to use for subprocess (and
network stream) input and output, when nothing else specifies what to
do.

The value should be a cons cell of the form @code{(@var{input-coding}
. @var{output-coding})}. Here @var{input-coding} applies to input from
the subprocess, and @var{output-coding} applies to output to it.
@end defvar

@defun find-operation-coding-system operation &rest arguments
@tindex find-operation-coding-system
This function returns the coding system to use (by default) for
performing @var{operation} with @var{arguments}. The value has this
form:

@example
(@var{decoding-system} . @var{encoding-system})
@end example

The @sc{car}, @var{decoding-system}, is the coding system to use for
decoding (in case @var{operation} does decoding), and the @sc{cdr},
@var{encoding-system}, is the coding system for encoding (in case
@var{operation} does encoding). If no element of the relevant alist
matches the target, the value is @code{nil}.

The argument @var{operation} should be an Emacs I/O primitive:
@code{insert-file-contents}, @code{write-region}, @code{call-process},
@code{call-process-region}, @code{start-process}, or
@code{open-network-stream}.

The remaining arguments should be the same arguments that might be given
to that I/O primitive. Depending on which primitive, one of those
arguments is selected as the @dfn{target}. For example, if
@var{operation} does file I/O, whichever argument specifies the file
name is the target. For subprocess primitives, the program name is the
target. For @code{open-network-stream}, the target is the service name
or port number.

This function looks up the target in @code{file-coding-system-alist},
@code{process-coding-system-alist}, or
@code{network-coding-system-alist}, depending on @var{operation}.
@xref{Default Coding Systems}.
@end defun
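
This example temporarily binds @code{file-coding-system-alist} to a single element; the @file{.log} pattern is only an illustration. With that binding, a matching file name yields a cons of two coding systems, while a name that matches nothing yields @code{nil}. For @code{write-region}, the file name is the third argument, so the first two arguments here are just placeholder buffer positions:

@example
(let ((file-coding-system-alist '(("\\.log\\'" . latin-1-unix))))
  (list
   (find-operation-coding-system 'write-region 1 100 "/tmp/out.log")
   (find-operation-coding-system 'write-region 1 100 "/tmp/out.txt")))
     ;; @result{} ((latin-1-unix . latin-1-unix) nil)
@end example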

@node Specifying Coding Systems
@subsection Specifying a Coding System for One Operation

You can specify the coding system for a specific operation by binding
the variables @code{coding-system-for-read} and/or
@code{coding-system-for-write}.
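
As a minimal sketch, the following code writes a four-character string with @code{latin-1-unix}. It then reads the file back literally to show that each character became one byte. The sketch assumes an Emacs that provides @code{make-temp-file}, and the file name prefix @samp{coding-demo} is arbitrary:

@example
(let ((file (make-temp-file "coding-demo"))
      ;; @r{Decode byte 233 as Latin-1: an e with acute accent.}
      (text (decode-coding-string "caf\351" 'latin-1)))
  (unwind-protect
      (progn
        (let ((coding-system-for-write 'latin-1-unix))
          (write-region text nil file))
        ;; @r{Read back the raw bytes without any decoding.}
        (with-temp-buffer
          (set-buffer-multibyte nil)
          (insert-file-contents-literally file)
          (append (buffer-string) nil)))
    (delete-file file)))
     ;; @result{} (99 97 102 233)
@end example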

@defvar coding-system-for-read
@tindex coding-system-for-read
If this variable is non-@code{nil}, it specifies the coding system to
use for reading a file, or for input from a synchronous subprocess.

It also applies to any asynchronous subprocess or network stream, but in
a different way: the value of @code{coding-system-for-read} when you
start the subprocess or open the network stream specifies the input
decoding method for that subprocess or network stream. It remains in
use for that subprocess or network stream unless and until overridden.

The right way to use this variable is to bind it with @code{let} for a
specific I/O operation. Its global value is normally @code{nil}, and
you should not globally set it to any other value. Here is an example
of the right way to use the variable:

@example
;; @r{Read the file with no character code conversion.}
;; @r{Assume @sc{crlf} represents end-of-line.}
(let ((coding-system-for-read 'emacs-mule-dos))
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
908 (insert-file-contents filename))
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
909 @end example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
910
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
911 When its value is non-@code{nil}, @code{coding-system-for-read} takes
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
912 precedence over all other methods of specifying a coding system to use for
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
913 input, including @code{file-coding-system-alist},
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
914 @code{process-coding-system-alist} and
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
915 @code{network-coding-system-alist}.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
916 @end defvar
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
917
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
918 @defvar coding-system-for-write
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
@tindex coding-system-for-write
This works much like @code{coding-system-for-read}, except that it
applies to output rather than input. It affects writing to files,
as well as sending output to subprocesses and net connections.

When a single operation does both input and output, as do
@code{call-process-region} and @code{start-process}, both
@code{coding-system-for-read} and @code{coding-system-for-write}
affect it.
@end defvar
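
Here is a minimal sketch of binding @code{coding-system-for-write} around a single @code{write-region} call. The function name @code{my-write-region-dos} is hypothetical, and @code{iso-latin-1-dos} is only an illustrative choice. Whatever @code{buffer-file-coding-system} says, the output is written in Latin-1 with @sc{crlf} line ends.

@example
;; @r{Hypothetical helper: write the region to @var{filename} in}
;; @r{Latin-1 with @sc{crlf} line ends, overriding the buffer's}
;; @r{own coding system for this one call only.}
(defun my-write-region-dos (start end filename)
  (let ((coding-system-for-write 'iso-latin-1-dos))
    (write-region start end filename)))
@end example

Because the binding is made with @code{let}, the global value of @code{coding-system-for-write} stays @code{nil}, as recommended for @code{coding-system-for-read}.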

@defvar inhibit-eol-conversion
@tindex inhibit-eol-conversion
When this variable is non-@code{nil}, no end-of-line conversion is done,
no matter which coding system is specified. This applies to all the
Emacs I/O and subprocess primitives, and to the explicit encoding and
decoding functions (@pxref{Explicit Encoding}).
@end defvar
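
As a minimal sketch, the hypothetical function @code{my-decode-keeping-cr} binds @code{inhibit-eol-conversion} around a call to @code{decode-coding-string}. The coding system passed in may specify DOS end-of-line conversion, as @code{iso-latin-1-dos} does, but the carriage returns should still be left in place. Decoding @code{"a\r\nb"} should therefore give back @code{"a\r\nb"} rather than @code{"a\nb"}.

@example
;; @r{Hypothetical helper: decode @var{bytes} with @var{coding-system},}
;; @r{but leave @sc{cr} @sc{lf} pairs exactly as they are.}
(defun my-decode-keeping-cr (bytes coding-system)
  (let ((inhibit-eol-conversion t))
    (decode-coding-string bytes coding-system)))
@end example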

@node Explicit Encoding
@subsection Explicit Encoding and Decoding
@cindex encoding text
@cindex decoding text

All the operations that transfer text in and out of Emacs have the
ability to use a coding system to encode or decode the text.
You can also explicitly encode and decode text using the functions
in this section.

@cindex raw bytes
The result of encoding, and the input to decoding, are not ordinary
text. They are ``raw bytes''---bytes that represent text in the same
way that an external file would. When a buffer contains raw bytes, it
is most natural to mark that buffer as using unibyte representation,
using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}),
but this is not required. If the buffer's contents are only temporarily
raw, leave the buffer multibyte, which will be correct after you decode
them.

The usual way to get raw bytes in a buffer, for explicit decoding, is
to read them from a file with @code{insert-file-contents-literally}
(@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile}
argument when visiting a file with @code{find-file-noselect}.
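
Here is a minimal sketch of that approach. The function name @code{my-read-and-decode} is hypothetical, and the caller supplies the coding system. The function reads the file's bytes literally into a temporary buffer. That buffer stays multibyte, and the function then decodes the bytes in place with @code{decode-coding-region}, which is described below.

@example
;; @r{Hypothetical helper: return the contents of @var{filename},}
;; @r{decoded explicitly with @var{coding-system}.}
(defun my-read-and-decode (filename coding-system)
  (with-temp-buffer
    ;; @r{Get the raw bytes, with no conversion at all.}
    (insert-file-contents-literally filename)
    ;; @r{The buffer is left multibyte, so it is correct once}
    ;; @r{the bytes have been decoded.}
    (decode-coding-region (point-min) (point-max) coding-system)
    (buffer-substring-no-properties (point-min) (point-max))))
@end example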

The usual way to use the raw bytes that result from explicitly
encoding text is to copy them to a file or process---for example, to
write them with @code{write-region} (@pxref{Writing to Files}), and
suppress encoding for that @code{write-region} call by binding
@code{coding-system-for-write} to @code{no-conversion}.
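
Here is a minimal sketch of that pattern. The function name @code{my-write-encoded} is hypothetical. The function encodes the text in a temporary buffer with @code{encode-coding-region}, described below. It then writes the resulting raw bytes with @code{coding-system-for-write} bound to @code{no-conversion}, so that @code{write-region} does not encode them a second time.

@example
;; @r{Hypothetical helper: encode @var{text} with @var{coding-system}}
;; @r{and write the resulting raw bytes to @var{filename} unchanged.}
(defun my-write-encoded (text coding-system filename)
  (with-temp-buffer
    (insert text)
    ;; @r{Replace the text with its encoded form (raw bytes).}
    (encode-coding-region (point-min) (point-max) coding-system)
    ;; @r{Write the bytes without any further encoding.}
    (let ((coding-system-for-write 'no-conversion))
      (write-region (point-min) (point-max) filename))))
@end example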

Raw bytes typically contain stray individual bytes with values in the
range 128 through 255 that are legitimate only as part of multibyte
sequences. Even if the buffer is multibyte, Emacs treats each such
individual byte as a character and uses the byte value as its character
code. In this way, character codes 128 through 255 can be found in a
multibyte buffer, even though they are not legitimate multibyte
character codes.

Raw bytes sometimes contain overlong byte sequences that look like a
proper multibyte character plus superfluous trailing codes. For
most purposes, Emacs treats such a sequence in a buffer or string as a
single character, and if you look at its character code, you get the
value that corresponds to the multibyte character
sequence---disregarding the extra trailing codes. This is not quite
clean, but raw bytes are used only in limited ways, so as a practical
matter it is not worth the trouble to treat this case differently.

When a multibyte buffer contains illegitimate byte sequences,
insertion or deletion can sometimes cause them to coalesce into a
legitimate multibyte character. For example, suppose the buffer
contains the sequence 129 68 192, 68 being the character @samp{D}. If
you delete the @samp{D}, the bytes 129 and 192 become adjacent, and thus
become one multibyte character (Latin-1 A with grave accent). Point
moves to one side or the other of the character, since it cannot be
within a character. Don't be alarmed by this.

Some really peculiar situations prevent proper coalescence. For
example, if you narrow the buffer so that the accessible portion begins
just before the @samp{D}, then delete the @samp{D}, the two surrounding
bytes cannot coalesce because one of them is outside the accessible
portion of the buffer. In this case, the deletion cannot be done, so
@code{delete-region} signals an error.

Here are the functions to perform explicit encoding or decoding. The
encoding functions produce ``raw bytes''; the decoding functions are
meant to operate on ``raw bytes''. All of these functions discard text
properties.

@defun encode-coding-region start end coding-system
@tindex encode-coding-region
This function encodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The encoded text replaces the
original text in the buffer. The result of encoding is ``raw bytes,''
but the buffer remains multibyte if it was multibyte before.
@end defun

@defun encode-coding-string string coding-system
@tindex encode-coding-string
This function encodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
encoded text. The result of encoding is a unibyte string of ``raw bytes.''
@end defun

@defun decode-coding-region start end coding-system
@tindex decode-coding-region
This function decodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The decoded text replaces the
original text in the buffer. To make explicit decoding useful, the text
before decoding ought to be ``raw bytes.''
@end defun

@defun decode-coding-string string coding-system
@tindex decode-coding-string
This function decodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
decoded text. To make explicit decoding useful, the contents of
@var{string} ought to be ``raw bytes.''
@end defun
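
As an illustration, here is a minimal round trip through the two string functions. It assumes Latin-1 (@code{iso-latin-1}) as the coding system, and the function name @code{my-latin-1-round-trip} is hypothetical. Encoding the four-character string @samp{caf@'e} should yield a unibyte string of four raw bytes, the last being 233. Decoding those bytes with the same coding system should give back the original text.

@example
;; @r{Hypothetical helper: encode @var{text} as Latin-1 raw bytes,}
;; @r{decode them again, and return both results in a list.}
(defun my-latin-1-round-trip (text)
  (let ((bytes (encode-coding-string text 'iso-latin-1)))
    (list bytes (decode-coding-string bytes 'iso-latin-1))))
@end example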

@node Terminal I/O Encoding
@subsection Terminal I/O Encoding

Emacs can decode keyboard input using a coding system, and encode
terminal output. This is useful for terminals that transmit or display
text using a particular encoding such as Latin-1. Emacs does not set
@code{last-coding-system-used} when it encodes or decodes terminal I/O.

@defun keyboard-coding-system
@tindex keyboard-coding-system
This function returns the coding system that is in use for decoding
keyboard input---or @code{nil} if no coding system is to be used.
@end defun

@defun set-keyboard-coding-system coding-system
@tindex set-keyboard-coding-system
This function specifies @var{coding-system} as the coding system to
use for decoding keyboard input. If @var{coding-system} is @code{nil},
that means do not decode keyboard input.
@end defun

@defun terminal-coding-system
@tindex terminal-coding-system
This function returns the coding system that is in use for encoding
terminal output---or @code{nil} for no encoding.
@end defun

@defun set-terminal-coding-system coding-system
@tindex set-terminal-coding-system
This function specifies @var{coding-system} as the coding system to use
for encoding terminal output. If @var{coding-system} is @code{nil},
that means do not encode terminal output.
@end defun

@node MS-DOS File Types
@subsection MS-DOS File Types
@cindex DOS file types
@cindex MS-DOS file types
@cindex Windows file types
@cindex file types on MS-DOS and Windows
@cindex text files and binary files
@cindex binary files and text files

Emacs on MS-DOS and on MS-Windows recognizes certain file names as
text files or binary files. By ``binary file'' we mean a file of
literal byte values that are not necessarily meant to be characters.
Emacs does no end-of-line conversion and no character code conversion
for a binary file. Meanwhile, when you create a new file which is
marked by its name as a ``text file,'' Emacs uses DOS end-of-line
conversion.

@defvar buffer-file-type
This variable, automatically buffer-local in each buffer, records the
file type of the buffer's visited file. When a buffer does not specify
a coding system with @code{buffer-file-coding-system}, this variable is
used to determine which coding system to use when writing the contents
of the buffer. It should be @code{nil} for text, @code{t} for binary.
If it is @code{t}, the coding system is @code{no-conversion}.
Otherwise, @code{undecided-dos} is used.

Normally this variable is set by visiting a file; it is set to
@code{nil} if the file was visited without any actual conversion.
@end defvar

@defopt file-name-buffer-file-type-alist
This variable holds an alist for recognizing text and binary files.
Each element has the form (@var{regexp} . @var{type}), where
@var{regexp} is matched against the file name, and @var{type} may be
@code{nil} for text, @code{t} for binary, or a function to call to
compute which. If it is a function, then it is called with a single
argument (the file name) and should return @code{t} or @code{nil}.

When running on MS-DOS or MS-Windows, Emacs checks this alist to decide
which coding system to use when reading a file. For a text file,
@code{undecided-dos} is used. For a binary file, @code{no-conversion}
is used.

If no element in this alist matches a given file name, then
@code{default-buffer-file-type} says how to treat the file.
@end defopt
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1119
@defopt default-buffer-file-type
This variable says how to handle files for which
@code{file-name-buffer-file-type-alist} says nothing about the type.

If this variable is non-@code{nil}, then these files are treated as
binary: the coding system @code{no-conversion} is used. Otherwise,
nothing special is done for them---the coding system is deduced solely
from the file contents, in the usual Emacs fashion.
@end defopt
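
The rules in the two entries above can be sketched as a lookup
function. This is a minimal illustration, not the code Emacs itself
uses: the name @code{my-file-binary-p} is hypothetical, and the function
takes the alist and the fallback value as arguments instead of reading
@code{file-name-buffer-file-type-alist} and
@code{default-buffer-file-type} directly, so it behaves the same on
every platform. It returns non-@code{nil} when the file should be
treated as binary.

@example
;; Hypothetical helper, not part of Emacs: decide whether FILENAME
;; is binary, following the rules described above.
(defun my-file-binary-p (filename alist default)
  "Return non-nil if FILENAME should be treated as binary.
ALIST has the same form as `file-name-buffer-file-type-alist';
DEFAULT is used when no element matches, like
`default-buffer-file-type'."
  (let ((tail alist)
        (elt nil))
    ;; Find the first element whose REGEXP matches FILENAME.
    (while (and tail (not elt))
      (if (string-match (car (car tail)) filename)
          (setq elt (car tail)))
      (setq tail (cdr tail)))
    (cond ((null elt) default)
          ;; TYPE is a function: call it with the file name.
          ((functionp (cdr elt)) (funcall (cdr elt) filename))
          ;; TYPE is nil (text) or t (binary).
          (t (cdr elt)))))
@end example

For example, with the alist @code{(("\\.exe\\'" . t))}, the file name
@file{setup.exe} yields @code{t}, while a name that matches no element
yields @var{default}.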

@node Input Methods
@section Input Methods
@cindex input methods

@dfn{Input methods} provide convenient ways of entering non-@sc{ASCII}
characters from the keyboard. Unlike coding systems, which translate
non-@sc{ASCII} characters to and from encodings meant to be read by
programs, input methods provide human-friendly commands. (@xref{Input
Methods,,, emacs, The GNU Emacs Manual}, for information on how users
use input methods to enter text.) How to define input methods is not
yet documented in this manual, but here we describe how to use them.

Each input method has a name, which is currently a string;
in the future, symbols may also be usable as input method names.

@tindex current-input-method
@defvar current-input-method
This variable holds the name of the input method now active in the
current buffer. (It automatically becomes local in each buffer when set
in any fashion.) It is @code{nil} if no input method is active in the
buffer now.
@end defvar

@tindex default-input-method
@defvar default-input-method
This variable holds the default input method for commands that choose an
input method. Unlike @code{current-input-method}, this variable is
normally global.
@end defvar

@tindex set-input-method
@defun set-input-method input-method
This function activates input method @var{input-method} for the current
buffer. It also sets @code{default-input-method} to @var{input-method}.
If @var{input-method} is @code{nil}, this function deactivates any input
method for the current buffer.
@end defun
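
The following sketch illustrates how @code{set-input-method},
@code{current-input-method} and @code{default-input-method} interact.
It assumes that the input method @samp{latin-1-prefix} is installed;
the function @code{my-input-method-demo} and the buffer names are
hypothetical. It activates the method in one new buffer and reports
the value of @code{current-input-method} there and in a second buffer,
together with the global @code{default-input-method}.

@example
;; Hypothetical demonstration function, not part of Emacs.
(defun my-input-method-demo (method)
  "Activate METHOD in one new buffer and report what changed.
Return a list of `current-input-method' in that buffer, in a
second buffer, and the global `default-input-method'."
  (let ((a (generate-new-buffer "im-a"))
        (b (generate-new-buffer "im-b")))
    (unwind-protect
        (progn
          ;; Activate METHOD in buffer A only.
          (with-current-buffer a
            (set-input-method method))
          (list (buffer-local-value 'current-input-method a)
                (buffer-local-value 'current-input-method b)
                default-input-method))
      ;; Clean up both temporary buffers.
      (kill-buffer a)
      (kill-buffer b))))
@end example

Under that assumption, @code{(my-input-method-demo "latin-1-prefix")}
should return @code{("latin-1-prefix" nil "latin-1-prefix")}: the input
method is active only in the first buffer, while the default is set
globally.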

@tindex read-input-method-name
@defun read-input-method-name prompt &optional default inhibit-null
This function reads an input method name with the minibuffer, prompting
with @var{prompt}. If @var{default} is non-@code{nil}, it is returned
if the user enters empty input. However, if @var{inhibit-null} is
non-@code{nil}, empty input signals an error.

The returned value is a string.
@end defun
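
A typical use of @code{read-input-method-name} is in the
@code{interactive} specification of a command. Here is a minimal
sketch; the command name @code{my-choose-input-method} is hypothetical,
and it simply passes the name it reads to @code{set-input-method}.

@example
;; Hypothetical command, not part of Emacs.
(defun my-choose-input-method (method)
  "Read an input method name and activate it in this buffer."
  (interactive
   ;; Read a name; DEFAULT is `default-input-method', and
   ;; INHIBIT-NULL is t.
   (list (read-input-method-name "Input method: "
                                 default-input-method t)))
  (set-input-method method))
@end example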

@tindex input-method-alist
@defvar input-method-alist
This variable defines all the supported input methods.
Each element defines one input method, and should have the form:

@example
(@var{input-method} @var{language-env} @var{activate-func}
@var{title} @var{description} @var{args}...)
@end example

Here @var{input-method} is the input method name, a string;
@var{language-env} is another string, the name of the language
environment this input method is recommended for. (That serves only for
documentation purposes.)

@var{title} is a string to display in the mode line while this method is
active. @var{description} is a string describing this method and what
it is good for.

@var{activate-func} is a function to call to activate this method. The
@var{args}, if any, are passed as arguments to @var{activate-func}. All
told, the arguments to @var{activate-func} are @var{input-method} and
the @var{args}.
@end defvar
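
To make the calling convention concrete, here is a sketch that builds
an element of the form shown above and calls its @var{activate-func}
with @var{input-method} followed by the @var{args}. All the names
(@code{my-im-log}, @code{my-im-activate}, @code{my-im-entry},
@code{my-im-call-activate}) are hypothetical; the sketch does not modify
@code{input-method-alist} itself, and it is not the code Emacs uses to
activate input methods.

@example
;; Hypothetical log of activation calls.
(defvar my-im-log nil)

;; Hypothetical ACTIVATE-FUNC: just record its arguments.
(defun my-im-activate (input-method &rest args)
  (setq my-im-log (cons (cons input-method args) my-im-log)))

;; A hypothetical element:
;; (INPUT-METHOD LANGUAGE-ENV ACTIVATE-FUNC TITLE DESCRIPTION ARGS...)
(defvar my-im-entry
  '("my-im" "English" my-im-activate "MY"
    "A toy input method that does nothing." "arg1" "arg2"))

;; Call ACTIVATE-FUNC with INPUT-METHOD followed by the ARGS.
(defun my-im-call-activate (entry)
  (apply (nth 2 entry) (car entry) (nthcdr 5 entry)))
@end example

After evaluating @code{(my-im-call-activate my-im-entry)},
@code{my-im-log} holds @code{(("my-im" "arg1" "arg2"))}: the function
received the input method name and then each of the @var{args}, while
@var{language-env}, @var{title} and @var{description} were not passed.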

The fundamental interface to input methods is through the
variable @code{input-method-function}. @xref{Reading One Event}.