annotate lispref/nonascii.texi @ 25427:dde5fcbfa2af
(Info-tagify): Don't insert more than one newline before the tag table.
(Info-tagify): Start by widening.
Match node headers that don't list the file name,
and more kinds of page separations.
Strip properties during tagification.
Use start of node header line as tag's position.
Fix the "done" message.
(Info-validate): Save and restore match data around narrowing down.
author | Richard M. Stallman <rms@gnu.org> |
---|---|
date | Sun, 29 Aug 1999 19:19:00 +0000 |
parents | a6db4671c7a0 |
children | 467b88fab665 |
rev | line source |
---|---|
21006 | 1 @c -*-texinfo-*- |
2 @c This is part of the GNU Emacs Lisp Reference Manual. | |
3 @c Copyright (C) 1998 Free Software Foundation, Inc. | |
4 @c See the file elisp.texi for copying conditions. | |
5 @setfilename ../info/characters | |
6 @node Non-ASCII Characters, Searching and Matching, Text, Top | |
7 @chapter Non-ASCII Characters | |
8 @cindex multibyte characters | |
9 @cindex non-ASCII characters | |
10 | |
11 This chapter covers the special issues relating to non-@sc{ASCII} | |
12 characters and how they are stored in strings and buffers. | |
13 | |
14 @menu | |
15 * Text Representations:: | |
16 * Converting Representations:: | |
17 * Selecting a Representation:: | |
18 * Character Codes:: | |
19 * Character Sets:: | |
22138 | 20 * Chars and Bytes:: |
21 * Splitting Characters:: | |
21006 | 22 * Scanning Charsets:: |
22138 | 23 * Translation of Characters:: |
21006 | 24 * Coding Systems:: |
22138 | 25 * Input Methods:: |
21006 | 26 @end menu |
27 | |
28 @node Text Representations | |
29 @section Text Representations | |
30 @cindex text representations | |
31 | |
32 Emacs has two @dfn{text representations}---two ways to represent text | |
33 in a string or buffer. These are called @dfn{unibyte} and | |
34 @dfn{multibyte}. Each string, and each buffer, uses one of these two | |
35 representations. For most purposes, you can ignore the issue of | |
36 representations, because Emacs converts text between them as | |
37 appropriate. Occasionally in Lisp programming you will need to pay | |
38 attention to the difference. | |
39 | |
40 @cindex unibyte text | |
41 In unibyte representation, each character occupies one byte and | |
42 therefore the possible character codes range from 0 to 255. Codes 0 | |
43 through 127 are @sc{ASCII} characters; the codes from 128 through 255 | |
21682 | 44 are used for one non-@sc{ASCII} character set (you can choose which |
45 character set by setting the variable @code{nonascii-insert-offset}). | |
21006 | 46 |
47 @cindex leading code | |
48 @cindex multibyte text | |
22252 | 49 @cindex trailing codes |
21006 | 50 In multibyte representation, a character may occupy more than one |
51 byte, and as a result, the full range of Emacs character codes can be | |
52 stored. The first byte of a multibyte character is always in the range | |
53 128 through 159 (octal 0200 through 0237). These values are called | |
22138 | 54 @dfn{leading codes}. The second and subsequent bytes of a multibyte |
55 character are always in the range 160 through 255 (octal 0240 through | |
22252 | 56 0377); these values are @dfn{trailing codes}. |
21006 | 57 |
24951 | 58 Some sequences of bytes do not form meaningful multibyte characters: |
59 for example, a single isolated byte in the range 128 through 255 is | |
60 never meaningful. Such byte sequences are not entirely valid, and never | |
61 appear in proper multibyte text (since that consists of a sequence of | |
62 @emph{characters}); but they can appear as part of ``raw bytes'' | |
63 (@pxref{Explicit Encoding}). | |
64 | |
21006 | 65 In a buffer, the buffer-local value of the variable |
66 @code{enable-multibyte-characters} specifies the representation used. | |
24952 | 67 The representation for a string is determined and recorded in the string |
68 when the string is constructed. | |
21006 | 69 |
22138 | 70 @defvar enable-multibyte-characters |
21006 | 71 @tindex enable-multibyte-characters |
72 This variable specifies the current buffer's text representation. | |
73 If it is non-@code{nil}, the buffer contains multibyte text; otherwise, | |
74 it contains unibyte text. | |
75 | |
21682 | 76 You cannot set this variable directly; instead, use the function |
77 @code{set-buffer-multibyte} to change a buffer's representation. | |
21006 | 78 @end defvar |
79 | |
22138 | 80 @defvar default-enable-multibyte-characters |
21006 | 81 @tindex default-enable-multibyte-characters |
22138 | 82 This variable's value is entirely equivalent to @code{(default-value |
21006 | 83 'enable-multibyte-characters)}, and setting this variable changes that |
22138 | 84 default value. Setting the local binding of |
85 @code{enable-multibyte-characters} in a specific buffer is not allowed, | |
86 but changing the default value is supported, and it is a reasonable | |
87 thing to do, because it has no effect on existing buffers. | |
21006 | 88 |
89 The @samp{--unibyte} command line option does its job by setting the | |
90 default value to @code{nil} early in startup. | |
91 @end defvar | |
92 | |
24951 | 93 @defun position-bytes position |
94 @tindex position-bytes | |
95 Return the byte-position corresponding to buffer position @var{position} | |
96 in the current buffer. | |
97 @end defun | |
98 | |
99 @defun byte-to-position byte-position | |
100 @tindex byte-to-position | |
101 Return the buffer position corresponding to byte-position | |
102 @var{byte-position} in the current buffer. | |
103 @end defun | |
104 | |
22138 | 105 @defun multibyte-string-p string |
21006 | 106 @tindex multibyte-string-p |
24951 | 107 Return @code{t} if @var{string} is a multibyte string. |
21006 | 108 @end defun |
109 | |
110 @node Converting Representations | |
111 @section Converting Text Representations | |
112 | |
113 Emacs can convert unibyte text to multibyte; it can also convert | |
114 multibyte text to unibyte, though this conversion loses information. In | |
115 general these conversions happen when inserting text into a buffer, or | |
116 when putting text from several strings together in one string. You can | |
117 also explicitly convert a string's contents to either representation. | |
118 | |
119 Emacs chooses the representation for a string based on the text that | |
120 it is constructed from. The general rule is to convert unibyte text to | |
121 multibyte text when combining it with other multibyte text, because the | |
122 multibyte representation is more general and can hold whatever | |
123 characters the unibyte text has. | |
124 | |
125 When inserting text into a buffer, Emacs converts the text to the | |
126 buffer's representation, as specified by | |
127 @code{enable-multibyte-characters} in that buffer. In particular, when | |
128 you insert multibyte text into a unibyte buffer, Emacs converts the text | |
129 to unibyte, even though this conversion cannot in general preserve all | |
130 the characters that might be in the multibyte text. The other natural | |
131 alternative, to convert the buffer contents to multibyte, is not | |
132 acceptable because the buffer's representation is a choice made by the | |
21682 | 133 user that cannot be overridden automatically. |
21006 | 134 |
135 Converting unibyte text to multibyte text leaves @sc{ASCII} characters | |
21682 | 136 unchanged, and likewise 128 through 159. It converts the non-@sc{ASCII} |
137 codes 160 through 255 by adding the value @code{nonascii-insert-offset} | |
138 to each character code. By setting this variable, you specify which | |
22138 | 139 character set the unibyte characters correspond to (@pxref{Character |
140 Sets}). For example, if @code{nonascii-insert-offset} is 2048, which is | |
141 @code{(- (make-char 'latin-iso8859-1) 128)}, then the unibyte | |
142 non-@sc{ASCII} characters correspond to Latin 1. If it is 2688, which | |
143 is @code{(- (make-char 'greek-iso8859-7) 128)}, then they correspond to | |
144 Greek letters. | |
21006 | 145 |
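As a rough illustration of the arithmetic just described, and not a
transcript of a real session, the following sketch adds the Latin 1
offset quoted above (2048) to the unibyte code 200.  The name
@code{offset} is only a local stand-in for
@code{nonascii-insert-offset}.

@example
;; Sketch: unibyte code 200 plus the Latin 1 offset from the text.
(let ((offset 2048))
  (+ 200 offset))
     @result{} 2248
@end example

The result, 2248, is the same code that @code{(make-char
'latin-iso8859-1 72)} yields in the examples elsewhere in this chapter.
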
146 Converting multibyte text to unibyte is simpler: it performs | |
147 logical-and of each character code with 255. If | |
148 @code{nonascii-insert-offset} has a reasonable value, corresponding to | |
149 the beginning of some character set, this conversion is the inverse of | |
150 the other: converting unibyte text to multibyte and back to unibyte | |
151 reproduces the original unibyte text. | |
152 | |
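The round trip can be checked with plain integer arithmetic.  This is
only a sketch that assumes the Latin 1 offset of 2048 mentioned above;
it does not call the conversion code itself.

@example
;; 2248 is 200 + 2048; masking with 255 recovers 200.
(logand (+ 200 2048) 255)
     @result{} 200
@end example
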
22138 | 153 @defvar nonascii-insert-offset |
21006 | 154 @tindex nonascii-insert-offset |
155 This variable specifies the amount to add to a non-@sc{ASCII} character | |
156 when converting unibyte text to multibyte. It also applies when | |
22138 | 157 @code{self-insert-command} inserts a character in the unibyte |
158 non-@sc{ASCII} range, 128 through 255. However, the function | |
159 @code{insert-char} does not perform this conversion. | |
21006 | 160 |
161 The right value to use to select character set @var{cs} is @code{(- | |
22138 | 162 (make-char @var{cs}) 128)}. If the value of |
21006 | 163 @code{nonascii-insert-offset} is zero, then conversion actually uses the |
164 value for the Latin 1 character set, rather than zero. | |
165 @end defvar | |
166 | |
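For instance, a minimal sketch of selecting Greek for the unibyte
non-@sc{ASCII} characters, assuming the @code{greek-iso8859-7}
character set is defined in the running Emacs:

@example
;; Uses the formula given under nonascii-insert-offset.
(setq nonascii-insert-offset
      (- (make-char 'greek-iso8859-7) 128))
     @result{} 2688
@end example
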
22138 | 167 @defvar nonascii-translation-table |
168 @tindex nonascii-translation-table | |
21006 | 169 This variable provides a more general alternative to |
170 @code{nonascii-insert-offset}. You can use it to specify independently | |
171 how to translate each code in the range of 128 through 255 into a | |
172 multibyte character. The value should be a vector, or @code{nil}. | |
21682 | 173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}. |
21006 | 174 @end defvar |
175 | |
22138 | 176 @defun string-make-unibyte string |
21006 | 177 @tindex string-make-unibyte |
178 This function converts the text of @var{string} to unibyte | |
22252 | 179 representation, if it isn't already, and returns the result. If |
21682 | 180 @var{string} is a unibyte string, it is returned unchanged. |
21006 | 181 @end defun |
182 | |
22138 | 183 @defun string-make-multibyte string |
21006 | 184 @tindex string-make-multibyte |
185 This function converts the text of @var{string} to multibyte | |
22252 | 186 representation, if it isn't already, and returns the result. If |
21682 | 187 @var{string} is a multibyte string, it is returned unchanged. |
21006 | 188 @end defun |
189 | |
190 @node Selecting a Representation | |
191 @section Selecting a Representation | |
192 | |
193 Sometimes it is useful to examine an existing buffer or string as | |
194 multibyte when it was unibyte, or vice versa. | |
195 | |
22138 | 196 @defun set-buffer-multibyte multibyte |
21006 | 197 @tindex set-buffer-multibyte |
198 Set the representation type of the current buffer. If @var{multibyte} | |
199 is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte} | |
200 is @code{nil}, the buffer becomes unibyte. | |
201 | |
202 This function leaves the buffer contents unchanged when viewed as a | |
203 sequence of bytes. As a consequence, it can change the contents viewed | |
204 as characters; a sequence of two bytes which is treated as one character | |
205 in multibyte representation will count as two characters in unibyte | |
206 representation. | |
207 | |
208 This function sets @code{enable-multibyte-characters} to record which | |
209 representation is in use. It also adjusts various data in the buffer | |
21682 | 210 (including overlays, text properties and markers) so that they cover the |
211 same text as they did before. | |
24951 | 212 |
213 You cannot use @code{set-buffer-multibyte} on an indirect buffer, | |
214 because indirect buffers always inherit the representation of the | |
215 base buffer. | |
21006 | 216 @end defun |
217 | |
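A small sketch of switching a scratch buffer to unibyte and reading
back the recorded representation.  It assumes @code{with-temp-buffer}
is available, and the temporary buffer is purely illustrative.

@example
(with-temp-buffer
  ;; Make the temporary buffer unibyte, then inspect the variable
  ;; that records its representation.
  (set-buffer-multibyte nil)
  enable-multibyte-characters)
     @result{} nil
@end example
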
22138 | 218 @defun string-as-unibyte string |
21006 | 219 @tindex string-as-unibyte |
220 This function returns a string with the same bytes as @var{string} but | |
221 treating each byte as a character. This means that the value may have | |
222 more characters than @var{string} has. | |
223 | |
24951 | 224 If @var{string} is already a unibyte string, then the value is |
225 @var{string} itself. | |
21006 | 226 @end defun |
227 | |
22138 | 228 @defun string-as-multibyte string |
21006 | 229 @tindex string-as-multibyte |
230 This function returns a string with the same bytes as @var{string} but | |
231 treating each multibyte sequence as one character. This means that the | |
232 value may have fewer characters than @var{string} has. | |
233 | |
24951 | 234 If @var{string} is already a multibyte string, then the value is |
235 @var{string} itself. | |
21006 | 236 @end defun |
237 | |
238 @node Character Codes | |
239 @section Character Codes | |
240 @cindex character codes | |
241 | |
242 The unibyte and multibyte text representations use different character | |
243 codes. The valid character codes for unibyte representation range from | |
244 0 to 255---the values that can fit in one byte. The character | |
245 codes for multibyte representation range from 0 to 524287, but not all | |
246 values in that range are valid. In particular, the values 128 through | |
21682 | 247 255 are not legitimate in multibyte text (though they can occur in ``raw |
248 bytes''; @pxref{Explicit Encoding}). Only the @sc{ASCII} codes 0 | |
249 through 127 are fully legitimate in both representations. | |
21006 | 250 |
251 @defun char-valid-p charcode | |
252 This function returns @code{t} if @var{charcode} is valid for either one of the two | |
253 text representations. | |
254 | |
255 @example | |
256 (char-valid-p 65) | |
257 @result{} t | |
258 (char-valid-p 256) | |
259 @result{} nil | |
260 (char-valid-p 2248) | |
261 @result{} t | |
262 @end example | |
263 @end defun | |
264 | |
265 @node Character Sets | |
266 @section Character Sets | |
267 @cindex character sets | |
268 | |
269 Emacs classifies characters into various @dfn{character sets}, each of | |
270 which has a name that is a symbol. Each character belongs to one and | |
271 only one character set. | |
272 | |
273 In general, there is one character set for each distinct script. For | |
274 example, @code{latin-iso8859-1} is one character set, | |
275 @code{greek-iso8859-7} is another, and @code{ascii} is another. An | |
21682 | 276 Emacs character set can hold at most 9025 characters; therefore, in some |
277 cases, characters that would logically be grouped together are split | |
22138 | 278 into several character sets. For example, one set of Chinese |
279 characters, generally known as Big 5, is divided into two Emacs | |
280 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}. | |
21006 | 281 |
22138 | 282 @defun charsetp object |
21006 | 283 @tindex charsetp |
284 Return @code{t} if @var{object} is a character set name symbol, | |
285 @code{nil} otherwise. | |
286 @end defun | |
287 | |
22138 | 288 @defun charset-list |
21006 | 289 @tindex charset-list |
290 This function returns a list of all defined character set names. | |
291 @end defun | |
292 | |
22138 | 293 @defun char-charset character |
21006 | 294 @tindex char-charset |
24951 | 295 This function returns the name of the character set that @var{character} |
296 belongs to. | |
21006 | 297 @end defun |
298 | |
299 @node Chars and Bytes | |
300 @section Characters and Bytes | |
301 @cindex bytes and characters | |
302 | |
22138 | 303 @cindex introduction sequence |
304 @cindex dimension (of character set) | |
21006 | 305 In multibyte representation, each character occupies one or more |
22138 | 306 bytes. Each character set has an @dfn{introduction sequence}, which is |
22252 | 307 normally one or two bytes long. (Exception: the @sc{ASCII} character |
308 set has a zero-length introduction sequence.) The introduction sequence | |
309 is the beginning of the byte sequence for any character in the character | |
310 set. The rest of the character's bytes distinguish it from the other | |
311 characters in the same character set. Depending on the character set, | |
312 there are either one or two distinguishing bytes; the number of such | |
313 bytes is called the @dfn{dimension} of the character set. | |
22138 | 314 |
315 @defun charset-dimension charset | |
316 @tindex charset-dimension | |
24951 | 317 This function returns the dimension of @var{charset}; at present, the |
318 dimension is always 1 or 2. | |
319 @end defun | |
320 | |
321 @defun charset-bytes charset | |
322 @tindex charset-bytes | |
323 This function returns the number of bytes used to represent a character | |
324 in character set @var{charset}. | |
22138 | 325 @end defun |
326 | |
327 This is the simplest way to determine the byte length of a character | |
328 set's introduction sequence: | |
329 | |
330 @example | |
24951 | 331 (- (charset-bytes @var{charset}) |
22138 | 332 (charset-dimension @var{charset})) |
333 @end example | |
334 | |
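One way to package that expression, as a sketch only: the function name
@code{my-charset-intro-length} is hypothetical and not part of Emacs.

@example
(defun my-charset-intro-length (charset)
  "Return the length in bytes of CHARSET's introduction sequence."
  (- (charset-bytes charset)
     (charset-dimension charset)))

;; ASCII has a zero-length introduction sequence, as noted above.
(my-charset-intro-length 'ascii)
     @result{} 0
@end example
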
335 @node Splitting Characters | |
336 @section Splitting Characters | |
337 | |
338 The functions in this section convert between characters and the byte | |
339 values used to represent them. For most purposes, there is no need to | |
340 be concerned with the sequence of bytes used to represent a character, | |
21682 | 341 because Emacs translates automatically when necessary. |
21006 | 342 |
22138 | 343 @defun split-char character |
21006 | 344 @tindex split-char |
345 Return a list containing the name of the character set of | |
22138 | 346 @var{character}, followed by one or two byte values (integers) which |
347 identify @var{character} within that character set. The number of byte | |
348 values is the character set's dimension. | |
21006 | 349 |
350 @example | |
351 (split-char 2248) | |
352 @result{} (latin-iso8859-1 72) | |
353 (split-char 65) | |
354 @result{} (ascii 65) | |
355 @end example | |
356 | |
357 Unibyte non-@sc{ASCII} characters are considered as part of | |
358 the @code{ascii} character set: | |
359 | |
360 @example | |
361 (split-char 192) | |
362 @result{} (ascii 192) | |
363 @end example | |
364 @end defun | |
365 | |
22138 | 366 @defun make-char charset &rest byte-values |
21006 | 367 @tindex make-char |
22138 | 368 This function returns the character in character set @var{charset} |
369 identified by @var{byte-values}. This is roughly the inverse of | |
370 @code{split-char}. Normally, you should specify either one or two | |
371 @var{byte-values}, according to the dimension of @var{charset}. For | |
372 example, | |
21006 | 373 |
374 @example | |
375 (make-char 'latin-iso8859-1 72) | |
376 @result{} 2248 | |
377 @end example | |
378 @end defun | |
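
As a rough sketch of the inverse relationship between these two
functions, the list returned by @code{split-char} can be handed back to
@code{make-char} with @code{apply}.  The value shown assumes the
character codes used in the examples for @code{split-char} and
@code{make-char}; other Emacs versions may number characters
differently.

@example
;; Split a character into its character set and byte values,
;; then rebuild the same character from those pieces.
(apply 'make-char (split-char 2248))
     @result{} 2248
@end example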

@cindex generic characters
If you call @code{make-char} with no @var{byte-values}, the result is
a @dfn{generic character} which stands for @var{charset}.  A generic
character is an integer, but it is @emph{not} valid for insertion in the
buffer as a character.  It can be used in @code{char-table-range} to
refer to the whole character set (@pxref{Char-Tables}).
@code{char-valid-p} returns @code{nil} for generic characters.
For example:

@example
(make-char 'latin-iso8859-1)
     @result{} 2176
(char-valid-p 2176)
     @result{} nil
(split-char 2176)
     @result{} (latin-iso8859-1 0)
@end example

@node Scanning Charsets
@section Scanning for Character Sets

Sometimes it is useful to find out which character sets appear in a
part of a buffer or a string.  One use for this is in determining which
coding systems (@pxref{Coding Systems}) are capable of representing all
of the text in question.

@defun find-charset-region beg end &optional translation
@tindex find-charset-region
This function returns a list of the character sets that appear in the
current buffer between positions @var{beg} and @var{end}.

The optional argument @var{translation} specifies a translation table to
be used in scanning the text (@pxref{Translation of Characters}).  If it
is non-@code{nil}, then each character in the region is translated
through this table, and the value returned describes the translated
characters instead of the characters actually in the buffer.

In two peculiar cases, the value includes the symbol @code{unknown}:

@itemize @bullet
@item
When a unibyte buffer contains non-@sc{ASCII} characters.

@item
When a multibyte buffer contains invalid byte-sequences (raw bytes).
@xref{Explicit Encoding}.
@end itemize
@end defun

@defun find-charset-string string &optional translation
@tindex find-charset-string
This function returns a list of the character sets that appear in the
string @var{string}.  It is just like @code{find-charset-region}, except
that it applies to the contents of @var{string} instead of part of the
current buffer.
@end defun
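
The following sketch shows one way to use @code{find-charset-string} to
test whether a string contains only @sc{ASCII} characters.  The function
name @code{my-ascii-only-p} is hypothetical, and treating the empty
string as @sc{ASCII}-only (on the assumption that its list of character
sets is empty) is a choice made for this example.

@example
(defun my-ascii-only-p (string)
  "Return non-nil if STRING has no characters outside `ascii'."
  (let ((charsets (find-charset-string string)))
    ;; An empty list means STRING contains no characters at all.
    (or (null charsets)
        (equal charsets '(ascii)))))

(my-ascii-only-p "hello")
     @result{} t
@end example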

@node Translation of Characters
@section Translation of Characters
@cindex character translation tables
@cindex translation tables

A @dfn{translation table} specifies a mapping of characters
into characters.  These tables are used in encoding and decoding, and
for other purposes.  Some coding systems specify their own particular
translation tables; there are also default translation tables which
apply to all other coding systems.

@defun make-translation-table translations
This function returns a translation table based on the arguments
@var{translations}.  Each argument---each element of
@var{translations}---should be a list of the form @code{(@var{from}
. @var{to})}; this says to translate the character @var{from} into
@var{to}.

You can also map one whole character set into another character set with
the same dimension.  To do this, you specify a generic character (which
designates a character set) for @var{from} (@pxref{Splitting Characters}).
In this case, @var{to} should also be a generic character, for another
character set of the same dimension.  Then the translation table
translates each character of @var{from}'s character set into the
corresponding character of @var{to}'s character set.
@end defun
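
As a minimal sketch, the following builds a table that translates
@samp{a} into @samp{A} and then uses it while scanning a string.  The
variable name @code{my-table} is hypothetical, and the call assumes that
the argument to @code{make-translation-table} is a list of
@code{(@var{from} . @var{to})} pairs.

@example
(defvar my-table
  (make-translation-table '((?a . ?A)))
  "Hypothetical translation table that maps `a' to `A'.")

;; Scan the string as if each `a' had been replaced by `A'.
(find-charset-string "banana" my-table)
@end example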

In decoding, the translation table's translations are applied to the
characters that result from ordinary decoding.  If a coding system has
property @code{character-translation-table-for-decode}, that specifies
the translation table to use.  Otherwise, if
@code{standard-translation-table-for-decode} is non-@code{nil}, decoding
uses that table.

In encoding, the translation table's translations are applied to the
characters in the buffer, and the result of translation is actually
encoded.  If a coding system has property
@code{character-translation-table-for-encode}, that specifies the
translation table to use.  Otherwise the variable
@code{standard-translation-table-for-encode} specifies the translation
table.

@defvar standard-translation-table-for-decode
This is the default translation table for decoding, for
coding systems that don't specify any other translation table.
@end defvar

@defvar standard-translation-table-for-encode
This is the default translation table for encoding, for
coding systems that don't specify any other translation table.
@end defvar

@node Coding Systems
@section Coding Systems

@cindex coding system
When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified
by a particular @dfn{coding system}.

How to define a coding system is an arcane matter, not yet documented.

@menu
* Coding System Basics::
* Encoding and I/O::
* Lisp and Coding Systems::
* User-Chosen Coding Systems::
* Default Coding Systems::
* Specifying Coding Systems::
* Explicit Encoding::
* Terminal I/O Encoding::
* MS-DOS File Types::
@end menu

@node Coding System Basics
@subsection Basic Concepts of Coding Systems

@cindex character code conversion
@dfn{Character code conversion} involves conversion between the encoding
used inside Emacs and some other encoding.  Emacs supports many
different encodings, in that it can convert to and from them.  For
example, it can convert text to or from encodings such as Latin 1, Latin
2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022.  In some
cases, Emacs supports several alternative encodings for the same
characters; for example, there are three coding systems for the Cyrillic
(Russian) alphabet: ISO, Alternativnyj, and KOI8.

Most coding systems specify a particular character code for
conversion, but some of them leave this unspecified---to be chosen
heuristically based on the data.

@cindex end of line conversion
@dfn{End of line conversion} handles three different conventions used
on various systems for representing end of line in files.  The Unix
convention is to use the linefeed character (also called newline).  The
DOS convention is to use the two-character sequence, carriage-return
linefeed, at the end of a line.  The Mac convention is to use just
carriage-return.

@cindex base coding system
@cindex variant coding system
@dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
conversion unspecified, to be chosen based on the data.  @dfn{Variant
coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
@code{latin-1-mac} specify the end-of-line conversion explicitly as
well.  Most base coding systems have three corresponding variants whose
names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
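
The naming convention for variants can be sketched in Lisp as follows.
The function @code{my-coding-system-variants} is hypothetical; it only
constructs the three conventional names and does not check that coding
systems with those names actually exist.

@example
(defun my-coding-system-variants (base)
  "Return the conventional end-of-line variant names for BASE."
  (mapcar (lambda (suffix)
            ;; Append SUFFIX to the base name and make it a symbol.
            (intern (concat (symbol-name base) suffix)))
          '("-unix" "-dos" "-mac")))

(my-coding-system-variants 'latin-1)
     @result{} (latin-1-unix latin-1-dos latin-1-mac)
@end example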

The coding system @code{raw-text} is special in that it prevents
character code conversion, and causes the buffer visited with that
coding system to be a unibyte buffer.  It does not specify the
end-of-line conversion, allowing that to be determined as usual by the
data, and has the usual three variants which specify the end-of-line
conversion.  @code{no-conversion} is equivalent to @code{raw-text-unix}:
it specifies no conversion of either character codes or end-of-line.

The coding system @code{emacs-mule} specifies that the data is
represented in the internal Emacs encoding.  This is like
@code{raw-text} in that no code conversion happens, but different in
that the result is multibyte data.

@defun coding-system-get coding-system property
@tindex coding-system-get
This function returns the specified property of the coding system
@var{coding-system}.  Most coding system properties exist for internal
purposes, but one that you might find useful is @code{mime-charset}.
That property's value is the name used in MIME for the character coding
which this coding system can read and write.  Examples:

@example
(coding-system-get 'iso-latin-1 'mime-charset)
     @result{} iso-8859-1
(coding-system-get 'iso-2022-cn 'mime-charset)
     @result{} iso-2022-cn
(coding-system-get 'cyrillic-koi8 'mime-charset)
     @result{} koi8-r
@end example

The value of the @code{mime-charset} property is also defined
as an alias for the coding system.
@end defun

@node Encoding and I/O
@subsection Encoding and I/O

The principal purpose of coding systems is for use in reading and
writing files.  The function @code{insert-file-contents} uses
a coding system for decoding the file data, and @code{write-region}
uses one to encode the buffer contents.

You can specify the coding system to use either explicitly
(@pxref{Specifying Coding Systems}), or implicitly using the defaulting
mechanism (@pxref{Default Coding Systems}).  But these methods may not
completely specify what to do.  For example, they may choose a coding
system such as @code{undefined} which leaves the character code
conversion to be determined from the data.  In these cases, the I/O
operation finishes the job of choosing a coding system.  Very often
you will want to find out afterwards which coding system was chosen.

@defvar buffer-file-coding-system
@tindex buffer-file-coding-system
This variable records the coding system that was used for visiting the
current buffer. It is used for saving the buffer, and for writing part
of the buffer with @code{write-region}. When those operations ask the
user to specify a different coding system,
@code{buffer-file-coding-system} is updated to the coding system
specified.

However, @code{buffer-file-coding-system} does not affect sending text
to a subprocess.
@end defvar
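
As an illustration, here is a minimal sketch of a command that reports
the coding system recorded for the current buffer. The name
@code{my-report-buffer-coding} is hypothetical and is not part of
Emacs.

@example
;; Hypothetical command, not part of Emacs.
(defun my-report-buffer-coding ()
  "Display the coding system recorded for the current buffer."
  (interactive)
  (message "Coding system recorded for this buffer: %s"
           buffer-file-coding-system))
@end example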

@defvar save-buffer-coding-system
@tindex save-buffer-coding-system
This variable specifies the coding system for saving the buffer---but it
is not used for @code{write-region}. When saving the buffer asks the
user to specify a different coding system, and
@code{save-buffer-coding-system} was used, then it is updated to the
coding system that was specified.
@end defvar

@defvar last-coding-system-used
@tindex last-coding-system-used
I/O operations for files and subprocesses set this variable to the
coding system name that was used. The explicit encoding and decoding
functions (@pxref{Explicit Encoding}) set it too.

@strong{Warning:} Since receiving subprocess output sets this variable,
it can change whenever Emacs waits; therefore, you should copy the
value shortly after the function call which stores the value you are
interested in.
@end defvar
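
The following sketch shows one way to follow that advice: it copies
the value of @code{last-coding-system-used} immediately after the call
that sets it. The function name @code{my-insert-file-noting-coding} is
hypothetical.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-insert-file-noting-coding (file)
  "Insert FILE at point; return the coding system used to decode it."
  (insert-file-contents file)
  ;; Copy the value at once, before Emacs can wait for
  ;; subprocess output and overwrite it.
  (let ((coding last-coding-system-used))
    coding))
@end example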

The variable @code{selection-coding-system} specifies how to encode
selections for the window system. @xref{Window System Selections}.

@node Lisp and Coding Systems
@subsection Coding Systems in Lisp

Here are Lisp facilities for working with coding systems:

@defun coding-system-list &optional base-only
@tindex coding-system-list
This function returns a list of all coding system names (symbols). If
@var{base-only} is non-@code{nil}, the value includes only the
base coding systems. Otherwise, it includes variant coding systems as well.
@end defun

@defun coding-system-p object
@tindex coding-system-p
This function returns @code{t} if @var{object} is a coding system
name.
@end defun

@defun check-coding-system coding-system
@tindex check-coding-system
This function checks the validity of @var{coding-system}.
If that is valid, it returns @var{coding-system}.
Otherwise it signals an error with condition @code{coding-system-error}.
@end defun
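
For instance, a program that receives a coding system name from an
untrusted source might validate it as sketched below. The function
name @code{my-coding-system-or-default} is hypothetical, and the
example assumes that no coding system named
@code{no-such-coding-system} is defined.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-coding-system-or-default (name default)
  "Return NAME if it is a valid coding system, else DEFAULT."
  (condition-case nil
      (check-coding-system name)
    (coding-system-error default)))

(my-coding-system-or-default 'no-such-coding-system 'undecided)
     @result{} undecided
@end example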

@defun coding-system-change-eol-conversion coding-system eol-type
@tindex coding-system-change-eol-conversion
This function returns a coding system which is like @var{coding-system}
except for its end-of-line conversion, which is specified by
@var{eol-type}. @var{eol-type} should be @code{unix}, @code{dos},
@code{mac}, or @code{nil}. If it is @code{nil}, the returned coding
system determines the end-of-line conversion from the data.
@end defun

@defun coding-system-change-text-conversion eol-coding text-coding
@tindex coding-system-change-text-conversion
This function returns a coding system which uses the end-of-line
conversion of @var{eol-coding}, and the text conversion of
@var{text-coding}. If @var{text-coding} is @code{nil}, it returns
@code{undecided}, or one of its variants according to @var{eol-coding}.
@end defun
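
Here is a short sketch of how these two functions combine the parts
of coding systems. It assumes that the coding systems
@code{iso-latin-1} and @code{undecided-dos} are defined, as they are
in a standard Emacs.

@example
;; Same text conversion as iso-latin-1, with the end-of-line
;; conversion fixed to the Unix convention.
(coding-system-change-eol-conversion 'iso-latin-1 'unix)
     @result{} iso-latin-1-unix

;; End-of-line conversion taken from undecided-dos, text
;; conversion taken from iso-latin-1.
(coding-system-change-text-conversion 'undecided-dos 'iso-latin-1)
     @result{} iso-latin-1-dos
@end example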

@defun find-coding-systems-region from to
@tindex find-coding-systems-region
This function returns a list of coding systems that could be used to
encode the text between @var{from} and @var{to}. All coding systems in
the list can safely encode any multibyte characters in that portion of
the text.

If the text contains no multibyte characters, the function returns the
list @code{(undecided)}.
@end defun

@defun find-coding-systems-string string
@tindex find-coding-systems-string
This function returns a list of coding systems that could be used to
encode the text of @var{string}. All coding systems in the list can
safely encode any multibyte characters in @var{string}. If the text
contains no multibyte characters, this returns the list
@code{(undecided)}.
@end defun
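
As a sketch of how the result might be used, the following predicate
tests whether a given coding system appears in the list of safe ones.
The name @code{my-string-encodable-p} is hypothetical. The sketch also
relies on two assumptions not stated above: that the returned list
contains base coding systems, and that the function
@code{coding-system-base} is available to map a variant to its base.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-string-encodable-p (string coding-system)
  "Return non-nil if CODING-SYSTEM appears able to encode STRING."
  (let ((safe (find-coding-systems-string string)))
    ;; The list (undecided) means STRING has no multibyte
    ;; characters, so any coding system will do.
    (or (equal safe '(undecided))
        (memq (coding-system-base coding-system) safe))))
@end example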

@defun find-coding-systems-for-charsets charsets
@tindex find-coding-systems-for-charsets
This function returns a list of coding systems that could be used to
encode all the character sets in the list @var{charsets}.
@end defun

@defun detect-coding-region start end &optional highest
@tindex detect-coding-region
This function chooses a plausible coding system for decoding the text
from @var{start} to @var{end}. This text should be ``raw bytes''
(@pxref{Explicit Encoding}).

Normally this function returns a list of coding systems that could
handle decoding the text that was scanned. They are listed in order of
decreasing priority. But if @var{highest} is non-@code{nil}, then the
return value is just one coding system, the one that is highest in
priority.

If the region contains only @sc{ASCII} characters, the value
is @code{undecided} or @code{(undecided)}.
@end defun

@defun detect-coding-string string &optional highest
@tindex detect-coding-string
This function is like @code{detect-coding-region} except that it
operates on the contents of @var{string} instead of bytes in the buffer.
@end defun
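
The difference between the two forms of the return value can be seen
in a small sketch such as the following. The function name
@code{my-guess-coding} is hypothetical.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-guess-coding (bytes)
  "Return the highest-priority coding system for decoding BYTES."
  ;; A non-nil second argument asks for one coding system
  ;; rather than a list of candidates.
  (detect-coding-string bytes t))

;; Without the second argument, the value is a list.
(detect-coding-string "plain text")
@end example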

@xref{Process Information}, for how to examine or set the coding
systems used for I/O to a subprocess.

@node User-Chosen Coding Systems
@subsection User-Chosen Coding Systems

@defun select-safe-coding-system from to &optional preferred-coding-system
@tindex select-safe-coding-system
This function selects a coding system for encoding the text between
@var{from} and @var{to}, asking the user to choose if necessary.

The optional argument @var{preferred-coding-system} specifies a coding
system to try first. If that one can handle the text in the specified
region, then it is used. If this argument is omitted, the current
buffer's value of @code{buffer-file-coding-system} is tried first.

If the region contains some multibyte characters that the preferred
coding system cannot encode, this function asks the user to choose from
a list of coding systems which can encode the text, and returns the
user's choice.

One other kludgy feature: if @var{from} is a string, the string is the
target text, and @var{to} is ignored.
@end defun
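
A program that writes part of a buffer might use this function as in
the following sketch, which binds @code{coding-system-for-write}
(@pxref{Specifying Coding Systems}) to the selected coding system for
the duration of one write. The function name
@code{my-write-region-safely} is hypothetical.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-write-region-safely (from to file)
  "Write the text between FROM and TO to FILE in a safe coding system."
  ;; The user is asked to choose only if the preferred coding
  ;; system cannot encode the text.
  (let ((coding-system-for-write
         (select-safe-coding-system from to)))
    (write-region from to file)))
@end example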

Here are two functions you can use to let the user specify a coding
system, with completion. @xref{Completion}.

@defun read-coding-system prompt &optional default
@tindex read-coding-system
This function reads a coding system using the minibuffer, prompting with
string @var{prompt}, and returns the coding system name as a symbol. If
the user enters null input, @var{default} specifies which coding system
to return. It should be a symbol or a string.
@end defun

@defun read-non-nil-coding-system prompt
@tindex read-non-nil-coding-system
This function reads a coding system using the minibuffer, prompting with
string @var{prompt}, and returns the coding system name as a symbol. If
the user tries to enter null input, it asks the user to try again.
@xref{Coding Systems}.
@end defun
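
These functions are typically called from an @code{interactive}
specification. The following is a simplified sketch of such a command;
the name @code{my-set-saving-coding} is hypothetical, and the command
does less checking than a real one would.

@example
;; Hypothetical command, not part of Emacs.
(defun my-set-saving-coding (coding)
  "Record CODING as the coding system for saving the current buffer."
  (interactive
   (list (read-non-nil-coding-system "Coding system for saving: ")))
  (setq buffer-file-coding-system coding))
@end example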

@node Default Coding Systems
@subsection Default Coding Systems

This section describes variables that specify the default coding
system for certain files or when running certain subprograms, and the
function that I/O operations use to access them.

The idea of these variables is that you set them once and for all to the
defaults you want, and then do not change them again. To specify a
particular coding system for a particular operation in a Lisp program,
don't change these variables; instead, override them using
@code{coding-system-for-read} and @code{coding-system-for-write}
(@pxref{Specifying Coding Systems}).
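
For example, a program that must read one particular file as Latin-1
could bind @code{coding-system-for-read} around the operation, as in
this sketch. The function name @code{my-insert-latin-1-file} is
hypothetical, and the coding system @code{iso-latin-1} is assumed to
be defined.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-insert-latin-1-file (file)
  "Insert FILE at point, decoding it as iso-latin-1."
  ;; The binding lasts only for this one operation; the default
  ;; variables described in this section are left untouched.
  (let ((coding-system-for-read 'iso-latin-1))
    (insert-file-contents file)))
@end example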

@defvar file-coding-system-alist
@tindex file-coding-system-alist
This variable is an alist that specifies the coding systems to use for
reading and writing particular files. Each element has the form
@code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
expression that matches certain file names. The element applies to file
names that match @var{pattern}.

The @sc{cdr} of the element, @var{coding}, should be either a coding
system, a cons cell containing two coding systems, or a function symbol.
If @var{coding} is a coding system, that coding system is used for both
reading the file and writing it. If @var{coding} is a cons cell containing
two coding systems, its @sc{car} specifies the coding system for
decoding, and its @sc{cdr} specifies the coding system for encoding.

If @var{coding} is a function symbol, the function must return a coding
system or a cons cell containing two coding systems. This value is used
as described above.
@end defvar
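
The two non-function forms of an element might be set up in an init
file as sketched below. The file name patterns and the choice of
coding systems are only illustrations.

@example
;; Read and write files named *.txt as iso-latin-1.
(setq file-coding-system-alist
      (cons '("\\.txt\\'" . iso-latin-1)
            file-coding-system-alist))

;; For *.log files, detect the coding system when reading,
;; but always write iso-latin-1 with Unix line endings.
(setq file-coding-system-alist
      (cons '("\\.log\\'" . (undecided . iso-latin-1-unix))
            file-coding-system-alist))
@end example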

@defvar process-coding-system-alist
@tindex process-coding-system-alist
This variable is an alist specifying which coding systems to use for a
subprocess, depending on which program is running in the subprocess. It
works like @code{file-coding-system-alist}, except that @var{pattern} is
matched against the program name used to start the subprocess. The coding
system or systems specified in this alist are used to initialize the
coding systems used for I/O to the subprocess, but you can specify
other coding systems later using @code{set-process-coding-system}.
@end defvar

@strong{Warning:} Coding systems such as @code{undecided} which
determine the coding system from the data do not work entirely reliably
with asynchronous subprocess output. This is because Emacs handles
asynchronous subprocess output in batches, as it arrives. If the coding
system leaves the character code conversion unspecified, or leaves the
end-of-line conversion unspecified, Emacs must try to detect the proper
conversion from one batch at a time, and this does not always work.

Therefore, with an asynchronous subprocess, if at all possible, use a
coding system which determines both the character code conversion and
the end-of-line conversion---that is, one like @code{latin-1-unix},
rather than @code{undecided} or @code{latin-1}.
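
Following that advice, an element of
@code{process-coding-system-alist} might name fully specified coding
systems for both directions, as in this sketch. The program name
@code{grep} is only an illustration, and the coding system
@code{iso-latin-1-unix} is assumed to be defined.

@example
;; For programs whose name matches "grep", decode output and
;; encode input with a coding system that fixes both the
;; character code conversion and the end-of-line conversion.
(setq process-coding-system-alist
      (cons '("grep" . (iso-latin-1-unix . iso-latin-1-unix))
            process-coding-system-alist))
@end example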
825 |
826 @defvar network-coding-system-alist |
827 @tindex network-coding-system-alist |
828 This variable is an alist that specifies the coding system to use for |
829 network streams. It works much like @code{file-coding-system-alist}, |
830 with the difference that the @var{pattern} in an element may be either a |
831 port number or a regular expression. If it is a regular expression, it |
832 is matched against the network service name used to open the network |
833 stream. |
834 @end defvar |
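
For illustration only, here is one way an element might be added to
this alist. The service name @samp{nntp} and the choice of
@code{latin-1-unix} are assumptions made for the example, not defaults;
the sketch also assumes that, as in @code{file-coding-system-alist}, the
value part of an element may be a cons cell of a decoding system and an
encoding system.

@example
;; @r{Use Latin-1 with LF line endings for both decoding and}
;; @r{encoding on streams opened for the "nntp" service.}
(setq network-coding-system-alist
      (cons '("\\`nntp\\'" . (latin-1-unix . latin-1-unix))
            network-coding-system-alist))
@end example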
835 |
836 @defvar default-process-coding-system |
837 @tindex default-process-coding-system |
838 This variable specifies the coding systems to use for subprocess (and |
839 network stream) input and output, when nothing else specifies what to |
840 do. |
841 |
842 The value should be a cons cell of the form @code{(@var{input-coding} |
843 . @var{output-coding})}. Here @var{input-coding} applies to input from |
844 the subprocess, and @var{output-coding} applies to output to it. |
845 @end defvar |
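
A sketch of a value in this form follows; the particular coding systems
are only an example, not a recommendation.

@example
;; @r{Decode input from subprocesses, and encode output to them,}
;; @r{as Latin-1 with LF line endings when nothing else applies.}
(setq default-process-coding-system '(latin-1-unix . latin-1-unix))
@end example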
846 |
21006 | 847 @defun find-operation-coding-system operation &rest arguments |
22138 | 848 @tindex find-operation-coding-system |
21006 | 849 This function returns the coding system to use (by default) for |
850 performing @var{operation} with @var{arguments}. The value has this | |
851 form: | |
852 | |
853 @example | |
854 (@var{decoding-system} . @var{encoding-system}) | |
855 @end example | |
856 | |
857 The first element, @var{decoding-system}, is the coding system to use | |
858 for decoding (in case @var{operation} does decoding), and | |
859 @var{encoding-system} is the coding system for encoding (in case | |
860 @var{operation} does encoding). | |
861 | |
862 The argument @var{operation} should be an Emacs I/O primitive: | |
863 @code{insert-file-contents}, @code{write-region}, @code{call-process}, | |
864 @code{call-process-region}, @code{start-process}, or | |
865 @code{open-network-stream}. | |
866 | |
867 The remaining arguments should be the same arguments that might be given | |
868 to that I/O primitive. Depending on which primitive, one of those | |
869 arguments is selected as the @dfn{target}. For example, if | |
870 @var{operation} does file I/O, whichever argument specifies the file | |
871 name is the target. For subprocess primitives, the process name is the | |
872 target. For @code{open-network-stream}, the target is the service name | |
873 or port number. | |
874 | |
875 This function looks up the target in @code{file-coding-system-alist}, | |
876 @code{process-coding-system-alist}, or | |
877 @code{network-coding-system-alist}, depending on @var{operation}. | |
878 @xref{Default Coding Systems}. | |
879 @end defun | |
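
The calls below sketch how this function might be used. The file name,
host name, and service name are hypothetical, and the values returned
depend on the current contents of the alists, so none are shown.

@example
;; @r{The target is the file name.}
(find-operation-coding-system 'insert-file-contents "notes.txt")

;; @r{The arguments mirror those of open-network-stream:}
;; @r{name, buffer, host, service.  The target is the service.}
(find-operation-coding-system
 'open-network-stream "news" nil "news.example.org" "nntp")
@end example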
880 | |
881 @node Specifying Coding Systems | |
22138 | 882 @subsection Specifying a Coding System for One Operation |
21006 | 883 |
884 You can specify the coding system for a specific operation by binding | |
885 the variables @code{coding-system-for-read} and/or | |
886 @code{coding-system-for-write}. | |
887 | |
22138 | 888 @defvar coding-system-for-read |
21006 | 889 @tindex coding-system-for-read |
890 If this variable is non-@code{nil}, it specifies the coding system to | |
891 use for reading a file, or for input from a synchronous subprocess. | |
892 | |
893 It also applies to any asynchronous subprocess or network stream, but in | |
894 a different way: the value of @code{coding-system-for-read} when you | |
895 start the subprocess or open the network stream specifies the input | |
896 decoding method for that subprocess or network stream. It remains in | |
897 use for that subprocess or network stream unless and until overridden. | |
898 | |
899 The right way to use this variable is to bind it with @code{let} for a | |
900 specific I/O operation. Its global value is normally @code{nil}, and | |
901 you should not globally set it to any other value. Here is an example | |
902 of the right way to use the variable: | |
903 | |
904 @example | |
905 ;; @r{Read the file with no character code conversion.} | |
21682 | 906 ;; @r{Assume @sc{crlf} represents end-of-line.} |
21006 | 907 (let ((coding-system-for-read 'emacs-mule-dos)) |
908 (insert-file-contents filename)) | |
909 @end example | |
910 | |
911 When its value is non-@code{nil}, @code{coding-system-for-read} takes | |
22138 | 912 precedence over all other methods of specifying a coding system to use for |
21006 | 913 input, including @code{file-coding-system-alist}, |
914 @code{process-coding-system-alist} and | |
915 @code{network-coding-system-alist}. | |
916 @end defvar | |
917 | |
22138 | 918 @defvar coding-system-for-write |
21006 | 919 @tindex coding-system-for-write |
920 This works much like @code{coding-system-for-read}, except that it | |
921 applies to output rather than input. It affects writing to files, | |
24951 | 922 as well as sending output to subprocesses and network connections. |
21006 | 923 |
924 When a single operation does both input and output, as do | |
925 @code{call-process-region} and @code{start-process}, both | |
926 @code{coding-system-for-read} and @code{coding-system-for-write} | |
927 affect it. | |
928 @end defvar | |
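
For example, this sketch filters the accessible portion of the buffer
through an external program, binding both variables for the one call.
The program @samp{sort} and the coding system are assumptions made for
illustration.

@example
(let ((coding-system-for-read 'latin-1-unix)    ; @r{decodes the output}
      (coding-system-for-write 'latin-1-unix))  ; @r{encodes the input}
  ;; @r{Send the text to the program, delete it from the buffer,}
  ;; @r{and insert the program's output in its place.}
  (call-process-region (point-min) (point-max) "sort" t t))
@end example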
929 | |
22138 | 930 @defvar inhibit-eol-conversion |
21006 | 931 @tindex inhibit-eol-conversion |
932 When this variable is non-@code{nil}, no end-of-line conversion is done, | |
933 no matter which coding system is specified. This applies to all the | |
934 Emacs I/O and subprocess primitives, and to the explicit encoding and | |
935 decoding functions (@pxref{Explicit Encoding}). | |
936 @end defvar | |
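
A minimal sketch, assuming @code{filename} holds the name of an
existing file:

@example
;; @r{Whatever coding system is chosen for the file, leave its}
;; @r{line endings exactly as they appear on disk.}
(let ((inhibit-eol-conversion t))
  (insert-file-contents filename))
@end example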
937 | |
938 @node Explicit Encoding | |
22138 | 939 @subsection Explicit Encoding and Decoding |
21006 | 940 @cindex encoding text |
941 @cindex decoding text | |
942 | |
943 All the operations that transfer text in and out of Emacs have the | |
944 ability to use a coding system to encode or decode the text. | |
945 You can also explicitly encode and decode text using the functions | |
946 in this section. | |
947 | |
948 @cindex raw bytes | |
949 The result of encoding, and the input to decoding, are not ordinary | |
950 text. They are ``raw bytes''---bytes that represent text in the same | |
951 way that an external file would. When a buffer contains raw bytes, it | |
952 is most natural to mark that buffer as using unibyte representation, | |
953 using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}), | |
21682 | 954 but this is not required. If the buffer's contents are only temporarily |
955 raw, leave the buffer multibyte, which will be correct after you decode |
956 them. |
21006 | 957 |
958 The usual way to get raw bytes in a buffer, for explicit decoding, is | |
21682 | 959 to read them from a file with @code{insert-file-contents-literally} |
21006 | 960 (@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile} |
21682 | 961 argument when visiting a file with @code{find-file-noselect}. |
21006 | 962 |
963 The usual way to use the raw bytes that result from explicitly | |
964 encoding text is to copy them to a file or process---for example, to | |
21682 | 965 write them with @code{write-region} (@pxref{Writing to Files}), and |
21006 | 966 suppress encoding for that @code{write-region} call by binding |
967 @code{coding-system-for-write} to @code{no-conversion}. | |
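
The two steps can be sketched together as follows. The file names are
hypothetical, and @code{with-temp-buffer} is used only to keep the
example self-contained.

@example
(with-temp-buffer
  ;; @r{Get raw bytes: insert the file with no decoding at all.}
  (insert-file-contents-literally "input.bin")
  ;; @r{Write the bytes back out, suppressing any encoding.}
  (let ((coding-system-for-write 'no-conversion))
    (write-region (point-min) (point-max) "output.bin")))
@end example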
968 | |
24951 | 969 Raw bytes typically contain stray individual bytes with values in the |
970 range 128 through 255 that are legitimate only as part of multibyte |
971 sequences. Even if the buffer is multibyte, Emacs treats each such |
972 individual byte as a character and uses the byte value as its character |
973 code. In this way, character codes 128 through 255 can be found in a |
974 multibyte buffer, even though they are not legitimate multibyte |
975 character codes. |
976 |
22252 | 977 Raw bytes sometimes contain overlong byte sequences that look like a |
24951 | 978 proper multibyte character plus superfluous trailing codes. For |
979 most purposes, Emacs treats such a sequence in a buffer or string as a |
980 single character, and if you look at its character code, you get the |
981 value that corresponds to the multibyte character |
982 sequence---disregarding the extra trailing codes. This is not quite |
983 clean, but raw bytes are used only in limited ways, so as a practical |
984 matter it is not worth the trouble to treat this case differently. |
985 |
986 When a multibyte buffer contains illegitimate byte sequences, |
24952 | 987 sometimes insertion or deletion can cause them to coalesce into a |
24951 | 988 legitimate multibyte character. For example, suppose the buffer |
989 contains the sequence 129 68 192, 68 being the character @samp{D}. If |
990 you delete the @samp{D}, the bytes 129 and 192 become adjacent, and thus |
991 become one multibyte character (Latin-1 A with grave accent). Point |
992 moves to one side or the other of the character, since it cannot be |
993 within a character. Don't be alarmed by this. |
994 |
995 Some really peculiar situations prevent proper coalescence. For |
996 example, if you narrow the buffer so that the accessible portion begins |
997 just before the @samp{D}, then delete the @samp{D}, the two surrounding |
998 bytes cannot coalesce because one of them is outside the accessible |
999 portion of the buffer. In this case, the deletion cannot be done, so |
1000 @code{delete-region} signals an error. |
1001 |
1002 Here are the functions to perform explicit encoding or decoding. The |
1003 encoding functions produce ``raw bytes''; the decoding functions are |
1004 meant to operate on ``raw bytes''. All of these functions discard text |
1005 properties. |
22252 | 1006 |
22138 | 1007 @defun encode-coding-region start end coding-system |
21006 | 1008 @tindex encode-coding-region |
1009 This function encodes the text from @var{start} to @var{end} according | |
21682 | 1010 to coding system @var{coding-system}. The encoded text replaces the |
1011 original text in the buffer. The result of encoding is ``raw bytes,'' |
1012 but the buffer remains multibyte if it was multibyte before. |
21006 | 1013 @end defun |
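
A sketch of encoding in place; the sample text and the choice of
@code{latin-1} are assumptions made for illustration.

@example
(with-temp-buffer
  (insert "some text")
  ;; @r{Replace the buffer text with its Latin-1 encoding.}
  (encode-coding-region (point-min) (point-max) 'latin-1)
  ;; @r{The buffer now holds raw bytes; return them as a string.}
  (buffer-string))
@end example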
1014 | |
22138 | 1015 @defun encode-coding-string string coding-system |
21006 | 1016 @tindex encode-coding-string |
1017 This function encodes the text in @var{string} according to coding | |
1018 system @var{coding-system}. It returns a new string containing the | |
21682 | 1019 encoded text. The result of encoding is a unibyte string of ``raw bytes.'' |
21006 | 1020 @end defun |
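
For instance, with a sample string and coding system chosen only for
illustration:

@example
;; @r{Returns a new string of raw bytes; the argument is unchanged.}
(encode-coding-string "some text" 'latin-1-unix)
@end example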
1021 | |
22138 | 1022 @defun decode-coding-region start end coding-system |
21006 | 1023 @tindex decode-coding-region |
1024 This function decodes the text from @var{start} to @var{end} according | |
1025 to coding system @var{coding-system}. The decoded text replaces the | |
1026 original text in the buffer. To make explicit decoding useful, the text | |
1027 before decoding ought to be ``raw bytes.'' | |
1028 @end defun | |
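
A sketch of the usual decoding workflow, assuming a hypothetical file
@file{latin1.txt} whose contents are known to be Latin-1 text:

@example
(with-temp-buffer
  ;; @r{Insert the file's bytes without decoding them.}
  (insert-file-contents-literally "latin1.txt")
  ;; @r{Decode the raw bytes in place as Latin-1 text.}
  (decode-coding-region (point-min) (point-max) 'latin-1)
  (buffer-string))
@end example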
1029 | |
22138 | 1030 @defun decode-coding-string string coding-system |
21006 | 1031 @tindex decode-coding-string |
1032 This function decodes the text in @var{string} according to coding | |
1033 system @var{coding-system}. It returns a new string containing the | |
1034 decoded text. To make explicit decoding useful, the contents of | |
1035 @var{string} ought to be ``raw bytes.'' | |
1036 @end defun | |
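
As a sketch, the string functions can be paired for a round trip; the
sample text and coding system are assumptions made for illustration.

@example
;; @r{Encode a string, then decode the resulting raw bytes with}
;; @r{the same coding system.}
(let ((raw (encode-coding-string "some text" 'latin-1-unix)))
  (decode-coding-string raw 'latin-1-unix))
@end example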
21682 | 1037 |
22138 | 1038 @node Terminal I/O Encoding |
1039 @subsection Terminal I/O Encoding |
1040 |
1041 Emacs can decode keyboard input using a coding system, and encode |
23110 | 1042 terminal output. This is useful for terminals that transmit or display |
1043 text using a particular encoding such as Latin-1. Emacs does not set |
1044 @code{last-coding-system-used} for encoding or decoding for the |
1045 terminal. |
22138 | 1046 |
1047 @defun keyboard-coding-system |
1048 @tindex keyboard-coding-system |
1049 This function returns the coding system that is in use for decoding |
1050 keyboard input---or @code{nil} if no coding system is to be used. |
1051 @end defun |
1052 |
1053 @defun set-keyboard-coding-system coding-system |
1054 @tindex set-keyboard-coding-system |
1055 This function specifies @var{coding-system} as the coding system to |
1056 use for decoding keyboard input. If @var{coding-system} is @code{nil}, |
1057 that means do not decode keyboard input. |
1058 @end defun |
1059 |
1060 @defun terminal-coding-system |
1061 @tindex terminal-coding-system |
1062 This function returns the coding system that is in use for encoding |
1063 terminal output---or @code{nil} for no encoding. |
1064 @end defun |
1065 |
1066 @defun set-terminal-coding-system coding-system |
1067 @tindex set-terminal-coding-system |
1068 This function specifies @var{coding-system} as the coding system to use |
1069 for encoding terminal output. If @var{coding-system} is @code{nil}, |
1070 that means do not encode terminal output. |
1071 @end defun |
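
The four functions above might be used together as in this sketch,
which assumes a text terminal that sends and displays Latin-1. The
return values depend on the session, so none are shown.

@example
;; @r{Treat the terminal as a Latin-1 device in both directions.}
(set-keyboard-coding-system 'latin-1)
(set-terminal-coding-system 'latin-1)

;; @r{Check the coding systems now in use.}
(keyboard-coding-system)
(terminal-coding-system)
@end example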
1072 |
21682 | 1073 @node MS-DOS File Types |
22138 | 1074 @subsection MS-DOS File Types |
21682 | 1075 @cindex DOS file types |
1076 @cindex MS-DOS file types |
1077 @cindex Windows file types |
1078 @cindex file types on MS-DOS and Windows |
1079 @cindex text files and binary files |
1080 @cindex binary files and text files |
1081 |
1082 Emacs on MS-DOS and on MS-Windows recognizes certain file names as |
22138 | 1083 text files or binary files. By ``binary file'' we mean a file of |
1084 literal byte values that are not necessarily meant to be characters. |
1085 Emacs does no end-of-line conversion and no character code conversion |
1086 for a binary file. Meanwhile, when you create a new file that is |
1087 marked by its name as a ``text file'', Emacs uses DOS end-of-line |
1088 conversion. |
21682 | 1089 |
1090 @defvar buffer-file-type |
1091 This variable, automatically buffer-local in each buffer, records the |
22138 | 1092 file type of the buffer's visited file. When a buffer does not specify |
1093 a coding system with @code{buffer-file-coding-system}, this variable is |
1094 used to determine which coding system to use when writing the contents |
1095 of the buffer. It should be @code{nil} for text, @code{t} for binary. |
1096 If it is @code{t}, the coding system is @code{no-conversion}. |
1097 Otherwise, @code{undecided-dos} is used. |
1098 |
1099 Normally this variable is set by visiting a file; it is set to |
1100 @code{nil} if the file was visited without any actual conversion. |
21682 | 1101 @end defvar |
1102 |
1103 @defopt file-name-buffer-file-type-alist |
1104 This variable holds an alist for recognizing text and binary files. |
1105 Each element has the form @code{(@var{regexp} . @var{type})}, where |
1106 @var{regexp} is matched against the file name, and @var{type} may be |
1107 @code{nil} for text, @code{t} for binary, or a function to call to |
1108 compute which. If it is a function, then it is called with a single |
1109 argument (the file name) and should return @code{t} or @code{nil}. |
1110 |
1111 When running on MS-DOS or MS-Windows, Emacs checks this alist to decide |
1112 which coding system to use when reading a file. For a text file, |
1113 @code{undecided-dos} is used. For a binary file, @code{no-conversion} |
1114 is used. |
1115 |
1116 If no element in this alist matches a given file name, then |
1117 @code{default-buffer-file-type} says how to treat the file. |
1118 @end defopt |
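
As a sketch, an element could be added like this, assuming Emacs is
running on MS-DOS or MS-Windows, where this variable is defined. The
@samp{.dat} extension is a hypothetical choice for the example.

@example
;; @r{Treat files whose names end in ".dat" as binary.}
(setq file-name-buffer-file-type-alist
      (cons '("\\.dat\\'" . t)
            file-name-buffer-file-type-alist))
@end example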

@defopt default-buffer-file-type
This variable says how to handle files for which
@code{file-name-buffer-file-type-alist} says nothing about the type.

If this variable is non-@code{nil}, then these files are treated as
binary: the coding system @code{no-conversion} is used.  Otherwise,
nothing special is done for them---the coding system is deduced solely
from the file contents, in the usual Emacs fashion.
@end defopt
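
As a rough sketch, the two options above might be set together as
follows on MS-DOS or MS-Windows.  It assumes that each element of
@code{file-name-buffer-file-type-alist} has the form
@code{(@var{regexp} . @var{type})}, with a @var{type} of @code{t}
meaning binary; the @file{.dat} suffix is purely hypothetical.

@example
;; Assumption: elements look like (REGEXP . TYPE), and a TYPE of t
;; marks matching files as binary.  ".dat" is a made-up suffix.
(setq file-name-buffer-file-type-alist
      (cons '("\\.dat\\'" . t)
            file-name-buffer-file-type-alist))

;; Files that match no element are not forced to binary; their
;; coding system is deduced from the file contents.
(setq default-buffer-file-type nil)
@end example

With these settings, a file matching the new element would be read
with @code{no-conversion}, while a file matching no element would go
through the usual detection.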

@node Input Methods
@section Input Methods
@cindex input methods

@dfn{Input methods} provide convenient ways of entering non-@sc{ASCII}
characters from the keyboard.  Unlike coding systems, which translate
non-@sc{ASCII} characters to and from encodings meant to be read by
programs, input methods provide human-friendly commands.  (@xref{Input
Methods,,, emacs, The GNU Emacs Manual}, for information on how users
use input methods to enter text.)  How to define input methods is not
yet documented in this manual, but here we describe how to use them.

Each input method has a name, which is currently a string;
in the future, symbols may also be usable as input method names.

@tindex current-input-method
@defvar current-input-method
This variable holds the name of the input method now active in the
current buffer.  (It automatically becomes local in each buffer when set
in any fashion.)  It is @code{nil} if no input method is currently
active in the buffer.
@end defvar

@tindex default-input-method
@defvar default-input-method
This variable holds the default input method for commands that choose an
input method.  Unlike @code{current-input-method}, this variable is
normally global.
@end defvar

@tindex set-input-method
@defun set-input-method input-method
This function activates input method @var{input-method} for the current
buffer.  It also sets @code{default-input-method} to @var{input-method}.
If @var{input-method} is @code{nil}, this function deactivates any input
method for the current buffer.
@end defun
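
The following sketch shows one way to turn an input method on and off
from Lisp.  It assumes that an input method named
@code{"latin-1-prefix"} is available in your Emacs; substitute any name
that is defined on your system.

@example
(with-temp-buffer
  ;; set-input-method also sets default-input-method, so bind
  ;; that variable here to leave the global default untouched.
  (let ((default-input-method default-input-method))
    ;; Activate the method in this temporary buffer only.
    (set-input-method "latin-1-prefix")
    ;; Return the active method's name, a string.
    (prog1 current-input-method
      ;; Passing nil deactivates the input method again.
      (set-input-method nil))))
@end example

Binding @code{default-input-method} with @code{let} is a design choice
for this example only; a command meant to change the user's default
would simply call @code{set-input-method} directly.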

@tindex read-input-method-name
@defun read-input-method-name prompt &optional default inhibit-null
This function reads an input method name with the minibuffer, prompting
with @var{prompt}.  If @var{default} is non-@code{nil}, it is returned
when the user enters empty input.  However, if
@var{inhibit-null} is non-@code{nil}, empty input signals an error.

The returned value is a string.
@end defun
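
As an illustration, a command along the following lines could combine
@code{read-input-method-name} with @code{set-input-method}.  This is
only a sketch; the command name @code{my-select-input-method} and the
prompt text are made up for the example.

@example
(defun my-select-input-method ()
  "Read an input method name and activate it in the current buffer."
  (interactive)
  ;; With INHIBIT-NULL non-nil, empty input signals an error,
  ;; so NAME is always a string here.
  (let ((name (read-input-method-name "Input method: " nil t)))
    (set-input-method name)))
@end example

Passing @code{nil} for @var{default} and @code{t} for
@var{inhibit-null} keeps the example simple: the user must type a name
explicitly.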

@tindex input-method-alist
@defvar input-method-alist
This variable defines all the supported input methods.
Each element defines one input method, and should have the form:

@example
(@var{input-method} @var{language-env} @var{activate-func}
 @var{title} @var{description} @var{args}...)
@end example

Here @var{input-method} is the input method name, a string;
@var{language-env} is another string, the name of the language
environment this input method is recommended for.  (This serves only
for documentation purposes.)

@var{title} is a string to display in the mode line while this method is
active.  @var{description} is a string describing this method and what
it is good for.

@var{activate-func} is a function to call to activate this method.  The
@var{args}, if any, are passed as arguments to @var{activate-func}.  In
all, @var{activate-func} is called with @var{input-method} followed by
the @var{args}.
@end defvar
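
To make the element layout concrete, here is a small sketch that looks
up one input method in @code{input-method-alist} and picks out some of
its fields by position.  The name @code{"latin-1-prefix"} is an
assumption; the code returns @code{nil} if no such method is defined.

@example
(let ((entry (assoc "latin-1-prefix" input-method-alist)))
  (and entry
       (list (nth 0 entry)      ; input-method (the name)
             (nth 1 entry)      ; language-env
             (nth 3 entry)      ; title
             (nth 4 entry))))   ; description
@end example

In the same positional terms, @code{(nth 2 entry)} is
@var{activate-func} and @code{(nthcdr 5 entry)} is the list of
@var{args}.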

The fundamental interface to input methods is through the
variable @code{input-method-function}.  @xref{Reading One Event}.