comparison lispref/nonascii.texi @ 88155:d7ddb3e565de
sync with trunk
author | Henrik Enberg <henrik.enberg@telia.com> |
---|---|
date | Mon, 16 Jan 2006 00:03:54 +0000 |
parents | 23a1cea22d13 |
children |
88154:8ce476d3ba36 | 88155:d7ddb3e565de |
---|---|
1 @c -*-texinfo-*- | 1 @c -*-texinfo-*- |
2 @c This is part of the GNU Emacs Lisp Reference Manual. | 2 @c This is part of the GNU Emacs Lisp Reference Manual. |
3 @c Copyright (C) 1998, 1999 Free Software Foundation, Inc. | 3 @c Copyright (C) 1998, 1999, 2002, 2003, 2004, |
4 @c 2005 Free Software Foundation, Inc. | |
4 @c See the file elisp.texi for copying conditions. | 5 @c See the file elisp.texi for copying conditions. |
5 @setfilename ../info/characters | 6 @setfilename ../info/characters |
6 @node Non-ASCII Characters, Searching and Matching, Text, Top | 7 @node Non-ASCII Characters, Searching and Matching, Text, Top |
7 @chapter Non-@sc{ascii} Characters | 8 @chapter Non-@acronym{ASCII} Characters |
8 @cindex multibyte characters | 9 @cindex multibyte characters |
9 @cindex non-@sc{ascii} characters | 10 @cindex non-@acronym{ASCII} characters |
10 | 11 |
11 This chapter covers the special issues relating to non-@sc{ascii} | 12 This chapter covers the special issues relating to non-@acronym{ASCII} |
12 characters and how they are stored in strings and buffers. | 13 characters and how they are stored in strings and buffers. |
13 | 14 |
14 @menu | 15 @menu |
15 * Text Representations:: Unibyte and multibyte representations | 16 * Text Representations:: Unibyte and multibyte representations |
16 * Converting Representations:: Converting unibyte to multibyte and vice versa. | 17 * Converting Representations:: Converting unibyte to multibyte and vice versa. |
17 * Selecting a Representation:: Treating a byte sequence as unibyte or multi. | 18 * Selecting a Representation:: Treating a byte sequence as unibyte or multi. |
18 * Character Codes:: How unibyte and multibyte relate to | 19 * Character Codes:: How unibyte and multibyte relate to |
19 codes of individual characters. | 20 codes of individual characters. |
20 * Character Sets:: The space of possible characters codes | 21 * Character Sets:: The space of possible character codes |
21 is divided into various character sets. | 22 is divided into various character sets. |
22 * Chars and Bytes:: More information about multibyte encodings. | 23 * Chars and Bytes:: More information about multibyte encodings. |
23 * Splitting Characters:: Converting a character to its byte sequence. | 24 * Splitting Characters:: Converting a character to its byte sequence. |
24 * Scanning Charsets:: Which character sets are used in a buffer? | 25 * Scanning Charsets:: Which character sets are used in a buffer? |
25 * Translation of Characters:: Translation tables are used for conversion. | 26 * Translation of Characters:: Translation tables are used for conversion. |
42 attention to the difference. | 43 attention to the difference. |
43 | 44 |
44 @cindex unibyte text | 45 @cindex unibyte text |
45 In unibyte representation, each character occupies one byte and | 46 In unibyte representation, each character occupies one byte and |
46 therefore the possible character codes range from 0 to 255. Codes 0 | 47 therefore the possible character codes range from 0 to 255. Codes 0 |
47 through 127 are @sc{ascii} characters; the codes from 128 through 255 | 48 through 127 are @acronym{ASCII} characters; the codes from 128 through 255 |
48 are used for one non-@sc{ascii} character set (you can choose which | 49 are used for one non-@acronym{ASCII} character set (you can choose which |
49 character set by setting the variable @code{nonascii-insert-offset}). | 50 character set by setting the variable @code{nonascii-insert-offset}). |
50 | 51 |
51 @cindex leading code | 52 @cindex leading code |
52 @cindex multibyte text | 53 @cindex multibyte text |
53 @cindex trailing codes | 54 @cindex trailing codes |
93 default value to @code{nil} early in startup. | 94 default value to @code{nil} early in startup. |
94 @end defvar | 95 @end defvar |
95 | 96 |
96 @defun position-bytes position | 97 @defun position-bytes position |
97 @tindex position-bytes | 98 @tindex position-bytes |
98 Return the byte-position corresponding to buffer position @var{position} | 99 Return the byte-position corresponding to buffer position |
99 in the current buffer. | 100 @var{position} in the current buffer. This is 1 at the start of the |
101 buffer, and counts upward in bytes. If @var{position} is out of | |
102 range, the value is @code{nil}. | |
100 @end defun | 103 @end defun |
101 | 104 |
102 @defun byte-to-position byte-position | 105 @defun byte-to-position byte-position |
103 @tindex byte-to-position | 106 @tindex byte-to-position |
104 Return the buffer position corresponding to byte-position | 107 Return the buffer position corresponding to byte-position |
105 @var{byte-position} in the current buffer. | 108 @var{byte-position} in the current buffer. If @var{byte-position} is |
109 out of range, the value is @code{nil}. | |
106 @end defun | 110 @end defun |
107 | 111 |
108 @defun multibyte-string-p string | 112 @defun multibyte-string-p string |
109 Return @code{t} if @var{string} is a multibyte string. | 113 Return @code{t} if @var{string} is a multibyte string. |
110 @end defun | 114 @end defun |
132 the characters that might be in the multibyte text. The other natural | 136 the characters that might be in the multibyte text. The other natural |
133 alternative, to convert the buffer contents to multibyte, is not | 137 alternative, to convert the buffer contents to multibyte, is not |
134 acceptable because the buffer's representation is a choice made by the | 138 acceptable because the buffer's representation is a choice made by the |
135 user that cannot be overridden automatically. | 139 user that cannot be overridden automatically. |
136 | 140 |
137 Converting unibyte text to multibyte text leaves @sc{ascii} characters | 141 Converting unibyte text to multibyte text leaves @acronym{ASCII} characters |
138 unchanged, and likewise character codes 128 through 159. It converts | 142 unchanged, and likewise character codes 128 through 159. It converts |
139 the non-@sc{ascii} codes 160 through 255 by adding the value | 143 the non-@acronym{ASCII} codes 160 through 255 by adding the value |
140 @code{nonascii-insert-offset} to each character code. By setting this | 144 @code{nonascii-insert-offset} to each character code. By setting this |
141 variable, you specify which character set the unibyte characters | 145 variable, you specify which character set the unibyte characters |
142 correspond to (@pxref{Character Sets}). For example, if | 146 correspond to (@pxref{Character Sets}). For example, if |
143 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char | 147 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char |
144 'latin-iso8859-1) 128)}, then the unibyte non-@sc{ascii} characters | 148 'latin-iso8859-1) 128)}, then the unibyte non-@acronym{ASCII} characters |
145 correspond to Latin 1. If it is 2688, which is @code{(- (make-char | 149 correspond to Latin 1. If it is 2688, which is @code{(- (make-char |
146 'greek-iso8859-7) 128)}, then they correspond to Greek letters. | 150 'greek-iso8859-7) 128)}, then they correspond to Greek letters. |
147 | 151 |
148 Converting multibyte text to unibyte is simpler: it discards all but | 152 Converting multibyte text to unibyte is simpler: it discards all but |
149 the low 8 bits of each character code. If @code{nonascii-insert-offset} | 153 the low 8 bits of each character code. If @code{nonascii-insert-offset} |
151 set, this conversion is the inverse of the other: converting unibyte | 155 set, this conversion is the inverse of the other: converting unibyte |
152 text to multibyte and back to unibyte reproduces the original unibyte | 156 text to multibyte and back to unibyte reproduces the original unibyte |
153 text. | 157 text. |
154 | 158 |
155 @defvar nonascii-insert-offset | 159 @defvar nonascii-insert-offset |
156 This variable specifies the amount to add to a non-@sc{ascii} character | 160 This variable specifies the amount to add to a non-@acronym{ASCII} character |
157 when converting unibyte text to multibyte. It also applies when | 161 when converting unibyte text to multibyte. It also applies when |
158 @code{self-insert-command} inserts a character in the unibyte | 162 @code{self-insert-command} inserts a character in the unibyte |
159 non-@sc{ascii} range, 128 through 255. However, the functions | 163 non-@acronym{ASCII} range, 128 through 255. However, the functions |
160 @code{insert} and @code{insert-char} do not perform this conversion. | 164 @code{insert} and @code{insert-char} do not perform this conversion. |
161 | 165 |
162 The right value to use to select character set @var{cs} is @code{(- | 166 The right value to use to select character set @var{cs} is @code{(- |
163 (make-char @var{cs}) 128)}. If the value of | 167 (make-char @var{cs}) 128)}. If the value of |
164 @code{nonascii-insert-offset} is zero, then conversion actually uses the | 168 @code{nonascii-insert-offset} is zero, then conversion actually uses the |
170 @code{nonascii-insert-offset}. You can use it to specify independently | 174 @code{nonascii-insert-offset}. You can use it to specify independently |
171 how to translate each code in the range of 128 through 255 into a | 175 how to translate each code in the range of 128 through 255 into a |
172 multibyte character. The value should be a char-table, or @code{nil}. | 176 multibyte character. The value should be a char-table, or @code{nil}. |
173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}. | 177 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}. |
174 @end defvar | 178 @end defvar |
179 | |
180 The next three functions either return the argument @var{string}, or a | |
181 newly created string with no text properties. | |
175 | 182 |
176 @defun string-make-unibyte string | 183 @defun string-make-unibyte string |
177 This function converts the text of @var{string} to unibyte | 184 This function converts the text of @var{string} to unibyte |
178 representation, if it isn't already, and returns the result. If | 185 representation, if it isn't already, and returns the result. If |
179 @var{string} is a unibyte string, it is returned unchanged. Multibyte | 186 @var{string} is a unibyte string, it is returned unchanged. Multibyte |
184 @end defun | 191 @end defun |
185 | 192 |
186 @defun string-make-multibyte string | 193 @defun string-make-multibyte string |
187 This function converts the text of @var{string} to multibyte | 194 This function converts the text of @var{string} to multibyte |
188 representation, if it isn't already, and returns the result. If | 195 representation, if it isn't already, and returns the result. If |
189 @var{string} is a multibyte string, it is returned unchanged. | 196 @var{string} is a multibyte string or consists entirely of |
190 The function @code{unibyte-char-to-multibyte} is used to convert | 197 @acronym{ASCII} characters, it is returned unchanged. In particular, |
191 each unibyte character to a multibyte character. | 198 if @var{string} is unibyte and entirely @acronym{ASCII}, the returned |
199 string is unibyte. (When the characters are all @acronym{ASCII}, | |
200 Emacs primitives will treat the string the same way whether it is | |
201 unibyte or multibyte.) If @var{string} is unibyte and contains | |
202 non-@acronym{ASCII} characters, the function | |
203 @code{unibyte-char-to-multibyte} is used to convert each unibyte | |
204 character to a multibyte character. | |
205 @end defun | |
206 | |
207 @defun string-to-multibyte string | |
208 This function returns a multibyte string containing the same sequence | |
209 of character codes as @var{string}. Unlike | |
210 @code{string-make-multibyte}, this function unconditionally returns a | |
211 multibyte string. If @var{string} is a multibyte string, it is | |
212 returned unchanged. | |
213 @end defun | |
214 | |
215 @defun multibyte-char-to-unibyte char | |
216 This converts the multibyte character @var{char} to a unibyte | |
217 character, based on @code{nonascii-translation-table} and | |
218 @code{nonascii-insert-offset}. | |
219 @end defun | |
220 | |
221 @defun unibyte-char-to-multibyte char | |
222 This converts the unibyte character @var{char} to a multibyte | |
223 character, based on @code{nonascii-translation-table} and | |
224 @code{nonascii-insert-offset}. | |
192 @end defun | 225 @end defun |
193 | 226 |
194 @node Selecting a Representation | 227 @node Selecting a Representation |
195 @section Selecting a Representation | 228 @section Selecting a Representation |
196 | 229 |
227 more characters than @var{string} has. | 260 more characters than @var{string} has. |
228 | 261 |
229 If @var{string} is already a unibyte string, then the value is | 262 If @var{string} is already a unibyte string, then the value is |
230 @var{string} itself. Otherwise it is a newly created string, with no | 263 @var{string} itself. Otherwise it is a newly created string, with no |
231 text properties. If @var{string} is multibyte, any characters it | 264 text properties. If @var{string} is multibyte, any characters it |
232 contains of charset @var{eight-bit-control} or @var{eight-bit-graphic} | 265 contains of charset @code{eight-bit-control} or @code{eight-bit-graphic} |
233 are converted to the corresponding single byte. | 266 are converted to the corresponding single byte. |
234 @end defun | 267 @end defun |
235 | 268 |
236 @defun string-as-multibyte string | 269 @defun string-as-multibyte string |
237 This function returns a string with the same bytes as @var{string} but | 270 This function returns a string with the same bytes as @var{string} but |
240 | 273 |
241 If @var{string} is already a multibyte string, then the value is | 274 If @var{string} is already a multibyte string, then the value is |
242 @var{string} itself. Otherwise it is a newly created string, with no | 275 @var{string} itself. Otherwise it is a newly created string, with no |
243 text properties. If @var{string} is unibyte and contains any individual | 276 text properties. If @var{string} is unibyte and contains any individual |
244 8-bit bytes (i.e.@: not part of a multibyte form), they are converted to | 277 8-bit bytes (i.e.@: not part of a multibyte form), they are converted to |
245 the corresponding multibyte character of charset @var{eight-bit-control} | 278 the corresponding multibyte character of charset @code{eight-bit-control} |
246 or @var{eight-bit-graphic}. | 279 or @code{eight-bit-graphic}. |
247 @end defun | 280 @end defun |
248 | 281 |
249 @node Character Codes | 282 @node Character Codes |
250 @section Character Codes | 283 @section Character Codes |
251 @cindex character codes | 284 @cindex character codes |
255 0 to 255---the values that can fit in one byte. The valid character | 288 0 to 255---the values that can fit in one byte. The valid character |
256 codes for multibyte representation range from 0 to 524287, but not all | 289 codes for multibyte representation range from 0 to 524287, but not all |
257 values in that range are valid. The values 128 through 255 are not | 290 values in that range are valid. The values 128 through 255 are not |
258 entirely proper in multibyte text, but they can occur if you do explicit | 291 entirely proper in multibyte text, but they can occur if you do explicit |
259 encoding and decoding (@pxref{Explicit Encoding}). Some other character | 292 encoding and decoding (@pxref{Explicit Encoding}). Some other character |
260 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes | 293 codes cannot occur at all in multibyte text. Only the @acronym{ASCII} codes |
261 0 through 127 are completely legitimate in both representations. | 294 0 through 127 are completely legitimate in both representations. |
262 | 295 |
263 @defun char-valid-p charcode &optional genericp | 296 @defun char-valid-p charcode &optional genericp |
264 This returns @code{t} if @var{charcode} is valid for either one of the two | 297 This returns @code{t} if @var{charcode} is valid (either for unibyte |
265 text representations. | 298 text or for multibyte text). |
266 | 299 |
267 @example | 300 @example |
268 (char-valid-p 65) | 301 (char-valid-p 65) |
269 @result{} t | 302 @result{} t |
270 (char-valid-p 256) | 303 (char-valid-p 256) |
271 @result{} nil | 304 @result{} nil |
272 (char-valid-p 2248) | 305 (char-valid-p 2248) |
273 @result{} t | 306 @result{} t |
274 @end example | 307 @end example |
275 | 308 |
276 If the optional argument @var{genericp} is non-nil, this function | 309 If the optional argument @var{genericp} is non-@code{nil}, this |
277 returns @code{t} if @var{charcode} is a generic character | 310 function also returns @code{t} if @var{charcode} is a generic |
278 (@pxref{Splitting Characters}). | 311 character (@pxref{Splitting Characters}). |
279 @end defun | 312 @end defun |
280 | 313 |
281 @node Character Sets | 314 @node Character Sets |
282 @section Character Sets | 315 @section Character Sets |
283 @cindex character sets | 316 @cindex character sets |
293 cases, characters that would logically be grouped together are split | 326 cases, characters that would logically be grouped together are split |
294 into several character sets. For example, one set of Chinese | 327 into several character sets. For example, one set of Chinese |
295 characters, generally known as Big 5, is divided into two Emacs | 328 characters, generally known as Big 5, is divided into two Emacs |
296 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}. | 329 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}. |
297 | 330 |
298 @sc{ascii} characters are in character set @code{ascii}. The | 331 @acronym{ASCII} characters are in character set @code{ascii}. The |
299 non-@sc{ascii} characters 128 through 159 are in character set | 332 non-@acronym{ASCII} characters 128 through 159 are in character set |
300 @code{eight-bit-control}, and codes 160 through 255 are in character set | 333 @code{eight-bit-control}, and codes 160 through 255 are in character set |
301 @code{eight-bit-graphic}. | 334 @code{eight-bit-graphic}. |
302 | 335 |
303 @defun charsetp object | 336 @defun charsetp object |
304 Returns @code{t} if @var{object} is a symbol that names a character set, | 337 Returns @code{t} if @var{object} is a symbol that names a character set, |
305 @code{nil} otherwise. | 338 @code{nil} otherwise. |
306 @end defun | 339 @end defun |
307 | 340 |
341 @defvar charset-list | |
342 The value is a list of all defined character set names. | |
343 @end defvar | |
344 | |
308 @defun charset-list | 345 @defun charset-list |
309 This function returns a list of all defined character set names. | 346 This function returns the value of @code{charset-list}. It is only |
347 provided for backward compatibility. | |
310 @end defun | 348 @end defun |
311 | 349 |
312 @defun char-charset character | 350 @defun char-charset character |
313 This function returns the name of the character set that @var{character} | 351 This function returns the name of the character set that @var{character} |
314 belongs to. | 352 belongs to, or the symbol @code{unknown} if @var{character} is not a |
353 valid character. | |
315 @end defun | 354 @end defun |
316 | 355 |
317 @defun charset-plist charset | 356 @defun charset-plist charset |
318 @tindex charset-plist | 357 @tindex charset-plist |
319 This function returns the charset property list of the character set | 358 This function returns the charset property list of the character set |
320 @var{charset}. Although @var{charset} is a symbol, this is not the same | 359 @var{charset}. Although @var{charset} is a symbol, this is not the same |
321 as the property list of that symbol. Charset properties are used for | 360 as the property list of that symbol. Charset properties are used for |
322 special purposes within Emacs; for example, | 361 special purposes within Emacs. |
323 @code{preferred-coding-system} helps determine which coding system to | 362 @end defun |
324 use to encode characters in a charset. | 363 |
325 @end defun | 364 @deffn Command list-charset-chars charset |
365 This command displays a list of characters in the character set | |
366 @var{charset}. | |
367 @end deffn | |
326 | 368 |
327 @node Chars and Bytes | 369 @node Chars and Bytes |
328 @section Characters and Bytes | 370 @section Characters and Bytes |
329 @cindex bytes and characters | 371 @cindex bytes and characters |
330 | 372 |
331 @cindex introduction sequence | 373 @cindex introduction sequence |
332 @cindex dimension (of character set) | 374 @cindex dimension (of character set) |
333 In multibyte representation, each character occupies one or more | 375 In multibyte representation, each character occupies one or more |
334 bytes. Each character set has an @dfn{introduction sequence}, which is | 376 bytes. Each character set has an @dfn{introduction sequence}, which is |
335 normally one or two bytes long. (Exception: the @sc{ascii} character | 377 normally one or two bytes long. (Exception: the @code{ascii} character |
336 set and the @sc{eight-bit-graphic} character set have a zero-length | 378 set and the @code{eight-bit-graphic} character set have a zero-length |
337 introduction sequence.) The introduction sequence is the beginning of | 379 introduction sequence.) The introduction sequence is the beginning of |
338 the byte sequence for any character in the character set. The rest of | 380 the byte sequence for any character in the character set. The rest of |
339 the character's bytes distinguish it from the other characters in the | 381 the character's bytes distinguish it from the other characters in the |
340 same character set. Depending on the character set, there are either | 382 same character set. Depending on the character set, there are either |
341 one or two distinguishing bytes; the number of such bytes is called the | 383 one or two distinguishing bytes; the number of such bytes is called the |
371 @defun split-char character | 413 @defun split-char character |
372 Return a list containing the name of the character set of | 414 Return a list containing the name of the character set of |
373 @var{character}, followed by one or two byte values (integers) which | 415 @var{character}, followed by one or two byte values (integers) which |
374 identify @var{character} within that character set. The number of byte | 416 identify @var{character} within that character set. The number of byte |
375 values is the character set's dimension. | 417 values is the character set's dimension. |
418 | |
419 If @var{character} is invalid as a character code, @code{split-char} | |
420 returns a list consisting of the symbol @code{unknown} and @var{character}. | |
376 | 421 |
377 @example | 422 @example |
378 (split-char 2248) | 423 (split-char 2248) |
379 @result{} (latin-iso8859-1 72) | 424 @result{} (latin-iso8859-1 72) |
380 (split-char 65) | 425 (split-char 65) |
393 | 438 |
394 @example | 439 @example |
395 (make-char 'latin-iso8859-1 72) | 440 (make-char 'latin-iso8859-1 72) |
396 @result{} 2248 | 441 @result{} 2248 |
397 @end example | 442 @end example |
443 | |
444 Actually, the eighth bit of both @var{code1} and @var{code2} is zeroed | |
445 before they are used to index @var{charset}. Thus you may use, for | |
446 instance, an ISO 8859 character code rather than subtracting 128, as | |
447 is necessary to index the corresponding Emacs charset. | |
398 @end defun | 448 @end defun |
399 | 449 |
400 @cindex generic characters | 450 @cindex generic characters |
401 If you call @code{make-char} with no @var{byte-values}, the result is | 451 If you call @code{make-char} with no @var{byte-values}, the result is |
402 a @dfn{generic character} which stands for @var{charset}. A generic | 452 a @dfn{generic character} which stands for @var{charset}. A generic |
415 @result{} t | 465 @result{} t |
416 (split-char 2176) | 466 (split-char 2176) |
417 @result{} (latin-iso8859-1 0) | 467 @result{} (latin-iso8859-1 0) |
418 @end example | 468 @end example |
419 | 469 |
420 The character sets @sc{ascii}, @sc{eight-bit-control}, and | 470 The character sets @code{ascii}, @code{eight-bit-control}, and |
421 @sc{eight-bit-graphic} don't have corresponding generic characters. If | 471 @code{eight-bit-graphic} don't have corresponding generic characters. If |
422 @var{charset} is one of them and you don't supply @var{code1}, | 472 @var{charset} is one of them and you don't supply @var{code1}, |
423 @code{make-char} returns the character code corresponding to the | 473 @code{make-char} returns the character code corresponding to the |
424 smallest code in @var{charset}. | 474 smallest code in @var{charset}. |
425 | 475 |
426 @node Scanning Charsets | 476 @node Scanning Charsets |
428 | 478 |
429 Sometimes it is useful to find out which character sets appear in a | 479 Sometimes it is useful to find out which character sets appear in a |
430 part of a buffer or a string. One use for this is in determining which | 480 part of a buffer or a string. One use for this is in determining which |
431 coding systems (@pxref{Coding Systems}) are capable of representing all | 481 coding systems (@pxref{Coding Systems}) are capable of representing all |
432 of the text in question. | 482 of the text in question. |
483 | |
484 @defun charset-after &optional pos | |
485 This function returns the charset of a character in the current buffer | |
486 at position @var{pos}. If @var{pos} is omitted or @code{nil}, it | |
487 defaults to the current value of point. If @var{pos} is out of range, | |
488 the value is @code{nil}. | |
489 @end defun | |
433 | 490 |
434 @defun find-charset-region beg end &optional translation | 491 @defun find-charset-region beg end &optional translation |
435 This function returns a list of the character sets that appear in the | 492 This function returns a list of the character sets that appear in the |
436 current buffer between positions @var{beg} and @var{end}. | 493 current buffer between positions @var{beg} and @var{end}. |
437 | 494 |
452 @node Translation of Characters | 509 @node Translation of Characters |
453 @section Translation of Characters | 510 @section Translation of Characters |
454 @cindex character translation tables | 511 @cindex character translation tables |
455 @cindex translation tables | 512 @cindex translation tables |
456 | 513 |
457 A @dfn{translation table} specifies a mapping of characters | 514 A @dfn{translation table} is a char-table that specifies a mapping |
458 into characters. These tables are used in encoding and decoding, and | 515 of characters into characters. These tables are used in encoding and |
459 for other purposes. Some coding systems specify their own particular | 516 decoding, and for other purposes. Some coding systems specify their |
460 translation tables; there are also default translation tables which | 517 own particular translation tables; there are also default translation |
461 apply to all other coding systems. | 518 tables which apply to all other coding systems. |
519 | |
520 For instance, the coding-system @code{utf-8} has a translation table | |
521 that maps characters of various charsets (e.g., | |
522 @code{latin-iso8859-@var{x}}) into Unicode character sets. This way, | |
523 it can encode Latin-2 characters into UTF-8. Meanwhile, | |
524 @code{unify-8859-on-decoding-mode} operates by specifying | |
525 @code{standard-translation-table-for-decode} to translate | |
526 Latin-@var{x} characters into corresponding Unicode characters. | |
462 | 527 |
463 @defun make-translation-table &rest translations | 528 @defun make-translation-table &rest translations |
464 This function returns a translation table based on the argument | 529 This function returns a translation table based on the argument |
465 @var{translations}. Each element of @var{translations} should be a | 530 @var{translations}. Each element of @var{translations} should be a |
466 list of elements of the form @code{(@var{from} . @var{to})}; this says | 531 list of elements of the form @code{(@var{from} . @var{to})}; this says |
472 @var{to-alt}. | 537 @var{to-alt}. |
473 | 538 |
474 You can also map one whole character set into another character set with | 539 You can also map one whole character set into another character set with |
475 the same dimension. To do this, you specify a generic character (which | 540 the same dimension. To do this, you specify a generic character (which |
476 designates a character set) for @var{from} (@pxref{Splitting Characters}). | 541 designates a character set) for @var{from} (@pxref{Splitting Characters}). |
477 In this case, @var{to} should also be a generic character, for another | 542 In this case, if @var{to} is also a generic character, its character |
478 character set of the same dimension. Then the translation table | 543 set should have the same dimension as @var{from}'s. Then the |
479 translates each character of @var{from}'s character set into the | 544 translation table translates each character of @var{from}'s character |
480 corresponding character of @var{to}'s character set. | 545 set into the corresponding character of @var{to}'s character set. If |
546 @var{from} is a generic character and @var{to} is an ordinary | |
547 character, then the translation table translates every character of | |
548 @var{from}'s character set into @var{to}. | |
481 @end defun | 549 @end defun |
482 | 550 |
483 In decoding, the translation table's translations are applied to the | 551 In decoding, the translation table's translations are applied to the |
484 characters that result from ordinary decoding. If a coding system has | 552 characters that result from ordinary decoding. If a coding system has |
485 property @code{character-translation-table-for-decode}, that specifies | 553 property @code{translation-table-for-decode}, that specifies the |
486 the translation table to use. Otherwise, if | 554 translation table to use. (This is a property of the coding system, |
487 @code{standard-translation-table-for-decode} is non-@code{nil}, decoding | 555 as returned by @code{coding-system-get}, not a property of the symbol |
488 uses that table. | 556 that is the coding system's name. @xref{Coding System Basics,, Basic |
557 Concepts of Coding Systems}.) Otherwise, if | |
558 @code{standard-translation-table-for-decode} is non-@code{nil}, | |
559 decoding uses that table. | |
489 | 560 |
490 In encoding, the translation table's translations are applied to the | 561 In encoding, the translation table's translations are applied to the |
491 characters in the buffer, and the result of translation is actually | 562 characters in the buffer, and the result of translation is actually |
492 encoded. If a coding system has property | 563 encoded. If a coding system has property |
493 @code{character-translation-table-for-encode}, that specifies the | 564 @code{translation-table-for-encode}, that specifies the translation |
494 translation table to use. Otherwise the variable | 565 table to use. Otherwise the variable |
495 @code{standard-translation-table-for-encode} specifies the translation | 566 @code{standard-translation-table-for-encode} specifies the translation |
496 table. | 567 table. |
497 | 568 |
498 @defvar standard-translation-table-for-decode | 569 @defvar standard-translation-table-for-decode |
499 This is the default translation table for decoding, for | 570 This is the default translation table for decoding, for |
501 @end defvar | 572 @end defvar |
502 | 573 |
503 @defvar standard-translation-table-for-encode | 574 @defvar standard-translation-table-for-encode |
504 This is the default translation table for encoding, for | 575 This is the default translation table for encoding, for |
505 coding systems that don't specify any other translation table. | 576 coding systems that don't specify any other translation table. |
577 @end defvar | |
578 | |
579 @defvar translation-table-for-input | |
580 Self-inserting characters are translated through this translation | |
581 table before they are inserted. This variable automatically becomes | |
582 buffer-local when set. | |
583 | |
584 @code{set-buffer-file-coding-system} sets this variable so that your | |
585 keyboard input gets translated into the character sets that the buffer | |
586 is likely to contain. | |
506 @end defvar | 587 @end defvar |
507 | 588 |
508 @node Coding Systems | 589 @node Coding Systems |
509 @section Coding Systems | 590 @section Coding Systems |
510 | 591 |
546 | 627 |
547 Most coding systems specify a particular character code for | 628 Most coding systems specify a particular character code for |
548 conversion, but some of them leave the choice unspecified---to be chosen | 629 conversion, but some of them leave the choice unspecified---to be chosen |
549 heuristically for each file, based on the data. | 630 heuristically for each file, based on the data. |
550 | 631 |
632 In general, a coding system doesn't guarantee roundtrip identity: | |
633 decoding a byte sequence using a coding system, then encoding the | |
634 resulting text in the same coding system, can produce a different byte | |
635 sequence. However, the following coding systems do guarantee that the | |
636 byte sequence will be the same as what you originally decoded: | |
637 | |
638 @quotation | |
639 chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule | |
640 greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3 | |
641 iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe | |
642 japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text | |
643 @end quotation | |
644 | |
645 Encoding buffer text and then decoding the result can also fail to | |
646 reproduce the original text. For instance, if you encode Latin-2 | |
647 characters with @code{utf-8} and decode the result using the same | |
648 coding system, you'll get Unicode characters (of charset | |
649 @code{mule-unicode-0100-24ff}). If you encode Unicode characters with | |
650 @code{iso-latin-2} and decode the result with the same coding system, | |
651 you'll get Latin-2 characters. | |
652 | |
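The roundtrip guarantee described above can be checked from Lisp. The following is a minimal sketch, assuming a reasonably recent Emacs; the three sample bytes are arbitrary illustration data, and @code{iso-latin-1} is taken from the list of roundtrip-safe coding systems.

```elisp
;; Sample data (an assumption for illustration): three Latin-1 bytes
;; written with octal escapes, which makes the literal a unibyte string.
(let* ((bytes "\351t\351")
       (text (decode-coding-string bytes 'iso-latin-1))
       (again (encode-coding-string text 'iso-latin-1)))
  ;; Non-nil when decoding followed by encoding reproduces the bytes.
  (equal bytes again))
```

For a coding system outside that list, the same comparison may come out @code{nil}.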
551 @cindex end of line conversion | 653 @cindex end of line conversion |
552 @dfn{End of line conversion} handles three different conventions used | 654 @dfn{End of line conversion} handles three different conventions used |
553 on various systems for representing end of line in files. The Unix | 655 on various systems for representing end of line in files. The Unix |
554 convention is to use the linefeed character (also called newline). The | 656 convention is to use the linefeed character (also called newline). The |
555 DOS convention is to use a carriage-return and a linefeed at the end of | 657 DOS convention is to use a carriage-return and a linefeed at the end of |
604 writing files. The function @code{insert-file-contents} uses | 706 writing files. The function @code{insert-file-contents} uses |
605 a coding system for decoding the file data, and @code{write-region} | 707 a coding system for decoding the file data, and @code{write-region} |
606 uses one to encode the buffer contents. | 708 uses one to encode the buffer contents. |
607 | 709 |
608 You can specify the coding system to use either explicitly | 710 You can specify the coding system to use either explicitly |
609 (@pxref{Specifying Coding Systems}), or implicitly using the defaulting | 711 (@pxref{Specifying Coding Systems}), or implicitly using a default |
610 mechanism (@pxref{Default Coding Systems}). But these methods may not | 712 mechanism (@pxref{Default Coding Systems}). But these methods may not |
611 completely specify what to do. For example, they may choose a coding | 713 completely specify what to do. For example, they may choose a coding |
612 system such as @code{undecided} which leaves the character code | 714 system such as @code{undecided} which leaves the character code |
613 conversion to be determined from the data. In these cases, the I/O | 715 conversion to be determined from the data. In these cases, the I/O |
614 operation finishes the job of choosing a coding system. Very often | 716 operation finishes the job of choosing a coding system. Very often |
615 you will want to find out afterwards which coding system was chosen. | 717 you will want to find out afterwards which coding system was chosen. |
616 | 718 |
617 @defvar buffer-file-coding-system | 719 @defvar buffer-file-coding-system |
618 This variable records the coding system that was used for visiting the | 720 This buffer-local variable records the coding system that was used to visit |
619 current buffer. It is used for saving the buffer, and for writing part | 721 the current buffer. It is used for saving the buffer, and for writing part |
620 of the buffer with @code{write-region}. If the text to be written | 722 of the buffer with @code{write-region}. If the text to be written |
621 cannot be safely encoded using the coding system specified by this | 723 cannot be safely encoded using the coding system specified by this |
622 variable, these operations select an alternative encoding by calling | 724 variable, these operations select an alternative encoding by calling |
623 the function @code{select-safe-coding-system} (@pxref{User-Chosen | 725 the function @code{select-safe-coding-system} (@pxref{User-Chosen |
624 Coding Systems}).  If selecting a different encoding requires asking | 726 Coding Systems}).  If selecting a different encoding requires asking |
656 @end defvar | 758 @end defvar |
657 | 759 |
658 The variable @code{selection-coding-system} specifies how to encode | 760 The variable @code{selection-coding-system} specifies how to encode |
659 selections for the window system. @xref{Window System Selections}. | 761 selections for the window system. @xref{Window System Selections}. |
660 | 762 |
763 @defvar file-name-coding-system | |
764 The variable @code{file-name-coding-system} specifies the coding | |
765 system to use for encoding file names. Emacs encodes file names using | |
766 that coding system for all file operations. If | |
767 @code{file-name-coding-system} is @code{nil}, Emacs uses a default | |
768 coding system determined by the selected language environment. In the | |
769 default language environment, any non-@acronym{ASCII} characters in | |
770 file names are not encoded specially; they appear in the file system | |
771 using the internal Emacs representation. | |
772 @end defvar | |
773 | |
774 @strong{Warning:} if you change @code{file-name-coding-system} (or | |
775 the language environment) in the middle of an Emacs session, problems | |
776 can result if you have already visited files whose names were encoded | |
777 using the earlier coding system and are handled differently under the | |
778 new coding system. If you try to save one of these buffers under the | |
779 visited file name, saving may use the wrong file name, or it may get | |
780 an error. If such a problem happens, use @kbd{C-x C-w} to specify a | |
781 new file name for that buffer. | |
782 | |
661 @node Lisp and Coding Systems | 783 @node Lisp and Coding Systems |
662 @subsection Coding Systems in Lisp | 784 @subsection Coding Systems in Lisp |
663 | 785 |
664 Here are the Lisp facilities for working with coding systems: | 786 Here are the Lisp facilities for working with coding systems: |
665 | 787 |
670 systems as well. | 792 systems as well. |
671 @end defun | 793 @end defun |
672 | 794 |
673 @defun coding-system-p object | 795 @defun coding-system-p object |
674 This function returns @code{t} if @var{object} is a coding system | 796 This function returns @code{t} if @var{object} is a coding system |
675 name. | 797 name or @code{nil}. |
676 @end defun | 798 @end defun |
677 | 799 |
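A short sketch of how @code{coding-system-p} behaves on the three kinds of argument. The last symbol is a made-up name used only for illustration.

```elisp
(coding-system-p 'iso-latin-1)            ; non-nil: a known coding system name
(coding-system-p nil)                     ; non-nil: nil is accepted as well
(coding-system-p 'no-such-coding-system)  ; nil: this made-up name is not defined
```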
678 @defun check-coding-system coding-system | 800 @defun check-coding-system coding-system |
679 This function checks the validity of @var{coding-system}. | 801 This function checks the validity of @var{coding-system}. |
680 If that is valid, it returns @var{coding-system}. | 802 If that is valid, it returns @var{coding-system}. |
685 This function returns a coding system which is like @var{coding-system} | 807 This function returns a coding system which is like @var{coding-system} |
686 except for its eol conversion, which is specified by @var{eol-type}. | 808 except for its eol conversion, which is specified by @var{eol-type}. |
687 @var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or | 809 @var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or |
688 @code{nil}. If it is @code{nil}, the returned coding system determines | 810 @code{nil}. If it is @code{nil}, the returned coding system determines |
689 the end-of-line conversion from the data. | 811 the end-of-line conversion from the data. |
812 | |
813 @var{eol-type} may also be 0, 1 or 2, standing for @code{unix}, | |
814 @code{dos} and @code{mac}, respectively. | |
690 @end defun | 815 @end defun |
691 | 816 |
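As a sketch of the two ways to name an end-of-line type, the following requests the DOS variant of @code{iso-latin-1} once by symbol and once by number. It uses @code{coding-system-eol-type}, an existing function that is not shown in this excerpt, to inspect the result.

```elisp
(let ((by-symbol (coding-system-change-eol-conversion 'iso-latin-1 'dos))
      (by-number (coding-system-change-eol-conversion 'iso-latin-1 1)))
  (list (coding-system-eol-type by-symbol)  ; 1 stands for the DOS convention
        (eq by-symbol by-number)))          ; both calls name the same variant
```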
692 @defun coding-system-change-text-conversion eol-coding text-coding | 817 @defun coding-system-change-text-conversion eol-coding text-coding |
693 This function returns a coding system which uses the end-of-line | 818 This function returns a coding system which uses the end-of-line |
694 conversion of @var{eol-coding}, and the text conversion of | 819 conversion of @var{eol-coding}, and the text conversion of |
728 handle decoding the text that was scanned. They are listed in order of | 853 handle decoding the text that was scanned. They are listed in order of |
729 decreasing priority. But if @var{highest} is non-@code{nil}, then the | 854 decreasing priority. But if @var{highest} is non-@code{nil}, then the |
730 return value is just one coding system, the one that is highest in | 855 return value is just one coding system, the one that is highest in |
731 priority. | 856 priority. |
732 | 857 |
733 If the region contains only @sc{ascii} characters, the value | 858 If the region contains only @acronym{ASCII} characters, the value |
734 is @code{undecided} or @code{(undecided)}. | 859 is @code{undecided} or @code{(undecided)}, or a variant specifying |
735 @end defun | 860 end-of-line conversion, if that can be deduced from the text. |
736 | 861 @end defun |
737 @defun detect-coding-string string highest | 862 |
863 @defun detect-coding-string string &optional highest | |
738 This function is like @code{detect-coding-region} except that it | 864 This function is like @code{detect-coding-region} except that it |
739 operates on the contents of @var{string} instead of bytes in the buffer. | 865 operates on the contents of @var{string} instead of bytes in the buffer. |
740 @end defun | 866 @end defun |
741 | 867 |
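A brief sketch of the two return shapes, using an arbitrary @acronym{ASCII}-only sample string. The exact coding systems reported depend on the Emacs version and on the priority settings, so no specific value is assumed here.

```elisp
;; Without HIGHEST the value is a list of candidate coding systems.
(detect-coding-string "plain ASCII text")

;; With a non-nil HIGHEST the value is a single coding system.
(detect-coding-string "plain ASCII text" t)
```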
742 @xref{Process Information}, for how to examine or set the coding | 868 @xref{Coding systems for a subprocess,, Process Information}, in |
743 systems used for I/O to a subprocess. | 869 particular the description of the functions |
870 @code{process-coding-system} and @code{set-process-coding-system}, for | |
871 how to examine or set the coding systems used for I/O to a subprocess. | |
744 | 872 |
745 @node User-Chosen Coding Systems | 873 @node User-Chosen Coding Systems |
746 @subsection User-Chosen Coding Systems | 874 @subsection User-Chosen Coding Systems |
747 | 875 |
748 @cindex select safe coding system | 876 @cindex select safe coding system |
749 @defun select-safe-coding-system from to &optional default-coding-system accept-default-p | 877 @defun select-safe-coding-system from to &optional default-coding-system accept-default-p file |
750 This function selects a coding system for encoding specified text, | 878 This function selects a coding system for encoding specified text, |
751 asking the user to choose if necessary. Normally the specified text | 879 asking the user to choose if necessary. Normally the specified text |
752 is the text in the current buffer between @var{from} and @var{to}, | 880 is the text in the current buffer between @var{from} and @var{to}. If |
753 defaulting to the whole buffer if they are @code{nil}. If @var{from} | 881 @var{from} is a string, the string specifies the text to encode, and |
754 is a string, the string specifies the text to encode, and @var{to} is | 882 @var{to} is ignored. |
755 ignored. | |
756 | 883 |
757 If @var{default-coding-system} is non-@code{nil}, that is the first | 884 If @var{default-coding-system} is non-@code{nil}, that is the first |
758 coding system to try; if that can handle the text, | 885 coding system to try; if that can handle the text, |
759 @code{select-safe-coding-system} returns that coding system. It can | 886 @code{select-safe-coding-system} returns that coding system. It can |
760 also be a list of coding systems; then the function tries each of them | 887 also be a list of coding systems; then the function tries each of them |
761 one by one. After trying all of them, it next tries the user's most | 888 one by one. After trying all of them, it next tries the current |
762 preferred coding system (@pxref{Recognize Coding, | 889 buffer's value of @code{buffer-file-coding-system} (if it is not |
763 prefer-coding-system, the description of @code{prefer-coding-system}, | 890 @code{undecided}), then the value of |
764 emacs, GNU Emacs Manual}), and after that the current buffer's value | 891 @code{default-buffer-file-coding-system} and finally the user's most |
765 of @code{buffer-file-coding-system} (if it is not @code{undecided}). | 892 preferred coding system, which the user can set using the command |
893 @code{prefer-coding-system} (@pxref{Recognize Coding,, Recognizing | |
894 Coding Systems, emacs, The GNU Emacs Manual}). | |
766 | 895 |
767 If one of those coding systems can safely encode all the specified | 896 If one of those coding systems can safely encode all the specified |
768 text, @code{select-safe-coding-system} chooses it and returns it. | 897 text, @code{select-safe-coding-system} chooses it and returns it. |
769 Otherwise, it asks the user to choose from a list of coding systems | 898 Otherwise, it asks the user to choose from a list of coding systems |
770 which can encode all the text, and returns the user's choice. | 899 which can encode all the text, and returns the user's choice. |
771 | 900 |
901 @var{default-coding-system} can also be a list whose first element is | |
902 @code{t} and whose other elements are coding systems. Then, if no coding | |
903 system in the list can handle the text, @code{select-safe-coding-system} | |
904 queries the user immediately, without trying any of the three | |
905 alternatives described above. | |
906 | |
772 The optional argument @var{accept-default-p}, if non-@code{nil}, | 907 The optional argument @var{accept-default-p}, if non-@code{nil}, |
773 should be a function to determine whether the coding system selected | 908 should be a function to determine whether a coding system selected |
774 without user interaction is acceptable. If this function returns | 909 without user interaction is acceptable. @code{select-safe-coding-system} |
775 @code{nil}, the silently selected coding system is rejected, and the | 910 calls this function with one argument, the base coding system of the |
776 user is asked to select a coding system from a list of possible | 911 selected coding system. If @var{accept-default-p} returns @code{nil}, |
777 candidates. | 912 @code{select-safe-coding-system} rejects the silently selected coding |
913 system, and asks the user to select a coding system from a list of | |
914 possible candidates. | |
778 | 915 |
779 @vindex select-safe-coding-system-accept-default-p | 916 @vindex select-safe-coding-system-accept-default-p |
780 If the variable @code{select-safe-coding-system-accept-default-p} is | 917 If the variable @code{select-safe-coding-system-accept-default-p} is |
781 non-@code{nil}, its value overrides the value of | 918 non-@code{nil}, its value overrides the value of |
782 @var{accept-default-p}. | 919 @var{accept-default-p}. |
920 | |
921 As a final step, before returning the chosen coding system, | |
922 @code{select-safe-coding-system} checks whether that coding system is | |
923 consistent with what would be selected if the contents of the region | |
924 were read from a file. (If not, this could lead to data corruption in | |
925 a file subsequently re-visited and edited.) Normally, | |
926 @code{select-safe-coding-system} uses @code{buffer-file-name} as the | |
927 file for this purpose, but if @var{file} is non-@code{nil}, it uses | |
928 that file instead (this can be relevant for @code{write-region} and | |
929 similar functions). If it detects an apparent inconsistency, | |
930 @code{select-safe-coding-system} queries the user before selecting the | |
931 coding system. | |
783 @end defun | 932 @end defun |
784 | 933 |
785 Here are two functions you can use to let the user specify a coding | 934 Here are two functions you can use to let the user specify a coding |
786 system, with completion. @xref{Completion}. | 935 system, with completion. @xref{Completion}. |
787 | 936 |
838 that coding system is used for both reading the file and writing it. If | 987 that coding system is used for both reading the file and writing it. If |
839 @var{coding} is a cons cell containing two coding systems, its @sc{car} | 988 @var{coding} is a cons cell containing two coding systems, its @sc{car} |
840 specifies the coding system for decoding, and its @sc{cdr} specifies the | 989 specifies the coding system for decoding, and its @sc{cdr} specifies the |
841 coding system for encoding. | 990 coding system for encoding. |
842 | 991 |
843 If @var{coding} is a function name, the function must return a coding | 992 If @var{coding} is a function name, the function should take one |
844 system or a cons cell containing two coding systems. This value is used | 993 argument, a list of all arguments passed to |
845 as described above. | 994 @code{find-operation-coding-system}. It must return a coding system |
995 or a cons cell containing two coding systems. This value has the same | |
996 meaning as described above. | |
846 @end defvar | 997 @end defvar |
847 | 998 |
848 @defvar process-coding-system-alist | 999 @defvar process-coding-system-alist |
849 This variable is an alist specifying which coding systems to use for a | 1000 This variable is an alist specifying which coding systems to use for a |
850 subprocess, depending on which program is running in the subprocess. It | 1001 subprocess, depending on which program is running in the subprocess. It |
885 The value should be a cons cell of the form @code{(@var{input-coding} | 1036 The value should be a cons cell of the form @code{(@var{input-coding} |
886 . @var{output-coding})}. Here @var{input-coding} applies to input from | 1037 . @var{output-coding})}. Here @var{input-coding} applies to input from |
887 the subprocess, and @var{output-coding} applies to output to it. | 1038 the subprocess, and @var{output-coding} applies to output to it. |
888 @end defvar | 1039 @end defvar |
889 | 1040 |
1041 @defvar auto-coding-functions | |
1042 This variable holds a list of functions that try to determine a | |
1043 coding system for a file based on its undecoded contents. | |
1044 | |
1045 Each function in this list should be written to look at text in the | |
1046 current buffer, but should not modify it in any way. The buffer will | |
1047 contain undecoded text of parts of the file. Each function should | |
1048 take one argument, @var{size}, which tells it how many characters to | |
1049 look at, starting from point. If the function succeeds in determining | |
1050 a coding system for the file, it should return that coding system. | |
1051 Otherwise, it should return @code{nil}. | |
1052 | |
1053 If a file has a @samp{coding:} tag, that takes precedence, so these | |
1054 functions won't be called. | |
1055 @end defvar | |
1056 | |
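A function suitable for @code{auto-coding-functions} might look like the sketch below. The function name @code{my-detect-marker-coding} and the @samp{%% encoding: utf-8} marker are hypothetical, chosen only to illustrate the calling convention described above: one @var{size} argument, no buffer modification, and a coding system or @code{nil} as the value.

```elisp
(defun my-detect-marker-coding (size)
  "Return `utf-8' if a hypothetical marker line occurs within SIZE chars of point.
Return nil otherwise.  The buffer text and point are left unchanged."
  (save-excursion
    (let ((limit (min (point-max) (+ (point) size))))
      ;; The marker format is an assumption made for this example.
      (when (re-search-forward "^%% encoding: utf-8" limit t)
        'utf-8))))

;; Register the function so that it is consulted for files without
;; a `coding:' tag.
(add-to-list 'auto-coding-functions #'my-detect-marker-coding)
```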
890 @defun find-operation-coding-system operation &rest arguments | 1057 @defun find-operation-coding-system operation &rest arguments |
891 This function returns the coding system to use (by default) for | 1058 This function returns the coding system to use (by default) for |
892 performing @var{operation} with @var{arguments}. The value has this | 1059 performing @var{operation} with @var{arguments}. The value has this |
893 form: | 1060 form: |
894 | 1061 |
895 @example | 1062 @example |
896 (@var{decoding-system} @var{encoding-system}) | 1063 (@var{decoding-system} . @var{encoding-system}) |
897 @end example | 1064 @end example |
898 | 1065 |
899 The first element, @var{decoding-system}, is the coding system to use | 1066 The first element, @var{decoding-system}, is the coding system to use |
900 for decoding (in case @var{operation} does decoding), and | 1067 for decoding (in case @var{operation} does decoding), and |
901 @var{encoding-system} is the coding system for encoding (in case | 1068 @var{encoding-system} is the coding system for encoding (in case |
902 @var{operation} does encoding). | 1069 @var{operation} does encoding). |
903 | 1070 |
904 The argument @var{operation} should be a symbol, one of | 1071 The argument @var{operation} should be a symbol, any one of |
905 @code{insert-file-contents}, @code{write-region}, @code{call-process}, | 1072 @code{insert-file-contents}, @code{write-region}, |
906 @code{call-process-region}, @code{start-process}, or | 1073 @code{start-process}, @code{call-process}, @code{call-process-region}, |
907 @code{open-network-stream}. These are the names of the Emacs I/O primitives | 1074 or @code{open-network-stream}. These are the names of the Emacs I/O |
908 that can do coding system conversion. | 1075 primitives that can do coding system conversion. |
909 | 1076 |
910 The remaining arguments should be the same arguments that might be given | 1077 The remaining arguments should be the same arguments that might be given |
911 to that I/O primitive. Depending on the primitive, one of those | 1078 to that I/O primitive. Depending on the primitive, one of those |
912 arguments is selected as the @dfn{target}. For example, if | 1079 arguments is selected as the @dfn{target}. For example, if |
913 @var{operation} does file I/O, whichever argument specifies the file | 1080 @var{operation} does file I/O, whichever argument specifies the file |
914 name is the target. For subprocess primitives, the process name is the | 1081 name is the target. For subprocess primitives, the process name is the |
915 target. For @code{open-network-stream}, the target is the service name | 1082 target. For @code{open-network-stream}, the target is the service name |
916 or port number. | 1083 or port number. |
917 | 1084 |
918 This function looks up the target in @code{file-coding-system-alist}, | 1085 Depending on @var{operation}, this function looks up the target in |
919 @code{process-coding-system-alist}, or | 1086 @code{file-coding-system-alist}, @code{process-coding-system-alist}, |
920 @code{network-coding-system-alist}, depending on @var{operation}. | 1087 or @code{network-coding-system-alist}. |
921 @xref{Default Coding Systems}. | |
922 @end defun | 1088 @end defun |
923 | 1089 |
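To see the lookup in isolation, the sketch below binds @code{file-coding-system-alist} to a single entry and asks which coding systems @code{insert-file-contents} would use. The @file{.xyz} extension and the file name are hypothetical; the file does not need to exist, because only the name is matched against the alist.

```elisp
(let ((file-coding-system-alist '(("\\.xyz\\'" . iso-latin-1))))
  ;; The target is the file name argument; the value is a cons cell
  ;; holding the decoding system and the encoding system.
  (find-operation-coding-system 'insert-file-contents "notes.xyz"))
```

Because the alist entry names one coding system rather than a cons cell, that system serves for both decoding and encoding.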
924 @node Specifying Coding Systems | 1090 @node Specifying Coding Systems |
925 @subsection Specifying a Coding System for One Operation | 1091 @subsection Specifying a Coding System for One Operation |
926 | 1092 |
943 you should not globally set it to any other value. Here is an example | 1109 you should not globally set it to any other value. Here is an example |
944 of the right way to use the variable: | 1110 of the right way to use the variable: |
945 | 1111 |
946 @example | 1112 @example |
947 ;; @r{Read the file with no character code conversion.} | 1113 ;; @r{Read the file with no character code conversion.} |
948 ;; @r{Assume @sc{crlf} represents end-of-line.} | 1114 ;; @r{Assume @acronym{CRLF} represents end-of-line.} |
949 (let ((coding-system-for-write 'emacs-mule-dos)) | 1115 (let ((coding-system-for-read 'emacs-mule-dos)) |
950 (insert-file-contents filename)) | 1116 (insert-file-contents filename)) |
951 @end example | 1117 @end example |
952 | 1118 |
953 When its value is non-@code{nil}, @code{coding-system-for-read} takes | 1119 When its value is non-@code{nil}, @code{coding-system-for-read} takes |
954 precedence over all other methods of specifying a coding system to use for | 1120 precedence over all other methods of specifying a coding system to use for |
1008 Here are the functions to perform explicit encoding or decoding. The | 1174 Here are the functions to perform explicit encoding or decoding. The |
1009 encoding functions produce sequences of bytes; the decoding functions | 1175 encoding functions produce sequences of bytes; the decoding functions |
1010 are meant to operate on sequences of bytes. All of these functions | 1176 are meant to operate on sequences of bytes. All of these functions |
1011 discard text properties. | 1177 discard text properties. |
1012 | 1178 |
1013 @defun encode-coding-region start end coding-system | 1179 @deffn Command encode-coding-region start end coding-system |
1014 This function encodes the text from @var{start} to @var{end} according | 1180 This command encodes the text from @var{start} to @var{end} according |
1015 to coding system @var{coding-system}. The encoded text replaces the | 1181 to coding system @var{coding-system}. The encoded text replaces the |
1016 original text in the buffer. The result of encoding is logically a | 1182 original text in the buffer. The result of encoding is logically a |
1017 sequence of bytes, but the buffer remains multibyte if it was multibyte | 1183 sequence of bytes, but the buffer remains multibyte if it was multibyte |
1018 before. | 1184 before. |
1019 @end defun | 1185 |
1020 | 1186 This command returns the length of the encoded text. |
1021 @defun encode-coding-string string coding-system | 1187 @end deffn |
1188 | |
1189 @defun encode-coding-string string coding-system &optional nocopy | |
1022 This function encodes the text in @var{string} according to coding | 1190 This function encodes the text in @var{string} according to coding |
1023 system @var{coding-system}. It returns a new string containing the | 1191 system @var{coding-system}. It returns a new string containing the |
1024 encoded text. The result of encoding is a unibyte string. | 1192 encoded text, except when @var{nocopy} is non-@code{nil}, in which |
1025 @end defun | 1193 case the function may return @var{string} itself if the encoding |
1026 | 1194 operation is trivial. The result of encoding is a unibyte string. |
1027 @defun decode-coding-region start end coding-system | 1195 @end defun |
1028 This function decodes the text from @var{start} to @var{end} according | 1196 |
1197 @deffn Command decode-coding-region start end coding-system | |
1198 This command decodes the text from @var{start} to @var{end} according | |
1029 to coding system @var{coding-system}. The decoded text replaces the | 1199 to coding system @var{coding-system}. The decoded text replaces the |
1030 original text in the buffer. To make explicit decoding useful, the text | 1200 original text in the buffer. To make explicit decoding useful, the text |
1031 before decoding ought to be a sequence of byte values, but both | 1201 before decoding ought to be a sequence of byte values, but both |
1032 multibyte and unibyte buffers are acceptable. | 1202 multibyte and unibyte buffers are acceptable. |
1033 @end defun | 1203 |
1034 | 1204 This command returns the length of the decoded text. |
1035 @defun decode-coding-string string coding-system | 1205 @end deffn |
1206 | |
1207 @defun decode-coding-string string coding-system &optional nocopy | |
1036 This function decodes the text in @var{string} according to coding | 1208 This function decodes the text in @var{string} according to coding |
1037 system @var{coding-system}. It returns a new string containing the | 1209 system @var{coding-system}. It returns a new string containing the |
1038 decoded text. To make explicit decoding useful, the contents of | 1210 decoded text, except when @var{nocopy} is non-@code{nil}, in which |
1039 @var{string} ought to be a sequence of byte values, but a multibyte | 1211 case the function may return @var{string} itself if the decoding |
1212 operation is trivial. To make explicit decoding useful, the contents | |
1213 of @var{string} ought to be a sequence of byte values, but a multibyte | |
1040 string is acceptable. | 1214 string is acceptable. |
1215 @end defun | |
1216 | |
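The string functions can be exercised together as in this sketch, which assumes a reasonably recent Emacs. The two sample bytes are the UTF-8 encoding of a single accented letter and are used only for illustration.

```elisp
(let* ((bytes "\303\251")                         ; unibyte literal, two bytes
       (text (decode-coding-string bytes 'utf-8))
       (encoded (encode-coding-string text 'utf-8)))
  (list (multibyte-string-p text)       ; decoding yields multibyte text
        (length text)                   ; one character
        (multibyte-string-p encoded)    ; encoding yields a unibyte string
        (length encoded)))              ; two bytes again
```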
1217 @defun decode-coding-inserted-region from to filename &optional visit beg end replace | |
1218 This function decodes the text from @var{from} to @var{to} as if | |
1219 it were being read from file @var{filename} by @code{insert-file-contents}, | |
1220 called with the remaining arguments. | |
1221 | |
1222 The normal way to use this function is after reading text from a file | |
1223 without decoding, if you decide you would rather have decoded it. | |
1224 Instead of deleting the text and reading it again, this time with | |
1225 decoding, you can call this function. | |
1041 @end defun | 1226 @end defun |
1042 | 1227 |
1043 @node Terminal I/O Encoding | 1228 @node Terminal I/O Encoding |
1044 @subsection Terminal I/O Encoding | 1229 @subsection Terminal I/O Encoding |
1045 | 1230 |
1052 @defun keyboard-coding-system | 1237 @defun keyboard-coding-system |
1053 This function returns the coding system that is in use for decoding | 1238 This function returns the coding system that is in use for decoding |
1054 keyboard input---or @code{nil} if no coding system is to be used. | 1239 keyboard input---or @code{nil} if no coding system is to be used. |
1055 @end defun | 1240 @end defun |
1056 | 1241 |
1057 @defun set-keyboard-coding-system coding-system | 1242 @deffn Command set-keyboard-coding-system coding-system |
1058 This function specifies @var{coding-system} as the coding system to | 1243 This command specifies @var{coding-system} as the coding system to |
1059 use for decoding keyboard input. If @var{coding-system} is @code{nil}, | 1244 use for decoding keyboard input. If @var{coding-system} is @code{nil}, |
1060 that means do not decode keyboard input. | 1245 that means do not decode keyboard input. |
1061 @end defun | 1246 @end deffn |
1062 | 1247 |
1063 @defun terminal-coding-system | 1248 @defun terminal-coding-system |
1064 This function returns the coding system that is in use for encoding | 1249 This function returns the coding system that is in use for encoding |
1065 terminal output---or @code{nil} for no encoding. | 1250 terminal output---or @code{nil} for no encoding. |
1066 @end defun | 1251 @end defun |
1067 | 1252 |
1068 @defun set-terminal-coding-system coding-system | 1253 @deffn Command set-terminal-coding-system coding-system |
1069 This function specifies @var{coding-system} as the coding system to use | 1254 This command specifies @var{coding-system} as the coding system to use |
1070 for encoding terminal output. If @var{coding-system} is @code{nil}, | 1255 for encoding terminal output. If @var{coding-system} is @code{nil}, |
1071 that means do not encode terminal output. | 1256 that means do not encode terminal output. |
1072 @end defun | 1257 @end deffn |
1073 | 1258 |
1074 @node MS-DOS File Types | 1259 @node MS-DOS File Types |
1075 @subsection MS-DOS File Types | 1260 @subsection MS-DOS File Types |
1076 @cindex DOS file types | 1261 @cindex DOS file types |
1077 @cindex MS-DOS file types | 1262 @cindex MS-DOS file types |
1132 | 1317 |
1133 @node Input Methods | 1318 @node Input Methods |
1134 @section Input Methods | 1319 @section Input Methods |
1135 @cindex input methods | 1320 @cindex input methods |
1136 | 1321 |
1137 @dfn{Input methods} provide convenient ways of entering non-@sc{ascii} | 1322 @dfn{Input methods} provide convenient ways of entering non-@acronym{ASCII} |
1138 characters from the keyboard. Unlike coding systems, which translate | 1323 characters from the keyboard. Unlike coding systems, which translate |
1139 non-@sc{ascii} characters to and from encodings meant to be read by | 1324 non-@acronym{ASCII} characters to and from encodings meant to be read by |
1140 programs, input methods provide human-friendly commands. (@xref{Input | 1325 programs, input methods provide human-friendly commands. (@xref{Input |
1141 Methods,,, emacs, The GNU Emacs Manual}, for information on how users | 1326 Methods,,, emacs, The GNU Emacs Manual}, for information on how users |
1142 use input methods to enter text.) How to define input methods is not | 1327 use input methods to enter text.) How to define input methods is not |
1143 yet documented in this manual, but here we describe how to use them. | 1328 yet documented in this manual, but here we describe how to use them. |
1144 | 1329 |
1150 current buffer. (It automatically becomes local in each buffer when set | 1335 current buffer. (It automatically becomes local in each buffer when set |
1151 in any fashion.) It is @code{nil} if no input method is active in the | 1336 in any fashion.) It is @code{nil} if no input method is active in the |
1152 buffer now. | 1337 buffer now. |
1153 @end defvar | 1338 @end defvar |
1154 | 1339 |
1155 @defvar default-input-method | 1340 @defopt default-input-method |
1156 This variable holds the default input method for commands that choose an | 1341 This variable holds the default input method for commands that choose an |
1157 input method. Unlike @code{current-input-method}, this variable is | 1342 input method. Unlike @code{current-input-method}, this variable is |
1158 normally global. | 1343 normally global. |
1159 @end defvar | 1344 @end defopt |
1160 | 1345 |
1161 @defun set-input-method input-method | 1346 @deffn Command set-input-method input-method |
1162 This function activates input method @var{input-method} for the current | 1347 This command activates input method @var{input-method} for the current |
1163 buffer. It also sets @code{default-input-method} to @var{input-method}. | 1348 buffer. It also sets @code{default-input-method} to @var{input-method}. |
1164 If @var{input-method} is @code{nil}, this function deactivates any input | 1349 If @var{input-method} is @code{nil}, this command deactivates any input |
1165 method for the current buffer. | 1350 method for the current buffer. |
1166 @end defun | 1351 @end deffn |
1167 | 1352 |
1168 @defun read-input-method-name prompt &optional default inhibit-null | 1353 @defun read-input-method-name prompt &optional default inhibit-null |
1169 This function reads an input method name with the minibuffer, prompting | 1354 This function reads an input method name with the minibuffer, prompting |
1170 with @var{prompt}. If @var{default} is non-@code{nil}, that is returned | 1355 with @var{prompt}. If @var{default} is non-@code{nil}, that is returned |
1171 by default, if the user enters empty input. However, if | 1356 by default, if the user enters empty input. However, if |
1197 active. @var{description} is a string describing this method and what | 1382 active. @var{description} is a string describing this method and what |
1198 it is good for. | 1383 it is good for. |
1199 @end defvar | 1384 @end defvar |
1200 | 1385 |
1201 The fundamental interface to input methods is through the | 1386 The fundamental interface to input methods is through the |
1202 variable @code{input-method-function}. @xref{Reading One Event}. | 1387 variable @code{input-method-function}. @xref{Reading One Event}, |
1388 and @ref{Invoking the Input Method}. | |
1203 | 1389 |
1204 @node Locales | 1390 @node Locales |
1205 @section Locales | 1391 @section Locales |
1206 @cindex locale | 1392 @cindex locale |
1207 | 1393 |
1233 Changing the locale can cause messages to appear according to the | 1419 Changing the locale can cause messages to appear according to the |
1234 conventions of a different language. If the variable is @code{nil}, the | 1420 conventions of a different language. If the variable is @code{nil}, the |
1235 locale is specified by environment variables in the usual POSIX fashion. | 1421 locale is specified by environment variables in the usual POSIX fashion. |
1236 @end defvar | 1422 @end defvar |
1237 | 1423 |
1424 @defun locale-info item | |
1425 This function returns locale data @var{item} for the current POSIX | |
1426 locale, if available. @var{item} should be one of these symbols: | |
1427 | |
1428 @table @code | |
1429 @item codeset | |
1430 Return the character set as a string (locale item @code{CODESET}). | |
1431 | |
1432 @item days | |
1433 Return a 7-element vector of day names (locale items | |
1434 @code{DAY_1} through @code{DAY_7}). | |
1435 | |
1436 @item months | |
1437 Return a 12-element vector of month names (locale items @code{MON_1} | |
1438 through @code{MON_12}). | |
1439 | |
1440 @item paper | |
1441 Return a list @code{(@var{width} @var{height})} for the default paper | |
1442 size measured in millimeters (locale items @code{PAPER_WIDTH} and | |
1443 @code{PAPER_HEIGHT}). | |
1444 @end table | |
1445 | |
1446 If the system can't provide the requested information, or if | |
1447 @var{item} is not one of those symbols, the value is @code{nil}. All | |
1448 strings in the return value are decoded using | |
1449 @code{locale-coding-system}. @xref{Locales,,, libc, The GNU Libc Manual}, | |
1450 for more information about locales and locale items. | |
1451 @end defun | |
1452 | |
1453 @ignore | |
1454 arch-tag: be705bf8-941b-4c35-84fc-ad7d20ddb7cb | |
1455 @end ignore |
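
The `locale-info` entry added in this revision can be exercised with a short sketch like the one below. It is not part of the changeset. It relies only on the behavior documented above, namely the four item symbols and a `nil` result when the system cannot supply the data. The `message` formatting is purely illustrative, and the values returned depend on the host system's locale.

```elisp
;; Illustrative sketch only: query each locale item documented above.
;; Any of these calls may return nil, so every result is checked
;; before it is used.
(let ((codeset (locale-info 'codeset))  ; string, or nil
      (days    (locale-info 'days))     ; 7-element vector, or nil
      (months  (locale-info 'months))   ; 12-element vector, or nil
      (paper   (locale-info 'paper)))   ; list (WIDTH HEIGHT) in mm, or nil
  (message "codeset: %S" codeset)
  (when (vectorp days)
    (message "DAY_1 name: %s" (aref days 0)))
  (when (vectorp months)
    (message "MON_1 name: %s" (aref months 0)))
  (when (consp paper)
    (message "paper: %d x %d mm" (nth 0 paper) (nth 1 paper))))
```

Guarding each result with `vectorp` or `consp` keeps the sketch safe on systems where a given item is unavailable and the function returns `nil`.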