comparison lispref/nonascii.texi @ 88155:d7ddb3e565de

sync with trunk
author Henrik Enberg <henrik.enberg@telia.com>
date Mon, 16 Jan 2006 00:03:54 +0000
parents 23a1cea22d13
children
88154:8ce476d3ba36 88155:d7ddb3e565de
1 @c -*-texinfo-*- 1 @c -*-texinfo-*-
2 @c This is part of the GNU Emacs Lisp Reference Manual. 2 @c This is part of the GNU Emacs Lisp Reference Manual.
3 @c Copyright (C) 1998, 1999 Free Software Foundation, Inc. 3 @c Copyright (C) 1998, 1999, 2002, 2003, 2004,
4 @c 2005 Free Software Foundation, Inc.
4 @c See the file elisp.texi for copying conditions. 5 @c See the file elisp.texi for copying conditions.
5 @setfilename ../info/characters 6 @setfilename ../info/characters
6 @node Non-ASCII Characters, Searching and Matching, Text, Top 7 @node Non-ASCII Characters, Searching and Matching, Text, Top
7 @chapter Non-@sc{ascii} Characters 8 @chapter Non-@acronym{ASCII} Characters
8 @cindex multibyte characters 9 @cindex multibyte characters
9 @cindex non-@sc{ascii} characters 10 @cindex non-@acronym{ASCII} characters
10 11
11 This chapter covers the special issues relating to non-@sc{ascii} 12 This chapter covers the special issues relating to non-@acronym{ASCII}
12 characters and how they are stored in strings and buffers. 13 characters and how they are stored in strings and buffers.
13 14
14 @menu 15 @menu
15 * Text Representations:: Unibyte and multibyte representations 16 * Text Representations:: Unibyte and multibyte representations
16 * Converting Representations:: Converting unibyte to multibyte and vice versa. 17 * Converting Representations:: Converting unibyte to multibyte and vice versa.
17 * Selecting a Representation:: Treating a byte sequence as unibyte or multi. 18 * Selecting a Representation:: Treating a byte sequence as unibyte or multi.
18 * Character Codes:: How unibyte and multibyte relate to 19 * Character Codes:: How unibyte and multibyte relate to
19 codes of individual characters. 20 codes of individual characters.
20 * Character Sets:: The space of possible characters codes 21 * Character Sets:: The space of possible character codes
21 is divided into various character sets. 22 is divided into various character sets.
22 * Chars and Bytes:: More information about multibyte encodings. 23 * Chars and Bytes:: More information about multibyte encodings.
23 * Splitting Characters:: Converting a character to its byte sequence. 24 * Splitting Characters:: Converting a character to its byte sequence.
24 * Scanning Charsets:: Which character sets are used in a buffer? 25 * Scanning Charsets:: Which character sets are used in a buffer?
25 * Translation of Characters:: Translation tables are used for conversion. 26 * Translation of Characters:: Translation tables are used for conversion.
42 attention to the difference. 43 attention to the difference.
43 44
44 @cindex unibyte text 45 @cindex unibyte text
45 In unibyte representation, each character occupies one byte and 46 In unibyte representation, each character occupies one byte and
46 therefore the possible character codes range from 0 to 255. Codes 0 47 therefore the possible character codes range from 0 to 255. Codes 0
47 through 127 are @sc{ascii} characters; the codes from 128 through 255 48 through 127 are @acronym{ASCII} characters; the codes from 128 through 255
48 are used for one non-@sc{ascii} character set (you can choose which 49 are used for one non-@acronym{ASCII} character set (you can choose which
49 character set by setting the variable @code{nonascii-insert-offset}). 50 character set by setting the variable @code{nonascii-insert-offset}).
50 51
51 @cindex leading code 52 @cindex leading code
52 @cindex multibyte text 53 @cindex multibyte text
53 @cindex trailing codes 54 @cindex trailing codes
93 default value to @code{nil} early in startup. 94 default value to @code{nil} early in startup.
94 @end defvar 95 @end defvar
95 96
96 @defun position-bytes position 97 @defun position-bytes position
97 @tindex position-bytes 98 @tindex position-bytes
98 Return the byte-position corresponding to buffer position @var{position} 99 Return the byte-position corresponding to buffer position
99 in the current buffer. 100 @var{position} in the current buffer. This is 1 at the start of the
101 buffer, and counts upward in bytes. If @var{position} is out of
102 range, the value is @code{nil}.
100 @end defun 103 @end defun
101 104
102 @defun byte-to-position byte-position 105 @defun byte-to-position byte-position
103 @tindex byte-to-position 106 @tindex byte-to-position
104 Return the buffer position corresponding to byte-position 107 Return the buffer position corresponding to byte-position
105 @var{byte-position} in the current buffer. 108 @var{byte-position} in the current buffer. If @var{byte-position} is
109 out of range, the value is @code{nil}.
106 @end defun 110 @end defun
107 111
108 @defun multibyte-string-p string 112 @defun multibyte-string-p string
109 Return @code{t} if @var{string} is a multibyte string. 113 Return @code{t} if @var{string} is a multibyte string.
110 @end defun 114 @end defun
132 the characters that might be in the multibyte text. The other natural 136 the characters that might be in the multibyte text. The other natural
133 alternative, to convert the buffer contents to multibyte, is not 137 alternative, to convert the buffer contents to multibyte, is not
134 acceptable because the buffer's representation is a choice made by the 138 acceptable because the buffer's representation is a choice made by the
135 user that cannot be overridden automatically. 139 user that cannot be overridden automatically.
136 140
137 Converting unibyte text to multibyte text leaves @sc{ascii} characters 141 Converting unibyte text to multibyte text leaves @acronym{ASCII} characters
138 unchanged, and likewise character codes 128 through 159. It converts 142 unchanged, and likewise character codes 128 through 159. It converts
139 the non-@sc{ascii} codes 160 through 255 by adding the value 143 the non-@acronym{ASCII} codes 160 through 255 by adding the value
140 @code{nonascii-insert-offset} to each character code. By setting this 144 @code{nonascii-insert-offset} to each character code. By setting this
141 variable, you specify which character set the unibyte characters 145 variable, you specify which character set the unibyte characters
142 correspond to (@pxref{Character Sets}). For example, if 146 correspond to (@pxref{Character Sets}). For example, if
143 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char 147 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char
144 'latin-iso8859-1) 128)}, then the unibyte non-@sc{ascii} characters 148 'latin-iso8859-1) 128)}, then the unibyte non-@acronym{ASCII} characters
145 correspond to Latin 1. If it is 2688, which is @code{(- (make-char 149 correspond to Latin 1. If it is 2688, which is @code{(- (make-char
146 'greek-iso8859-7) 128)}, then they correspond to Greek letters. 150 'greek-iso8859-7) 128)}, then they correspond to Greek letters.
147 151
148 Converting multibyte text to unibyte is simpler: it discards all but 152 Converting multibyte text to unibyte is simpler: it discards all but
149 the low 8 bits of each character code. If @code{nonascii-insert-offset} 153 the low 8 bits of each character code. If @code{nonascii-insert-offset}
151 set, this conversion is the inverse of the other: converting unibyte 155 set, this conversion is the inverse of the other: converting unibyte
152 text to multibyte and back to unibyte reproduces the original unibyte 156 text to multibyte and back to unibyte reproduces the original unibyte
153 text. 157 text.
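
The arithmetic described above can be sketched outside Emacs. The following Python model is only an illustration, assuming the Latin-1 offset of 2048 given in the text. The function names are hypothetical and are not Emacs primitives, and the round trip is checked only for this one offset.

```python
# Illustrative model of the conversion arithmetic described above.
# Not Emacs code: the function names are hypothetical.

LATIN_1_OFFSET = 2048  # (- (make-char 'latin-iso8859-1) 128), per the text


def unibyte_to_multibyte(code, offset=LATIN_1_OFFSET):
    """Map a unibyte code (0-255) to a multibyte character code."""
    if not 0 <= code <= 255:
        raise ValueError("unibyte codes range from 0 to 255")
    if code < 160:
        # ASCII (0-127) and codes 128-159 are left unchanged.
        return code
    # Codes 160-255 are shifted by the offset.
    return code + offset


def multibyte_to_unibyte(code):
    """Keep only the low 8 bits of a character code."""
    return code & 0xFF


# Round trip over every unibyte code, using the Latin-1 offset only.
round_trip_ok = all(
    multibyte_to_unibyte(unibyte_to_multibyte(c)) == c for c in range(256)
)
```

Under this model, unibyte code 200 becomes 2248, and keeping the low 8 bits of 2248 gives 200 again.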
154 158
155 @defvar nonascii-insert-offset 159 @defvar nonascii-insert-offset
156 This variable specifies the amount to add to a non-@sc{ascii} character 160 This variable specifies the amount to add to a non-@acronym{ASCII} character
157 when converting unibyte text to multibyte. It also applies when 161 when converting unibyte text to multibyte. It also applies when
158 @code{self-insert-command} inserts a character in the unibyte 162 @code{self-insert-command} inserts a character in the unibyte
159 non-@sc{ascii} range, 128 through 255. However, the functions 163 non-@acronym{ASCII} range, 128 through 255. However, the functions
160 @code{insert} and @code{insert-char} do not perform this conversion. 164 @code{insert} and @code{insert-char} do not perform this conversion.
161 165
162 The right value to use to select character set @var{cs} is @code{(- 166 The right value to use to select character set @var{cs} is @code{(-
163 (make-char @var{cs}) 128)}. If the value of 167 (make-char @var{cs}) 128)}. If the value of
164 @code{nonascii-insert-offset} is zero, then conversion actually uses the 168 @code{nonascii-insert-offset} is zero, then conversion actually uses the
170 @code{nonascii-insert-offset}. You can use it to specify independently 174 @code{nonascii-insert-offset}. You can use it to specify independently
171 how to translate each code in the range of 128 through 255 into a 175 how to translate each code in the range of 128 through 255 into a
172 multibyte character. The value should be a char-table, or @code{nil}. 176 multibyte character. The value should be a char-table, or @code{nil}.
173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}. 177 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
174 @end defvar 178 @end defvar
179
180 The next three functions either return the argument @var{string}, or a
181 newly created string with no text properties.
175 182
176 @defun string-make-unibyte string 183 @defun string-make-unibyte string
177 This function converts the text of @var{string} to unibyte 184 This function converts the text of @var{string} to unibyte
178 representation, if it isn't already, and returns the result. If 185 representation, if it isn't already, and returns the result. If
179 @var{string} is a unibyte string, it is returned unchanged. Multibyte 186 @var{string} is a unibyte string, it is returned unchanged. Multibyte
184 @end defun 191 @end defun
185 192
186 @defun string-make-multibyte string 193 @defun string-make-multibyte string
187 This function converts the text of @var{string} to multibyte 194 This function converts the text of @var{string} to multibyte
188 representation, if it isn't already, and returns the result. If 195 representation, if it isn't already, and returns the result. If
189 @var{string} is a multibyte string, it is returned unchanged. 196 @var{string} is a multibyte string or consists entirely of
190 The function @code{unibyte-char-to-multibyte} is used to convert 197 @acronym{ASCII} characters, it is returned unchanged. In particular,
191 each unibyte character to a multibyte character. 198 if @var{string} is unibyte and entirely @acronym{ASCII}, the returned
199 string is unibyte. (When the characters are all @acronym{ASCII},
200 Emacs primitives will treat the string the same way whether it is
201 unibyte or multibyte.) If @var{string} is unibyte and contains
202 non-@acronym{ASCII} characters, the function
203 @code{unibyte-char-to-multibyte} is used to convert each unibyte
204 character to a multibyte character.
205 @end defun
206
207 @defun string-to-multibyte string
208 This function returns a multibyte string containing the same sequence
209 of character codes as @var{string}. Unlike
210 @code{string-make-multibyte}, this function unconditionally returns a
211 multibyte string. If @var{string} is a multibyte string, it is
212 returned unchanged.
213 @end defun
214
215 @defun multibyte-char-to-unibyte char
216 This converts the multibyte character @var{char} to a unibyte
217 character, based on @code{nonascii-translation-table} and
218 @code{nonascii-insert-offset}.
219 @end defun
220
221 @defun unibyte-char-to-multibyte char
222 This converts the unibyte character @var{char} to a multibyte
223 character, based on @code{nonascii-translation-table} and
224 @code{nonascii-insert-offset}.
192 @end defun 225 @end defun
193 226
194 @node Selecting a Representation 227 @node Selecting a Representation
195 @section Selecting a Representation 228 @section Selecting a Representation
196 229
227 more characters than @var{string} has. 260 more characters than @var{string} has.
228 261
229 If @var{string} is already a unibyte string, then the value is 262 If @var{string} is already a unibyte string, then the value is
230 @var{string} itself. Otherwise it is a newly created string, with no 263 @var{string} itself. Otherwise it is a newly created string, with no
231 text properties. If @var{string} is multibyte, any characters it 264 text properties. If @var{string} is multibyte, any characters it
232 contains of charset @var{eight-bit-control} or @var{eight-bit-graphic} 265 contains of charset @code{eight-bit-control} or @code{eight-bit-graphic}
233 are converted to the corresponding single byte. 266 are converted to the corresponding single byte.
234 @end defun 267 @end defun
235 268
236 @defun string-as-multibyte string 269 @defun string-as-multibyte string
237 This function returns a string with the same bytes as @var{string} but 270 This function returns a string with the same bytes as @var{string} but
240 273
241 If @var{string} is already a multibyte string, then the value is 274 If @var{string} is already a multibyte string, then the value is
242 @var{string} itself. Otherwise it is a newly created string, with no 275 @var{string} itself. Otherwise it is a newly created string, with no
243 text properties. If @var{string} is unibyte and contains any individual 276 text properties. If @var{string} is unibyte and contains any individual
244 8-bit bytes (i.e.@: not part of a multibyte form), they are converted to 277 8-bit bytes (i.e.@: not part of a multibyte form), they are converted to
245 the corresponding multibyte character of charset @var{eight-bit-control} 278 the corresponding multibyte character of charset @code{eight-bit-control}
246 or @var{eight-bit-graphic}. 279 or @code{eight-bit-graphic}.
247 @end defun 280 @end defun
248 281
249 @node Character Codes 282 @node Character Codes
250 @section Character Codes 283 @section Character Codes
251 @cindex character codes 284 @cindex character codes
255 0 to 255---the values that can fit in one byte. The valid character 288 0 to 255---the values that can fit in one byte. The valid character
256 codes for multibyte representation range from 0 to 524287, but not all 289 codes for multibyte representation range from 0 to 524287, but not all
257 values in that range are valid. The values 128 through 255 are not 290 values in that range are valid. The values 128 through 255 are not
258 entirely proper in multibyte text, but they can occur if you do explicit 291 entirely proper in multibyte text, but they can occur if you do explicit
259 encoding and decoding (@pxref{Explicit Encoding}). Some other character 292 encoding and decoding (@pxref{Explicit Encoding}). Some other character
260 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes 293 codes cannot occur at all in multibyte text. Only the @acronym{ASCII} codes
261 0 through 127 are completely legitimate in both representations. 294 0 through 127 are completely legitimate in both representations.
262 295
263 @defun char-valid-p charcode &optional genericp 296 @defun char-valid-p charcode &optional genericp
264 This returns @code{t} if @var{charcode} is valid for either one of the two 297 This returns @code{t} if @var{charcode} is valid (either for unibyte
265 text representations. 298 text or for multibyte text).
266 299
267 @example 300 @example
268 (char-valid-p 65) 301 (char-valid-p 65)
269 @result{} t 302 @result{} t
270 (char-valid-p 256) 303 (char-valid-p 256)
271 @result{} nil 304 @result{} nil
272 (char-valid-p 2248) 305 (char-valid-p 2248)
273 @result{} t 306 @result{} t
274 @end example 307 @end example
275 308
276 If the optional argument @var{genericp} is non-nil, this function 309 If the optional argument @var{genericp} is non-@code{nil}, this
277 returns @code{t} if @var{charcode} is a generic character 310 function also returns @code{t} if @var{charcode} is a generic
278 (@pxref{Splitting Characters}). 311 character (@pxref{Splitting Characters}).
279 @end defun 312 @end defun
280 313
281 @node Character Sets 314 @node Character Sets
282 @section Character Sets 315 @section Character Sets
283 @cindex character sets 316 @cindex character sets
293 cases, characters that would logically be grouped together are split 326 cases, characters that would logically be grouped together are split
294 into several character sets. For example, one set of Chinese 327 into several character sets. For example, one set of Chinese
295 characters, generally known as Big 5, is divided into two Emacs 328 characters, generally known as Big 5, is divided into two Emacs
296 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}. 329 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.
297 330
298 @sc{ascii} characters are in character set @code{ascii}. The 331 @acronym{ASCII} characters are in character set @code{ascii}. The
299 non-@sc{ascii} characters 128 through 159 are in character set 332 non-@acronym{ASCII} characters 128 through 159 are in character set
300 @code{eight-bit-control}, and codes 160 through 255 are in character set 333 @code{eight-bit-control}, and codes 160 through 255 are in character set
301 @code{eight-bit-graphic}. 334 @code{eight-bit-graphic}.
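
As a rough illustration of the three ranges just listed, the Python sketch below maps a code from 0 through 255 to the character set name the text assigns to it. The function name is hypothetical, and the sketch covers only these three character sets.

```python
# Illustrative only: classify a code in 0-255 by the ranges given in the text.
# The function name is hypothetical, not an Emacs primitive.

def charset_of_8bit_code(code):
    """Return the character set name for a code in the range 0-255."""
    if not 0 <= code <= 255:
        raise ValueError("this sketch only covers codes 0 through 255")
    if code <= 127:
        return "ascii"
    if code <= 159:
        return "eight-bit-control"
    return "eight-bit-graphic"
```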
302 335
303 @defun charsetp object 336 @defun charsetp object
304 Returns @code{t} if @var{object} is a symbol that names a character set, 337 Returns @code{t} if @var{object} is a symbol that names a character set,
305 @code{nil} otherwise. 338 @code{nil} otherwise.
306 @end defun 339 @end defun
307 340
341 @defvar charset-list
342 The value is a list of all defined character set names.
343 @end defvar
344
308 @defun charset-list 345 @defun charset-list
309 This function returns a list of all defined character set names. 346 This function returns the value of @code{charset-list}. It is only
347 provided for backward compatibility.
310 @end defun 348 @end defun
311 349
312 @defun char-charset character 350 @defun char-charset character
313 This function returns the name of the character set that @var{character} 351 This function returns the name of the character set that @var{character}
314 belongs to. 352 belongs to, or the symbol @code{unknown} if @var{character} is not a
353 valid character.
315 @end defun 354 @end defun
316 355
317 @defun charset-plist charset 356 @defun charset-plist charset
318 @tindex charset-plist 357 @tindex charset-plist
319 This function returns the charset property list of the character set 358 This function returns the charset property list of the character set
320 @var{charset}. Although @var{charset} is a symbol, this is not the same 359 @var{charset}. Although @var{charset} is a symbol, this is not the same
321 as the property list of that symbol. Charset properties are used for 360 as the property list of that symbol. Charset properties are used for
322 special purposes within Emacs; for example, 361 special purposes within Emacs.
323 @code{preferred-coding-system} helps determine which coding system to 362 @end defun
324 use to encode characters in a charset. 363
325 @end defun 364 @deffn Command list-charset-chars charset
365 This command displays a list of characters in the character set
366 @var{charset}.
367 @end deffn
326 368
327 @node Chars and Bytes 369 @node Chars and Bytes
328 @section Characters and Bytes 370 @section Characters and Bytes
329 @cindex bytes and characters 371 @cindex bytes and characters
330 372
331 @cindex introduction sequence 373 @cindex introduction sequence
332 @cindex dimension (of character set) 374 @cindex dimension (of character set)
333 In multibyte representation, each character occupies one or more 375 In multibyte representation, each character occupies one or more
334 bytes. Each character set has an @dfn{introduction sequence}, which is 376 bytes. Each character set has an @dfn{introduction sequence}, which is
335 normally one or two bytes long. (Exception: the @sc{ascii} character 377 normally one or two bytes long. (Exception: the @code{ascii} character
336 set and the @sc{eight-bit-graphic} character set have a zero-length 378 set and the @code{eight-bit-graphic} character set have a zero-length
337 introduction sequence.) The introduction sequence is the beginning of 379 introduction sequence.) The introduction sequence is the beginning of
338 the byte sequence for any character in the character set. The rest of 380 the byte sequence for any character in the character set. The rest of
339 the character's bytes distinguish it from the other characters in the 381 the character's bytes distinguish it from the other characters in the
340 same character set. Depending on the character set, there are either 382 same character set. Depending on the character set, there are either
341 one or two distinguishing bytes; the number of such bytes is called the 383 one or two distinguishing bytes; the number of such bytes is called the
371 @defun split-char character 413 @defun split-char character
372 Return a list containing the name of the character set of 414 Return a list containing the name of the character set of
373 @var{character}, followed by one or two byte values (integers) which 415 @var{character}, followed by one or two byte values (integers) which
374 identify @var{character} within that character set. The number of byte 416 identify @var{character} within that character set. The number of byte
375 values is the character set's dimension. 417 values is the character set's dimension.
418
419 If @var{character} is invalid as a character code, @code{split-char}
420 returns a list consisting of the symbol @code{unknown} and @var{character}.
376 421
377 @example 422 @example
378 (split-char 2248) 423 (split-char 2248)
379 @result{} (latin-iso8859-1 72) 424 @result{} (latin-iso8859-1 72)
380 (split-char 65) 425 (split-char 65)
393 438
394 @example 439 @example
395 (make-char 'latin-iso8859-1 72) 440 (make-char 'latin-iso8859-1 72)
396 @result{} 2248 441 @result{} 2248
397 @end example 442 @end example
443
444 Actually, the eighth bit of both @var{code1} and @var{code2} is zeroed
445 before they are used to index @var{charset}. Thus you may use, for
446 instance, an ISO 8859 character code rather than subtracting 128, as
447 is necessary to index the corresponding Emacs charset.
398 @end defun 448 @end defun
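
The effect of zeroing the eighth bit can be modelled with a little arithmetic. The Python sketch below assumes that the characters of a dimension-1 character set are numbered consecutively from a base code, and takes 2176 as the base for `latin-iso8859-1` because the text pairs 2176 with `(latin-iso8859-1 0)` and 2248 with `(latin-iso8859-1 72)`. That layout is an assumption consistent with those two examples, and the names used are hypothetical.

```python
# Illustrative model of make-char for a dimension-1 charset.
# Assumption: character codes are BASE + position, as the examples suggest.

LATIN_1_BASE = 2176  # code the text pairs with (latin-iso8859-1 0)


def make_char_dim1(base, code1):
    """Combine a charset base code with a position code."""
    # The eighth bit is cleared, so 72 and 200 (72 + 128) are equivalent.
    return base + (code1 & 0x7F)
```

Under these assumptions, passing either 72 or the ISO 8859-1 code 200 yields 2248, which matches the `make-char` example above.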
399 449
400 @cindex generic characters 450 @cindex generic characters
401 If you call @code{make-char} with no @var{byte-values}, the result is 451 If you call @code{make-char} with no @var{byte-values}, the result is
402 a @dfn{generic character} which stands for @var{charset}. A generic 452 a @dfn{generic character} which stands for @var{charset}. A generic
415 @result{} t 465 @result{} t
416 (split-char 2176) 466 (split-char 2176)
417 @result{} (latin-iso8859-1 0) 467 @result{} (latin-iso8859-1 0)
418 @end example 468 @end example
419 469
420 The character sets @sc{ascii}, @sc{eight-bit-control}, and 470 The character sets @code{ascii}, @code{eight-bit-control}, and
421 @sc{eight-bit-graphic} don't have corresponding generic characters. If 471 @code{eight-bit-graphic} don't have corresponding generic characters. If
422 @var{charset} is one of them and you don't supply @var{code1}, 472 @var{charset} is one of them and you don't supply @var{code1},
423 @code{make-char} returns the character code corresponding to the 473 @code{make-char} returns the character code corresponding to the
424 smallest code in @var{charset}. 474 smallest code in @var{charset}.
425 475
426 @node Scanning Charsets 476 @node Scanning Charsets
428 478
429 Sometimes it is useful to find out which character sets appear in a 479 Sometimes it is useful to find out which character sets appear in a
430 part of a buffer or a string. One use for this is in determining which 480 part of a buffer or a string. One use for this is in determining which
431 coding systems (@pxref{Coding Systems}) are capable of representing all 481 coding systems (@pxref{Coding Systems}) are capable of representing all
432 of the text in question. 482 of the text in question.
483
484 @defun charset-after &optional pos
485 This function returns the charset of a character in the current buffer
486 at position @var{pos}. If @var{pos} is omitted or @code{nil}, it
487 defaults to the current value of point. If @var{pos} is out of range,
488 the value is @code{nil}.
489 @end defun
433 490
434 @defun find-charset-region beg end &optional translation 491 @defun find-charset-region beg end &optional translation
435 This function returns a list of the character sets that appear in the 492 This function returns a list of the character sets that appear in the
436 current buffer between positions @var{beg} and @var{end}. 493 current buffer between positions @var{beg} and @var{end}.
437 494
452 @node Translation of Characters 509 @node Translation of Characters
453 @section Translation of Characters 510 @section Translation of Characters
454 @cindex character translation tables 511 @cindex character translation tables
455 @cindex translation tables 512 @cindex translation tables
456 513
457 A @dfn{translation table} specifies a mapping of characters 514 A @dfn{translation table} is a char-table that specifies a mapping
458 into characters. These tables are used in encoding and decoding, and 515 of characters into characters. These tables are used in encoding and
459 for other purposes. Some coding systems specify their own particular 516 decoding, and for other purposes. Some coding systems specify their
460 translation tables; there are also default translation tables which 517 own particular translation tables; there are also default translation
461 apply to all other coding systems. 518 tables which apply to all other coding systems.
519
520 For instance, the coding-system @code{utf-8} has a translation table
521 that maps characters of various charsets (e.g.,
522 @code{latin-iso8859-@var{x}}) into Unicode character sets. This way,
523 it can encode Latin-2 characters into UTF-8. Meanwhile,
524 @code{unify-8859-on-decoding-mode} operates by specifying
525 @code{standard-translation-table-for-decode} to translate
526 Latin-@var{x} characters into corresponding Unicode characters.
462 527
463 @defun make-translation-table &rest translations 528 @defun make-translation-table &rest translations
464 This function returns a translation table based on the argument 529 This function returns a translation table based on the argument
465 @var{translations}. Each element of @var{translations} should be a 530 @var{translations}. Each element of @var{translations} should be a
466 list of elements of the form @code{(@var{from} . @var{to})}; this says 531 list of elements of the form @code{(@var{from} . @var{to})}; this says
472 @var{to-alt}. 537 @var{to-alt}.
473 538
474 You can also map one whole character set into another character set with 539 You can also map one whole character set into another character set with
475 the same dimension. To do this, you specify a generic character (which 540 the same dimension. To do this, you specify a generic character (which
476 designates a character set) for @var{from} (@pxref{Splitting Characters}). 541 designates a character set) for @var{from} (@pxref{Splitting Characters}).
477 In this case, @var{to} should also be a generic character, for another 542 In this case, if @var{to} is also a generic character, its character
478 character set of the same dimension. Then the translation table 543 set should have the same dimension as @var{from}'s. Then the
479 translates each character of @var{from}'s character set into the 544 translation table translates each character of @var{from}'s character
480 corresponding character of @var{to}'s character set. 545 set into the corresponding character of @var{to}'s character set. If
546 @var{from} is a generic character and @var{to} is an ordinary
547 character, then the translation table translates every character of
548 @var{from}'s character set into @var{to}.
481 @end defun 549 @end defun
482 550
483 In decoding, the translation table's translations are applied to the 551 In decoding, the translation table's translations are applied to the
484 characters that result from ordinary decoding. If a coding system has 552 characters that result from ordinary decoding. If a coding system has
485 property @code{character-translation-table-for-decode}, that specifies 553 property @code{translation-table-for-decode}, that specifies the
486 the translation table to use. Otherwise, if 554 translation table to use. (This is a property of the coding system,
487 @code{standard-translation-table-for-decode} is non-@code{nil}, decoding 555 as returned by @code{coding-system-get}, not a property of the symbol
488 uses that table. 556 that is the coding system's name. @xref{Coding System Basics,, Basic
557 Concepts of Coding Systems}.) Otherwise, if
558 @code{standard-translation-table-for-decode} is non-@code{nil},
559 decoding uses that table.
489 560
490 In encoding, the translation table's translations are applied to the 561 In encoding, the translation table's translations are applied to the
491 characters in the buffer, and the result of translation is actually 562 characters in the buffer, and the result of translation is actually
492 encoded. If a coding system has property 563 encoded. If a coding system has property
493 @code{character-translation-table-for-encode}, that specifies the 564 @code{translation-table-for-encode}, that specifies the translation
494 translation table to use. Otherwise the variable 565 table to use. Otherwise the variable
495 @code{standard-translation-table-for-encode} specifies the translation 566 @code{standard-translation-table-for-encode} specifies the translation
496 table. 567 table.
497 568
498 @defvar standard-translation-table-for-decode 569 @defvar standard-translation-table-for-decode
499 This is the default translation table for decoding, for 570 This is the default translation table for decoding, for
501 @end defvar 572 @end defvar
502 573
503 @defvar standard-translation-table-for-encode 574 @defvar standard-translation-table-for-encode
504 This is the default translation table for encoding, for 575 This is the default translation table for encoding, for
505 coding systems that don't specify any other translation table. 576 coding systems that don't specify any other translation table.
577 @end defvar
578
579 @defvar translation-table-for-input
580 Self-inserting characters are translated through this translation
581 table before they are inserted. This variable automatically becomes
582 buffer-local when set.
583
584 @code{set-buffer-file-coding-system} sets this variable so that your
585 keyboard input gets translated into the character sets that the buffer
586 is likely to contain.
506 @end defvar 587 @end defvar
507 588
508 @node Coding Systems 589 @node Coding Systems
509 @section Coding Systems 590 @section Coding Systems
510 591
546 627
547 Most coding systems specify a particular character code for 628 Most coding systems specify a particular character code for
548 conversion, but some of them leave the choice unspecified---to be chosen 629 conversion, but some of them leave the choice unspecified---to be chosen
549 heuristically for each file, based on the data. 630 heuristically for each file, based on the data.
550 631
632 In general, a coding system doesn't guarantee roundtrip identity:
633 decoding a byte sequence using a coding system, then encoding the
634 resulting text in the same coding system, can produce a different byte
635 sequence. However, the following coding systems do guarantee that the
636 byte sequence will be the same as what you originally decoded:
637
638 @quotation
639 chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
640 greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
641 iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
642 japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
643 @end quotation
644
645 Encoding buffer text and then decoding the result can also fail to
646 reproduce the original text. For instance, if you encode Latin-2
647 characters with @code{utf-8} and decode the result using the same
648 coding system, you'll get Unicode characters (of charset
649 @code{mule-unicode-0100-24ff}). If you encode Unicode characters with
650 @code{iso-latin-2} and decode the result with the same coding system,
651 you'll get Latin-2 characters.
652
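  As a minimal sketch of the roundtrip property, the following form
decodes a short byte sequence with @code{iso-latin-1}, one of the
coding systems listed above, encodes the result again, and compares
the two byte sequences.  The byte values are only an illustration;
with a coding system that is not in the list, the comparison is not
guaranteed to succeed.

@example
;; "caf\351" is a unibyte string; \351 is the Latin-1 byte for e-acute.
(let* ((bytes "caf\351")
       (text  (decode-coding-string bytes 'iso-latin-1))
       (again (encode-coding-string text 'iso-latin-1)))
  ;; Non-nil when encoding reproduces the original bytes.
  (string= bytes again))
@end example
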
551 @cindex end of line conversion 653 @cindex end of line conversion
552 @dfn{End of line conversion} handles three different conventions used 654 @dfn{End of line conversion} handles three different conventions used
553 on various systems for representing end of line in files. The Unix 655 on various systems for representing end of line in files. The Unix
554 convention is to use the linefeed character (also called newline). The 656 convention is to use the linefeed character (also called newline). The
555 DOS convention is to use a carriage-return and a linefeed at the end of 657 DOS convention is to use a carriage-return and a linefeed at the end of
604 writing files. The function @code{insert-file-contents} uses 706 writing files. The function @code{insert-file-contents} uses
605 a coding system for decoding the file data, and @code{write-region} 707 a coding system for decoding the file data, and @code{write-region}
606 uses one to encode the buffer contents. 708 uses one to encode the buffer contents.
607 709
608 You can specify the coding system to use either explicitly 710 You can specify the coding system to use either explicitly
609 (@pxref{Specifying Coding Systems}), or implicitly using the defaulting 711 (@pxref{Specifying Coding Systems}), or implicitly using a default
610 mechanism (@pxref{Default Coding Systems}). But these methods may not 712 mechanism (@pxref{Default Coding Systems}). But these methods may not
611 completely specify what to do. For example, they may choose a coding 713 completely specify what to do. For example, they may choose a coding
612 system such as @code{undecided} which leaves the character code 714 system such as @code{undecided} which leaves the character code
613 conversion to be determined from the data. In these cases, the I/O 715 conversion to be determined from the data. In these cases, the I/O
614 operation finishes the job of choosing a coding system. Very often 716 operation finishes the job of choosing a coding system. Very often
615 you will want to find out afterwards which coding system was chosen. 717 you will want to find out afterwards which coding system was chosen.
616 718
617 @defvar buffer-file-coding-system 719 @defvar buffer-file-coding-system
618 This variable records the coding system that was used for visiting the 720 This buffer-local variable records the coding system that was used to visit
619 current buffer. It is used for saving the buffer, and for writing part 721 the current buffer. It is used for saving the buffer, and for writing part
620 of the buffer with @code{write-region}. If the text to be written 722 of the buffer with @code{write-region}. If the text to be written
621 cannot be safely encoded using the coding system specified by this 723 cannot be safely encoded using the coding system specified by this
622 variable, these operations select an alternative encoding by calling 724 variable, these operations select an alternative encoding by calling
623 the function @code{select-safe-coding-system} (@pxref{User-Chosen 725 the function @code{select-safe-coding-system} (@pxref{User-Chosen
624 Coding Systems}). If selecting a different encoding requires to ask 726 Coding Systems}). If selecting a different encoding requires to ask
656 @end defvar 758 @end defvar
657 759
658 The variable @code{selection-coding-system} specifies how to encode 760 The variable @code{selection-coding-system} specifies how to encode
659 selections for the window system. @xref{Window System Selections}. 761 selections for the window system. @xref{Window System Selections}.
660 762
763 @defvar file-name-coding-system
764 The variable @code{file-name-coding-system} specifies the coding
765 system to use for encoding file names. Emacs encodes file names using
766 that coding system for all file operations. If
767 @code{file-name-coding-system} is @code{nil}, Emacs uses a default
768 coding system determined by the selected language environment. In the
769 default language environment, any non-@acronym{ASCII} characters in
770 file names are not encoded specially; they appear in the file system
771 using the internal Emacs representation.
772 @end defvar
773
774 @strong{Warning:} if you change @code{file-name-coding-system} (or
775 the language environment) in the middle of an Emacs session, problems
776 can result if you have already visited files whose names were encoded
777 using the earlier coding system and are handled differently under the
778 new coding system. If you try to save one of these buffers under the
779 visited file name, saving may use the wrong file name, or it may get
780 an error. If such a problem happens, use @kbd{C-x C-w} to specify a
781 new file name for that buffer.
782
661 @node Lisp and Coding Systems 783 @node Lisp and Coding Systems
662 @subsection Coding Systems in Lisp 784 @subsection Coding Systems in Lisp
663 785
664 Here are the Lisp facilities for working with coding systems: 786 Here are the Lisp facilities for working with coding systems:
665 787
670 systems as well. 792 systems as well.
671 @end defun 793 @end defun
672 794
673 @defun coding-system-p object 795 @defun coding-system-p object
674 This function returns @code{t} if @var{object} is a coding system 796 This function returns @code{t} if @var{object} is a coding system
675 name. 797 name or @code{nil}.
676 @end defun 798 @end defun
677 799
678 @defun check-coding-system coding-system 800 @defun check-coding-system coding-system
679 This function checks the validity of @var{coding-system}. 801 This function checks the validity of @var{coding-system}.
680 If that is valid, it returns @var{coding-system}. 802 If that is valid, it returns @var{coding-system}.
685 This function returns a coding system which is like @var{coding-system} 807 This function returns a coding system which is like @var{coding-system}
686 except for its eol conversion, which is specified by @var{eol-type}. 808 except for its eol conversion, which is specified by @var{eol-type}.
687 @var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or 809 @var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
688 @code{nil}. If it is @code{nil}, the returned coding system determines 810 @code{nil}. If it is @code{nil}, the returned coding system determines
689 the end-of-line conversion from the data. 811 the end-of-line conversion from the data.
812
813 @var{eol-type} may also be 0, 1 or 2, standing for @code{unix},
814 @code{dos} and @code{mac}, respectively.
690 @end defun 815 @end defun
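
  For illustration, the two calls below ask for the DOS end-of-line
variant of @code{iso-latin-1}, first with the symbol and then with the
equivalent integer.  This is only a sketch; the name of the returned
coding system (typically something like @code{iso-latin-1-dos}) may
depend on the Emacs version.

@example
;; Request CRLF end-of-line conversion by symbol.
(coding-system-change-eol-conversion 'iso-latin-1 'dos)

;; The integer 1 stands for `dos', so this should name the same variant.
(coding-system-change-eol-conversion 'iso-latin-1 1)
@end example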
691 816
692 @defun coding-system-change-text-conversion eol-coding text-coding 817 @defun coding-system-change-text-conversion eol-coding text-coding
693 This function returns a coding system which uses the end-of-line 818 This function returns a coding system which uses the end-of-line
694 conversion of @var{eol-coding}, and the text conversion of 819 conversion of @var{eol-coding}, and the text conversion of
728 handle decoding the text that was scanned. They are listed in order of 853 handle decoding the text that was scanned. They are listed in order of
729 decreasing priority. But if @var{highest} is non-@code{nil}, then the 854 decreasing priority. But if @var{highest} is non-@code{nil}, then the
730 return value is just one coding system, the one that is highest in 855 return value is just one coding system, the one that is highest in
731 priority. 856 priority.
732 857
733 If the region contains only @sc{ascii} characters, the value 858 If the region contains only @acronym{ASCII} characters, the value
734 is @code{undecided} or @code{(undecided)}. 859 is @code{undecided} or @code{(undecided)}, or a variant specifying
735 @end defun 860 end-of-line conversion, if that can be deduced from the text.
736 861 @end defun
737 @defun detect-coding-string string highest 862
863 @defun detect-coding-string string &optional highest
738 This function is like @code{detect-coding-region} except that it 864 This function is like @code{detect-coding-region} except that it
739 operates on the contents of @var{string} instead of bytes in the buffer. 865 operates on the contents of @var{string} instead of bytes in the buffer.
740 @end defun 866 @end defun
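
  The following sketch shows the two calling conventions on a string
of @acronym{ASCII} text.  The sample string is arbitrary, and the
exact values returned are not shown because they can vary with the
Emacs version and the language environment.

@example
;; Without HIGHEST, the value is a list of candidate coding systems.
(detect-coding-string "plain ASCII text")

;; With a non-nil HIGHEST, the value is a single coding system.
(detect-coding-string "plain ASCII text" t)
@end example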
741 867
742 @xref{Process Information}, for how to examine or set the coding 868 @xref{Coding systems for a subprocess,, Process Information}, in
743 systems used for I/O to a subprocess. 869 particular the description of the functions
870 @code{process-coding-system} and @code{set-process-coding-system}, for
871 how to examine or set the coding systems used for I/O to a subprocess.
744 872
745 @node User-Chosen Coding Systems 873 @node User-Chosen Coding Systems
746 @subsection User-Chosen Coding Systems 874 @subsection User-Chosen Coding Systems
747 875
748 @cindex select safe coding system 876 @cindex select safe coding system
749 @defun select-safe-coding-system from to &optional default-coding-system accept-default-p 877 @defun select-safe-coding-system from to &optional default-coding-system accept-default-p file
750 This function selects a coding system for encoding specified text, 878 This function selects a coding system for encoding specified text,
751 asking the user to choose if necessary. Normally the specified text 879 asking the user to choose if necessary. Normally the specified text
752 is the text in the current buffer between @var{from} and @var{to}, 880 is the text in the current buffer between @var{from} and @var{to}. If
753 defaulting to the whole buffer if they are @code{nil}. If @var{from} 881 @var{from} is a string, the string specifies the text to encode, and
754 is a string, the string specifies the text to encode, and @var{to} is 882 @var{to} is ignored.
755 ignored.
756 883
757 If @var{default-coding-system} is non-@code{nil}, that is the first 884 If @var{default-coding-system} is non-@code{nil}, that is the first
758 coding system to try; if that can handle the text, 885 coding system to try; if that can handle the text,
759 @code{select-safe-coding-system} returns that coding system. It can 886 @code{select-safe-coding-system} returns that coding system. It can
760 also be a list of coding systems; then the function tries each of them 887 also be a list of coding systems; then the function tries each of them
761 one by one. After trying all of them, it next tries the user's most 888 one by one. After trying all of them, it next tries the current
762 preferred coding system (@pxref{Recognize Coding, 889 buffer's value of @code{buffer-file-coding-system} (if it is not
763 prefer-coding-system, the description of @code{prefer-coding-system}, 890 @code{undecided}), then the value of
764 emacs, GNU Emacs Manual}), and after that the current buffer's value 891 @code{default-buffer-file-coding-system} and finally the user's most
765 of @code{buffer-file-coding-system} (if it is not @code{undecided}). 892 preferred coding system, which the user can set using the command
893 @code{prefer-coding-system} (@pxref{Recognize Coding,, Recognizing
894 Coding Systems, emacs, The GNU Emacs Manual}).
766 895
767 If one of those coding systems can safely encode all the specified 896 If one of those coding systems can safely encode all the specified
768 text, @code{select-safe-coding-system} chooses it and returns it. 897 text, @code{select-safe-coding-system} chooses it and returns it.
769 Otherwise, it asks the user to choose from a list of coding systems 898 Otherwise, it asks the user to choose from a list of coding systems
770 which can encode all the text, and returns the user's choice. 899 which can encode all the text, and returns the user's choice.
771 900
901 @var{default-coding-system} can also be a list whose first element is
902 @code{t} and whose other elements are coding systems. Then, if no coding
903 system in the list can handle the text, @code{select-safe-coding-system}
904 queries the user immediately, without trying any of the three
905 alternatives described above.
906
772 The optional argument @var{accept-default-p}, if non-@code{nil}, 907 The optional argument @var{accept-default-p}, if non-@code{nil},
773 should be a function to determine whether the coding system selected 908 should be a function to determine whether a coding system selected
774 without user interaction is acceptable. If this function returns 909 without user interaction is acceptable. @code{select-safe-coding-system}
775 @code{nil}, the silently selected coding system is rejected, and the 910 calls this function with one argument, the base coding system of the
776 user is asked to select a coding system from a list of possible 911 selected coding system. If @var{accept-default-p} returns @code{nil},
777 candidates. 912 @code{select-safe-coding-system} rejects the silently selected coding
913 system, and asks the user to select a coding system from a list of
914 possible candidates.
778 915
779 @vindex select-safe-coding-system-accept-default-p 916 @vindex select-safe-coding-system-accept-default-p
780 If the variable @code{select-safe-coding-system-accept-default-p} is 917 If the variable @code{select-safe-coding-system-accept-default-p} is
781 non-@code{nil}, its value overrides the value of 918 non-@code{nil}, its value overrides the value of
782 @var{accept-default-p}. 919 @var{accept-default-p}.
920
921 As a final step, before returning the chosen coding system,
922 @code{select-safe-coding-system} checks whether that coding system is
923 consistent with what would be selected if the contents of the region
924 were read from a file. (If not, this could lead to data corruption in
925 a file subsequently re-visited and edited.) Normally,
926 @code{select-safe-coding-system} uses @code{buffer-file-name} as the
927 file for this purpose, but if @var{file} is non-@code{nil}, it uses
928 that file instead (this can be relevant for @code{write-region} and
929 similar functions). If it detects an apparent inconsistency,
930 @code{select-safe-coding-system} queries the user before selecting the
931 coding system.
783 @end defun 932 @end defun
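
  As a hedged sketch of a typical call, the form below asks for a
coding system that can encode the whole current buffer, trying
@code{iso-latin-1} first.  The choice of @code{iso-latin-1} is only an
example; if it cannot encode the text, the function may go on to try
the other candidates described above, or query the user.

@example
;; Try iso-latin-1 first for the text of the current buffer.
(let ((coding (select-safe-coding-system (point-min) (point-max)
                                         'iso-latin-1)))
  (message "Selected coding system: %s" coding)
  coding)
@end example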
784 933
785 Here are two functions you can use to let the user specify a coding 934 Here are two functions you can use to let the user specify a coding
786 system, with completion. @xref{Completion}. 935 system, with completion. @xref{Completion}.
787 936
838 that coding system is used for both reading the file and writing it. If 987 that coding system is used for both reading the file and writing it. If
839 @var{coding} is a cons cell containing two coding systems, its @sc{car} 988 @var{coding} is a cons cell containing two coding systems, its @sc{car}
840 specifies the coding system for decoding, and its @sc{cdr} specifies the 989 specifies the coding system for decoding, and its @sc{cdr} specifies the
841 coding system for encoding. 990 coding system for encoding.
842 991
843 If @var{coding} is a function name, the function must return a coding 992 If @var{coding} is a function name, the function should take one
844 system or a cons cell containing two coding systems. This value is used 993 argument, a list of all arguments passed to
845 as described above. 994 @code{find-operation-coding-system}. It must return a coding system
995 or a cons cell containing two coding systems. This value has the same
996 meaning as described above.
846 @end defvar 997 @end defvar
847 998
848 @defvar process-coding-system-alist 999 @defvar process-coding-system-alist
849 This variable is an alist specifying which coding systems to use for a 1000 This variable is an alist specifying which coding systems to use for a
850 subprocess, depending on which program is running in the subprocess. It 1001 subprocess, depending on which program is running in the subprocess. It
885 The value should be a cons cell of the form @code{(@var{input-coding} 1036 The value should be a cons cell of the form @code{(@var{input-coding}
886 . @var{output-coding})}. Here @var{input-coding} applies to input from 1037 . @var{output-coding})}. Here @var{input-coding} applies to input from
887 the subprocess, and @var{output-coding} applies to output to it. 1038 the subprocess, and @var{output-coding} applies to output to it.
888 @end defvar 1039 @end defvar
889 1040
1041 @defvar auto-coding-functions
1042 This variable holds a list of functions that try to determine a
1043 coding system for a file based on its undecoded contents.
1044
1045 Each function in this list should be written to look at text in the
1046 current buffer, but should not modify it in any way. The buffer will
1047 contain undecoded text of parts of the file. Each function should
1048 take one argument, @var{size}, which tells it how many characters to
1049 look at, starting from point. If the function succeeds in determining
1050 a coding system for the file, it should return that coding system.
1051 Otherwise, it should return @code{nil}.
1052
1053 If a file has a @samp{coding:} tag, that takes precedence, so these
1054 functions won't be called.
1055 @end defvar
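
  Here is a minimal sketch of a function suitable for this list.  The
function name @code{my-auto-coding-xml-utf8} and the regular
expression are hypothetical, chosen only for illustration; the
function looks at no more than @var{size} characters after point,
leaves point and the buffer unchanged, and returns either a coding
system or @code{nil}.

@example
(defun my-auto-coding-xml-utf8 (size)
  "Return `utf-8' if the next SIZE characters declare that encoding."
  (let ((case-fold-search t)
        ;; Never search beyond SIZE characters or the end of the buffer.
        (limit (min (point-max) (+ (point) size))))
    (save-excursion
      (save-match-data
        (when (re-search-forward "encoding=[\"']utf-8[\"']" limit t)
          'utf-8)))))

;; Register the function so that it is consulted for files
;; that have no `coding:' tag.
(add-to-list 'auto-coding-functions 'my-auto-coding-xml-utf8)
@end example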
1056
890 @defun find-operation-coding-system operation &rest arguments 1057 @defun find-operation-coding-system operation &rest arguments
891 This function returns the coding system to use (by default) for 1058 This function returns the coding system to use (by default) for
892 performing @var{operation} with @var{arguments}. The value has this 1059 performing @var{operation} with @var{arguments}. The value has this
893 form: 1060 form:
894 1061
895 @example 1062 @example
896 (@var{decoding-system} @var{encoding-system}) 1063 (@var{decoding-system} . @var{encoding-system})
897 @end example 1064 @end example
898 1065
899 The first element, @var{decoding-system}, is the coding system to use 1066 The first element, @var{decoding-system}, is the coding system to use
900 for decoding (in case @var{operation} does decoding), and 1067 for decoding (in case @var{operation} does decoding), and
901 @var{encoding-system} is the coding system for encoding (in case 1068 @var{encoding-system} is the coding system for encoding (in case
902 @var{operation} does encoding). 1069 @var{operation} does encoding).
903 1070
904 The argument @var{operation} should be a symbol, one of 1071 The argument @var{operation} should be a symbol, any one of
905 @code{insert-file-contents}, @code{write-region}, @code{call-process}, 1072 @code{insert-file-contents}, @code{write-region},
906 @code{call-process-region}, @code{start-process}, or 1073 @code{start-process}, @code{call-process}, @code{call-process-region},
907 @code{open-network-stream}. These are the names of the Emacs I/O primitives 1074 or @code{open-network-stream}. These are the names of the Emacs I/O
908 that can do coding system conversion. 1075 primitives that can do coding system conversion.
909 1076
910 The remaining arguments should be the same arguments that might be given 1077 The remaining arguments should be the same arguments that might be given
911 to that I/O primitive. Depending on the primitive, one of those 1078 to that I/O primitive. Depending on the primitive, one of those
912 arguments is selected as the @dfn{target}. For example, if 1079 arguments is selected as the @dfn{target}. For example, if
913 @var{operation} does file I/O, whichever argument specifies the file 1080 @var{operation} does file I/O, whichever argument specifies the file
914 name is the target. For subprocess primitives, the process name is the 1081 name is the target. For subprocess primitives, the process name is the
915 target. For @code{open-network-stream}, the target is the service name 1082 target. For @code{open-network-stream}, the target is the service name
916 or port number. 1083 or port number.
917 1084
918 This function looks up the target in @code{file-coding-system-alist}, 1085 Depending on @var{operation}, this function looks up the target in
919 @code{process-coding-system-alist}, or 1086 @code{file-coding-system-alist}, @code{process-coding-system-alist},
920 @code{network-coding-system-alist}, depending on @var{operation}. 1087 or @code{network-coding-system-alist}.
921 @xref{Default Coding Systems}.
922 @end defun 1088 @end defun
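
  For example, the following sketch looks up the default coding
systems for reading a file.  The file name @file{notes.txt} is
hypothetical; the file need not exist for the lookup, and the value
can be @code{nil} if no entry in @code{file-coding-system-alist}
matches.

@example
(let ((systems (find-operation-coding-system
                'insert-file-contents "notes.txt")))
  ;; SYSTEMS is either nil or (DECODING-SYSTEM . ENCODING-SYSTEM).
  (when systems
    (message "Decode with %s, encode with %s"
             (car systems) (cdr systems)))
  systems)
@end example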
923 1089
924 @node Specifying Coding Systems 1090 @node Specifying Coding Systems
925 @subsection Specifying a Coding System for One Operation 1091 @subsection Specifying a Coding System for One Operation
926 1092
943 you should not globally set it to any other value. Here is an example 1109 you should not globally set it to any other value. Here is an example
944 of the right way to use the variable: 1110 of the right way to use the variable:
945 1111
946 @example 1112 @example
947 ;; @r{Read the file with no character code conversion.} 1113 ;; @r{Read the file with no character code conversion.}
948 ;; @r{Assume @sc{crlf} represents end-of-line.} 1114 ;; @r{Assume @acronym{crlf} represents end-of-line.}
949 (let ((coding-system-for-write 'emacs-mule-dos)) 1115 (let ((coding-system-for-read 'emacs-mule-dos))
950 (insert-file-contents filename)) 1116 (insert-file-contents filename))
951 @end example 1117 @end example
952 1118
953 When its value is non-@code{nil}, @code{coding-system-for-read} takes 1119 When its value is non-@code{nil}, @code{coding-system-for-read} takes
954 precedence over all other methods of specifying a coding system to use for 1120 precedence over all other methods of specifying a coding system to use for
1008 Here are the functions to perform explicit encoding or decoding. The 1174 Here are the functions to perform explicit encoding or decoding. The
1009 encoding functions produce sequences of bytes; the decoding functions 1175 encoding functions produce sequences of bytes; the decoding functions
1010 are meant to operate on sequences of bytes. All of these functions 1176 are meant to operate on sequences of bytes. All of these functions
1011 discard text properties. 1177 discard text properties.
1012 1178
1013 @defun encode-coding-region start end coding-system 1179 @deffn Command encode-coding-region start end coding-system
1014 This function encodes the text from @var{start} to @var{end} according 1180 This command encodes the text from @var{start} to @var{end} according
1015 to coding system @var{coding-system}. The encoded text replaces the 1181 to coding system @var{coding-system}. The encoded text replaces the
1016 original text in the buffer. The result of encoding is logically a 1182 original text in the buffer. The result of encoding is logically a
1017 sequence of bytes, but the buffer remains multibyte if it was multibyte 1183 sequence of bytes, but the buffer remains multibyte if it was multibyte
1018 before. 1184 before.
1019 @end defun 1185
1020 1186 This command returns the length of the encoded text.
1021 @defun encode-coding-string string coding-system 1187 @end deffn
1188
1189 @defun encode-coding-string string coding-system &optional nocopy
1022 This function encodes the text in @var{string} according to coding 1190 This function encodes the text in @var{string} according to coding
1023 system @var{coding-system}. It returns a new string containing the 1191 system @var{coding-system}. It returns a new string containing the
1024 encoded text. The result of encoding is a unibyte string. 1192 encoded text, except when @var{nocopy} is non-@code{nil}, in which
1025 @end defun 1193 case the function may return @var{string} itself if the encoding
1026 1194 operation is trivial. The result of encoding is a unibyte string.
1027 @defun decode-coding-region start end coding-system 1195 @end defun
1028 This function decodes the text from @var{start} to @var{end} according 1196
1197 @deffn Command decode-coding-region start end coding-system
1198 This command decodes the text from @var{start} to @var{end} according
1029 to coding system @var{coding-system}. The decoded text replaces the 1199 to coding system @var{coding-system}. The decoded text replaces the
1030 original text in the buffer. To make explicit decoding useful, the text 1200 original text in the buffer. To make explicit decoding useful, the text
1031 before decoding ought to be a sequence of byte values, but both 1201 before decoding ought to be a sequence of byte values, but both
1032 multibyte and unibyte buffers are acceptable. 1202 multibyte and unibyte buffers are acceptable.
1033 @end defun 1203
1034 1204 This command returns the length of the decoded text.
1035 @defun decode-coding-string string coding-system 1205 @end deffn
1206
1207 @defun decode-coding-string string coding-system &optional nocopy
1036 This function decodes the text in @var{string} according to coding 1208 This function decodes the text in @var{string} according to coding
1037 system @var{coding-system}. It returns a new string containing the 1209 system @var{coding-system}. It returns a new string containing the
1038 decoded text. To make explicit decoding useful, the contents of 1210 decoded text, except when @var{nocopy} is non-@code{nil}, in which
1039 @var{string} ought to be a sequence of byte values, but a multibyte 1211 case the function may return @var{string} itself if the decoding
1212 operation is trivial. To make explicit decoding useful, the contents
1213 of @var{string} ought to be a sequence of byte values, but a multibyte
1040 string is acceptable. 1214 string is acceptable.
1215 @end defun
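
  The string functions can be combined to convert bytes from one
encoding to another.  The sketch below, which uses arbitrary sample
bytes, decodes a Latin-1 byte sequence into text and then encodes that
text as UTF-8.  In a recent Emacs it should evaluate to
@code{(t nil 5)}: the decoded text is multibyte, the encoded result is
unibyte, and the accented character occupies two bytes in UTF-8.

@example
;; "caf\351" is a unibyte string; \351 is the Latin-1 byte for e-acute.
(let* ((latin1-bytes "caf\351")
       (text (decode-coding-string latin1-bytes 'iso-latin-1))
       (utf8-bytes (encode-coding-string text 'utf-8)))
  (list (multibyte-string-p text)
        (multibyte-string-p utf8-bytes)
        (length utf8-bytes)))
@end example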
1216
1217 @defun decode-coding-inserted-region from to filename &optional visit beg end replace
1218 This function decodes the text from @var{from} to @var{to} as if
1219 it were being read from file @var{filename} by @code{insert-file-contents},
1220 with the rest of the arguments provided.
1221
1222 The normal way to use this function is after reading text from a file
1223 without decoding, if you decide you would rather have decoded it.
1224 Instead of deleting the text and reading it again, this time with
1225 decoding, you can call this function.
1041 @end defun 1226 @end defun
1042 1227
1043 @node Terminal I/O Encoding 1228 @node Terminal I/O Encoding
1044 @subsection Terminal I/O Encoding 1229 @subsection Terminal I/O Encoding
1045 1230
1052 @defun keyboard-coding-system 1237 @defun keyboard-coding-system
1053 This function returns the coding system that is in use for decoding 1238 This function returns the coding system that is in use for decoding
1054 keyboard input---or @code{nil} if no coding system is to be used. 1239 keyboard input---or @code{nil} if no coding system is to be used.
1055 @end defun 1240 @end defun
1056 1241
1057 @defun set-keyboard-coding-system coding-system 1242 @deffn Command set-keyboard-coding-system coding-system
1058 This function specifies @var{coding-system} as the coding system to 1243 This command specifies @var{coding-system} as the coding system to
1059 use for decoding keyboard input. If @var{coding-system} is @code{nil}, 1244 use for decoding keyboard input. If @var{coding-system} is @code{nil},
1060 that means do not decode keyboard input. 1245 that means do not decode keyboard input.
1061 @end defun 1246 @end deffn
1062 1247
1063 @defun terminal-coding-system 1248 @defun terminal-coding-system
1064 This function returns the coding system that is in use for encoding 1249 This function returns the coding system that is in use for encoding
1065 terminal output---or @code{nil} for no encoding. 1250 terminal output---or @code{nil} for no encoding.
1066 @end defun 1251 @end defun
1067 1252
1068 @defun set-terminal-coding-system coding-system 1253 @deffn Command set-terminal-coding-system coding-system
1069 This function specifies @var{coding-system} as the coding system to use 1254 This command specifies @var{coding-system} as the coding system to use
1070 for encoding terminal output. If @var{coding-system} is @code{nil}, 1255 for encoding terminal output. If @var{coding-system} is @code{nil},
1071 that means do not encode terminal output. 1256 that means do not encode terminal output.
1072 @end defun 1257 @end deffn
1073 1258
1074 @node MS-DOS File Types 1259 @node MS-DOS File Types
1075 @subsection MS-DOS File Types 1260 @subsection MS-DOS File Types
1076 @cindex DOS file types 1261 @cindex DOS file types
1077 @cindex MS-DOS file types 1262 @cindex MS-DOS file types
1132 1317
1133 @node Input Methods 1318 @node Input Methods
1134 @section Input Methods 1319 @section Input Methods
1135 @cindex input methods 1320 @cindex input methods
1136 1321
1137 @dfn{Input methods} provide convenient ways of entering non-@sc{ascii} 1322 @dfn{Input methods} provide convenient ways of entering non-@acronym{ASCII}
1138 characters from the keyboard. Unlike coding systems, which translate 1323 characters from the keyboard. Unlike coding systems, which translate
1139 non-@sc{ascii} characters to and from encodings meant to be read by 1324 non-@acronym{ASCII} characters to and from encodings meant to be read by
1140 programs, input methods provide human-friendly commands. (@xref{Input 1325 programs, input methods provide human-friendly commands. (@xref{Input
1141 Methods,,, emacs, The GNU Emacs Manual}, for information on how users 1326 Methods,,, emacs, The GNU Emacs Manual}, for information on how users
1142 use input methods to enter text.) How to define input methods is not 1327 use input methods to enter text.) How to define input methods is not
1143 yet documented in this manual, but here we describe how to use them. 1328 yet documented in this manual, but here we describe how to use them.
1144 1329
1150 current buffer. (It automatically becomes local in each buffer when set 1335 current buffer. (It automatically becomes local in each buffer when set
1151 in any fashion.) It is @code{nil} if no input method is active in the 1336 in any fashion.) It is @code{nil} if no input method is active in the
1152 buffer now. 1337 buffer now.
1153 @end defvar 1338 @end defvar
1154 1339
1155 @defvar default-input-method 1340 @defopt default-input-method
1156 This variable holds the default input method for commands that choose an 1341 This variable holds the default input method for commands that choose an
1157 input method. Unlike @code{current-input-method}, this variable is 1342 input method. Unlike @code{current-input-method}, this variable is
1158 normally global. 1343 normally global.
1159 @end defvar 1344 @end defopt
1160 1345
1161 @defun set-input-method input-method 1346 @deffn Command set-input-method input-method
1162 This function activates input method @var{input-method} for the current 1347 This command activates input method @var{input-method} for the current
1163 buffer. It also sets @code{default-input-method} to @var{input-method}. 1348 buffer. It also sets @code{default-input-method} to @var{input-method}.
1164 If @var{input-method} is @code{nil}, this function deactivates any input 1349 If @var{input-method} is @code{nil}, this command deactivates any input
1165 method for the current buffer. 1350 method for the current buffer.
1166 @end defun 1351 @end deffn
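
  As an illustration, the sketch below activates an input method in a
temporary buffer and returns the resulting value of
@code{current-input-method}.  The name @code{"latin-1-prefix"} is an
assumption; it must be one of the input methods available in your
installation.  Because the command also sets
@code{default-input-method}, the sketch binds that variable with
@code{let} so that the user's default is restored afterwards.

@example
(let ((default-input-method default-input-method))
  (with-temp-buffer
    ;; Activate the input method for this buffer only.
    (set-input-method "latin-1-prefix")
    current-input-method))
@end example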
1167 1352
1168 @defun read-input-method-name prompt &optional default inhibit-null 1353 @defun read-input-method-name prompt &optional default inhibit-null
1169 This function reads an input method name with the minibuffer, prompting 1354 This function reads an input method name with the minibuffer, prompting
1170 with @var{prompt}. If @var{default} is non-@code{nil}, that is returned 1355 with @var{prompt}. If @var{default} is non-@code{nil}, that is returned
1171 by default, if the user enters empty input. However, if 1356 by default, if the user enters empty input. However, if
1197 active. @var{description} is a string describing this method and what 1382 active. @var{description} is a string describing this method and what
1198 it is good for. 1383 it is good for.
1199 @end defvar 1384 @end defvar
1200 1385
1201 The fundamental interface to input methods is through the 1386 The fundamental interface to input methods is through the
1202 variable @code{input-method-function}. @xref{Reading One Event}. 1387 variable @code{input-method-function}. @xref{Reading One Event},
1388 and @ref{Invoking the Input Method}.
1203 1389
1204 @node Locales 1390 @node Locales
1205 @section Locales 1391 @section Locales
1206 @cindex locale 1392 @cindex locale
1207 1393
1233 Changing the locale can cause messages to appear according to the 1419 Changing the locale can cause messages to appear according to the
1234 conventions of a different language. If the variable is @code{nil}, the 1420 conventions of a different language. If the variable is @code{nil}, the
1235 locale is specified by environment variables in the usual POSIX fashion. 1421 locale is specified by environment variables in the usual POSIX fashion.
1236 @end defvar 1422 @end defvar
1237 1423
1424 @defun locale-info item
1425 This function returns locale data @var{item} for the current POSIX
1426 locale, if available. @var{item} should be one of these symbols:
1427
1428 @table @code
1429 @item codeset
1430 Return the character set as a string (locale item @code{CODESET}).
1431
1432 @item days
1433 Return a 7-element vector of day names (locale items
1434 @code{DAY_1} through @code{DAY_7}).
1435
1436 @item months
1437 Return a 12-element vector of month names (locale items @code{MON_1}
1438 through @code{MON_12}).
1439
1440 @item paper
1441 Return a list @code{(@var{width} @var{height})} for the default paper
1442 size measured in millimeters (locale items @code{PAPER_WIDTH} and
1443 @code{PAPER_HEIGHT}).
1444 @end table
1445
1446 If the system can't provide the requested information, or if
1447 @var{item} is not one of those symbols, the value is @code{nil}. All
1448 strings in the return value are decoded using
1449 @code{locale-coding-system}. @xref{Locales,,, libc, The GNU Libc Manual},
1450 for more information about locales and locale items.
1451 @end defun
1452
1453 @ignore
1454 arch-tag: be705bf8-941b-4c35-84fc-ad7d20ddb7cb
1455 @end ignore