lispref/nonascii.texi @ 21006:00022857f529
Initial revision
author: Richard M. Stallman <rms@gnu.org>
date: Sat, 28 Feb 1998 01:49:58 +0000
children: 90da2489c498
@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1998 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/characters
@node Non-ASCII Characters, Searching and Matching, Text, Top
@chapter Non-ASCII Characters
@cindex multibyte characters
@cindex non-ASCII characters

This chapter covers the special issues relating to non-@sc{ASCII}
characters and how they are stored in strings and buffers.

@menu
* Text Representations::
* Converting Representations::
* Selecting a Representation::
* Character Codes::
* Character Sets::
* Scanning Charsets::
* Chars and Bytes::
* Coding Systems::
* Default Coding Systems::
* Specifying Coding Systems::
* Explicit Encoding::
@end menu

@node Text Representations
@section Text Representations
@cindex text representations

Emacs has two @dfn{text representations}---two ways to represent text
in a string or buffer. These are called @dfn{unibyte} and
@dfn{multibyte}. Each string, and each buffer, uses one of these two
representations. For most purposes, you can ignore the issue of
representations, because Emacs converts text between them as
appropriate. Occasionally in Lisp programming you will need to pay
attention to the difference.

@cindex unibyte text
In unibyte representation, each character occupies one byte and
therefore the possible character codes range from 0 to 255. Codes 0
through 127 are @sc{ASCII} characters; the codes from 128 through 255
are used for one non-@sc{ASCII} character set (you can choose which one
by setting the variable @code{nonascii-insert-offset}).

@cindex leading code
@cindex multibyte text
In multibyte representation, a character may occupy more than one
byte, and as a result, the full range of Emacs character codes can be
stored. The first byte of a multibyte character is always in the range
128 through 159 (octal 0200 through 0237). These values are called
@dfn{leading codes}. The first byte determines which character set the
character belongs to (@pxref{Character Sets}); in particular, it
determines how many bytes long the sequence is. The second and
subsequent bytes of a multibyte character are always in the range 160
through 255 (octal 0240 through 0377).
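
The byte ranges just described can be restated as a small classification function. This is only an illustration. The name @code{my-multibyte-byte-kind} is hypothetical and not part of Emacs, and the function knows nothing beyond the ranges given above.

@example
;; @r{Hypothetical helper, not an Emacs primitive: classify one byte}
;; @r{of multibyte text using the ranges described above.}
(defun my-multibyte-byte-kind (byte)
  (cond ((< byte 128) 'ascii)         ; @r{a complete @sc{ASCII} character}
        ((< byte 160) 'leading-code)  ; @r{first byte of a multibyte character}
        (t 'trailing-byte)))          ; @r{second or later byte}

(my-multibyte-byte-kind 65)
@result{} ascii
(my-multibyte-byte-kind 129)
@result{} leading-code
(my-multibyte-byte-kind 200)
@result{} trailing-byte
@end example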

In a buffer, the buffer-local value of the variable
@code{enable-multibyte-characters} specifies the representation used.
The representation for a string is determined based on the string
contents when the string is constructed.

@tindex enable-multibyte-characters
@defvar enable-multibyte-characters
This variable specifies the current buffer's text representation.
If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
it contains unibyte text.

@strong{Warning:} do not set this variable directly; instead, use the
function @code{set-buffer-multibyte} to change a buffer's
representation.
@end defvar

@tindex default-enable-multibyte-characters
@defvar default-enable-multibyte-characters
This variable's value is entirely equivalent to @code{(default-value
'enable-multibyte-characters)}, and setting this variable changes that
default value. Although setting the local binding of
@code{enable-multibyte-characters} in a specific buffer is dangerous,
changing the default value is safe, and it is a reasonable thing to do.

The @samp{--unibyte} command line option does its job by setting the
default value to @code{nil} early in startup.
@end defvar

@tindex multibyte-string-p
@defun multibyte-string-p string
Return @code{t} if @var{string} contains multibyte characters.
@end defun

@node Converting Representations
@section Converting Text Representations

Emacs can convert unibyte text to multibyte; it can also convert
multibyte text to unibyte, though this conversion loses information. In
general these conversions happen when inserting text into a buffer, or
when putting text from several strings together in one string. You can
also explicitly convert a string's contents to either representation.

Emacs chooses the representation for a string based on the text that
it is constructed from. The general rule is to convert unibyte text to
multibyte text when combining it with other multibyte text, because the
multibyte representation is more general and can hold whatever
characters the unibyte text has.

When inserting text into a buffer, Emacs converts the text to the
buffer's representation, as specified by
@code{enable-multibyte-characters} in that buffer. In particular, when
you insert multibyte text into a unibyte buffer, Emacs converts the text
to unibyte, even though this conversion cannot in general preserve all
the characters that might be in the multibyte text. The other natural
alternative, to convert the buffer contents to multibyte, is not
acceptable because the buffer's representation is a choice made by the
user that cannot simply be overridden.

Converting unibyte text to multibyte text leaves @sc{ASCII} characters
unchanged. It converts the non-@sc{ASCII} codes 128 through 255 by
adding the value @code{nonascii-insert-offset} to each character code.
By setting this variable, you specify which character set the unibyte
characters correspond to. For example, if @code{nonascii-insert-offset}
is 2048, which is @code{(- (make-char 'latin-iso8859-1 0) 128)}, then
the unibyte non-@sc{ASCII} characters correspond to Latin 1. If it is
2688, which is @code{(- (make-char 'greek-iso8859-7 0) 128)}, then they
correspond to Greek letters.

Converting multibyte text to unibyte is simpler: it performs a
logical AND (@code{logand}) of each character code with 255. If
@code{nonascii-insert-offset} has a reasonable value, corresponding to
the beginning of some character set, this conversion is the inverse of
the other: converting unibyte text to multibyte and back to unibyte
reproduces the original unibyte text.
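
Both conversion rules are plain arithmetic on character codes. The sketch below restates them in Lisp to make the round trip concrete. The two function names are hypothetical, because Emacs performs these conversions internally. The value 2048 is the Latin 1 offset mentioned above.

@example
;; @r{Hypothetical restatement of the unibyte-to-multibyte rule:}
;; @r{codes below 128 are unchanged, and codes 128 through 255}
;; @r{have @var{offset} added to them.}
(defun my-unibyte-code-to-multibyte (code offset)
  (if (< code 128)
      code
    (+ code offset)))

;; @r{Hypothetical restatement of the multibyte-to-unibyte rule.}
(defun my-multibyte-code-to-unibyte (code)
  (logand code 255))

(my-unibyte-code-to-multibyte 200 2048)
@result{} 2248
(my-multibyte-code-to-unibyte 2248)
@result{} 200
@end example

Since 2248 is the Latin 1 character used in later examples (@pxref{Chars and Bytes}), this offset makes the round trip from 200 to 2248 and back to 200 reproduce the original code.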

@tindex nonascii-insert-offset
@defvar nonascii-insert-offset
This variable specifies the amount to add to a non-@sc{ASCII} character
when converting unibyte text to multibyte. It also applies when
@code{insert-char} or @code{self-insert-command} inserts a character in
the unibyte non-@sc{ASCII} range, 128 through 255.

The right value to use to select character set @var{cs} is @code{(-
(make-char @var{cs} 0) 128)}. If the value of
@code{nonascii-insert-offset} is zero, then conversion actually uses the
value for the Latin 1 character set, rather than zero.
@end defvar

@tindex nonascii-translate-table
@defvar nonascii-translate-table
This variable provides a more general alternative to
@code{nonascii-insert-offset}. You can use it to specify independently
how to translate each code in the range of 128 through 255 into a
multibyte character. The value should be a vector, or @code{nil}.
@end defvar

@tindex string-make-unibyte
@defun string-make-unibyte string
This function converts the text of @var{string} to unibyte
representation, if it isn't already, and returns the result. If
conversion does not change the contents, the value may be @var{string}
itself.
@end defun

@tindex string-make-multibyte
@defun string-make-multibyte string
This function converts the text of @var{string} to multibyte
representation, if it isn't already, and returns the result. If
conversion does not change the contents, the value may be @var{string}
itself.
@end defun
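
For example, assume @code{nonascii-insert-offset} selects Latin 1, meaning its value is 2048 or 0. Under that assumption, a string holding the Latin 1 character 2248 converts to a unibyte string whose character is 200, following the logical AND rule above. Converting that result back to multibyte yields 2248 again. The variable @code{s} is used only for illustration.

@example
(setq s (char-to-string (make-char 'latin-iso8859-1 72)))
(aref (string-make-unibyte s) 0)
@result{} 200
(aref (string-make-multibyte (string-make-unibyte s)) 0)
@result{} 2248
@end example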

@node Selecting a Representation
@section Selecting a Representation

Sometimes it is useful to examine an existing buffer or string as
multibyte when it was unibyte, or vice versa.

@tindex set-buffer-multibyte
@defun set-buffer-multibyte multibyte
Set the representation type of the current buffer. If @var{multibyte}
is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
is @code{nil}, the buffer becomes unibyte.

This function leaves the buffer contents unchanged when viewed as a
sequence of bytes. As a consequence, it can change the contents viewed
as characters; a sequence of two bytes which is treated as one character
in multibyte representation will count as two characters in unibyte
representation.

This function sets @code{enable-multibyte-characters} to record which
representation is in use. It also adjusts various data in the buffer
(including its overlays, text properties and markers) so that they
cover or fall between the same text as they did before.
@end defun
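
The following sketch makes the bytes-versus-characters distinction concrete. It uses a scratch buffer named @samp{*bytes*}, though any name would do. The sketch inserts one Latin 1 character, which takes two bytes in multibyte representation (@pxref{Chars and Bytes}), and then switches the buffer to unibyte.

@example
(save-excursion
  (set-buffer (get-buffer-create "*bytes*"))
  (erase-buffer)
  (set-buffer-multibyte t)
  ;; @r{One character, stored as two bytes.}
  (insert (make-char 'latin-iso8859-1 72))
  (set-buffer-multibyte nil)
  ;; @r{The same two bytes now count as two characters.}
  (buffer-size))
@result{} 2
@end example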

@tindex string-as-unibyte
@defun string-as-unibyte string
This function returns a string with the same bytes as @var{string} but
treating each byte as a character. This means that the value may have
more characters than @var{string} has.

If @var{string} is unibyte already, then the value may be @var{string}
itself.
@end defun

@tindex string-as-multibyte
@defun string-as-multibyte string
This function returns a string with the same bytes as @var{string} but
treating each multibyte sequence as one character. This means that the
value may have fewer characters than @var{string} has.

If @var{string} is multibyte already, then the value may be @var{string}
itself.
@end defun
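
For instance, a one-character multibyte string holding a Latin 1 character contains two bytes. Viewing it as unibyte therefore doubles its length, and viewing the result as multibyte again restores the single character. The variable @code{s} is only for illustration.

@example
(setq s (char-to-string (make-char 'latin-iso8859-1 72)))
(length s)
@result{} 1
(length (string-as-unibyte s))
@result{} 2
(length (string-as-multibyte (string-as-unibyte s)))
@result{} 1
@end example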

@node Character Codes
@section Character Codes
@cindex character codes

The unibyte and multibyte text representations use different character
codes. The valid character codes for unibyte representation range from
0 to 255---the values that can fit in one byte. The valid character
codes for multibyte representation range from 0 to 524287, but not all
values in that range are valid. In particular, the values 128 through
255 are not valid in multibyte text. Only the @sc{ASCII} codes 0
through 127 are used in both representations.

@defun char-valid-p charcode
This returns @code{t} if @var{charcode} is valid for either one of the two
text representations.

@example
(char-valid-p 65)
@result{} t
(char-valid-p 256)
@result{} nil
(char-valid-p 2248)
@result{} t
@end example
@end defun

@node Character Sets
@section Character Sets
@cindex character sets

Emacs classifies characters into various @dfn{character sets}, each of
which has a name which is a symbol. Each character belongs to one and
only one character set.

In general, there is one character set for each distinct script. For
example, @code{latin-iso8859-1} is one character set,
@code{greek-iso8859-7} is another, and @code{ascii} is another. An
Emacs character set can hold at most 9025 characters; therefore, in some
cases, a set of characters that would logically be grouped together is
split into several character sets. For example, one set of Chinese
characters is divided into seven Emacs character sets,
@code{chinese-cns11643-1} through @code{chinese-cns11643-7}.

@tindex charsetp
@defun charsetp object
Return @code{t} if @var{object} is a character set name symbol,
@code{nil} otherwise.
@end defun

@tindex charset-list
@defun charset-list
This function returns a list of all defined character set names.
@end defun

@tindex char-charset
@defun char-charset character
This function returns the name of the character
set that @var{character} belongs to.
@end defun
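
The following calls show how these functions fit together. Each result follows from the definitions above. The Greek character is built with @code{make-char} (@pxref{Chars and Bytes}).

@example
(charsetp 'greek-iso8859-7)
@result{} t
(char-charset (make-char 'greek-iso8859-7 72))
@result{} greek-iso8859-7
(char-charset 65)
@result{} ascii
@end example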

@node Scanning Charsets
@section Scanning for Character Sets

Sometimes it is useful to find out which character sets appear in a
part of a buffer or a string. One use for this is in determining which
coding systems (@pxref{Coding Systems}) are capable of representing all
of the text in question.

@tindex find-charset-region
@defun find-charset-region beg end &optional unification
This function returns a list of the character sets
that appear in the current buffer between positions @var{beg}
and @var{end}.
@end defun

@tindex find-charset-string
@defun find-charset-string string &optional unification
This function returns a list of the character sets
that appear in the string @var{string}.
@end defun
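
As an example of the use mentioned above, here is a sketch of a test for whether a region needs anything beyond @sc{ASCII}. The function name @code{my-region-ascii-only-p} is hypothetical. The sketch does not pass the optional @var{unification} argument.

@example
;; @r{Hypothetical helper: return non-@code{nil} if the text between}
;; @r{@var{beg} and @var{end} uses no character set other than @code{ascii}.}
(defun my-region-ascii-only-p (beg end)
  (let ((sets (find-charset-region beg end)))
    ;; @r{Copy first, because @code{delq} may alter the list it is given.}
    (null (delq 'ascii (copy-sequence sets)))))
@end example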

@node Chars and Bytes
@section Characters and Bytes
@cindex bytes and characters

In multibyte representation, each character occupies one or more
bytes. The functions in this section convert between characters and the
byte values used to represent them.

@tindex char-bytes
@defun char-bytes character
This function returns the number of bytes used to represent the
character @var{character}. In most cases, this is the same as
@code{(length (split-char @var{character}))}; the only exception is for
@sc{ASCII} characters, which use just one byte.

@example
(char-bytes 2248)
@result{} 2
(char-bytes 65)
@result{} 1
@end example

This function's values are correct for both multibyte and unibyte
representations, because the non-@sc{ASCII} character codes used in
those two representations do not overlap.

@example
(char-bytes 192)
@result{} 1
@end example
@end defun

@tindex split-char
@defun split-char character
Return a list containing the name of the character set of
@var{character}, followed by one or two byte-values which identify
@var{character} within that character set.

@example
(split-char 2248)
@result{} (latin-iso8859-1 72)
(split-char 65)
@result{} (ascii 65)
@end example

Unibyte non-@sc{ASCII} characters are considered part of
the @code{ascii} character set:

@example
(split-char 192)
@result{} (ascii 192)
@end example
@end defun

@tindex make-char
@defun make-char charset &rest byte-values
This function returns the character in character set @var{charset}
identified by @var{byte-values}. This is roughly the opposite of
@code{split-char}.

@example
(make-char 'latin-iso8859-1 72)
@result{} 2248
@end example
@end defun
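
Because @code{make-char} is roughly the opposite of @code{split-char}, applying it to the list that @code{split-char} returns should reconstruct the original code. This holds at least for a multibyte character such as the one in the examples above:

@example
(apply 'make-char (split-char 2248))
@result{} 2248
@end example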

@node Coding Systems
@section Coding Systems

@cindex coding system
When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified
by a particular @dfn{coding system}.

@cindex character code conversion
@dfn{Character code conversion} involves conversion between the encoding
used inside Emacs and some other encoding. Emacs supports many
different encodings, in that it can convert to and from them. For
example, it can convert text to or from encodings such as Latin 1, Latin
2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
cases, Emacs supports several alternative encodings for the same
characters; for example, there are three coding systems for the Cyrillic
(Russian) alphabet: ISO, Alternativnyj, and KOI8.

@cindex end of line conversion
@dfn{End of line conversion} handles three different conventions used
on various systems for end of line. The Unix convention is to use the
linefeed character (also called newline). The DOS convention is to use
the two-character sequence, carriage-return linefeed, at the end of a
line. The Mac convention is to use just carriage-return.

Most coding systems specify a particular character code conversion,
but some of them leave this unspecified---to be chosen
heuristically based on the data.

@cindex base coding system
@cindex variant coding system
@dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
conversion unspecified, to be chosen based on the data. @dfn{Variant
coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
@code{latin-1-mac} specify the end-of-line conversion explicitly as
well. Each base coding system has three corresponding variants whose
names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.

Here are Lisp facilities for working with coding systems:

@tindex coding-system-list
@defun coding-system-list &optional base-only
This function returns a list of all coding system names (symbols). If
@var{base-only} is non-@code{nil}, the value includes only the
base coding systems. Otherwise, it includes variant coding systems as well.
@end defun
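
Each variant's name is formed mechanically from its base name, so ordinary string functions are enough to construct the three variant names. The function @code{coding-system-p}, described next, confirms that such a name is a coding system:

@example
(mapcar (lambda (eol) (intern (concat "latin-1" eol)))
        '("-unix" "-dos" "-mac"))
@result{} (latin-1-unix latin-1-dos latin-1-mac)
(coding-system-p 'latin-1-dos)
@result{} t
@end example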

@tindex coding-system-p
@defun coding-system-p object
This function returns @code{t} if @var{object} is a coding system
name.
@end defun

@tindex check-coding-system
@defun check-coding-system coding-system
This function checks the validity of @var{coding-system}.
If that is valid, it returns @var{coding-system}.
Otherwise it signals an error with condition @code{coding-system-error}.
@end defun

@tindex detect-coding-region
@defun detect-coding-region start end highest
This function chooses a plausible coding system for decoding the text
from @var{start} to @var{end}. This text should be ``raw bytes''
(@pxref{Explicit Encoding}).

Normally this function returns a list of coding systems that could
handle decoding the text that was scanned. They are listed in order of
decreasing priority, based on the priority specified by the user with
@code{prefer-coding-system}. But if @var{highest} is non-@code{nil},
then the return value is just one coding system, the one that is highest
in priority.
@end defun

@tindex detect-coding-string
@defun detect-coding-string string highest
This function is like @code{detect-coding-region} except that it
operates on the contents of @var{string} instead of bytes in the buffer.
@end defun
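
A typical use is to read a file's raw bytes, guess their coding system, and then decode them explicitly (@pxref{Explicit Encoding}). Here is a sketch of that sequence. The function name @code{my-decode-file-by-guess} and the buffer name are hypothetical. The guess is only as good as the heuristics behind @code{detect-coding-region}.

@example
;; @r{Hypothetical helper: insert @var{filename} undecoded into a}
;; @r{scratch buffer, decode it with the highest-priority guess,}
;; @r{and return the coding system that was used.}
(defun my-decode-file-by-guess (filename)
  (save-excursion
    (set-buffer (get-buffer-create "*guessed*"))
    (erase-buffer)
    ;; @r{Read the bytes without any code conversion.}
    (insert-file-contents-literally filename)
    (let ((coding (detect-coding-region (point-min) (point-max) t)))
      (decode-coding-region (point-min) (point-max) coding)
      coding)))
@end example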

@tindex find-operation-coding-system
@defun find-operation-coding-system operation &rest arguments
This function returns the coding system to use (by default) for
performing @var{operation} with @var{arguments}. The value has this
form:

@example
(@var{decoding-system} . @var{encoding-system})
@end example

The @sc{car}, @var{decoding-system}, is the coding system to use
for decoding (in case @var{operation} does decoding), and the
@sc{cdr}, @var{encoding-system}, is the coding system for encoding (in
case @var{operation} does encoding).

The argument @var{operation} should be an Emacs I/O primitive:
@code{insert-file-contents}, @code{write-region}, @code{call-process},
@code{call-process-region}, @code{start-process}, or
@code{open-network-stream}.

The remaining arguments should be the same arguments that might be given
to that I/O primitive. Depending on which primitive, one of those
arguments is selected as the @dfn{target}. For example, if
@var{operation} does file I/O, whichever argument specifies the file
name is the target. For subprocess primitives, the process name is the
target. For @code{open-network-stream}, the target is the service name
or port number.

This function looks up the target in @code{file-coding-system-alist},
@code{process-coding-system-alist}, or
@code{network-coding-system-alist}, depending on @var{operation}.
@xref{Default Coding Systems}.
@end defun
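
For example, to see which coding system would be used by default to decode a particular file when reading it, you can take the @sc{car} of the value. The file name below is only an illustration. The result depends on the alists that configure the defaults (@pxref{Default Coding Systems}), and it may be @code{nil} if nothing there applies.

@example
;; @r{Decoding system that reading this file would use by default.}
(car (find-operation-coding-system
      'insert-file-contents "/tmp/example.txt"))
@end example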

@node Default Coding Systems
@section Default Coding Systems

These variables specify which coding system to use by default for
certain files or when running certain subprograms. The idea of these
variables is that you set them once and for all to the defaults you
want, and then do not change them again. To specify a particular coding
system for a particular operation, don't change these variables;
instead, override them using @code{coding-system-for-read} and
@code{coding-system-for-write} (@pxref{Specifying Coding Systems}).

@tindex file-coding-system-alist
@defvar file-coding-system-alist
This variable is an alist that specifies the coding systems to use for
reading and writing particular files. Each element has the form
@code{(@var{pattern} . @var{val})}, where @var{pattern} is a regular
expression that matches certain file names. The element applies to file
names that match @var{pattern}.

The @sc{cdr} of the element, @var{val}, should be either a coding
system, a cons cell containing two coding systems, or a function symbol.
If @var{val} is a coding system, that coding system is used for both
reading the file and writing it. If @var{val} is a cons cell containing
two coding systems, its @sc{car} specifies the coding system for
decoding, and its @sc{cdr} specifies the coding system for encoding.

If @var{val} is a function symbol, the function must return a coding
system or a cons cell containing two coding systems. This value is used
as described above.
@end defvar
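
For example, the following sketch arranges for files whose names end in @samp{.dat} to be decoded as Latin 1 with DOS line ends and encoded as Latin 1 with Unix line ends. The @samp{.dat} naming convention is hypothetical. The new element goes at the front of the alist so that it is found before any later element whose @var{pattern} also matches.

@example
(setq file-coding-system-alist
      (cons '("\\.dat\\'" . (latin-1-dos . latin-1-unix))
            file-coding-system-alist))
@end example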

@tindex process-coding-system-alist
@defvar process-coding-system-alist
This variable is an alist specifying which coding systems to use for a
subprocess, depending on which program is running in the subprocess. It
works like @code{file-coding-system-alist}, except that @var{pattern} is
matched against the program name used to start the subprocess. The coding
system or systems specified in this alist are used to initialize the
coding systems used for I/O to the subprocess, but you can specify
other coding systems later using @code{set-process-coding-system}.
@end defvar

@tindex network-coding-system-alist
@defvar network-coding-system-alist
This variable is an alist that specifies the coding system to use for
network streams. It works much like @code{file-coding-system-alist},
with the difference that the @var{pattern} in an element may be either a
port number or a regular expression. If it is a regular expression, it
is matched against the network service name used to open the network
stream.
@end defvar

@tindex default-process-coding-system
@defvar default-process-coding-system
This variable specifies the coding systems to use for subprocess (and
network stream) input and output, when nothing else specifies what to
do.

The value should be a cons cell of the form @code{(@var{input-coding}
. @var{output-coding})}. Here @var{input-coding} applies to input from
the subprocess, and @var{output-coding} applies to output to it.
@end defvar

@node Specifying Coding Systems
@section Specifying a Coding System for One Operation

You can specify the coding system for a specific operation by binding
the variables @code{coding-system-for-read} and/or
@code{coding-system-for-write}.

@tindex coding-system-for-read
@defvar coding-system-for-read
If this variable is non-@code{nil}, it specifies the coding system to
use for reading a file, or for input from a synchronous subprocess.

It also applies to any asynchronous subprocess or network stream, but in
a different way: the value of @code{coding-system-for-read} when you
start the subprocess or open the network stream specifies the input
decoding method for that subprocess or network stream. It remains in
use for that subprocess or network stream unless and until overridden.

The right way to use this variable is to bind it with @code{let} for a
specific I/O operation. Its global value is normally @code{nil}, and
you should not globally set it to any other value. Here is an example
of the right way to use the variable:

@example
;; @r{Read the file with no character code conversion.}
;; @r{Assume CRLF represents end-of-line.}
(let ((coding-system-for-read 'emacs-mule-dos))
  (insert-file-contents filename))
@end example

When its value is non-@code{nil}, @code{coding-system-for-read} takes
precedence over all other methods of specifying a coding system to use
for input, including @code{file-coding-system-alist},
@code{process-coding-system-alist} and
@code{network-coding-system-alist}.
@end defvar

@tindex coding-system-for-write
@defvar coding-system-for-write
This works much like @code{coding-system-for-read}, except that it
applies to output rather than input. It affects writing to files,
subprocesses, and net connections.

When a single operation does both input and output, as do
@code{call-process-region} and @code{start-process}, both
@code{coding-system-for-read} and @code{coding-system-for-write}
affect it.
@end defvar
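
The following sketch is a write-side counterpart to the example for @code{coding-system-for-read}. It binds @code{coding-system-for-write} around a single @code{write-region} call. The function name @code{my-write-buffer-latin-1-unix} is hypothetical.

@example
;; @r{Hypothetical helper: write the whole current buffer to}
;; @r{@var{filename} as Latin 1 with Unix line ends, overriding}
;; @r{any default for that file name.}
(defun my-write-buffer-latin-1-unix (filename)
  (let ((coding-system-for-write 'latin-1-unix))
    (write-region (point-min) (point-max) filename)))
@end example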

@tindex last-coding-system-used
@defvar last-coding-system-used
All operations that use a coding system set this variable
to the coding system name that was used.
@end defvar

@tindex inhibit-eol-conversion
@defvar inhibit-eol-conversion
When this variable is non-@code{nil}, no end-of-line conversion is done,
no matter which coding system is specified. This applies to all the
Emacs I/O and subprocess primitives, and to the explicit encoding and
decoding functions (@pxref{Explicit Encoding}).
@end defvar

@tindex keyboard-coding-system
@defun keyboard-coding-system
This function returns the coding system that is in use for decoding
keyboard input---or @code{nil} if no coding system is to be used.
@end defun

@tindex set-keyboard-coding-system
@defun set-keyboard-coding-system coding-system
This function specifies @var{coding-system} as the coding system to
use for decoding keyboard input. If @var{coding-system} is @code{nil},
that means do not decode keyboard input.
@end defun

@tindex terminal-coding-system
@defun terminal-coding-system
This function returns the coding system that is in use for encoding
terminal output---or @code{nil} for no encoding.
@end defun

@tindex set-terminal-coding-system
@defun set-terminal-coding-system coding-system
This function specifies @var{coding-system} as the coding system to use
for encoding terminal output. If @var{coding-system} is @code{nil},
that means do not encode terminal output.
@end defun

See also the functions @code{process-coding-system} and
@code{set-process-coding-system}. @xref{Process Information}.

See also @code{read-coding-system} in @ref{High-Level Completion}.

@node Explicit Encoding
@section Explicit Encoding and Decoding
@cindex encoding text
@cindex decoding text

All the operations that transfer text in and out of Emacs have the
ability to use a coding system to encode or decode the text.
You can also explicitly encode and decode text using the functions
in this section.

@cindex raw bytes
The result of encoding, and the input to decoding, are not ordinary
text. They are ``raw bytes''---bytes that represent text in the same
way that an external file would. When a buffer contains raw bytes, it
is most natural to mark that buffer as using unibyte representation,
using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}),
but this is not required.

The usual way to get raw bytes in a buffer, for explicit decoding, is
to read them from a file with @code{insert-file-contents-literally}
(@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile}
argument when visiting a file with @code{find-file-noselect}.

The usual way to use the raw bytes that result from explicitly
encoding text is to copy them to a file or process---for example, to
write them with @code{write-region} (@pxref{Writing to Files}), and
suppress encoding for that @code{write-region} call by binding
@code{coding-system-for-write} to @code{no-conversion}.
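
The following sketch combines these steps. It encodes a string explicitly, puts the resulting raw bytes in a unibyte scratch buffer, and writes them out with encoding suppressed. The function name, the buffer name, and the choice of @code{latin-1-dos} are assumptions for illustration. The function @code{encode-coding-string} is described below.

@example
;; @r{Hypothetical helper: write @var{text} to @var{filename} as}
;; @r{Latin 1 with DOS line ends, encoding it explicitly first.}
(defun my-write-encoded (text filename)
  (let ((bytes (encode-coding-string text 'latin-1-dos)))
    (save-excursion
      (set-buffer (get-buffer-create " *encoded*"))
      (erase-buffer)
      ;; @r{Mark the buffer unibyte, the natural choice for raw bytes.}
      (set-buffer-multibyte nil)
      (insert bytes)
      ;; @r{The bytes are already encoded, so write them unchanged.}
      (let ((coding-system-for-write 'no-conversion))
        (write-region (point-min) (point-max) filename)))))
@end example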

@tindex encode-coding-region
@defun encode-coding-region start end coding-system
This function encodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The encoded text replaces
the original text in the buffer. The result of encoding is
``raw bytes.''
@end defun

@tindex encode-coding-string
@defun encode-coding-string string coding-system
This function encodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
encoded text. The result of encoding is ``raw bytes.''
@end defun

@tindex decode-coding-region
@defun decode-coding-region start end coding-system
This function decodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The decoded text replaces the
original text in the buffer. To make explicit decoding useful, the text
before decoding ought to be ``raw bytes.''
@end defun

@tindex decode-coding-string
@defun decode-coding-string string coding-system
This function decodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
decoded text. To make explicit decoding useful, the contents of
@var{string} ought to be ``raw bytes.''
@end defun
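
Encoding text and then decoding the result with the same coding system should give back the original text, provided the coding system can represent every character in it. Here is a quick check of that property, using a Latin 1 character and the @code{latin-1} coding system as an illustrative choice. The variable @code{s} is just a scratch variable.

@example
(setq s (concat "abc" (char-to-string (make-char 'latin-iso8859-1 72))))
(string= (decode-coding-string (encode-coding-string s 'latin-1)
                               'latin-1)
         s)
@result{} t
@end example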