comparison lispref/nonascii.texi @ 21006:00022857f529

Initial revision
author Richard M. Stallman <rms@gnu.org>
date Sat, 28 Feb 1998 01:49:58 +0000
parents
children 90da2489c498
21005:fd60546a64f6 21006:00022857f529
1 @c -*-texinfo-*-
2 @c This is part of the GNU Emacs Lisp Reference Manual.
3 @c Copyright (C) 1998 Free Software Foundation, Inc.
4 @c See the file elisp.texi for copying conditions.
5 @setfilename ../info/characters
6 @node Non-ASCII Characters, Searching and Matching, Text, Top
7 @chapter Non-ASCII Characters
8 @cindex multibyte characters
9 @cindex non-ASCII characters
10
11 This chapter covers the special issues relating to non-@sc{ASCII}
12 characters and how they are stored in strings and buffers.
13
14 @menu
15 * Text Representations::
16 * Converting Representations::
17 * Selecting a Representation::
18 * Character Codes::
19 * Character Sets::
20 * Scanning Charsets::
21 * Chars and Bytes::
22 * Coding Systems::
23 * Default Coding Systems::
24 * Specifying Coding Systems::
25 * Explicit Encoding::
26 @end menu
27
28 @node Text Representations
29 @section Text Representations
30 @cindex text representations
31
32 Emacs has two @dfn{text representations}---two ways to represent text
33 in a string or buffer. These are called @dfn{unibyte} and
34 @dfn{multibyte}. Each string, and each buffer, uses one of these two
35 representations. For most purposes, you can ignore the issue of
36 representations, because Emacs converts text between them as
37 appropriate. Occasionally in Lisp programming you will need to pay
38 attention to the difference.
39
40 @cindex unibyte text
41 In unibyte representation, each character occupies one byte and
42 therefore the possible character codes range from 0 to 255. Codes 0
43 through 127 are @sc{ASCII} characters; the codes from 128 through 255
44 are used for one non-@sc{ASCII} character set (you can choose which one
45 by setting the variable @code{nonascii-insert-offset}).
46
47 @cindex leading code
48 @cindex multibyte text
49 In multibyte representation, a character may occupy more than one
50 byte, and as a result, the full range of Emacs character codes can be
51 stored. The first byte of a multibyte character is always in the range
52 128 through 159 (octal 0200 through 0237). These values are called
53 @dfn{leading codes}. The first byte determines which character set the
54 character belongs to (@pxref{Character Sets}); in particular, it
55 determines how many bytes long the sequence is. The second and
56 subsequent bytes of a multibyte character are always in the range 160
57 through 255 (octal 0240 through 0377).
58
59 In a buffer, the buffer-local value of the variable
60 @code{enable-multibyte-characters} specifies the representation used.
61 The representation for a string is determined based on the string
62 contents when the string is constructed.
63
64 @tindex enable-multibyte-characters
65 @defvar enable-multibyte-characters
66 This variable specifies the current buffer's text representation.
67 If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
68 it contains unibyte text.
69
70 @strong{Warning:} do not set this variable directly; instead, use the
71 function @code{set-buffer-multibyte} to change a buffer's
72 representation.
73 @end defvar
74
75 @tindex default-enable-multibyte-characters
76 @defvar default-enable-multibyte-characters
77 This variable's value is entirely equivalent to @code{(default-value
78 'enable-multibyte-characters)}, and setting this variable changes that
79 default value. Although setting the local binding of
80 @code{enable-multibyte-characters} in a specific buffer is dangerous,
81 changing the default value is safe, and it is a reasonable thing to do.
82
83 The @samp{--unibyte} command line option does its job by setting the
84 default value to @code{nil} early in startup.
85 @end defvar
86
87 @tindex multibyte-string-p
88 @defun multibyte-string-p string
89 Return @code{t} if @var{string} contains multibyte characters.
90 @end defun
91
92 @node Converting Representations
93 @section Converting Text Representations
94
95 Emacs can convert unibyte text to multibyte; it can also convert
96 multibyte text to unibyte, though this conversion loses information. In
97 general these conversions happen when inserting text into a buffer, or
98 when putting text from several strings together in one string. You can
99 also explicitly convert a string's contents to either representation.
100
101 Emacs chooses the representation for a string based on the text that
102 it is constructed from. The general rule is to convert unibyte text to
103 multibyte text when combining it with other multibyte text, because the
104 multibyte representation is more general and can hold whatever
105 characters the unibyte text has.
106
107 When inserting text into a buffer, Emacs converts the text to the
108 buffer's representation, as specified by
109 @code{enable-multibyte-characters} in that buffer. In particular, when
110 you insert multibyte text into a unibyte buffer, Emacs converts the text
111 to unibyte, even though this conversion cannot in general preserve all
112 the characters that might be in the multibyte text. The other natural
113 alternative, to convert the buffer contents to multibyte, is not
114 acceptable because the buffer's representation is a choice made by the
115 user that cannot simply be overridden.
116
117 Converting unibyte text to multibyte text leaves @sc{ASCII} characters
118 unchanged. It converts the non-@sc{ASCII} codes 128 through 255 by
119 adding the value @code{nonascii-insert-offset} to each character code.
120 By setting this variable, you specify which character set the unibyte
121 characters correspond to. For example, if @code{nonascii-insert-offset}
122 is 2048, which is @code{(- (make-char 'latin-iso8859-1 0) 128)}, then
123 the unibyte non-@sc{ASCII} characters correspond to Latin 1. If it is
124 2688, which is @code{(- (make-char 'greek-iso8859-7 0) 128)}, then they
125 correspond to Greek letters.
126
127 Converting multibyte text to unibyte is simpler: it performs
128 logical-and of each character code with 255. If
129 @code{nonascii-insert-offset} has a reasonable value, corresponding to
130 the beginning of some character set, this conversion is the inverse of
131 the other: converting unibyte text to multibyte and back to unibyte
132 reproduces the original unibyte text.
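
  As a worked illustration of the two conversions, assume that
@code{nonascii-insert-offset} is 2048 (the Latin 1 value given above)
and take the unibyte code 200 as an arbitrary sample.  The arithmetic
can then be sketched with ordinary integer functions:

@example
;; @r{Unibyte to multibyte: add the offset.}
(+ 200 2048)
     @result{} 2248
;; @r{Multibyte to unibyte: logical-and with 255.}
(logand 2248 255)
     @result{} 200
@end example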
133
134 @tindex nonascii-insert-offset
135 @defvar nonascii-insert-offset
136 This variable specifies the amount to add to a non-@sc{ASCII} character
137 when converting unibyte text to multibyte. It also applies when
138 @code{insert-char} or @code{self-insert-command} inserts a character in
139 the unibyte non-@sc{ASCII} range, 128 through 255.
140
141 The right value to use to select character set @var{cs} is @code{(-
142 (make-char @var{cs} 0) 128)}. If the value of
143 @code{nonascii-insert-offset} is zero, then conversion actually uses the
144 value for the Latin 1 character set, rather than zero.
145 @end defvar
146
147 @tindex nonascii-translate-table
148 @defvar nonascii-translate-table
149 This variable provides a more general alternative to
150 @code{nonascii-insert-offset}. You can use it to specify independently
151 how to translate each code in the range of 128 through 255 into a
152 multibyte character. The value should be a vector, or @code{nil}.
153 @end defvar
154
155 @tindex string-make-unibyte
156 @defun string-make-unibyte string
157 This function converts the text of @var{string} to unibyte
158 representation, if it isn't already, and returns the result. If
159 conversion does not change the contents, the value may be @var{string}
160 itself.
161 @end defun
162
163 @tindex string-make-multibyte
164 @defun string-make-multibyte string
165 This function converts the text of @var{string} to multibyte
166 representation, if it isn't already, and returns the result. If
167 conversion does not change the contents, the value may be @var{string}
168 itself.
169 @end defun
170
171 @node Selecting a Representation
172 @section Selecting a Representation
173
174 Sometimes it is useful to examine an existing buffer or string as
175 multibyte when it was unibyte, or vice versa.
176
177 @tindex set-buffer-multibyte
178 @defun set-buffer-multibyte multibyte
179 Set the representation type of the current buffer. If @var{multibyte}
180 is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
181 is @code{nil}, the buffer becomes unibyte.
182
183 This function leaves the buffer contents unchanged when viewed as a
184 sequence of bytes. As a consequence, it can change the contents viewed
185 as characters; a sequence of two bytes which is treated as one character
186 in multibyte representation will count as two characters in unibyte
187 representation.
188
189 This function sets @code{enable-multibyte-characters} to record which
190 representation is in use. It also adjusts various data in the buffer
191 (including its overlays, text properties and markers) so that they
192 cover or fall between the same text as they did before.
193 @end defun
194
195 @tindex string-as-unibyte
196 @defun string-as-unibyte string
197 This function returns a string with the same bytes as @var{string} but
198 treating each byte as a character. This means that the value may have
199 more characters than @var{string} has.
200
201 If @var{string} is unibyte already, then the value may be @var{string}
202 itself.
203 @end defun
204
205 @tindex string-as-multibyte
206 @defun string-as-multibyte string
207 This function returns a string with the same bytes as @var{string} but
208 treating each multibyte sequence as one character. This means that the
209 value may have fewer characters than @var{string} has.
210
211 If @var{string} is multibyte already, then the value may be @var{string}
212 itself.
213 @end defun
214
215 @node Character Codes
216 @section Character Codes
217 @cindex character codes
218
219 The unibyte and multibyte text representations use different character
220 codes. The valid character codes for unibyte representation range from
221 0 to 255---the values that can fit in one byte. The valid character
222 codes for multibyte representation range from 0 to 524287, but not all
223 values in that range are valid. In particular, the values 128 through
224 255 are not valid in multibyte text. Only the @sc{ASCII} codes 0
225 through 127 are used in both representations.
226
227 @defun char-valid-p charcode
228 This returns @code{t} if @var{charcode} is valid for either one of the two
229 text representations.
230
231 @example
232 (char-valid-p 65)
233 @result{} t
234 (char-valid-p 256)
235 @result{} nil
236 (char-valid-p 2248)
237 @result{} t
238 @end example
239 @end defun
240
241 @node Character Sets
242 @section Character Sets
243 @cindex character sets
244
245 Emacs classifies characters into various @dfn{character sets}, each of
246 which has a name which is a symbol. Each character belongs to one and
247 only one character set.
248
249 In general, there is one character set for each distinct script. For
250 example, @code{latin-iso8859-1} is one character set,
251 @code{greek-iso8859-7} is another, and @code{ascii} is another. An
252 Emacs character set can hold at most 9025 characters; therefore, in some
253 cases, a set of characters that would logically be grouped together is
254 split into several character sets. For example, one set of Chinese
255 characters is divided into seven Emacs character sets,
256 @code{chinese-cns11643-1} through @code{chinese-cns11643-7}.
257
258 @tindex charsetp
259 @defun charsetp object
260 Return @code{t} if @var{object} is a character set name symbol,
261 @code{nil} otherwise.
262 @end defun
263
264 @tindex charset-list
265 @defun charset-list
266 This function returns a list of all defined character set names.
267 @end defun
268
269 @tindex char-charset
270 @defun char-charset character
271 This function returns the name of the character
272 set that @var{character} belongs to.
273 @end defun
274
275 @node Scanning Charsets
276 @section Scanning for Character Sets
277
278 Sometimes it is useful to find out which character sets appear in a
279 part of a buffer or a string. One use for this is in determining which
280 coding systems (@pxref{Coding Systems}) are capable of representing all
281 of the text in question.
282
283 @tindex find-charset-region
284 @defun find-charset-region beg end &optional unification
285 This function returns a list of the character sets
286 that appear in the current buffer between positions @var{beg}
287 and @var{end}.
288 @end defun
289
290 @tindex find-charset-string
291 @defun find-charset-string string &optional unification
292 This function returns a list of the character sets
293 that appear in the string @var{string}.
294 @end defun
295
296 @node Chars and Bytes
297 @section Characters and Bytes
298 @cindex bytes and characters
299
300 In multibyte representation, each character occupies one or more
301 bytes. The functions in this section convert between characters and the
302 byte values used to represent them.
303
304 @tindex char-bytes
305 @defun char-bytes character
306 This function returns the number of bytes used to represent the
307 character @var{character}. In most cases, this is the same as
308 @code{(length (split-char @var{character}))}; the only exception is for
309 @sc{ASCII} characters, which use just one byte.
310
311 @example
312 (char-bytes 2248)
313 @result{} 2
314 (char-bytes 65)
315 @result{} 1
316 @end example
317
318 This function's values are correct for both multibyte and unibyte
319 representations, because the non-@sc{ASCII} character codes used in
320 those two representations do not overlap.
321
322 @example
323 (char-bytes 192)
324 @result{} 1
325 @end example
326 @end defun
327
328 @tindex split-char
329 @defun split-char character
330 Return a list containing the name of the character set of
331 @var{character}, followed by one or two byte-values which identify
332 @var{character} within that character set.
333
334 @example
335 (split-char 2248)
336 @result{} (latin-iso8859-1 72)
337 (split-char 65)
338 @result{} (ascii 65)
339 @end example
340
341 Unibyte non-@sc{ASCII} characters are considered as part of
342 the @code{ascii} character set:
343
344 @example
345 (split-char 192)
346 @result{} (ascii 192)
347 @end example
348 @end defun
349
350 @tindex make-char
351 @defun make-char charset &rest byte-values
352 This function returns the character in character set @var{charset}
353 identified by @var{byte-values}. This is roughly the opposite of
354 @code{split-char}.
355
356 @example
357 (make-char 'latin-iso8859-1 72)
358 @result{} 2248
359 @end example
360 @end defun
361
362 @node Coding Systems
363 @section Coding Systems
364
365 @cindex coding system
366 When Emacs reads or writes a file, and when Emacs sends text to a
367 subprocess or receives text from a subprocess, it normally performs
368 character code conversion and end-of-line conversion as specified
369 by a particular @dfn{coding system}.
370
371 @cindex character code conversion
372 @dfn{Character code conversion} involves conversion between the encoding
373 used inside Emacs and some other encoding. Emacs supports many
374 different encodings, in that it can convert to and from them. For
375 example, it can convert text to or from encodings such as Latin 1, Latin
376 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
377 cases, Emacs supports several alternative encodings for the same
378 characters; for example, there are three coding systems for the Cyrillic
379 (Russian) alphabet: ISO, Alternativnyj, and KOI8.
380
381 @cindex end of line conversion
382 @dfn{End of line conversion} handles three different conventions used
383 on various systems for end of line. The Unix convention is to use the
384 linefeed character (also called newline). The DOS convention is to use
385 the two character sequence, carriage-return linefeed, at the end of a
386 line. The Mac convention is to use just carriage-return.
387
388 Most coding systems specify a particular character code for
389 conversion, but some of them leave this unspecified---to be chosen
390 heuristically based on the data.
391
392 @cindex base coding system
393 @cindex variant coding system
394 @dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
395 conversion unspecified, to be chosen based on the data. @dfn{Variant
396 coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
397 @code{latin-1-mac} specify the end-of-line conversion explicitly as
398 well. Each base coding system has three corresponding variants whose
399 names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
400
401 Here are Lisp facilities for working with coding systems:
402
403 @tindex coding-system-list
404 @defun coding-system-list &optional base-only
405 This function returns a list of all coding system names (symbols). If
406 @var{base-only} is non-@code{nil}, the value includes only the
407 base coding systems. Otherwise, it includes variant coding systems as well.
408 @end defun
409
410 @tindex coding-system-p
411 @defun coding-system-p object
412 This function returns @code{t} if @var{object} is a coding system
413 name.
414 @end defun
415
416 @tindex check-coding-system
417 @defun check-coding-system coding-system
418 This function checks the validity of @var{coding-system}.
419 If that is valid, it returns @var{coding-system}.
420 Otherwise it signals an error with condition @code{coding-system-error}.
421 @end defun
422
423 @tindex detect-coding-region
424 @defun detect-coding-region start end highest
425 This function chooses a plausible coding system for decoding the text
426 from @var{start} to @var{end}. This text should be ``raw bytes''
427 (@pxref{Specifying Coding Systems}).
428
429 Normally this function returns a list of coding systems that could
430 handle decoding the text that was scanned. They are listed in order of
431 decreasing priority, based on the priority specified by the user with
432 @code{prefer-coding-system}. But if @var{highest} is non-@code{nil},
433 then the return value is just one coding system, the one that is highest
434 in priority.
435 @end defun
436
437 @tindex detect-coding-string
438 @defun detect-coding-string string highest
439 This function is like @code{detect-coding-region} except that it
440 operates on the contents of @var{string} instead of bytes in the buffer.
441 @end defun
442
443 @defun find-operation-coding-system operation &rest arguments
444 This function returns the coding system to use (by default) for
445 performing @var{operation} with @var{arguments}. The value has this
446 form:
447
448 @example
449 (@var{decoding-system} . @var{encoding-system})
450 @end example
451
452 The @sc{car}, @var{decoding-system}, is the coding system to use
453 for decoding (in case @var{operation} does decoding), and
454 @var{encoding-system} is the coding system for encoding (in case
455 @var{operation} does encoding).
456
457 The argument @var{operation} should be an Emacs I/O primitive:
458 @code{insert-file-contents}, @code{write-region}, @code{call-process},
459 @code{call-process-region}, @code{start-process}, or
460 @code{open-network-stream}.
461
462 The remaining arguments should be the same arguments that might be given
463 to that I/O primitive. Depending on which primitive, one of those
464 arguments is selected as the @dfn{target}. For example, if
465 @var{operation} does file I/O, whichever argument specifies the file
466 name is the target. For subprocess primitives, the process name is the
467 target. For @code{open-network-stream}, the target is the service name
468 or port number.
469
470 This function looks up the target in @code{file-coding-system-alist},
471 @code{process-coding-system-alist}, or
472 @code{network-coding-system-alist}, depending on @var{operation}.
473 @xref{Default Coding Systems}.
474 @end defun
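
  As a sketch of a typical call, the following asks which coding
systems would be used by default to read a file.  The file name is
hypothetical, and the result depends on the current contents of
@code{file-coding-system-alist}, so no value is shown:

@example
;; @r{The file name argument is the target for this primitive.}
(find-operation-coding-system 'insert-file-contents
                              "/home/user/notes.txt")
@end example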
475
476 @node Default Coding Systems
477 @section Default Coding Systems
478
479 These variables specify which coding system to use by default for
480 certain files or when running certain subprograms. The idea of these
481 variables is that you set them once and for all to the defaults you
482 want, and then do not change them again. To specify a particular coding
483 system for a particular operation, don't change these variables;
484 instead, override them using @code{coding-system-for-read} and
485 @code{coding-system-for-write} (@pxref{Specifying Coding Systems}).
486
487 @tindex file-coding-system-alist
488 @defvar file-coding-system-alist
489 This variable is an alist that specifies the coding systems to use for
490 reading and writing particular files. Each element has the form
491 @code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
492 expression that matches certain file names. The element applies to file
493 names that match @var{pattern}.
494
495 The @sc{cdr} of the element, @var{coding}, should be either a coding
496 system, a cons cell containing two coding systems, or a function symbol.
497 If @var{coding} is a coding system, that coding system is used for both
498 reading the file and writing it. If @var{coding} is a cons cell containing
499 two coding systems, its @sc{car} specifies the coding system for
500 decoding, and its @sc{cdr} specifies the coding system for encoding.
501
502 If @var{coding} is a function symbol, the function must return a coding
503 system or a cons cell containing two coding systems. This value is used
504 as described above.
505 @end defvar
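
  For illustration only, elements of the first two kinds might be added
as follows.  The patterns and the choice of coding systems here are
hypothetical, not defaults taken from Emacs:

@example
;; @r{One coding system, used for both reading and writing.}
(setq file-coding-system-alist
      (cons '("\\.l1\\'" . latin-1)
            file-coding-system-alist))
;; @r{A cons cell: decode with the @sc{car}, encode with the @sc{cdr}.}
(setq file-coding-system-alist
      (cons '("\\.txt\\'" . (latin-1 . latin-1-unix))
            file-coding-system-alist))
@end example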
506
507 @tindex process-coding-system-alist
508 @defvar process-coding-system-alist
509 This variable is an alist specifying which coding systems to use for a
510 subprocess, depending on which program is running in the subprocess. It
511 works like @code{file-coding-system-alist}, except that @var{pattern} is
512 matched against the program name used to start the subprocess. The coding
513 system or systems specified in this alist are used to initialize the
514 coding systems used for I/O to the subprocess, but you can specify
515 other coding systems later using @code{set-process-coding-system}.
516 @end defvar
517
518 @tindex network-coding-system-alist
519 @defvar network-coding-system-alist
520 This variable is an alist that specifies the coding system to use for
521 network streams. It works much like @code{file-coding-system-alist},
522 with the difference that the @var{pattern} in an element may be either a
523 port number or a regular expression. If it is a regular expression, it
524 is matched against the network service name used to open the network
525 stream.
526 @end defvar
527
528 @tindex default-process-coding-system
529 @defvar default-process-coding-system
530 This variable specifies the coding systems to use for subprocess (and
531 network stream) input and output, when nothing else specifies what to
532 do.
533
534 The value should be a cons cell of the form @code{(@var{input-coding}
535 . @var{output-coding})}. Here @var{input-coding} applies to input from
536 the subprocess, and @var{output-coding} applies to output to it.
537 @end defvar
538
539 @node Specifying Coding Systems
540 @section Specifying a Coding System for One Operation
541
542 You can specify the coding system for a specific operation by binding
543 the variables @code{coding-system-for-read} and/or
544 @code{coding-system-for-write}.
545
546 @tindex coding-system-for-read
547 @defvar coding-system-for-read
548 If this variable is non-@code{nil}, it specifies the coding system to
549 use for reading a file, or for input from a synchronous subprocess.
550
551 It also applies to any asynchronous subprocess or network stream, but in
552 a different way: the value of @code{coding-system-for-read} when you
553 start the subprocess or open the network stream specifies the input
554 decoding method for that subprocess or network stream. It remains in
555 use for that subprocess or network stream unless and until overridden.
556
557 The right way to use this variable is to bind it with @code{let} for a
558 specific I/O operation. Its global value is normally @code{nil}, and
559 you should not globally set it to any other value. Here is an example
560 of the right way to use the variable:
561
562 @example
563 ;; @r{Read the file with no character code conversion.}
564 ;; @r{Assume CRLF represents end-of-line.}
565 (let ((coding-system-for-read 'emacs-mule-dos))
566   (insert-file-contents filename))
567 @end example
568
569 When its value is non-@code{nil}, @code{coding-system-for-read} takes
570 precedence over all other methods of specifying a coding system to use for
571 input, including @code{file-coding-system-alist},
572 @code{process-coding-system-alist} and
573 @code{network-coding-system-alist}.
574 @end defvar
575
576 @tindex coding-system-for-write
577 @defvar coding-system-for-write
578 This works much like @code{coding-system-for-read}, except that it
579 applies to output rather than input. It affects writing to files,
580 subprocesses, and net connections.
581
582 When a single operation does both input and output, as do
583 @code{call-process-region} and @code{start-process}, both
584 @code{coding-system-for-read} and @code{coding-system-for-write}
585 affect it.
586 @end defvar
587
588 @tindex last-coding-system-used
589 @defvar last-coding-system-used
590 All operations that use a coding system set this variable
591 to the coding system name that was used.
592 @end defvar
593
594 @tindex inhibit-eol-conversion
595 @defvar inhibit-eol-conversion
596 When this variable is non-@code{nil}, no end-of-line conversion is done,
597 no matter which coding system is specified. This applies to all the
598 Emacs I/O and subprocess primitives, and to the explicit encoding and
599 decoding functions (@pxref{Explicit Encoding}).
600 @end defvar
601
602 @tindex keyboard-coding-system
603 @defun keyboard-coding-system
604 This function returns the coding system that is in use for decoding
605 keyboard input---or @code{nil} if no coding system is to be used.
606 @end defun
607
608 @tindex set-keyboard-coding-system
609 @defun set-keyboard-coding-system coding-system
610 This function specifies @var{coding-system} as the coding system to
611 use for decoding keyboard input. If @var{coding-system} is @code{nil},
612 that means do not decode keyboard input.
613 @end defun
614
615 @tindex terminal-coding-system
616 @defun terminal-coding-system
617 This function returns the coding system that is in use for encoding
618 terminal output---or @code{nil} for no encoding.
619 @end defun
620
621 @tindex set-terminal-coding-system
622 @defun set-terminal-coding-system coding-system
623 This function specifies @var{coding-system} as the coding system to use
624 for encoding terminal output. If @var{coding-system} is @code{nil},
625 that means do not encode terminal output.
626 @end defun
627
628 See also the functions @code{process-coding-system} and
629 @code{set-process-coding-system}. @xref{Process Information}.
630
631 See also @code{read-coding-system} in @ref{High-Level Completion}.
632
633 @node Explicit Encoding
634 @section Explicit Encoding and Decoding
635 @cindex encoding text
636 @cindex decoding text
637
638 All the operations that transfer text in and out of Emacs have the
639 ability to use a coding system to encode or decode the text.
640 You can also explicitly encode and decode text using the functions
641 in this section.
642
643 @cindex raw bytes
644 The result of encoding, and the input to decoding, are not ordinary
645 text. They are ``raw bytes''---bytes that represent text in the same
646 way that an external file would. When a buffer contains raw bytes, it
647 is most natural to mark that buffer as using unibyte representation,
648 using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}),
649 but this is not required.
650
651 The usual way to get raw bytes in a buffer, for explicit decoding, is
652 to read them from a file with @code{insert-file-contents-literally}
653 (@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile}
654 argument when visiting a file with @code{find-file-noselect}.
655
656 The usual way to use the raw bytes that result from explicitly
657 encoding text is to copy them to a file or process---for example, to
658 write them with @code{write-region} (@pxref{Writing to Files}), and
659 suppress encoding for that @code{write-region} call by binding
660 @code{coding-system-for-write} to @code{no-conversion}.
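
  A minimal sketch of that pattern follows.  Here @var{start},
@var{end} and @var{filename} stand for values the caller supplies, and
the region is assumed to contain raw bytes that were already encoded:

@example
;; @r{Write the raw bytes unchanged; suppress further encoding.}
(let ((coding-system-for-write 'no-conversion))
  (write-region @var{start} @var{end} @var{filename}))
@end example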
661
662 @tindex encode-coding-region
663 @defun encode-coding-region start end coding-system
664 This function encodes the text from @var{start} to @var{end} according
665 to coding system @var{coding-system}. The encoded text replaces
666 the original text in the buffer. The result of encoding is
667 ``raw bytes.''
668 @end defun
669
670 @tindex encode-coding-string
671 @defun encode-coding-string string coding-system
672 This function encodes the text in @var{string} according to coding
673 system @var{coding-system}. It returns a new string containing the
674 encoded text. The result of encoding is ``raw bytes.''
675 @end defun
676
677 @tindex decode-coding-region
678 @defun decode-coding-region start end coding-system
679 This function decodes the text from @var{start} to @var{end} according
680 to coding system @var{coding-system}. The decoded text replaces the
681 original text in the buffer. To make explicit decoding useful, the text
682 before decoding ought to be ``raw bytes.''
683 @end defun
684
685 @tindex decode-coding-string
686 @defun decode-coding-string string coding-system
687 This function decodes the text in @var{string} according to coding
688 system @var{coding-system}. It returns a new string containing the
689 decoded text. To make explicit decoding useful, the contents of
690 @var{string} ought to be ``raw bytes.''
691 @end defun
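
  As a sketch of how the string functions pair up, encoding a string and
then decoding the result with the same coding system should reproduce
the original text, provided the coding system can represent every
character in it.  The variable @code{text} is a placeholder, and
@code{latin-1} is chosen only as an example:

@example
(let* ((text "some text")
       ;; @r{The encoded form is ``raw bytes.''}
       (raw (encode-coding-string text 'latin-1))
       ;; @r{Decoding the raw bytes gives ordinary text again.}
       (back (decode-coding-string raw 'latin-1)))
  (string= text back))
     @result{} t
@end example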