annotate doc/lispref/nonascii.texi @ 97366:d2c211c8ceda

(w32_list_system_processes, w32_system_process_attributes): Add prototypes. (Qeuid, Qegid, Qcomm, Qstate, Qppid, Qpgrp, Qsess, Qttname) (Qminflt, Qmajflt, Qcminflt, Qcmajflt, Qutime, Qstime, Qcutime) (Qpri, Qnice, Qthcount, Qstart, Qvsize, Qrss, Qargs, Quser, Qgroup) (Qetime, Qpcpu, Qpmem, Qtpgid, Qcstime): Add extern declarations.
author Eli Zaretskii <eliz@gnu.org>
date Sat, 09 Aug 2008 17:53:30 +0000
parents 0fd94280462b
children df0ee162b492
@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1998, 1999, 2001, 2002, 2003, 2004,
@c   2005, 2006, 2007, 2008  Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../../info/characters
@node Non-ASCII Characters, Searching and Matching, Text, Top
@chapter Non-@acronym{ASCII} Characters
@cindex multibyte characters
@cindex characters, multi-byte
@cindex non-@acronym{ASCII} characters

  This chapter covers the special issues relating to non-@acronym{ASCII}
characters and how they are stored in strings and buffers.

@menu
* Text Representations::        Unibyte and multibyte representations
* Converting Representations::  Converting unibyte to multibyte and vice versa.
* Selecting a Representation::  Treating a byte sequence as unibyte or multi.
* Character Codes::             How unibyte and multibyte relate to
                                  codes of individual characters.
* Character Sets::              The space of possible character codes
                                  is divided into various character sets.
* Chars and Bytes::             More information about multibyte encodings.
* Splitting Characters::        Converting a character to its byte sequence.
* Scanning Charsets::           Which character sets are used in a buffer?
* Translation of Characters::   Translation tables are used for conversion.
* Coding Systems::              Coding systems are conversions for saving files.
* Input Methods::               Input methods allow users to enter various
                                  non-ASCII characters without special keyboards.
* Locales::                     Interacting with the POSIX locale.
@end menu

@node Text Representations
@section Text Representations
@cindex text representations

  Emacs has two @dfn{text representations}---two ways to represent text
in a string or buffer.  These are called @dfn{unibyte} and
@dfn{multibyte}.  Each string, and each buffer, uses one of these two
representations.  For most purposes, you can ignore the issue of
representations, because Emacs converts text between them as
appropriate.  Occasionally in Lisp programming you will need to pay
attention to the difference.

@cindex unibyte text
  In unibyte representation, each character occupies one byte and
therefore the possible character codes range from 0 to 255.  Codes 0
through 127 are @acronym{ASCII} characters; the codes from 128 through 255
are used for one non-@acronym{ASCII} character set (you can choose which
character set by setting the variable @code{nonascii-insert-offset}).

@cindex leading code
@cindex multibyte text
@cindex trailing codes
  In multibyte representation, a character may occupy more than one
byte, and as a result, the full range of Emacs character codes can be
stored.  The first byte of a multibyte character is always in the range
128 through 159 (octal 0200 through 0237).  These values are called
@dfn{leading codes}.  The second and subsequent bytes of a multibyte
character are always in the range 160 through 255 (octal 0240 through
0377); these values are @dfn{trailing codes}.

  Some sequences of bytes are not valid in multibyte text: for example,
a single isolated byte in the range 128 through 159 is not allowed.  But
character codes 128 through 159 can appear in multibyte text,
represented as two-byte sequences.  All the character codes 128 through
255 are possible (though slightly abnormal) in multibyte text; they
appear in multibyte buffers and strings when you do explicit encoding
and decoding (@pxref{Explicit Encoding}).

  In a buffer, the buffer-local value of the variable
@code{enable-multibyte-characters} specifies the representation used.
The representation for a string is determined and recorded in the string
when the string is constructed.

@defvar enable-multibyte-characters
This variable specifies the current buffer's text representation.
If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
it contains unibyte text.

You cannot set this variable directly; instead, use the function
@code{set-buffer-multibyte} to change a buffer's representation.
@end defvar

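  As a minimal sketch of the point made under
@code{enable-multibyte-characters}, the following form reads the
variable and changes the representation with
@code{set-buffer-multibyte} rather than by setting the variable.  The
temporary buffer is used only for illustration; its initial
representation depends on the default value in effect.

@example
@group
(with-temp-buffer
  ;; Switch this scratch buffer to unibyte representation.
  (set-buffer-multibyte nil)
  ;; Read the buffer-local value; do not set it directly.
  enable-multibyte-characters)
     @result{} nil
@end group
@end example
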
@defvar default-enable-multibyte-characters
This variable's value is entirely equivalent to @code{(default-value
'enable-multibyte-characters)}, and setting this variable changes that
default value.  Setting the local binding of
@code{enable-multibyte-characters} in a specific buffer is not allowed,
but changing the default value is supported, and it is a reasonable
thing to do, because it has no effect on existing buffers.

The @samp{--unibyte} command line option does its job by setting the
default value to @code{nil} early in startup.
@end defvar

@defun position-bytes position
Return the byte-position corresponding to buffer position
@var{position} in the current buffer.  This is 1 at the start of the
buffer, and counts upward in bytes.  If @var{position} is out of
range, the value is @code{nil}.
@end defun

@defun byte-to-position byte-position
Return the buffer position corresponding to byte-position
@var{byte-position} in the current buffer.  If @var{byte-position} is
out of range, the value is @code{nil}.
@end defun

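  The following sketch exercises @code{position-bytes} and
@code{byte-to-position} on a temporary buffer that holds only
@acronym{ASCII} text, where character positions and byte positions
coincide.  The buffer contents and the out-of-range position 100 are
chosen purely for illustration.

@example
@group
(with-temp-buffer
  (insert "abc")
  (list (position-bytes (point-min))   ; start of the buffer
        (position-bytes (point-max))   ; just past the last byte
        (byte-to-position 4)           ; inverse of the line above
        (position-bytes 100)))         ; out of range
     @result{} (1 4 4 nil)
@end group
@end example
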
@defun multibyte-string-p string
Return @code{t} if @var{string} is a multibyte string.
@end defun

@defun string-bytes string
@cindex string, number of bytes
This function returns the number of bytes in @var{string}.
If @var{string} is a multibyte string, this can be greater than
@code{(length @var{string})}.
@end defun

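  As a rough illustration of @code{multibyte-string-p} and
@code{string-bytes}, the form below compares an @acronym{ASCII} string
with a one-character string built from a Latin 1 character.  The use of
@code{make-char} with code 233 is an assumption made for the example.
The exact byte count of the second string depends on the internal
encoding, so the sketch only checks that it exceeds the length.

@example
@group
(let ((ascii "abc")
      (latin (string (make-char 'latin-iso8859-1 233))))
  (list (multibyte-string-p ascii)
        (string-bytes ascii)
        (multibyte-string-p latin)
        ;; More bytes than characters in the multibyte string.
        (> (string-bytes latin) (length latin))))
     @result{} (nil 3 t t)
@end group
@end example
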
@node Converting Representations
@section Converting Text Representations

  Emacs can convert unibyte text to multibyte; it can also convert
multibyte text to unibyte, though this conversion loses information.  In
general these conversions happen when inserting text into a buffer, or
when putting text from several strings together in one string.  You can
also explicitly convert a string's contents to either representation.

  Emacs chooses the representation for a string based on the text that
it is constructed from.  The general rule is to convert unibyte text to
multibyte text when combining it with other multibyte text, because the
multibyte representation is more general and can hold whatever
characters the unibyte text has.

  When inserting text into a buffer, Emacs converts the text to the
buffer's representation, as specified by
@code{enable-multibyte-characters} in that buffer.  In particular, when
you insert multibyte text into a unibyte buffer, Emacs converts the text
to unibyte, even though this conversion cannot in general preserve all
the characters that might be in the multibyte text.  The other natural
alternative, to convert the buffer contents to multibyte, is not
acceptable because the buffer's representation is a choice made by the
user that cannot be overridden automatically.

  Converting unibyte text to multibyte text leaves @acronym{ASCII} characters
unchanged, and likewise character codes 128 through 159.  It converts
the non-@acronym{ASCII} codes 160 through 255 by adding the value
@code{nonascii-insert-offset} to each character code.  By setting this
variable, you specify which character set the unibyte characters
correspond to (@pxref{Character Sets}).  For example, if
@code{nonascii-insert-offset} is 2048, which is @code{(- (make-char
'latin-iso8859-1) 128)}, then the unibyte non-@acronym{ASCII} characters
correspond to Latin 1.  If it is 2688, which is @code{(- (make-char
'greek-iso8859-7) 128)}, then they correspond to Greek letters.

  Converting multibyte text to unibyte is simpler: it discards all but
the low 8 bits of each character code.  If @code{nonascii-insert-offset}
has a reasonable value, corresponding to the beginning of some character
set, this conversion is the inverse of the other: converting unibyte
text to multibyte and back to unibyte reproduces the original unibyte
text.

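  The arithmetic described above can be traced by hand for the Latin 1
case.  This is only a sketch of the description, using plain integer
arithmetic rather than the conversion functions themselves; the offset
2048 comes from the text, and the unibyte code 233 is picked for
illustration.

@example
@group
;; Unibyte to multibyte: add the Latin 1 offset to code 233.
(+ 233 2048)
     @result{} 2281
;; Multibyte to unibyte: keep only the low 8 bits.
(logand 2281 255)
     @result{} 233
@end group
@end example
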
@defvar nonascii-insert-offset
This variable specifies the amount to add to a non-@acronym{ASCII} character
when converting unibyte text to multibyte.  It also applies when
@code{self-insert-command} inserts a character in the unibyte
non-@acronym{ASCII} range, 128 through 255.  However, the functions
@code{insert} and @code{insert-char} do not perform this conversion.

The right value to use to select character set @var{cs} is @code{(-
(make-char @var{cs}) 128)}.  If the value of
@code{nonascii-insert-offset} is zero, then conversion actually uses the
value for the Latin 1 character set, rather than zero.
@end defvar

@defvar nonascii-translation-table
This variable provides a more general alternative to
@code{nonascii-insert-offset}.  You can use it to specify independently
how to translate each code in the range of 128 through 255 into a
multibyte character.  The value should be a char-table, or @code{nil}.
If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
@end defvar

The next three functions either return the argument @var{string}, or a
newly created string with no text properties.

@defun string-make-unibyte string
This function converts the text of @var{string} to unibyte
representation, if it isn't already, and returns the result.  If
@var{string} is a unibyte string, it is returned unchanged.  Multibyte
character codes are converted to unibyte according to
@code{nonascii-translation-table} or, if that is @code{nil}, using
@code{nonascii-insert-offset}.  If the lookup in the translation table
fails, this function takes just the low 8 bits of each character.
@end defun

@defun string-make-multibyte string
This function converts the text of @var{string} to multibyte
representation, if it isn't already, and returns the result.  If
@var{string} is a multibyte string or consists entirely of
@acronym{ASCII} characters, it is returned unchanged.  In particular,
if @var{string} is unibyte and entirely @acronym{ASCII}, the returned
string is unibyte.  (When the characters are all @acronym{ASCII},
Emacs primitives will treat the string the same way whether it is
unibyte or multibyte.)  If @var{string} is unibyte and contains
non-@acronym{ASCII} characters, the function
@code{unibyte-char-to-multibyte} is used to convert each unibyte
character to a multibyte character.
@end defun

@defun string-to-multibyte string
This function returns a multibyte string containing the same sequence
of character codes as @var{string}.  Unlike
@code{string-make-multibyte}, this function unconditionally returns a
multibyte string.  If @var{string} is a multibyte string, it is
returned unchanged.
@end defun

e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
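As a minimal sketch of what ``unconditionally'' means for
@code{string-to-multibyte}, the following uses the standard predicate
@code{multibyte-string-p}. It assumes that the constant @code{"abc"}
is read as a unibyte string, which is the usual case for a string
constant containing only @acronym{ASCII} characters.

@example
;; An all-ASCII string constant is normally unibyte.
(multibyte-string-p "abc")
     @result{} nil
;; string-to-multibyte always hands back a multibyte string.
(multibyte-string-p (string-to-multibyte "abc"))
     @result{} t
@end example
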
@defun multibyte-char-to-unibyte char
This function converts the multibyte character @var{char} to a unibyte
character, based on @code{nonascii-translation-table} and
@code{nonascii-insert-offset}.
@end defun

@defun unibyte-char-to-multibyte char
This function converts the unibyte character @var{char} to a multibyte
character, based on @code{nonascii-translation-table} and
@code{nonascii-insert-offset}.
@end defun

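The two character conversions can be tried out as sketched here.
@acronym{ASCII} codes are the same in both representations, so they
pass through unchanged. For codes above 127 the result depends on the
current values of @code{nonascii-translation-table} and
@code{nonascii-insert-offset}, so no result is shown for that case;
the code 200 is an arbitrary value chosen for illustration.

@example
;; ASCII codes are unaffected by either conversion.
(unibyte-char-to-multibyte ?A)
     @result{} 65
(multibyte-char-to-unibyte ?A)
     @result{} 65

;; Round trip for a non-ASCII unibyte code.  The intermediate
;; multibyte character depends on the translation settings.
(let ((code 200))
  (multibyte-char-to-unibyte
   (unibyte-char-to-multibyte code)))
@end example
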
@node Selecting a Representation
@section Selecting a Representation

Sometimes it is useful to examine an existing buffer or string as
multibyte when it was unibyte, or vice versa.

@defun set-buffer-multibyte multibyte
Set the representation type of the current buffer. If @var{multibyte}
is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
is @code{nil}, the buffer becomes unibyte.

This function leaves the buffer contents unchanged when viewed as a
sequence of bytes. As a consequence, it can change the contents viewed
as characters; a sequence of two bytes which is treated as one character
in multibyte representation will count as two characters in unibyte
representation. Character codes 128 through 159 are an exception. They
are represented by one byte in a unibyte buffer, but when the buffer is
set to multibyte, they are converted to two-byte sequences, and vice
versa.

This function sets @code{enable-multibyte-characters} to record which
representation is in use. It also adjusts various data in the buffer
(including overlays, text properties and markers) so that they cover the
same text as they did before.

You cannot use @code{set-buffer-multibyte} on an indirect buffer,
because indirect buffers always inherit the representation of the
base buffer.
@end defun

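The effect of @code{set-buffer-multibyte} on the character count can
be observed with a sketch like the following. It assumes that a
temporary buffer starts out multibyte, which is the default unless
Emacs was started in unibyte mode, and it reuses the character
@code{(make-char 'latin-iso8859-1 72)} from the examples in this
chapter.

@example
(with-temp-buffer
  ;; Insert one non-ASCII character into a multibyte buffer.
  (insert (make-char 'latin-iso8859-1 72))
  (let ((chars-before (buffer-size)))
    ;; Reinterpret the same bytes as unibyte text.
    (set-buffer-multibyte nil)
    ;; Each byte now counts as one character, so the second
    ;; element is expected to be larger than the first.  The
    ;; third element shows the recorded representation.
    (list chars-before
          (buffer-size)
          enable-multibyte-characters)))
@end example
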
@defun string-as-unibyte string
This function returns a string with the same bytes as @var{string} but
treating each byte as a character. This means that the value may have
more characters than @var{string} has.

If @var{string} is already a unibyte string, then the value is
@var{string} itself. Otherwise it is a newly created string, with no
text properties. If @var{string} is multibyte, any characters it
contains of charset @code{eight-bit-control} or @code{eight-bit-graphic}
are converted to the corresponding single byte.
@end defun

@defun string-as-multibyte string
This function returns a string with the same bytes as @var{string} but
treating each multibyte sequence as one character. This means that the
value may have fewer characters than @var{string} has.

If @var{string} is already a multibyte string, then the value is
@var{string} itself. Otherwise it is a newly created string, with no
text properties. If @var{string} is unibyte and contains any individual
8-bit bytes (i.e.@: not part of a multibyte form), they are converted to
the corresponding multibyte character of charset @code{eight-bit-control}
or @code{eight-bit-graphic}.
@end defun

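The relationship between characters and bytes described for
@code{string-as-unibyte} can be inspected with a sketch such as this
one. It relies on the standard functions @code{string-bytes} and
@code{multibyte-string-p}; the test character is an arbitrary choice.

@example
(let* ((str (string (make-char 'latin-iso8859-1 72)))
       (raw (string-as-unibyte str)))
  ;; STR holds one multibyte character; RAW exposes its bytes.
  (list (length str)                ; characters in STR
        (string-bytes str)          ; bytes in STR
        (length raw)                ; one character per byte
        (multibyte-string-p raw)))  ; nil for a unibyte string
@end example

For a string like this one, which contains no
@code{eight-bit-control} or @code{eight-bit-graphic} characters, the
second and third elements of the result should be equal.
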
@node Character Codes
@section Character Codes
@cindex character codes

The unibyte and multibyte text representations use different character
codes. The valid character codes for unibyte representation range from
0 to 255, the values that can fit in one byte. Character codes for
multibyte representation fall in the range 0 to 524287, but not all
values in that range are valid. The values 128 through 255 are not
entirely proper in multibyte text, but they can occur if you do explicit
encoding and decoding (@pxref{Explicit Encoding}). Some other character
codes cannot occur at all in multibyte text. Only the @acronym{ASCII} codes
0 through 127 are completely legitimate in both representations.

@defun char-valid-p charcode &optional genericp
This function returns @code{t} if @var{charcode} is valid (either for
unibyte text or for multibyte text).

@example
(char-valid-p 65)
     @result{} t
(char-valid-p 256)
     @result{} nil
(char-valid-p 2248)
     @result{} t
@end example

If the optional argument @var{genericp} is non-@code{nil}, this
function also returns @code{t} if @var{charcode} is a generic
character (@pxref{Splitting Characters}).
@end defun

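A validity check of this kind is typically applied to a whole list of
codes. The helper below is a hypothetical illustration, not part of
Emacs; its name @code{my-valid-char-codes} is made up for this
sketch, and the result shown follows from the @code{char-valid-p}
examples above.

@example
;; Hypothetical helper: keep only the codes that
;; `char-valid-p' accepts.
(defun my-valid-char-codes (codes)
  "Return the elements of CODES that `char-valid-p' accepts."
  (delq nil
        (mapcar (lambda (code)
                  (and (char-valid-p code) code))
                codes)))

(my-valid-char-codes '(65 256 2248))
     @result{} (65 2248)
@end example
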
@node Character Sets
@section Character Sets
@cindex character sets

Emacs classifies characters into various @dfn{character sets}, each of
which has a name that is a symbol. Each character belongs to one and
only one character set.

In general, there is one character set for each distinct script. For
example, @code{latin-iso8859-1} is one character set,
@code{greek-iso8859-7} is another, and @code{ascii} is another. An
Emacs character set can hold at most 9025 characters; therefore, in some
cases, characters that would logically be grouped together are split
into several character sets. For example, one set of Chinese
characters, generally known as Big 5, is divided into two Emacs
character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.

@acronym{ASCII} characters are in character set @code{ascii}. The
non-@acronym{ASCII} characters 128 through 159 are in character set
@code{eight-bit-control}, and codes 160 through 255 are in character set
@code{eight-bit-graphic}.

@defun charsetp object
This function returns @code{t} if @var{object} is a symbol that names a
character set, @code{nil} otherwise.
@end defun

@defvar charset-list
The value is a list of all defined character set names.
@end defvar

@defun charset-list
This function returns the value of @code{charset-list}. It is
provided only for backward compatibility.
@end defun

@defun char-charset character
This function returns the name of the character set that @var{character}
belongs to, or the symbol @code{unknown} if @var{character} is not a
valid character.
@end defun

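The following sketch exercises @code{charsetp} and
@code{char-charset}. The symbol @code{no-such-charset} is a made-up
name assumed not to be defined as a character set; the other results
follow from the character set assignments described in this section.

@example
(charsetp 'ascii)
     @result{} t
;; Assumes no character set of this name has been defined.
(charsetp 'no-such-charset)
     @result{} nil
(char-charset ?A)
     @result{} ascii
;; Codes 128 through 159 belong to eight-bit-control.
(char-charset 128)
     @result{} eight-bit-control
@end example
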
@defun charset-plist charset
This function returns the charset property list of the character set
@var{charset}. Although @var{charset} is a symbol, this is not the same
as the property list of that symbol. Charset properties are used for
special purposes within Emacs.
@end defun

@deffn Command list-charset-chars charset
This command displays a list of characters in the character set
@var{charset}.
@end deffn

@node Chars and Bytes
@section Characters and Bytes
@cindex bytes and characters

@cindex introduction sequence (of character)
@cindex dimension (of character set)
In multibyte representation, each character occupies one or more
bytes. Each character set has an @dfn{introduction sequence}, which is
normally one or two bytes long. (Exception: the @code{ascii} character
set and the @code{eight-bit-graphic} character set have a zero-length
introduction sequence.) The introduction sequence is the beginning of
the byte sequence for any character in the character set. The rest of
the character's bytes distinguish it from the other characters in the
same character set. Depending on the character set, there are either
one or two distinguishing bytes; the number of such bytes is called the
@dfn{dimension} of the character set.

@defun charset-dimension charset
This function returns the dimension of @var{charset}; at present, the
dimension is always 1 or 2.
@end defun

@defun charset-bytes charset
This function returns the number of bytes used to represent a character
in character set @var{charset}.
@end defun

This is the simplest way to determine the byte length of a character
set's introduction sequence:

@example
(- (charset-bytes @var{charset})
   (charset-dimension @var{charset}))
@end example

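The same computation can be wrapped in a small function. This is
only a sketch; the name @code{my-charset-intro-length} is
hypothetical and not part of Emacs. The result for @code{ascii}
follows from the statement above that its introduction sequence has
zero length.

@example
;; Hypothetical helper: total bytes per character minus the
;; distinguishing bytes leaves the introduction sequence.
(defun my-charset-intro-length (charset)
  "Return the byte length of CHARSET's introduction sequence."
  (- (charset-bytes charset)
     (charset-dimension charset)))

(my-charset-intro-length 'ascii)
     @result{} 0
@end example
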
@node Splitting Characters
@section Splitting Characters
@cindex character as bytes

The functions in this section convert between characters and the byte
values used to represent them. For most purposes, there is no need to
be concerned with the sequence of bytes used to represent a character,
because Emacs translates automatically when necessary.

@defun split-char character
This function returns a list containing the name of the character set of
@var{character}, followed by one or two byte values (integers) which
identify @var{character} within that character set. The number of byte
values is the character set's dimension.

If @var{character} is invalid as a character code, @code{split-char}
returns a list consisting of the symbol @code{unknown} and @var{character}.

@example
(split-char 2248)
     @result{} (latin-iso8859-1 72)
(split-char 65)
     @result{} (ascii 65)
(split-char 128)
     @result{} (eight-bit-control 128)
@end example
@end defun

@cindex generate characters in charsets
@defun make-char charset &optional code1 code2
This function returns the character in character set @var{charset} whose
position codes are @var{code1} and @var{code2}. This is roughly the
inverse of @code{split-char}. Normally, you should specify either one
or both of @var{code1} and @var{code2} according to the dimension of
@var{charset}. For example,

@example
(make-char 'latin-iso8859-1 72)
     @result{} 2248
@end example

Actually, the eighth bit of both @var{code1} and @var{code2} is zeroed
before they are used to index @var{charset}. Thus you may pass, for
instance, an ISO 8859 character code directly, rather than first
subtracting 128 to obtain the index into the corresponding Emacs
charset.
@end defun

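Two consequences of the description of @code{make-char} can be
checked with the sketch below, which reuses the character 2248 from
the examples in this section. The first form passes the list
returned by @code{split-char} back to @code{make-char}. The second
passes a code with the eighth bit set (72 + 128), which is zeroed
before use.

@example
;; split-char followed by make-char gives back the character.
(apply 'make-char (split-char 2248))
     @result{} 2248
;; The eighth bit of the position code is ignored.
(make-char 'latin-iso8859-1 (+ 72 128))
     @result{} 2248
@end example
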
@cindex generic characters
If you call @code{make-char} with neither @var{code1} nor @var{code2},
the result is a @dfn{generic character} which stands for @var{charset}.
A generic character is an integer, but it is @emph{not} valid for
insertion in the buffer as a character. It can be used in
@code{char-table-range} to refer to the whole character set
(@pxref{Char-Tables}). @code{char-valid-p} returns @code{nil} for
generic characters unless its optional argument @var{genericp} is
non-@code{nil}.
For example:

@example
(make-char 'latin-iso8859-1)
     @result{} 2176
(char-valid-p 2176)
     @result{} nil
(char-valid-p 2176 t)
     @result{} t
parents:
diff changeset
472 (split-char 2176)
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
473 @result{} (latin-iso8859-1 0)
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
474 @end example
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
475
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
476 The character sets @code{ascii}, @code{eight-bit-control}, and
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
477 @code{eight-bit-graphic} don't have corresponding generic characters. If
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
478 @var{charset} is one of them and you don't supply @var{code1},
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
479 @code{make-char} returns the character code corresponding to the
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
480 smallest code in @var{charset}.
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
481
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
@node Scanning Charsets
@section Scanning for Character Sets

Sometimes it is useful to find out which character sets appear in a
part of a buffer or a string. One use for this is in determining which
coding systems (@pxref{Coding Systems}) are capable of representing all
of the text in question.

@defun charset-after &optional pos
This function returns the charset of the character in the current buffer
at position @var{pos}. If @var{pos} is omitted or @code{nil}, it
defaults to the current value of point. If @var{pos} is out of range,
the value is @code{nil}.

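As a minimal sketch (the temporary buffer and its contents are
illustrative assumptions, not part of the function's interface), the
following form asks for the charset of an @acronym{ASCII} character and
then of a position that has no character after it:

@example
(with-temp-buffer
  (insert "a")
  (list (charset-after (point-min))
        (charset-after (point-max))))
     @result{} (ascii nil)
@end example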
@end defun

@defun find-charset-region beg end &optional translation
This function returns a list of the character sets that appear in the
current buffer between positions @var{beg} and @var{end}.

The optional argument @var{translation} specifies a translation table to
be used in scanning the text (@pxref{Translation of Characters}). If it
is non-@code{nil}, then each character in the region is translated
through this table, and the value returned describes the translated
characters instead of the characters actually in the buffer.

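For instance, scanning a temporary buffer that holds only
@acronym{ASCII} text (the buffer contents here are an illustrative
assumption) should report a single character set:

@example
(with-temp-buffer
  (insert "hello")
  (find-charset-region (point-min) (point-max)))
     @result{} (ascii)
@end example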
@end defun

@defun find-charset-string string &optional translation
This function returns a list of the character sets that appear in the
string @var{string}. It is just like @code{find-charset-region}, except
that it applies to the contents of @var{string} instead of part of the
current buffer.

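As a small sketch, using an arbitrary @acronym{ASCII}-only sample
string:

@example
(find-charset-string "hello")
     @result{} (ascii)
@end example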
@end defun

@node Translation of Characters
@section Translation of Characters
@cindex character translation tables
@cindex translation tables

A @dfn{translation table} is a char-table that specifies a mapping
of characters into characters. These tables are used in encoding and
decoding, and for other purposes. Some coding systems specify their
own particular translation tables; there are also default translation
tables which apply to all other coding systems.

For instance, the coding system @code{utf-8} has a translation table
that maps characters of various charsets (e.g.,
@code{latin-iso8859-@var{x}}) into Unicode character sets. This way,
it can encode Latin-2 characters into UTF-8. Meanwhile,
@code{unify-8859-on-decoding-mode} operates by specifying
@code{standard-translation-table-for-decode} to translate
Latin-@var{x} characters into corresponding Unicode characters.

@defun make-translation-table &rest translations
This function returns a translation table based on the argument
@var{translations}. Each element of @var{translations} should be a
list of elements of the form @code{(@var{from} . @var{to})}; this says
to translate the character @var{from} into @var{to}.

The arguments and the forms in each argument are processed in order,
and if a previous form already translates @var{to} to some other
character, say @var{to-alt}, @var{from} is also translated to
@var{to-alt}.

You can also map one whole character set into another character set with
the same dimension. To do this, you specify a generic character (which
designates a character set) for @var{from} (@pxref{Splitting Characters}).
In this case, if @var{to} is also a generic character, its character
set should have the same dimension as @var{from}'s. Then the
translation table translates each character of @var{from}'s character
set into the corresponding character of @var{to}'s character set. If
@var{from} is a generic character and @var{to} is an ordinary
character, then the translation table translates every character of
@var{from}'s character set into @var{to}.

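As a minimal sketch of the ordering rule (the characters chosen are
arbitrary, and looking up entries with @code{aref} relies only on the
table being a char-table), the second pair below maps @samp{c} to
@samp{a}, which the earlier pair already translates to @samp{b}, so
@samp{c} should end up translated to @samp{b} as well:

@example
(let ((table (make-translation-table '((?a . ?b) (?c . ?a)))))
  (list (aref table ?a) (aref table ?c)))
     @result{} (98 98)
@end example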
@end defun

In decoding, the translation table's translations are applied to the
characters that result from ordinary decoding. If a coding system has
property @code{translation-table-for-decode}, that specifies the
translation table to use. (This is a property of the coding system,
as returned by @code{coding-system-get}, not a property of the symbol
that is the coding system's name. @xref{Coding System Basics,, Basic
Concepts of Coding Systems}.) Otherwise, if
@code{standard-translation-table-for-decode} is non-@code{nil},
decoding uses that table.

In encoding, the translation table's translations are applied to the
characters in the buffer, and it is the result of the translation that
is actually encoded. If a coding system has property
@code{translation-table-for-encode}, that specifies the translation
table to use. Otherwise the variable
@code{standard-translation-table-for-encode} specifies the translation
table.

@defvar standard-translation-table-for-decode
This is the default translation table for decoding, for
coding systems that don't specify any other translation table.
@end defvar

@defvar standard-translation-table-for-encode
This is the default translation table for encoding, for
coding systems that don't specify any other translation table.
@end defvar

@node Coding Systems
@section Coding Systems

@cindex coding system
When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified
by a particular @dfn{coding system}.

How to define a coding system is an arcane matter, and is not
documented here.

@menu
* Coding System Basics::        Basic concepts.
* Encoding and I/O::            How file I/O functions handle coding systems.
* Lisp and Coding Systems::     Functions to operate on coding system names.
* User-Chosen Coding Systems::  Asking the user to choose a coding system.
* Default Coding Systems::      Controlling the default choices.
* Specifying Coding Systems::   Requesting a particular coding system
                                  for a single file operation.
* Explicit Encoding::           Encoding or decoding text without doing I/O.
* Terminal I/O Encoding::       Use of encoding for terminal I/O.
* MS-DOS File Types::           How DOS "text" and "binary" files
                                  relate to coding systems.
@end menu

@node Coding System Basics
@subsection Basic Concepts of Coding Systems

@cindex character code conversion
@dfn{Character code conversion} involves conversion between the encoding
used inside Emacs and some other encoding. Emacs supports many
different encodings, in that it can convert to and from them. For
example, it can convert text to or from encodings such as Latin 1, Latin
2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
cases, Emacs supports several alternative encodings for the same
characters; for example, there are three coding systems for the Cyrillic
(Russian) alphabet: ISO, Alternativnyj, and KOI8.

Most coding systems specify a particular character code for
conversion, but some of them leave the choice unspecified---to be chosen
heuristically for each file, based on the data.

In general, a coding system doesn't guarantee roundtrip identity:
decoding a byte sequence using a coding system, then encoding the
resulting text in the same coding system, can produce a different byte
sequence. However, the following coding systems do guarantee that the
byte sequence will be the same as what you originally decoded:

@quotation
chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
@end quotation

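As a minimal sketch of such a roundtrip (the sample bytes are an
arbitrary assumption; for the two functions used, @pxref{Explicit
Encoding}), decoding a unibyte string with @code{iso-latin-1}, one of
the coding systems listed above, and encoding the result again should
give back the original bytes:

@example
(let ((bytes "\351t\351"))
  (string= bytes
           (encode-coding-string
            (decode-coding-string bytes 'iso-latin-1)
            'iso-latin-1)))
     @result{} t
@end example
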
Encoding buffer text and then decoding the result can also fail to
reproduce the original text. For instance, if you encode Latin-2
characters with @code{utf-8} and decode the result using the same
coding system, you'll get Unicode characters (of charset
@code{mule-unicode-0100-24ff}). If you encode Unicode characters with
@code{iso-latin-2} and decode the result with the same coding system,
you'll get Latin-2 characters.

@cindex EOL conversion
@cindex end-of-line conversion
@cindex line end conversion
@dfn{End of line conversion} handles three different conventions used
on various systems for representing end of line in files. The Unix
convention is to use the linefeed character (also called newline). The
DOS convention is to use a carriage-return and a linefeed at the end of
a line. The Mac convention is to use just carriage-return.

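As a minimal sketch of these conventions (the sample strings are
arbitrary, and the coding systems @code{iso-latin-1-dos} and
@code{iso-latin-1-mac} are assumed here to be the variants of
@code{iso-latin-1} that select the DOS and Mac conventions), encoding
turns each newline into the external line ending, and decoding turns it
back into a newline:

@example
(string= (encode-coding-string "one\ntwo" 'iso-latin-1-dos)
         "one\r\ntwo")
     @result{} t
(string= (decode-coding-string "one\rtwo" 'iso-latin-1-mac)
         "one\ntwo")
     @result{} t
@end example
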
@cindex base coding system
@cindex variant coding system
@dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
conversion unspecified, to be chosen based on the data. @dfn{Variant
coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
@code{latin-1-mac} specify the end-of-line conversion explicitly as
well. Most base coding systems have three corresponding variants whose
names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.

The coding system @code{raw-text} is special in that it prevents
character code conversion, and causes the buffer visited with that
coding system to be a unibyte buffer. It does not specify the
end-of-line conversion, allowing that to be determined as usual by the
data, and has the usual three variants which specify the end-of-line
conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:
it specifies no conversion of either character codes or end-of-line.

The coding system @code{emacs-mule} specifies that the data is
represented in the internal Emacs encoding. This is like
@code{raw-text} in that no code conversion happens, but different in
that the result is multibyte data.

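As a sketch of how base and variant coding systems relate, the function
@code{coding-system-eol-type} (not described in this section; its use
here is an assumption about what is available) reports a vector of the
three variants for a base coding system, and an integer (0 for Unix, 1
for DOS, 2 for Mac) for a coding system that fixes the end-of-line
conversion:

@example
(coding-system-eol-type 'raw-text)
     @result{} [raw-text-unix raw-text-dos raw-text-mac]
(coding-system-eol-type 'raw-text-dos)
     @result{} 1
(coding-system-eol-type 'no-conversion)
     @result{} 0
@end example
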
@defun coding-system-get coding-system property
This function returns the specified property of the coding system
@var{coding-system}. Most coding system properties exist for internal
purposes, but one that you might find useful is @code{mime-charset}.
That property's value is the name used in MIME for the character coding
which this coding system can read and write. Examples:

@example
(coding-system-get 'iso-latin-1 'mime-charset)
     @result{} iso-8859-1
(coding-system-get 'iso-2022-cn 'mime-charset)
     @result{} iso-2022-cn
(coding-system-get 'cyrillic-koi8 'mime-charset)
     @result{} koi8-r
@end example

The value of the @code{mime-charset} property is also defined
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
697 as an alias for the coding system.
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
698 @end defun

@node Encoding and I/O
@subsection Encoding and I/O

The principal purpose of coding systems is for use in reading and
writing files. The function @code{insert-file-contents} uses
a coding system for decoding the file data, and @code{write-region}
uses one to encode the buffer contents.

You can specify the coding system to use either explicitly
(@pxref{Specifying Coding Systems}), or implicitly using a default
mechanism (@pxref{Default Coding Systems}). But these methods may not
completely specify what to do. For example, they may choose a coding
system such as @code{undecided} which leaves the character code
conversion to be determined from the data. In these cases, the I/O
operation finishes the job of choosing a coding system. Very often
you will want to find out afterwards which coding system was chosen.

@defvar buffer-file-coding-system
@c Source lines 718-726: changeset 87276 (c9e81d5cb2e7), "(Encoding and I/O): Reword to avoid saying", Martin Rudalics <rudalics@gmx.at>, parent 84116.
This buffer-local variable records the coding system used for saving the
buffer and for writing part of the buffer with @code{write-region}. If
the text to be written cannot be safely encoded using the coding system
specified by this variable, these operations select an alternative
encoding by calling the function @code{select-safe-coding-system}
(@pxref{User-Chosen Coding Systems}). If selecting a different encoding
requires asking the user to specify a coding system,
@code{buffer-file-coding-system} is updated to the newly selected coding
system.

@code{buffer-file-coding-system} does @emph{not} affect sending text
to a subprocess.
@end defvar
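
As a minimal sketch of how this variable interacts with
@code{write-region}, the following code sets it in a temporary buffer
and writes the buffer to a scratch file. The file-name prefix
@samp{coding-demo} is a placeholder chosen for illustration. The value
returned at the end should normally be the coding system that was
actually used for the write.

@lisp
(let ((file (make-temp-file "coding-demo")))
  (with-temp-buffer
    (insert "hello\n")
    ;; Ask for UTF-8 with Unix-style line endings when writing.
    (setq buffer-file-coding-system 'utf-8-unix)
    (write-region (point-min) (point-max) file)
    ;; Copy the value immediately; later I/O may change it.
    (let ((used last-coding-system-used))
      (delete-file file)
      used)))
@end lisp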

@defvar save-buffer-coding-system
This variable specifies the coding system for saving the buffer (by
overriding @code{buffer-file-coding-system}). Note that it is not used
for @code{write-region}.

When a command to save the buffer starts out using
@code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),
and that coding system cannot handle
the actual text in the buffer, the command asks the user to choose
another coding system (by calling @code{select-safe-coding-system}).
After that happens, the command also updates
@code{buffer-file-coding-system} to represent the coding system that
the user specified.
@end defvar

@defvar last-coding-system-used
I/O operations for files and subprocesses set this variable to the
coding system name that was used. The explicit encoding and decoding
functions (@pxref{Explicit Encoding}) set it too.

@strong{Warning:} Since receiving subprocess output sets this variable,
it can change whenever Emacs waits; therefore, you should copy the
value shortly after the function call that stores the value you are
interested in.
@end defvar
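
The warning above can be illustrated with a short sketch. The function
name @code{my-file-coding-system} is hypothetical and not part of
Emacs; it reads @code{last-coding-system-used} immediately after the
call to @code{insert-file-contents} that set it.

@lisp
;; `my-file-coding-system' is a hypothetical helper, not part of Emacs.
(defun my-file-coding-system (file)
  "Return the coding system Emacs used to decode FILE."
  (with-temp-buffer
    (insert-file-contents file)
    ;; Read the variable right after the call that set it,
    ;; before any wait can let subprocess output overwrite it.
    last-coding-system-used))
@end lisp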

The variable @code{selection-coding-system} specifies how to encode
selections for the window system. @xref{Window System Selections}.

@defvar file-name-coding-system
The variable @code{file-name-coding-system} specifies the coding
system to use for encoding file names. Emacs encodes file names using
that coding system for all file operations. If
@code{file-name-coding-system} is @code{nil}, Emacs uses a default
coding system determined by the selected language environment. In the
default language environment, any non-@acronym{ASCII} characters in
file names are not encoded specially; they appear in the file system
using the internal Emacs representation.
@end defvar

@strong{Warning:} If you change @code{file-name-coding-system} (or
the language environment) in the middle of an Emacs session, problems
can result if you have already visited files whose names were encoded
using the earlier coding system and are handled differently under the
new coding system. If you try to save one of these buffers under the
visited file name, saving may use the wrong file name, or it may signal
an error. If such a problem happens, use @kbd{C-x C-w} to specify a
new file name for that buffer.

@node Lisp and Coding Systems
@subsection Coding Systems in Lisp

Here are the Lisp facilities for working with coding systems:

@defun coding-system-list &optional base-only
This function returns a list of all coding system names (symbols). If
@var{base-only} is non-@code{nil}, the value includes only the
base coding systems. Otherwise, it includes alias and variant coding
systems as well.
@end defun

@defun coding-system-p object
This function returns @code{t} if @var{object} is a coding system
name or is @code{nil}.
@end defun

@defun check-coding-system coding-system
This function checks the validity of @var{coding-system}.
If that is valid, it returns @var{coding-system}.
Otherwise it signals an error with condition @code{coding-system-error}.
@end defun
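
The following sketch contrasts the two validity checks. The symbol
@code{no-such-coding-system} is a made-up name, assumed here not to be
defined as a coding system.

@lisp
(coding-system-p 'utf-8)
     @result{} t
(coding-system-p nil)
     @result{} t

;; `no-such-coding-system' is a made-up name, assumed to be undefined.
(condition-case nil
    (check-coding-system 'no-such-coding-system)
  (coding-system-error 'invalid))
     @result{} invalid
@end lisp

@noindent
Use @code{coding-system-p} when you only need a yes-or-no answer, and
@code{check-coding-system} when an invalid name should stop the
operation with an error.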

@defun coding-system-eol-type coding-system
This function returns the type of end-of-line (a.k.a.@: @dfn{eol})
conversion used by @var{coding-system}. If @var{coding-system}
specifies a certain eol conversion, the return value is an integer 0,
1, or 2, standing for @code{unix}, @code{dos}, and @code{mac},
respectively. If @var{coding-system} doesn't specify eol conversion
explicitly, the return value is a vector of coding systems, each one
with one of the possible eol conversion types, like this:

@lisp
(coding-system-eol-type 'latin-1)
     @result{} [latin-1-unix latin-1-dos latin-1-mac]
@end lisp

@noindent
If this function returns a vector, Emacs will decide, as part of the
text encoding or decoding process, what eol conversion to use. For
decoding, the end-of-line format of the text is auto-detected, and the
eol conversion is set to match it (e.g., DOS-style CRLF format will
imply @code{dos} eol conversion). For encoding, the eol conversion is
taken from the appropriate default coding system (e.g.,
@code{default-buffer-file-coding-system} for
@code{buffer-file-coding-system}), or from the default eol conversion
appropriate for the underlying platform.
@end defun

@defun coding-system-change-eol-conversion coding-system eol-type
This function returns a coding system which is like @var{coding-system}
except for its eol conversion, which is specified by @var{eol-type}.
@var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
@code{nil}. If it is @code{nil}, the returned coding system determines
the end-of-line conversion from the data.

@var{eol-type} may also be 0, 1 or 2, standing for @code{unix},
@code{dos} and @code{mac}, respectively.
@end defun
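
As an illustration, here is how the eol variants of a coding system can
be derived and inspected. The choice of @code{utf-8} is only an
example; any coding system without a fixed eol conversion should behave
the same way.

@lisp
;; Derive the DOS (CRLF) variant of a coding system.
(coding-system-change-eol-conversion 'utf-8 'dos)
     @result{} utf-8-dos

;; A variant with a fixed eol conversion reports it as an integer.
(coding-system-eol-type 'utf-8-dos)
     @result{} 1

;; Switch an already fixed variant to Unix (LF) line endings.
(coding-system-change-eol-conversion 'utf-8-dos 'unix)
     @result{} utf-8-unix
@end lisp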

@defun coding-system-change-text-conversion eol-coding text-coding
This function returns a coding system which uses the end-of-line
conversion of @var{eol-coding}, and the text conversion of
@var{text-coding}. If @var{text-coding} is @code{nil}, it returns
@code{undecided}, or one of its variants according to @var{eol-coding}.
@end defun

@defun find-coding-systems-region from to
This function returns a list of coding systems that could be used to
encode the text between @var{from} and @var{to}. All coding systems in
the list can safely encode any multibyte characters in that portion of
the text.

If the text contains no multibyte characters, the function returns the
list @code{(undecided)}.
@end defun

@defun find-coding-systems-string string
This function returns a list of coding systems that could be used to
encode the text of @var{string}. All coding systems in the list can
safely encode any multibyte characters in @var{string}. If the text
contains no multibyte characters, this returns the list
@code{(undecided)}.
@end defun
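
One way to use this list is to test whether a particular coding system
is safe for a given string. The helper name
@code{my-safe-coding-system-p} below is hypothetical, and the sketch
assumes that the returned list names base coding systems, so the
candidate is reduced to its base with @code{coding-system-base} before
the lookup.

@lisp
(find-coding-systems-string "abc")
     @result{} (undecided)

;; `my-safe-coding-system-p' is a hypothetical helper, not part of Emacs.
(defun my-safe-coding-system-p (coding string)
  "Return non-nil if CODING can safely encode STRING."
  (let ((codings (find-coding-systems-string string)))
    (or (equal codings '(undecided))   ; no multibyte characters at all
        (memq (coding-system-base coding) codings))))

;; "caf\u00e9" contains one non-ASCII character (e with acute accent).
(my-safe-coding-system-p 'utf-8 "caf\u00e9")   ; should be non-nil
(my-safe-coding-system-p 'us-ascii "caf\u00e9")
     @result{} nil
@end lisp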

@defun find-coding-systems-for-charsets charsets
This function returns a list of coding systems that could be used to
encode all the character sets in the list @var{charsets}.
@end defun

@defun detect-coding-region start end &optional highest
This function chooses a plausible coding system for decoding the text
from @var{start} to @var{end}. This text should be a byte sequence
(@pxref{Explicit Encoding}).

Normally this function returns a list of coding systems that could
handle decoding the text that was scanned. They are listed in order of
decreasing priority. But if @var{highest} is non-@code{nil}, then the
return value is just one coding system, the one that is highest in
priority.

If the region contains only @acronym{ASCII} characters except for such
ISO-2022 control characters as @code{ESC}, the value is
@code{undecided} or @code{(undecided)}, or a variant specifying
end-of-line conversion, if that can be deduced from the text.
@end defun

@defun detect-coding-string string &optional highest
This function is like @code{detect-coding-region} except that it
operates on the contents of @var{string} instead of bytes in the buffer.
@end defun
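
A brief sketch of both calling conventions follows. The results for
the pure @acronym{ASCII} string are the ones described above; the
second string is an illustrative input whose CRLF line endings allow
an end-of-line variant to be deduced.

@lisp
;; With HIGHEST non-nil, the value is a single coding system.
(detect-coding-string "abc" t)
     @result{} undecided

;; Without HIGHEST, the value is a list in decreasing priority.
(detect-coding-string "abc")
     @result{} (undecided)

;; CRLF line endings let the eol conversion be deduced, so the
;; value here is expected to be an eol variant, not plain `undecided'.
(detect-coding-string "abc\r\ndef\r\n" t)
@end lisp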
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
892
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
893 @xref{Coding systems for a subprocess,, Process Information}, in
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
894 particular the description of the functions
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
895 @code{process-coding-system} and @code{set-process-coding-system}, for
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
896 how to examine or set the coding systems used for I/O to a subprocess.
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
897
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
898 @node User-Chosen Coding Systems
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
899 @subsection User-Chosen Coding Systems
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
900
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
901 @cindex select safe coding system
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
902 @defun select-safe-coding-system from to &optional default-coding-system accept-default-p file
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
903 This function selects a coding system for encoding specified text,
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
904 asking the user to choose if necessary. Normally the specified text
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
905 is the text in the current buffer between @var{from} and @var{to}. If
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
906 @var{from} is a string, the string specifies the text to encode, and
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
907 @var{to} is ignored.
908
909 If @var{default-coding-system} is non-@code{nil}, that is the first
910 coding system to try; if that can handle the text,
911 @code{select-safe-coding-system} returns that coding system. It can
912 also be a list of coding systems; then the function tries each of them
913 one by one. After trying all of them, it next tries the current
914 buffer's value of @code{buffer-file-coding-system} (if it is not
915 @code{undecided}), then the value of
916 @code{default-buffer-file-coding-system} and finally the user's most
917 preferred coding system, which the user can set using the command
918 @code{prefer-coding-system} (@pxref{Recognize Coding,, Recognizing
919 Coding Systems, emacs, The GNU Emacs Manual}).
920
921 If one of those coding systems can safely encode all the specified
922 text, @code{select-safe-coding-system} chooses it and returns it.
923 Otherwise, it asks the user to choose from a list of coding systems
924 which can encode all the text, and returns the user's choice.
925
926 @var{default-coding-system} can also be a list whose first element is
927 @code{t} and whose other elements are coding systems. Then, if no coding
928 system in the list can handle the text, @code{select-safe-coding-system}
929 queries the user immediately, without trying any of the three
930 alternatives described above.
931
932 The optional argument @var{accept-default-p}, if non-@code{nil},
933 should be a function to determine whether a coding system selected
934 without user interaction is acceptable. @code{select-safe-coding-system}
935 calls this function with one argument, the base coding system of the
936 selected coding system. If @var{accept-default-p} returns @code{nil},
937 @code{select-safe-coding-system} rejects the silently selected coding
938 system, and asks the user to select a coding system from a list of
939 possible candidates.
940
941 @vindex select-safe-coding-system-accept-default-p
942 If the variable @code{select-safe-coding-system-accept-default-p} is
943 non-@code{nil}, its value overrides the value of
944 @var{accept-default-p}.
945
946 As a final step, before returning the chosen coding system,
947 @code{select-safe-coding-system} checks whether that coding system is
948 consistent with what would be selected if the contents of the region
949 were read from a file. (If not, this could lead to data corruption in
950 a file subsequently re-visited and edited.) Normally,
951 @code{select-safe-coding-system} uses @code{buffer-file-name} as the
952 file for this purpose, but if @var{file} is non-@code{nil}, it uses
953 that file instead (this can be relevant for @code{write-region} and
954 similar functions). If it detects an apparent inconsistency,
955 @code{select-safe-coding-system} queries the user before selecting the
956 coding system.
957 @end defun
958
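As a rough illustration of the @var{default-coding-system} argument
of @code{select-safe-coding-system}, the following sketch asks for a
coding system that can encode a string, trying @code{utf-8} and then
@code{iso-latin-1} first. The helper name @code{my-coding-for-string}
is hypothetical. The value returned may carry an end-of-line suffix
(for example @code{utf-8-unix}), so no result is shown.

@example
(defun my-coding-for-string (text)
  "Return a coding system that can safely encode the string TEXT."
  ;; TEXT is a string, so the second argument (TO) is ignored.
  ;; The user is queried only if no candidate can encode TEXT.
  (select-safe-coding-system text nil '(utf-8 iso-latin-1)))

(my-coding-for-string "plain text")
@end example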
959 Here are two functions you can use to let the user specify a coding
960 system, with completion. @xref{Completion}.
961
962 @defun read-coding-system prompt &optional default
963 This function reads a coding system using the minibuffer, prompting with
964 string @var{prompt}, and returns the coding system name as a symbol. If
965 the user enters null input, @var{default} specifies which coding system
966 to return. It should be a symbol or a string.
967 @end defun
968
969 @defun read-non-nil-coding-system prompt
970 This function reads a coding system using the minibuffer, prompting with
971 string @var{prompt}, and returns the coding system name as a symbol. If
972 the user tries to enter null input, it asks the user to try again.
973 @xref{Coding Systems}.
974 @end defun
975
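The following sketch shows one way a command might use
@code{read-coding-system}. The command name
@code{my-describe-chosen-coding} and the choice of @code{utf-8} as the
default are assumptions made for illustration.

@example
(defun my-describe-chosen-coding ()
  "Read a coding system name in the minibuffer and echo it."
  (interactive)
  ;; Null input returns the default, `utf-8'.
  (let ((coding (read-coding-system
                 "Coding system (default utf-8): " 'utf-8)))
    (message "You chose %s" coding)))
@end example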
976 @node Default Coding Systems
977 @subsection Default Coding Systems
978
979 This section describes variables that specify the default coding
980 system for certain files or when running certain subprograms, and the
981 function that I/O operations use to access them.
982
983 The idea of these variables is that you set them once and for all to the
984 defaults you want, and then do not change them again. To specify a
985 particular coding system for a particular operation in a Lisp program,
986 don't change these variables; instead, override them using
987 @code{coding-system-for-read} and @code{coding-system-for-write}
988 (@pxref{Specifying Coding Systems}).
989
990 @defvar auto-coding-regexp-alist
991 This variable is an alist of text patterns and corresponding coding
992 systems. Each element has the form @code{(@var{regexp}
993 . @var{coding-system})}; a file whose first few kilobytes match
994 @var{regexp} is decoded with @var{coding-system} when its contents are
995 read into a buffer. The settings in this alist take priority over
996 @code{coding:} tags in the files and the contents of
997 @code{file-coding-system-alist} (see below). The default value is set
998 so that Emacs automatically recognizes mail files in Babyl format and
999 reads them with no code conversions.
1000 @end defvar
1001
1002 @defvar file-coding-system-alist
1003 This variable is an alist that specifies the coding systems to use for
1004 reading and writing particular files. Each element has the form
1005 @code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
1006 expression that matches certain file names. The element applies to file
1007 names that match @var{pattern}.
1008
1009 The @sc{cdr} of the element, @var{coding}, should be either a coding
1010 system, a cons cell containing two coding systems, or a function name (a
1011 symbol with a function definition). If @var{coding} is a coding system,
1012 that coding system is used for both reading the file and writing it. If
1013 @var{coding} is a cons cell containing two coding systems, its @sc{car}
1014 specifies the coding system for decoding, and its @sc{cdr} specifies the
1015 coding system for encoding.
1016
1017 If @var{coding} is a function name, the function should take one
1018 argument, a list of all arguments passed to
1019 @code{find-operation-coding-system}. It must return a coding system
1020 or a cons cell containing two coding systems. This value has the same
1021 meaning as described above.
1022
1023 If @var{coding} (or what is returned by the above function) is
1024 @code{undecided}, the normal code-detection is performed.
1025 @end defvar
1026
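For instance, an init file might add entries to
@code{file-coding-system-alist} such as the following. This is only a
sketch; the file name patterns @samp{.mytxt} and @samp{.legacy} are
hypothetical.

@example
;; Decode and encode matching files with utf-8.
(add-to-list 'file-coding-system-alist
             '("\\.mytxt\\'" . utf-8))

;; Decode with iso-latin-1, but encode with utf-8.
(add-to-list 'file-coding-system-alist
             '("\\.legacy\\'" . (iso-latin-1 . utf-8)))
@end example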
1027 @defvar process-coding-system-alist
1028 This variable is an alist specifying which coding systems to use for a
1029 subprocess, depending on which program is running in the subprocess. It
1030 works like @code{file-coding-system-alist}, except that @var{pattern} is
1031 matched against the program name used to start the subprocess. The coding
1032 system or systems specified in this alist are used to initialize the
1033 coding systems used for I/O to the subprocess, but you can specify
1034 other coding systems later using @code{set-process-coding-system}.
1035 @end defvar
1036
1037 @strong{Warning:} Coding systems such as @code{undecided}, which
1038 determine the coding system from the data, do not work entirely reliably
1039 with asynchronous subprocess output. This is because Emacs handles
1040 asynchronous subprocess output in batches, as it arrives. If the coding
1041 system leaves the character code conversion unspecified, or leaves the
1042 end-of-line conversion unspecified, Emacs must try to detect the proper
1043 conversion from one batch at a time, and this does not always work.
1044
1045 Therefore, with an asynchronous subprocess, if at all possible, use a
1046 coding system which determines both the character code conversion and
1047 the end-of-line conversion; that is, one like @code{latin-1-unix},
1048 rather than @code{undecided} or @code{latin-1}.
1049
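Following that advice, an entry in @code{process-coding-system-alist}
for an asynchronous helper program might look like this sketch. The
program name @samp{my-filter} is hypothetical.

@example
;; Use latin-1 with Unix-style line endings both for input from
;; and for output to the program, so that nothing needs detecting.
(add-to-list 'process-coding-system-alist
             '("my-filter\\'" . (latin-1-unix . latin-1-unix)))
@end example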
1050 @defvar network-coding-system-alist
1051 This variable is an alist that specifies the coding system to use for
1052 network streams. It works much like @code{file-coding-system-alist},
1053 with the difference that the @var{pattern} in an element may be either a
1054 port number or a regular expression. If it is a regular expression, it
1055 is matched against the network service name used to open the network
1056 stream.
1057 @end defvar
1058
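As an illustration, the entries below select a coding system for
network streams by port number and by service name. The port
@code{119} and the service name @samp{nntp} are chosen only as
examples.

@example
;; Match the numeric port 119.
(add-to-list 'network-coding-system-alist
             '(119 . (utf-8-unix . utf-8-unix)))

;; Match a service that is given by name.
(add-to-list 'network-coding-system-alist
             '("\\`nntp\\'" . (utf-8-unix . utf-8-unix)))
@end example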
1059 @defvar default-process-coding-system
1060 This variable specifies the coding systems to use for subprocess (and
1061 network stream) input and output, when nothing else specifies what to
1062 do.
1063
1064 The value should be a cons cell of the form @code{(@var{input-coding}
1065 . @var{output-coding})}. Here @var{input-coding} applies to input from
1066 the subprocess, and @var{output-coding} applies to output to it.
1067 @end defvar
1068
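A minimal sketch of a setting for
@code{default-process-coding-system}, assuming that the programs you
run read and write UTF-8 with Unix-style line endings:

@example
;; The value has the form (INPUT-CODING . OUTPUT-CODING).
(setq default-process-coding-system
      '(utf-8-unix . utf-8-unix))
@end example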
1069 @defvar auto-coding-functions
1070 This variable holds a list of functions that try to determine a
1071 coding system for a file based on its undecoded contents.
1072
1073 Each function in this list should be written to look at text in the
1074 current buffer, but should not modify it in any way. The buffer will
1075 contain undecoded text of parts of the file. Each function should
1076 take one argument, @var{size}, which tells it how many characters to
1077 look at, starting from point. If the function succeeds in determining
1078 a coding system for the file, it should return that coding system.
1079 Otherwise, it should return @code{nil}.
1080
1081 If a file has a @samp{coding:} tag, that takes precedence, so these
1082 functions won't be called.
1083 @end defvar
1084
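A function suitable for @code{auto-coding-functions} could be
sketched as follows. The function name @code{my-auto-coding-marker}
and the marker string @samp{%%UTF8%%} are hypothetical; the sketch
assumes that files carrying this marker near their beginning are
encoded in UTF-8.

@example
(defun my-auto-coding-marker (size)
  "Return `utf-8' if the marker appears in the next SIZE characters.
Return nil otherwise."
  (save-excursion
    (let ((limit (min (point-max) (+ (point) size))))
      ;; Search only; the buffer text is left untouched, and
      ;; `save-excursion' restores point afterwards.
      (when (search-forward "%%UTF8%%" limit t)
        'utf-8))))

(add-to-list 'auto-coding-functions 'my-auto-coding-marker)
@end example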
1085 @defun find-operation-coding-system operation &rest arguments
1086 This function returns the coding system to use (by default) for
1087 performing @var{operation} with @var{arguments}. The value has this
1088 form:
1089
1090 @example
1091 (@var{decoding-system} . @var{encoding-system})
1092 @end example
1093
1094 The first element, @var{decoding-system}, is the coding system to use
1095 for decoding (in case @var{operation} does decoding), and
1096 @var{encoding-system} is the coding system for encoding (in case
1097 @var{operation} does encoding).
1098
1099 The argument @var{operation} is a symbol, one of @code{write-region},
1100 @code{start-process}, @code{call-process}, @code{call-process-region},
1101 @code{insert-file-contents}, or @code{open-network-stream}. These are
1102 the names of the Emacs I/O primitives that can do character code and
1103 end-of-line conversion.
1104
1105 The remaining arguments should be the same arguments that might be given
1106 to the corresponding I/O primitive. Depending on the primitive, one
1107 of those arguments is selected as the @dfn{target}. For example, if
1108 @var{operation} does file I/O, whichever argument specifies the file
1109 name is the target. For subprocess primitives, the program name is the
1110 target. For @code{open-network-stream}, the target is the service name
1111 or port number.
1112
1113 Depending on @var{operation}, this function looks up the target in
1114 @code{file-coding-system-alist}, @code{process-coding-system-alist},
1115 or @code{network-coding-system-alist}. If the target is found in the
1116 alist, @code{find-operation-coding-system} returns its association in
1117 the alist; otherwise it returns @code{nil}.
1118
1119 If @var{operation} is @code{insert-file-contents}, the argument
1120 corresponding to the target may be a cons cell of the form
1121 @code{(@var{filename} . @var{buffer})}. In that case, @var{filename}
1122 is a file name to look up in @code{file-coding-system-alist}, and
1123 @var{buffer} is a buffer that contains the file's contents (not yet
1124 decoded). If @code{file-coding-system-alist} specifies a function to
1125 call for this file, and that function needs to examine the file's
1126 contents (as it usually does), it should examine the contents of
1127 @var{buffer} instead of reading the file.
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
1128 @end defun
e7e0d9a379c7 Move here from ../../lispref
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
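
As a rough illustration of the lookup done by
@code{find-operation-coding-system}, the following sketch binds
@code{file-coding-system-alist} to a one-element alist, so that the
result does not depend on the user's settings. The function name
@code{my-lookup-coding}, the file names and the pattern are made up for
the example; no file is actually read.

@example
;; @r{Hypothetical alist: names ending in @file{.utf8} use UTF-8.}
(defun my-lookup-coding (file)
  "Return the coding systems that would apply to reading FILE."
  (let ((file-coding-system-alist
         '(("\\.utf8\\'" . (utf-8 . utf-8)))))
    (find-operation-coding-system 'insert-file-contents file)))

(my-lookup-coding "notes.utf8")   ; @result{} (utf-8 . utf-8)
(my-lookup-coding "notes.bin")    ; @result{} nil
@end example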

@node Specifying Coding Systems
@subsection Specifying a Coding System for One Operation

You can specify the coding system for a specific operation by binding
the variables @code{coding-system-for-read} and/or
@code{coding-system-for-write}.

@defvar coding-system-for-read
If this variable is non-@code{nil}, it specifies the coding system to
use for reading a file, or for input from a synchronous subprocess.

It also applies to any asynchronous subprocess or network stream, but in
a different way: the value of @code{coding-system-for-read} when you
start the subprocess or open the network stream specifies the input
decoding method for that subprocess or network stream. It remains in
use for that subprocess or network stream unless and until overridden.

The right way to use this variable is to bind it with @code{let} for a
specific I/O operation. Its global value is normally @code{nil}, and
you should not globally set it to any other value. Here is an example
of the right way to use the variable:

@example
;; @r{Read the file with no character code conversion.}
;; @r{Assume @acronym{crlf} represents end-of-line.}
(let ((coding-system-for-read 'emacs-mule-dos))
  (insert-file-contents filename))
@end example

When its value is non-@code{nil}, this variable takes precedence over
all other methods of specifying a coding system to use for input,
including @code{file-coding-system-alist},
@code{process-coding-system-alist} and
@code{network-coding-system-alist}.
@end defvar

@defvar coding-system-for-write
This works much like @code{coding-system-for-read}, except that it
applies to output rather than input. It affects writing to files,
as well as sending output to subprocesses and network connections.

When a single operation does both input and output, as do
@code{call-process-region} and @code{start-process}, both
@code{coding-system-for-read} and @code{coding-system-for-write}
affect it.
@end defvar
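
Here is a minimal sketch of binding @code{coding-system-for-write}
around a single output operation. The function name
@code{my-write-latin-1} and the temporary file are illustrative only.

@example
(defun my-write-latin-1 (text file)
  "Write TEXT to FILE encoded in Latin-1; return the file size in bytes."
  (let ((coding-system-for-write 'iso-latin-1))
    (with-temp-buffer
      (insert text)
      ;; @r{The fifth argument suppresses the ``Wrote file'' message.}
      (write-region (point-min) (point-max) file nil 'silent)))
  (nth 7 (file-attributes file)))

;; @r{U+00E9 occupies a single byte in Latin-1.}
(let ((file (make-temp-file "coding-demo")))
  (unwind-protect
      (my-write-latin-1 "caf\u00e9" file)
    (delete-file file)))
;; @result{} 4
@end example

Because the binding is made with @code{let}, the global value of
@code{coding-system-for-write} is untouched once the function returns.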

@defvar inhibit-eol-conversion
When this variable is non-@code{nil}, no end-of-line conversion is done,
no matter which coding system is specified. This applies to all the
Emacs I/O and subprocess primitives, and to the explicit encoding and
decoding functions (@pxref{Explicit Encoding}).
@end defvar
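
The effect of @code{inhibit-eol-conversion} on explicit decoding can be
sketched as follows. The helper @code{my-decode-dos} is a hypothetical
name used only for this illustration.

@example
(defun my-decode-dos (bytes &optional keep-eol)
  "Decode BYTES as utf-8-dos; non-nil KEEP-EOL leaves CRLF untouched."
  (let ((inhibit-eol-conversion keep-eol))
    (decode-coding-string bytes 'utf-8-dos)))

(my-decode-dos "one\r\ntwo")      ; @result{} "one\ntwo"
(my-decode-dos "one\r\ntwo" t)    ; @result{} "one\r\ntwo"
@end example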

@node Explicit Encoding
@subsection Explicit Encoding and Decoding
@cindex encoding in coding systems
@cindex decoding in coding systems

All the operations that transfer text in and out of Emacs have the
ability to use a coding system to encode or decode the text.
You can also explicitly encode and decode text using the functions
in this section.

The result of encoding, and the input to decoding, are not ordinary
text. They logically consist of a series of byte values; that is, a
series of characters whose codes are in the range 0 through 255. In a
multibyte buffer or string, character codes 128 through 159 are
represented by multibyte sequences, but this is invisible to Lisp
programs.

The usual way to read a file into a buffer as a sequence of bytes, so
you can decode the contents explicitly, is with
@code{insert-file-contents-literally} (@pxref{Reading from Files});
alternatively, specify a non-@code{nil} @var{rawfile} argument when
visiting a file with @code{find-file-noselect}. These methods result in
a unibyte buffer.

The usual way to use the byte sequence that results from explicitly
encoding text is to copy it to a file or process---for example, to write
it with @code{write-region} (@pxref{Writing to Files}), and suppress
encoding by binding @code{coding-system-for-write} to
@code{no-conversion}.
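
The two idioms just described can be combined into a round trip, shown
here as a sketch. The function name @code{my-bytes-round-trip} and the
temporary file are assumptions of the example. It decodes the bytes as
a string rather than in place, so it does not depend on whether the
temporary buffer ends up unibyte.

@example
(defun my-bytes-round-trip (text coding)
  "Encode TEXT with CODING, write the bytes to a temporary file,
read them back undecoded, and return the explicitly decoded text."
  (let ((file (make-temp-file "bytes-demo"))
        (bytes (encode-coding-string text coding)))
    (unwind-protect
        (progn
          ;; @r{The bytes are already encoded, so suppress encoding.}
          (let ((coding-system-for-write 'no-conversion))
            (write-region bytes nil file nil 'silent))
          (with-temp-buffer
            ;; @r{Insert the file's bytes without any decoding.}
            (insert-file-contents-literally file)
            (decode-coding-string (buffer-string) coding)))
      (delete-file file))))

(string= (my-bytes-round-trip "caf\u00e9" 'utf-8) "caf\u00e9")
;; @result{} t
@end example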

Here are the functions to perform explicit encoding or decoding. The
encoding functions produce sequences of bytes; the decoding functions
are meant to operate on sequences of bytes. All of these functions
discard text properties.

@deffn Command encode-coding-region start end coding-system
This command encodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The encoded text replaces the
original text in the buffer. The result of encoding is logically a
sequence of bytes, but the buffer remains multibyte if it was multibyte
before.

This command returns the length of the encoded text.
@end deffn

@defun encode-coding-string string coding-system &optional nocopy
This function encodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
encoded text, except when @var{nocopy} is non-@code{nil}, in which
case the function may return @var{string} itself if the encoding
operation is trivial. The result of encoding is a unibyte string.
@end defun
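
For instance, since the result of @code{encode-coding-string} is
unibyte, its length is a byte count. The helper name
@code{my-utf-8-byte-count} below is hypothetical.

@example
(defun my-utf-8-byte-count (string)
  "Return the number of bytes STRING occupies when encoded as UTF-8."
  ;; @r{The encoded result is unibyte, so its length counts bytes.}
  (length (encode-coding-string string 'utf-8)))

(my-utf-8-byte-count "caf\u00e9")   ; @result{} 5
(multibyte-string-p
 (encode-coding-string "caf\u00e9" 'utf-8))   ; @result{} nil
@end example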

@deffn Command decode-coding-region start end coding-system
This command decodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The decoded text replaces the
original text in the buffer. To make explicit decoding useful, the text
before decoding ought to be a sequence of byte values, but both
multibyte and unibyte buffers are acceptable.

This command returns the length of the decoded text.
@end deffn
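
The following sketch applies @code{encode-coding-region} and then
@code{decode-coding-region} to the same text in a temporary multibyte
buffer, and records the buffer size at each step. The function name
@code{my-region-sizes} is made up for the example, which relies on the
buffer size rather than on the commands' return values.

@example
(defun my-region-sizes (text coding)
  "Return a list (ORIGINAL ENCODED DECODED) of buffer sizes for TEXT."
  (with-temp-buffer
    (insert text)
    (let ((original (buffer-size)) encoded)
      ;; @r{Replace the text by its encoded bytes, one character per byte.}
      (encode-coding-region (point-min) (point-max) coding)
      (setq encoded (buffer-size))
      ;; @r{Decode the bytes again, restoring the original text.}
      (decode-coding-region (point-min) (point-max) coding)
      (list original encoded (buffer-size)))))

(my-region-sizes "caf\u00e9" 'utf-8)   ; @result{} (4 5 4)
@end example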

@defun decode-coding-string string coding-system &optional nocopy
This function decodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
decoded text, except when @var{nocopy} is non-@code{nil}, in which
case the function may return @var{string} itself if the decoding
operation is trivial. To make explicit decoding useful, the contents
of @var{string} ought to be a sequence of byte values, but a multibyte
string is acceptable.
@end defun
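
As a small illustration of @code{decode-coding-string}, the sketch
below decodes a unibyte string written with octal escapes. The wrapper
@code{my-decode-utf-8} is a hypothetical name.

@example
(defun my-decode-utf-8 (bytes)
  "Decode the unibyte string BYTES as UTF-8 and return the text."
  (decode-coding-string bytes 'utf-8))

;; @r{The bytes \303 \251 are the UTF-8 sequence for U+00E9.}
(string= (my-decode-utf-8 "caf\303\251") "caf\u00e9")   ; @result{} t
(length (my-decode-utf-8 "caf\303\251"))               ; @result{} 4
@end example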

@defun decode-coding-inserted-region from to filename &optional visit beg end replace
This function decodes the text from @var{from} to @var{to} as if
it were being read from file @var{filename} by @code{insert-file-contents},
called with the rest of the arguments provided.

The normal way to use this function is after reading text from a file
without decoding, if you decide you would rather have decoded it.
Instead of deleting the text and reading it again, this time with
decoding, you can call this function.
@end defun

@node Terminal I/O Encoding
@subsection Terminal I/O Encoding

Emacs can decode keyboard input using a coding system, and encode
terminal output. This is useful for terminals that transmit or display
text using a particular encoding such as Latin-1. Emacs does not set
@code{last-coding-system-used} when it encodes or decodes terminal
I/O.

@defun keyboard-coding-system
This function returns the coding system that is in use for decoding
keyboard input---or @code{nil} if no coding system is to be used.
@end defun

@deffn Command set-keyboard-coding-system coding-system
This command specifies @var{coding-system} as the coding system to
use for decoding keyboard input. If @var{coding-system} is @code{nil},
that means do not decode keyboard input.
@end deffn

@defun terminal-coding-system
This function returns the coding system that is in use for encoding
terminal output---or @code{nil} for no encoding.
@end defun

@deffn Command set-terminal-coding-system coding-system
This command specifies @var{coding-system} as the coding system to use
for encoding terminal output. If @var{coding-system} is @code{nil},
that means do not encode terminal output.
@end deffn
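
A Lisp program that changes the terminal coding system temporarily
would normally restore the previous value afterwards. One possible way
to do that is sketched below; the function name
@code{my-call-with-terminal-coding} is an assumption of the example,
not part of Emacs.

@example
(defun my-call-with-terminal-coding (coding function)
  "Call FUNCTION with terminal output encoded by CODING, then restore."
  (let ((old (terminal-coding-system)))
    (unwind-protect
        (progn
          (set-terminal-coding-system coding)
          (funcall function))
      (set-terminal-coding-system old))))

;; @r{Report the coding systems in use while the override is active.}
(my-call-with-terminal-coding
 'utf-8
 (lambda ()
   (list (keyboard-coding-system) (terminal-coding-system))))
@end example

The @code{unwind-protect} form ensures that the old value is put back
even if @var{function} signals an error.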

@node MS-DOS File Types
@subsection MS-DOS File Types
@cindex DOS file types
@cindex MS-DOS file types
@cindex Windows file types
@cindex file types on MS-DOS and Windows
@cindex text files and binary files
@cindex binary files and text files

On MS-DOS and Microsoft Windows, Emacs guesses the appropriate
end-of-line conversion for a file by looking at the file's name. This
feature classifies files as @dfn{text files} and @dfn{binary files}. By
``binary file'' we mean a file of literal byte values that are not
necessarily meant to be characters; Emacs does no end-of-line conversion
and no character code conversion for them. On the other hand, the bytes
in a text file are intended to represent characters; when you create a
new file whose name implies that it is a text file, Emacs uses DOS
end-of-line conversion.

@defvar buffer-file-type
This variable, automatically buffer-local in each buffer, records the
file type of the buffer's visited file. When a buffer does not specify
a coding system with @code{buffer-file-coding-system}, this variable is
used to determine which coding system to use when writing the contents
of the buffer. It should be @code{nil} for text, @code{t} for binary.
If it is @code{t}, the coding system is @code{no-conversion}.
Otherwise, @code{undecided-dos} is used.

Normally this variable is set by visiting a file; it is set to
@code{t} if the file was visited without any actual conversion.
@end defvar

@defopt file-name-buffer-file-type-alist
This variable holds an alist for recognizing text and binary files.
Each element has the form @code{(@var{regexp} . @var{type})}, where
@var{regexp} is matched against the file name, and @var{type} may be
@code{nil} for text, @code{t} for binary, or a function to call to
compute which. If it is a function, then it is called with a single
argument (the file name) and should return @code{t} or @code{nil}.

When running on MS-DOS or MS-Windows, Emacs checks this alist to decide
which coding system to use when reading a file. For a text file,
@code{undecided-dos} is used. For a binary file, @code{no-conversion}
is used.

If no element in this alist matches a given file name, then
@code{default-buffer-file-type} says how to treat the file.
@end defopt
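
As a sketch of how an element might be added to
@code{file-name-buffer-file-type-alist}, the following fragment marks
files with a hypothetical @file{.dat} extension as binary. The
extension is only an example, and the @code{boundp} test is a
precaution for systems where the option may not be defined.

@example
;; @r{Hypothetical rule: treat @file{.dat} files as binary.}
(when (boundp 'file-name-buffer-file-type-alist)
  (add-to-list 'file-name-buffer-file-type-alist
               '("\\.dat\\'" . t)))
@end example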

@defopt default-buffer-file-type
This variable says how to handle files for which
@code{file-name-buffer-file-type-alist} says nothing about the type.

If this variable is non-@code{nil}, then these files are treated as
binary: the coding system @code{no-conversion} is used. Otherwise,
nothing special is done for them---the coding system is deduced solely
from the file contents, in the usual Emacs fashion.
@end defopt
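As a rough illustration of how these two options interact, the
following sketch marks files whose names end in @file{.dat} as binary
and leaves all unmatched files to the usual detection. The @file{.dat}
pattern is only an assumption made for the example, and the
@code{boundp} test guards against builds that do not define the alist,
since it is meaningful only on MS-DOS and MS-Windows.

@example
;; Sketch only: the ".dat" pattern is a made-up example.
(when (boundp 'file-name-buffer-file-type-alist)
  ;; A type of t marks matching files as binary.
  (add-to-list 'file-name-buffer-file-type-alist
               '("\\.dat\\'" . t))
  ;; nil: deduce the coding system of unmatched files from contents.
  (setq default-buffer-file-type nil))
@end example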

@node Input Methods
@section Input Methods
@cindex input methods

@dfn{Input methods} provide convenient ways of entering non-@acronym{ASCII}
characters from the keyboard. Unlike coding systems, which translate
non-@acronym{ASCII} characters to and from encodings meant to be read by
programs, input methods provide human-friendly commands. (@xref{Input
Methods,,, emacs, The GNU Emacs Manual}, for information on how users
use input methods to enter text.) How to define input methods is not
yet documented in this manual, but here we describe how to use them.

Each input method has a name, which is currently a string;
in the future, symbols may also be usable as input method names.

@defvar current-input-method
This variable holds the name of the input method now active in the
current buffer. (It automatically becomes local in each buffer when set
in any fashion.) It is @code{nil} if no input method is active in the
buffer now.
@end defvar

@defopt default-input-method
This variable holds the default input method for commands that choose an
input method. Unlike @code{current-input-method}, this variable is
normally global.
@end defopt

@deffn Command set-input-method input-method
This command activates input method @var{input-method} for the current
buffer. It also sets @code{default-input-method} to @var{input-method}.
If @var{input-method} is @code{nil}, this command deactivates any input
method for the current buffer.
@end deffn
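The following sketch shows how these variables and
@code{set-input-method} fit together. It assumes that an input method
named @code{"latin-1-prefix"} is available in your Emacs, and it works
in a temporary buffer so that no other buffer's input method is
changed.

@example
(with-temp-buffer
  ;; Activate the method; this makes `current-input-method'
  ;; buffer-local and also updates `default-input-method'.
  (set-input-method "latin-1-prefix")
  (message "current: %s, default: %s"
           current-input-method default-input-method)
  ;; Passing nil deactivates the input method again.
  (set-input-method nil)
  (message "current after deactivation: %s" current-input-method))
@end example

Because @code{default-input-method} is normally global, the value set
here remains in effect after the temporary buffer is gone.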

@defun read-input-method-name prompt &optional default inhibit-null
This function reads an input method name with the minibuffer, prompting
with @var{prompt}. If @var{default} is non-@code{nil}, it is returned
when the user enters empty input. However, if
@var{inhibit-null} is non-@code{nil}, empty input signals an error.

The returned value is a string.
@end defun
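A command that lets the user pick an input method could be sketched as
follows. The name @code{my-choose-input-method} is hypothetical; it is
not part of Emacs.

@example
(defun my-choose-input-method ()
  "Prompt for an input method and activate it in this buffer."
  (interactive)
  ;; With INHIBIT-NULL non-nil and no default, empty input
  ;; signals an error instead of returning nil.
  (let ((name (read-input-method-name "Input method: " nil t)))
    (set-input-method name)))
@end example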

@defvar input-method-alist
This variable defines all the supported input methods.
Each element defines one input method, and should have the form:

@example
(@var{input-method} @var{language-env} @var{activate-func}
 @var{title} @var{description} @var{args}...)
@end example

Here @var{input-method} is the input method name, a string;
@var{language-env} is another string, the name of the language
environment this input method is recommended for. (That serves only for
documentation purposes.)

@var{activate-func} is a function to call to activate this method. The
@var{args}, if any, are passed as arguments to @var{activate-func}. All
told, the arguments to @var{activate-func} are @var{input-method} and
the @var{args}.

@var{title} is a string to display in the mode line while this method is
active. @var{description} is a string describing this method and what
it is good for.
@end defvar
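For instance, the fields of one element can be picked apart with
ordinary list functions. This sketch assumes that an input method
named @code{"latin-1-prefix"} is registered; if it is not, the form
simply returns @code{nil}.

@example
(let ((entry (assoc "latin-1-prefix" input-method-alist)))
  (when entry
    ;; Label each field according to the element layout above.
    (list :name          (nth 0 entry)
          :language-env  (nth 1 entry)
          :activate-func (nth 2 entry)
          :title         (nth 3 entry)
          :description   (nth 4 entry)
          :args          (nthcdr 5 entry))))
@end example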

The fundamental interface to input methods is through the
variable @code{input-method-function}. @xref{Reading One Event},
and @ref{Invoking the Input Method}.

@node Locales
@section Locales
@cindex locale

POSIX defines a concept of ``locales'' that control which language
to use in language-related features. These Emacs variables control
how Emacs interacts with these features.

@defvar locale-coding-system
@cindex keyboard input decoding on X
This variable specifies the coding system to use for decoding system
error messages and---on the X Window System only---keyboard input, for
encoding the format argument to @code{format-time-string}, and for
decoding the return value of @code{format-time-string}.
@end defvar

@defvar system-messages-locale
This variable specifies the locale to use for generating system error
messages. Changing the locale can cause messages to come out in a
different language or in a different orthography. If the variable is
@code{nil}, the locale is specified by environment variables in the
usual POSIX fashion.
@end defvar

@defvar system-time-locale
This variable specifies the locale to use for formatting time values.
Changing the locale can cause time values to be formatted according to
the conventions of a different language. If the variable is @code{nil},
the locale is specified by environment variables in the usual POSIX
fashion.
@end defvar
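For example, binding @code{system-time-locale} around a call to
@code{format-time-string} requests day and month names in another
language. The locale name @code{"de_DE.UTF-8"} is an assumption; which
names are valid depends on the locales installed on the system, and if
that locale is missing the result may be unchanged.

@example
(let ((system-time-locale "de_DE.UTF-8"))
  ;; %A is the full weekday name, %B the full month name.
  (format-time-string "%A, %d %B %Y"))
@end example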

@defun locale-info item
This function returns locale data @var{item} for the current POSIX
locale, if available. @var{item} should be one of these symbols:

@table @code
@item codeset
Return the character set as a string (locale item @code{CODESET}).

@item days
Return a 7-element vector of day names (locale items
@code{DAY_1} through @code{DAY_7}).

@item months
Return a 12-element vector of month names (locale items @code{MON_1}
through @code{MON_12}).

@item paper
Return a list @code{(@var{width} @var{height})} for the default paper
size measured in millimeters (locale items @code{PAPER_WIDTH} and
@code{PAPER_HEIGHT}).
@end table

If the system can't provide the requested information, or if
@var{item} is not one of those symbols, the value is @code{nil}. All
strings in the return value are decoded using
@code{locale-coding-system}. @xref{Locales,,, libc, The GNU Libc Manual},
for more information about locales and locale items.
@end defun
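The following forms sketch typical calls. The actual values depend on
the current locale, so none are shown here.

@example
;; Each call returns nil if the system cannot supply the item.
(locale-info 'codeset)            ; a string naming the character set
(let ((days (locale-info 'days)))
  (when days
    (aref days 0)))               ; the name for DAY_1
(let ((paper (locale-info 'paper)))
  (when paper
    (format "%sx%s mm" (nth 0 paper) (nth 1 paper))))
@end example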

@ignore
arch-tag: be705bf8-941b-4c35-84fc-ad7d20ddb7cb
@end ignore