annotate lispref/nonascii.texi @ 36150:46e59561af4c

Display Vars node renamed Display Custom. Include info there about customizing cursor appearance. Clean up aggressive scrolling. Clarify horizontal scrolling discussion. Fix index entries for line number mode.
author Richard M. Stallman <rms@gnu.org>
date Sat, 17 Feb 2001 16:45:37 +0000
parents e1d9a16467ae
children 8f8df4d24f48
@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1998, 1999 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/characters
@node Non-ASCII Characters, Searching and Matching, Text, Top
@chapter Non-@sc{ascii} Characters
@cindex multibyte characters
@cindex non-@sc{ascii} characters

This chapter covers the special issues relating to non-@sc{ascii}
characters and how they are stored in strings and buffers.

@menu
* Text Representations::      Unibyte and multibyte representations.
* Converting Representations:: Converting unibyte to multibyte and vice versa.
* Selecting a Representation:: Treating a byte sequence as unibyte or multibyte.
* Character Codes::           How unibyte and multibyte relate to
                                codes of individual characters.
* Character Sets::            The space of possible character codes
                                is divided into various character sets.
* Chars and Bytes::           More information about multibyte encodings.
* Splitting Characters::      Converting a character to its byte sequence.
* Scanning Charsets::         Which character sets are used in a buffer?
* Translation of Characters:: Translation tables are used for conversion.
* Coding Systems::            Coding systems are conversions for saving files.
* Input Methods::             Input methods allow users to enter various
                                non-ASCII characters without special keyboards.
* Locales::                   Interacting with the POSIX locale.
@end menu

@node Text Representations
@section Text Representations
@cindex text representations

Emacs has two @dfn{text representations}---two ways to represent text
in a string or buffer. These are called @dfn{unibyte} and
@dfn{multibyte}. Each string, and each buffer, uses one of these two
representations. For most purposes, you can ignore the issue of
representations, because Emacs converts text between them as
appropriate. Occasionally in Lisp programming you will need to pay
attention to the difference.

@cindex unibyte text
In unibyte representation, each character occupies one byte and
therefore the possible character codes range from 0 to 255. Codes 0
through 127 are @sc{ascii} characters; the codes from 128 through 255
are used for one non-@sc{ascii} character set (you can choose which
character set by setting the variable @code{nonascii-insert-offset}).

@cindex leading code
@cindex multibyte text
@cindex trailing codes
In multibyte representation, a character may occupy more than one
byte, and as a result, the full range of Emacs character codes can be
stored. The first byte of a multibyte character is always in the range
128 through 159 (octal 0200 through 0237). These values are called
@dfn{leading codes}. The second and subsequent bytes of a multibyte
character are always in the range 160 through 255 (octal 0240 through
0377); these values are @dfn{trailing codes}.
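
The following Python model makes the byte ranges concrete. It is a sketch only. @code{classify_byte} is a hypothetical helper, not an Emacs function. It sorts a byte value into the three ranges described above and says nothing about how many trailing codes a particular character uses.

```python
def classify_byte(b):
    """Classify one byte of multibyte text by the ranges in the text.

    Hypothetical helper: returns "ascii" for 0..127, "leading" for
    128..159 (octal 0200..0237), and "trailing" for 160..255
    (octal 0240..0377).
    """
    if not 0 <= b <= 255:
        raise ValueError("not a byte value: %r" % (b,))
    if b < 0o200:
        return "ascii"
    if b < 0o240:
        return "leading"
    return "trailing"


# Classify each byte of a leading-code/trailing-code pair.
print([classify_byte(b) for b in bytes([0o201, 0o351])])
```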

Some sequences of bytes are not valid in multibyte text: for example,
a single isolated byte in the range 128 through 159 is not allowed. But
character codes 128 through 159 can appear in multibyte text,
represented as two-byte sequences. All the character codes 128 through
255 are possible (though slightly abnormal) in multibyte text; they
appear in multibyte buffers and strings when you do explicit encoding
and decoding (@pxref{Explicit Encoding}).
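
Taken together, these rules mean a leading code should be immediately followed by a trailing code. The Python sketch below checks only the single invalid case the text names: a leading code standing alone. It is not a full validity check for multibyte text, and @code{has_isolated_leading_code} is a hypothetical name.

```python
def has_isolated_leading_code(data):
    """Return True if DATA has a byte in 128..159 (a leading code) that is
    not immediately followed by a byte in 160..255 (a trailing code).

    Only this one rule is checked; other properties of the internal
    multibyte encoding are deliberately ignored.
    """
    for i, b in enumerate(data):
        if 128 <= b <= 159:
            following = data[i + 1] if i + 1 < len(data) else None
            if following is None or not 160 <= following <= 255:
                return True
    return False


# A leading code (0x90) between two ASCII bytes is isolated.
print(has_isolated_leading_code(bytes([0x41, 0x90, 0x42])))
```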

In a buffer, the buffer-local value of the variable
@code{enable-multibyte-characters} specifies the representation used.
The representation for a string is determined and recorded in the string
when the string is constructed.

@defvar enable-multibyte-characters
This variable specifies the current buffer's text representation.
If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
it contains unibyte text.

You cannot set this variable directly; instead, use the function
@code{set-buffer-multibyte} to change a buffer's representation.
@end defvar

@defvar default-enable-multibyte-characters
This variable's value is entirely equivalent to @code{(default-value
'enable-multibyte-characters)}, and setting this variable changes that
default value. Setting the local binding of
@code{enable-multibyte-characters} in a specific buffer is not allowed,
but changing the default value is supported, and it is a reasonable
thing to do, because it has no effect on existing buffers.

The @samp{--unibyte} command line option does its job by setting the
default value to @code{nil} early in startup.
@end defvar

@defun position-bytes position
@tindex position-bytes
Return the byte-position corresponding to buffer position @var{position}
in the current buffer.
@end defun

@defun byte-to-position byte-position
@tindex byte-to-position
Return the buffer position corresponding to byte-position
@var{byte-position} in the current buffer.
@end defun
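
The relationship between the two functions can be modeled in Python, assuming a buffer is described only by the number of bytes each character occupies. Positions are 1-based, as in Emacs. @code{position_to_byte} and @code{byte_to_position} are hypothetical stand-ins for @code{position-bytes} and @code{byte-to-position}. The sketch does not model how Emacs handles out-of-range arguments or a byte in the middle of a character.

```python
def position_to_byte(char_lengths, position):
    """Model of position-bytes: 1-based character position -> byte position.

    CHAR_LENGTHS lists how many bytes each character occupies, in order.
    """
    if not 1 <= position <= len(char_lengths) + 1:
        raise ValueError("position out of range")
    return 1 + sum(char_lengths[:position - 1])


def byte_to_position(char_lengths, byte_position):
    """Model of byte-to-position for a byte that starts a character."""
    byte = 1
    for position, length in enumerate(char_lengths, start=1):
        if byte == byte_position:
            return position
        byte += length
    if byte == byte_position:  # the position just after the last character
        return len(char_lengths) + 1
    raise ValueError("not the first byte of a character")


# "ab", then one two-byte character, then "c".
lengths = [1, 1, 2, 1]
print([position_to_byte(lengths, p) for p in range(1, 6)])
```

In this model the two functions are inverses at character boundaries. The character after the two-byte character starts two bytes later, not one.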

@defun multibyte-string-p string
Return @code{t} if @var{string} is a multibyte string.
@end defun

@node Converting Representations
@section Converting Text Representations

Emacs can convert unibyte text to multibyte; it can also convert
multibyte text to unibyte, though this conversion loses information. In
general these conversions happen when inserting text into a buffer, or
when putting text from several strings together in one string. You can
also explicitly convert a string's contents to either representation.

Emacs chooses the representation for a string based on the text that
it is constructed from. The general rule is to convert unibyte text to
multibyte text when combining it with other multibyte text, because the
multibyte representation is more general and can hold whatever
characters the unibyte text has.

When inserting text into a buffer, Emacs converts the text to the
buffer's representation, as specified by
@code{enable-multibyte-characters} in that buffer. In particular, when
you insert multibyte text into a unibyte buffer, Emacs converts the text
to unibyte, even though this conversion cannot in general preserve all
the characters that might be in the multibyte text. The other natural
alternative, to convert the buffer contents to multibyte, is not
acceptable because the buffer's representation is a choice made by the
user that cannot be overridden automatically.

Converting unibyte text to multibyte text leaves @sc{ascii} characters
unchanged, and likewise character codes 128 through 159. It converts
the non-@sc{ascii} codes 160 through 255 by adding the value
@code{nonascii-insert-offset} to each character code. By setting this
variable, you specify which character set the unibyte characters
correspond to (@pxref{Character Sets}). For example, if
@code{nonascii-insert-offset} is 2048, which is @code{(- (make-char
'latin-iso8859-1) 128)}, then the unibyte non-@sc{ascii} characters
correspond to Latin 1. If it is 2688, which is @code{(- (make-char
'greek-iso8859-7) 128)}, then they correspond to Greek letters.
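
The arithmetic of this conversion, for a single character code, can be sketched in Python using the two offsets quoted above. @code{unibyte_to_multibyte_char} is a hypothetical name. The sketch takes the offset as given. It ignores the special treatment of a zero offset and the @code{nonascii-translation-table} override, both described later in this section.

```python
# Offsets quoted in the text.
LATIN1_OFFSET = 2048  # (- (make-char 'latin-iso8859-1) 128)
GREEK_OFFSET = 2688   # (- (make-char 'greek-iso8859-7) 128)


def unibyte_to_multibyte_char(code, offset):
    """Convert one unibyte character code as the text describes.

    Codes 0..159 (ASCII and 128..159) are left unchanged; codes 160..255
    have OFFSET (the value of nonascii-insert-offset) added to them.
    """
    if not 0 <= code <= 255:
        raise ValueError("unibyte character codes are 0..255")
    if code < 160:
        return code
    return code + offset


print(unibyte_to_multibyte_char(233, LATIN1_OFFSET))  # 233 + 2048
print(unibyte_to_multibyte_char(225, GREEK_OFFSET))   # 225 + 2688
```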

Converting multibyte text to unibyte is simpler: it discards all but
the low 8 bits of each character code. If @code{nonascii-insert-offset}
has a reasonable value, corresponding to the beginning of some character
set, this conversion is the inverse of the other: converting unibyte
text to multibyte and back to unibyte reproduces the original unibyte
text.
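
The Python sketch below checks this round trip for the Latin 1 offset of 2048 from the example above. Only that one offset is checked. The function names are hypothetical. The check succeeds because 2048 is a multiple of 256, so adding it leaves the low 8 bits of every code unchanged.

```python
LATIN1_OFFSET = 2048  # value quoted earlier in this section


def unibyte_to_multibyte_char(code, offset):
    # Codes below 160 are unchanged; codes 160..255 get OFFSET added.
    return code if code < 160 else code + offset


def multibyte_to_unibyte_char(code):
    # Keep only the low 8 bits of the character code.
    return code & 0xFF


# Unibyte -> multibyte -> unibyte for every unibyte code 0..255.
round_trip_ok = all(
    multibyte_to_unibyte_char(unibyte_to_multibyte_char(c, LATIN1_OFFSET)) == c
    for c in range(256)
)
print(round_trip_ok)
```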

@defvar nonascii-insert-offset
This variable specifies the amount to add to a non-@sc{ascii} character
when converting unibyte text to multibyte. It also applies when
@code{self-insert-command} inserts a character in the unibyte
non-@sc{ascii} range, 128 through 255. However, the functions
@code{insert} and @code{insert-char} do not perform this conversion.

The right value to use to select character set @var{cs} is @code{(-
(make-char @var{cs}) 128)}. If the value of
@code{nonascii-insert-offset} is zero, then conversion actually uses the
value for the Latin 1 character set, rather than zero.
@end defvar
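
The Python sketch below models how the offset is chosen. @code{LATIN1_MAKE_CHAR} is a hypothetical stand-in for the value of @code{(make-char 'latin-iso8859-1)}. Its value, 2176, is derived from the example earlier in this section (2048 = 2176 - 128) and is not read from Emacs. The function names are also hypothetical.

```python
# Hypothetical stand-in for (make-char 'latin-iso8859-1).  The text's
# example, 2048 = (- (make-char 'latin-iso8859-1) 128), implies 2176.
LATIN1_MAKE_CHAR = 2176


def offset_for_charset(make_char_value):
    """(- (make-char CS) 128), given the value of (make-char CS)."""
    return make_char_value - 128


def effective_offset(nonascii_insert_offset):
    """The offset conversion actually uses: zero means the Latin 1 value."""
    if nonascii_insert_offset == 0:
        return offset_for_charset(LATIN1_MAKE_CHAR)
    return nonascii_insert_offset


print(effective_offset(0), effective_offset(2688))
```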

@defvar nonascii-translation-table
This variable provides a more general alternative to
@code{nonascii-insert-offset}. You can use it to specify independently
how to translate each code in the range of 128 through 255 into a
multibyte character. The value should be a char-table, or @code{nil}.
If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
@end defvar
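
The Python sketch below models the precedence between the two variables. A real char-table is not a Python list. Here the table is assumed to be a list of 128 character codes, one for each unibyte code from 128 through 255, or @code{None} when there is no table. @code{convert_unibyte_code} is a hypothetical name, and the sketch assumes every table entry is filled in.

```python
def convert_unibyte_code(code, offset, table=None):
    """Convert a unibyte code to multibyte, letting TABLE override OFFSET.

    TABLE stands in for nonascii-translation-table: None means no table;
    otherwise it is a list of 128 codes indexed by (code - 128).  This list
    form is an assumption of the sketch, not Emacs's char-table.
    """
    if code < 128:
        return code                    # ASCII is never translated
    if table is not None:
        return table[code - 128]       # the table overrides the offset
    return code if code < 160 else code + offset


# Identity table, except that unibyte 233 maps to a made-up code 5000.
table = list(range(128, 256))
table[233 - 128] = 5000
print(convert_unibyte_code(233, 2048), convert_unibyte_code(233, 2048, table))
```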

@defun string-make-unibyte string
This function converts the text of @var{string} to unibyte
representation, if it isn't already, and returns the result. If
@var{string} is a unibyte string, it is returned unchanged.
Multibyte character codes are converted to unibyte
by using just the low 8 bits.
@end defun

@defun string-make-multibyte string
This function converts the text of @var{string} to multibyte
representation, if it isn't already, and returns the result. If
@var{string} is a multibyte string, it is returned unchanged.
The function @code{unibyte-char-to-multibyte} is used to convert
each unibyte character to a multibyte character.
@end defun

@node Selecting a Representation
@section Selecting a Representation

Sometimes it is useful to examine an existing buffer or string as
multibyte when it was unibyte, or vice versa.

@defun set-buffer-multibyte multibyte
Set the representation type of the current buffer. If @var{multibyte}
is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
is @code{nil}, the buffer becomes unibyte.

This function leaves the buffer contents unchanged when viewed as a
sequence of bytes. As a consequence, it can change the contents viewed
as characters; a sequence of two bytes that is treated as one character
in multibyte representation will count as two characters in unibyte
representation. Character codes 128 through 159 are an exception. They
are represented by one byte in a unibyte buffer, but when the buffer is
set to multibyte, they are converted to two-byte sequences, and vice
versa.

This function sets @code{enable-multibyte-characters} to record which
representation is in use. It also adjusts various data in the buffer
(including overlays, text properties and markers) so that they cover the
same text as they did before.

You cannot use @code{set-buffer-multibyte} on an indirect buffer,
because indirect buffers always inherit the representation of the
base buffer.
@end defun

@defun string-as-unibyte string
This function returns a string with the same bytes as @var{string} but
treating each byte as a character. This means that the value may have
more characters than @var{string} has.

If @var{string} is already a unibyte string, then the value is
@var{string} itself. Otherwise it is a newly created string, with no
text properties. If @var{string} is multibyte, any characters it
contains from the character set @code{eight-bit-control} or
@code{eight-bit-graphic} are converted to the corresponding single bytes.
@end defun

@defun string-as-multibyte string
This function returns a string with the same bytes as @var{string} but
treating each multibyte sequence as one character. This means that the
value may have fewer characters than @var{string} has.

If @var{string} is already a multibyte string, then the value is
@var{string} itself. Otherwise it is a newly created string, with no
text properties. If @var{string} is unibyte and contains any individual
8-bit bytes (i.e.@: not part of a multibyte form), they are converted to
the corresponding multibyte characters of character set
@code{eight-bit-control} or @code{eight-bit-graphic}.
@end defun
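The following Python sketch is a toy model, not Emacs code, of how @code{string-as-unibyte} and @code{string-as-multibyte} reinterpret the same bytes. It assumes an Emacs 21 style layout in which a @code{latin-iso8859-1} character is stored as the leading byte 0x81 followed by its position code with the high bit set; that byte value and the names @code{LATIN1_LEADING_BYTE}, @code{LATIN1_BASE}, @code{string_as_unibyte} and @code{string_as_multibyte} are assumptions made for illustration. Only @sc{ascii}, eight-bit and @code{latin-iso8859-1} characters are modeled.

```python
# Toy model of string-as-unibyte / string-as-multibyte (Emacs 21 style).
# Assumptions, not taken from the manual: a latin-iso8859-1 character is
# stored as the leading byte 0x81 followed by its position code with the
# high bit set; only ascii, eight-bit and latin-iso8859-1 are modeled.

LATIN1_LEADING_BYTE = 0x81
LATIN1_BASE = 2176  # value of (make-char 'latin-iso8859-1) in the manual


def string_as_unibyte(chars):
    """Return the bytes of a multibyte string given as a list of char codes."""
    out = bytearray()
    for c in chars:
        if c < 0x100:
            # ascii, eight-bit-control and eight-bit-graphic: one byte each.
            out.append(c)
        elif LATIN1_BASE + 32 <= c <= LATIN1_BASE + 127:
            # Leading byte, then the position code with the high bit set.
            out += bytes([LATIN1_LEADING_BYTE, 0x80 | (c - LATIN1_BASE)])
        else:
            raise ValueError(f"character {c} is outside this toy model")
    return bytes(out)


def string_as_multibyte(data):
    """Reinterpret unibyte DATA, grouping multibyte sequences into characters."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == LATIN1_LEADING_BYTE and nxt is not None and nxt >= 0xA0:
            # A complete two-byte latin-iso8859-1 sequence: one character.
            chars.append(LATIN1_BASE + (nxt & 0x7F))
            i += 2
        else:
            # A byte that is not part of a multibyte form stands for itself:
            # ascii below 128, otherwise an eight-bit-control/graphic char.
            chars.append(b)
            i += 1
    return chars
```

In this model, the two-character string of @code{A} and character 2248 becomes three unibyte characters, and reading those bytes back as multibyte yields two characters again. Conversely, the two eight-bit characters 129 and 200 collapse into the single character 2248, which is the kind of change in the character view that the text above warns about.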

@node Character Codes
@section Character Codes
@cindex character codes

The unibyte and multibyte text representations use different character
codes. The valid character codes for unibyte representation range from
0 to 255---the values that can fit in one byte. The valid character
codes for multibyte representation range from 0 to 524287, but not all
values in that range are valid. The values 128 through 255 are not
entirely proper in multibyte text, but they can occur if you do explicit
encoding and decoding (@pxref{Explicit Encoding}). Some other character
codes cannot occur at all in multibyte text. Only the @sc{ascii} codes
0 through 127 are completely legitimate in both representations.

@defun char-valid-p charcode &optional genericp
This function returns @code{t} if @var{charcode} is valid for either one
of the two text representations.

@example
(char-valid-p 65)
@result{} t
(char-valid-p 256)
@result{} nil
(char-valid-p 2248)
@result{} t
@end example

If the optional argument @var{genericp} is non-@code{nil}, this function
also returns @code{t} if @var{charcode} is a generic character
(@pxref{Splitting Characters}).
@end defun

@node Character Sets
@section Character Sets
@cindex character sets

Emacs classifies characters into various @dfn{character sets}, each of
which has a symbol as its name. Each character belongs to one and
only one character set.

In general, there is one character set for each distinct script. For
example, @code{latin-iso8859-1} is one character set,
@code{greek-iso8859-7} is another, and @code{ascii} is another. An
Emacs character set can hold at most 9025 characters; therefore, in some
cases, characters that would logically be grouped together are split
into several character sets. For example, one set of Chinese
characters, generally known as Big 5, is divided into two Emacs
character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.

@sc{ascii} characters are in character set @code{ascii}. The
non-@sc{ascii} characters 128 through 159 are in character set
@code{eight-bit-control}, and codes 160 through 255 are in character set
@code{eight-bit-graphic}.

@defun charsetp object
This function returns @code{t} if @var{object} is a symbol that names a
character set, and @code{nil} otherwise.
@end defun

@defun charset-list
This function returns a list of all defined character set names.
@end defun

@defun char-charset character
This function returns the name of the character set that @var{character}
belongs to.
@end defun

@defun charset-plist charset
@tindex charset-plist
This function returns the charset property list of the character set
@var{charset}. Although @var{charset} is a symbol, this is not the same
as the property list of that symbol. Charset properties are used for
special purposes within Emacs; for example,
@code{preferred-coding-system} helps determine which coding system to
use to encode characters in a charset.
@end defun

@node Chars and Bytes
@section Characters and Bytes
@cindex bytes and characters

@cindex introduction sequence
@cindex dimension (of character set)
In multibyte representation, each character occupies one or more
bytes. Each character set has an @dfn{introduction sequence}, which is
normally one or two bytes long. (Exception: the @code{ascii} and
@code{eight-bit-graphic} character sets have a zero-length
introduction sequence.) The introduction sequence is the beginning of
the byte sequence for any character in the character set. The rest of
the character's bytes distinguish it from the other characters in the
same character set. Depending on the character set, there are either
one or two distinguishing bytes; the number of such bytes is called the
@dfn{dimension} of the character set.

@defun charset-dimension charset
This function returns the dimension of @var{charset}; at present, the
dimension is always 1 or 2.
@end defun

@defun charset-bytes charset
@tindex charset-bytes
This function returns the number of bytes used to represent a character
in character set @var{charset}.
@end defun

This is the simplest way to determine the byte length of a character
set's introduction sequence:

@example
(- (charset-bytes @var{charset})
(charset-dimension @var{charset}))
@end example
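The same subtraction can be written out as a small Python sketch. The @code{(bytes, dimension)} pairs in @code{CHARSET_INFO} are hypothetical values chosen to agree with the text above: @code{ascii} and @code{eight-bit-graphic} have no introduction sequence, and a @code{latin-iso8859-1} character is assumed to take one leading byte plus one distinguishing byte. The helper name @code{introduction_length} is likewise made up for illustration.

```python
# Hypothetical (charset-bytes, charset-dimension) pairs, chosen to match the
# text: ascii and eight-bit-graphic have a zero-length introduction
# sequence; latin-iso8859-1 is assumed to use one leading byte.
CHARSET_INFO = {
    "ascii": (1, 1),
    "eight-bit-graphic": (1, 1),
    "latin-iso8859-1": (2, 1),
}


def introduction_length(charset):
    """Bytes in CHARSET's introduction sequence: total bytes minus dimension."""
    nbytes, dimension = CHARSET_INFO[charset]
    return nbytes - dimension
```

Every byte of a character is either part of the introduction sequence or one of the @var{dimension} distinguishing bytes, so subtracting the dimension from the total leaves the introduction length.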

@node Splitting Characters
@section Splitting Characters

The functions in this section convert between characters and the byte
values used to represent them. For most purposes, there is no need to
be concerned with the sequence of bytes used to represent a character,
because Emacs translates automatically when necessary.

@defun split-char character
This function returns a list containing the name of the character set of
@var{character}, followed by one or two byte values (integers) that
identify @var{character} within that character set. The number of byte
values is the character set's dimension.

@example
(split-char 2248)
@result{} (latin-iso8859-1 72)
(split-char 65)
@result{} (ascii 65)
(split-char 128)
@result{} (eight-bit-control 128)
@end example
@end defun

@defun make-char charset &optional code1 code2
This function returns the character in character set @var{charset} whose
position codes are @var{code1} and @var{code2}. This is roughly the
inverse of @code{split-char}. Normally, you should specify either one
or both of @var{code1} and @var{code2} according to the dimension of
@var{charset}. For example,

@example
(make-char 'latin-iso8859-1 72)
@result{} 2248
@end example
@end defun

@cindex generic characters
If you call @code{make-char} without @var{code1} and @var{code2}, the
result is a @dfn{generic character} which stands for @var{charset}. A
generic character is an integer, but it is @emph{not} valid for
insertion in the buffer as a character. It can be used in
@code{char-table-range} to refer to the whole character set
(@pxref{Char-Tables}). @code{char-valid-p} returns @code{nil} for
generic characters unless its optional argument @var{genericp} is
non-@code{nil}. For example:

@example
(make-char 'latin-iso8859-1)
@result{} 2176
(char-valid-p 2176)
@result{} nil
(char-valid-p 2176 t)
@result{} t
(split-char 2176)
@result{} (latin-iso8859-1 0)
@end example

The character sets @code{ascii}, @code{eight-bit-control}, and
@code{eight-bit-graphic} don't have corresponding generic characters. If
@var{charset} is one of them and you don't supply @var{code1},
@code{make-char} returns the character code corresponding to the
smallest code in @var{charset}.
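The numbers in the examples above are consistent with a simple arithmetic layout for dimension-1 character sets. The Python sketch below models @code{make-char}, @code{split-char} and @code{char-valid-p} under the assumption, taken from the Emacs 21 implementation rather than from this manual, that such a character is @code{((@var{charset-id} - 0x70) << 7) | @var{code1}}, with @code{latin-iso8859-1} having the id 0x81. The table contents, the valid position-code range 32 through 127, and the Python function names are all assumptions for illustration; this is a sketch, not the real implementation.

```python
# Toy model of make-char / split-char / char-valid-p for dimension-1 sets.
# Assumed Emacs 21 rule:  char = ((charset_id - 0x70) << 7) | code1
# The charset id below is an assumption; it reproduces the manual's numbers.
CHARSET_IDS = {"latin-iso8859-1": 0x81}
CHARSET_NAMES = {v: k for k, v in CHARSET_IDS.items()}

# Smallest code of each charset that has no generic character.
SPECIAL_MIN = {"ascii": 0, "eight-bit-control": 128, "eight-bit-graphic": 160}


def make_char(charset, code1=None):
    """Return the character code for CHARSET and position code CODE1."""
    if charset in SPECIAL_MIN:
        # No generic character: default to the smallest code in the charset.
        return SPECIAL_MIN[charset] if code1 is None else code1
    base = (CHARSET_IDS[charset] - 0x70) << 7
    # Omitting CODE1 yields the generic character (position code 0).
    return base | (code1 or 0)


def split_char(char):
    """Return [charset, code1], mirroring (split-char CHAR)."""
    if char < 128:
        return ["ascii", char]
    if char < 160:
        return ["eight-bit-control", char]
    if char < 256:
        return ["eight-bit-graphic", char]
    name = CHARSET_NAMES.get((char >> 7) + 0x70)
    if name is None:
        raise ValueError(f"{char} is outside this toy model")
    return [name, char & 0x7F]


def char_valid_p(char, genericp=False):
    """True for valid characters, and for generic ones when GENERICP."""
    if 0 <= char < 256:
        return True              # valid in unibyte representation
    if (char >> 7) + 0x70 not in CHARSET_NAMES:
        return False             # no modeled charset owns this code
    code1 = char & 0x7F
    if code1 == 0:
        return genericp          # generic character
    return code1 >= 32           # assumed valid position codes 32..127
```

Within this model, @code{make_char(*split_char(c))} gives back @code{c} for every modeled character, which reflects the statement that @code{make-char} is roughly the inverse of @code{split-char}.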
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
423
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
424 @node Scanning Charsets
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
425 @section Scanning for Character Sets
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
426
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
427 Sometimes it is useful to find out which character sets appear in a
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
428 part of a buffer or a string. One use for this is in determining which
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
429 coding systems (@pxref{Coding Systems}) are capable of representing all
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
430 of the text in question.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
431
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
432 @defun find-charset-region beg end &optional translation
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
433 This function returns a list of the character sets that appear in the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
434 current buffer between positions @var{beg} and @var{end}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
435
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
436 The optional argument @var{translation} specifies a translation table to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
437 be used in scanning the text (@pxref{Translation of Characters}). If it
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
438 is non-@code{nil}, then each character in the region is translated
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
439 through this table, and the value returned describes the translated
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
440 characters instead of the characters actually in the buffer.
28887
0778eff185b6 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28877
diff changeset
441 @end defun

@defun find-charset-string string &optional translation
This function returns a list of the character sets that appear in the
string @var{string}. It is just like @code{find-charset-region},
including the meaning of the optional argument @var{translation}, except
that it applies to the contents of @var{string} instead of part of the
current buffer.
@end defun
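
For example, a string or buffer text made up only of @sc{ascii}
characters contains just the @code{ascii} character set. The results
for non-@sc{ascii} text depend on how that text was decoded, so this
sketch sticks to @sc{ascii}; @code{with-temp-buffer} merely provides a
scratch buffer for the second form.

@example
(find-charset-string "abc")
@result{} (ascii)
(with-temp-buffer
  (insert "abc")
  (find-charset-region (point-min) (point-max)))
@result{} (ascii)
@end example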

@node Translation of Characters
@section Translation of Characters
@cindex character translation tables
@cindex translation tables

A @dfn{translation table} specifies a mapping of characters
into characters. These tables are used in encoding and decoding, and
for other purposes. Some coding systems specify their own particular
translation tables; there are also default translation tables which
apply to all other coding systems.

@defun make-translation-table &rest translations
This function returns a translation table based on the argument
@var{translations}. Each element of @var{translations} should be a
list of elements of the form @code{(@var{from} . @var{to})}; this says
to translate the character @var{from} into @var{to}.

The arguments, and the elements within each argument, are processed in
order. If an earlier element already translates @var{to} into some
other character, say @var{to-alt}, then @var{from} is also translated
into @var{to-alt}.

You can also map one whole character set into another character set with
the same dimension. To do this, you specify a generic character (which
designates a character set) for @var{from} (@pxref{Splitting Characters}).
In this case, @var{to} should also be a generic character, for another
character set of the same dimension. Then the translation table
translates each character of @var{from}'s character set into the
corresponding character of @var{to}'s character set.
@end defun
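
As an illustration of the ordering rule, the following form builds a
table that first maps @samp{a} to @samp{b} and then maps @samp{c} to
@samp{a}. Since @samp{a} is already translated to @samp{b} when the
second argument is processed, @samp{c} is translated to @samp{b} as
well. The table is a char-table, so @code{aref} looks up individual
entries; a character that the table does not mention, such as @samp{z},
has no entry and yields @code{nil}. The particular characters are
arbitrary.

@example
(let ((tbl (make-translation-table '((?a . ?b)) '((?c . ?a)))))
  ;; @r{@code{?b} is the character code 98.}
  (list (aref tbl ?a) (aref tbl ?c) (aref tbl ?z)))
@result{} (98 98 nil)
@end example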

In decoding, the translation table's translations are applied to the
characters that result from ordinary decoding. If a coding system has
the property @code{character-translation-table-for-decode}, that
specifies the translation table to use. Otherwise, if
@code{standard-translation-table-for-decode} is non-@code{nil}, decoding
uses that table.

In encoding, the translation table's translations are applied to the
characters in the buffer, and the result of translation is what
actually gets encoded. If a coding system has the property
@code{character-translation-table-for-encode}, that specifies the
translation table to use. Otherwise, if
@code{standard-translation-table-for-encode} is non-@code{nil}, encoding
uses that table.

@defvar standard-translation-table-for-decode
This is the default translation table for decoding, for
coding systems that don't specify any other translation table.
@end defvar

@defvar standard-translation-table-for-encode
This is the default translation table for encoding, for
coding systems that don't specify any other translation table.
@end defvar
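
For instance, you can install a default decoding table temporarily by
binding the variable with @code{let}, which confines the change to the
body of the form. This is only a sketch: the mapping of @samp{a} to
@samp{b} is a made-up example, and whether the table is consulted
depends on the coding system (here @code{iso-latin-1}) not specifying a
table of its own, as described above. For that reason no value is
shown.

@example
(let ((standard-translation-table-for-decode
       (make-translation-table '((?a . ?b)))))
  (decode-coding-string "abc" 'iso-latin-1))
@end example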

@node Coding Systems
@section Coding Systems

@cindex coding system
When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified
by a particular @dfn{coding system}.

How to define a coding system is an arcane matter, and is not
documented here.

@menu
* Coding System Basics::        Basic concepts.
* Encoding and I/O::            How file I/O functions handle coding systems.
* Lisp and Coding Systems::     Functions to operate on coding system names.
* User-Chosen Coding Systems::  Asking the user to choose a coding system.
* Default Coding Systems::      Controlling the default choices.
* Specifying Coding Systems::   Requesting a particular coding system
                                  for a single file operation.
* Explicit Encoding::           Encoding or decoding text without doing I/O.
* Terminal I/O Encoding::       Use of encoding for terminal I/O.
* MS-DOS File Types::           How DOS "text" and "binary" files
                                  relate to coding systems.
@end menu

@node Coding System Basics
@subsection Basic Concepts of Coding Systems

@cindex character code conversion
@dfn{Character code conversion} involves conversion between the encoding
used inside Emacs and some other encoding. Emacs supports many
different encodings, in the sense that it can convert to and from them.
For example, it can convert text to or from encodings such as Latin 1,
Latin 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022.
In some cases, Emacs supports several alternative encodings for the
same characters; for example, there are three coding systems for the
Cyrillic (Russian) alphabet: ISO, Alternativnyj, and KOI8.

Most coding systems specify a particular character code for
conversion, but some of them leave the choice unspecified, to be chosen
heuristically for each file based on the data.

@cindex end of line conversion
@dfn{End of line conversion} handles three different conventions used
on various systems for representing end of line in files. The Unix
convention is to use the linefeed character (also called newline). The
DOS convention is to use a carriage-return and a linefeed at the end of
a line. The Mac convention is to use just a carriage-return.

@cindex base coding system
@cindex variant coding system
@dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
conversion unspecified, to be chosen based on the data. @dfn{Variant
coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
@code{latin-1-mac} specify the end-of-line conversion explicitly as
well. Most base coding systems have three corresponding variants whose
names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
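
The functions @code{coding-system-eol-type} and @code{coding-system-base}
show this relationship. For a base coding system,
@code{coding-system-eol-type} returns a vector of its three variants;
for a variant, it returns 0, 1 or 2 for the Unix, DOS and Mac
conventions respectively. @code{coding-system-base} maps a variant back
to its base. This sketch uses @code{iso-latin-1} rather than its alias
@code{latin-1}, so that the names in the results are unambiguous.

@example
(coding-system-eol-type 'iso-latin-1)
@result{} [iso-latin-1-unix iso-latin-1-dos iso-latin-1-mac]
(coding-system-eol-type 'iso-latin-1-dos)
@result{} 1
(coding-system-base 'iso-latin-1-dos)
@result{} iso-latin-1
@end example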

The coding system @code{raw-text} is special in that it prevents
character code conversion, and causes the buffer visited with that
coding system to be a unibyte buffer. It does not specify the
end-of-line conversion, allowing that to be determined as usual by the
data, and has the usual three variants which specify the end-of-line
conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:
it specifies no conversion of either character codes or end-of-line.

The coding system @code{emacs-mule} specifies that the data is
represented in the internal Emacs encoding. This is like
@code{raw-text} in that no code conversion happens, but different in
that the result is multibyte data.
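
The function @code{coding-system-eol-type} also makes the difference
between @code{raw-text} and @code{no-conversion} visible: the former
leaves the end-of-line convention open, while the latter fixes it to
the Unix convention (0).

@example
(coding-system-eol-type 'raw-text)
@result{} [raw-text-unix raw-text-dos raw-text-mac]
(coding-system-eol-type 'no-conversion)
@result{} 0
@end example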

@defun coding-system-get coding-system property
This function returns the specified property of the coding system
@var{coding-system}. Most coding system properties exist for internal
purposes, but one that you might find useful is @code{mime-charset}.
That property's value is the name used in MIME for the character coding
which this coding system can read and write. Examples:

@example
(coding-system-get 'iso-latin-1 'mime-charset)
@result{} iso-8859-1
(coding-system-get 'iso-2022-cn 'mime-charset)
@result{} iso-2022-cn
(coding-system-get 'cyrillic-koi8 'mime-charset)
@result{} koi8-r
@end example

The value of the @code{mime-charset} property is also defined
as an alias for the coding system.
@end defun

@node Encoding and I/O
@subsection Encoding and I/O

Coding systems are used principally for reading and writing files.
The function @code{insert-file-contents} uses a coding system for
decoding the file data, and @code{write-region} uses one to encode the
buffer contents.

You can specify the coding system to use either explicitly
(@pxref{Specifying Coding Systems}), or implicitly using the defaulting
mechanism (@pxref{Default Coding Systems}). But these methods may not
completely specify what to do. For example, they may choose a coding
system such as @code{undecided} which leaves the character code
conversion to be determined from the data. In these cases, the I/O
operation finishes the job of choosing a coding system. Very often
you will want to find out afterwards which coding system was chosen.

@defvar buffer-file-coding-system
This variable records the coding system that was used for visiting the
current buffer. It is used for saving the buffer, and for writing part
of the buffer with @code{write-region}. When those operations ask the
user to specify a different coding system,
@code{buffer-file-coding-system} is updated to the coding system
specified.

However, @code{buffer-file-coding-system} does not affect sending text
to a subprocess.
@end defvar

@defvar save-buffer-coding-system
This variable specifies the coding system for saving the buffer (by
overriding @code{buffer-file-coding-system}). Note that it is not used
for @code{write-region}.

When a command to save the buffer starts out using
@code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),
and that coding system cannot handle the actual text in the buffer, the
command asks the user to choose another coding system. After that
happens, the command also updates @code{buffer-file-coding-system} to
represent the coding system that the user specified.
@end defvar

@defvar last-coding-system-used
I/O operations for files and subprocesses set this variable to the
coding system name that was used. The explicit encoding and decoding
functions (@pxref{Explicit Encoding}) set it too.

@strong{Warning:} Since receiving subprocess output sets this variable,
it can change whenever Emacs waits; therefore, you should copy the
value immediately after the function call that stores the value you are
interested in.
@end defvar
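
Here is a sketch of copying the value promptly. It writes a temporary
file with an explicitly chosen coding system and saves the value of
@code{last-coding-system-used} before doing anything else. The file
name prefix @samp{coding-demo} is arbitrary, and the variable
@code{coding-system-for-write} is described elsewhere
(@pxref{Specifying Coding Systems}).

@example
(let ((file (make-temp-file "coding-demo"))
      used)
  (unwind-protect
      (with-temp-buffer
        (insert "hello\n")
        (let ((coding-system-for-write 'iso-latin-1-unix))
          (write-region (point-min) (point-max) file nil 'silent))
        ;; @r{Copy the value before anything else can change it.}
        (setq used last-coding-system-used))
    (delete-file file))
  used)
@result{} iso-latin-1-unix
@end example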
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
651
23110
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
652 The variable @code{selection-coding-system} specifies how to encode
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
653 selections for the window system. @xref{Window System Selections}.
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
654
@node Lisp and Coding Systems
@subsection Coding Systems in Lisp

Here are the Lisp facilities for working with coding systems:

@defun coding-system-list &optional base-only
This function returns a list of all coding system names (symbols).  If
@var{base-only} is non-@code{nil}, the value includes only the
base coding systems.  Otherwise, it includes alias and variant coding
systems as well.
@end defun
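
For example, in a standard Emacs @code{latin-1} is an alias for the
base coding system @code{iso-latin-1}, so it appears in the full list
but not in the base-only list.  This sketch assumes those standard
definitions; the comments show the expected results:

@example
(memq 'latin-1 (coding-system-list))       ; non-nil: aliases included
(memq 'latin-1 (coding-system-list t))     ; nil: base systems only
(memq 'iso-latin-1 (coding-system-list t)) ; non-nil: a base system
@end example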

@defun coding-system-p object
This function returns @code{t} if @var{object} is a coding system
name or @code{nil}.
@end defun
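
Coding system names are symbols, so a string that spells a name does
not qualify.  In this sketch, @code{no-such-coding-system} is a made-up
name that is assumed to be undefined:

@example
(coding-system-p 'iso-latin-1)            ; t
(coding-system-p 'no-such-coding-system)  ; nil: not defined
(coding-system-p "iso-latin-1")           ; nil: a string, not a symbol
@end example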

@defun check-coding-system coding-system
This function checks the validity of @var{coding-system}.  If it is
valid, it returns @var{coding-system}.  Otherwise it signals an error
with condition @code{coding-system-error}.
@end defun
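
Because an invalid name signals an error instead of returning
@code{nil}, code that receives a coding system name from elsewhere can
catch @code{coding-system-error} and fall back on a default.  Here is a
sketch of that idea.  @code{my-coding-system-or-default} is a
hypothetical helper, not an Emacs function, and
@code{no-such-coding-system} is assumed to be undefined:

@example
(defun my-coding-system-or-default (name default)
  "Return NAME if it is a valid coding system, else DEFAULT.
NAME should be a symbol.  Hypothetical helper, not part of Emacs."
  (condition-case nil
      (check-coding-system name)
    (coding-system-error default)))

(my-coding-system-or-default 'iso-latin-1 'undecided)
     ; iso-latin-1
(my-coding-system-or-default 'no-such-coding-system 'undecided)
     ; undecided
@end example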

@defun coding-system-change-eol-conversion coding-system eol-type
This function returns a coding system which is like @var{coding-system}
except for its eol conversion, which is specified by @var{eol-type}.
@var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
@code{nil}.  If it is @code{nil}, the returned coding system determines
the end-of-line conversion from the data.
@end defun
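
For example, in a standard Emacs, where @code{iso-latin-1} leaves the
end-of-line conversion to be determined from the data, these calls
should return the values shown in the comments:

@example
(coding-system-change-eol-conversion 'iso-latin-1 'unix)
     ; iso-latin-1-unix
(coding-system-change-eol-conversion 'iso-latin-1-dos 'mac)
     ; iso-latin-1-mac
;; nil means "detect the end-of-line conversion from the data".
(coding-system-change-eol-conversion 'iso-latin-1-dos nil)
     ; iso-latin-1
@end example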

@defun coding-system-change-text-conversion eol-coding text-coding
This function returns a coding system which uses the end-of-line
conversion of @var{eol-coding}, and the text conversion of
@var{text-coding}.  If @var{text-coding} is @code{nil}, it returns
@code{undecided}, or one of its variants according to @var{eol-coding}.
@end defun
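
For example, the first call below keeps the DOS end-of-line convention
of one coding system while switching the text conversion.  This sketch
assumes the standard @code{iso-latin-1} and @code{iso-latin-2} coding
systems:

@example
(coding-system-change-text-conversion 'iso-latin-1-dos 'iso-latin-2)
     ; iso-latin-2-dos
;; With nil, get the undecided variant with the same end-of-line type.
(coding-system-change-text-conversion 'iso-latin-1-unix nil)
     ; undecided-unix
@end example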

@defun find-coding-systems-region from to
This function returns a list of coding systems that could be used to
encode the text between @var{from} and @var{to}.  All coding systems in
the list can safely encode any multibyte characters in that portion of
the text.

If the text contains no multibyte characters, the function returns the
list @code{(undecided)}.
@end defun
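
For instance, a temporary buffer holding only @sc{ascii} text yields the
list @code{(undecided)}:

@example
(with-temp-buffer
  (insert "plain ASCII text")
  (find-coding-systems-region (point-min) (point-max)))
     ; (undecided)
@end example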

@defun find-coding-systems-string string
This function returns a list of coding systems that could be used to
encode the text of @var{string}.  All coding systems in the list can
safely encode any multibyte characters in @var{string}.  If the text
contains no multibyte characters, this returns the list
@code{(undecided)}.
@end defun
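
You can use the result to test whether a particular coding system is
safe for a string.  The function @code{my-string-encodable-p} below is
a hypothetical helper.  It assumes that the returned list names base
coding systems, so it compares against the base of its argument, as
returned by @code{coding-system-base}:

@example
(defun my-string-encodable-p (string coding-system)
  "Return non-nil if CODING-SYSTEM can safely encode STRING.
Hypothetical helper; assumes the list returned by
find-coding-systems-string names base coding systems."
  (let ((safe (find-coding-systems-string string)))
    (or (equal safe '(undecided))      ; no multibyte characters
        (memq (coding-system-base coding-system) safe))))

(find-coding-systems-string "abc")          ; (undecided)
(my-string-encodable-p "abc" 'iso-latin-1)  ; t
@end example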

@defun find-coding-systems-for-charsets charsets
This function returns a list of coding systems that could be used to
encode all the character sets in the list @var{charsets}.
@end defun
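
Together with @code{find-charset-string}, which returns the list of
character sets used in a string, this function offers another way to
ask which coding systems suit some text.  For pure @sc{ascii} text the
character set list is @code{(ascii)}, and the result should be
@code{(undecided)}:

@example
(find-charset-string "abc")                                 ; (ascii)
(find-coding-systems-for-charsets (find-charset-string "abc"))
     ; (undecided)
@end example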

@defun detect-coding-region start end &optional highest
This function chooses a plausible coding system for decoding the text
from @var{start} to @var{end}.  This text should be a byte sequence
(@pxref{Explicit Encoding}).

Normally this function returns a list of coding systems that could
handle decoding the text that was scanned.  They are listed in order of
decreasing priority.  But if @var{highest} is non-@code{nil}, then the
return value is just one coding system, the one that is highest in
priority.

If the region contains only @sc{ascii} characters, the value
is @code{undecided} or @code{(undecided)}.
@end defun
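
The @var{highest} argument determines whether you get a list or a
single coding system.  In this sketch the buffer holds only @sc{ascii}
text, so the first call should return @code{(undecided)} and the second
@code{undecided}:

@example
(with-temp-buffer
  (insert "plain ASCII text")
  (list (detect-coding-region (point-min) (point-max))     ; a list
        (detect-coding-region (point-min) (point-max) t))) ; a symbol
@end example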

@defun detect-coding-string string &optional highest
This function is like @code{detect-coding-region} except that it
operates on the contents of @var{string} instead of bytes in the buffer.
@end defun
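
A common pattern is to pass the guess straight to
@code{decode-coding-string}.  This is only a sketch: the local variable
@code{bytes} stands for undecoded data, and since detection is
heuristic, the guess can be wrong for short or ambiguous input.  For the
@sc{ascii} input shown, decoding leaves the text unchanged:

@example
(let* ((bytes "line one\nline two\n")
       (coding (detect-coding-string bytes t)))
  (decode-coding-string bytes coding))
@end example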

@xref{Process Information}, for how to examine or set the coding
systems used for I/O to a subprocess.

@node User-Chosen Coding Systems
@subsection User-Chosen Coding Systems

@defun select-safe-coding-system from to &optional preferred-coding-system
This function selects a coding system for encoding the text between
@var{from} and @var{to}, asking the user to choose if necessary.

The optional argument @var{preferred-coding-system} specifies a coding
system to try first.  If that one can handle the text in the specified
region, then it is used.  If this argument is omitted, the current
buffer's value of @code{buffer-file-coding-system} is tried first.

If the region contains some multibyte characters that the preferred
coding system cannot encode, this function asks the user to choose from
a list of coding systems which can encode the text, and returns the
user's choice.

One other kludgy feature: if @var{from} is a string, the string is the
target text, and @var{to} is ignored.
@end defun
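
For example, a command that saves part of a buffer could let this
function choose the coding system and bind @code{coding-system-for-write}
to the result (@pxref{Specifying Coding Systems}).  The helper
@code{my-write-region-safely} is hypothetical.  The last call passes a
string, so its second argument is ignored.  Because @sc{ascii} text can
be encoded by @code{iso-latin-1}, that preferred coding system is
returned without asking:

@example
(defun my-write-region-safely (start end file)
  "Write the text from START to END into FILE.
Hypothetical helper: select-safe-coding-system picks the coding
system, asking the user only when necessary."
  (let ((coding-system-for-write (select-safe-coding-system start end)))
    (write-region start end file)))

(select-safe-coding-system "plain ASCII" nil 'iso-latin-1)
     ; iso-latin-1
@end example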

Here are two functions you can use to let the user specify a coding
system, with completion.  @xref{Completion}.

@defun read-coding-system prompt &optional default
This function reads a coding system using the minibuffer, prompting with
string @var{prompt}, and returns the coding system name as a symbol.  If
the user enters null input, @var{default} specifies which coding system
to return.  It should be a symbol or a string.
@end defun

@defun read-non-nil-coding-system prompt
This function reads a coding system using the minibuffer, prompting with
string @var{prompt}, and returns the coding system name as a symbol.  If
the user tries to enter null input, it asks the user to try again.
@xref{Coding Systems}.
@end defun
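
These functions are typically called from an @code{interactive}
specification.  The command below is a hypothetical sketch that reads a
coding system and reports its base coding system.  Called from Lisp
with an argument, it skips the minibuffer entirely:

@example
(defun my-show-base-coding-system (coding)
  "Display the base coding system of CODING.
A hypothetical command, shown only to illustrate interactive use."
  (interactive
   (list (read-non-nil-coding-system "Coding system: ")))
  ;; To accept empty input instead, use something like
  ;; (read-coding-system "Coding system: " 'iso-latin-1).
  (message "Base coding system: %s" (coding-system-base coding)))

(my-show-base-coding-system 'iso-latin-1-unix)
     ; "Base coding system: iso-latin-1"
@end example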

@node Default Coding Systems
@subsection Default Coding Systems

This section describes variables that specify the default coding
system for certain files or when running certain subprograms, and the
function that I/O operations use to access them.

The idea of these variables is that you set them once and for all to the
defaults you want, and then do not change them again.  To specify a
particular coding system for a particular operation in a Lisp program,
don't change these variables; instead, override them using
@code{coding-system-for-read} and @code{coding-system-for-write}
(@pxref{Specifying Coding Systems}).

@defvar file-coding-system-alist
This variable is an alist that specifies the coding systems to use for
reading and writing particular files.  Each element has the form
@code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
expression that matches certain file names.  The element applies to file
names that match @var{pattern}.

The @sc{cdr} of the element, @var{coding}, should be either a coding
system, a cons cell containing two coding systems, or a function name (a
symbol with a function definition).  If @var{coding} is a coding system,
that coding system is used for both reading the file and writing it.  If
@var{coding} is a cons cell containing two coding systems, its @sc{car}
specifies the coding system for decoding, and its @sc{cdr} specifies the
coding system for encoding.

If @var{coding} is a function name, the function must return a coding
system or a cons cell containing two coding systems.  This value is used
as described above.
@end defvar
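
For instance, an init file might add entries like these.  The file name
extensions @samp{.mydata} and @samp{.mylog} are hypothetical; substitute
your own.  Since @code{add-to-list} puts new elements at the front, these
entries are consulted before any existing ones that match the same
names.  The last lines check the result with
@code{find-operation-coding-system}:

@example
;; Read and write *.mydata files with ISO Latin-1 and Unix newlines.
(add-to-list 'file-coding-system-alist
             '("\\.mydata\\'" . iso-latin-1-unix))
;; Detect the coding of *.mylog files when reading them, but always
;; write them with ISO Latin-1 and Unix newlines.
(add-to-list 'file-coding-system-alist
             (cons "\\.mylog\\'" '(undecided . iso-latin-1-unix)))

(find-operation-coding-system 'insert-file-contents "notes.mydata")
     ; (iso-latin-1-unix . iso-latin-1-unix)
(find-operation-coding-system 'write-region nil nil "notes.mylog")
     ; (undecided . iso-latin-1-unix)
@end example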

@defvar process-coding-system-alist
This variable is an alist specifying which coding systems to use for a
subprocess, depending on which program is running in the subprocess.  It
works like @code{file-coding-system-alist}, except that @var{pattern} is
matched against the program name used to start the subprocess.  The coding
system or systems specified in this alist are used to initialize the
coding systems used for I/O to the subprocess, but you can specify
other coding systems later using @code{set-process-coding-system}.
@end defvar
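
For example, an init file might add an entry for a program whose output
is known to be ISO Latin-1 with Unix newlines.  The program name
@code{legacy-report} and the function @code{my-switch-to-latin-2} are
hypothetical.  Since the pattern is not anchored, it also matches the
program name when it includes a directory:

@example
(add-to-list 'process-coding-system-alist
             '("legacy-report" . iso-latin-1-unix))

;; Later, change the coding systems of one running subprocess.
(defun my-switch-to-latin-2 (proc)
  "Make subprocess PROC use ISO Latin-2 from now on (hypothetical)."
  (set-process-coding-system proc 'iso-latin-2-unix 'iso-latin-2-unix))
@end example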

@strong{Warning:} Coding systems such as @code{undecided}, which
determine the coding system from the data, do not work entirely reliably
with asynchronous subprocess output.  This is because Emacs handles
asynchronous subprocess output in batches, as it arrives.  If the coding
system leaves the character code conversion unspecified, or leaves the
end-of-line conversion unspecified, Emacs must try to detect the proper
conversion from one batch at a time, and this does not always work.

Therefore, with an asynchronous subprocess, if at all possible, use a
coding system which determines both the character code conversion and
the end-of-line conversion---that is, one like @code{latin-1-unix},
rather than @code{undecided} or @code{latin-1}.
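
A program could check a candidate before using it for an asynchronous
subprocess.  The predicate below is a hypothetical sketch.  It treats a
coding system as fully specified when its end-of-line type is fixed
(@code{coding-system-eol-type} returns a number rather than a vector of
variants) and it does not belong to the @code{undecided} family.  It
does not recognize any other auto-detecting coding systems:

@example
(defun my-coding-system-fully-specified-p (coding-system)
  "Return non-nil if CODING-SYSTEM fixes both conversions.
Hypothetical helper: only the undecided family is treated as
leaving the character code conversion unspecified."
  (and (not (vectorp (coding-system-eol-type coding-system)))
       (not (eq (coding-system-base coding-system) 'undecided))))

(my-coding-system-fully-specified-p 'latin-1-unix)   ; t
(my-coding-system-fully-specified-p 'latin-1)        ; nil
(my-coding-system-fully-specified-p 'undecided-unix) ; nil
@end example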

@defvar network-coding-system-alist
This variable is an alist that specifies the coding system to use for
network streams.  It works much like @code{file-coding-system-alist},
with the difference that the @var{pattern} in an element may be either a
port number or a regular expression.  If it is a regular expression, it
is matched against the network service name used to open the network
stream.
@end defvar
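
For example, to use ISO Latin-1 with Unix newlines for connections to
port 8080 (a hypothetical choice), you could add an element whose
@var{pattern} is a port number.  The second form checks the entry; its
arguments mirror those of @code{open-network-stream}, with
@code{"localhost"} as the host and 8080 as the service:

@example
(add-to-list 'network-coding-system-alist '(8080 . iso-latin-1-unix))

(find-operation-coding-system 'open-network-stream
                              "conn" nil "localhost" 8080)
     ; (iso-latin-1-unix . iso-latin-1-unix)
@end example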

@defvar default-process-coding-system
This variable specifies the coding systems to use for subprocess (and
network stream) input and output, when nothing else specifies what to
do.

The value should be a cons cell of the form @code{(@var{input-coding}
. @var{output-coding})}.  Here @var{input-coding} applies to input from
the subprocess, and @var{output-coding} applies to output to it.
@end defvar
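
For instance, an init file could set a default that fully specifies both
conversions, in line with the warning above about asynchronous
subprocess output.  The choice of ISO Latin-1 here is only an example:

@example
;; The car decodes what subprocesses send to Emacs; the cdr
;; encodes what Emacs sends to them.
(setq default-process-coding-system
      '(iso-latin-1-unix . iso-latin-1-unix))
@end example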
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
852
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
853 @defun find-operation-coding-system operation &rest arguments
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
854 This function returns the coding system to use (by default) for
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
855 performing @var{operation} with @var{arguments}. The value has this
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
856 form:
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
857
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
858 @example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
859 (@var{decoding-system} @var{encoding-system})
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
860 @end example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
861
The first element, @var{decoding-system}, is the coding system to use
for decoding (in case @var{operation} does decoding), and
@var{encoding-system} is the coding system for encoding (in case
@var{operation} does encoding).

The argument @var{operation} should be a symbol, one of
@code{insert-file-contents}, @code{write-region}, @code{call-process},
@code{call-process-region}, @code{start-process}, or
@code{open-network-stream}. These are the names of the Emacs I/O primitives
that can do coding system conversion.

The remaining arguments should be the same arguments that might be given
to that I/O primitive. Depending on the primitive, one of those
arguments is selected as the @dfn{target}. For example, if
@var{operation} does file I/O, whichever argument specifies the file
name is the target. For subprocess primitives, the process name is the
target. For @code{open-network-stream}, the target is the service name
or port number.

This function looks up the target in @code{file-coding-system-alist},
@code{process-coding-system-alist}, or
@code{network-coding-system-alist}, depending on @var{operation}.
@xref{Default Coding Systems}.
@end defun
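
For instance, the following sketch binds @code{file-coding-system-alist} locally, so that the result does not depend on your own settings. The file name @file{notes.txt} and the mapping to @code{latin-1} are arbitrary choices made only for this illustration. Because the matching alist element names a single coding system, that coding system is used for both decoding and encoding:

@example
;; @r{Hypothetical mapping of @file{.txt} files to Latin-1,}
;; @r{bound only for the duration of this form.}
(let ((file-coding-system-alist '(("\\.txt\\'" . latin-1))))
  (find-operation-coding-system 'insert-file-contents "notes.txt"))
;; @r{The value is @code{(latin-1 . latin-1)}.}
@end example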

@node Specifying Coding Systems
@subsection Specifying a Coding System for One Operation

You can specify the coding system for a specific operation by binding
the variables @code{coding-system-for-read} and/or
@code{coding-system-for-write}.

@defvar coding-system-for-read
If this variable is non-@code{nil}, it specifies the coding system to
use for reading a file, or for input from a synchronous subprocess.

It also applies to any asynchronous subprocess or network stream, but in
a different way: the value of @code{coding-system-for-read} when you
start the subprocess or open the network stream specifies the input
decoding method for that subprocess or network stream. It remains in
use for that subprocess or network stream unless and until overridden.

The right way to use this variable is to bind it with @code{let} for a
specific I/O operation. Its global value is normally @code{nil}, and
you should not globally set it to any other value. Here is an example
of the right way to use the variable:

@example
;; @r{Read the file with no character code conversion.}
;; @r{Assume @sc{crlf} represents end-of-line.}
(let ((coding-system-for-read 'emacs-mule-dos))
  (insert-file-contents filename))
@end example

When its value is non-@code{nil}, @code{coding-system-for-read} takes
precedence over all other methods of specifying a coding system to use for
input, including @code{file-coding-system-alist},
@code{process-coding-system-alist} and
@code{network-coding-system-alist}.
@end defvar

@defvar coding-system-for-write
This works much like @code{coding-system-for-read}, except that it
applies to output rather than input. It affects writing to files,
as well as sending output to subprocesses and net connections.

When a single operation does both input and output, as do
@code{call-process-region} and @code{start-process}, both
@code{coding-system-for-read} and @code{coding-system-for-write}
affect it.
@end defvar
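
As an illustration, this sketch writes the current buffer to a file with DOS-style line ends, regardless of the buffer's own coding system. The function name @code{my-write-buffer-dos} is hypothetical, and @code{raw-text-dos} is chosen only because, for @sc{ascii} text, its sole effect is to turn each newline into a carriage-return/linefeed pair:

@example
;; @r{Hypothetical helper, not part of Emacs.}
(defun my-write-buffer-dos (filename)
  "Write the whole current buffer to FILENAME using CRLF line ends."
  ;; @r{The binding affects only the @code{write-region} call below;}
  ;; @r{@code{raw-text-dos} is just an illustrative choice.}
  (let ((coding-system-for-write 'raw-text-dos))
    (write-region (point-min) (point-max) filename nil 'quiet)))
@end example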

@defvar inhibit-eol-conversion
When this variable is non-@code{nil}, no end-of-line conversion is done,
no matter which coding system is specified. This applies to all the
Emacs I/O and subprocess primitives, and to the explicit encoding and
decoding functions (@pxref{Explicit Encoding}).
@end defvar

@node Explicit Encoding
@subsection Explicit Encoding and Decoding
@cindex encoding text
@cindex decoding text

All the operations that transfer text in and out of Emacs have the
ability to use a coding system to encode or decode the text.
You can also explicitly encode and decode text using the functions
in this section.

The result of encoding, and the input to decoding, are not ordinary
text. They logically consist of a series of byte values; that is, a
series of characters whose codes are in the range 0 through 255. In a
multibyte buffer or string, character codes 128 through 159 are
represented by multibyte sequences, but this is invisible to Lisp
programs.

The usual way to read a file into a buffer as a sequence of bytes, so
you can decode the contents explicitly, is with
@code{insert-file-contents-literally} (@pxref{Reading from Files});
alternatively, specify a non-@code{nil} @var{rawfile} argument when
visiting a file with @code{find-file-noselect}. These methods result in
a unibyte buffer.
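
Here is a minimal sketch of that approach, assuming the caller already knows which coding system the file uses. The function name @code{my-read-and-decode} is hypothetical. The temporary buffer is made unibyte explicitly, so that the bytes read from the file stay bytes until @code{decode-coding-string} converts them:

@example
;; @r{Hypothetical helper, not part of Emacs.}
(defun my-read-and-decode (filename coding-system)
  "Return the contents of FILENAME decoded with CODING-SYSTEM."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    ;; @r{Insert the raw bytes, with no decoding at all.}
    (insert-file-contents-literally filename)
    ;; @r{Now decode them explicitly.}
    (decode-coding-string (buffer-string) coding-system)))
@end example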

The usual way to use the byte sequence that results from explicitly
encoding text is to copy it to a file or process---for example, to write
it with @code{write-region} (@pxref{Writing to Files}), and suppress
encoding by binding @code{coding-system-for-write} to
@code{no-conversion}.
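
A minimal sketch of that pattern follows. The function name @code{my-write-encoded} and its argument order are hypothetical; the point is that the text is encoded once, explicitly, and then written with @code{no-conversion} so that @code{write-region} does not encode it a second time:

@example
;; @r{Hypothetical helper, not part of Emacs.}
(defun my-write-encoded (text coding-system filename)
  "Encode TEXT with CODING-SYSTEM and write the bytes to FILENAME."
  (let ((bytes (encode-coding-string text coding-system))
        (coding-system-for-write 'no-conversion))
    ;; @r{When its first argument is a string, @code{write-region}}
    ;; @r{writes that string instead of buffer text.}
    (write-region bytes nil filename nil 'quiet)))
@end example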

Here are the functions to perform explicit encoding or decoding. The
encoding functions produce sequences of bytes; the decoding functions
are meant to operate on sequences of bytes. All of these functions
discard text properties.

@defun encode-coding-region start end coding-system
This function encodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The encoded text replaces the
original text in the buffer. The result of encoding is logically a
sequence of bytes, but the buffer remains multibyte if it was multibyte
before.
@end defun
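
For example, this sketch encodes the contents of a temporary buffer. The @code{raw-text-dos} coding system is an arbitrary choice for the illustration; applied to @sc{ascii} text, it changes only the line ends:

@example
;; @r{@code{raw-text-dos} is just an illustrative choice.}
(with-temp-buffer
  (insert "one\ntwo\n")
  (encode-coding-region (point-min) (point-max) 'raw-text-dos)
  (buffer-string))
;; @r{The value is a string equal to @code{"one\r\ntwo\r\n"}.}
@end example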

@defun encode-coding-string string coding-system
This function encodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
encoded text. The result of encoding is a unibyte string.
@end defun
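
For example, encoding an @sc{ascii} string with @code{raw-text-dos}, a coding system chosen here only for illustration, replaces each newline with a carriage-return/linefeed pair and yields a unibyte string:

@example
;; @r{@code{raw-text-dos} is just an illustrative choice.}
(encode-coding-string "one\ntwo\n" 'raw-text-dos)
;; @r{The value is a unibyte string equal to @code{"one\r\ntwo\r\n"}.}
@end example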

@defun decode-coding-region start end coding-system
This function decodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The decoded text replaces the
original text in the buffer. To make explicit decoding useful, the text
before decoding ought to be a sequence of byte values, but both
multibyte and unibyte buffers are acceptable.
@end defun
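
For example, the following sketch decodes a temporary buffer whose contents use carriage-return/linefeed line ends. The @code{raw-text-dos} coding system is an illustrative choice that, for @sc{ascii} bytes, only converts the line ends:

@example
;; @r{@code{raw-text-dos} is just an illustrative choice.}
(with-temp-buffer
  (insert "one\r\ntwo\r\n")
  (decode-coding-region (point-min) (point-max) 'raw-text-dos)
  (buffer-string))
;; @r{The value is a string equal to @code{"one\ntwo\n"}.}
@end example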

@defun decode-coding-string string coding-system
This function decodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
decoded text. To make explicit decoding useful, the contents of
@var{string} ought to be a sequence of byte values, but a multibyte
string is acceptable.
@end defun
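
For text that the coding system can represent, decoding with the same coding system undoes encoding. The following sketch shows such a round trip; @code{raw-text-dos} is again only an illustrative choice:

@example
;; @r{@code{raw-text-dos} is just an illustrative choice.}
(let* ((text "one\ntwo\n")
       (bytes (encode-coding-string text 'raw-text-dos)))
  ;; @r{@code{bytes} now holds CRLF line ends; decoding restores LF.}
  (decode-coding-string bytes 'raw-text-dos))
;; @r{The value is a string equal to @code{"one\ntwo\n"}.}
@end example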

@node Terminal I/O Encoding
@subsection Terminal I/O Encoding

Emacs can decode keyboard input using a coding system, and encode
terminal output. This is useful for terminals that transmit or display
text using a particular encoding such as Latin-1. Emacs does not set
@code{last-coding-system-used} for encoding or decoding for the
terminal.

@defun keyboard-coding-system
This function returns the coding system that is in use for decoding
keyboard input---or @code{nil} if no coding system is to be used.
@end defun

@defun set-keyboard-coding-system coding-system
This function specifies @var{coding-system} as the coding system to
use for decoding keyboard input. If @var{coding-system} is @code{nil},
that means do not decode keyboard input.
@end defun

@defun terminal-coding-system
This function returns the coding system that is in use for encoding
terminal output---or @code{nil} for no encoding.
@end defun

@defun set-terminal-coding-system coding-system
This function specifies @var{coding-system} as the coding system to use
for encoding terminal output. If @var{coding-system} is @code{nil},
that means do not encode terminal output.
@end defun

@node MS-DOS File Types
@subsection MS-DOS File Types
@cindex DOS file types
@cindex MS-DOS file types
@cindex Windows file types
@cindex file types on MS-DOS and Windows
@cindex text files and binary files
@cindex binary files and text files

On MS-DOS and Microsoft Windows, Emacs guesses the appropriate
end-of-line conversion for a file by looking at the file's name. This
feature classifies files as @dfn{text files} and @dfn{binary files}. By
``binary file'' we mean a file of literal byte values that are not
necessarily meant to be characters; Emacs does no end-of-line conversion
and no character code conversion for them. On the other hand, the bytes
in a text file are intended to represent characters; when you create a
new file whose name implies that it is a text file, Emacs uses DOS
end-of-line conversion.

@defvar buffer-file-type
This variable, automatically buffer-local in each buffer, records the
file type of the buffer's visited file. When a buffer does not specify
a coding system with @code{buffer-file-coding-system}, this variable is
used to determine which coding system to use when writing the contents
of the buffer. It should be @code{nil} for text, @code{t} for binary.
If it is @code{t}, the coding system is @code{no-conversion}.
Otherwise, @code{undecided-dos} is used.

Normally this variable is set by visiting a file; it is set to
@code{t} if the file was visited without any actual conversion.
@end defvar

@defopt file-name-buffer-file-type-alist
This variable holds an alist for recognizing text and binary files.
Each element has the form (@var{regexp} . @var{type}), where
@var{regexp} is matched against the file name, and @var{type} may be
@code{nil} for text, @code{t} for binary, or a function to call to
determine which of the two applies. If it is a function, then it is
called with a single argument (the file name) and should return
@code{t} or @code{nil}.

When running on MS-DOS or MS-Windows, Emacs checks this alist to decide
which coding system to use when reading a file. For a text file,
@code{undecided-dos} is used. For a binary file, @code{no-conversion}
is used.

If no element in this alist matches a given file name, then
@code{default-buffer-file-type} says how to treat the file.
@end defopt
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1085
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1086 @defopt default-buffer-file-type
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1087 This variable says how to handle files for which
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1088 @code{file-name-buffer-file-type-alist} says nothing about the type.
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1089
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1090 If this variable is non-@code{nil}, then these files are treated as
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1091 binary: the coding system @code{no-conversion} is used. Otherwise,
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1092 nothing special is done for them---the coding system is deduced solely
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1093 from the file contents, in the usual Emacs fashion.
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1094 @end defopt
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1095
@node Input Methods
@section Input Methods
@cindex input methods

@dfn{Input methods} provide convenient ways of entering non-@sc{ascii}
characters from the keyboard. Unlike coding systems, which translate
non-@sc{ascii} characters to and from encodings meant to be read by
programs, input methods provide human-friendly commands. (@xref{Input
Methods,,, emacs, The GNU Emacs Manual}, for information on how users
use input methods to enter text.) How to define input methods is not
yet documented in this manual, but here we describe how to use them.

Each input method has a name, which is currently a string;
in the future, symbols may also be usable as input method names.

@defvar current-input-method
This variable holds the name of the input method now active in the
current buffer. (It automatically becomes local in each buffer when set
in any fashion.) It is @code{nil} if no input method is active in the
buffer now.
@end defvar

@defvar default-input-method
This variable holds the default input method for commands that choose an
input method. Unlike @code{current-input-method}, this variable is
normally global.
@end defvar

@defun set-input-method input-method
This function activates input method @var{input-method} for the current
buffer. It also sets @code{default-input-method} to @var{input-method}.
If @var{input-method} is @code{nil}, this function deactivates any input
method for the current buffer.
@end defun

@defun read-input-method-name prompt &optional default inhibit-null
This function reads an input method name with the minibuffer, prompting
with @var{prompt}. If @var{default} is non-@code{nil}, it is returned
if the user enters empty input. If there is no default and the user
enters empty input, the function signals an error if
@var{inhibit-null} is non-@code{nil}.

The returned value is a string.
@end defun

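Here is a minimal sketch of how a command might use
@code{read-input-method-name} in its interactive specification and pass
the result to @code{set-input-method}. The command name
@code{my-select-input-method} is hypothetical; it is not part of Emacs.

@example
;; Hypothetical command (not part of Emacs): read an input method
;; name and activate that input method in the current buffer.
(defun my-select-input-method (name)
  "Activate the input method NAME in the current buffer.
This also makes NAME the value of `default-input-method'.
Interactively, read NAME with `read-input-method-name', offering
`default-input-method' as the default; with no default, empty
input is an error."
  (interactive
   (list (read-input-method-name "Select input method: "
                                 default-input-method t)))
  (set-input-method name))
@end example

Called from Lisp with an argument of @code{nil}, the command simply
deactivates any input method in the current buffer, as
@code{set-input-method} does.
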
@defvar input-method-alist
This variable defines all the supported input methods.
Each element defines one input method, and should have the form:

@example
(@var{input-method} @var{language-env} @var{activate-func}
 @var{title} @var{description} @var{args}...)
@end example

Here @var{input-method} is the input method name, a string;
@var{language-env} is another string, the name of the language
environment this input method is recommended for. (That serves only for
documentation purposes.)

@var{activate-func} is a function to call to activate this method. The
@var{args}, if any, are passed as arguments to @var{activate-func}. All
told, the arguments to @var{activate-func} are @var{input-method} and
the @var{args}.

@var{title} is a string to display in the mode line while this method is
active. @var{description} is a string describing this method and what
it is good for.
@end defvar

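The following sketch shows the element format in use. It registers a
toy input method by adding an element to @code{input-method-alist}
directly, then activates it with @code{set-input-method}. The input
method name @code{"my-upcase"} and the functions
@code{my-upcase-input-method}, @code{my-upcase-activate} and
@code{my-upcase-deactivate} are hypothetical. The activation function
sets @code{input-method-function} locally (@pxref{Reading One Event}).
It also sets @code{deactivate-current-input-method-function}, a
variable from the Emacs sources (@file{mule-cmds.el}) that this section
does not document. The code that deactivates an input method calls that
variable's value, so each activation function should set it. (Older
Emacs versions call this variable
@code{inactivate-current-input-method-function}.)

@example
;; Hypothetical input method function: upcase character events
;; and pass any other event through unchanged.
(defun my-upcase-input-method (event)
  "Return a list of events to use in place of EVENT."
  (list (if (integerp event) (upcase event) event)))

;; Hypothetical activation function.  It is called with the input
;; method name followed by the ARGS of its `input-method-alist'
;; element (none here).
(defun my-upcase-activate (_name &rest _args)
  (set (make-local-variable 'input-method-function)
       'my-upcase-input-method)
  (set (make-local-variable 'deactivate-current-input-method-function)
       'my-upcase-deactivate))

;; Hypothetical deactivation function: drop the local binding.
(defun my-upcase-deactivate ()
  (kill-local-variable 'input-method-function))

;; (INPUT-METHOD LANGUAGE-ENV ACTIVATE-FUNC TITLE DESCRIPTION ARGS...)
(push '("my-upcase" "English" my-upcase-activate
        "UP" "Toy input method that inserts upper-case letters.")
      input-method-alist)

;; Activate it in a temporary buffer and look at the variables.
(with-temp-buffer
  (set-input-method "my-upcase")
  (list current-input-method default-input-method))
@end example

Because @var{args} is empty in this element, @code{my-upcase-activate}
receives only the input method name. After @code{set-input-method}
returns, @code{current-input-method} is @code{"my-upcase"} in that
buffer only, and @code{default-input-method} has been set to
@code{"my-upcase"} as well.
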
The fundamental interface to input methods is through the
variable @code{input-method-function}. @xref{Reading One Event}.

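At the lowest level, a buffer can install an input method function of
its own by setting @code{input-method-function} directly. This bypasses
@code{input-method-alist} and @code{current-input-method}. The sketch
assumes the calling convention for that function
(@pxref{Reading One Event}): it receives one event and returns a list
of events to use instead. The function name, and the choice to turn
@kbd{;} into the character U+00F6 (LATIN SMALL LETTER O WITH
DIAERESIS), are hypothetical. The code uses @code{decode-char} so that
the source stays @sc{ascii}.

@example
;; Hypothetical input method function: typing `;' yields the
;; character U+00F6; every other event is passed through unchanged.
(defun my-semicolon-input-method (event)
  "Return a list of events to use in place of EVENT."
  (if (eq event ?\;)
      (list (decode-char 'ucs #xF6))
    (list event)))

;; Install it for the current buffer only.
(set (make-local-variable 'input-method-function)
     'my-semicolon-input-method)
@end example
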
@node Locales
@section Locales
@cindex locale

POSIX defines a concept of ``locales'' that control which language
to use in language-related features. These Emacs variables control
how Emacs interacts with these features.

@defvar locale-coding-system
@tindex locale-coding-system
This variable specifies the coding system to use for decoding system
error messages, for encoding the format argument to
@code{format-time-string}, and for decoding the return value of
@code{format-time-string}.
@end defvar

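A Lisp program that obtains raw bytes from the operating system can
decode them the same way Emacs decodes system error messages. The
following is a minimal sketch. The helper name is hypothetical, and
the helper assumes that @var{raw} is a unibyte string. It returns
@var{raw} unchanged when @code{locale-coding-system} is @code{nil}.

@example
;; Hypothetical helper (not part of Emacs): decode the unibyte
;; string RAW with the locale's coding system, if there is one.
(defun my-decode-locale-bytes (raw)
  "Decode RAW using `locale-coding-system'.
If `locale-coding-system' is nil, return RAW unchanged."
  (if locale-coding-system
      (decode-coding-string raw locale-coding-system)
    raw))
@end example
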
@defvar system-messages-locale
@tindex system-messages-locale
This variable specifies the locale to use for generating system error
messages. Changing the locale can cause messages to come out in a
different language or in a different orthography. If the variable is
@code{nil}, the locale is specified by environment variables in the
usual POSIX fashion.
@end defvar

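Emacs consults this variable when it generates a system error message.
Binding it around a piece of code should therefore affect the messages
that code produces. This sketch uses a hypothetical helper name. It
assumes the @code{"C"} locale, which POSIX requires every system to
provide, and a file name that does not exist.

@example
;; Hypothetical helper (not part of Emacs): return the error message
;; that reading FILE produces in the "C" locale, or nil if FILE can
;; be read.  The file is read into a temporary buffer.
(defun my-file-error-message-in-c-locale (file)
  (let ((system-messages-locale "C"))
    (with-temp-buffer
      (condition-case err
          (progn (insert-file-contents file) nil)
        (file-error (error-message-string err))))))

(my-file-error-message-in-c-locale "/nonexistent/dir/file")
@end example
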
@defvar system-time-locale
@tindex system-time-locale
This variable specifies the locale to use for formatting time values.
Changing the locale can cause formatted times to appear according to the
conventions of a different language. If the variable is @code{nil}, the
locale is specified by environment variables in the usual POSIX fashion.
@end defvar

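Code that must produce English day and month names regardless of the
user's locale can bind this variable around a call to
@code{format-time-string}. Here is a minimal sketch with a hypothetical
function name. It assumes the @code{"C"} locale, whose day and month
names are the English ones.

@example
;; Hypothetical wrapper (not part of Emacs): like
;; `format-time-string', but use the day and month names of the
;; "C" locale.  TIME and ZONE are passed through unchanged.
(defun my-format-time-string-c (format-string &optional time zone)
  (let ((system-time-locale "C"))
    (format-time-string format-string time zone)))
@end example

With this definition,
@code{(my-format-time-string-c "%A, %d %B %Y" '(0 0) t)} returns
@code{"Thursday, 01 January 1970"}. Here @code{'(0 0)} is the Unix
epoch as a time value, and @code{t} requests Universal Time; the epoch
fell on a Thursday.
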