annotate lispref/nonascii.texi @ 62149:e64f1e2ecec2

(easy-mmode-pretty-mode-name): Explain more about the LIGHTER arg's usage in the doc string. Add commentary to clarify what the code does. Fix the regexp that strips whitespace from LIGHTER. Quote LIGHTER before using it, since it could have characters special to regular expressions.
author Eli Zaretskii <eliz@gnu.org>
date Sat, 07 May 2005 15:05:00 +0000
parents 5c7a0c8de2df
children 99e9892a51d9 02f1dbc4a199
rev   line source
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1 @c -*-texinfo-*-
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
2 @c This is part of the GNU Emacs Lisp Reference Manual.
49600
23a1cea22d13 Trailing whitespace deleted.
Juanma Barranquero <lekktu@gmail.com>
parents: 45652
diff changeset
3 @c Copyright (C) 1998, 1999 Free Software Foundation, Inc.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
4 @c See the file elisp.texi for copying conditions.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
5 @setfilename ../info/characters
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
6 @node Non-ASCII Characters, Searching and Matching, Text, Top
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
7 @chapter Non-@acronym{ASCII} Characters
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
8 @cindex multibyte characters
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
9 @cindex non-@acronym{ASCII} characters
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
10
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
11 This chapter covers the special issues relating to non-@acronym{ASCII}
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
12 characters and how they are stored in strings and buffers.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
13
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
14 @menu
28635
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
15 * Text Representations:: Unibyte and multibyte representations
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
16 * Converting Representations:: Converting unibyte to multibyte and vice versa.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
17 * Selecting a Representation:: Treating a byte sequence as unibyte or multi.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
18 * Character Codes:: How unibyte and multibyte relate to
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
19 codes of individual characters.
54036
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
20 * Character Sets:: The space of possible character codes
28635
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
21 is divided into various character sets.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
22 * Chars and Bytes:: More information about multibyte encodings.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
23 * Splitting Characters:: Converting a character to its byte sequence.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
24 * Scanning Charsets:: Which character sets are used in a buffer?
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
25 * Translation of Characters:: Translation tables are used for conversion.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
26 * Coding Systems:: Coding systems are conversions for saving files.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
27 * Input Methods:: Input methods allow users to enter various
40834
9552d64e0367 Fix typo.
Richard M. Stallman <rms@gnu.org>
parents: 39221
diff changeset
28 non-ASCII characters without special keyboards.
28635
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
29 * Locales:: Interacting with the POSIX locale.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
30 @end menu
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
31
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
32 @node Text Representations
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
33 @section Text Representations
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
34 @cindex text representations
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
35
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
36 Emacs has two @dfn{text representations}---two ways to represent text
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
37 in a string or buffer. These are called @dfn{unibyte} and
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
38 @dfn{multibyte}. Each string, and each buffer, uses one of these two
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
39 representations. For most purposes, you can ignore the issue of
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
40 representations, because Emacs converts text between them as
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
41 appropriate. Occasionally in Lisp programming you will need to pay
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
42 attention to the difference.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
43
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
44 @cindex unibyte text
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
45 In unibyte representation, each character occupies one byte and
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
46 therefore the possible character codes range from 0 to 255. Codes 0
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
47 through 127 are @acronym{ASCII} characters; the codes from 128 through 255
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
48 are used for one non-@acronym{ASCII} character set (you can choose which
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
49 character set by setting the variable @code{nonascii-insert-offset}).
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
50
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
51 @cindex leading code
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
52 @cindex multibyte text
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
53 @cindex trailing codes
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
54 In multibyte representation, a character may occupy more than one
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
55 byte, and as a result, the full range of Emacs character codes can be
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
56 stored. The first byte of a multibyte character is always in the range
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
57 128 through 159 (octal 0200 through 0237). These values are called
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
58 @dfn{leading codes}. The second and subsequent bytes of a multibyte
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
59 character are always in the range 160 through 255 (octal 0240 through
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
60 0377); these values are @dfn{trailing codes}.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
61
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
62 Some sequences of bytes are not valid in multibyte text: for example,
32523
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
63 a single isolated byte in the range 128 through 159 is not allowed. But
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
64 character codes 128 through 159 can appear in multibyte text,
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
65 represented as two-byte sequences. All the character codes 128 through
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
66 255 are possible (though slightly abnormal) in multibyte text; they
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
67 appear in multibyte buffers and strings when you do explicit encoding
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
68 and decoding (@pxref{Explicit Encoding}).
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
69
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
70 In a buffer, the buffer-local value of the variable
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
71 @code{enable-multibyte-characters} specifies the representation used.
24952
a6db4671c7a0 *** empty log message ***
Karl Heuer <kwzh@gnu.org>
parents: 24951
diff changeset
72 The representation for a string is determined and recorded in the string
a6db4671c7a0 *** empty log message ***
Karl Heuer <kwzh@gnu.org>
parents: 24951
diff changeset
73 when the string is constructed.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
74
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
75 @defvar enable-multibyte-characters
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
76 This variable specifies the current buffer's text representation.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
77 If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
78 it contains unibyte text.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
79
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
80 You cannot set this variable directly; instead, use the function
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
81 @code{set-buffer-multibyte} to change a buffer's representation.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
82 @end defvar
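As a brief sketch (an illustration only, not taken from the Emacs
sources), the following shows @code{enable-multibyte-characters}
following the representation selected with @code{set-buffer-multibyte}
in a temporary buffer:

@example
(with-temp-buffer
  (set-buffer-multibyte nil)          ; make the buffer unibyte
  enable-multibyte-characters)
     @result{} nil

(with-temp-buffer
  (set-buffer-multibyte t)            ; make the buffer multibyte
  enable-multibyte-characters)
     @result{} t
@end example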
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
83
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
84 @defvar default-enable-multibyte-characters
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
85 This variable's value is entirely equivalent to @code{(default-value
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
86 'enable-multibyte-characters)}, and setting this variable changes that
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
87 default value. Setting the local binding of
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
88 @code{enable-multibyte-characters} in a specific buffer is not allowed,
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
89 but changing the default value is supported, and it is a reasonable
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
90 thing to do, because it has no effect on existing buffers.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
91
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
92 The @samp{--unibyte} command line option does its job by setting the
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
93 default value to @code{nil} early in startup.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
94 @end defvar
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
95
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
96 @defun position-bytes position
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
97 @tindex position-bytes
60501
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
98 Return the byte-position corresponding to buffer position
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
99 @var{position} in the current buffer. This is 1 at the start of the
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
100 buffer, and counts upward in bytes. If @var{position} is out of
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
101 range, the value is @code{nil}.
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
102 @end defun
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
103
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
104 @defun byte-to-position byte-position
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
105 @tindex byte-to-position
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
106 Return the buffer position corresponding to byte-position
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
107 @var{byte-position} in the current buffer. If @var{byte-position} is
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
108 out of range, the value is @code{nil}.
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
109 @end defun
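As a small illustration (a sketch that assumes a temporary buffer
holding only @acronym{ASCII} text, so that each character occupies one
byte), buffer positions and byte positions coincide, and an
out-of-range position yields @code{nil}:

@example
(with-temp-buffer
  (insert "abc")
  (list (position-bytes 1)       ; start of the buffer
        (position-bytes 4)       ; end of the buffer
        (position-bytes 10)      ; out of range
        (byte-to-position 2)))
     @result{} (1 4 nil 2)
@end example

With non-@acronym{ASCII} text in a multibyte buffer the two kinds of
position generally differ, because a character may occupy more than
one byte.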
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
110
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
111 @defun multibyte-string-p string
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
112 Return @code{t} if @var{string} is a multibyte string.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
113 @end defun
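For instance (a minimal sketch; it assumes that a string constant
containing only @acronym{ASCII} characters is read as a unibyte
string, and uses @code{string-to-multibyte} to obtain a multibyte
string):

@example
(multibyte-string-p "abc")
     @result{} nil
(multibyte-string-p (string-to-multibyte "abc"))
     @result{} t
@end example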
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
114
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
115 @node Converting Representations
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
116 @section Converting Text Representations
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
117
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
118 Emacs can convert unibyte text to multibyte; it can also convert
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
119 multibyte text to unibyte, though this conversion loses information. In
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
120 general these conversions happen when inserting text into a buffer, or
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
121 when putting text from several strings together in one string. You can
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
122 also explicitly convert a string's contents to either representation.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
123
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
124 Emacs chooses the representation for a string based on the text that
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
125 it is constructed from. The general rule is to convert unibyte text to
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
126 multibyte text when combining it with other multibyte text, because the
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
127 multibyte representation is more general and can hold whatever
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
128 characters the unibyte text has.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
129
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
130 When inserting text into a buffer, Emacs converts the text to the
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
131 buffer's representation, as specified by
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
132 @code{enable-multibyte-characters} in that buffer. In particular, when
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
133 you insert multibyte text into a unibyte buffer, Emacs converts the text
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
134 to unibyte, even though this conversion cannot in general preserve all
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
135 the characters that might be in the multibyte text. The other natural
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
136 alternative, to convert the buffer contents to multibyte, is not
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
137 acceptable because the buffer's representation is a choice made by the
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
138 user that cannot be overridden automatically.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
139
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
140 Converting unibyte text to multibyte text leaves @acronym{ASCII} characters
32523
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
141 unchanged, and likewise character codes 128 through 159. It converts
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
142 the non-@acronym{ASCII} codes 160 through 255 by adding the value
32523
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
143 @code{nonascii-insert-offset} to each character code. By setting this
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
144 variable, you specify which character set the unibyte characters
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
145 correspond to (@pxref{Character Sets}). For example, if
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
146 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
147 'latin-iso8859-1) 128)}, then the unibyte non-@acronym{ASCII} characters
32523
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
148 correspond to Latin 1. If it is 2688, which is @code{(- (make-char
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
149 'greek-iso8859-7) 128)}, then they correspond to Greek letters.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
150
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
151 Converting multibyte text to unibyte is simpler: it discards all but
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
152 the low 8 bits of each character code. If @code{nonascii-insert-offset}
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
153 has a reasonable value, corresponding to the beginning of some character
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
154 set, this conversion is the inverse of the other: converting unibyte
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
155 text to multibyte and back to unibyte reproduces the original unibyte
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
156 text.
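The round trip can be checked with plain arithmetic. This is only a
worked example of the description above, assuming the Latin 1 offset
of 2048 and the unibyte code 160; it does not call the conversion
functions themselves:

@example
(+ 160 2048)            ; unibyte to multibyte: add the offset
     @result{} 2208
(logand 2208 255)       ; multibyte to unibyte: keep the low 8 bits
     @result{} 160
@end example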
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
157
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
158 @defvar nonascii-insert-offset
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
159 This variable specifies the amount to add to a non-@acronym{ASCII} character
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
160 when converting unibyte text to multibyte. It also applies when
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
161 @code{self-insert-command} inserts a character in the unibyte
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
162 non-@acronym{ASCII} range, 128 through 255. However, the functions
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
163 @code{insert} and @code{insert-char} do not perform this conversion.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
164
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
165 The right value to use to select character set @var{cs} is @code{(-
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
166 (make-char @var{cs}) 128)}. If the value of
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
167 @code{nonascii-insert-offset} is zero, then conversion actually uses the
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
168 value for the Latin 1 character set, rather than zero.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
169 @end defvar
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
170
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
171 @defvar nonascii-translation-table
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
172 This variable provides a more general alternative to
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
173 @code{nonascii-insert-offset}. You can use it to specify independently
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
174 how to translate each code in the range of 128 through 255 into a
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
175 multibyte character. The value should be a char-table, or @code{nil}.
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
176 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
177 @end defvar
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
178
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
179 The next three functions either return the argument @var{string}, or a
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
180 newly created string with no text properties.
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
181
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
182 @defun string-make-unibyte string
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
183 This function converts the text of @var{string} to unibyte
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
184 representation, if it isn't already, and returns the result. If
45652
ccaf0199f9dc (Converting Representations): Update the description of what
Eli Zaretskii <eliz@gnu.org>
parents: 43634
diff changeset
185 @var{string} is a unibyte string, it is returned unchanged. Multibyte
ccaf0199f9dc (Converting Representations): Update the description of what
Eli Zaretskii <eliz@gnu.org>
parents: 43634
diff changeset
186 character codes are converted to unibyte according to
ccaf0199f9dc (Converting Representations): Update the description of what
Eli Zaretskii <eliz@gnu.org>
parents: 43634
diff changeset
187 @code{nonascii-translation-table} or, if that is @code{nil}, using
ccaf0199f9dc (Converting Representations): Update the description of what
Eli Zaretskii <eliz@gnu.org>
parents: 43634
diff changeset
188 @code{nonascii-insert-offset}. If the lookup in the translation table
ccaf0199f9dc (Converting Representations): Update the description of what
Eli Zaretskii <eliz@gnu.org>
parents: 43634
diff changeset
189 fails, this function takes just the low 8 bits of each character.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
190 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
191
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
192 @defun string-make-multibyte string
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
193 This function converts the text of @var{string} to multibyte
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
194 representation, if it isn't already, and returns the result. If
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
195 @var{string} is a multibyte string or consists entirely of
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
196 @acronym{ASCII} characters, it is returned unchanged. In particular,
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
197 if @var{string} is unibyte and entirely @acronym{ASCII}, the returned
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
198 string is unibyte. (When the characters are all @acronym{ASCII},
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
199 Emacs primitives will treat the string the same way whether it is
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
200 unibyte or multibyte.) If @var{string} is unibyte and contains
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
201 non-@acronym{ASCII} characters, the function
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
202 @code{unibyte-char-to-multibyte} is used to convert each unibyte
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
203 character to a multibyte character.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
204 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
205
51990
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
206 @defun string-to-multibyte string
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
207 This function returns a multibyte string containing the same sequence
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
208 of character codes as @var{string}. Unlike
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
209 @code{string-make-multibyte}, this function unconditionally returns a
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
210 multibyte string. If @var{string} is a multibyte string, it is
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
211 returned unchanged.
51990
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
212 @end defun
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
213
53431
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
214 @defun multibyte-char-to-unibyte char
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
215 This converts the multibyte character @var{char} to a unibyte
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
216 character, based on @code{nonascii-translation-table} and
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
217 @code{nonascii-insert-offset}.
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
218 @end defun
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
219
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
220 @defun unibyte-char-to-multibyte char
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
221 This converts the unibyte character @var{char} to a multibyte
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
222 character, based on @code{nonascii-translation-table} and
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
223 @code{nonascii-insert-offset}.
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
224 @end defun
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
225
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
226 @node Selecting a Representation
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
227 @section Selecting a Representation
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
228
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
229 Sometimes it is useful to examine an existing buffer or string as
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
230 multibyte when it was unibyte, or vice versa.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
231
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
232 @defun set-buffer-multibyte multibyte
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
233 Set the representation type of the current buffer. If @var{multibyte}
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
234 is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
235 is @code{nil}, the buffer becomes unibyte.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
236
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
237 This function leaves the buffer contents unchanged when viewed as a
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
238 sequence of bytes. As a consequence, it can change the contents viewed
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
239 as characters; a sequence of two bytes which is treated as one character
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
240 in multibyte representation will count as two characters in unibyte
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
241 representation. Character codes 128 through 159 are an exception. They
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
242 are represented by one byte in a unibyte buffer, but when the buffer is
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
243 set to multibyte, they are converted to two-byte sequences, and vice
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
244 versa.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
245
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
246 This function sets @code{enable-multibyte-characters} to record which
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
247 representation is in use. It also adjusts various data in the buffer
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
248 (including overlays, text properties and markers) so that they cover the
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
249 same text as they did before.
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
250
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
251 You cannot use @code{set-buffer-multibyte} on an indirect buffer,
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
252 because indirect buffers always inherit the representation of the
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
253 base buffer.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
254 @end defun
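  As an illustration that is not part of the original text, the
following minimal sketch switches a scratch buffer to unibyte
representation and then inspects @code{enable-multibyte-characters}.
It assumes the standard macro @code{with-temp-buffer} and that the
temporary buffer starts out multibyte, which is the usual default.

@example
(with-temp-buffer
  ;; A temporary buffer is not an indirect buffer, so its
  ;; representation may be changed.
  (set-buffer-multibyte nil)
  ;; The variable now records that the buffer is unibyte.
  enable-multibyte-characters)
     @result{} nil
@end example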
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
255
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
256 @defun string-as-unibyte string
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
257 This function returns a string with the same bytes as @var{string} but
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
258 treating each byte as a character. This means that the value may have
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
259 more characters than @var{string} has.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
260
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
261 If @var{string} is already a unibyte string, then the value is
33912
67b6bdbd95c6 8-bit tweaks
Dave Love <fx@gnu.org>
parents: 32523
diff changeset
262 @var{string} itself. Otherwise it is a newly created string, with no
67b6bdbd95c6 8-bit tweaks
Dave Love <fx@gnu.org>
parents: 32523
diff changeset
263 text properties. If @var{string} is multibyte, any characters it
50653
6f6abeeda7ed (Selecting a Representation): Fix Texinfo usage.
Richard M. Stallman <rms@gnu.org>
parents: 49600
diff changeset
264 contains of charset @code{eight-bit-control} or @code{eight-bit-graphic}
33912
67b6bdbd95c6 8-bit tweaks
Dave Love <fx@gnu.org>
parents: 32523
diff changeset
265 are converted to the corresponding single byte.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
266 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
267
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
268 @defun string-as-multibyte string
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
269 This function returns a string with the same bytes as @var{string} but
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
270 treating each multibyte sequence as one character. This means that the
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
271 value may have fewer characters than @var{string} has.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
272
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
273 If @var{string} is already a multibyte string, then the value is
33912
67b6bdbd95c6 8-bit tweaks
Dave Love <fx@gnu.org>
parents: 32523
diff changeset
274 @var{string} itself. Otherwise it is a newly created string, with no
67b6bdbd95c6 8-bit tweaks
Dave Love <fx@gnu.org>
parents: 32523
diff changeset
275 text properties. If @var{string} is unibyte and contains any individual
67b6bdbd95c6 8-bit tweaks
Dave Love <fx@gnu.org>
parents: 32523
diff changeset
276 8-bit bytes (i.e.@: not part of a multibyte form), they are converted to
50653
6f6abeeda7ed (Selecting a Representation): Fix Texinfo usage.
Richard M. Stallman <rms@gnu.org>
parents: 49600
diff changeset
277 the corresponding multibyte character of charset @code{eight-bit-control}
6f6abeeda7ed (Selecting a Representation): Fix Texinfo usage.
Richard M. Stallman <rms@gnu.org>
parents: 49600
diff changeset
278 or @code{eight-bit-graphic}.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
279 @end defun
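  The difference in character counts described above can be sketched
as follows.  This example is an illustration only, not part of the
original text: it assumes that @code{make-char} with the arguments
shown yields a non-@acronym{ASCII} character occupying more than one
byte in multibyte representation, and it deliberately shows no result
because the exact counts depend on the internal encoding.

@example
(let* ((multi (string (make-char 'latin-iso8859-1 72)))
       (uni (string-as-unibyte multi)))
  ;; MULTI holds one character; UNI holds one character per
  ;; byte of MULTI, so its length is expected to be larger.
  (list (length multi) (length uni)))
@end example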
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
280
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
281 @node Character Codes
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
282 @section Character Codes
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
283 @cindex character codes
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
284
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
285 The unibyte and multibyte text representations use different character
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
286 codes. The valid character codes for unibyte representation range from
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
287 0 to 255---the values that can fit in one byte. The valid character
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
288 codes for multibyte representation range from 0 to 524287, but not all
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
289 values in that range are valid. The values 128 through 255 are not
32523
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
290 entirely proper in multibyte text, but they can occur if you do explicit
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
291 encoding and decoding (@pxref{Explicit Encoding}). Some other character
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
292 codes cannot occur at all in multibyte text. Only the @acronym{ASCII} codes
32523
4881cd839f12 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 29339
diff changeset
293 0 through 127 are completely legitimate in both representations.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
294
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
295 @defun char-valid-p charcode &optional genericp
60677
0daf01e514e4 (Character Codes): Minor fix.
Richard M. Stallman <rms@gnu.org>
parents: 60501
diff changeset
296 This returns @code{t} if @var{charcode} is valid (either for unibyte
0daf01e514e4 (Character Codes): Minor fix.
Richard M. Stallman <rms@gnu.org>
parents: 60501
diff changeset
297 text or for multibyte text).
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
298
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
299 @example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
300 (char-valid-p 65)
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
301 @result{} t
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
302 (char-valid-p 256)
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
303 @result{} nil
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
304 (char-valid-p 2248)
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
305 @result{} t
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
306 @end example
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
307
51703
b8860fc285cb Minor Texinfo usage fix.
Richard M. Stallman <rms@gnu.org>
parents: 50653
diff changeset
308 If the optional argument @var{genericp} is non-@code{nil}, this
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
309 function also returns @code{t} if @var{charcode} is a generic
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
310 character (@pxref{Splitting Characters}).
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
311 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
312
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
313 @node Character Sets
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
314 @section Character Sets
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
315 @cindex character sets
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
316
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
317 Emacs classifies characters into various @dfn{character sets}, each of
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
318 which has a name which is a symbol. Each character belongs to one and
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
319 only one character set.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
320
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
321 In general, there is one character set for each distinct script. For
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
322 example, @code{latin-iso8859-1} is one character set,
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
323 @code{greek-iso8859-7} is another, and @code{ascii} is another. An
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
324 Emacs character set can hold at most 9025 characters; therefore, in some
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
325 cases, characters that would logically be grouped together are split
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
326 into several character sets. For example, one set of Chinese
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
327 characters, generally known as Big 5, is divided into two Emacs
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
328 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
329
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
330 @acronym{ASCII} characters are in character set @code{ascii}. The
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
331 non-@acronym{ASCII} characters 128 through 159 are in character set
28900
ac620ff5fd5d *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28887
diff changeset
332 @code{eight-bit-control}, and codes 160 through 255 are in character set
ac620ff5fd5d *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28887
diff changeset
333 @code{eight-bit-graphic}.
ac620ff5fd5d *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28887
diff changeset
334
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
335 @defun charsetp object
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
336 Returns @code{t} if @var{object} is a symbol that names a character set,
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
337 @code{nil} otherwise.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
338 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
339
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
340 @defvar charset-list
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
341 The value is a list of all defined character set names.
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
342 @end defvar
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
343
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
344 @defun charset-list
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
345 This function returns the value of @code{charset-list}. It is only
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
346 provided for backward compatibility.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
347 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
348
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
349 @defun char-charset character
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
350 This function returns the name of the character set that @var{character}
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
351 belongs to, or the symbol @code{unknown} if @var{character} is not a
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
352 valid character.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
353 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
354
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
355 @defun charset-plist charset
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
356 @tindex charset-plist
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
357 This function returns the charset property list of the character set
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
358 @var{charset}. Although @var{charset} is a symbol, this is not the same
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
359 as the property list of that symbol. Charset properties are used for
52788
814620b1c1af Don't mention preferred-coding-system.
Dave Love <fx@gnu.org>
parents: 52401
diff changeset
360 special purposes within Emacs.
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
361 @end defun
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
362
60501
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
363 @deffn Command list-charset-chars charset
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
364 This command displays a list of characters in the character set
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
365 @var{charset}.
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
366 @end deffn
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
367
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
368 @node Chars and Bytes
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
369 @section Characters and Bytes
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
370 @cindex bytes and characters
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
371
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
372 @cindex introduction sequence
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
373 @cindex dimension (of character set)
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
374 In multibyte representation, each character occupies one or more
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
375 bytes. Each character set has an @dfn{introduction sequence}, which is
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
376 normally one or two bytes long. (Exception: the @code{ascii} character
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
377 set and the @code{eight-bit-graphic} character set have a zero-length
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
378 introduction sequence.) The introduction sequence is the beginning of
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
379 the byte sequence for any character in the character set. The rest of
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
380 the character's bytes distinguish it from the other characters in the
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
381 same character set. Depending on the character set, there are either
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
382 one or two distinguishing bytes; the number of such bytes is called the
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
383 @dfn{dimension} of the character set.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
384
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
385 @defun charset-dimension charset
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
386 This function returns the dimension of @var{charset}; at present, the
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
387 dimension is always 1 or 2.
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
388 @end defun
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
389
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
390 @defun charset-bytes charset
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
391 @tindex charset-bytes
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
392 This function returns the number of bytes used to represent a character
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
393 in character set @var{charset}.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
394 @end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
395
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
396 This is the simplest way to determine the byte length of a character
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
397 set's introduction sequence:
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
398
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
399 @example
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
400 (- (charset-bytes @var{charset})
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
401 (charset-dimension @var{charset}))
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
402 @end example
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
403
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
404 @node Splitting Characters
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
405 @section Splitting Characters
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
406
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
407 The functions in this section convert between characters and the byte
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
408 values used to represent them. For most purposes, there is no need to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
409 be concerned with the sequence of bytes used to represent a character,
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
410 because Emacs translates automatically when necessary.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
411
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
412 @defun split-char character
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
413 Return a list containing the name of the character set of
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
414 @var{character}, followed by one or two byte values (integers) which
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
415 identify @var{character} within that character set. The number of byte
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
416 values is the character set's dimension.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
417
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
418 If @var{character} is invalid as a character code, @code{split-char}
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
419 returns a list consisting of the symbol @code{unknown} and @var{character}.
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
420
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
421 @example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
422 (split-char 2248)
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
423 @result{} (latin-iso8859-1 72)
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
424 (split-char 65)
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
425 @result{} (ascii 65)
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
426 (split-char 128)
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
427 @result{} (eight-bit-control 128)
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
428 @end example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
429 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
430
34811
c2170032744b make-char change
Dave Love <fx@gnu.org>
parents: 33912
diff changeset
431 @defun make-char charset &optional code1 code2
c2170032744b make-char change
Dave Love <fx@gnu.org>
parents: 33912
diff changeset
432 This function returns the character in character set @var{charset} whose
c2170032744b make-char change
Dave Love <fx@gnu.org>
parents: 33912
diff changeset
433 position codes are @var{code1} and @var{code2}. This is roughly the
c2170032744b make-char change
Dave Love <fx@gnu.org>
parents: 33912
diff changeset
434 inverse of @code{split-char}. Normally, you should specify either one
c2170032744b make-char change
Dave Love <fx@gnu.org>
parents: 33912
diff changeset
435 or both of @var{code1} and @var{code2} according to the dimension of
c2170032744b make-char change
Dave Love <fx@gnu.org>
parents: 33912
diff changeset
436 @var{charset}. For example,
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
437
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
438 @example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
439 (make-char 'latin-iso8859-1 72)
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
440 @result{} 2248
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
441 @end example
52788
814620b1c1af Don't mention preferred-coding-system.
Dave Love <fx@gnu.org>
parents: 52401
diff changeset
442
814620b1c1af Don't mention preferred-coding-system.
Dave Love <fx@gnu.org>
parents: 52401
diff changeset
443 Actually, the eighth bit of both @var{code1} and @var{code2} is zeroed
814620b1c1af Don't mention preferred-coding-system.
Dave Love <fx@gnu.org>
parents: 52401
diff changeset
444 before they are used to index @var{charset}. Thus you may use, for
814620b1c1af Don't mention preferred-coding-system.
Dave Love <fx@gnu.org>
parents: 52401
diff changeset
445 instance, an ISO 8859 character code rather than subtracting 128, as
814620b1c1af Don't mention preferred-coding-system.
Dave Love <fx@gnu.org>
parents: 52401
diff changeset
446 is necessary to index the corresponding Emacs charset.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
447 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
448
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
449 @cindex generic characters
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
450 If you call @code{make-char} with neither @var{code1} nor @var{code2}, the result is
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
451 a @dfn{generic character} which stands for @var{charset}. A generic
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
452 character is an integer, but it is @emph{not} valid for insertion in the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
453 buffer as a character. It can be used in @code{char-table-range} to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
454 refer to the whole character set (@pxref{Char-Tables}).
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
455 @code{char-valid-p} returns @code{nil} for generic characters.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
456 For example:
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
457
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
458 @example
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
459 (make-char 'latin-iso8859-1)
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
460 @result{} 2176
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
461 (char-valid-p 2176)
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
462 @result{} nil
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
463 (char-valid-p 2176 t)
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
464 @result{} t
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
465 (split-char 2176)
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
466 @result{} (latin-iso8859-1 0)
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
467 @end example
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
468
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
469 The character sets @code{ascii}, @code{eight-bit-control}, and
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
470 @code{eight-bit-graphic} don't have corresponding generic characters. If
34811
c2170032744b make-char change
Dave Love <fx@gnu.org>
parents: 33912
diff changeset
471 @var{charset} is one of them and you don't supply @var{code1},
c2170032744b make-char change
Dave Love <fx@gnu.org>
parents: 33912
diff changeset
472 @code{make-char} returns the character code corresponding to the
c2170032744b make-char change
Dave Love <fx@gnu.org>
parents: 33912
diff changeset
473 smallest code in @var{charset}.
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
474
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
475 @node Scanning Charsets
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
476 @section Scanning for Character Sets
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
477
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
478 Sometimes it is useful to find out which character sets appear in a
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
479 part of a buffer or a string. One use for this is in determining which
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
480 coding systems (@pxref{Coding Systems}) are capable of representing all
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
481 of the text in question.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
482
60501
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
483 @defun charset-after &optional pos
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
484 This function returns the charset of the character in the current buffer
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
485 at position @var{pos}. If @var{pos} is omitted or @code{nil}, it
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
486 defaults to the current value of point. If @var{pos} is out of range,
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
487 the value is @code{nil}.
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
488 @end defun
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
489
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
490 @defun find-charset-region beg end &optional translation
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
491 This function returns a list of the character sets that appear in the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
492 current buffer between positions @var{beg} and @var{end}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
493
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
494 The optional argument @var{translation} specifies a translation table to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
495 be used in scanning the text (@pxref{Translation of Characters}). If it
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
496 is non-@code{nil}, then each character in the region is translated
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
497 through this table, and the value returned describes the translated
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
498 characters instead of the characters actually in the buffer.
28887
0778eff185b6 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28877
diff changeset
499 @end defun
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
500
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
501 @defun find-charset-string string &optional translation
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
502 This function returns a list of the character sets that appear in the
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
503 string @var{string}. It is just like @code{find-charset-region}, except
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
504 that it applies to the contents of @var{string} instead of part of the
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
505 current buffer.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
506 @end defun
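
As a minimal sketch of the three scanning functions above, the
following forms inspect pure @acronym{ASCII} text, for which the
answer does not depend on the language environment. The temporary
buffer and the sample strings are illustrative assumptions, not part
of any Emacs API.

@example
;; @r{Scan a string that contains only @acronym{ASCII} characters.}
(find-charset-string "hello")
     @result{} (ascii)

;; @r{Do the same for a buffer region and for a single position.}
(with-temp-buffer
  (insert "A")
  (list (find-charset-region (point-min) (point-max))
        (charset-after (point-min))
        ;; @r{No character follows the end of the buffer,}
        ;; @r{so this position is out of range.}
        (charset-after (point-max))))
     @result{} ((ascii) ascii nil)
@end example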
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
507
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
508 @node Translation of Characters
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
509 @section Translation of Characters
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
510 @cindex character translation tables
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
511 @cindex translation tables
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
512
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
513 A @dfn{translation table} is a char-table that specifies a mapping
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
514 of characters into characters. These tables are used in encoding and
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
515 decoding, and for other purposes. Some coding systems specify their
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
516 own particular translation tables; there are also default translation
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
517 tables which apply to all other coding systems.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
518
54036
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
519 For instance, the coding system @code{utf-8} has a translation table
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
520 that maps characters of various charsets (e.g.,
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
521 @code{latin-iso8859-@var{x}}) into Unicode character sets. This way,
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
522 it can encode Latin-2 characters into UTF-8. Meanwhile,
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
523 @code{unify-8859-on-decoding-mode} operates by specifying
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
524 @code{standard-translation-table-for-decode} to translate
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
525 Latin-@var{x} characters into corresponding Unicode characters.
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
526
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
527 @defun make-translation-table &rest translations
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
528 This function returns a translation table based on the argument
35752
e1d9a16467ae *** empty log message ***
Dave Love <fx@gnu.org>
parents: 35493
diff changeset
529 @var{translations}. Each element of @var{translations} should be a
e1d9a16467ae *** empty log message ***
Dave Love <fx@gnu.org>
parents: 35493
diff changeset
530 list of elements of the form @code{(@var{from} . @var{to})}; this says
e1d9a16467ae *** empty log message ***
Dave Love <fx@gnu.org>
parents: 35493
diff changeset
531 to translate the character @var{from} into @var{to}.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
532
35493
679a73dad19a make-translation-table addition
Dave Love <fx@gnu.org>
parents: 34811
diff changeset
533 The arguments and the forms in each argument are processed in order,
679a73dad19a make-translation-table addition
Dave Love <fx@gnu.org>
parents: 34811
diff changeset
534 and if a previous form already translates @var{to} to some other
679a73dad19a make-translation-table addition
Dave Love <fx@gnu.org>
parents: 34811
diff changeset
535 character, say @var{to-alt}, @var{from} is also translated to
679a73dad19a make-translation-table addition
Dave Love <fx@gnu.org>
parents: 34811
diff changeset
536 @var{to-alt}.
679a73dad19a make-translation-table addition
Dave Love <fx@gnu.org>
parents: 34811
diff changeset
537
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
538 You can also map one whole character set into another character set with
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
539 the same dimension. To do this, you specify a generic character (which
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
540 designates a character set) for @var{from} (@pxref{Splitting Characters}).
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
541 In this case, if @var{to} is also a generic character, its character
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
542 set should have the same dimension as @var{from}'s. Then the
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
543 translation table translates each character of @var{from}'s character
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
544 set into the corresponding character of @var{to}'s character set. If
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
545 @var{from} is a generic character and @var{to} is an ordinary
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
546 character, then the translation table translates every character of
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
547 @var{from}'s character set into @var{to}.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
548 @end defun
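
The simplest use of @code{make-translation-table} can be sketched as
follows. The variable names @code{my-table} and @code{my-chain} are
hypothetical, chosen only for illustration. Since a translation table
is a char-table, @code{aref} can be used to look up what a character
translates to.

@example
;; @r{A table with one explicit mapping: @code{?a} is translated to @code{?b}.}
(setq my-table (make-translation-table '((?a . ?b))))
(aref my-table ?a)
     @result{} 98        ; @r{the character code of @code{?b}}

;; @r{Forms are processed in order.  Here @code{?b} already maps to @code{?c}}
;; @r{when @code{(?a . ?b)} is processed, so by the rule described above}
;; @r{@code{?a} should also end up translated to @code{?c}.}
(setq my-chain (make-translation-table '((?b . ?c) (?a . ?b))))
(aref my-chain ?a)
@end example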
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
549
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
550 In decoding, the translation table's translations are applied to the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
551 characters that result from ordinary decoding. If a coding system has
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
552 property @code{translation-table-for-decode}, that specifies the
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
553 translation table to use. (This is a property of the coding system,
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
554 as returned by @code{coding-system-get}, not a property of the symbol
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
555 that is the coding system's name. @xref{Coding System Basics,, Basic
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
556 Concepts of Coding Systems}.) Otherwise, if
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
557 @code{standard-translation-table-for-decode} is non-@code{nil},
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
558 decoding uses that table.
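
The lookup order just described can be sketched like this. The
function name @code{my-decode-translation-table} is hypothetical;
it only mirrors the prose above and is not how Emacs implements
decoding internally.

@example
;; @r{Return the translation table that decoding with CODING-SYSTEM}
;; @r{would consult: the coding system's own property if it has one,}
;; @r{otherwise the standard default (which may be @code{nil}).}
(defun my-decode-translation-table (coding-system)
  (or (coding-system-get coding-system
                         'translation-table-for-decode)
      standard-translation-table-for-decode))

(my-decode-translation-table 'utf-8)
@end example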
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
559
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
560 In encoding, the translation table's translations are applied to the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
561 characters in the buffer, and the result of translation is actually
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
562 encoded. If a coding system has property
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
563 @code{translation-table-for-encode}, that specifies the translation
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
564 table to use. Otherwise the variable
23433
a53274056f20 Fix names of standard-translation-table-for-decode(encode).
Richard M. Stallman <rms@gnu.org>
parents: 23110
diff changeset
565 @code{standard-translation-table-for-encode} specifies the translation
a53274056f20 Fix names of standard-translation-table-for-decode(encode).
Richard M. Stallman <rms@gnu.org>
parents: 23110
diff changeset
566 table.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
567
23433
a53274056f20 Fix names of standard-translation-table-for-decode(encode).
Richard M. Stallman <rms@gnu.org>
parents: 23110
diff changeset
568 @defvar standard-translation-table-for-decode
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
569 This is the default translation table for decoding, for
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
570 coding systems that don't specify any other translation table.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
571 @end defvar
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
572
23433
a53274056f20 Fix names of standard-translation-table-for-decode(encode).
Richard M. Stallman <rms@gnu.org>
parents: 23110
diff changeset
573 @defvar standard-translation-table-for-encode
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
574 This is the default translation table for encoding, for
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
575 coding systems that don't specify any other translation table.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
576 @end defvar
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
577
51990
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
578 @defvar translation-table-for-input
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
579 Self-inserting characters are translated through this translation
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
580 table before they are inserted. This variable automatically becomes
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
581 buffer-local when set.
54036
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
582
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
583 @code{set-buffer-file-coding-system} sets this variable so that your
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
584 keyboard input gets translated into the character sets that the buffer
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
585 is likely to contain.
51990
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
586 @end defvar
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
587
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
588 @node Coding Systems
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
589 @section Coding Systems
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
590
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
591 @cindex coding system
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
592 When Emacs reads or writes a file, and when Emacs sends text to a
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
593 subprocess or receives text from a subprocess, it normally performs
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
594 character code conversion and end-of-line conversion as specified
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
595 by a particular @dfn{coding system}.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
596
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
597 How to define a coding system is an arcane matter, and is not
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
598 documented here.
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
599
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
600 @menu
28635
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
601 * Coding System Basics:: Basic concepts.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
602 * Encoding and I/O:: How file I/O functions handle coding systems.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
603 * Lisp and Coding Systems:: Functions to operate on coding system names.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
604 * User-Chosen Coding Systems:: Asking the user to choose a coding system.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
605 * Default Coding Systems:: Controlling the default choices.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
606 * Specifying Coding Systems:: Requesting a particular coding system
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
607 for a single file operation.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
608 * Explicit Encoding:: Encoding or decoding text without doing I/O.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
609 * Terminal I/O Encoding:: Use of encoding for terminal I/O.
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
610 * MS-DOS File Types:: How DOS "text" and "binary" files
cda2b6ed6aec *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27374
diff changeset
611 relate to coding systems.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
612 @end menu
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
613
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
614 @node Coding System Basics
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
615 @subsection Basic Concepts of Coding Systems
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
616
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
617 @cindex character code conversion
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
618 @dfn{Character code conversion} involves conversion between the encoding
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
619 used inside Emacs and some other encoding. Emacs supports many
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
620 different encodings, in that it can convert to and from them. For
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
621 example, it can convert text to or from encodings such as Latin 1, Latin
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
622 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
623 cases, Emacs supports several alternative encodings for the same
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
624 characters; for example, there are three coding systems for the Cyrillic
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
625 (Russian) alphabet: ISO, Alternativnyj, and KOI8.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
626
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
627 Most coding systems specify a particular character code for
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
628 conversion, but some of them leave the choice unspecified---to be chosen
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
629 heuristically for each file, based on the data.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
630
61233
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
631 In general, a coding system doesn't guarantee roundtrip identity:
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
632 decoding a byte sequence using a coding system, then encoding the
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
633 resulting text in the same coding system, can produce a different byte
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
634 sequence. However, the following coding systems do guarantee that the
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
635 byte sequence will be the same as what you originally decoded:
61185
447c3b4db32f (Coding System Basics): Describe about rondtrip
Kenichi Handa <handa@m17n.org>
parents: 60677
diff changeset
636
447c3b4db32f (Coding System Basics): Describe about rondtrip
Kenichi Handa <handa@m17n.org>
parents: 60677
diff changeset
637 @quotation
447c3b4db32f (Coding System Basics): Describe about rondtrip
Kenichi Handa <handa@m17n.org>
parents: 60677
diff changeset
638 chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
447c3b4db32f (Coding System Basics): Describe about rondtrip
Kenichi Handa <handa@m17n.org>
parents: 60677
diff changeset
639 greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
447c3b4db32f (Coding System Basics): Describe about rondtrip
Kenichi Handa <handa@m17n.org>
parents: 60677
diff changeset
640 iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
447c3b4db32f (Coding System Basics): Describe about rondtrip
Kenichi Handa <handa@m17n.org>
parents: 60677
diff changeset
641 japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
447c3b4db32f (Coding System Basics): Describe about rondtrip
Kenichi Handa <handa@m17n.org>
parents: 60677
diff changeset
642 @end quotation
447c3b4db32f (Coding System Basics): Describe about rondtrip
Kenichi Handa <handa@m17n.org>
parents: 60677
diff changeset
643
61233
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
644 Encoding buffer text and then decoding the result can also fail to
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
645 reproduce the original text. For instance, if you encode Latin-2
61218
59673cc65537 (Coding System Basics): Clarify previous change.
Richard M. Stallman <rms@gnu.org>
parents: 61185
diff changeset
646 characters with @code{utf-8} and decode the result using the same
59673cc65537 (Coding System Basics): Clarify previous change.
Richard M. Stallman <rms@gnu.org>
parents: 61185
diff changeset
647 coding system, you'll get Unicode characters (of charset
61233
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
648 @code{mule-unicode-0100-24ff}). If you encode Unicode characters with
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
649 @code{iso-latin-2} and decode the result with the same coding system,
5c7a0c8de2df (Coding System Basics): Another cleanup.
Richard M. Stallman <rms@gnu.org>
parents: 61218
diff changeset
650 you'll get Latin-2 characters.
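
  As a minimal sketch of the round-trip guarantee listed above, the
following form decodes a one-byte sequence with @code{iso-latin-1},
which is on that list, re-encodes the result, and compares it with the
original bytes.  The byte value is an arbitrary choice made for
illustration; the comparison is expected to succeed.

@example
@group
(let ((bytes "\351"))              ; arbitrary single byte, octal 351
  (string= bytes
           (encode-coding-string
            (decode-coding-string bytes 'iso-latin-1)
            'iso-latin-1)))
@end group
@end example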
61185
447c3b4db32f (Coding System Basics): Describe about rondtrip
Kenichi Handa <handa@m17n.org>
parents: 60677
diff changeset
651
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
652 @cindex end of line conversion
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
653 @dfn{End of line conversion} handles three different conventions used
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
654 on various systems for representing end of line in files. The Unix
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
655 convention is to use the linefeed character (also called newline). The
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
656 DOS convention is to use a carriage-return and a linefeed at the end of
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
657 a line. The Mac convention is to use just carriage-return.
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
658
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
659 @cindex base coding system
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
660 @cindex variant coding system
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
661 @dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
662 conversion unspecified, to be chosen based on the data. @dfn{Variant
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
663 coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
664 @code{latin-1-mac} specify the end-of-line conversion explicitly as
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
665 well. Most base coding systems have three corresponding variants whose
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
666 names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
667
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
668 The coding system @code{raw-text} is special in that it prevents
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
669 character code conversion, and causes the buffer visited with that
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
670 coding system to be a unibyte buffer. It does not specify the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
671 end-of-line conversion, allowing that to be determined as usual by the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
672 data, and has the usual three variants which specify the end-of-line
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
673 conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
674 it specifies no conversion of either character codes or end-of-line.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
675
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
676 The coding system @code{emacs-mule} specifies that the data is
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
677 represented in the internal Emacs encoding. This is like
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
678 @code{raw-text} in that no code conversion happens, but different in
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
679 that the result is multibyte data.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
680
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
681 @defun coding-system-get coding-system property
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
682 This function returns the specified property of the coding system
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
683 @var{coding-system}. Most coding system properties exist for internal
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
684 purposes, but one that you might find useful is @code{mime-charset}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
685 That property's value is the name used in MIME for the character coding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
686 which this coding system can read and write. Examples:
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
687
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
688 @example
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
689 (coding-system-get 'iso-latin-1 'mime-charset)
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
690 @result{} iso-8859-1
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
691 (coding-system-get 'iso-2022-cn 'mime-charset)
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
692 @result{} iso-2022-cn
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
693 (coding-system-get 'cyrillic-koi8 'mime-charset)
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
694 @result{} koi8-r
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
695 @end example
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
696
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
697 The value of the @code{mime-charset} property is also defined
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
698 as an alias for the coding system.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
699 @end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
700
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
701 @node Encoding and I/O
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
702 @subsection Encoding and I/O
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
703
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
704 The principal purpose of coding systems is for use in reading and
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
705 writing files. The function @code{insert-file-contents} uses
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
706 a coding system for decoding the file data, and @code{write-region}
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
707 uses one to encode the buffer contents.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
708
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
709 You can specify the coding system to use either explicitly
60501
ac9848689bc2 (Text Representations): Clarify position-bytes.
Richard M. Stallman <rms@gnu.org>
parents: 54036
diff changeset
710 (@pxref{Specifying Coding Systems}), or implicitly using a default
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
711 mechanism (@pxref{Default Coding Systems}). But these methods may not
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
712 completely specify what to do. For example, they may choose a coding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
713 system such as @code{undecided} which leaves the character code
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
714 conversion to be determined from the data. In these cases, the I/O
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
715 operation finishes the job of choosing a coding system. Very often
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
716 you will want to find out afterwards which coding system was chosen.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
717
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
718 @defvar buffer-file-coding-system
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
719 This variable records the coding system that was used for visiting the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
720 current buffer. It is used for saving the buffer, and for writing part
43632
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
721 of the buffer with @code{write-region}. If the text to be written
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
722 cannot be safely encoded using the coding system specified by this
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
723 variable, these operations select an alternative encoding by calling
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
724 the function @code{select-safe-coding-system} (@pxref{User-Chosen
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
725 Coding Systems}). If selecting a different encoding requires asking
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
726 the user to specify a coding system, @code{buffer-file-coding-system}
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
727 is updated to the newly selected coding system.
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
728
43632
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
729 @code{buffer-file-coding-system} does @emph{not} affect sending text
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
730 to a subprocess.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
731 @end defvar
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
732
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
733 @defvar save-buffer-coding-system
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
734 This variable specifies the coding system for saving the buffer (by
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
735 overriding @code{buffer-file-coding-system}). Note that it is not used
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
736 for @code{write-region}.
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
737
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
738 When a command to save the buffer starts out to use
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
739 @code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
740 and that coding system cannot handle
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
741 the actual text in the buffer, the command asks the user to choose
43632
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
742 another coding system (by calling @code{select-safe-coding-system}).
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
743 After that happens, the command also updates
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
744 @code{buffer-file-coding-system} to represent the coding system that
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
745 the user specified.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
746 @end defvar
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
747
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
748 @defvar last-coding-system-used
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
749 I/O operations for files and subprocesses set this variable to the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
750 coding system name that was used. The explicit encoding and decoding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
751 functions (@pxref{Explicit Encoding}) set it too.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
752
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
753 @strong{Warning:} Since receiving subprocess output sets this variable,
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
754 it can change whenever Emacs waits; therefore, you should copy the
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
755 value shortly after the function call that stores the value you are
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
756 interested in.
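
One way to follow this advice is sketched below; it copies the value
into a local variable immediately after the call that sets it.  The
file name is hypothetical and stands for any readable file.

@example
@group
(let (coding)
  (with-temp-buffer
    (insert-file-contents "~/example.txt") ; hypothetical file name
    ;; Copy the value right away, before Emacs has a chance to wait.
    (setq coding last-coding-system-used))
  coding)
@end group
@end example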
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
757 @end defvar
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
758
23110
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
759 The variable @code{selection-coding-system} specifies how to encode
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
760 selections for the window system. @xref{Window System Selections}.
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
761
53431
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
762 @defvar file-name-coding-system
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
763 The variable @code{file-name-coding-system} specifies the coding
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
764 system to use for encoding file names. Emacs encodes file names using
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
765 that coding system for all file operations. If
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
766 @code{file-name-coding-system} is @code{nil}, Emacs uses a default
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
767 coding system determined by the selected language environment. In the
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
768 default language environment, any non-@acronym{ASCII} characters in
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
769 file names are not encoded specially; they appear in the file system
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
770 using the internal Emacs representation.
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
771 @end defvar
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
772
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
773 @strong{Warning:} if you change @code{file-name-coding-system} (or
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
774 the language environment) in the middle of an Emacs session, problems
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
775 can result if you have already visited files whose names were encoded
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
776 using the earlier coding system and are handled differently under the
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
777 new coding system. If you try to save one of these buffers under the
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
778 visited file name, saving may use the wrong file name, or it may get
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
779 an error. If such a problem happens, use @kbd{C-x C-w} to specify a
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
780 new file name for that buffer.
3addbe38d8a6 (Converting Representations):
Richard M. Stallman <rms@gnu.org>
parents: 53302
diff changeset
781
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
782 @node Lisp and Coding Systems
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
783 @subsection Coding Systems in Lisp
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
784
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
785 Here are the Lisp facilities for working with coding systems:
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
786
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
787 @defun coding-system-list &optional base-only
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
788 This function returns a list of all coding system names (symbols). If
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
789 @var{base-only} is non-@code{nil}, the value includes only the
29265
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
790 base coding systems. Otherwise, it includes alias and variant coding
69f20c18d6eb *** empty log message ***
Kenichi Handa <handa@m17n.org>
parents: 28900
diff changeset
791 systems as well.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
792 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
793
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
794 @defun coding-system-p object
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
795 This function returns @code{t} if @var{object} is a coding system
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
796 name or is @code{nil}.
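
For instance, assuming that no coding system or alias named
@code{no-such-coding-system} has been defined (the name is made up for
this illustration):

@example
(coding-system-p 'iso-latin-1)
     @result{} t
(coding-system-p nil)
     @result{} t
(coding-system-p 'no-such-coding-system)
     @result{} nil
@end example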
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
797 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
798
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
799 @defun check-coding-system coding-system
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
800 This function checks the validity of @var{coding-system}.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
801 If that is valid, it returns @var{coding-system}.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
802 Otherwise it signals an error with condition @code{coding-system-error}.
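
A caller that prefers to handle an invalid name itself can catch the
error, as in this sketch.  The symbol @code{no-such-coding-system} is a
made-up name that is assumed not to be defined.

@example
@group
(condition-case err
    (check-coding-system 'no-such-coding-system)
  (coding-system-error
   ;; The error data holds the offending coding system name.
   (message "Invalid coding system: %S" (cdr err))))
@end group
@end example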
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
803 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
804
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
805 @defun coding-system-change-eol-conversion coding-system eol-type
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
806 This function returns a coding system which is like @var{coding-system}
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
807 except for its eol conversion, which is specified by @var{eol-type}.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
808 @var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
809 @code{nil}. If it is @code{nil}, the returned coding system determines
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
810 the end-of-line conversion from the data.
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
811
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
812 @var{eol-type} may also be 0, 1 or 2, standing for @code{unix},
53302
ad4360363d82 Remove trailing whitespace
Luc Teirlinck <teirllm@auburn.edu>
parents: 53291
diff changeset
813 @code{dos} and @code{mac}, respectively.
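
As a sketch, each of the following two calls asks for the DOS
end-of-line variant of @code{latin-1}, first by symbol and then by
number; each is expected to return a variant such as
@code{latin-1-dos}.

@example
@group
;; Request the DOS end-of-line convention by name.
(coding-system-change-eol-conversion 'latin-1 'dos)
;; The same request, using the numeric form of the eol type.
(coding-system-change-eol-conversion 'latin-1 1)
@end group
@end example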
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
814 @end defun
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
815
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
816 @defun coding-system-change-text-conversion eol-coding text-coding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
817 This function returns a coding system which uses the end-of-line
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
818 conversion of @var{eol-coding}, and the text conversion of
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
819 @var{text-coding}. If @var{text-coding} is @code{nil}, it returns
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
820 @code{undecided}, or one of its variants according to @var{eol-coding}.
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
821 @end defun
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
822
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
823 @defun find-coding-systems-region from to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
824 This function returns a list of coding systems that could be used to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
825 encode the text between @var{from} and @var{to}. All coding systems in
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
826 the list can safely encode any multibyte characters in that portion of
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
827 the text.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
828
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
829 If the text contains no multibyte characters, the function returns the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
830 list @code{(undecided)}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
831 @end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
832
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
833 @defun find-coding-systems-string string
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
834 This function returns a list of coding systems that could be used to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
835 encode the text of @var{string}. All coding systems in the list can
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
836 safely encode any multibyte characters in @var{string}. If the text
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
837 contains no multibyte characters, this returns the list
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
838 @code{(undecided)}.
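
For example, a string that contains only @acronym{ASCII} characters has
no multibyte characters, so the call below illustrates the second case
just described:

@example
(find-coding-systems-string "hello")
     @result{} (undecided)
@end example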
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
839 @end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
840
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
841 @defun find-coding-systems-for-charsets charsets
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
842 This function returns a list of coding systems that could be used to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
843 encode all the character sets in the list @var{charsets}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
844 @end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
845
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
846 @defun detect-coding-region start end &optional highest
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
847 This function chooses a plausible coding system for decoding the text
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
848 from @var{start} to @var{end}. This text should be a byte sequence
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
849 (@pxref{Explicit Encoding}).
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
850
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
851 Normally this function returns a list of coding systems that could
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
852 handle decoding the text that was scanned. They are listed in order of
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
853 decreasing priority. But if @var{highest} is non-@code{nil}, then the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
854 return value is just one coding system, the one that is highest in
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
855 priority.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
856
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
857 If the region contains only @acronym{ASCII} characters, the value
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
858 is @code{undecided} or @code{(undecided)}, or a variant specifying
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
859 end-of-line conversion, if that can be deduced from the text.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
860 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
861
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
862 @defun detect-coding-string string &optional highest
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
863 This function is like @code{detect-coding-region} except that it
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
864 operates on the contents of @var{string} instead of bytes in the buffer.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
865 @end defun
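
The following sketch illustrates the effect of @var{highest}; the
sample string is an assumption made for the example. For text that is
entirely @acronym{ASCII}, the value is expected to be @code{undecided}
or a variant of it, as described above.

@example
;; With HIGHEST omitted, the value is a list of candidate
;; coding systems in order of decreasing priority.
(detect-coding-string "hello")

;; With HIGHEST non-nil, the value is a single coding system.
(detect-coding-string "hello" t)
@end example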
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
866
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
867 @xref{Coding systems for a subprocess,, Process Information}, in
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
868 particular the description of the functions
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
869 @code{process-coding-system} and @code{set-process-coding-system}, for
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
870 how to examine or set the coding systems used for I/O to a subprocess.
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
871
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
872 @node User-Chosen Coding Systems
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
873 @subsection User-Chosen Coding Systems
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
874
43632
faa7540b3866 (Encoding and I/O): Mention select-safe-coding-system in the documentation
Eli Zaretskii <eliz@gnu.org>
parents: 40855
diff changeset
875 @cindex select safe coding system
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
876 @defun select-safe-coding-system from to &optional default-coding-system accept-default-p file
39204
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
877 This function selects a coding system for encoding specified text,
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
878 asking the user to choose if necessary. Normally the specified text
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
879 is the text in the current buffer between @var{from} and @var{to}. If
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
880 @var{from} is a string, the string specifies the text to encode, and
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
881 @var{to} is ignored.
39204
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
882
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
883 If @var{default-coding-system} is non-@code{nil}, that is the first
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
884 coding system to try; if that can handle the text,
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
885 @code{select-safe-coding-system} returns that coding system. It can
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
886 also be a list of coding systems; then the function tries each of them
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
887 one by one. After trying all of them, it next tries the current
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
888 buffer's value of @code{buffer-file-coding-system} (if it is not
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
889 @code{undecided}), then the value of
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
890 @code{default-buffer-file-coding-system} and finally the user's most
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
891 preferred coding system, which the user can set using the command
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
892 @code{prefer-coding-system} (@pxref{Recognize Coding,, Recognizing
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
893 Coding Systems, emacs, The GNU Emacs Manual}).
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
894
39204
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
895 If one of those coding systems can safely encode all the specified
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
896 text, @code{select-safe-coding-system} chooses it and returns it.
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
897 Otherwise, it asks the user to choose from a list of coding systems
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
898 which can encode all the text, and returns the user's choice.
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
899
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
900 @var{default-coding-system} can also be a list whose first element is
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
901 @code{t} and whose other elements are coding systems. Then, if no coding
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
902 system in the list can handle the text, @code{select-safe-coding-system}
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
903 queries the user immediately, without trying any of the three
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
904 alternatives described above.
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
905
39204
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
906 The optional argument @var{accept-default-p}, if non-@code{nil},
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
907 should be a function to determine whether a coding system selected
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
908 without user interaction is acceptable. @code{select-safe-coding-system}
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
909 calls this function with one argument, the base coding system of the
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
910 selected coding system. If @var{accept-default-p} returns @code{nil},
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
911 @code{select-safe-coding-system} rejects the silently selected coding
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
912 system, and asks the user to select a coding system from a list of
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
913 possible candidates.
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
914
39204
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
915 @vindex select-safe-coding-system-accept-default-p
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
916 If the variable @code{select-safe-coding-system-accept-default-p} is
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
917 non-@code{nil}, its value overrides the value of
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
918 @var{accept-default-p}.
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
919
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
920 As a final step, before returning the chosen coding system,
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
921 @code{select-safe-coding-system} checks whether that coding system is
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
922 consistent with what would be selected if the contents of the region
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
923 were read from a file. (If not, this could lead to data corruption in
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
924 a file subsequently re-visited and edited.) Normally,
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
925 @code{select-safe-coding-system} uses @code{buffer-file-name} as the
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
926 file for this purpose, but if @var{file} is non-@code{nil}, it uses
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
927 that file instead (this can be relevant for @code{write-region} and
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
928 similar functions). If it detects an apparent inconsistency,
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
929 @code{select-safe-coding-system} queries the user before selecting the
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
930 coding system.
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
931 @end defun
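
Here is a rough sketch of a call that passes the text as a string. The
sample text and the choice of @code{latin-1} as
@var{default-coding-system} are assumptions for illustration only.
Depending on the checks described above, the function may still query
the user before returning.

@example
;; FROM is a string, so TO is ignored.
;; latin-1 is the first coding system to try.
(select-safe-coding-system "some text to save" nil 'latin-1)
@end example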
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
932
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
933 Here are two functions you can use to let the user specify a coding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
934 system, with completion. @xref{Completion}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
935
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
936 @defun read-coding-system prompt &optional default
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
937 This function reads a coding system using the minibuffer, prompting with
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
938 string @var{prompt}, and returns the coding system name as a symbol. If
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
939 the user enters null input, @var{default} specifies which coding system
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
940 to return. It should be a symbol or a string.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
941 @end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
942
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
943 @defun read-non-nil-coding-system prompt
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
944 This function reads a coding system using the minibuffer, prompting with
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
945 string @var{prompt}, and returns the coding system name as a symbol. If
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
946 the user tries to enter null input, it asks the user to try again.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
947 @xref{Coding Systems}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
948 @end defun
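
For instance, a command might read a coding system like this. The
prompt text and the default of @code{latin-1} are assumptions made for
the sake of the example.

@example
;; Return the symbol the user enters, or latin-1 on null input.
(read-coding-system "Coding system (default latin-1): "
                    'latin-1)
@end example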
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
949
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
950 @node Default Coding Systems
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
951 @subsection Default Coding Systems
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
952
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
953 This section describes variables that specify the default coding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
954 system for certain files or when running certain subprograms, and the
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
955 function that I/O operations use to access them.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
956
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
957 The idea of these variables is that you set them once and for all to the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
958 defaults you want, and then do not change them again. To specify a
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
959 particular coding system for a particular operation in a Lisp program,
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
960 don't change these variables; instead, override them using
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
961 @code{coding-system-for-read} and @code{coding-system-for-write}
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
962 (@pxref{Specifying Coding Systems}).
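
A minimal sketch of such an override follows. The file name is
hypothetical, and @code{latin-1-unix} is only one possible choice of
coding system.

@example
;; Bind the override only around the one operation that needs
;; it, leaving the default variables untouched.
(let ((coding-system-for-read 'latin-1-unix))
  (insert-file-contents "/tmp/example.txt"))
@end example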
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
963
39204
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
964 @defvar auto-coding-regexp-alist
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
965 This variable is an alist of text patterns and corresponding coding
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
966 systems. Each element has the form @code{(@var{regexp}
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
967 . @var{coding-system})}; a file whose first few kilobytes match
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
968 @var{regexp} is decoded with @var{coding-system} when its contents are
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
969 read into a buffer. The settings in this alist take priority over
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
970 @code{coding:} tags in the files and the contents of
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
971 @code{file-coding-system-alist} (see below). The default value is set
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
972 so that Emacs automatically recognizes mail files in Babyl format and
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
973 reads them with no code conversions.
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
974 @end defvar
8f8df4d24f48 (User-Chosen Coding Systems) <select-safe-coding-system>: Document
Eli Zaretskii <eliz@gnu.org>
parents: 35752
diff changeset
975
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
976 @defvar file-coding-system-alist
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
977 This variable is an alist that specifies the coding systems to use for
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
978 reading and writing particular files. Each element has the form
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
979 @code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
980 expression that matches certain file names. The element applies to file
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
981 names that match @var{pattern}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
982
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
983 The @sc{cdr} of the element, @var{coding}, should be either a coding
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
984 system, a cons cell containing two coding systems, or a function name (a
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
985 symbol with a function definition). If @var{coding} is a coding system,
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
986 that coding system is used for both reading the file and writing it. If
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
987 @var{coding} is a cons cell containing two coding systems, its @sc{car}
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
988 specifies the coding system for decoding, and its @sc{cdr} specifies the
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
989 coding system for encoding.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
990
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
991 If @var{coding} is a function name, the function should take one
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
992 argument, a list of all arguments passed to
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
993 @code{find-operation-coding-system}. It must return a coding system
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
994 or a cons cell containing two coding systems. This value has the same
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
995 meaning as described above.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
996 @end defvar
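
As an illustration, the following sketch adds two elements to
@code{file-coding-system-alist}, as a user might do once in an init
file. The file-name patterns are hypothetical, and the coding systems
are arbitrary choices for the example.

@example
;; One coding system: used for both reading and writing.
(setq file-coding-system-alist
      (cons '("\\.l1\\'" . latin-1)
            file-coding-system-alist))

;; A cons cell: decode with latin-1, encode with utf-8.
(setq file-coding-system-alist
      (cons '("\\.mix\\'" . (latin-1 . utf-8))
            file-coding-system-alist))
@end example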
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
997
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
998 @defvar process-coding-system-alist
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
999 This variable is an alist specifying which coding systems to use for a
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1000 subprocess, depending on which program is running in the subprocess. It
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1001 works like @code{file-coding-system-alist}, except that @var{pattern} is
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1002 matched against the program name used to start the subprocess. The coding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1003 system or systems specified in this alist are used to initialize the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1004 coding systems used for I/O to the subprocess, but you can specify
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1005 other coding systems later using @code{set-process-coding-system}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1006 @end defvar
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1007
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1008 @strong{Warning:} Coding systems such as @code{undecided}, which
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1009 determine the coding system from the data, do not work entirely reliably
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
1010 with asynchronous subprocess output. This is because Emacs handles
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1011 asynchronous subprocess output in batches, as it arrives. If the coding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1012 system leaves the character code conversion unspecified, or leaves the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1013 end-of-line conversion unspecified, Emacs must try to detect the proper
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1014 conversion from one batch at a time, and this does not always work.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1015
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1016 Therefore, with an asynchronous subprocess, if at all possible, use a
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1017 coding system which determines both the character code conversion and
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1018 the end-of-line conversion---that is, one like @code{latin-1-unix},
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1019 rather than @code{undecided} or @code{latin-1}.
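
A sketch of following this advice through
@code{process-coding-system-alist} is shown here. The program name
@samp{my-filter} is hypothetical.

@example
;; Use a fully specified coding system in both directions
;; for subprocesses whose program name matches "my-filter".
(setq process-coding-system-alist
      (cons '("my-filter" . (latin-1-unix . latin-1-unix))
            process-coding-system-alist))
@end example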
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1020
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1021 @defvar network-coding-system-alist
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1022 This variable is an alist that specifies the coding system to use for
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1023 network streams. It works much like @code{file-coding-system-alist},
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1024 with the difference that the @var{pattern} in an element may be either a
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1025 port number or a regular expression. If it is a regular expression, it
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1026 is matched against the network service name used to open the network
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1027 stream.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1028 @end defvar
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1029
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1030 @defvar default-process-coding-system
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1031 This variable specifies the coding systems to use for subprocess (and
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1032 network stream) input and output, when nothing else specifies what to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1033 do.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1034
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1035 The value should be a cons cell of the form @code{(@var{input-coding}
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1036 . @var{output-coding})}. Here @var{input-coding} applies to input from
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1037 the subprocess, and @var{output-coding} applies to output to it.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1038 @end defvar
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1039
51990
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1040 @defvar auto-coding-functions
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1041 This variable holds a list of functions that try to determine a
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1042 coding system for a file based on its undecoded contents.
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1043
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1044 Each function in this list should be written to look at text in the
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1045 current buffer, but should not modify it in any way. The buffer will
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1046 contain the undecoded text of parts of the file. Each function should
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1047 take one argument, @var{size}, which tells it how many characters to
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1048 look at, starting from point. If the function succeeds in determining
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1049 a coding system for the file, it should return that coding system.
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1050 Otherwise, it should return @code{nil}.
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1051
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1052 If a file has a @samp{coding:} tag, that takes precedence, so these
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1053 functions won't be called.
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1054 @end defvar
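
The following sketch shows what a function for this list could look
like.  The function name @code{my-charset-header-coding} and the
@samp{X-Charset:} header it looks for are hypothetical, invented only
for this illustration; the point is the calling convention: one
@var{size} argument, no modification of the buffer, and a return value
that is either a coding system or @code{nil}.

@example
(defun my-charset-header-coding (size)
  "Guess a coding system from a hypothetical `X-Charset: NAME' line.
SIZE is how many characters to examine, starting from point."
  (save-excursion
    (let ((case-fold-search t)
          (limit (min (point-max) (+ (point) size))))
      (when (re-search-forward
             "^X-Charset:[ \t]*\\([-a-z0-9_]+\\)" limit t)
        (let ((sym (intern (downcase (match-string 1)))))
          ;; @r{Return the name only if it is a known coding system.}
          (and (coding-system-p sym) sym))))))

(add-to-list 'auto-coding-functions 'my-charset-header-coding)
@end example

The header name is plain @acronym{ASCII}, so the search behaves the same
whether or not the rest of the undecoded text contains other bytes.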
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1055
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1056 @defun find-operation-coding-system operation &rest arguments
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1057 This function returns the coding system to use (by default) for
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1058 performing @var{operation} with @var{arguments}. The value has this
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1059 form:
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1060
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1061 @example
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1062 (@var{decoding-system} . @var{encoding-system})
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1063 @end example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1064
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1065 The first element, @var{decoding-system}, is the coding system to use
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1066 for decoding (in case @var{operation} does decoding), and
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1067 @var{encoding-system} is the coding system for encoding (in case
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1068 @var{operation} does encoding).
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1069
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1070 The argument @var{operation} should be a symbol, one of
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1071 @code{insert-file-contents}, @code{write-region}, @code{call-process},
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1072 @code{call-process-region}, @code{start-process}, or
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1073 @code{open-network-stream}. These are the names of the Emacs I/O primitives
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1074 that can do coding system conversion.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1075
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1076 The remaining arguments should be the same arguments that might be given
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1077 to that I/O primitive. Depending on the primitive, one of those
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1078 arguments is selected as the @dfn{target}. For example, if
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1079 @var{operation} does file I/O, whichever argument specifies the file
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1080 name is the target. For subprocess primitives, the process name is the
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1081 target. For @code{open-network-stream}, the target is the service name
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1082 or port number.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1083
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1084 This function looks up the target in @code{file-coding-system-alist},
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1085 @code{process-coding-system-alist}, or
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1086 @code{network-coding-system-alist}, depending on @var{operation}.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1087 @end defun
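
For instance, the lookup for a file operation can be tried without
touching any file.  This is only a sketch: the @file{.l1} extension and
the file name are invented, and @code{file-coding-system-alist} is bound
temporarily so that only the one illustrative entry is consulted.

@example
(let ((file-coding-system-alist '(("\\.l1\\'" . latin-1))))
  (find-operation-coding-system
   'insert-file-contents "/tmp/notes.l1"))
     @result{} (latin-1 . latin-1)
@end example

Because the alist entry names a single coding system, the same coding
system appears as both @var{decoding-system} and @var{encoding-system}
in the result.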
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1088
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1089 @node Specifying Coding Systems
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1090 @subsection Specifying a Coding System for One Operation
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1091
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1092 You can specify the coding system for a specific operation by binding
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1093 the variables @code{coding-system-for-read} and/or
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1094 @code{coding-system-for-write}.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1095
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1096 @defvar coding-system-for-read
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1097 If this variable is non-@code{nil}, it specifies the coding system to
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1098 use for reading a file, or for input from a synchronous subprocess.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1099
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1100 It also applies to any asynchronous subprocess or network stream, but in
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1101 a different way: the value of @code{coding-system-for-read} when you
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1102 start the subprocess or open the network stream specifies the input
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1103 decoding method for that subprocess or network stream. It remains in
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1104 use for that subprocess or network stream unless and until overridden.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1105
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1106 The right way to use this variable is to bind it with @code{let} for a
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1107 specific I/O operation. Its global value is normally @code{nil}, and
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1108 you should not globally set it to any other value. Here is an example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1109 of the right way to use the variable:
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1110
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1111 @example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1112 ;; @r{Read the file with no character code conversion.}
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
1113 ;; @r{Assume @acronym{CRLF} represents end-of-line.}
54036
9706b0221102 (Translation of Characters): Give examples of use.
Richard M. Stallman <rms@gnu.org>
parents: 53431
diff changeset
1114 (let ((coding-system-for-read 'emacs-mule-dos))
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1115 (insert-file-contents filename))
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1116 @end example
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1117
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1118 When its value is non-@code{nil}, @code{coding-system-for-read} takes
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1119 precedence over all other methods of specifying a coding system to use for
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1120 input, including @code{file-coding-system-alist},
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1121 @code{process-coding-system-alist} and
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1122 @code{network-coding-system-alist}.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1123 @end defvar
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1124
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1125 @defvar coding-system-for-write
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1126 This works much like @code{coding-system-for-read}, except that it
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1127 applies to output rather than input. It affects writing to files,
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
1128 as well as sending output to subprocesses and network connections.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1129
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1130 When a single operation does both input and output, as do
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1131 @code{call-process-region} and @code{start-process}, both
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1132 @code{coding-system-for-read} and @code{coding-system-for-write}
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1133 affect it.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1134 @end defvar
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1135
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1136 @defvar inhibit-eol-conversion
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1137 When this variable is non-@code{nil}, no end-of-line conversion is done,
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1138 no matter which coding system is specified. This applies to all the
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1139 Emacs I/O and subprocess primitives, and to the explicit encoding and
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1140 decoding functions (@pxref{Explicit Encoding}).
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1141 @end defvar
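
The effect on the explicit decoding functions can be seen in a small
experiment such as the one below.  It is a sketch, not a recommended
idiom; the sample string is arbitrary, and @code{latin-1-dos} is chosen
only because it requests @acronym{CRLF} conversion explicitly.

@example
(list
 ;; @r{Normal decoding: each carriage-return linefeed pair becomes a newline.}
 (decode-coding-string "one\r\ntwo\r\n" 'latin-1-dos)
 ;; @r{With the variable bound, the carriage returns are left alone.}
 (let ((inhibit-eol-conversion t))
   (decode-coding-string "one\r\ntwo\r\n" 'latin-1-dos)))
     @result{} ("one\ntwo\n" "one\r\ntwo\r\n")
@end example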
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1142
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1143 @node Explicit Encoding
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1144 @subsection Explicit Encoding and Decoding
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1145 @cindex encoding text
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1146 @cindex decoding text
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1147
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1148 All the operations that transfer text in and out of Emacs have the
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1149 ability to use a coding system to encode or decode the text.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1150 You can also explicitly encode and decode text using the functions
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1151 in this section.
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1152
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1153 The result of encoding, and the input to decoding, are not ordinary
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1154 text. They logically consist of a series of byte values; that is, a
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1155 series of characters whose codes are in the range 0 through 255. In a
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1156 multibyte buffer or string, character codes 128 through 159 are
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1157 represented by multibyte sequences, but this is invisible to Lisp
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1158 programs.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1159
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1160 The usual way to read a file into a buffer as a sequence of bytes, so
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1161 you can decode the contents explicitly, is with
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1162 @code{insert-file-contents-literally} (@pxref{Reading from Files});
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1163 alternatively, specify a non-@code{nil} @var{rawfile} argument when
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1164 visiting a file with @code{find-file-noselect}. These methods result in
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1165 a unibyte buffer.
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
1166
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1167 The usual way to use the byte sequence that results from explicitly
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1168 encoding text is to copy it to a file or process---for example, to write
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1169 it with @code{write-region} (@pxref{Writing to Files}), and suppress
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1170 encoding by binding @code{coding-system-for-write} to
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1171 @code{no-conversion}.
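
The two techniques just described might be combined as in the following
sketch.  The function names @code{my-save-as-latin-1} and
@code{my-load-latin-1} are hypothetical, and Latin-1 is only an example
coding system.

@example
(defun my-save-as-latin-1 (string file)
  "Encode STRING as Latin-1 and write the bytes to FILE unchanged."
  (let ((bytes (encode-coding-string string 'latin-1))
        ;; @r{The text is already encoded, so suppress further encoding.}
        (coding-system-for-write 'no-conversion))
    (write-region bytes nil file)))

(defun my-load-latin-1 (file)
  "Read FILE as a sequence of bytes, then decode them as Latin-1."
  (with-temp-buffer
    (insert-file-contents-literally file)
    (decode-coding-region (point-min) (point-max) 'latin-1)
    (buffer-string)))
@end example

Passing a string as the first argument of @code{write-region} writes
that string instead of buffer text, which avoids the need for a
temporary buffer on the writing side.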
24951
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
1172
7451b1458af1 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 23433
diff changeset
1173 Here are the functions to perform explicit encoding or decoding. The
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1174 encoding functions produce sequences of bytes; the decoding functions
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1175 are meant to operate on sequences of bytes. All of these functions
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1176 discard text properties.
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
1177
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1178 @deffn Command encode-coding-region start end coding-system
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1179 This command encodes the text from @var{start} to @var{end} according
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1180 to coding system @var{coding-system}. The encoded text replaces the
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1181 original text in the buffer. The result of encoding is logically a
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1182 sequence of bytes, but the buffer remains multibyte if it was multibyte
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1183 before.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1184
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1185 This command returns the length of the encoded text.
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1186 @end deffn
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1187
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1188 @defun encode-coding-string string coding-system &optional nocopy
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1189 This function encodes the text in @var{string} according to coding
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1190 system @var{coding-system}. It returns a new string containing the
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1191 encoded text, except when @var{nocopy} is non-@code{nil}, in which
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1192 case the function may return @var{string} itself if the encoding
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1193 operation is trivial. The result of encoding is a unibyte string.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1194 @end defun
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1195
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1196 @deffn Command decode-coding-region start end coding-system
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1197 This command decodes the text from @var{start} to @var{end} according
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1198 to coding system @var{coding-system}. The decoded text replaces the
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1199 original text in the buffer. To make explicit decoding useful, the text
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1200 before decoding ought to be a sequence of byte values, but both
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1201 multibyte and unibyte buffers are acceptable.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1202
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1203 This command returns the length of the decoded text.
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1204 @end deffn
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1205
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1206 @defun decode-coding-string string coding-system &optional nocopy
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1207 This function decodes the text in @var{string} according to coding
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1208 system @var{coding-system}. It returns a new string containing the
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1209 decoded text, except when @var{nocopy} is non-@code{nil}, in which
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1210 case the function may return @var{string} itself if the decoding
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1211 operation is trivial. To make explicit decoding useful, the contents
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1212 of @var{string} ought to be a sequence of byte values, but a multibyte
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1213 string is acceptable.
21006
00022857f529 Initial revision
Richard M. Stallman <rms@gnu.org>
parents:
diff changeset
1214 @end defun
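
A round trip through the two string functions can be sketched as
follows.  The example assumes an Emacs whose internal character codes
are Unicode-based (Emacs 23 or later), so that @code{(string #xe9)}
yields the single character e-acute; in other versions the byte count
may differ.

@example
(let* ((text (string #xe9))   ; @r{one character, e-acute}
       (bytes (encode-coding-string text 'utf-8)))
  (list (multibyte-string-p bytes)   ; @r{the encoded result is unibyte}
        (length bytes)               ; @r{UTF-8 uses two bytes here}
        (equal (decode-coding-string bytes 'utf-8) text)))
     @result{} (nil 2 t)
@end example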
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1215
51990
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1216 @defun decode-coding-inserted-region from to filename &optional visit beg end replace
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1217 This function decodes the text from @var{from} to @var{to} as if
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1218 it were being read from file @var{filename} using @code{insert-file-contents}
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1219 with the rest of the arguments provided.
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1220
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1221 The normal way to use this function is after reading text from a file
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1222 without decoding, if you decide you would rather have decoded it.
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1223 Instead of deleting the text and reading it again, this time with
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1224 decoding, you can call this function.
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1225 @end defun
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1226
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1227 @node Terminal I/O Encoding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1228 @subsection Terminal I/O Encoding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1229
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1230 Emacs can decode keyboard input using a coding system, and encode
23110
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
1231 terminal output. This is useful for terminals that transmit or display
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
1232 text using a particular encoding such as Latin-1. Emacs does not set
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
1233 @code{last-coding-system-used} for encoding or decoding for the
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
1234 terminal.
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1235
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1236 @defun keyboard-coding-system
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1237 This function returns the coding system that is in use for decoding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1238 keyboard input---or @code{nil} if no coding system is to be used.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1239 @end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1240
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1241 @deffn Command set-keyboard-coding-system coding-system
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1242 This command specifies @var{coding-system} as the coding system to
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1243 use for decoding keyboard input. If @var{coding-system} is @code{nil},
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1244 keyboard input is not decoded.
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1245 @end deffn
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1246
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1247 @defun terminal-coding-system
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1248 This function returns the coding system that is in use for encoding
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1249 terminal output---or @code{nil} for no encoding.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1250 @end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1251
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1252 @deffn Command set-terminal-coding-system coding-system
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1253 This command specifies @var{coding-system} as the coding system to use
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1254 for encoding terminal output. If @var{coding-system} is @code{nil},
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1255 terminal output is not encoded.
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1256 @end deffn
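As a minimal sketch of how these four functions fit together, the following Emacs Lisp reports the current settings and defines a helper that switches both directions to Latin-1. The name @code{my-use-latin-1-terminal} is hypothetical, and the choice of @code{latin-1} assumes a terminal that really sends and displays that encoding.

```elisp
;; Report the coding systems now in use; either value may be nil.
(message "keyboard: %S  terminal: %S"
         (keyboard-coding-system)
         (terminal-coding-system))

;; Hypothetical helper, not part of Emacs.
(defun my-use-latin-1-terminal ()
  "Decode keyboard input and encode terminal output as Latin-1."
  (set-keyboard-coding-system 'latin-1)
  (set-terminal-coding-system 'latin-1))
```

Calling @code{(my-use-latin-1-terminal)} from an init file would apply both settings at startup; passing @code{nil} to either command instead turns the corresponding conversion off.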
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1257
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1258 @node MS-DOS File Types
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1259 @subsection MS-DOS File Types
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1260 @cindex DOS file types
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1261 @cindex MS-DOS file types
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1262 @cindex Windows file types
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1263 @cindex file types on MS-DOS and Windows
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1264 @cindex text files and binary files
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1265 @cindex binary files and text files
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1266
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1267 On MS-DOS and Microsoft Windows, Emacs guesses the appropriate
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1268 end-of-line conversion for a file by looking at the file's name. This
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1269 feature classifies files as @dfn{text files} and @dfn{binary files}. By
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1270 ``binary file'' we mean a file of literal byte values that are not
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1271 necessarily meant to be characters; Emacs does no end-of-line conversion
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1272 and no character code conversion for them. On the other hand, the bytes
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1273 in a text file are intended to represent characters; when you create a
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1274 new file whose name implies that it is a text file, Emacs uses DOS
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1275 end-of-line conversion.
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1276
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1277 @defvar buffer-file-type
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1278 This variable, automatically buffer-local in each buffer, records the
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1279 file type of the buffer's visited file. When a buffer does not specify
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1280 a coding system with @code{buffer-file-coding-system}, this variable is
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1281 used to determine which coding system to use when writing the contents
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1282 of the buffer. It should be @code{nil} for text, @code{t} for binary.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1283 If it is @code{t}, the coding system is @code{no-conversion}.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1284 Otherwise, @code{undecided-dos} is used.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1285
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1286 Normally this variable is set by visiting a file; it is set to
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1287 @code{nil} if the file was visited without any actual conversion.
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1288 @end defvar
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1289
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1290 @defopt file-name-buffer-file-type-alist
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1291 This variable holds an alist for recognizing text and binary files.
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1292 Each element has the form @code{(@var{regexp} . @var{type})}, where
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1293 @var{regexp} is matched against the file name, and @var{type} may be
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1294 @code{nil} for text, @code{t} for binary, or a function to call to
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1295 compute which. If it is a function, then it is called with a single
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1296 argument (the file name) and should return @code{t} or @code{nil}.
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1297
25751
467b88fab665 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 24952
diff changeset
1298 When running on MS-DOS or MS-Windows, Emacs checks this alist to decide
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1299 which coding system to use when reading a file. For a text file,
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1300 @code{undecided-dos} is used. For a binary file, @code{no-conversion}
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1301 is used.
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1302
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1303 If no element in this alist matches a given file name, then
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1304 @code{default-buffer-file-type} says how to treat the file.
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1305 @end defopt
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1306
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1307 @defopt default-buffer-file-type
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1308 This variable says how to handle files for which
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1309 @code{file-name-buffer-file-type-alist} says nothing about the type.
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1310
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1311 If this variable is non-@code{nil}, then these files are treated as
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1312 binary: the coding system @code{no-conversion} is used. Otherwise,
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1313 nothing special is done for them---the coding system is deduced solely
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1314 from the file contents, in the usual Emacs fashion.
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1315 @end defopt
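The sketch below shows one way the two kinds of alist entry might be added: a constant @code{t} for a binary extension, and a function that decides from the file name. The helper @code{my-raw-file-binary-p}, the @file{.dat} and @file{.dta} extensions, and the @file{raw} directory convention are all assumptions made for illustration. The @code{boundp} guard is a precaution for builds that do not define the variable.

```elisp
;; Hypothetical predicate: files below a directory named "raw" are binary.
(defun my-raw-file-binary-p (file-name)
  "Return t if FILE-NAME should be treated as binary, nil for text."
  (and (string-match "/raw/" file-name) t))

;; Only touch the alist where it exists (MS-DOS and MS-Windows builds).
(when (boundp 'file-name-buffer-file-type-alist)
  ;; Constant type: every .dat file is binary.
  (add-to-list 'file-name-buffer-file-type-alist '("\\.dat\\'" . t))
  ;; Function type: called with the file name, returns t or nil.
  (add-to-list 'file-name-buffer-file-type-alist
               '("\\.dta\\'" . my-raw-file-binary-p)))
```

With these entries in place, a @file{.dta} file would be read with @code{no-conversion} only when the predicate returns @code{t}; otherwise it is handled as text.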
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1316
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1317 @node Input Methods
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1318 @section Input Methods
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1319 @cindex input methods
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1320
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
1321 @dfn{Input methods} provide convenient ways of entering non-@acronym{ASCII}
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1322 characters from the keyboard. Unlike coding systems, which translate
52978
1a5c50faf357 Replace @sc{foo} with @acronym{FOO}.
Eli Zaretskii <eliz@gnu.org>
parents: 52788
diff changeset
1323 non-@acronym{ASCII} characters to and from encodings meant to be read by
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1324 programs, input methods provide human-friendly commands. (@xref{Input
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1325 Methods,,, emacs, The GNU Emacs Manual}, for information on how users
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1326 use input methods to enter text.) How to define input methods is not
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1327 yet documented in this manual, but here we describe how to use them.
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1328
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1329 Each input method has a name, which is currently a string;
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1330 in the future, symbols may also be usable as input method names.
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1331
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1332 @defvar current-input-method
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1333 This variable holds the name of the input method now active in the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1334 current buffer. (It automatically becomes local in each buffer when set
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1335 in any fashion.) It is @code{nil} if no input method is active in the
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1336 buffer now.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1337 @end defvar
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1338
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1339 @defopt default-input-method
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1340 This variable holds the default input method for commands that choose an
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1341 input method. Unlike @code{current-input-method}, this variable is
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1342 normally global.
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1343 @end defopt
21682
90da2489c498 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21006
diff changeset
1344
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1345 @deffn Command set-input-method input-method
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1346 This command activates input method @var{input-method} for the current
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1347 buffer. It also sets @code{default-input-method} to @var{input-method}.
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1348 If @var{input-method} is @code{nil}, this command deactivates any input
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1349 method for the current buffer.
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1350 @end deffn
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1351
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1352 @defun read-input-method-name prompt &optional default inhibit-null
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1353 This function reads an input method name with the minibuffer, prompting
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1354 with @var{prompt}. If @var{default} is non-@code{nil}, that is returned
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1355 when the user enters empty input. However, if
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1356 @var{inhibit-null} is non-@code{nil}, empty input signals an error.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1357
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1358 The returned value is a string.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1359 @end defun
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1360
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1361 @defvar input-method-alist
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1362 This variable defines all the supported input methods.
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1363 Each element defines one input method, and should have the form:
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1364
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1365 @example
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
1366 (@var{input-method} @var{language-env} @var{activate-func}
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
1367 @var{title} @var{description} @var{args}...)
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1368 @end example
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1369
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
1370 Here @var{input-method} is the input method name, a string;
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
1371 @var{language-env} is another string, the name of the language
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
1372 environment this input method is recommended for. (That serves only for
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
1373 documentation purposes.)
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1374
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1375 @var{activate-func} is a function to call to activate this method. The
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1376 @var{args}, if any, are passed as arguments to @var{activate-func}. All
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1377 told, the arguments to @var{activate-func} are @var{input-method} and
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1378 the @var{args}.
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1379
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1380 @var{title} is a string to display in the mode line while this method is
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1381 active. @var{description} is a string describing this method and what
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1382 it is good for.
22252
40089afa2b1d *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22138
diff changeset
1383 @end defvar
22138
d4ac295a98b3 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 21682
diff changeset
1384
23110
0d84817a4973 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 22267
diff changeset
1385 The fundamental interface to input methods is through the
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1386 variable @code{input-method-function}. @xref{Reading One Event},
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1387 and @ref{Invoking the Input Method}.
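A short sketch of using these facilities from Lisp follows. Both helper names are hypothetical. The first reads the @var{title} slot of an @code{input-method-alist} element, relying on the element layout described above; the second activates a method in a temporary buffer and binds @code{default-input-method} around the call so that the user's default is left unchanged.

```elisp
;; Hypothetical helper: TITLE is the fourth element of each alist entry.
(defun my-input-method-title (name)
  "Return the mode-line title registered for input method NAME, or nil."
  (nth 3 (assoc name input-method-alist)))

;; Hypothetical helper: try a method without disturbing the global default.
(defun my-try-input-method (name)
  "Activate input method NAME in a temporary buffer; return the active name.
Return nil without activating anything if NAME is not registered."
  (when (assoc name input-method-alist)
    (let ((default-input-method default-input-method))
      (with-temp-buffer
        (set-input-method name)
        current-input-method))))
```

For instance, @code{(my-input-method-title "latin-1-prefix")} returns that method's title if a method of that name is registered in the running Emacs, and @code{nil} otherwise.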
26696
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1388
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1389 @node Locales
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1390 @section Locales
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1391 @cindex locale
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1392
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1393 POSIX defines a concept of ``locales'' which control which language
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1394 to use in language-related features. These Emacs variables control
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1395 how Emacs interacts with these features.
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1396
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1397 @defvar locale-coding-system
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1398 @tindex locale-coding-system
43634
f55024232f5d (Locales): locale-coding-system is used for decoding keyboard input on X.
Eli Zaretskii <eliz@gnu.org>
parents: 43632
diff changeset
1399 @cindex keyboard input decoding on X
26696
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1400 This variable specifies the coding system to use for decoding system
43634
f55024232f5d (Locales): locale-coding-system is used for decoding keyboard input on X.
Eli Zaretskii <eliz@gnu.org>
parents: 43632
diff changeset
1401 error messages and---on the X Window System only---keyboard input, for
f55024232f5d (Locales): locale-coding-system is used for decoding keyboard input on X.
Eli Zaretskii <eliz@gnu.org>
parents: 43632
diff changeset
1402 encoding the format argument to @code{format-time-string}, and for
f55024232f5d (Locales): locale-coding-system is used for decoding keyboard input on X.
Eli Zaretskii <eliz@gnu.org>
parents: 43632
diff changeset
1403 decoding the return value of @code{format-time-string}.
26696
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1404 @end defvar
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1405
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1406 @defvar system-messages-locale
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1407 @tindex system-messages-locale
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1408 This variable specifies the locale to use for generating system error
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1409 messages. Changing the locale can cause messages to come out in a
27362
ce0641caaa76 *** empty log message ***
Richard M. Stallman <rms@gnu.org>
parents: 27189
diff changeset
1410 different language or in a different orthography. If the variable is
26696
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1411 @code{nil}, the locale is specified by environment variables in the
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1412 usual POSIX fashion.
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1413 @end defvar
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1414
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1415 @defvar system-time-locale
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1416 @tindex system-time-locale
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1417 This variable specifies the locale to use for formatting time values.
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1418 Changing the locale can cause time values to be formatted according to the
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1419 conventions of a different language. If the variable is @code{nil}, the
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1420 locale is specified by environment variables in the usual POSIX fashion.
ef5e7bbe6f19 Current version from /gd/gnu/elisp.
Dave Love <fx@gnu.org>
parents: 25751
diff changeset
1421 @end defvar
28877
607e317d50b5 *** empty log message ***
Gerd Moellmann <gerd@gnu.org>
parents: 28635
diff changeset
1422
51990
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1423 @defun locale-info item
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1424 This function returns locale data @var{item} for the current POSIX
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1425 locale, if available. @var{item} should be one of these symbols:
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1426
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1427 @table @code
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1428 @item codeset
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1429 Return the character set as a string (locale item @code{CODESET}).
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1430
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1431 @item days
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1432 Return a 7-element vector of day names (locale items
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1433 @code{DAY_1} through @code{DAY_7}).
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1434
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1435 @item months
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1436 Return a 12-element vector of month names (locale items @code{MON_1}
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1437 through @code{MON_12}).
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1438
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1439 @item paper
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1440 Return a list @code{(@var{width} @var{height})} for the default paper
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1441 size measured in millimeters (locale items @code{PAPER_WIDTH} and
51990
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1442 @code{PAPER_HEIGHT}).
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1443 @end table
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1444
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1445 If the system can't provide the requested information, or if
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1446 @var{item} is not one of those symbols, the value is @code{nil}. All
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1447 strings in the return value are decoded using
53291
04d2bf306bd2 Various small changes in addition to the following.
Luc Teirlinck <teirllm@auburn.edu>
parents: 52978
diff changeset
1448 @code{locale-coding-system}. @xref{Locales,,, libc, The GNU Libc Manual},
51990
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1449 for more information about locales and locale items.
3e55792cc7f1 (Converting Representations): Add string-to-multibyte.
Richard M. Stallman <rms@gnu.org>
parents: 51703
diff changeset
1450 @end defun
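Because @code{locale-info} returns @code{nil} whenever the data is unavailable, callers should check the result before using it. The following sketch does so; the helper name @code{my-locale-day-names} is hypothetical.

```elisp
;; Hypothetical helper: convert the 7-element vector to a list, if present.
(defun my-locale-day-names ()
  "Return the locale's day names as a list, or nil if unavailable."
  (let ((days (locale-info 'days)))
    (and days (append days nil))))

;; Report the codeset, falling back to a placeholder when it is unknown.
(message "codeset: %s" (or (locale-info 'codeset) "unknown"))
```

An unrecognized @var{item}, as in @code{(locale-info 'no-such-item)}, yields @code{nil} in the same way as data the system cannot provide.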
52401
695cf19ef79e Add arch taglines
Miles Bader <miles@gnu.org>
parents: 51990
diff changeset
1451
695cf19ef79e Add arch taglines
Miles Bader <miles@gnu.org>
parents: 51990
diff changeset
1452 @ignore
695cf19ef79e Add arch taglines
Miles Bader <miles@gnu.org>
parents: 51990
diff changeset
1453 arch-tag: be705bf8-941b-4c35-84fc-ad7d20ddb7cb
695cf19ef79e Add arch taglines
Miles Bader <miles@gnu.org>
parents: 51990
diff changeset
1454 @end ignore