annotate man/mule.texi @ 38212:6b14cc47a4f2
Major rewrite. Sections Tags, Emerge, Change Log and Authors
moved to maintaining.texi. Some sections reordered.
Node Misc for Programs moved to just before the language-specific sections.
New node Defuns contains an intro plus the old
Defuns node (now renamed Moving by Defuns)
as well as Imenu, Which Function, and a node
Left Margin Paren to explain the convention about this.
New node Parentheses now documents M-x check-parens.
It contains subnodes Expressions, Moving by Parens, and Matching.
Expressions and Moving by Parens contain the material
formerly in Lists and List Commands, but divided up differently.
The section Balanced Editing has been deleted.
Most of the C indentation customization (all except c-set-style),
has been replaced with a reference to the C Modes manual.
Documentation now is divided into three subsections.
Some rewrites in the Program Indent section about
C-u TAB and C-M-q.
author | Richard M. Stallman <rms@gnu.org> |
---|---|
date | Tue, 26 Jun 2001 13:43:32 +0000 |
parents | 4eaf5126c0e5 |
children | 6bee7ffac2cd |
rev | line source |
---|---|
25829 | 1 @c This is part of the Emacs manual. |
37766 | 2 @c Copyright (C) 1997, 1999, 2000, 2001 Free Software Foundation, Inc. |
25829 | 3 @c See file emacs.texi for copying conditions. |
4 @node International, Major Modes, Frames, Top | |
5 @chapter International Character Set Support | |
6 @cindex MULE | |
7 @cindex international scripts | |
8 @cindex multibyte characters | |
9 @cindex encoding of characters | |
10 | |
31067 | 11 @cindex Celtic |
25829 | 12 @cindex Chinese |
26140 | 13 @cindex Cyrillic |
31067 | 14 @cindex Czech |
25829 | 15 @cindex Devanagari |
16 @cindex Hindi | |
17 @cindex Marathi | |
26140 | 18 @cindex Ethiopic |
31067 | 19 @cindex German |
25829 | 20 @cindex Greek |
26140 | 21 @cindex Hebrew |
25829 | 22 @cindex IPA |
23 @cindex Japanese | |
24 @cindex Korean | |
25 @cindex Lao | |
31067 | 26 @cindex Latin |
27 @cindex Polish |
28 @cindex Romanian |
29 @cindex Slovak |
30 @cindex Slovenian |
25829 | 31 @cindex Thai |
32 @cindex Tibetan | |
31067 | 33 @cindex Turkish |
25829 | 34 @cindex Vietnamese |
35163 | 35 @cindex Dutch |
36 @cindex Spanish | |
25829 | 37 Emacs supports a wide variety of international character sets, |
38 including European variants of the Latin alphabet, as well as Chinese, | |
26140 | 39 Cyrillic, Devanagari (Hindi and Marathi), Ethiopic, Greek, Hebrew, IPA, |
40 Japanese, Korean, Lao, Thai, Tibetan, and Vietnamese scripts. These features |
25829 | 41 have been merged from the modified version of Emacs known as MULE (for |
42 ``MULti-lingual Enhancement to GNU Emacs''). | |
43 | |
32386 | 44 Emacs also supports various encodings of these characters used by |
36170 | 45 other internationalized software, such as word processors and mailers. |
32386 | 46 |
37584 | 47 Emacs allows editing text with international characters by supporting |
48 all the related activities: |
49 |
50 @itemize @bullet |
51 @item |
52 You can visit files with non-ASCII characters, save non-ASCII text, and |
53 pass non-ASCII text between Emacs and programs it invokes (such as |
54 compilers, spell-checkers, and mailers). Setting your language |
55 environment (@pxref{Language Environments}) takes care of setting up the |
56 coding systems and other options for a specific language or culture. |
57 Alternatively, you can specify how Emacs should encode or decode text |
58 for each command; see @ref{Specify Coding}. |
59 |
60 @item |
61 You can display non-ASCII characters of the various scripts. |
62 This works by using appropriate fonts on X and similar graphics |
63 displays (@pxref{Defining Fontsets}), and by sending special codes to |
64 text-only displays (@pxref{Specify Coding}). If some characters are |
65 displayed incorrectly, refer to @ref{Undisplayable Characters}, which |
66 describes possible problems and explains how to solve them. |
67 |
68 @item |
69 You can insert non-ASCII characters or search for them. To do that, |
70 you can specify an input method (@pxref{Select Input Method}) suitable |
71 for your language, or use the default input method set up when you set |
72 your language environment. (Emacs input methods are part of the Leim |
73 package, which must be installed for you to be able to use them.) If |
74 your keyboard can produce non-ASCII characters, you can select an |
75 appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs |
76 will accept those characters. Latin-1 characters can also be input by |
77 using the @kbd{C-x 8} prefix; see @ref{Single-Byte Character Support, |
78 C-x 8}. |
79 @end itemize |
80 |
81 The rest of this chapter describes these issues in detail. |
82 |
25829 | 83 @menu |
37865 | 84 * International Chars:: Basic concepts of multibyte characters. |
25829 | 85 * Enabling Multibyte:: Controlling whether to use multibyte characters. |
86 * Language Environments:: Setting things up for the language you use. | |
87 * Input Methods:: Entering text characters not on your keyboard. | |
88 * Select Input Method:: Specifying your choice of input methods. | |
89 * Multibyte Conversion:: How single-byte characters convert to multibyte. | |
90 * Coding Systems:: Character set conversion when you read and | |
91 write files, and so on. | |
92 * Recognize Coding:: How Emacs figures out which conversion to use. | |
93 * Specify Coding:: Various ways to choose which conversion to use. | |
94 * Fontsets:: Fontsets are collections of fonts | |
95 that cover the whole spectrum of characters. | |
96 * Defining Fontsets:: Defining a new fontset. | |
33745 | 97 * Undisplayable Characters:: When characters don't display. |
27211 | 98 * Single-Byte Character Support:: |
25829 | 99 You can pick one European character set |
100 to use without multibyte characters. | |
101 @end menu | |
102 | |
37865 | 103 @node International Chars |
25829 | 104 @section Introduction to International Character Sets |
105 | |
31023 | 106 The users of international character sets and scripts have established |
107 many more-or-less standard coding systems for storing files. Emacs |
108 internally uses a single multibyte character encoding, so that it can |
109 intermix characters from all these scripts in a single buffer or string. |
110 This encoding represents each non-ASCII character as a sequence of bytes |
111 in the range 0200 through 0377. Emacs translates between the multibyte |
112 character encoding and various other coding systems when reading and |
113 writing files, when exchanging data with subprocesses, and (in some |
114 cases) in the @kbd{C-q} command (@pxref{Multibyte Conversion}). |
25829 | 115 |
116 @kindex C-h h | |
117 @findex view-hello-file | |
35206 | 118 @cindex undisplayable characters |
36170 | 119 @cindex @samp{?} in display |
25829 | 120 The command @kbd{C-h h} (@code{view-hello-file}) displays the file |
121 @file{etc/HELLO}, which shows how to say ``hello'' in many languages. | |
36170 | 122 This illustrates various scripts. If some characters can't be |
123 displayed on your terminal, they appear as @samp{?} or as hollow boxes |
124 (@pxref{Undisplayable Characters}). |
125 |
126 Keyboards, even in the countries where these character sets are used, |
127 generally don't have keys for all the characters in them. So Emacs |
128 supports various @dfn{input methods}, typically one for each script or |
129 language, to make it convenient to type them. |
130 |
131 @kindex C-x RET |
132 The prefix key @kbd{C-x @key{RET}} is used for commands that pertain |
133 to multibyte characters, coding systems, and input methods. |
134 |
135 @ignore |
136 @c This is commented out because it doesn't fit here, or anywhere. |
137 @c This manual does not discuss "character sets" as they |
138 @c are used in Mule, and it makes no sense to mention these commands |
139 @c except as part of a larger discussion of the topic. |
140 @c But it is not clear that topic is worth mentioning here, |
141 @c since that is more of an implementation concept |
142 @c than a user-level concept. And when we switch to Unicode, |
143 @c character sets in the current sense may not even exist. |
25829 | 144 |
31023 | 145 @findex list-charset-chars |
146 @cindex characters in a certain charset |
147 The command @kbd{M-x list-charset-chars} prompts for the name of a |
148 character set, and displays all the characters in that character set. |
149 |
31277 | 150 @findex describe-character-set |
151 @cindex character set, description |
152 The command @kbd{M-x describe-character-set} prompts for a character |
153 set name and displays information about that character set, including |
154 its internal representation within Emacs. |
36170 | 155 @end ignore |
25829 | 156 |
157 @node Enabling Multibyte | |
158 @section Enabling Multibyte Characters | |
159 | |
37584 | 160 @cindex turn multibyte support on or off |
25829 | 161 You can enable or disable multibyte character support, either for |
162 Emacs as a whole, or for a single buffer. When multibyte characters are | |
163 disabled in a buffer, each byte in that buffer represents a | |
164 character, even codes 0200 through 0377. The old features for | |
165 supporting the European character sets, ISO Latin-1 and ISO Latin-2, | |
166 work as they did in Emacs 19 and also work for the other ISO 8859 | |
167 character sets. | |
168 | |
169 However, there is no need to turn off multibyte character support to | |
170 use ISO Latin; the Emacs multibyte character set includes all the | |
171 characters in these character sets, and Emacs can translate | |
172 automatically to and from the ISO codes. | |
173 | |
37584 | 174 By default, Emacs starts in multibyte mode, because that allows you to |
175 use all the supported languages and scripts without limitations. |
176 |
25829 | 177 To edit a particular file in unibyte representation, visit it using |
178 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in | |
179 multibyte representation into a single-byte representation of the same | |
180 characters, the easiest way is to save the contents in a file, kill the | |
181 buffer, and find the file again with @code{find-file-literally}. You | |
182 can also use @kbd{C-x @key{RET} c} | |
183 (@code{universal-coding-system-argument}) and specify @samp{raw-text} as | |
184 the coding system with which to find or save a file. @xref{Specify | |
185 Coding}. Finding a file as @samp{raw-text} doesn't disable format | |
186 conversion, uncompression, and auto mode selection, as | |
187 @code{find-file-literally} does. | |
188 | |
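  The two approaches above can be sketched in Lisp.  The file names are
hypothetical, and binding @code{coding-system-for-read} is an assumption
about how to get the effect of @kbd{C-x @key{RET} c raw-text @key{RET}}
from Lisp code, not something this section specifies.  Use one form or
the other for a given file:

@example
;; Visit a file with no conversion at all, as unibyte.
(find-file-literally "data.bin")

;; Visit a file with the raw-text coding system instead; format
;; conversion, uncompression, and auto mode selection still happen.
(let ((coding-system-for-read 'raw-text))
  (find-file "notes.txt"))
@end example
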
189 @vindex enable-multibyte-characters | |
190 @vindex default-enable-multibyte-characters | |
191 To turn off multibyte character support by default, start Emacs with | |
192 the @samp{--unibyte} option (@pxref{Initial Options}), or set the | |
29107 | 193 environment variable @env{EMACS_UNIBYTE}. You can also customize |
25829 | 194 @code{enable-multibyte-characters} or, equivalently, directly set the |
37584 | 195 variable @code{default-enable-multibyte-characters} to @code{nil} in |
196 your init file to have basically the same effect as @samp{--unibyte}. |
197 |
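  As a minimal sketch of the init-file approach described above
(assuming the variable @code{default-enable-multibyte-characters} is
available, as in the Emacs version this manual describes), a line like
the following could go in @file{.emacs}:

@example
;; Make newly created buffers unibyte by default; per the text above,
;; this has basically the same effect as starting with --unibyte.
(setq default-enable-multibyte-characters nil)
@end example
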
198 @findex toggle-enable-multibyte-characters |
199 To convert a unibyte session to a multibyte session, set |
200 @code{default-enable-multibyte-characters} to @code{t}. Buffers which |
201 were created in the unibyte session before you turn on multibyte support |
202 will stay unibyte. You can turn on multibyte support in a specific |
203 buffer by invoking the command @code{toggle-enable-multibyte-characters} |
204 in that buffer. |
25829 | 205 |
31141 | 206 @cindex Lisp files, and multibyte operation |
207 @cindex multibyte operation, and Lisp files |
208 @cindex unibyte operation, and Lisp files |
209 @cindex init file, and non-ASCII characters |
210 @cindex environment variables, and non-ASCII characters |
36170 | 211 With @samp{--unibyte}, multibyte strings are not created during |
212 initialization from the values of environment variables, |
213 @file{/etc/passwd} entries etc.@: that contain non-ASCII 8-bit |
214 characters. |
215 |
216 Emacs normally loads Lisp files as multibyte, regardless of whether |
217 you used @samp{--unibyte}. This includes the Emacs initialization |
218 file, @file{.emacs}, and the initialization files of Emacs packages |
219 such as Gnus. However, you can specify unibyte loading for a |
220 particular Lisp file, by putting @samp{-*-unibyte: t;-*-} in a comment |
221 on the first line. Then that file is always loaded as unibyte text, |
222 even if you did not start Emacs with @samp{--unibyte}. The motivation |
223 for these conventions is that it is more reliable to always load any |
224 particular Lisp file in the same way. However, you can load a Lisp |
225 file as unibyte, on any one occasion, by typing @kbd{C-x @key{RET} c |
226 raw-text @key{RET}} immediately before loading it. |
25829 | 227 |
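  For illustration, the first line of a Lisp file that should always be
loaded as unibyte might look like this.  The file name and description
are hypothetical; only the @samp{-*-unibyte: t;-*-} cookie comes from
the convention described above:

@example
;;; my-latin1-tables.el --- hypothetical example file -*-unibyte: t;-*-
@end example
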
228 The mode line indicates whether multibyte character support is enabled | |
229 in the current buffer. If it is, there are two or more characters (most | |
230 often two dashes) before the colon near the beginning of the mode line. | |
231 When multibyte characters are not enabled, just one dash precedes the | |
232 colon. | |
233 | |
234 @node Language Environments | |
235 @section Language Environments | |
236 @cindex language environments | |
237 | |
238 All supported character sets are available in Emacs buffers whenever | |
239 multibyte characters are enabled; there is no need to select a | |
240 particular language in order to display its characters in an Emacs | |
241 buffer. However, it is important to select a @dfn{language environment} | |
242 in order to set various defaults. The language environment really | |
243 represents a choice of preferred script (more or less) rather than a | |
244 choice of language. | |
245 | |
246 The language environment controls which coding systems to recognize | |
247 when reading text (@pxref{Recognize Coding}). This applies to files, | |
248 incoming mail, netnews, and any other text you read into Emacs. It may | |
249 also specify the default coding system to use when you create a file. | |
250 Each language environment also specifies a default input method. | |
251 | |
252 @findex set-language-environment | |
26140 | 253 @vindex current-language-environment |
254 To select a language environment, customize the option |
255 @code{current-language-environment} or use the command @kbd{M-x |
25829 | 256 set-language-environment}. It makes no difference which buffer is |
257 current when you use this command, because the effects apply globally to | |
258 the Emacs session. The supported language environments include: | |
259 | |
33745 | 260 @cindex Euro sign |
25829 | 261 @quotation |
26140 | 262 Chinese-BIG5, Chinese-CNS, Chinese-GB, Cyrillic-ALT, Cyrillic-ISO, |
37870 | 263 Cyrillic-KOI8, Czech, Devanagari, Dutch, English, Ethiopic, German, |
264 Greek, Hebrew, IPA, Japanese, Korean, Lao, Latin-1, Latin-2, Latin-3, | |
265 Latin-4, Latin-5, Latin-8 (Celtic), Latin-9 (updated Latin-1, with the | |
266 Euro sign), Polish, Romanian, Slovak, Slovenian, Spanish, Thai, | |
267 Tibetan, Turkish, and Vietnamese. | |
25829 | 268 @end quotation |
269 | |
36170 | 270 @cindex fonts for various scripts |
37019 | 271 @cindex Intlfonts package, installation |
36170 | 272 To display the script(s) used by your language environment on a |
273 graphical display, you need to have a suitable font. If some of the |
274 characters appear as empty boxes, you should install the GNU Intlfonts |
37019 | 275 package, which includes fonts for all supported scripts.@footnote{If |
276 you run Emacs on X, you need to inform the X server about the location |
277 of the newly installed fonts with the following commands: |
278 |
279 @example |
280 xset fp+ /usr/local/share/emacs/fonts |
281 xset fp rehash |
282 @end example |
283 } |
36170 | 284 @xref{Fontsets}, for more details about setting up your fonts. |
32275 | 285 |
26140 | 286 @findex set-locale-environment |
287 @vindex locale-language-names |
288 @vindex locale-charset-language-names |
33745 | 289 @cindex locales |
37086 | 290 Some operating systems let you specify the character-set locale you |
291 are using by setting the locale environment variables @env{LC_ALL}, |
292 @env{LC_CTYPE}, or @env{LANG}.@footnote{If more than one of these is |
293 set, the first one that is nonempty specifies your locale for this |
294 purpose.} During startup, Emacs looks up your character-set locale's |
295 name in the system locale alias table, matches its canonical name |
296 against entries in the value of the variables |
297 @code{locale-charset-language-names} and @code{locale-language-names}, |
298 and selects the corresponding language environment if a match is found. |
299 (The former variable overrides the latter.) It also adjusts the display |
300 table and terminal coding system, the locale coding system, and the |
301 preferred coding system as needed for the locale. |
26140 | 302 |
36170 | 303 If you modify the @env{LC_ALL}, @env{LC_CTYPE}, or @env{LANG} |
304 environment variables while running Emacs, you may want to invoke the |
305 @code{set-locale-environment} function afterwards to readjust the |
306 language environment from the new locale. |
26513 | 307 |
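  For instance, the following sketch changes the locale from Lisp and
then asks Emacs to recompute the language environment.  The locale name
@samp{ja_JP.eucJP} is only an illustration; use a name that your system
actually provides.  Note that a nonempty @env{LC_ALL} or @env{LC_CTYPE}
would still take precedence over @env{LANG}.

@lisp
;; Change the locale variable seen by Emacs and its subprocesses.
(setenv "LANG" "ja_JP.eucJP")
;; With a nil argument, the locale name is taken from the
;; environment variables LC_ALL, LC_CTYPE and LANG.
(set-locale-environment nil)
@end lisp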
26140 | 308 @vindex locale-preferred-coding-systems |
309 The @code{set-locale-environment} function normally uses the preferred |
310 coding system established by the language environment to decode system |
311 messages. But if your locale matches an entry in the variable |
312 @code{locale-preferred-coding-systems}, Emacs uses the corresponding |
313 coding system instead. For example, if the locale @samp{ja_JP.PCK} |
314 matches @code{japanese-shift-jis} in |
315 @code{locale-preferred-coding-systems}, Emacs uses that encoding even |
316 though it might normally use @code{japanese-iso-8bit}. |
317 |
36170 | 318 You can override the language environment chosen at startup with |
319 explicit use of the command @code{set-language-environment}, or with |
320 customization of @code{current-language-environment} in your init |
321 file. |
25829 | 322 |
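  As a sketch of the explicit approach, an init file could contain a
line like the following.  The choice of @samp{Latin-1} is an assumption
made for the example; any name from the list of supported language
environments would do.

@lisp
;; Select the language environment explicitly, regardless of locale.
(set-language-environment "Latin-1")
@end lisp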
323 @kindex C-h L | |
324 @findex describe-language-environment | |
325 To display information about the effects of a certain language | |
326 environment @var{lang-env}, use the command @kbd{C-h L @var{lang-env} | |
327 @key{RET}} (@code{describe-language-environment}). This tells you which | |
328 languages this language environment is useful for, and lists the | |
329 character sets, coding systems, and input methods that go with it. It | |
330 also shows some sample text to illustrate scripts used in this language | |
331 environment. By default, this command describes the chosen language | |
332 environment. | |
333 | |
334 @vindex set-language-environment-hook | |
335 You can customize any language environment with the normal hook | |
336 @code{set-language-environment-hook}. The command | |
337 @code{set-language-environment} runs that hook after setting up the new | |
338 language environment. The hook functions can test for a specific | |
339 language environment by checking the variable | |
37019 | 340 @code{current-language-environment}. This hook is where you should |
341 put non-default settings for specific language environments, such as |
342 coding systems for keyboard input and terminal output, the default |
343 input method, etc. |
25829 | 344 |
345 @vindex exit-language-environment-hook | |
346 Before it starts to set up the new language environment, | |
347 @code{set-language-environment} first runs the hook | |
348 @code{exit-language-environment-hook}. This hook is useful for undoing | |
349 customizations that were made with @code{set-language-environment-hook}. | |
350 For instance, if you set up a special key binding in a specific language | |
351 environment using @code{set-language-environment-hook}, you should set | |
352 up @code{exit-language-environment-hook} to restore the normal binding | |
353 for that key. | |
354 | |
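  The pairing of the two hooks might look like this.  The function
names, the choice of the @samp{Greek} environment, and the key
@kbd{C-c g} are all hypothetical.  The sketch also assumes that
@code{current-language-environment} still names the old environment
while @code{exit-language-environment-hook} runs, and that
@kbd{C-c g} is normally unbound, so that unsetting it restores the
normal state.

@lisp
(defun my-greek-setup ()
  "Bind a private key while the Greek environment is active."
  (if (equal current-language-environment "Greek")
      (global-set-key "\C-cg" 'describe-input-method)))

(defun my-greek-teardown ()
  "Remove the private key binding when leaving the Greek environment."
  (if (equal current-language-environment "Greek")
      (global-unset-key "\C-cg")))

;; Run the setup after a new environment is in place, and the
;; teardown before the old environment is left.
(add-hook 'set-language-environment-hook 'my-greek-setup)
(add-hook 'exit-language-environment-hook 'my-greek-teardown)
@end lisp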
355 @node Input Methods | |
356 @section Input Methods | |
357 | |
358 @cindex input methods | |
359 An @dfn{input method} is a kind of character conversion designed | |
360 specifically for interactive input. In Emacs, typically each language | |
361 has its own input method; sometimes several languages which use the same | |
362 characters can share one input method. A few languages support several | |
363 input methods. | |
364 | |
37019 | 365 The simplest kind of input method works by mapping ASCII letters |
366 into another alphabet; this allows you to type characters which your |
367 keyboard doesn't support directly. This is how the Greek and Russian |
368 input methods work. |
25829 | 369 |
370 A more powerful technique is composition: converting sequences of | |
371 characters into one letter. Many European input methods use composition | |
372 to produce a single non-ASCII letter from a sequence that consists of a | |
373 letter followed by accent characters (or vice versa). For example, some | |
374 methods convert the sequence @kbd{a'} into a single accented letter. | |
375 These input methods have no special commands of their own; all they do | |
376 is compose sequences of printing characters. | |
377 | |
378 The input methods for syllabic scripts typically use mapping followed | |
379 by composition. The input methods for Thai and Korean work this way. | |
380 First, letters are mapped into symbols for particular sounds or tone | |
381 marks; then, sequences of these which make up a whole syllable are | |
382 mapped into one syllable sign. | |
383 | |
384 Chinese and Japanese require more complex methods. In Chinese input | |
385 methods, first you enter the phonetic spelling of a Chinese word (in | |
386 input method @code{chinese-py}, among others), or a sequence of portions | |
387 of the character (input methods @code{chinese-4corner} and | |
388 @code{chinese-sw}, and others). Since one phonetic spelling typically | |
389 corresponds to many different Chinese characters, you must select one of | |
390 the alternatives using special Emacs commands. Keys such as @kbd{C-f}, | |
391 @kbd{C-b}, @kbd{C-n}, @kbd{C-p}, and digits have special definitions in | |
392 this situation, used for selecting among the alternatives. @key{TAB} | |
393 displays a buffer showing all the possibilities. | |
394 | |
395 In Japanese input methods, first you input a whole word using | |
396 phonetic spelling; then, after the word is in the buffer, Emacs converts | |
397 it into one or more characters using a large dictionary. One phonetic | |
398 spelling corresponds to many differently written Japanese words, so you | |
399 must select one of them; use @kbd{C-n} and @kbd{C-p} to cycle through | |
400 the alternatives. | |
401 | |
402 Sometimes it is useful to cut off input method processing so that the | |
403 characters you have just entered will not combine with subsequent | |
404 characters. For example, in input method @code{latin-1-postfix}, the | |
405 sequence @kbd{e '} combines to form an @samp{e} with an accent. What if | |
406 you want to enter them as separate characters? | |
407 | |
408 One way is to type the accent twice; that is a special feature for | |
409 entering the separate letter and accent. For example, @kbd{e ' '} gives | |
410 you the two characters @samp{e'}. Another way is to type another letter | |
411 after the @kbd{e}---something that won't combine with that---and | |
412 immediately delete it. For example, you could type @kbd{e e @key{DEL} | |
413 '} to get separate @samp{e} and @samp{'}. | |
414 | |
415 Another method, more general but not quite as easy to type, is to use | |
416 @kbd{C-\ C-\} between two characters to stop them from combining. This | |
417 is the command @kbd{C-\} (@code{toggle-input-method}) used twice. | |
418 @ifinfo | |
419 @xref{Select Input Method}. | |
420 @end ifinfo | |
421 | |
37019 | 422 @cindex incremental search, input method interference |
25829 | 423 @kbd{C-\ C-\} is especially useful inside an incremental search, |
424 because it stops waiting for more characters to combine, and starts | |
425 searching for what you have already entered. | |
426 | |
427 @vindex input-method-verbose-flag | |
428 @vindex input-method-highlight-flag | |
429 The variables @code{input-method-highlight-flag} and | |
37870 | 430 @code{input-method-verbose-flag} control how input methods explain |
431 what is happening. If @code{input-method-highlight-flag} is | |
432 non-@code{nil}, the partial sequence is highlighted in the buffer (for | |
433 most input methods---some disable this feature). If | |
434 @code{input-method-verbose-flag} is non-@code{nil}, the list of | |
435 possible characters to type next is displayed in the echo area (but | |
436 not when you are in the minibuffer). | |
25829 | 437 |
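  For example, to turn off highlighting while keeping the echo-area
guidance, you could put something like this in your init file (a
sketch; the particular values are a matter of taste):

@lisp
;; Do not highlight the partially typed sequence in the buffer.
(setq input-method-highlight-flag nil)
;; Show the list of possible next characters in the echo area.
(setq input-method-verbose-flag t)
@end lisp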
31077 | 438 @cindex Leim package |
36170 | 439 Input methods are implemented in the separate Leim package: they are |
440 available only if the system administrator used Leim when building |
441 Emacs. If Emacs was built without Leim, you will find that no input |
442 methods are defined. |
31077 | 443 |
25829 | 444 @node Select Input Method |
445 @section Selecting an Input Method | |
446 | |
447 @table @kbd | |
448 @item C-\ | |
449 Enable or disable use of the selected input method. | |
450 | |
451 @item C-x @key{RET} C-\ @var{method} @key{RET} | |
452 Select a new input method for the current buffer. | |
453 | |
454 @item C-h I @var{method} @key{RET} | |
455 @itemx C-h C-\ @var{method} @key{RET} | |
456 @findex describe-input-method | |
457 @kindex C-h I | |
458 @kindex C-h C-\ | |
459 Describe the input method @var{method} (@code{describe-input-method}). | |
31204 | 460 By default, it describes the current input method (if any). This |
461 description should give you the full details of how to use any | |
31270 | 462 particular input method. |
25829 | 463 |
464 @item M-x list-input-methods | |
465 Display a list of all the supported input methods. | |
466 @end table | |
467 | |
468 @findex set-input-method | |
469 @vindex current-input-method | |
470 @kindex C-x RET C-\ | |
471 To choose an input method for the current buffer, use @kbd{C-x | |
472 @key{RET} C-\} (@code{set-input-method}). This command reads the | |
473 input method name with the minibuffer; the name normally starts with the | |
474 language environment that it is meant to be used with. The variable | |
475 @code{current-input-method} records which input method is selected. | |
476 | |
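  The same selection can be made from Lisp.  A minimal sketch, assuming
the @code{latin-1-postfix} input method is available in your build:

@lisp
;; Select an input method for the current buffer.
(set-input-method "latin-1-postfix")
;; current-input-method now holds the name of the selected method.
(message "Input method: %s" current-input-method)
@end lisp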
477 @findex toggle-input-method | |
478 @kindex C-\ | |
479 Input methods use various sequences of ASCII characters to stand for | |
480 non-ASCII characters. Sometimes it is useful to turn off the input | |
481 method temporarily. To do this, type @kbd{C-\} | |
482 (@code{toggle-input-method}). To reenable the input method, type | |
483 @kbd{C-\} again. | |
484 | |
485 If you type @kbd{C-\} and you have not yet selected an input method, | |
486 Emacs prompts you to specify one. This has the same effect as using | |
487 @kbd{C-x @key{RET} C-\} to specify an input method. | |
488 | |
36850 | 489 When invoked with a numeric argument, as in @kbd{C-u C-\}, |
490 @code{toggle-input-method} always prompts you for an input method, |
491 suggesting the most recently selected one as the default. |
492 |
25829 | 493 @vindex default-input-method |
494 Selecting a language environment specifies a default input method for | |
495 use in various buffers. When you have a default input method, you can | |
496 select it in the current buffer by typing @kbd{C-\}. The variable | |
497 @code{default-input-method} specifies the default input method | |
498 (@code{nil} means there is none). | |
499 | |
37019 | 500 In some language environments, which support several different input |
501 methods, you might want to use an input method different from the |
502 default chosen by @code{set-language-environment}. You can instruct |
503 Emacs to select a different default input method for a certain |
37870 | 504 language environment, if you wish, by using |
37019 | 505 @code{set-language-environment-hook} (@pxref{Language Environments, |
506 set-language-environment-hook}). For example: |
507 |
508 @lisp |
509 (defun my-chinese-setup () |
510   "Set up my private Chinese environment." |
511   (if (equal current-language-environment "Chinese-GB") |
512       (setq default-input-method "chinese-tonepy"))) |
513 (add-hook 'set-language-environment-hook 'my-chinese-setup) |
514 @end lisp |
515 |
516 @noindent |
517 This sets the default input method to be @code{chinese-tonepy} |
518 whenever you choose a Chinese-GB language environment. |
519 |
25829 | 520 @findex quail-set-keyboard-layout |
521 Some input methods for alphabetic scripts work by (in effect) | |
522 remapping the keyboard to emulate various keyboard layouts commonly used | |
523 for those scripts. How to do this remapping properly depends on your | |
524 actual keyboard layout. To specify which layout your keyboard has, use | |
525 the command @kbd{M-x quail-set-keyboard-layout}. | |
526 | |
527 @findex list-input-methods | |
528 To display a list of all the supported input methods, type @kbd{M-x | |
529 list-input-methods}. The list gives information about each input | |
530 method, including the string that stands for it in the mode line. | |
531 | |
532 @node Multibyte Conversion | |
533 @section Unibyte and Multibyte Non-ASCII characters | |
534 | |
535 When multibyte characters are enabled, character codes 0240 (octal) | |
536 through 0377 (octal) are not really legitimate in the buffer. The valid | |
537 non-ASCII printing characters have codes that start from 0400. | |
538 | |
36170 | 539 If you type a self-inserting character in the range 0240 through |
540 0377, or if you use @kbd{C-q} to insert one, Emacs assumes you |
541 intended to use one of the ISO Latin-@var{n} character sets, and |
542 converts it to the Emacs code representing that Latin-@var{n} |
543 character. You select @emph{which} ISO Latin character set to use |
544 through your choice of language environment |
25829 | 545 @iftex |
546 (see above). | |
547 @end iftex | |
548 @ifinfo | |
549 (@pxref{Language Environments}). | |
550 @end ifinfo | |
551 If you do not specify a choice, the default is Latin-1. | |
552 | |
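  The character codes in this section are written in octal.  As a small
illustration, this converts the boundary values to decimal using the
standard function @code{string-to-number} with an explicit base:

@lisp
;; Octal 0200, 0237, 0240, 0377 and 0400 expressed in decimal.
(mapcar (lambda (s) (string-to-number s 8))
        '("200" "237" "240" "377" "400"))
     @result{} (128 159 160 255 256)
@end lisp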
36170 | 553 If you insert a character in the range 0200 through 0237, which |
554 forms the @code{eight-bit-control} character set, it is inserted |
33745 | 555 literally. You should normally avoid doing this since buffers |
556 containing such characters have to be written out in either the |
36170 | 557 @code{emacs-mule} or @code{raw-text} coding system, which is usually |
558 not what you want. |
25829 | 559 |
560 @node Coding Systems | |
561 @section Coding Systems | |
562 @cindex coding systems | |
563 | |
564 Users of various languages have established many more-or-less standard | |
565 coding systems for representing them. Emacs does not use these coding | |
566 systems internally; instead, it converts from various coding systems to | |
567 its own system when reading data, and converts the internal coding | |
568 system to other coding systems when writing data. Conversion is | |
569 possible in reading or writing files, in sending or receiving from the | |
570 terminal, and in exchanging data with subprocesses. | |
571 | |
572 Emacs assigns a name to each coding system. Most coding systems are | |
573 used for one language, and the name of the coding system starts with the | |
574 language name. Some coding systems are used for several languages; | |
575 their names usually start with @samp{iso}. There are also special | |
576 coding systems @code{no-conversion}, @code{raw-text} and | |
577 @code{emacs-mule} which do not convert printing characters at all. | |
578 | |
37584
9a7fd51a92b3
(International): Add an overview of Mule features, with pointers to
Eli Zaretskii <eliz@gnu.org>
parents:
37086
diff
changeset
|
579 @cindex international files from DOS/Windows systems |
32386
d65f9772ee72
Mention the cpNNNN coding systems, with an xref to msdog.texi.
Eli Zaretskii <eliz@gnu.org>
parents:
32275
diff
changeset
|
580 A special class of coding systems, collectively known as |
581 @dfn{codepages}, is designed to support text encoded by MS-Windows and |
582 MS-DOS software. To use any of these systems, you need to create it |
37584
9a7fd51a92b3
(International): Add an overview of Mule features, with pointers to
Eli Zaretskii <eliz@gnu.org>
parents:
37086
diff
changeset
|
583 with @kbd{M-x codepage-setup}. @xref{MS-DOS and MULE}. After |
584 creating the coding system for the codepage, you can use it like any |
585 other coding system. For example, to visit a file encoded in codepage |
586 850, type @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename} |
587 @key{RET}}. |
32386
d65f9772ee72
Mention the cpNNNN coding systems, with an xref to msdog.texi.
Eli Zaretskii <eliz@gnu.org>
parents:
32275
diff
changeset
|
588 |
25829 | 589 In addition to converting various representations of non-ASCII |
590 characters, a coding system can perform end-of-line conversion. Emacs | |
591 handles three different conventions for how to separate lines in a file: | |
592 newline, carriage-return linefeed, and just carriage-return. | |
593 | |
594 @table @kbd | |
595 @item C-h C @var{coding} @key{RET} | |
596 Describe coding system @var{coding}. | |
597 | |
598 @item C-h C @key{RET} | |
599 Describe the coding systems currently in use. | |
600 | |
601 @item M-x list-coding-systems | |
602 Display a list of all the supported coding systems. | |
603 @end table | |
604 | |
605 @kindex C-h C | |
606 @findex describe-coding-system | |
607 The command @kbd{C-h C} (@code{describe-coding-system}) displays | |
608 information about particular coding systems. You can specify a coding | |
609 system name as argument; alternatively, with an empty argument, it | |
610 describes the coding systems currently selected for various purposes, | |
611 both in the current buffer and as the defaults, and the priority list | |
612 for recognizing coding systems (@pxref{Recognize Coding}). | |
613 | |
614 @findex list-coding-systems | |
615 To display a list of all the supported coding systems, type @kbd{M-x | |
616 list-coding-systems}. The list gives information about each coding | |
617 system, including the letter that stands for it in the mode line | |
618 (@pxref{Mode Line}). | |
619 | |
620 @cindex end-of-line conversion | |
621 @cindex MS-DOS end-of-line conversion | |
622 @cindex Macintosh end-of-line conversion | |
623 Each of the coding systems that appear in this list---except for | |
624 @code{no-conversion}, which means no conversion of any kind---specifies | |
625 how and whether to convert printing characters, but leaves the choice of | |
626 end-of-line conversion to be decided based on the contents of each file. | |
627 For example, if the file appears to use the sequence carriage-return | |
628 linefeed to separate lines, DOS end-of-line conversion will be used. | |
629 | |
630 Each of the listed coding systems has three variants which specify | |
631 exactly what to do for end-of-line conversion: | |
632 | |
633 @table @code | |
634 @item @dots{}-unix | |
635 Don't do any end-of-line conversion; assume the file uses | |
636 newline to separate lines. (This is the convention normally used | |
637 on Unix and GNU systems.) | |
638 | |
639 @item @dots{}-dos | |
640 Assume the file uses carriage-return linefeed to separate lines, and do | |
641 the appropriate conversion. (This is the convention normally used on | |
36185 | 642 Microsoft systems.@footnote{It is also specified for MIME @samp{text/*} |
25829 | 643 bodies and in other network transport contexts. It is different |
644 from the SGML reference syntax record-start/record-end format which | |
645 Emacs doesn't support directly.}) | |
646 | |
647 @item @dots{}-mac | |
648 Assume the file uses carriage-return to separate lines, and do the | |
649 appropriate conversion. (This is the convention normally used on the | |
650 Macintosh system.) | |
651 @end table | |
652 | |
653 These variant coding systems are omitted from the | |
654 @code{list-coding-systems} display for brevity, since they are entirely | |
655 predictable. For example, the coding system @code{iso-latin-1} has | |
656 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and | |
657 @code{iso-latin-1-mac}. | |
658 | |
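  For instance, assuming your Emacs provides the Lisp function
@code{coding-system-eol-type} (an assumption; this section does not
describe it), you could inspect these variants by evaluating expressions
like the following with @kbd{M-:}.  This is only a sketch:

@smallexample
;; For a base coding system, the result lists its end-of-line
;; variants.
(coding-system-eol-type 'iso-latin-1)

;; For a variant, the result is a small integer naming the
;; convention: 0 for Unix, 1 for DOS, 2 for Macintosh.
(coding-system-eol-type 'iso-latin-1-dos)
@end smallexample
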
659 The coding system @code{raw-text} is good for a file which is mainly | |
660 ASCII text, but may contain byte values above 127 which are not meant to | |
661 encode non-ASCII characters. With @code{raw-text}, Emacs copies those | |
662 byte values unchanged, and sets @code{enable-multibyte-characters} to | |
663 @code{nil} in the current buffer so that they will be interpreted | |
664 properly. @code{raw-text} handles end-of-line conversion in the usual | |
665 way, based on the data encountered, and has the usual three variants to | |
666 specify the kind of end-of-line conversion to use. | |
667 | |
668 In contrast, the coding system @code{no-conversion} specifies no | |
669 character code conversion at all---none for non-ASCII byte values and | |
670 none for end of line. This is useful for reading or writing binary | |
671 files, tar files, and other files that must be examined verbatim. It, | |
672 too, sets @code{enable-multibyte-characters} to @code{nil}. | |
673 | |
674 The easiest way to edit a file with no conversion of any kind is with | |
675 the @kbd{M-x find-file-literally} command. This uses | |
676 @code{no-conversion}, and also suppresses other Emacs features that | |
677 might convert the file contents before you see them. @xref{Visiting}. | |
678 | |
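  From Lisp, a rough equivalent is to bind the variable
@code{coding-system-for-read} around the call that reads the file.  The
following is a minimal sketch: the file name is hypothetical, and unlike
@code{find-file-literally}, this suppresses only the coding system
conversion, not the other Emacs features mentioned above.

@smallexample
;; Read a (hypothetical) file with no character code or
;; end-of-line conversion, for this one call only.
(let ((coding-system-for-read 'no-conversion))
  (find-file-noselect "/tmp/example.bin"))
@end smallexample
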
679 The coding system @code{emacs-mule} means that the file contains | |
680 non-ASCII characters stored with the internal Emacs encoding. It | |
681 handles end-of-line conversion based on the data encountered, and has | |
682 the usual three variants to specify the kind of end-of-line conversion. | |
683 | |
684 @node Recognize Coding | |
685 @section Recognizing Coding Systems | |
686 | |
37584
9a7fd51a92b3
(International): Add an overview of Mule features, with pointers to
Eli Zaretskii <eliz@gnu.org>
parents:
37086
diff
changeset
|
687 Emacs tries to recognize which coding system to use for a given text |
688 as an integral part of reading that text. (This applies to files |
689 being read, output from subprocesses, text from X selections, etc.) |
690 Emacs can select the right coding system automatically most of the |
691 time---once you have specified your preferences. |
25829 | 692 |
693 Some coding systems can be recognized or distinguished by which byte | |
694 sequences appear in the data. However, there are coding systems that | |
695 cannot be distinguished, not even potentially. For example, there is no | |
696 way to distinguish between Latin-1 and Latin-2; they use the same byte | |
697 values with different meanings. | |
698 | |
699 Emacs handles this situation by means of a priority list of coding | |
700 systems. Whenever Emacs reads a file, if you do not specify the coding | |
701 system to use, Emacs checks the data against each coding system, | |
702 starting with the first in priority and working down the list, until it | |
703 finds a coding system that fits the data. Then it converts the file | |
704 contents assuming that they are represented in this coding system. | |
705 | |
706 The priority list of coding systems depends on the selected language | |
707 environment (@pxref{Language Environments}). For example, if you use | |
708 French, you probably want Emacs to prefer Latin-1 to Latin-2; if you use | |
709 Czech, you probably want Latin-2 to be preferred. This is one of the | |
710 reasons to specify a language environment. | |
711 | |
712 @findex prefer-coding-system | |
713 However, you can alter the priority list in detail with the command | |
714 @kbd{M-x prefer-coding-system}. This command reads the name of a coding | |
715 system from the minibuffer, and adds it to the front of the priority | |
716 list, so that it is preferred to all others. If you use this command | |
717 several times, each use adds one element to the front of the priority | |
718 list. | |
719 | |
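  The same function can be called from your init file.  As a sketch,
a user who mostly reads Czech text (an assumed scenario) might write:

@smallexample
;; Put Latin-2 at the front of the priority list, so that it is
;; tried before the other coding systems.
(prefer-coding-system 'iso-latin-2)
@end smallexample
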
720 If you use a coding system that specifies the end-of-line conversion | |
721 type, such as @code{iso-8859-1-dos}, what that means is that Emacs | |
722 should attempt to recognize @code{iso-8859-1} with priority, and should | |
723 use DOS end-of-line conversion whenever it recognizes @code{iso-8859-1}. | |
724 | |
725 @vindex file-coding-system-alist | |
726 Sometimes a file name indicates which coding system to use for the | |
727 file. The variable @code{file-coding-system-alist} specifies this | |
728 correspondence. There is a special function | |
729 @code{modify-coding-system-alist} for adding elements to this list. For | |
730 example, to read and write all @samp{.txt} files using the coding system | |
731 @code{china-iso-8bit}, you can execute this Lisp expression: | |
732 | |
733 @smallexample | |
734 (modify-coding-system-alist 'file "\\.txt\\'" 'china-iso-8bit) | |
735 @end smallexample | |
736 | |
737 @noindent | |
738 The first argument should be @code{file}, the second argument should be | |
739 a regular expression that determines which files this applies to, and | |
740 the third argument says which coding system to use for these files. | |
741 | |
742 @vindex inhibit-eol-conversion | |
30375
5c4951d58989
(Recognize Coding): Document the variable inhibit-iso-escape-detection.
Eli Zaretskii <eliz@gnu.org>
parents:
29826
diff
changeset
|
743 @cindex DOS-style end-of-line display |
25829 | 744 Emacs recognizes which kind of end-of-line conversion to use based on |
745 the contents of the file: if it sees only carriage-returns, or only | |
746 carriage-return linefeed sequences, then it chooses the end-of-line | |
747 conversion accordingly. You can inhibit the automatic use of | |
748 end-of-line conversion by setting the variable @code{inhibit-eol-conversion} | |
37019
1deafff9fd1f
(Language Environments): Explain how to update the X
Eli Zaretskii <eliz@gnu.org>
parents:
36875
diff
changeset
|
749 to non-@code{nil}. If you do that, DOS-style files will be displayed |
750 with the @samp{^M} characters visible in the buffer; some people |
751 prefer this to the more subtle @samp{(DOS)} end-of-line type |
752 indication near the left edge of the mode line (@pxref{Mode Line, |
37081 | 753 eol-mnemonic}). |
25829 | 754 |
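  As a sketch, placing the following line in your init file turns off
the automatic end-of-line conversion for all files you visit afterward:

@smallexample
;; Show DOS-style files with their ^M characters visible
;; instead of converting the line endings.
(setq inhibit-eol-conversion t)
@end smallexample
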
30375
5c4951d58989
(Recognize Coding): Document the variable inhibit-iso-escape-detection.
Eli Zaretskii <eliz@gnu.org>
parents:
29826
diff
changeset
|
755 @vindex inhibit-iso-escape-detection |
756 @cindex escape sequences in files |
757 By default, the automatic detection of coding system is sensitive to |
758 escape sequences. If Emacs sees a sequence of characters that begins |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
759 with an escape character, and the sequence is valid as an ISO-2022 |
760 code, that tells Emacs to use one of the ISO-2022 encodings to decode |
761 the file. |
30375
5c4951d58989
(Recognize Coding): Document the variable inhibit-iso-escape-detection.
Eli Zaretskii <eliz@gnu.org>
parents:
29826
diff
changeset
|
762 |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
763 However, there may be cases where you want to read escape sequences |
764 in a file as is. In such a case, you can set the variable |
30375
5c4951d58989
(Recognize Coding): Document the variable inhibit-iso-escape-detection.
Eli Zaretskii <eliz@gnu.org>
parents:
29826
diff
changeset
|
765 @code{inhibit-iso-escape-detection} to non-@code{nil}. Then the code |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
766 detection ignores any escape sequences, and never uses an ISO-2022 |
767 encoding. The result is that all escape sequences become visible in |
768 the buffer. |
30375
5c4951d58989
(Recognize Coding): Document the variable inhibit-iso-escape-detection.
Eli Zaretskii <eliz@gnu.org>
parents:
29826
diff
changeset
|
769 |
770 The default value of @code{inhibit-iso-escape-detection} is |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
771 @code{nil}. We recommend that you not change it permanently, only for |
772 one specific operation. That's because many Emacs Lisp source files |
773 that contain non-ASCII characters are encoded in the coding system |
774 @code{iso-2022-7bit} in the Emacs distribution, and they won't be |
775 decoded correctly when you visit those files if you suppress the |
776 escape sequence detection. |
30375
5c4951d58989
(Recognize Coding): Document the variable inhibit-iso-escape-detection.
Eli Zaretskii <eliz@gnu.org>
parents:
29826
diff
changeset
|
777 |
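  One way to change the variable for a single operation only is to
bind it with @code{let} around that operation.  The following is a
sketch, and the file name is hypothetical:

@smallexample
;; Visit one file with its escape sequences left visible; the
;; global value of the variable stays nil.
(let ((inhibit-iso-escape-detection t))
  (find-file "/tmp/escapes.txt"))
@end smallexample
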
25829 | 778 @vindex coding |
779 You can specify the coding system for a particular file using the | |
780 @samp{-*-@dots{}-*-} construct at the beginning of a file, or a local | |
781 variables list at the end (@pxref{File Variables}). You do this by | |
782 defining a value for the ``variable'' named @code{coding}. Emacs does | |
783 not really have a variable @code{coding}; instead of setting a variable, | |
784 it uses the specified coding system for the file. For example, | |
785 @samp{-*-mode: C; coding: latin-1;-*-} specifies use of the Latin-1 | |
786 coding system, as well as C mode. If you specify the coding explicitly | |
787 in the file, that overrides @code{file-coding-system-alist}. | |
788 | |
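  The equivalent local variables list, placed near the end of the file,
might look like the following sketch.  It assumes a Lisp file, so the
lines are written as Lisp comments; use the comment syntax of the file's
own language.

@smallexample
;;; Local Variables:
;;; coding: latin-1
;;; End:
@end smallexample
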
789 @vindex auto-coding-alist | |
37766
9be4cab94990
Add something for auto-coding-regexp-alist.
Gerd Moellmann <gerd@gnu.org>
parents:
37630
diff
changeset
|
790 @vindex auto-coding-regexp-alist |
791 The variables @code{auto-coding-alist} and |
792 @code{auto-coding-regexp-alist} are the strongest way to specify the |
793 coding system for certain patterns of file names, or for files |
794 containing certain patterns; these variables even override |
795 @samp{-*-coding:-*-} tags in the file itself. Emacs uses |
38050
89031b4b9a28
Proofreading fixes from Tim Sanders <tim@timsanders.freeserve.co.uk>.
Eli Zaretskii <eliz@gnu.org>
parents:
37870
diff
changeset
|
796 @code{auto-coding-alist} for tar and archive files, to prevent it |
37766
9be4cab94990
Add something for auto-coding-regexp-alist.
Gerd Moellmann <gerd@gnu.org>
parents:
37630
diff
changeset
|
797 from being confused by a @samp{-*-coding:-*-} tag in a member of the |
798 archive and thinking it applies to the archive file as a whole. |
799 Likewise, Emacs uses @code{auto-coding-regexp-alist} to ensure that |
800 RMAIL files, whose names in general don't match any particular pattern, |
801 are decoded correctly. |
25829 | 802 |
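  You can add your own entries to @code{auto-coding-alist}.  The sketch
below assumes that each element pairs a file name regular expression
with a coding system; the @samp{.dat} extension is only a hypothetical
example.

@smallexample
;; Always read and write files named *.dat with no conversion,
;; even if they happen to contain a coding: tag.
(add-to-list 'auto-coding-alist '("\\.dat\\'" . no-conversion))
@end smallexample
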
37584
9a7fd51a92b3
(International): Add an overview of Mule features, with pointers to
Eli Zaretskii <eliz@gnu.org>
parents:
37086
diff
changeset
|
803 If Emacs recognizes the encoding of a file incorrectly, you can |
804 reread the file using the correct coding system by typing @kbd{C-x |
805 @key{RET} c @var{coding-system} @key{RET} M-x revert-buffer |
38133 | 806 @key{RET}}. To see what coding system Emacs actually used to decode |
807 the file, look at the coding system mnemonic letter near the left edge | |
808 of the mode line (@pxref{Mode Line}), or type @kbd{C-h C @key{RET}}. | |
37584
9a7fd51a92b3
(International): Add an overview of Mule features, with pointers to
Eli Zaretskii <eliz@gnu.org>
parents:
37086
diff
changeset
|
809 |
25829 | 810 @vindex buffer-file-coding-system |
811 Once Emacs has chosen a coding system for a buffer, it stores that | |
812 coding system in @code{buffer-file-coding-system} and uses that coding | |
813 system, by default, for operations that write from this buffer into a | |
814 file. This includes the commands @code{save-buffer} and | |
815 @code{write-region}. If you want to write files from this buffer using | |
816 a different coding system, you can specify a different coding system for | |
817 the buffer using @code{set-buffer-file-coding-system} (@pxref{Specify | |
818 Coding}). | |
819 | |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
820 You can insert any possible character into any Emacs buffer, but |
821 most coding systems can only handle some of the possible characters. |
822 This means that you can insert characters that cannot be encoded with |
823 the coding system that will be used to save the buffer. For example, |
824 you could start with an ASCII file and insert a few Latin-1 characters |
36334
86322cde2e42
(Recognize Coding): Remove doubled `or'.
Gerd Moellmann <gerd@gnu.org>
parents:
36263
diff
changeset
|
825 into it, or you could edit a text file in Polish encoded in |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
826 @code{iso-8859-2} and add to it translations of several Polish words |
827 into Russian. When you save the buffer, Emacs cannot use the current |
828 value of @code{buffer-file-coding-system}, because the characters you |
829 added cannot be encoded by that coding system. |
31021
5380bd6b450e
Document the way Emacs prompts for a safe coding system when the
Eli Zaretskii <eliz@gnu.org>
parents:
30375
diff
changeset
|
830 |
831 When that happens, Emacs tries the most-preferred coding system (set |
832 by @kbd{M-x prefer-coding-system} or @kbd{M-x |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
833 set-language-environment}), and if that coding system can safely |
834 encode all of the characters in the buffer, Emacs uses it, and stores |
835 its value in @code{buffer-file-coding-system}. Otherwise, Emacs |
836 displays a list of coding systems suitable for encoding the buffer's |
38050
89031b4b9a28
Proofreading fixes from Tim Sanders <tim@timsanders.freeserve.co.uk>.
Eli Zaretskii <eliz@gnu.org>
parents:
37870
diff
changeset
|
837 contents, and asks you to choose one of those coding systems. |
31021
5380bd6b450e
Document the way Emacs prompts for a safe coding system when the
Eli Zaretskii <eliz@gnu.org>
parents:
30375
diff
changeset
|
838 |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
839 If you insert the unsuitable characters in a mail message, Emacs |
840 behaves a bit differently. It additionally checks whether the |
841 most-preferred coding system is recommended for use in MIME messages; |
842 if it isn't, Emacs tells you that the most-preferred coding system is |
843 not recommended and prompts you for another coding system. This is so |
844 you won't inadvertently send a message encoded in a way that your |
845 recipient's mail software will have difficulty decoding. (If you do |
38050
89031b4b9a28
Proofreading fixes from Tim Sanders <tim@timsanders.freeserve.co.uk>.
Eli Zaretskii <eliz@gnu.org>
parents:
37870
diff
changeset
|
846 want to use the most-preferred coding system, you can still type its |
38133 | 847 name in response to the question.) |
31021
5380bd6b450e
Document the way Emacs prompts for a safe coding system when the
Eli Zaretskii <eliz@gnu.org>
parents:
30375
diff
changeset
|
848 |
25829 | 849 @vindex sendmail-coding-system |
850 When you send a message with Mail mode (@pxref{Sending Mail}), Emacs has | |
851 four different ways to determine the coding system to use for encoding | |
852 the message text. It tries the buffer's own value of | |
853 @code{buffer-file-coding-system}, if that is non-@code{nil}. Otherwise, | |
854 it uses the value of @code{sendmail-coding-system}, if that is | |
855 non-@code{nil}. The third way is to use the default coding system for | |
856 new files, which is controlled by your choice of language environment, | |
857 if that is non-@code{nil}. If all three of these values are @code{nil}, | |
858 Emacs encodes outgoing mail using the Latin-1 coding system. | |
859 | |
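  For example, to have outgoing mail encoded in Latin-1 whenever the
mail buffer itself does not specify a coding system, you might put
something like this sketch in your init file:

@smallexample
;; Used for outgoing mail only when the buffer's own
;; buffer-file-coding-system is nil.
(setq sendmail-coding-system 'iso-latin-1)
@end smallexample
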
860 @vindex rmail-decode-mime-charset | |
861 When you get new mail in Rmail, each message is translated | |
862 automatically from the coding system it is written in---as if it were a | |
863 separate file. This uses the priority list of coding systems that you | |
864 have specified. If a MIME message specifies a character set, Rmail | |
865 obeys that specification, unless @code{rmail-decode-mime-charset} is | |
866 @code{nil}. | |
867 | |
868 @vindex rmail-file-coding-system | |
869 For reading and saving Rmail files themselves, Emacs uses the coding | |
870 system specified by the variable @code{rmail-file-coding-system}. The | |
871 default value is @code{nil}, which means that Rmail files are not | |
872 translated (they are read and written in the Emacs internal character | |
873 code). | |
874 | |
875 @node Specify Coding | |
876 @section Specifying a Coding System | |
877 | |
878 In cases where Emacs does not automatically choose the right coding | |
879 system, you can use these commands to specify one: | |
880 | |
881 @table @kbd | |
882 @item C-x @key{RET} f @var{coding} @key{RET} | |
883 Use coding system @var{coding} for the visited file | |
884 in the current buffer. | |
885 | |
886 @item C-x @key{RET} c @var{coding} @key{RET} | |
887 Specify coding system @var{coding} for the immediately following | |
888 command. | |
889 | |
890 @item C-x @key{RET} k @var{coding} @key{RET} | |
891 Use coding system @var{coding} for keyboard input. | |
892 | |
893 @item C-x @key{RET} t @var{coding} @key{RET} | |
894 Use coding system @var{coding} for terminal output. | |
895 | |
896 @item C-x @key{RET} p @var{input-coding} @key{RET} @var{output-coding} @key{RET} | |
897 Use coding systems @var{input-coding} and @var{output-coding} for | |
898 subprocess input and output in the current buffer. | |
899 | |
900 @item C-x @key{RET} x @var{coding} @key{RET} | |
901 Use coding system @var{coding} for transferring selections to and from | |
902 other programs through the window system. | |
903 | |
904 @item C-x @key{RET} X @var{coding} @key{RET} | |
905 Use coding system @var{coding} for transferring @emph{one} | |
906 selection---the next one---to or from the window system. | |
907 @end table | |
908 | |
909 @kindex C-x RET f | |
910 @findex set-buffer-file-coding-system | |
911 The command @kbd{C-x @key{RET} f} (@code{set-buffer-file-coding-system}) | |
912 specifies the file coding system for the current buffer---in other | |
913 words, which coding system to use when saving or rereading the visited | |
914 file. You specify which coding system using the minibuffer. Since this | |
915 command applies to a file you have already visited, it affects only the | |
916 way the file is saved. | |
917 | |
918 @kindex C-x RET c | |
919 @findex universal-coding-system-argument | |
920 Another way to specify the coding system for a file is when you visit | |
921 the file. First use the command @kbd{C-x @key{RET} c} | |
922 (@code{universal-coding-system-argument}); this command uses the | |
923 minibuffer to read a coding system name. After you exit the minibuffer, | |
924 the specified coding system is used for @emph{the immediately following | |
925 command}. | |
926 | |
927 So if the immediately following command is @kbd{C-x C-f}, for example, | |
928 it reads the file using that coding system (and records the coding | |
929 system for when the file is saved). Or if the immediately following | |
930 command is @kbd{C-x C-w}, it writes the file using that coding system. | |
931 Other file commands affected by a specified coding system include | |
932 @kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window variants of | |
933 @kbd{C-x C-f}. | |
934 | |
935 @kbd{C-x @key{RET} c} also affects commands that start subprocesses, | |
936 including @kbd{M-x shell} (@pxref{Shell}). | |
937 | |
938 However, if the immediately following command does not use the coding | |
939 system, then @kbd{C-x @key{RET} c} ultimately has no effect. | |
940 | |
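  Lisp programs can get a similar one-shot effect by binding the
variable @code{coding-system-for-write} (or
@code{coding-system-for-read}) around a single call.  The following is
a minimal sketch; the output file name is hypothetical.

@smallexample
;; Write the whole buffer in Latin-1 with DOS line endings,
;; without changing the buffer's own coding system.
(let ((coding-system-for-write 'iso-latin-1-dos))
  (write-region (point-min) (point-max) "/tmp/out.txt"))
@end smallexample
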
941 An easy way to visit a file with no conversion is with the @kbd{M-x | |
942 find-file-literally} command. @xref{Visiting}. | |
943 | |
944 @vindex default-buffer-file-coding-system | |
945 The variable @code{default-buffer-file-coding-system} specifies the | |
946 choice of coding system to use when you create a new file. It applies | |
947 when you find a new file, and when you create a buffer and then save it | |
948 in a file. Selecting a language environment typically sets this | |
949 variable to a good choice of default coding system for that language | |
950 environment. | |
951 | |
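  For example, a line like the following in your init file would make
Latin-1 with Unix line endings the default for new files.  This is a
sketch: the coding system chosen is an assumption, and selecting a
language environment later in the session may set the variable again.

@example
(setq default-buffer-file-coding-system 'iso-latin-1-unix)
@end example
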
952 @kindex C-x RET t | |
953 @findex set-terminal-coding-system | |
954 The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system}) | |
955 specifies the coding system for terminal output. If you specify a | |
956 coding system for terminal output, all characters output to the | |
957 terminal are translated into that coding system. | |
958 | |
959 This feature is useful for certain character-only terminals built to | |
960 support specific languages or character sets---for example, European | |
961 terminals that support one of the ISO Latin character sets. You need to | |
962 specify the terminal coding system when using multibyte text, so that | |
963 Emacs knows which characters the terminal can actually handle. | |
964 | |
965 By default, output to the terminal is not translated at all, unless | |
33745
78ec4a7ba765
(Undisplayable Characters): New node.
Dave Love <fx@gnu.org>
parents:
32386
diff
changeset
|
966 Emacs can deduce the proper coding system from your terminal type or |
967 your locale specification (@pxref{Language Environments}). |
25829 | 968 |
969 @kindex C-x RET k | |
970 @findex set-keyboard-coding-system | |
34691 | 971 @vindex keyboard-coding-system |
25829 | 972 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system}) |
34691 | 973 or the Custom option @code{keyboard-coding-system} |
25829 | 974 specifies the coding system for keyboard input. Character-code |
975 translation of keyboard input is useful for terminals with keys that | |
976 send non-ASCII graphic characters---for example, some terminals designed | |
977 for ISO Latin-1 or subsets of it. | |
978 | |
979 By default, keyboard input is not translated at all. | |
980 | |
981 There is a similarity between using a coding system translation for | |
982 keyboard input, and using an input method: both define sequences of | |
983 keyboard input that translate into single characters. However, input | |
984 methods are designed to be convenient for interactive use by humans, and | |
985 the sequences that are translated are typically sequences of ASCII | |
986 printing characters. Coding systems typically translate sequences of | |
987 non-graphic characters. | |
988 | |
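  The terminal and keyboard coding systems can also be set from Lisp,
for instance in an init file used on a text terminal.  A minimal
sketch, assuming the terminal both displays and sends ISO Latin-1:

@example
;; Encode text sent to the terminal as ISO Latin-1.
(set-terminal-coding-system 'iso-latin-1)
;; Decode 8-bit keyboard input as ISO Latin-1.
(set-keyboard-coding-system 'iso-latin-1)
@end example
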
989 @kindex C-x RET x | |
990 @kindex C-x RET X | |
991 @findex set-selection-coding-system | |
992 @findex set-next-selection-coding-system | |
993 The command @kbd{C-x @key{RET} x} (@code{set-selection-coding-system}) | |
994 specifies the coding system for sending selected text to the window | |
995 system, and for receiving the text of selections made in other | |
996 applications. This command applies to all subsequent selections, until | |
997 you override it by using the command again. The command @kbd{C-x | |
998 @key{RET} X} (@code{set-next-selection-coding-system}) specifies the | |
999 coding system for the next selection made in Emacs or read by Emacs. | |
1000 | |
1001 @kindex C-x RET p | |
1002 @findex set-buffer-process-coding-system | |
1003 The command @kbd{C-x @key{RET} p} (@code{set-buffer-process-coding-system}) | |
1004 specifies the coding system for input and output to a subprocess. This | |
1005 command applies to the current buffer; normally, each subprocess has its | |
1006 own buffer, and thus you can use this command to specify translation to | |
1007 and from a particular subprocess by giving the command in the | |
1008 corresponding buffer. | |
1009 | |
29826
05c0499d035a
(set-buffer-process-coding-system): Documentation fixed.
Kenichi Handa <handa@m17n.org>
parents:
29107
diff
changeset
|
1010 The default for translation of process input and output depends on the |
1011 current language environment. |
25829 | 1012 |
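  From Lisp, @code{set-buffer-process-coding-system} takes the decoding
and encoding systems as two arguments.  The following is a sketch,
assuming a shell is already running in a buffer named @samp{*shell*}
and that the subprocess reads and writes ISO Latin-1:

@example
(with-current-buffer "*shell*"
  ;; First argument: how to decode output from the process.
  ;; Second argument: how to encode input sent to the process.
  (set-buffer-process-coding-system 'iso-latin-1 'iso-latin-1))
@end example
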
1013 @vindex file-name-coding-system | |
37019
1deafff9fd1f
(Language Environments): Explain how to update the X
Eli Zaretskii <eliz@gnu.org>
parents:
36875
diff
changeset
|
1014 @cindex file names with non-ASCII characters |
25829 | 1015 The variable @code{file-name-coding-system} specifies a coding system |
1016 to use for encoding file names. If you set the variable to a coding | |
1017 system name (as a Lisp symbol or a string), Emacs encodes file names | |
1018 using that coding system for all file operations. This makes it | |
1019 possible to use non-ASCII characters in file names---or, at least, those | |
1020 non-ASCII characters which the specified coding system can encode. | |
1021 | |
1022 If @code{file-name-coding-system} is @code{nil}, Emacs uses a default | |
1023 coding system determined by the selected language environment. In the | |
1024 default language environment, any non-ASCII characters in file names are | |
1025 not encoded specially; they appear in the file system using the internal | |
1026 Emacs representation. | |
1027 | |
1028 @strong{Warning:} if you change @code{file-name-coding-system} (or the | |
1029 language environment) in the middle of an Emacs session, problems can | |
1030 result if you have already visited files whose names were encoded using | |
1031 the earlier coding system and cannot be encoded (or are encoded | |
1032 differently) under the new coding system. If you try to save one of | |
1033 these buffers under the visited file name, saving may use the wrong file | |
1034 name, or it may get an error. If such a problem happens, use @kbd{C-x | |
1035 C-w} to specify a new file name for that buffer. | |
1036 | |
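  Given the warning above, the safest place to set this variable is
early in your init file, before any files are visited.  A sketch,
assuming your file system stores file names in ISO Latin-1:

@example
(setq file-name-coding-system 'iso-latin-1)
@end example
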
26140
068f7ad41d40
Describe new functions and variables for locales.
Paul Eggert <eggert@twinsun.com>
parents:
25829
diff
changeset
|
1037 @vindex locale-coding-system |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
1038 The variable @code{locale-coding-system} specifies a coding system |
1039 to use when encoding and decoding system strings such as system error |
1040 messages and @code{format-time-string} formats and time stamps. You |
1041 should choose a coding system that is compatible with the underlying |
1042 system's text representation, which is normally specified by one of |
1043 the environment variables @env{LC_ALL}, @env{LC_CTYPE}, and |
1044 @env{LANG}. (The first one whose value is nonempty is the one that |
1045 determines the text representation.) |
26140
068f7ad41d40
Describe new functions and variables for locales.
Paul Eggert <eggert@twinsun.com>
parents:
25829
diff
changeset
|
1046 |
25829 | 1047 @node Fontsets |
1048 @section Fontsets | |
1049 @cindex fontsets | |
1050 | |
35188
94d46968a93f
Don't say "X Windows". From Colin Walters <walters@cis.ohio-state.edu>.
Eli Zaretskii <eliz@gnu.org>
parents:
35163
diff
changeset
|
1051 A font for X typically defines shapes for one alphabet or script. |
1052 Therefore, displaying the entire range of scripts that Emacs supports |
1053 requires a collection of many fonts. In Emacs, such a collection is |
1054 called a @dfn{fontset}. A fontset is defined by a list of fonts, each |
1055 assigned to handle a range of character codes. |
25829 | 1056 |
1057 Each fontset has a name, like a font. The available X fonts are | |
1058 defined by the X server; fontsets, however, are defined within Emacs | |
1059 itself. Once you have defined a fontset, you can use it within Emacs by | |
1060 specifying its name, anywhere that you could use a single font. Of | |
1061 course, Emacs fontsets can use only the fonts that the X server | |
1062 supports; if certain characters appear on the screen as hollow boxes, | |
1063 this means that the fontset in use for them has no font for those | |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
1064 characters.@footnote{The Emacs installation instructions have information on |
33745
78ec4a7ba765
(Undisplayable Characters): New node.
Dave Love <fx@gnu.org>
parents:
32386
diff
changeset
|
1065 additional font support.} |
25829 | 1066 |
1067 Emacs creates two fontsets automatically: the @dfn{standard fontset} | |
1068 and the @dfn{startup fontset}. The standard fontset is most likely to | |
1069 have fonts for a wide variety of non-ASCII characters; however, this is | |
1070 not the default for Emacs to use. (By default, Emacs tries to find a | |
1071 font which has bold and italic variants.) You can specify use of the | |
1072 standard fontset with the @samp{-fn} option, or with the @samp{Font} X | |
1073 resource (@pxref{Font X}). For example, | |
1074 | |
1075 @example | |
1076 emacs -fn fontset-standard | |
1077 @end example | |
1078 | |
1079 A fontset does not necessarily specify a font for every character | |
1080 code. If a fontset specifies no font for a certain character, or if it | |
1081 specifies a font that does not exist on your system, then it cannot | |
1082 display that character properly. It will display that character as an | |
1083 empty box instead. | |
1084 | |
1085 @vindex highlight-wrong-size-font | |
1086 The fontset height and width are determined by the ASCII characters | |
1087 (that is, by the font used for ASCII characters in that fontset). If | |
1088 another font in the fontset has a different height, or a different | |
1089 width, then characters assigned to that font are clipped to the | |
1090 fontset's size. If @code{highlight-wrong-size-font} is non-@code{nil}, | |
1091 a box is displayed around these wrong-size characters as well. | |
1092 | |
1093 @node Defining Fontsets | |
1094 @section Defining Fontsets | |
1095 | |
1096 @vindex standard-fontset-spec | |
1097 @cindex standard fontset | |
1098 Emacs creates a standard fontset automatically according to the value | |
1099 of @code{standard-fontset-spec}. This fontset's name is | |
1100 | |
1101 @example | |
1102 -*-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-standard | |
1103 @end example | |
1104 | |
1105 @noindent | |
1106 or just @samp{fontset-standard} for short. | |
1107 | |
1108 Bold, italic, and bold-italic variants of the standard fontset are | |
1109 created automatically. Their names have @samp{bold} instead of | |
1110 @samp{medium}, or @samp{i} instead of @samp{r}, or both. | |
1111 | |
1112 @cindex startup fontset | |
1113 If you specify a default ASCII font with the @samp{Font} resource or | |
1114 the @samp{-fn} argument, Emacs generates a fontset from it | |
1115 automatically. This is the @dfn{startup fontset} and its name is | |
1116 @code{fontset-startup}. Emacs does this by replacing the @var{foundry}, | |
1117 @var{family}, @var{add_style}, and @var{average_width} fields of the | |
1118 font name with @samp{*}, replacing the @var{charset_registry} field with | |
1119 @samp{fontset}, and replacing the @var{charset_encoding} field with | |
1120 @samp{startup}, then using the resulting string to specify a fontset. | |
1121 | |
1122 For instance, if you start Emacs this way, | |
1123 | |
1124 @example | |
1125 emacs -fn "*courier-medium-r-normal--14-140-*-iso8859-1" | |
1126 @end example | |
1127 | |
1128 @noindent | |
1129 Emacs generates the following fontset and uses it for the initial X | |
1130 window frame: | |
1131 | |
1132 @example | |
1133 -*-*-medium-r-normal-*-14-140-*-*-*-*-fontset-startup | |
1134 @end example | |
1135 | |
1136 With the X resource @samp{Emacs.Font}, you can specify a fontset name | |
1137 just like an actual font name. But be careful not to specify a fontset | |
1138 name in a wildcard resource like @samp{Emacs*Font}---that wildcard | |
1139 specification applies to various other purposes, such as menus, and | |
1140 menus cannot handle fontsets. | |
1141 | |
1142 You can specify additional fontsets using X resources named | |
1143 @samp{Fontset-@var{n}}, where @var{n} is an integer starting from 0. | |
1144 The resource value should have this form: | |
1145 | |
1146 @smallexample | |
1147 @var{fontpattern}, @r{[}@var{charsetname}:@var{fontname}@r{]@dots{}} | |
1148 @end smallexample | |
1149 | |
1150 @noindent | |
1151 @var{fontpattern} should have the form of a standard X font name, except | |
1152 for the last two fields. They should have the form | |
1153 @samp{fontset-@var{alias}}. | |
1154 | |
1155 The fontset has two names, one long and one short. The long name is | |
1156 @var{fontpattern}. The short name is @samp{fontset-@var{alias}}. You | |
1157 can refer to the fontset by either name. | |
1158 | |
1159 The construct @samp{@var{charset}:@var{font}} specifies which font to | |
1160 use (in this fontset) for one particular character set. Here, | |
1161 @var{charset} is the name of a character set, and @var{font} is the | |
1162 font to use for that character set. You can use this construct any | |
1163 number of times in defining one fontset. | |
1164 | |
1165 For the other character sets, Emacs chooses a font based on | |
1166 @var{fontpattern}. It replaces @samp{fontset-@var{alias}} with values | |
1167 that describe the character set. For the ASCII character font, | |
1168 @samp{fontset-@var{alias}} is replaced with @samp{ISO8859-1}. | |
1169 | |
1170 In addition, when several consecutive fields are wildcards, Emacs | |
1171 collapses them into a single wildcard. This is to prevent use of | |
1172 auto-scaled fonts. Fonts made by scaling larger fonts are not usable | |
1173 for editing, and scaling a smaller font is not useful because it is | |
1174 better to use the smaller font in its own size, which Emacs does. | |
1175 | |
1176 Thus if @var{fontpattern} is this, | |
1177 | |
1178 @example | |
1179 -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24 | |
1180 @end example | |
1181 | |
1182 @noindent | |
1183 the font specification for ASCII characters would be this: | |
1184 | |
1185 @example | |
1186 -*-fixed-medium-r-normal-*-24-*-ISO8859-1 | |
1187 @end example | |
1188 | |
1189 @noindent | |
1190 and the font specification for Chinese GB2312 characters would be this: | |
1191 | |
1192 @example | |
1193 -*-fixed-medium-r-normal-*-24-*-gb2312*-* | |
1194 @end example | |
1195 | |
1196 You may not have any Chinese font matching the above font | |
1197 specification. Most X distributions include only Chinese fonts that | |
1198 have @samp{song ti} or @samp{fangsong ti} in the @var{family} field. In | |
1199 such a case, @samp{Fontset-@var{n}} can be specified as below: | |
1200 | |
1201 @smallexample | |
1202 Emacs.Fontset-0: -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24,\ | |
1203 chinese-gb2312:-*-*-medium-r-normal-*-24-*-gb2312*-* | |
1204 @end smallexample | |
1205 | |
1206 @noindent | |
1207 Then, the font specifications for all but Chinese GB2312 characters have | |
1208 @samp{fixed} in the @var{family} field, and the font specification for | |
1209 Chinese GB2312 characters has a wildcard @samp{*} in the @var{family} | |
1210 field. | |
1211 | |
1212 @findex create-fontset-from-fontset-spec | |
1213 The function that processes the fontset resource value to create the | |
1214 fontset is called @code{create-fontset-from-fontset-spec}. You can also | |
1215 call this function explicitly to create a fontset. | |
1216 | |
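  As a sketch of such an explicit call, the following passes the same
value as the @samp{Emacs.Fontset-0} resource shown earlier.  It assumes
Emacs is running under X and that fonts matching both patterns are
installed; the string is split with @code{concat} only to keep the
lines short.

@example
(create-fontset-from-fontset-spec
 (concat "-*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24,"
         "chinese-gb2312:-*-*-medium-r-normal-*-24-*-gb2312*-*"))
@end example

@noindent
Afterward the fontset can be referred to by its short name,
@samp{fontset-24}, wherever a font name is accepted.
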
1217 @xref{Font X}, for more information about font naming in X. | |
1218 | |
33745
78ec4a7ba765
(Undisplayable Characters): New node.
Dave Love <fx@gnu.org>
parents:
32386
diff
changeset
|
1219 @node Undisplayable Characters |
1220 @section Undisplayable Characters |
1221 |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
1222 Your terminal may be unable to display some non-@sc{ascii} |
1223 characters. Most non-windowing terminals can only use a single |
1224 character set (use the variable @code{default-terminal-coding-system} |
1225 (@pxref{Specify Coding}) to tell Emacs which one); characters which |
1226 can't be encoded in that coding system are displayed as @samp{?} by |
1227 default. |
1228 |
1229 Windowing terminals can display a broader range of characters, but |
1230 you may not have fonts installed for all of them; characters that have |
1231 no font appear as a hollow box. |
33745
78ec4a7ba765
(Undisplayable Characters): New node.
Dave Love <fx@gnu.org>
parents:
32386
diff
changeset
|
1232 |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
1233 If you use Latin-1 characters but your terminal can't display |
1234 Latin-1, you can arrange to display mnemonic @sc{ascii} sequences |
1235 instead, e.g.@: @samp{"o} for o-umlaut. Load the library |
1236 @file{iso-ascii} to do this. |
33745
78ec4a7ba765
(Undisplayable Characters): New node.
Dave Love <fx@gnu.org>
parents:
32386
diff
changeset
|
1237 |
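  To load @file{iso-ascii} in every session, you could put a line like
this in your init file.  This is a sketch that assumes your terminal
really cannot display Latin-1; on a terminal that can, the mnemonic
sequences would only get in the way.

@example
(load-library "iso-ascii")
@end example
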
36875 | 1238 @vindex latin1-display |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
1239 If your terminal can display Latin-1, you can display characters |
1240 from other European character sets using a mixture of equivalent |
1241 Latin-1 characters and @sc{ascii} mnemonics. Use the Custom option |
1242 @code{latin1-display} to enable this. The mnemonic @sc{ascii} |
1243 sequences mostly correspond to those of the prefix input methods. |
33745
78ec4a7ba765
(Undisplayable Characters): New node.
Dave Love <fx@gnu.org>
parents:
32386
diff
changeset
|
1244 |
27211
0699f691fac1
Don't conflate single-byte with European.
Dave Love <fx@gnu.org>
parents:
27156
diff
changeset
|
1245 @node Single-Byte Character Support |
1246 @section Single-byte Character Set Support |
25829 | 1247 |
1248 @cindex European character sets | |
1249 @cindex accented characters | |
1250 @cindex ISO Latin character sets | |
1251 @cindex Unibyte operation | |
1252 The ISO 8859 Latin-@var{n} character sets define character codes in | |
1253 the range 160 to 255 to handle the accented letters and punctuation | |
27211
0699f691fac1
Don't conflate single-byte with European.
Dave Love <fx@gnu.org>
parents:
27156
diff
changeset
|
1254 needed by various European languages (and some non-European ones). |
1255 If you disable multibyte |
25829 | 1256 characters, Emacs can still handle @emph{one} of these character sets |
1257 at a time. To specify @emph{which} of these sets to use, invoke | |
1258 @kbd{M-x set-language-environment} and specify a suitable language | |
1259 environment such as @samp{Latin-@var{n}}. | |
1260 | |
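  The same selection can be made noninteractively, for example from an
init file.  A sketch, assuming Latin-1 is the character set you want:

@example
(set-language-environment "Latin-1")
@end example
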
1261 For more information about unibyte operation, see @ref{Enabling | |
1262 Multibyte}. Note particularly that you probably want to ensure that | |
1263 your initialization files are read as unibyte if they contain non-ASCII | |
1264 characters. | |
1265 | |
1266 @vindex unibyte-display-via-language-environment | |
1267 Emacs can also display those characters, provided the terminal or font | |
1268 in use supports them. This works automatically. Alternatively, if you | |
1269 are using a window system, Emacs can also display single-byte characters | |
1270 through fontsets, in effect by displaying the equivalent multibyte | |
1271 characters according to the current language environment. To request | |
1272 this, set the variable @code{unibyte-display-via-language-environment} | |
1273 to a non-@code{nil} value. | |
1274 | |
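  For instance, in an init file (a sketch; it has a visible effect only
on a window system with suitable fontsets available):

@example
(setq unibyte-display-via-language-environment t)
@end example
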
1275 @cindex @code{iso-ascii} library | |
1276 If your terminal does not support display of the Latin-1 character | |
1277 set, Emacs can display these characters as ASCII sequences which at | |
1278 least give you a clear idea of what the characters are. To do this, | |
1279 load the library @code{iso-ascii}. Similar libraries for other | |
1280 Latin-@var{n} character sets could be implemented, but we don't have | |
1281 them yet. | |
1282 | |
1283 @findex standard-display-8bit | |
1284 @cindex 8-bit display | |
1285 Normally non-ISO-8859 characters (character codes 128 through 159 | |
1286 inclusive) are displayed as octal escapes. You can change this for | |
36185 | 1287 non-standard ``extended'' versions of ISO-8859 character sets by using the |
25829 | 1288 function @code{standard-display-8bit} in the @code{disp-table} library. |
1289 | |
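  A minimal sketch of such a call, assuming your terminal or font uses
an extended character set with printable glyphs for codes 128 through
159:

@example
(require 'disp-table)
;; Display codes 128 through 159 literally
;; instead of as octal escapes.
(standard-display-8bit 128 159)
@end example
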
28552 | 1290 There are several ways you can input single-byte non-ASCII |
25829 | 1291 characters: |
1292 | |
1293 @itemize @bullet | |
27211
0699f691fac1
Don't conflate single-byte with European.
Dave Love <fx@gnu.org>
parents:
27156
diff
changeset
|
1294 @cindex 8-bit input |
25829 | 1295 @item |
1296 If your keyboard can generate character codes 128 and up, representing | |
38050
89031b4b9a28
Proofreading fixes from Tim Sanders <tim@timsanders.freeserve.co.uk>.
Eli Zaretskii <eliz@gnu.org>
parents:
37870
diff
changeset
|
1297 non-ASCII characters, you can type those character codes directly. |
25829 | 1298 |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
1299 On a windowing terminal, you should not need to do anything special to |
1300 use these keys; they should simply work. On a text-only terminal, you |
1301 should use the command @kbd{M-x set-keyboard-coding-system} or the |
1302 Custom option @code{keyboard-coding-system} to specify which coding |
1303 system your keyboard uses (@pxref{Specify Coding}). Enabling this |
1304 feature will probably require you to use @kbd{ESC} to type Meta |
1305 characters; however, on a Linux console or in @code{xterm}, you can |
1306 arrange for Meta to be converted to @kbd{ESC} and still be able to type |
1307 8-bit characters present directly on the keyboard or using |
1308 @kbd{Compose} or @kbd{AltGr} keys. @xref{User Input}. |
27211
0699f691fac1
Don't conflate single-byte with European.
Dave Love <fx@gnu.org>
parents:
27156
diff
changeset
|
1309 |
25829 | 1310 @item |
1311 You can use an input method for the selected language environment. | |
1312 @xref{Input Methods}. When you use an input method in a unibyte buffer, | |
1313 the non-ASCII character you specify with it is converted to unibyte. | |
1314 | |
1315 @kindex C-x 8 | |
1316 @cindex @code{iso-transl} library | |
31077 | 1317 @cindex compose character |
1318 @cindex dead character | |
25829 | 1319 @item |
1320 For Latin-1 only, you can use the | |
1321 key @kbd{C-x 8} as a ``compose character'' prefix for entry of | |
1322 non-ASCII Latin-1 printing characters. @kbd{C-x 8} is good for | |
1323 insertion (in the minibuffer as well as other buffers), for searching, | |
1324 and in any other context where a key sequence is allowed. | |
1325 | |
1326 @kbd{C-x 8} works by loading the @code{iso-transl} library. Once that | |
1327 library is loaded, the @key{ALT} modifier key, if you have one, serves | |
1328 the same purpose as @kbd{C-x 8}; use @key{ALT} together with an accent | |
1329 character to modify the following letter. In addition, if you have keys | |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
1330 for the Latin-1 ``dead accent characters,'' they too are defined to |
25829 | 1331 compose with the following character, once @code{iso-transl} is loaded. |
28552 | 1332 Use @kbd{C-x 8 C-h} to list the available translations as mnemonic |
1333 command names. | |
1334 | |
31077 | 1335 @item |
28552 | 1336 @cindex @code{iso-acc} library |
31077 | 1337 @cindex ISO Accents mode |
1338 @findex iso-accents-mode | |
31280
55ce1d116cc7
(Single-Byte Character Support): Modify iso-accents-mode index entry.
Eli Zaretskii <eliz@gnu.org>
parents:
31277
diff
changeset
|
1339 @cindex Latin-1, Latin-2 and Latin-3 input mode |
38133 | 1340 For Latin-1, Latin-2 and Latin-3, @kbd{M-x iso-accents-mode} enables |
1341 a minor mode that works much like the @code{latin-1-prefix} input | |
38050
89031b4b9a28
Proofreading fixes from Tim Sanders <tim@timsanders.freeserve.co.uk>.
Eli Zaretskii <eliz@gnu.org>
parents:
37870
diff
changeset
|
1342 method, but does not depend on having the input methods installed. This |
36170
0fd801cdb9fd
Clarify undisplayable characters, --unibyte, locales.
Richard M. Stallman <rms@gnu.org>
parents:
35206
diff
changeset
|
1343 mode is buffer-local. It can be customized for various languages with |
1344 @kbd{M-x iso-accents-customize}. |
25829 | 1345 @end itemize |
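
  Two of the input facilities listed above can be set up in advance
from an init file.  The following is a sketch rather than a recommended
configuration; in particular, enabling ISO Accents mode through
@code{text-mode-hook} is an assumption about where you want it.

@example
;; Load iso-transl up front, so that the ALT modifier and
;; dead-accent keys compose without typing C-x 8 first.
(require 'iso-transl)
;; ISO Accents mode is buffer-local, so turn it on
;; in each Text mode buffer as it is created.
(add-hook 'text-mode-hook
          (lambda () (iso-accents-mode 1)))
@end example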