annotate man/mule.texi @ 28285:c54d62415e91
Changed the type of parameter passed to the
function defined by `quickurl-format-function'. Before only the
text of the URL was passed. Now the whole URL structure is passed
and the function is responsible for extracting the parts it
requires. Changed the default of `quickurl-format-function'
accordingly.
(quickurl-insert): Changed the `funcall' of
`quickurl-format-function' to match the above change.
(quickurl-list-insert): Changed the `url' case so that it makes
use of `quickurl-format-function', previous to this the format was
hard wired.
author | Gerd Moellmann <gerd@gnu.org> |
---|---|
date | Thu, 23 Mar 2000 13:53:14 +0000 |
parents | 0699f691fac1 |
children | ccadb68eaefd |
rev | line source |
---|---|
25829 | 1 @c This is part of the Emacs manual. |
2 @c Copyright (C) 1997, 1999 Free Software Foundation, Inc. | |
3 @c See file emacs.texi for copying conditions. | |
4 @node International, Major Modes, Frames, Top | |
5 @chapter International Character Set Support | |
6 @cindex MULE | |
7 @cindex international scripts | |
8 @cindex multibyte characters | |
9 @cindex encoding of characters | |
10 | |
11 @cindex Chinese | |
26140
068f7ad41d40
Describe new functions and variables for locales.
Paul Eggert <eggert@twinsun.com>
parents:
25829
diff
changeset
|
12 @cindex Cyrillic |
25829 | 13 @cindex Devanagari |
14 @cindex Hindi | |
15 @cindex Marathi | |
26140 | 16 @cindex Ethiopic |
25829 | 17 @cindex Greek |
26140 | 18 @cindex Hebrew |
25829 | 19 @cindex IPA |
20 @cindex Japanese | |
21 @cindex Korean | |
22 @cindex Lao | |
23 @cindex Thai | |
24 @cindex Tibetan | |
25 @cindex Vietnamese | |
26 Emacs supports a wide variety of international character sets, | |
27 including European variants of the Latin alphabet, as well as Chinese, | |
26140 | 28 Cyrillic, Devanagari (Hindi and Marathi), Ethiopic, Greek, Hebrew, IPA, |
29 Japanese, Korean, Lao, Thai, Tibetan, and Vietnamese scripts. These features |
25829 | 30 have been merged from the modified version of Emacs known as MULE (for |
31 ``MULti-lingual Enhancement to GNU Emacs''). | |
32 | |
33 @menu | |
34 * International Intro:: Basic concepts of multibyte characters. | |
35 * Enabling Multibyte:: Controlling whether to use multibyte characters. | |
36 * Language Environments:: Setting things up for the language you use. | |
37 * Input Methods:: Entering text characters not on your keyboard. | |
38 * Select Input Method:: Specifying your choice of input methods. | |
39 * Multibyte Conversion:: How single-byte characters convert to multibyte. | |
40 * Coding Systems:: Character set conversion when you read and | |
41 write files, and so on. | |
42 * Recognize Coding:: How Emacs figures out which conversion to use. | |
43 * Specify Coding:: Various ways to choose which conversion to use. | |
44 * Fontsets:: Fontsets are collections of fonts | |
45 that cover the whole spectrum of characters. | |
46 * Defining Fontsets:: Defining a new fontset. | |
27211
0699f691fac1
Don't conflate single-byte with European.
Dave Love <fx@gnu.org>
parents:
27156
diff
changeset
|
47 * Single-Byte Character Support:: |
25829 | 48 You can pick one European character set |
49 to use without multibyte characters. | |
50 @end menu | |
51 | |
52 @node International Intro | |
53 @section Introduction to International Character Sets | |
54 | |
55 The users of these scripts have established many more-or-less standard | |
56 coding systems for storing files. Emacs internally uses a single | |
57 multibyte character encoding, so that it can intermix characters from | |
58 all these scripts in a single buffer or string. This encoding | |
59 represents each non-ASCII character as a sequence of bytes in the range | |
60 0200 through 0377 (octal). Emacs translates between the multibyte character | |
61 encoding and various other coding systems when reading and writing | |
62 files, when exchanging data with subprocesses, and (in some cases) in | |
63 the @kbd{C-q} command (@pxref{Multibyte Conversion}). | |
64 | |
65 @kindex C-h h | |
66 @findex view-hello-file | |
67 The command @kbd{C-h h} (@code{view-hello-file}) displays the file | |
68 @file{etc/HELLO}, which shows how to say ``hello'' in many languages. | |
27156
488f307b4f59
(International Intro): Add a link to to Fontsets.
Gerd Moellmann <gerd@gnu.org>
parents:
26513
diff
changeset
|
69 This illustrates various scripts. If the font you're using doesn't have |
70 characters for all those different languages, you will see some hollow |
71 boxes instead of characters; see @ref{Fontsets}. |
25829 | 72 |
73 Keyboards, even in the countries where these character sets are used, | |
74 generally don't have keys for all the characters in them. So Emacs | |
75 supports various @dfn{input methods}, typically one for each script or | |
76 language, to make it convenient to type them. | |
77 | |
78 @kindex C-x RET | |
79 The prefix key @kbd{C-x @key{RET}} is used for commands that pertain | |
80 to multibyte characters, coding systems, and input methods. | |
81 | |
82 @node Enabling Multibyte | |
83 @section Enabling Multibyte Characters | |
84 | |
85 You can enable or disable multibyte character support, either for | |
86 Emacs as a whole, or for a single buffer. When multibyte characters are | |
87 disabled in a buffer, then each byte in that buffer represents a | |
88 character, even codes 0200 through 0377. The old features for | |
89 supporting the European character sets, ISO Latin-1 and ISO Latin-2, | |
90 work as they did in Emacs 19 and also work for the other ISO 8859 | |
91 character sets. | |
92 | |
93 However, there is no need to turn off multibyte character support to | |
94 use ISO Latin; the Emacs multibyte character set includes all the | |
95 characters in these character sets, and Emacs can translate | |
96 automatically to and from the ISO codes. | |
97 | |
98 To edit a particular file in unibyte representation, visit it using | |
99 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in | |
100 multibyte representation into a single-byte representation of the same | |
101 characters, the easiest way is to save the contents in a file, kill the | |
102 buffer, and find the file again with @code{find-file-literally}. You | |
103 can also use @kbd{C-x @key{RET} c} | |
104 (@code{universal-coding-system-argument}) and specify @samp{raw-text} as | |
105 the coding system with which to find or save a file. @xref{Specify | |
106 Coding}. Finding a file as @samp{raw-text} doesn't disable format | |
107 conversion, uncompression and auto mode selection as | |
108 @code{find-file-literally} does. | |
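The two ways of visiting a file described above can be sketched in Lisp as follows. This is only an illustration: the file name is a placeholder, and binding @code{coding-system-for-read} around @code{find-file} is assumed here to be the programmatic counterpart of typing @kbd{C-x @key{RET} c raw-text @key{RET}} first.

```elisp
;; Sketch only; "/tmp/sample.txt" is a placeholder file name.
;; Use one form or the other, not both on the same file.

;; Visit the file with no conversion of any kind (unibyte buffer).
(find-file-literally "/tmp/sample.txt")

;; Alternatively, visit the file as raw-text: characters are not
;; decoded, but format conversion, uncompression and auto mode
;; selection still take place.
(let ((coding-system-for-read 'raw-text))
  (find-file "/tmp/sample.txt"))
```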
109 | |
110 @vindex enable-multibyte-characters | |
111 @vindex default-enable-multibyte-characters | |
112 To turn off multibyte character support by default, start Emacs with | |
113 the @samp{--unibyte} option (@pxref{Initial Options}), or set the | |
114 environment variable @samp{EMACS_UNIBYTE}. You can also customize | |
115 @code{enable-multibyte-characters} or, equivalently, directly set the | |
116 variable @code{default-enable-multibyte-characters} in your init file to | |
117 have basically the same effect as @samp{--unibyte}. | |
118 | |
119 Multibyte strings are not created during initialization from the | |
120 values of environment variables, @file{/etc/passwd} entries etc.@: that | |
121 contain non-ASCII 8-bit characters. However, the initialization file is | |
122 normally read as multibyte---like Lisp files in general---even with | |
123 @samp{--unibyte}. To avoid multibyte strings being generated by | |
124 non-ASCII characters in it, put @samp{-*-unibyte: t;-*-} in a comment on | |
125 the first line. Do the same for initialization files for packages like | |
126 Gnus. | |
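For instance, the top of such an initialization file might look like the fragment below. The cookie is the one given in the text; everything after the first line is ordinary file content.

```elisp
;; -*-unibyte: t;-*-
;; The cookie above must be on the first line of the file.
;; The rest of the initialization file follows as usual.
```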
127 | |
128 The mode line indicates whether multibyte character support is enabled | |
129 in the current buffer. If it is, there are two or more characters (most | |
130 often two dashes) before the colon near the beginning of the mode line. | |
131 When multibyte characters are not enabled, just one dash precedes the | |
132 colon. | |
133 | |
134 @node Language Environments | |
135 @section Language Environments | |
136 @cindex language environments | |
137 | |
138 All supported character sets are available in Emacs buffers whenever | |
139 multibyte characters are enabled; there is no need to select a | |
140 particular language in order to display its characters in an Emacs | |
141 buffer. However, it is important to select a @dfn{language environment} | |
142 in order to set various defaults. The language environment really | |
143 represents a choice of preferred script (more or less) rather than a | |
144 choice of language. | |
145 | |
146 The language environment controls which coding systems to recognize | |
147 when reading text (@pxref{Recognize Coding}). This applies to files, | |
148 incoming mail, netnews, and any other text you read into Emacs. It may | |
149 also specify the default coding system to use when you create a file. | |
150 Each language environment also specifies a default input method. | |
151 | |
152 @findex set-language-environment | |
26140 | 153 @vindex current-language-environment |
154 To select a language environment, customize the option |
155 @code{current-language-environment} or use the command @kbd{M-x |
25829 | 156 set-language-environment}. It makes no difference which buffer is |
157 current when you use this command, because the effects apply globally to | |
158 the Emacs session. The supported language environments include: | |
159 | |
160 @quotation | |
26140 | 161 Chinese-BIG5, Chinese-CNS, Chinese-GB, Cyrillic-ALT, Cyrillic-ISO, |
162 Cyrillic-KOI8, Czech, Devanagari, English, Ethiopic, German, Greek, |
163 Hebrew, IPA, Japanese, Korean, Lao, Latin-1, Latin-2, Latin-3, |
164 Latin-4, Latin-5, Latin-8, Latin-9, Romanian, Slovak, Slovenian, Thai, |
165 Tibetan, Turkish, and Vietnamese. |
25829 | 166 @end quotation |
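The same selection can be made from Lisp, for example in an init file. A minimal sketch, assuming Latin-1 is the environment you want; any other name from the list above could be substituted:

```elisp
;; Select the Latin-1 language environment for the whole Emacs session.
(set-language-environment "Latin-1")
```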
167 | |
26140 | 168 @findex set-locale-environment |
169 @vindex locale-language-names |
170 @vindex locale-charset-language-names |
25829 | 171 Some operating systems let you specify the language you are using by |
26140 | 172 setting the locale environment variables @env{LC_ALL}, @env{LC_CTYPE}, |
173 and @env{LANG}; the first of these which is nonempty specifies your |
174 locale. Emacs handles this during startup by invoking the |
175 @code{set-locale-environment} function, which matches your locale |
176 against entries in the value of the variable |
177 @code{locale-language-names} and selects the corresponding language |
178 environment if a match is found. But if your locale also matches an |
179 entry in the variable @code{locale-charset-language-names}, this entry |
180 is preferred if its character set disagrees. For example, suppose the |
181 locale @samp{en_GB.ISO8859-15} matches @code{"Latin-1"} in |
182 @code{locale-language-names} and @code{"Latin-9"} in |
183 @code{locale-charset-language-names}; since these two language |
184 environments' character sets disagree, Emacs uses @code{"Latin-9"}. |
185 |
26513
949ca235ee9e
Describe the relationship between set-locale-environment and
Paul Eggert <eggert@twinsun.com>
parents:
26140
diff
changeset
|
186 If all goes well, the @code{set-locale-environment} function selects |
187 the language environment, since language is part of locale. It also |
188 adjusts the display table and terminal coding system, the locale coding |
189 system, and the preferred coding system as needed for the locale. |
190 |
191 Since the @code{set-locale-environment} function is automatically |
192 invoked during startup, you normally do not need to invoke it yourself. |
193 However, if you modify the @env{LC_ALL}, @env{LC_CTYPE}, or @env{LANG} |
194 environment variables, you may want to invoke the |
195 @code{set-locale-environment} function afterwards. |
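A rough sketch of doing this from Lisp is shown below. It assumes that @code{set-locale-environment} may be called with no argument, in which case it consults the environment variables, and that neither @env{LC_ALL} nor @env{LC_CTYPE} is set, since those would take precedence over @env{LANG}. The locale name is only an example.

```elisp
;; Change the locale seen by this Emacs session (example value only).
(setenv "LANG" "en_GB.ISO8859-15")

;; Re-run the locale matching that Emacs normally performs at startup.
;; Assumption: with no argument, the function reads LC_ALL, LC_CTYPE
;; and LANG from the environment.
(set-locale-environment)
```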
196 |
26140 | 197 @findex set-locale-environment |
198 @vindex locale-preferred-coding-systems |
199 The @code{set-locale-environment} function normally uses the preferred |
200 coding system established by the language environment to decode system |
201 messages. But if your locale matches an entry in the variable |
202 @code{locale-preferred-coding-systems}, Emacs uses the corresponding |
203 coding system instead. For example, if the locale @samp{ja_JP.PCK} |
204 matches @code{japanese-shift-jis} in |
205 @code{locale-preferred-coding-systems}, Emacs uses that encoding even |
206 though it might normally use @code{japanese-iso-8bit}. |
207 |
208 The environment chosen from the locale when Emacs starts is |
209 overridden by any explicit use of the command |
210 @code{set-language-environment} or customization of |
211 @code{current-language-environment} in your init file. |
25829 | 212 |
213 @kindex C-h L | |
214 @findex describe-language-environment | |
215 To display information about the effects of a certain language | |
216 environment @var{lang-env}, use the command @kbd{C-h L @var{lang-env} | |
217 @key{RET}} (@code{describe-language-environment}). This tells you which | |
218 languages this language environment is useful for, and lists the | |
219 character sets, coding systems, and input methods that go with it. It | |
220 also shows some sample text to illustrate scripts used in this language | |
221 environment. By default, this command describes the chosen language | |
222 environment. | |
223 | |
224 @vindex set-language-environment-hook | |
225 You can customize any language environment with the normal hook | |
226 @code{set-language-environment-hook}. The command | |
227 @code{set-language-environment} runs that hook after setting up the new | |
228 language environment. The hook functions can test for a specific | |
229 language environment by checking the variable | |
230 @code{current-language-environment}. | |
231 | |
232 @vindex exit-language-environment-hook | |
233 Before it starts to set up the new language environment, | |
234 @code{set-language-environment} first runs the hook | |
235 @code{exit-language-environment-hook}. This hook is useful for undoing | |
236 customizations that were made with @code{set-language-environment-hook}. | |
237 For instance, if you set up a special key binding in a specific language | |
238 environment using @code{set-language-environment-hook}, you should set | |
239 up @code{exit-language-environment-hook} to restore the normal binding | |
240 for that key. | |
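The pairing of the two hooks might be sketched as follows. All the @code{my-} names, the choice of the Thai environment, and the @key{F9} key are hypothetical. The sketch also assumes that @code{current-language-environment} still names the environment being left when @code{exit-language-environment-hook} runs.

```elisp
;; Hypothetical helper names; the key and command are arbitrary examples.
(defvar my-lang-env-saved-binding nil
  "Global binding of [f9] saved before the Thai-specific one is installed.")

(defun my-lang-env-setup ()
  "Install a special [f9] binding when the Thai environment is selected."
  (when (equal current-language-environment "Thai")
    (setq my-lang-env-saved-binding (global-key-binding [f9]))
    (global-set-key [f9] 'toggle-input-method)))

(defun my-lang-env-teardown ()
  "Restore the previous [f9] binding when leaving the Thai environment."
  (when (equal current-language-environment "Thai")
    (global-set-key [f9] my-lang-env-saved-binding)))

;; Run the setup after a new environment is in place, and the teardown
;; before the next environment starts to be set up.
(add-hook 'set-language-environment-hook 'my-lang-env-setup)
(add-hook 'exit-language-environment-hook 'my-lang-env-teardown)
```

Saving the old binding in a variable, rather than hard-coding the command to restore, keeps the teardown correct even if the key was rebound elsewhere before the environment was selected.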
241 | |
242 @node Input Methods | |
243 @section Input Methods | |
244 | |
245 @cindex input methods | |
246 An @dfn{input method} is a kind of character conversion designed | |
247 specifically for interactive input. In Emacs, typically each language | |
248 has its own input method; sometimes several languages which use the same | |
249 characters can share one input method. A few languages support several | |
250 input methods. | |
251 | |
252 The simplest kind of input method works by mapping ASCII letters into | |
253 another alphabet. This is how the Greek and Russian input methods work. | |
254 | |
255 A more powerful technique is composition: converting sequences of | |
256 characters into one letter. Many European input methods use composition | |
257 to produce a single non-ASCII letter from a sequence that consists of a | |
258 letter followed by accent characters (or vice versa). For example, some | |
259 methods convert the sequence @kbd{a'} into a single accented letter. | |
260 These input methods have no special commands of their own; all they do | |
261 is compose sequences of printing characters. | |
262 | |
263 The input methods for syllabic scripts typically use mapping followed | |
264 by composition. The input methods for Thai and Korean work this way. | |
265 First, letters are mapped into symbols for particular sounds or tone | |
266 marks; then, sequences of these which make up a whole syllable are | |
267 mapped into one syllable sign. | |
268 | |
269 Chinese and Japanese require more complex methods. In Chinese input | |
270 methods, first you enter the phonetic spelling of a Chinese word (in | |
271 input method @code{chinese-py}, among others), or a sequence of portions | |
272 of the character (input methods @code{chinese-4corner} and | |
273 @code{chinese-sw}, and others). Since one phonetic spelling typically | |
274 corresponds to many different Chinese characters, you must select one of | |
275 the alternatives using special Emacs commands. Keys such as @kbd{C-f}, | |
276 @kbd{C-b}, @kbd{C-n}, @kbd{C-p}, and digits have special definitions in | |
277 this situation, used for selecting among the alternatives. @key{TAB} | |
278 displays a buffer showing all the possibilities. | |
279 | |
280 In Japanese input methods, first you input a whole word using | |
281 phonetic spelling; then, after the word is in the buffer, Emacs converts | |
282 it into one or more characters using a large dictionary. One phonetic | |
283 spelling corresponds to many differently written Japanese words, so you | |
284 must select one of them; use @kbd{C-n} and @kbd{C-p} to cycle through | |
285 the alternatives. | |
286 | |
287 Sometimes it is useful to cut off input method processing so that the | |
288 characters you have just entered will not combine with subsequent | |
289 characters. For example, in input method @code{latin-1-postfix}, the | |
290 sequence @kbd{e '} combines to form an @samp{e} with an accent. What if | |
291 you want to enter them as separate characters? | |
292 | |
293 One way is to type the accent twice; that is a special feature for | |
294 entering the separate letter and accent. For example, @kbd{e ' '} gives | |
295 you the two characters @samp{e'}. Another way is to type another letter | |
296 after the @kbd{e}---something that won't combine with that---and | |
297 immediately delete it. For example, you could type @kbd{e e @key{DEL} | |
298 '} to get separate @samp{e} and @samp{'}. | |
299 | |
300 Another method, more general but not quite as easy to type, is to use | |
301 @kbd{C-\ C-\} between two characters to stop them from combining. This | |
302 is the command @kbd{C-\} (@code{toggle-input-method}) used twice. | |
303 @ifinfo | |
304 @xref{Select Input Method}. | |
305 @end ifinfo | |
306 | |
307 @kbd{C-\ C-\} is especially useful inside an incremental search, | |
308 because it stops waiting for more characters to combine, and starts | |
309 searching for what you have already entered. | |
310 | |
311 @vindex input-method-verbose-flag | |
312 @vindex input-method-highlight-flag | |
313 The variables @code{input-method-highlight-flag} and | |
314 @code{input-method-verbose-flag} control how input methods explain what | |
315 is happening. If @code{input-method-highlight-flag} is non-@code{nil}, | |
316 the partial sequence is highlighted in the buffer. If | |
317 @code{input-method-verbose-flag} is non-@code{nil}, the list of possible | |
318 characters to type next is displayed in the echo area (but not when you | |
319 are in the minibuffer). | |
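Both variables can be set in an init file. The values below are just one possible choice, following the meanings given in the paragraph above:

```elisp
;; Highlight the partial sequence in the buffer, but do not list the
;; possible next characters in the echo area.
(setq input-method-highlight-flag t
      input-method-verbose-flag nil)
```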
320 | |
321 @node Select Input Method | |
322 @section Selecting an Input Method | |
323 | |
324 @table @kbd | |
325 @item C-\ | |
326 Enable or disable use of the selected input method. | |
327 | |
328 @item C-x @key{RET} C-\ @var{method} @key{RET} | |
329 Select a new input method for the current buffer. | |
330 | |
331 @item C-h I @var{method} @key{RET} | |
332 @itemx C-h C-\ @var{method} @key{RET} | |
333 @findex describe-input-method | |
334 @kindex C-h I | |
335 @kindex C-h C-\ | |
336 Describe the input method @var{method} (@code{describe-input-method}). | |
337 By default, it describes the current input method (if any). | |
338 This description should give you the full details of how to | |
339 use any particular input method. | |
340 | |
341 @item M-x list-input-methods | |
342 Display a list of all the supported input methods. | |
343 @end table | |
344 | |
345 @findex set-input-method | |
346 @vindex current-input-method | |
347 @kindex C-x RET C-\ | |
348 To choose an input method for the current buffer, use @kbd{C-x | |
349 @key{RET} C-\} (@code{set-input-method}). This command reads the | |
350 input method name with the minibuffer; the name normally starts with the | |
351 language environment that it is meant to be used with. The variable | |
352 @code{current-input-method} records which input method is selected. | |
353 | |
354 @findex toggle-input-method | |
355 @kindex C-\ | |
356 Input methods use various sequences of ASCII characters to stand for | |
357 non-ASCII characters. Sometimes it is useful to turn off the input | |
358 method temporarily. To do this, type @kbd{C-\} | |
359 (@code{toggle-input-method}). To reenable the input method, type | |
360 @kbd{C-\} again. | |
361 | |
362 If you type @kbd{C-\} and you have not yet selected an input method, | |
363 it prompts you to specify one. This has the same effect as using | |
364 @kbd{C-x @key{RET} C-\} to specify an input method. | |
365 | |
366 @vindex default-input-method | |
367 Selecting a language environment specifies a default input method for | |
368 use in various buffers. When you have a default input method, you can | |
369 select it in the current buffer by typing @kbd{C-\}. The variable | |
370 @code{default-input-method} specifies the default input method | |
371 (@code{nil} means there is none). | |
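To choose the default yourself rather than rely on the language environment, the variable can be set in an init file. A minimal sketch, using @code{latin-1-postfix} only because it is the input method used as an example elsewhere in this chapter:

```elisp
;; Make latin-1-postfix the input method that C-\ turns on.
(setq default-input-method "latin-1-postfix")
```

Setting it back to @code{nil} means there is no default, so @kbd{C-\} prompts for an input method instead.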
372 | |
373 @findex quail-set-keyboard-layout | |
374 Some input methods for alphabetic scripts work by (in effect) | |
375 remapping the keyboard to emulate various keyboard layouts commonly used | |
376 for those scripts. How to do this remapping properly depends on your | |
377 actual keyboard layout. To specify which layout your keyboard has, use | |
378 the command @kbd{M-x quail-set-keyboard-layout}. | |
379 | |
380 @findex list-input-methods | |
381 To display a list of all the supported input methods, type @kbd{M-x | |
382 list-input-methods}. The list gives information about each input | |
383 method, including the string that stands for it in the mode line. | |
384 | |
385 @node Multibyte Conversion | |
386 @section Unibyte and Multibyte Non-ASCII characters | |
387 | |
388 When multibyte characters are enabled, character codes 0240 (octal) | |
389 through 0377 (octal) are not really legitimate in the buffer. The valid | |
390 non-ASCII printing characters have codes that start from 0400. | |
391 | |
392 If you type a self-inserting character in the invalid range 0240 | |
393 through 0377, Emacs assumes you intended to use one of the ISO | |
394 Latin-@var{n} character sets, and converts it to the Emacs code | |
395 representing that Latin-@var{n} character. You select @emph{which} ISO | |
396 Latin character set to use through your choice of language environment | |
397 @iftex | |
398 (see above). | |
399 @end iftex | |
400 @ifinfo | |
401 (@pxref{Language Environments}). | |
402 @end ifinfo | |
403 If you do not specify a choice, the default is Latin-1. | |
404 | |
405 The same thing happens when you use @kbd{C-q} to enter an octal code | |
406 in this range. | |
407 | |
408 @node Coding Systems | |
409 @section Coding Systems | |
410 @cindex coding systems | |
411 | |
412 Users of various languages have established many more-or-less standard | |
413 coding systems for representing them. Emacs does not use these coding | |
414 systems internally; instead, it converts from various coding systems to | |
415 its own system when reading data, and converts the internal coding | |
416 system to other coding systems when writing data. Conversion is | |
417 possible in reading or writing files, in sending or receiving from the | |
418 terminal, and in exchanging data with subprocesses. | |
419 | |
420 Emacs assigns a name to each coding system. Most coding systems are | |
421 used for one language, and the name of the coding system starts with the | |
422 language name. Some coding systems are used for several languages; | |
423 their names usually start with @samp{iso}. There are also special | |
424 coding systems @code{no-conversion}, @code{raw-text} and | |
425 @code{emacs-mule} which do not convert printing characters at all. | |
426 | |
427 @cindex end-of-line conversion | |
428 In addition to converting various representations of non-ASCII | |
429 characters, a coding system can perform end-of-line conversion. Emacs | |
430 handles three different conventions for how to separate lines in a file: | |
431 newline, carriage-return linefeed, and just carriage-return. | |
432 | |
433 @table @kbd | |
434 @item C-h C @var{coding} @key{RET} | |
435 Describe coding system @var{coding}. | |
436 | |
437 @item C-h C @key{RET} | |
438 Describe the coding systems currently in use. | |
439 | |
440 @item M-x list-coding-systems | |
441 Display a list of all the supported coding systems. | |
442 @end table | |
443 | |
444 @kindex C-h C | |
445 @findex describe-coding-system | |
446 The command @kbd{C-h C} (@code{describe-coding-system}) displays | |
447 information about particular coding systems. You can specify a coding | |
448 system name as argument; alternatively, with an empty argument, it | |
449 describes the coding systems currently selected for various purposes, | |
450 both in the current buffer and as the defaults, and the priority list | |
451 for recognizing coding systems (@pxref{Recognize Coding}). | |
452 | |
453 @findex list-coding-systems | |
454 To display a list of all the supported coding systems, type @kbd{M-x | |
455 list-coding-systems}. The list gives information about each coding | |
456 system, including the letter that stands for it in the mode line | |
457 (@pxref{Mode Line}). | |
458 | |
459 @cindex end-of-line conversion | |
460 @cindex MS-DOS end-of-line conversion | |
461 @cindex Macintosh end-of-line conversion | |
462 Each of the coding systems that appear in this list---except for | |
463 @code{no-conversion}, which means no conversion of any kind---specifies | |
464 how and whether to convert printing characters, but leaves the choice of | |
465 end-of-line conversion to be decided based on the contents of each file. | |
466 For example, if the file appears to use the sequence carriage-return | |
467 linefeed to separate lines, DOS end-of-line conversion will be used. | |
468 | |
469 Each of the listed coding systems has three variants which specify | |
470 exactly what to do for end-of-line conversion: | |
471 | |
472 @table @code | |
473 @item @dots{}-unix | |
474 Don't do any end-of-line conversion; assume the file uses | |
475 newline to separate lines. (This is the convention normally used | |
476 on Unix and GNU systems.) | |
477 | |
478 @item @dots{}-dos | |
479 Assume the file uses carriage-return linefeed to separate lines, and do | |
480 the appropriate conversion. (This is the convention normally used on | |
481 Microsoft systems.@footnote{It is also specified for MIME `text/*' | |
482 bodies and in other network transport contexts. It is different | |
483 from the SGML reference syntax record-start/record-end format which | |
484 Emacs doesn't support directly.}) | |
485 | |
486 @item @dots{}-mac | |
487 Assume the file uses carriage-return to separate lines, and do the | |
488 appropriate conversion. (This is the convention normally used on the | |
489 Macintosh system.) | |
490 @end table | |
491 | |
492 These variant coding systems are omitted from the | |
493 @code{list-coding-systems} display for brevity, since they are entirely | |
494 predictable. For example, the coding system @code{iso-latin-1} has | |
495 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and | |
496 @code{iso-latin-1-mac}. | |
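
As a rough illustration, you can ask Emacs which variants belong to a
coding system by evaluating the expression below, for example with
@kbd{M-:}. The name @code{iso-latin-1} is only an example. This sketch
relies on the Lisp function @code{coding-system-eol-type}, which is not
described in this section; for a coding system whose end-of-line
conversion is left undecided, it returns a vector of the three variants.

@example
;; Ask for the end-of-line variants of iso-latin-1 (an example name).
(coding-system-eol-type 'iso-latin-1)
@end example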
497 | |
498 The coding system @code{raw-text} is good for a file which is mainly | |
499 ASCII text, but may contain byte values above 127 which are not meant to | |
500 encode non-ASCII characters. With @code{raw-text}, Emacs copies those | |
501 byte values unchanged, and sets @code{enable-multibyte-characters} to | |
502 @code{nil} in the current buffer so that they will be interpreted | |
503 properly. @code{raw-text} handles end-of-line conversion in the usual | |
504 way, based on the data encountered, and has the usual three variants to | |
505 specify the kind of end-of-line conversion to use. | |
506 | |
507 In contrast, the coding system @code{no-conversion} specifies no | |
508 character code conversion at all---none for non-ASCII byte values and | |
509 none for end of line. This is useful for reading or writing binary | |
510 files, tar files, and other files that must be examined verbatim. It, | |
511 too, sets @code{enable-multibyte-characters} to @code{nil}. | |
512 | |
513 The easiest way to edit a file with no conversion of any kind is with | |
514 the @kbd{M-x find-file-literally} command. This uses | |
515 @code{no-conversion}, and also suppresses other Emacs features that | |
516 might convert the file contents before you see them. @xref{Visiting}. | |
517 | |
518 The coding system @code{emacs-mule} means that the file contains | |
519 non-ASCII characters stored with the internal Emacs encoding. It | |
520 handles end-of-line conversion based on the data encountered, and has | |
521 the usual three variants to specify the kind of end-of-line conversion. | |
522 | |
523 @node Recognize Coding | |
524 @section Recognizing Coding Systems | |
525 | |
526 Most of the time, Emacs can recognize which coding system to use for | |
527 any given file---once you have specified your preferences. | |
528 | |
529 Some coding systems can be recognized or distinguished by which byte | |
530 sequences appear in the data. However, there are coding systems that | |
531 cannot be distinguished, not even potentially. For example, there is no | |
532 way to distinguish between Latin-1 and Latin-2; they use the same byte | |
533 values with different meanings. | |
534 | |
535 Emacs handles this situation by means of a priority list of coding | |
536 systems. Whenever Emacs reads a file, if you do not specify the coding | |
537 system to use, Emacs checks the data against each coding system, | |
538 starting with the first in priority and working down the list, until it | |
539 finds a coding system that fits the data. Then it converts the file | |
540 contents assuming that they are represented in this coding system. | |
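
To see which coding systems fit some text, you can evaluate an
expression like the following in the buffer of interest. This is only
a sketch: it uses the Lisp function @code{detect-coding-region}, which
is not described in this section, and it reports candidates without
converting anything.

@example
;; List the coding systems that fit the current buffer's text,
;; most preferred first.
(detect-coding-region (point-min) (point-max))
@end example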
541 | |
542 The priority list of coding systems depends on the selected language | |
543 environment (@pxref{Language Environments}). For example, if you use | |
544 French, you probably want Emacs to prefer Latin-1 to Latin-2; if you use | |
545 Czech, you probably want Latin-2 to be preferred. This is one of the | |
546 reasons to specify a language environment. | |
547 | |
548 @findex prefer-coding-system | |
549 However, you can alter the priority list in detail with the command | |
550 @kbd{M-x prefer-coding-system}. This command reads the name of a coding | |
551 system from the minibuffer, and adds it to the front of the priority | |
552 list, so that it is preferred to all others. If you use this command | |
553 several times, each use adds one element to the front of the priority | |
554 list. | |
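
For instance, someone who mostly edits Czech text might put a line like
the following in an init file. The coding system name is an example
chosen for illustration, not a recommendation.

@example
;; Put Latin-2 at the front of the priority list.
(prefer-coding-system 'iso-latin-2)
@end example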
555 | |
556 If you use a coding system that specifies the end-of-line conversion | |
557 type, such as @code{iso-8859-1-dos}, what that means is that Emacs | |
558 should attempt to recognize @code{iso-8859-1} with priority, and should | |
559 use DOS end-of-line conversion in case it recognizes @code{iso-8859-1}. | |
560 | |
561 @vindex file-coding-system-alist | |
562 Sometimes a file name indicates which coding system to use for the | |
563 file. The variable @code{file-coding-system-alist} specifies this | |
564 correspondence. There is a special function | |
565 @code{modify-coding-system-alist} for adding elements to this list. For | |
566 example, to read and write all @samp{.txt} files using the coding system | |
567 @code{china-iso-8bit}, you can execute this Lisp expression: | |
568 | |
569 @smallexample | |
570 (modify-coding-system-alist 'file "\\.txt\\'" 'china-iso-8bit) | |
571 @end smallexample | |
572 | |
573 @noindent | |
574 The first argument should be @code{file}, the second argument should be | |
575 a regular expression that determines which files this applies to, and | |
576 the third argument says which coding system to use for these files. | |
577 | |
578 @vindex inhibit-eol-conversion | |
579 Emacs recognizes which kind of end-of-line conversion to use based on | |
580 the contents of the file: if it sees only carriage-returns, or only | |
581 carriage-return linefeed sequences, then it chooses the end-of-line | |
582 conversion accordingly. You can inhibit the automatic use of | |
583 end-of-line conversion by setting the variable @code{inhibit-eol-conversion} | |
584 to non-@code{nil}. | |
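
A minimal sketch of that setting, suitable for an init file, might look
like this:

@example
;; Do not choose an end-of-line conversion automatically.
(setq inhibit-eol-conversion t)
@end example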
585 | |
586 @vindex coding | |
587 You can specify the coding system for a particular file using the | |
588 @samp{-*-@dots{}-*-} construct at the beginning of a file, or a local | |
589 variables list at the end (@pxref{File Variables}). You do this by | |
590 defining a value for the ``variable'' named @code{coding}. Emacs does | |
591 not really have a variable @code{coding}; instead of setting a variable, | |
592 it uses the specified coding system for the file. For example, | |
593 @samp{-*-mode: C; coding: latin-1;-*-} specifies use of the Latin-1 | |
594 coding system, as well as C mode. If you specify the coding explicitly | |
595 in the file, that overrides @code{file-coding-system-alist}. | |
596 | |
597 @vindex auto-coding-alist | |
598 The variable @code{auto-coding-alist} is the strongest way to specify | |
599 the coding system for certain patterns of file names; this variable even | |
600 overrides @samp{-*-coding:-*-} tags in the file itself. Emacs uses this | |
601 feature for tar and archive files, to prevent Emacs from being confused | |
602 by a @samp{-*-coding:-*-} tag in a member of the archive and thinking it | |
603 applies to the archive file as a whole. | |
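
As a sketch, the following adds one entry to @code{auto-coding-alist}.
The @samp{.dat} suffix is a hypothetical example, and the sketch assumes
that each element has the form
@code{(@var{regexp} . @var{coding-system})}.

@example
;; Always read files named *.dat with no conversion at all,
;; even if they contain a coding tag (hypothetical file type).
(add-to-list 'auto-coding-alist '("\\.dat\\'" . no-conversion))
@end example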
604 | |
605 @vindex buffer-file-coding-system | |
606 Once Emacs has chosen a coding system for a buffer, it stores that | |
607 coding system in @code{buffer-file-coding-system} and uses that coding | |
608 system, by default, for operations that write from this buffer into a | |
609 file. This includes the commands @code{save-buffer} and | |
610 @code{write-region}. If you want to write files from this buffer using | |
611 a different coding system, you can specify a different coding system for | |
612 the buffer using @code{set-buffer-file-coding-system} (@pxref{Specify | |
613 Coding}). | |
614 | |
615 @vindex sendmail-coding-system | |
616 When you send a message with Mail mode (@pxref{Sending Mail}), Emacs has | |
617 four different ways to determine the coding system to use for encoding | |
618 the message text. It tries the buffer's own value of | |
619 @code{buffer-file-coding-system}, if that is non-@code{nil}. Otherwise, | |
620 it uses the value of @code{sendmail-coding-system}, if that is | |
621 non-@code{nil}. The third way is to use the default coding system for | |
622 new files, which is controlled by your choice of language environment, | |
623 if that is non-@code{nil}. If all of these three values are @code{nil}, | |
624 Emacs encodes outgoing mail using the Latin-1 coding system. | |
625 | |
626 @vindex rmail-decode-mime-charset | |
627 When you get new mail in Rmail, each message is translated | |
628 automatically from the coding system it is written in---as if it were a | |
629 separate file. This uses the priority list of coding systems that you | |
630 have specified. If a MIME message specifies a character set, Rmail | |
631 obeys that specification, unless @code{rmail-decode-mime-charset} is | |
632 @code{nil}. | |
633 | |
634 @vindex rmail-file-coding-system | |
635 For reading and saving Rmail files themselves, Emacs uses the coding | |
636 system specified by the variable @code{rmail-file-coding-system}. The | |
637 default value is @code{nil}, which means that Rmail files are not | |
638 translated (they are read and written in the Emacs internal character | |
639 code). | |
640 | |
641 @node Specify Coding | |
642 @section Specifying a Coding System | |
643 | |
644 In cases where Emacs does not automatically choose the right coding | |
645 system, you can use these commands to specify one: | |
646 | |
647 @table @kbd | |
648 @item C-x @key{RET} f @var{coding} @key{RET} | |
649 Use coding system @var{coding} for the visited file | |
650 in the current buffer. | |
651 | |
652 @item C-x @key{RET} c @var{coding} @key{RET} | |
653 Specify coding system @var{coding} for the immediately following | |
654 command. | |
655 | |
656 @item C-x @key{RET} k @var{coding} @key{RET} | |
657 Use coding system @var{coding} for keyboard input. | |
658 | |
659 @item C-x @key{RET} t @var{coding} @key{RET} | |
660 Use coding system @var{coding} for terminal output. | |
661 | |
662 @item C-x @key{RET} p @var{input-coding} @key{RET} @var{output-coding} @key{RET} | |
663 Use coding systems @var{input-coding} and @var{output-coding} for | |
664 subprocess input and output in the current buffer. | |
665 | |
666 @item C-x @key{RET} x @var{coding} @key{RET} | |
667 Use coding system @var{coding} for transferring selections to and from | |
668 other programs through the window system. | |
669 | |
670 @item C-x @key{RET} X @var{coding} @key{RET} | |
671 Use coding system @var{coding} for transferring @emph{one} | |
672 selection---the next one---to or from the window system. | |
673 @end table | |
674 | |
675 @kindex C-x RET f | |
676 @findex set-buffer-file-coding-system | |
677 The command @kbd{C-x @key{RET} f} (@code{set-buffer-file-coding-system}) | |
678 specifies the file coding system for the current buffer---in other | |
679 words, which coding system to use when saving or rereading the visited | |
680 file. You specify which coding system using the minibuffer. Since this | |
681 command applies to a file you have already visited, it affects only the | |
682 way the file is saved. | |
683 | |
684 @kindex C-x RET c | |
685 @findex universal-coding-system-argument | |
686 Another way to specify the coding system for a file is when you visit | |
687 the file. First use the command @kbd{C-x @key{RET} c} | |
688 (@code{universal-coding-system-argument}); this command uses the | |
689 minibuffer to read a coding system name. After you exit the minibuffer, | |
690 the specified coding system is used for @emph{the immediately following | |
691 command}. | |
692 | |
693 So if the immediately following command is @kbd{C-x C-f}, for example, | |
694 it reads the file using that coding system (and records the coding | |
695 system for when the file is saved). Or if the immediately following | |
696 command is @kbd{C-x C-w}, it writes the file using that coding system. | |
697 Other file commands affected by a specified coding system include | |
698 @kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window variants of | |
699 @kbd{C-x C-f}. | |
700 | |
701 @kbd{C-x @key{RET} c} also affects commands that start subprocesses, | |
702 including @kbd{M-x shell} (@pxref{Shell}). | |
703 | |
704 However, if the immediately following command does not use the coding | |
705 system, then @kbd{C-x @key{RET} c} ultimately has no effect. | |
706 | |
707 An easy way to visit a file with no conversion is with the @kbd{M-x | |
708 find-file-literally} command. @xref{Visiting}. | |
709 | |
710 @vindex default-buffer-file-coding-system | |
711 The variable @code{default-buffer-file-coding-system} specifies the | |
712 choice of coding system to use when you create a new file. It applies | |
713 when you find a new file, and when you create a buffer and then save it | |
714 in a file. Selecting a language environment typically sets this | |
715 variable to a good choice of default coding system for that language | |
716 environment. | |
717 | |
718 @kindex C-x RET t | |
719 @findex set-terminal-coding-system | |
The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system})
specifies the coding system for terminal output. If you specify a
coding system for terminal output, all characters output to the
terminal are translated into that coding system.
724 | |
725 This feature is useful for certain character-only terminals built to | |
726 support specific languages or character sets---for example, European | |
727 terminals that support one of the ISO Latin character sets. You need to | |
728 specify the terminal coding system when using multibyte text, so that | |
729 Emacs knows which characters the terminal can actually handle. | |
730 | |
731 By default, output to the terminal is not translated at all, unless | |
732 Emacs can deduce the proper coding system from your terminal type. | |
733 | |
734 @kindex C-x RET k | |
735 @findex set-keyboard-coding-system | |
736 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system}) | |
737 specifies the coding system for keyboard input. Character-code | |
738 translation of keyboard input is useful for terminals with keys that | |
739 send non-ASCII graphic characters---for example, some terminals designed | |
740 for ISO Latin-1 or subsets of it. | |
741 | |
742 By default, keyboard input is not translated at all. | |
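
The terminal and keyboard commands described above can also be called
from Lisp, for example in an init file. The sketch below assumes a
character terminal that displays and sends ISO Latin-1; substitute the
coding system your terminal actually uses.

@example
;; Translate terminal output and keyboard input as Latin-1.
(set-terminal-coding-system 'iso-latin-1)
(set-keyboard-coding-system 'iso-latin-1)
@end example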
743 | |
744 There is a similarity between using a coding system translation for | |
745 keyboard input, and using an input method: both define sequences of | |
746 keyboard input that translate into single characters. However, input | |
747 methods are designed to be convenient for interactive use by humans, and | |
748 the sequences that are translated are typically sequences of ASCII | |
749 printing characters. Coding systems typically translate sequences of | |
750 non-graphic characters. | |
751 | |
752 @kindex C-x RET x | |
753 @kindex C-x RET X | |
754 @findex set-selection-coding-system | |
755 @findex set-next-selection-coding-system | |
756 The command @kbd{C-x @key{RET} x} (@code{set-selection-coding-system}) | |
757 specifies the coding system for sending selected text to the window | |
758 system, and for receiving the text of selections made in other | |
759 applications. This command applies to all subsequent selections, until | |
760 you override it by using the command again. The command @kbd{C-x | |
761 @key{RET} X} (@code{set-next-selection-coding-system}) specifies the | |
762 coding system for the next selection made in Emacs or read by Emacs. | |
763 | |
764 @kindex C-x RET p | |
765 @findex set-buffer-process-coding-system | |
766 The command @kbd{C-x @key{RET} p} (@code{set-buffer-process-coding-system}) | |
767 specifies the coding system for input and output to a subprocess. This | |
768 command applies to the current buffer; normally, each subprocess has its | |
769 own buffer, and thus you can use this command to specify translation to | |
770 and from a particular subprocess by giving the command in the | |
771 corresponding buffer. | |
772 | |
773 By default, process input and output are not translated at all. | |
774 | |
775 @vindex file-name-coding-system | |
776 The variable @code{file-name-coding-system} specifies a coding system | |
777 to use for encoding file names. If you set the variable to a coding | |
778 system name (as a Lisp symbol or a string), Emacs encodes file names | |
779 using that coding system for all file operations. This makes it | |
780 possible to use non-ASCII characters in file names---or, at least, those | |
781 non-ASCII characters which the specified coding system can encode. | |
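
For example, assuming the file system stores names in Latin-1, an init
file might contain the following. The coding system is an assumption
made for illustration.

@example
;; Encode and decode file names as Latin-1.
(setq file-name-coding-system 'iso-latin-1)
@end example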
782 | |
783 If @code{file-name-coding-system} is @code{nil}, Emacs uses a default | |
784 coding system determined by the selected language environment. In the | |
785 default language environment, any non-ASCII characters in file names are | |
786 not encoded specially; they appear in the file system using the internal | |
787 Emacs representation. | |
788 | |
789 @strong{Warning:} if you change @code{file-name-coding-system} (or the | |
790 language environment) in the middle of an Emacs session, problems can | |
791 result if you have already visited files whose names were encoded using | |
792 the earlier coding system and cannot be encoded (or are encoded | |
793 differently) under the new coding system. If you try to save one of | |
794 these buffers under the visited file name, saving may use the wrong file | |
795 name, or it may get an error. If such a problem happens, use @kbd{C-x | |
796 C-w} to specify a new file name for that buffer. | |
797 | |
@vindex locale-coding-system
The variable @code{locale-coding-system} specifies a coding system to
use when encoding and decoding system strings such as system error
messages and @code{format-time-string} formats and time stamps. This
coding system should be compatible with the underlying system's coding
system, which is normally specified by the first environment variable in
the list @env{LC_ALL}, @env{LC_CTYPE}, @env{LANG} whose value is
nonempty.

@node Fontsets
808 @section Fontsets | |
809 @cindex fontsets | |
810 | |
A font for the X Window System typically defines shapes for one
alphabet or script. Therefore, displaying the entire range of scripts
that Emacs supports requires a collection of many fonts. In Emacs, such
a collection is called a @dfn{fontset}. A fontset is defined by a list
of fonts, each assigned to handle a range of character codes.
816 | |
817 Each fontset has a name, like a font. The available X fonts are | |
818 defined by the X server; fontsets, however, are defined within Emacs | |
819 itself. Once you have defined a fontset, you can use it within Emacs by | |
820 specifying its name, anywhere that you could use a single font. Of | |
821 course, Emacs fontsets can use only the fonts that the X server | |
822 supports; if certain characters appear on the screen as hollow boxes, | |
823 this means that the fontset in use for them has no font for those | |
824 characters. | |
825 | |
826 Emacs creates two fontsets automatically: the @dfn{standard fontset} | |
827 and the @dfn{startup fontset}. The standard fontset is most likely to | |
828 have fonts for a wide variety of non-ASCII characters; however, this is | |
829 not the default for Emacs to use. (By default, Emacs tries to find a | |
830 font which has bold and italic variants.) You can specify use of the | |
831 standard fontset with the @samp{-fn} option, or with the @samp{Font} X | |
832 resource (@pxref{Font X}). For example, | |
833 | |
834 @example | |
835 emacs -fn fontset-standard | |
836 @end example | |
837 | |
838 A fontset does not necessarily specify a font for every character | |
839 code. If a fontset specifies no font for a certain character, or if it | |
840 specifies a font that does not exist on your system, then it cannot | |
841 display that character properly. It will display that character as an | |
842 empty box instead. | |
843 | |
844 @vindex highlight-wrong-size-font | |
845 The fontset height and width are determined by the ASCII characters | |
846 (that is, by the font used for ASCII characters in that fontset). If | |
847 another font in the fontset has a different height, or a different | |
848 width, then characters assigned to that font are clipped to the | |
849 fontset's size. If @code{highlight-wrong-size-font} is non-@code{nil}, | |
850 a box is displayed around these wrong-size characters as well. | |
851 | |
852 @node Defining Fontsets | |
@section Defining Fontsets
854 | |
855 @vindex standard-fontset-spec | |
856 @cindex standard fontset | |
857 Emacs creates a standard fontset automatically according to the value | |
858 of @code{standard-fontset-spec}. This fontset's name is | |
859 | |
860 @example | |
861 -*-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-standard | |
862 @end example | |
863 | |
864 @noindent | |
865 or just @samp{fontset-standard} for short. | |
866 | |
867 Bold, italic, and bold-italic variants of the standard fontset are | |
868 created automatically. Their names have @samp{bold} instead of | |
869 @samp{medium}, or @samp{i} instead of @samp{r}, or both. | |
870 | |
871 @cindex startup fontset | |
If you specify a default ASCII font with the @samp{Font} resource or
the @samp{-fn} argument, Emacs generates a fontset from it
automatically. This is the @dfn{startup fontset} and its name is
@code{fontset-startup}. Emacs does this by replacing the @var{foundry},
@var{family}, @var{add_style}, and @var{average_width} fields of the
font name with @samp{*}, replacing the @var{charset_registry} field with
@samp{fontset}, and replacing the @var{charset_encoding} field with
@samp{startup}, then using the resulting string to specify a fontset.
880 | |
881 For instance, if you start Emacs this way, | |
882 | |
883 @example | |
884 emacs -fn "*courier-medium-r-normal--14-140-*-iso8859-1" | |
885 @end example | |
886 | |
887 @noindent | |
888 Emacs generates the following fontset and uses it for the initial X | |
889 window frame: | |
890 | |
891 @example | |
892 -*-*-medium-r-normal-*-14-140-*-*-*-*-fontset-startup | |
893 @end example | |
894 | |
895 With the X resource @samp{Emacs.Font}, you can specify a fontset name | |
896 just like an actual font name. But be careful not to specify a fontset | |
897 name in a wildcard resource like @samp{Emacs*Font}---that wildcard | |
898 specification applies to various other purposes, such as menus, and | |
899 menus cannot handle fontsets. | |
900 | |
901 You can specify additional fontsets using X resources named | |
902 @samp{Fontset-@var{n}}, where @var{n} is an integer starting from 0. | |
903 The resource value should have this form: | |
904 | |
905 @smallexample | |
906 @var{fontpattern}, @r{[}@var{charsetname}:@var{fontname}@r{]@dots{}} | |
907 @end smallexample | |
908 | |
909 @noindent | |
910 @var{fontpattern} should have the form of a standard X font name, except | |
911 for the last two fields. They should have the form | |
912 @samp{fontset-@var{alias}}. | |
913 | |
914 The fontset has two names, one long and one short. The long name is | |
915 @var{fontpattern}. The short name is @samp{fontset-@var{alias}}. You | |
916 can refer to the fontset by either name. | |
917 | |
918 The construct @samp{@var{charset}:@var{font}} specifies which font to | |
919 use (in this fontset) for one particular character set. Here, | |
920 @var{charset} is the name of a character set, and @var{font} is the | |
921 font to use for that character set. You can use this construct any | |
922 number of times in defining one fontset. | |
923 | |
924 For the other character sets, Emacs chooses a font based on | |
925 @var{fontpattern}. It replaces @samp{fontset-@var{alias}} with values | |
926 that describe the character set. For the ASCII character font, | |
927 @samp{fontset-@var{alias}} is replaced with @samp{ISO8859-1}. | |
928 | |
929 In addition, when several consecutive fields are wildcards, Emacs | |
930 collapses them into a single wildcard. This is to prevent use of | |
931 auto-scaled fonts. Fonts made by scaling larger fonts are not usable | |
932 for editing, and scaling a smaller font is not useful because it is | |
933 better to use the smaller font in its own size, which Emacs does. | |
934 | |
935 Thus if @var{fontpattern} is this, | |
936 | |
937 @example | |
938 -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24 | |
939 @end example | |
940 | |
941 @noindent | |
942 the font specification for ASCII characters would be this: | |
943 | |
944 @example | |
945 -*-fixed-medium-r-normal-*-24-*-ISO8859-1 | |
946 @end example | |
947 | |
948 @noindent | |
949 and the font specification for Chinese GB2312 characters would be this: | |
950 | |
951 @example | |
952 -*-fixed-medium-r-normal-*-24-*-gb2312*-* | |
953 @end example | |
954 | |
You may not have any Chinese font matching the above font
specification. Most X distributions include only Chinese fonts that
have @samp{song ti} or @samp{fangsong ti} in the @var{family} field. In
such a case, @samp{Fontset-@var{n}} can be specified as below:
959 | |
960 @smallexample | |
961 Emacs.Fontset-0: -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24,\ | |
962 chinese-gb2312:-*-*-medium-r-normal-*-24-*-gb2312*-* | |
963 @end smallexample | |
964 | |
965 @noindent | |
966 Then, the font specifications for all but Chinese GB2312 characters have | |
967 @samp{fixed} in the @var{family} field, and the font specification for | |
968 Chinese GB2312 characters has a wild card @samp{*} in the @var{family} | |
969 field. | |
970 | |
971 @findex create-fontset-from-fontset-spec | |
972 The function that processes the fontset resource value to create the | |
973 fontset is called @code{create-fontset-from-fontset-spec}. You can also | |
974 call this function explicitly to create a fontset. | |
975 | |
976 @xref{Font X}, for more information about font naming in X. | |
977 | |
@node Single-Byte Character Support
@section Single-byte Character Set Support

981 @cindex European character sets | |
982 @cindex accented characters | |
983 @cindex ISO Latin character sets | |
984 @cindex Unibyte operation | |
985 @vindex enable-multibyte-characters | |
The ISO 8859 Latin-@var{n} character sets define character codes in
the range 160 to 255 to handle the accented letters and punctuation
needed by various European languages (and some non-European ones).
If you disable multibyte characters, Emacs can still handle @emph{one}
of these character codes at a time. To specify @emph{which} of these
codes to use, invoke @kbd{M-x set-language-environment} and specify a
suitable language environment such as @samp{Latin-@var{n}}.
994 | |
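The same choice can be made from Lisp, for example in an init file.
This sketch picks Latin-2 purely as an illustration.

@example
;; Select the Latin-2 language environment.
(set-language-environment "Latin-2")
@end example
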
995 For more information about unibyte operation, see @ref{Enabling | |
996 Multibyte}. Note particularly that you probably want to ensure that | |
997 your initialization files are read as unibyte if they contain non-ASCII | |
998 characters. | |
999 | |
1000 @vindex unibyte-display-via-language-environment | |
1001 Emacs can also display those characters, provided the terminal or font | |
1002 in use supports them. This works automatically. Alternatively, if you | |
1003 are using a window system, Emacs can also display single-byte characters | |
1004 through fontsets, in effect by displaying the equivalent multibyte | |
1005 characters according to the current language environment. To request | |
1006 this, set the variable @code{unibyte-display-via-language-environment} | |
1007 to a non-@code{nil} value. | |
1008 | |
1009 @cindex @code{iso-ascii} library | |
1010 If your terminal does not support display of the Latin-1 character | |
1011 set, Emacs can display these characters as ASCII sequences which at | |
1012 least give you a clear idea of what the characters are. To do this, | |
1013 load the library @code{iso-ascii}. Similar libraries for other | |
1014 Latin-@var{n} character sets could be implemented, but we don't have | |
1015 them yet. | |
1016 | |
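A minimal sketch of loading that library from an initialization file,
assuming it is on your @code{load-path} as in a standard installation:

@example
;; Show Latin-1 characters as ASCII sequences on terminals
;; that cannot display them directly.
(load-library "iso-ascii")
@end example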
1017 @findex standard-display-8bit | |
1018 @cindex 8-bit display | |
1019 Normally non-ISO-8859 characters (those with codes 128 through 159 | |
1020 inclusive) are displayed as octal escapes. You can change this for | |
1021 non-standard `extended' versions of ISO-8859 character sets by using the | |
1022 function @code{standard-display-8bit} in the @code{disp-table} library. | |
1023 | |
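As an illustration, the following displays codes 128 through 159
directly instead of as octal escapes.  The range is an assumption
chosen to match the codes mentioned above; adjust it to the characters
your extended character set actually defines.

@example
;; Make sure the display-table functions are available.
(require 'disp-table)
;; Display codes 128 through 159 as themselves.
(standard-display-8bit 128 159)
@end example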
1024 There are three different ways you can input single-byte non-ASCII | |
1025 characters: | |
1026 | |
1027 @itemize @bullet | |
1028 @cindex 8-bit input |
25829 | 1029 @item |
1030 If your keyboard can generate character codes 128 and up, representing | |
1031 non-ASCII characters, execute the following expression to enable Emacs to | |
1032 understand them: | |
1033 | |
1034 @example | |
1035 (set-input-mode (car (current-input-mode)) | |
1036                 (nth 1 (current-input-mode)) | |
1037                 0) | |
1038 @end example | |
1039 | |
1040 It is not necessary to do this under a window system which can |
1041 distinguish 8-bit characters and Meta keys. If you do this on a normal |
1042 terminal, you will probably need to use @kbd{ESC} to type Meta |
1043 characters.@footnote{In some cases, such as the Linux console and |
1044 @code{xterm}, you can arrange for Meta to be converted to @kbd{ESC} and |
1045 still be able to type 8-bit characters present directly on the keyboard or |
1046 using @kbd{Compose} or @kbd{AltGr} keys.} @xref{User Input}. |
1047 |
25829 | 1048 @item |
1049 You can use an input method for the selected language environment. | |
1050 @xref{Input Methods}. When you use an input method in a unibyte buffer, | |
1051 the non-ASCII character you specify with it is converted to unibyte. | |
1052 | |
1053 @kindex C-x 8 | |
1054 @cindex @code{iso-transl} library | |
1055 @item | |
1056 For Latin-1 only, you can use the | |
1057 key @kbd{C-x 8} as a ``compose character'' prefix for entry of | |
1058 non-ASCII Latin-1 printing characters. @kbd{C-x 8} is good for | |
1059 insertion (in the minibuffer as well as other buffers), for searching, | |
1060 and in any other context where a key sequence is allowed. | |
1061 | |
1062 @kbd{C-x 8} works by loading the @code{iso-transl} library. Once that | |
1063 library is loaded, the @key{ALT} modifier key, if you have one, serves | |
1064 the same purpose as @kbd{C-x 8}; use @key{ALT} together with an accent | |
1065 character to modify the following letter. In addition, if you have keys | |
1066 for the Latin-1 ``dead accent characters'', they too are defined to | |
1067 compose with the following character, once @code{iso-transl} is loaded. | |
1068 @end itemize |
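
For the third method, a minimal sketch: if you want the @key{ALT}
modifier and any dead accent keys to compose characters without first
typing @kbd{C-x 8}, you can load @code{iso-transl} in advance from an
initialization file.  This assumes the library is on your
@code{load-path}, as in a standard installation.

@example
;; Load the Latin-1 composition bindings at startup.
(require 'iso-transl)
@end example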