comparison man/mule.texi @ 68549:9aa281f8a64b
Minor clarifications.
Reduce the specific references to X Windows.
Refer to "graphical" terminals, rather than window systems.
(Text Coding): Renamed from Specify Coding.
(Communication Coding, File Name Coding, Terminal Coding):
New nodes split out from Text Coding.
author | Richard M. Stallman <rms@gnu.org> |
---|---|
date | Thu, 02 Feb 2006 04:40:52 +0000 |
parents | 3723093a21fd |
children | 99dedfb3d00e |
68548:cd4235065942 | 68549:9aa281f8a64b |
---|---|
38 Emacs supports a wide variety of international character sets, | 38 Emacs supports a wide variety of international character sets, |
39 including European and Vietnamese variants of the Latin alphabet, as | 39 including European and Vietnamese variants of the Latin alphabet, as |
40 well as Cyrillic, Devanagari (for Hindi and Marathi), Ethiopic, Greek, | 40 well as Cyrillic, Devanagari (for Hindi and Marathi), Ethiopic, Greek, |
41 Han (for Chinese and Japanese), Hangul (for Korean), Hebrew, IPA, | 41 Han (for Chinese and Japanese), Hangul (for Korean), Hebrew, IPA, |
42 Kannada, Lao, Malayalam, Tamil, Thai, Tibetan, and Vietnamese scripts. | 42 Kannada, Lao, Malayalam, Tamil, Thai, Tibetan, and Vietnamese scripts. |
43 These features have been merged from the modified version of Emacs | 43 Emacs also supports various encodings of these characters used by |
44 known as MULE (for ``MULti-lingual Enhancement to GNU Emacs'') | |
45 | |
46 Emacs also supports various encodings of these characters used by | |
47 other internationalized software, such as word processors and mailers. | 44 other internationalized software, such as word processors and mailers. |
48 | 45 |
49 Emacs allows editing text with international characters by supporting | 46 Emacs allows editing text with international characters by supporting |
50 all the related activities: | 47 all the related activities: |
51 | 48 |
55 pass non-@acronym{ASCII} text between Emacs and programs it invokes (such as | 52 pass non-@acronym{ASCII} text between Emacs and programs it invokes (such as |
56 compilers, spell-checkers, and mailers). Setting your language | 53 compilers, spell-checkers, and mailers). Setting your language |
57 environment (@pxref{Language Environments}) takes care of setting up the | 54 environment (@pxref{Language Environments}) takes care of setting up the |
58 coding systems and other options for a specific language or culture. | 55 coding systems and other options for a specific language or culture. |
59 Alternatively, you can specify how Emacs should encode or decode text | 56 Alternatively, you can specify how Emacs should encode or decode text |
60 for each command; see @ref{Specify Coding}. | 57 for each command; see @ref{Text Coding}. |
61 | 58 |
62 @item | 59 @item |
63 You can display non-@acronym{ASCII} characters encoded by the various scripts. | 60 You can display non-@acronym{ASCII} characters encoded by the various |
64 This works by using appropriate fonts on X and similar graphics | 61 scripts. This works by using appropriate fonts on graphics displays |
65 displays (@pxref{Defining Fontsets}), and by sending special codes to | 62 (@pxref{Defining Fontsets}), and by sending special codes to text-only |
66 text-only displays (@pxref{Specify Coding}). If some characters are | 63 displays (@pxref{Terminal Coding}). If some characters are displayed |
67 displayed incorrectly, refer to @ref{Undisplayable Characters}, which | 64 incorrectly, refer to @ref{Undisplayable Characters}, which describes |
68 describes possible problems and explains how to solve them. | 65 possible problems and explains how to solve them. |
69 | 66 |
70 @item | 67 @item |
71 You can insert non-@acronym{ASCII} characters or search for them. To do that, | 68 You can insert non-@acronym{ASCII} characters or search for them. To do that, |
72 you can specify an input method (@pxref{Select Input Method}) suitable | 69 you can specify an input method (@pxref{Select Input Method}) suitable |
73 for your language, or use the default input method set up when you set | 70 for your language, or use the default input method set up when you set |
74 your language environment. If | 71 your language environment. If |
75 your keyboard can produce non-@acronym{ASCII} characters, you can select an | 72 your keyboard can produce non-@acronym{ASCII} characters, you can select an |
76 appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs | 73 appropriate keyboard coding system (@pxref{Terminal Coding}), and Emacs |
77 will accept those characters. Latin-1 characters can also be input by | 74 will accept those characters. Latin-1 characters can also be input by |
78 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support, | 75 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support, |
79 C-x 8}. On X Window systems, your locale should be set to an | 76 C-x 8}. |
80 appropriate value to make sure Emacs interprets keyboard input | 77 |
81 correctly; see @ref{Language Environments, locales}. | 78 On X Window systems, your locale should be set to an appropriate value |
79 to make sure Emacs interprets keyboard input correctly; see | |
80 @ref{Language Environments, locales}. | |
82 @end itemize | 81 @end itemize |
83 | 82 |
84 The rest of this chapter describes these issues in detail. | 83 The rest of this chapter describes these issues in detail. |
85 | 84 |
86 @menu | 85 @menu |
91 * Select Input Method:: Specifying your choice of input methods. | 90 * Select Input Method:: Specifying your choice of input methods. |
92 * Multibyte Conversion:: How single-byte characters convert to multibyte. | 91 * Multibyte Conversion:: How single-byte characters convert to multibyte. |
93 * Coding Systems:: Character set conversion when you read and | 92 * Coding Systems:: Character set conversion when you read and |
94 write files, and so on. | 93 write files, and so on. |
95 * Recognize Coding:: How Emacs figures out which conversion to use. | 94 * Recognize Coding:: How Emacs figures out which conversion to use. |
96 * Specify Coding:: Various ways to choose which conversion to use. | 95 * Text Coding:: Choosing conversion to use for file text. |
96 * Communications Coding:: Coding systems for interprocess communication. | |
97 * File Name Coding:: Coding systems for file @emph{names}. | |
98 * Terminal Coding:: Specifying coding systems for converting | |
99 terminal input and output. | |
97 * Fontsets:: Fontsets are collections of fonts | 100 * Fontsets:: Fontsets are collections of fonts |
98 that cover the whole spectrum of characters. | 101 that cover the whole spectrum of characters. |
99 * Defining Fontsets:: Defining a new fontset. | 102 * Defining Fontsets:: Defining a new fontset. |
100 * Undisplayable Characters:: When characters don't display. | 103 * Undisplayable Characters:: When characters don't display. |
101 * Single-Byte Character Support:: You can pick one European character set | 104 * Single-Byte Character Support:: You can pick one European character set |
104 @end menu | 107 @end menu |
105 | 108 |
106 @node International Chars | 109 @node International Chars |
107 @section Introduction to International Character Sets | 110 @section Introduction to International Character Sets |
108 | 111 |
109 The users of international character sets and scripts have established | 112 The users of international character sets and scripts have |
110 many more-or-less standard coding systems for storing files. Emacs | 113 established many more-or-less standard coding systems for storing |
111 internally uses a single multibyte character encoding, so that it can | 114 files. Emacs internally uses a single multibyte character encoding, |
112 intermix characters from all these scripts in a single buffer or string. | 115 so that it can intermix characters from all these scripts in a single |
113 This encoding represents each non-@acronym{ASCII} character as a sequence of bytes | 116 buffer or string. This encoding represents each non-@acronym{ASCII} |
114 in the range 0200 through 0377. Emacs translates between the multibyte | 117 character as a sequence of bytes in the range 0200 through 0377. |
115 character encoding and various other coding systems when reading and | 118 Emacs translates between the multibyte character encoding and various |
116 writing files, when exchanging data with subprocesses, and (in some | 119 other coding systems when reading and writing files, when exchanging |
117 cases) in the @kbd{C-q} command (@pxref{Multibyte Conversion}). | 120 data with subprocesses, and (in some cases) in the @kbd{C-q} command |
121 (@pxref{Multibyte Conversion}). | |
118 | 122 |
119 @kindex C-h h | 123 @kindex C-h h |
120 @findex view-hello-file | 124 @findex view-hello-file |
121 @cindex undisplayable characters | 125 @cindex undisplayable characters |
122 @cindex @samp{?} in display | 126 @cindex @samp{?} in display |
136 to multibyte characters, coding systems, and input methods. | 140 to multibyte characters, coding systems, and input methods. |
137 | 141 |
138 @node Enabling Multibyte | 142 @node Enabling Multibyte |
139 @section Enabling Multibyte Characters | 143 @section Enabling Multibyte Characters |
140 | 144 |
145 By default, Emacs starts in multibyte mode, because that allows you to | |
146 use all the supported languages and scripts without limitations. | |
147 | |
141 @cindex turn multibyte support on or off | 148 @cindex turn multibyte support on or off |
142 You can enable or disable multibyte character support, either for | 149 You can enable or disable multibyte character support, either for |
143 Emacs as a whole, or for a single buffer. When multibyte characters are | 150 Emacs as a whole, or for a single buffer. When multibyte characters |
144 disabled in a buffer, then each byte in that buffer represents a | 151 are disabled in a buffer, we call that @dfn{unibyte mode}. Then each |
145 character, even codes 0200 through 0377. The old features for | 152 byte in that buffer represents a character, even codes 0200 through |
146 supporting the European character sets, ISO Latin-1 and ISO Latin-2, | 153 0377. |
147 work as they did in Emacs 19 and also work for the other ISO 8859 | 154 |
148 character sets. | 155 The old features for supporting the European character sets, ISO |
149 | 156 Latin-1 and ISO Latin-2, work in unibyte mode as they did in Emacs 19 |
150 However, there is no need to turn off multibyte character support to | 157 and also work for the other ISO 8859 character sets. However, there |
151 use ISO Latin; the Emacs multibyte character set includes all the | 158 is no need to turn off multibyte character support to use ISO Latin; |
152 characters in these character sets, and Emacs can translate | 159 the Emacs multibyte character set includes all the characters in these |
153 automatically to and from the ISO codes. | 160 character sets, and Emacs can translate automatically to and from the |
154 | 161 ISO codes. |
155 By default, Emacs starts in multibyte mode, because that allows you to | |
156 use all the supported languages and scripts without limitations. | |
157 | 162 |
158 To edit a particular file in unibyte representation, visit it using | 163 To edit a particular file in unibyte representation, visit it using |
159 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in | 164 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in |
160 multibyte representation into a single-byte representation of the same | 165 multibyte representation into a single-byte representation of the same |
161 characters, the easiest way is to save the contents in a file, kill the | 166 characters, the easiest way is to save the contents in a file, kill the |
162 buffer, and find the file again with @code{find-file-literally}. You | 167 buffer, and find the file again with @code{find-file-literally}. You |
163 can also use @kbd{C-x @key{RET} c} | 168 can also use @kbd{C-x @key{RET} c} |
164 (@code{universal-coding-system-argument}) and specify @samp{raw-text} as | 169 (@code{universal-coding-system-argument}) and specify @samp{raw-text} as |
165 the coding system with which to find or save a file. @xref{Specify | 170 the coding system with which to find or save a file. @xref{Text |
166 Coding}. Finding a file as @samp{raw-text} doesn't disable format | 171 Coding}. Finding a file as @samp{raw-text} doesn't disable format |
167 conversion, uncompression and auto mode selection as | 172 conversion, uncompression and auto mode selection as |
168 @code{find-file-literally} does. | 173 @code{find-file-literally} does. |
169 | 174 |
170 @vindex enable-multibyte-characters | 175 @vindex enable-multibyte-characters |
207 @key{RET} c raw-text @key{RET}} immediately before loading it. | 212 @key{RET} c raw-text @key{RET}} immediately before loading it. |
208 | 213 |
209 The mode line indicates whether multibyte character support is enabled | 214 The mode line indicates whether multibyte character support is enabled |
210 in the current buffer. If it is, there are two or more characters (most | 215 in the current buffer. If it is, there are two or more characters (most |
211 often two dashes) before the colon near the beginning of the mode line. | 216 often two dashes) before the colon near the beginning of the mode line. |
212 When multibyte characters are not enabled, just one dash precedes the | 217 When multibyte characters are not enabled, nothing precedes the colon |
213 colon. | 218 except a single dash. |
214 | 219 |
215 @node Language Environments | 220 @node Language Environments |
216 @section Language Environments | 221 @section Language Environments |
217 @cindex language environments | 222 @cindex language environments |
218 | 223 |
312 | 317 |
313 @kindex C-h L | 318 @kindex C-h L |
314 @findex describe-language-environment | 319 @findex describe-language-environment |
315 To display information about the effects of a certain language | 320 To display information about the effects of a certain language |
316 environment @var{lang-env}, use the command @kbd{C-h L @var{lang-env} | 321 environment @var{lang-env}, use the command @kbd{C-h L @var{lang-env} |
317 @key{RET}} (@code{describe-language-environment}). This tells you which | 322 @key{RET}} (@code{describe-language-environment}). This tells you |
318 languages this language environment is useful for, and lists the | 323 which languages this language environment is useful for, and lists the |
319 character sets, coding systems, and input methods that go with it. It | 324 character sets, coding systems, and input methods that go with it. It |
320 also shows some sample text to illustrate scripts used in this language | 325 also shows some sample text to illustrate scripts used in this |
321 environment. By default, this command describes the chosen language | 326 language environment. If you give an empty input for @var{lang-env}, |
322 environment. | 327 this command describes the chosen language environment. |
323 | 328 |
324 @vindex set-language-environment-hook | 329 @vindex set-language-environment-hook |
325 You can customize any language environment with the normal hook | 330 You can customize any language environment with the normal hook |
326 @code{set-language-environment-hook}. The command | 331 @code{set-language-environment-hook}. The command |
327 @code{set-language-environment} runs that hook after setting up the new | 332 @code{set-language-environment} runs that hook after setting up the new |
481 language environment that it is meant to be used with. The variable | 486 language environment that it is meant to be used with. The variable |
482 @code{current-input-method} records which input method is selected. | 487 @code{current-input-method} records which input method is selected. |
483 | 488 |
484 @findex toggle-input-method | 489 @findex toggle-input-method |
485 @kindex C-\ | 490 @kindex C-\ |
486 Input methods use various sequences of @acronym{ASCII} characters to stand for | 491 Input methods use various sequences of @acronym{ASCII} characters to |
487 non-@acronym{ASCII} characters. Sometimes it is useful to turn off the input | 492 stand for non-@acronym{ASCII} characters. Sometimes it is useful to |
488 method temporarily. To do this, type @kbd{C-\} | 493 turn off the input method temporarily. To do this, type @kbd{C-\} |
489 (@code{toggle-input-method}). To reenable the input method, type | 494 (@code{toggle-input-method}). To reenable the input method, type |
490 @kbd{C-\} again. | 495 @kbd{C-\} again. |
491 | 496 |
492 If you type @kbd{C-\} and you have not yet selected an input method, | 497 If you type @kbd{C-\} and you have not yet selected an input method, |
493 it prompts for you to specify one. This has the same effect as using | 498 it prompts for you to specify one. This has the same effect as using |
672 predictable. For example, the coding system @code{iso-latin-1} has | 677 predictable. For example, the coding system @code{iso-latin-1} has |
673 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and | 678 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and |
674 @code{iso-latin-1-mac}. | 679 @code{iso-latin-1-mac}. |
675 | 680 |
676 The coding system @code{raw-text} is good for a file which is mainly | 681 The coding system @code{raw-text} is good for a file which is mainly |
677 @acronym{ASCII} text, but may contain byte values above 127 which are not meant to | 682 @acronym{ASCII} text, but may contain byte values above 127 which are |
678 encode non-@acronym{ASCII} characters. With @code{raw-text}, Emacs copies those | 683 not meant to encode non-@acronym{ASCII} characters. With |
679 byte values unchanged, and sets @code{enable-multibyte-characters} to | 684 @code{raw-text}, Emacs copies those byte values unchanged, and sets |
680 @code{nil} in the current buffer so that they will be interpreted | 685 @code{enable-multibyte-characters} to @code{nil} in the current buffer |
681 properly. @code{raw-text} handles end-of-line conversion in the usual | 686 so that they will be interpreted properly. @code{raw-text} handles |
682 way, based on the data encountered, and has the usual three variants to | 687 end-of-line conversion in the usual way, based on the data |
683 specify the kind of end-of-line conversion to use. | 688 encountered, and has the usual three variants to specify the kind of |
689 end-of-line conversion to use. | |
684 | 690 |
685 In contrast, the coding system @code{no-conversion} specifies no | 691 In contrast, the coding system @code{no-conversion} specifies no |
686 character code conversion at all---none for non-@acronym{ASCII} byte values and | 692 character code conversion at all---none for non-@acronym{ASCII} byte values and |
687 none for end of line. This is useful for reading or writing binary | 693 none for end of line. This is useful for reading or writing binary |
688 files, tar files, and other files that must be examined verbatim. It, | 694 files, tar files, and other files that must be examined verbatim. It, |
820 pattern, are decoded correctly. One of the builtin | 826 pattern, are decoded correctly. One of the builtin |
821 @code{auto-coding-functions} detects the encoding for XML files. | 827 @code{auto-coding-functions} detects the encoding for XML files. |
822 | 828 |
823 If Emacs recognizes the encoding of a file incorrectly, you can | 829 If Emacs recognizes the encoding of a file incorrectly, you can |
824 reread the file using the correct coding system by typing @kbd{C-x | 830 reread the file using the correct coding system by typing @kbd{C-x |
825 @key{RET} r @var{coding-system} | 831 @key{RET} r @var{coding-system} @key{RET}}. To see what coding system |
826 @key{RET}}. To see what coding system Emacs actually used to decode | 832 Emacs actually used to decode the file, look at the coding system |
827 the file, look at the coding system mnemonic letter near the left edge | 833 mnemonic letter near the left edge of the mode line (@pxref{Mode |
828 of the mode line (@pxref{Mode Line}), or type @kbd{C-h C @key{RET}}. | 834 Line}), or type @kbd{C-h C @key{RET}}. |
829 | 835 |
830 @findex unify-8859-on-decoding-mode | 836 @findex unify-8859-on-decoding-mode |
831 The command @code{unify-8859-on-decoding-mode} enables a mode that | 837 The command @code{unify-8859-on-decoding-mode} enables a mode that |
832 ``unifies'' the Latin alphabets when decoding text. This works by | 838 ``unifies'' the Latin alphabets when decoding text. This works by |
833 converting all non-@acronym{ASCII} Latin-@var{n} characters to either Latin-1 or | 839 converting all non-@acronym{ASCII} Latin-@var{n} characters to either |
834 Unicode characters. This way it is easier to use various | 840 Latin-1 or Unicode characters. This way it is easier to use various |
835 Latin-@var{n} alphabets together. In a future Emacs version we hope | 841 Latin-@var{n} alphabets together. In a future Emacs version we hope |
836 to move towards full Unicode support and complete unification of | 842 to move towards full Unicode support and complete unification of |
837 character sets. | 843 character sets. |
838 | 844 |
839 @vindex buffer-file-coding-system | 845 @vindex buffer-file-coding-system |
841 coding system in @code{buffer-file-coding-system} and uses that coding | 847 coding system in @code{buffer-file-coding-system} and uses that coding |
842 system, by default, for operations that write from this buffer into a | 848 system, by default, for operations that write from this buffer into a |
843 file. This includes the commands @code{save-buffer} and | 849 file. This includes the commands @code{save-buffer} and |
844 @code{write-region}. If you want to write files from this buffer using | 850 @code{write-region}. If you want to write files from this buffer using |
845 a different coding system, you can specify a different coding system for | 851 a different coding system, you can specify a different coding system for |
846 the buffer using @code{set-buffer-file-coding-system} (@pxref{Specify | 852 the buffer using @code{set-buffer-file-coding-system} (@pxref{Text |
847 Coding}). | 853 Coding}). |
848 | 854 |
849 You can insert any possible character into any Emacs buffer, but | 855 You can insert any possible character into any Emacs buffer, but |
850 most coding systems can only handle some of the possible characters. | 856 most coding systems can only handle some of the possible characters. |
851 This means that it is possible for you to insert characters that | 857 This means that it is possible for you to insert characters that |
899 system specified by the variable @code{rmail-file-coding-system}. The | 905 system specified by the variable @code{rmail-file-coding-system}. The |
900 default value is @code{nil}, which means that Rmail files are not | 906 default value is @code{nil}, which means that Rmail files are not |
901 translated (they are read and written in the Emacs internal character | 907 translated (they are read and written in the Emacs internal character |
902 code). | 908 code). |
903 | 909 |
904 @node Specify Coding | 910 @node Text Coding |
905 @section Specifying a Coding System | 911 @section Specifying a Coding System for File Text |
906 | 912 |
907 In cases where Emacs does not automatically choose the right coding | 913 In cases where Emacs does not automatically choose the right coding |
908 system, you can use these commands to specify one: | 914 system for a file's contents, you can use these commands to specify |
915 one: | |
909 | 916 |
910 @table @kbd | 917 @table @kbd |
911 @item C-x @key{RET} f @var{coding} @key{RET} | 918 @item C-x @key{RET} f @var{coding} @key{RET} |
912 Use coding system @var{coding} for saving or revisiting the visited | 919 Use coding system @var{coding} for saving or revisiting the visited |
913 file in the current buffer. | 920 file in the current buffer. |
917 command. | 924 command. |
918 | 925 |
919 @item C-x @key{RET} r @var{coding} @key{RET} | 926 @item C-x @key{RET} r @var{coding} @key{RET} |
920 Revisit the current file using the coding system @var{coding}. | 927 Revisit the current file using the coding system @var{coding}. |
921 | 928 |
922 @item C-x @key{RET} k @var{coding} @key{RET} | 929 @item M-x recode-region @key{RET} @var{right} @key{RET} @var{wrong} @key{RET} |
923 Use coding system @var{coding} for keyboard input. | 930 Convert a region that was decoded using coding system @var{wrong}, |
924 | 931 decoding it using coding system @var{right} instead. |
925 @item C-x @key{RET} t @var{coding} @key{RET} | |
926 Use coding system @var{coding} for terminal output. | |
927 | |
928 @item C-x @key{RET} p @var{input-coding} @key{RET} @var{output-coding} @key{RET} | |
929 Use coding systems @var{input-coding} and @var{output-coding} for | |
930 subprocess input and output in the current buffer. | |
931 | |
932 @item C-x @key{RET} x @var{coding} @key{RET} | |
933 Use coding system @var{coding} for transferring selections to and from | |
934 other programs through the window system. | |
935 | |
936 @item C-x @key{RET} F @var{coding} @key{RET} | |
937 Use coding system @var{coding} for encoding and decoding file | |
938 @emph{names}. This affects the use of non-ASCII characters in file | |
939 names. It has no effect on reading and writing the @emph{contents} of | |
940 files. | |
941 | |
942 @item C-x @key{RET} X @var{coding} @key{RET} | |
943 Use coding system @var{coding} for transferring @emph{one} | |
944 selection---the next one---to or from the window system. | |
945 | |
946 @item M-x recode-region | |
947 Convert the region from a previous coding system to a new one. | |
948 @end table | 932 @end table |
949 | 933 |
950 @kindex C-x RET f | 934 @kindex C-x RET f |
951 @findex set-buffer-file-coding-system | 935 @findex set-buffer-file-coding-system |
952 The command @kbd{C-x @key{RET} f} | 936 The command @kbd{C-x @key{RET} f} |
976 contains characters that the coding system cannot handle. | 960 contains characters that the coding system cannot handle. |
977 | 961 |
978 Other file commands affected by a specified coding system include | 962 Other file commands affected by a specified coding system include |
979 @kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window variants | 963 @kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window variants |
980 of @kbd{C-x C-f}. @kbd{C-x @key{RET} c} also affects commands that | 964 of @kbd{C-x C-f}. @kbd{C-x @key{RET} c} also affects commands that |
981 start subprocesses, including @kbd{M-x shell} (@pxref{Shell}). | 965 start subprocesses, including @kbd{M-x shell} (@pxref{Shell}). If the |
982 | 966 immediately following command does not use the coding system, then |
983 If the immediately following command does not use the coding system, | 967 @kbd{C-x @key{RET} c} ultimately has no effect. |
984 then @kbd{C-x @key{RET} c} ultimately has no effect. | |
985 | 968 |
986 An easy way to visit a file with no conversion is with the @kbd{M-x | 969 An easy way to visit a file with no conversion is with the @kbd{M-x |
987 find-file-literally} command. @xref{Visiting}. | 970 find-file-literally} command. @xref{Visiting}. |
988 | 971 |
989 @vindex default-buffer-file-coding-system | 972 @vindex default-buffer-file-coding-system |
998 @findex revert-buffer-with-coding-system | 981 @findex revert-buffer-with-coding-system |
999 If you visit a file with a wrong coding system, you can correct this | 982 If you visit a file with a wrong coding system, you can correct this |
1000 with @kbd{C-x @key{RET} r} (@code{revert-buffer-with-coding-system}). | 983 with @kbd{C-x @key{RET} r} (@code{revert-buffer-with-coding-system}). |
1001 This visits the current file again, using a coding system you specify. | 984 This visits the current file again, using a coding system you specify. |
1002 | 985 |
1003 @kindex C-x RET t | 986 @findex recode-region |
1004 @findex set-terminal-coding-system | 987 If a piece of text has already been inserted into a buffer using the |
1005 The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system}) | 988 wrong coding system, you can redo the decoding of it using @kbd{M-x |
1006 specifies the coding system for terminal output. If you specify a | 989 recode-region}. This prompts you for the proper coding system, then |
1007 character code for terminal output, all characters output to the | 990 for the wrong coding system that was actually used, and does the |
1008 terminal are translated into that coding system. | 991 conversion. It first encodes the region using the wrong coding system, |
1009 | 992 then decodes it again using the proper coding system. |
1010 This feature is useful for certain character-only terminals built to | 993 |
1011 support specific languages or character sets---for example, European | 994 @node Communication Coding |
1012 terminals that support one of the ISO Latin character sets. You need to | 995 @section Coding Systems for Interprocess Communication |
1013 specify the terminal coding system when using multibyte text, so that | 996 |
1014 Emacs knows which characters the terminal can actually handle. | 997 This section explains how to specify coding systems for use |
1015 | 998 in communication with other processes. |
1016 By default, output to the terminal is not translated at all, unless | 999 |
1017 Emacs can deduce the proper coding system from your terminal type or | 1000 @table @kbd |
1018 your locale specification (@pxref{Language Environments}). | 1001 @item C-x @key{RET} x @var{coding} @key{RET} |
1019 | 1002 Use coding system @var{coding} for transferring selections to and from |
1020 @kindex C-x RET k | 1003 other programs through the window system. |
1021 @findex set-keyboard-coding-system | 1004 |
1022 @vindex keyboard-coding-system | 1005 @item C-x @key{RET} X @var{coding} @key{RET} |
1023 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system}) | 1006 Use coding system @var{coding} for transferring @emph{one} |
1024 or the variable @code{keyboard-coding-system} specifies the coding | 1007 selection---the next one---to or from the window system. |
1025 system for keyboard input. Character-code translation of keyboard | 1008 |
1026 input is useful for terminals with keys that send non-@acronym{ASCII} | 1009 @item C-x @key{RET} p @var{input-coding} @key{RET} @var{output-coding} @key{RET} |
1027 graphic characters---for example, some terminals designed for ISO | 1010 Use coding systems @var{input-coding} and @var{output-coding} for |
1028 Latin-1 or subsets of it. | 1011 subprocess input and output in the current buffer. |
1029 | 1012 |
1030 By default, keyboard input is translated based on your system locale | 1013 @item C-x @key{RET} c @var{coding} @key{RET} |
1031 setting. If your terminal does not really support the encoding | 1014 Specify coding system @var{coding} for the immediately following |
1032 implied by your locale (for example, if you find it inserts a | 1015 command. |
1033 non-@acronym{ASCII} character if you type @kbd{M-i}), you will need to set | 1016 @end table |
1034 @code{keyboard-coding-system} to @code{nil} to turn off encoding. | |
1035 You can do this by putting | |
1036 | |
1037 @lisp | |
1038 (set-keyboard-coding-system nil) | |
1039 @end lisp | |
1040 | |
1041 @noindent | |
1042 in your @file{~/.emacs} file. | |
1043 | |
1044 There is a similarity between using a coding system translation for | |
1045 keyboard input, and using an input method: both define sequences of | |
1046 keyboard input that translate into single characters. However, input | |
1047 methods are designed to be convenient for interactive use by humans, and | |
1048 the sequences that are translated are typically sequences of @acronym{ASCII} | |
1049 printing characters. Coding systems typically translate sequences of | |
1050 non-graphic characters. | |
1051 | 1017 |
1052 @kindex C-x RET x | 1018 @kindex C-x RET x |
1053 @kindex C-x RET X | 1019 @kindex C-x RET X |
1054 @findex set-selection-coding-system | 1020 @findex set-selection-coding-system |
1055 @findex set-next-selection-coding-system | 1021 @findex set-next-selection-coding-system |
1056 The command @kbd{C-x @key{RET} x} (@code{set-selection-coding-system}) | 1022 The command @kbd{C-x @key{RET} x} (@code{set-selection-coding-system}) |
1057 specifies the coding system for sending selected text to the window | 1023 specifies the coding system for sending selected text to other windowing |
1058 system, and for receiving the text of selections made in other | 1024 applications, and for receiving the text of selections made in other |
1059 applications. This command applies to all subsequent selections, until | 1025 applications. This command applies to all subsequent selections, until |
1060 you override it by using the command again. The command @kbd{C-x | 1026 you override it by using the command again. The command @kbd{C-x |
1061 @key{RET} X} (@code{set-next-selection-coding-system}) specifies the | 1027 @key{RET} X} (@code{set-next-selection-coding-system}) specifies the |
1062 coding system for the next selection made in Emacs or read by Emacs. | 1028 coding system for the next selection made in Emacs or read by Emacs. |
1063 | 1029 |
1068 command applies to the current buffer; normally, each subprocess has its | 1034 command applies to the current buffer; normally, each subprocess has its |
1069 own buffer, and thus you can use this command to specify translation to | 1035 own buffer, and thus you can use this command to specify translation to |
1070 and from a particular subprocess by giving the command in the | 1036 and from a particular subprocess by giving the command in the |
1071 corresponding buffer. | 1037 corresponding buffer. |
1072 | 1038 |
1039 You can also use @kbd{C-x @key{RET} c} just before the command that | |
1040 runs or starts a subprocess, to specify the coding system to use for | |
1041 communication with that subprocess. | |
1042 | |
1073 The default for translation of process input and output depends on the | 1043 The default for translation of process input and output depends on the |
1074 current language environment. | 1044 current language environment. |
1075 | |
1076 @findex recode-region | |
1077 If a piece of text has already been inserted into a buffer using the | |
1078 wrong coding system, you can decode it again using @kbd{M-x | |
1079 recode-region}. This prompts you for the old coding system and the | |
1080 desired coding system, and acts on the text in the region. | |
1081 | |
1082 @vindex file-name-coding-system | |
1083 @cindex file names with non-@acronym{ASCII} characters | |
1084 @findex set-file-name-coding-system | |
1085 @kindex C-x @key{RET} F | |
1086 The variable @code{file-name-coding-system} specifies a coding | |
1087 system to use for encoding file names. If you set the variable to a | |
1088 coding system name (as a Lisp symbol or a string), Emacs encodes file | |
1089 names using that coding system for all file operations. This makes it | |
1090 possible to use non-@acronym{ASCII} characters in file names---or, at | |
1091 least, those non-@acronym{ASCII} characters which the specified coding | |
1092 system can encode. Use @kbd{C-x @key{RET} F} | |
1093 (@code{set-file-name-coding-system}) to specify this interactively. | |
1094 | |
1095 If @code{file-name-coding-system} is @code{nil}, Emacs uses a default | |
1096 coding system determined by the selected language environment. In the | |
1097 default language environment, any non-@acronym{ASCII} characters in file names are | |
1098 not encoded specially; they appear in the file system using the internal | |
1099 Emacs representation. | |
1100 | |
1101 @strong{Warning:} if you change @code{file-name-coding-system} (or the | |
1102 language environment) in the middle of an Emacs session, problems can | |
1103 result if you have already visited files whose names were encoded using | |
1104 the earlier coding system and cannot be encoded (or are encoded | |
1105 differently) under the new coding system. If you try to save one of | |
1106 these buffers under the visited file name, saving may use the wrong file | |
1107 name, or it may get an error. If such a problem happens, use @kbd{C-x | |
1108 C-w} to specify a new file name for that buffer. | |
1109 | |
1110 @findex recode-file-name | |
1111 If a mistake occurs when encoding a file name, use the command | |
1112 @kbd{M-x recode-file-name} to change the file name's coding | |
1113 system. This prompts for an existing file name, its old coding | |
1114 system, and the coding system to which you wish to convert. | |
1115 | 1045 |
1116 @vindex locale-coding-system | 1046 @vindex locale-coding-system |
1117 @cindex decoding non-@acronym{ASCII} keyboard input on X | 1047 @cindex decoding non-@acronym{ASCII} keyboard input on X |
1118 The variable @code{locale-coding-system} specifies a coding system | 1048 The variable @code{locale-coding-system} specifies a coding system |
1119 to use when encoding and decoding system strings such as system error | 1049 to use when encoding and decoding system strings such as system error |
1124 specified by one of the environment variables @env{LC_ALL}, | 1054 specified by one of the environment variables @env{LC_ALL}, |
1125 @env{LC_CTYPE}, and @env{LANG}. (The first one, in the order | 1055 @env{LC_CTYPE}, and @env{LANG}. (The first one, in the order |
1126 specified above, whose value is nonempty is the one that determines | 1056 specified above, whose value is nonempty is the one that determines |
1127 the text representation.) | 1057 the text representation.) |
1128 | 1058 |
1059 @node File Name Coding | |
1060 @section Coding Systems for File Names | |
1061 | |
1062 @table @kbd | |
1063 @item C-x @key{RET} F @var{coding} @key{RET} | |
1064 Use coding system @var{coding} for encoding and decoding file | |
1065 @emph{names}. | |
1066 @end table | |
1067 | |
1068 @vindex file-name-coding-system | |
1069 @cindex file names with non-@acronym{ASCII} characters | |
1070 The variable @code{file-name-coding-system} specifies a coding | |
1071 system to use for encoding file names. It has no effect on reading | |
1072 and writing the @emph{contents} of files. | |
1073 | |
1074 @findex set-file-name-coding-system | |
1075 @kindex C-x @key{RET} F | |
1076 If you set the variable to a coding system name (as a Lisp symbol or | |
1077 a string), Emacs encodes file names using that coding system for all | |
1078 file operations. This makes it possible to use non-@acronym{ASCII} | |
1079 characters in file names---or, at least, those non-@acronym{ASCII} | |
1080 characters which the specified coding system can encode. Use @kbd{C-x | |
1081 @key{RET} F} (@code{set-file-name-coding-system}) to specify this | |
1082 interactively. | |
1083 | |
1084 If @code{file-name-coding-system} is @code{nil}, Emacs uses a | |
1085 default coding system determined by the selected language environment. | |
1086 In the default language environment, any non-@acronym{ASCII} | |
1087 characters in file names are not encoded specially; they appear in the | |
1088 file system using the internal Emacs representation. | |
1089 | |
1090 @strong{Warning:} if you change @code{file-name-coding-system} (or the | |
1091 language environment) in the middle of an Emacs session, problems can | |
1092 result if you have already visited files whose names were encoded using | |
1093 the earlier coding system and cannot be encoded (or are encoded | |
1094 differently) under the new coding system. If you try to save one of | |
1095 these buffers under the visited file name, saving may use the wrong file | |
1096 name, or it may get an error. If such a problem happens, use @kbd{C-x | |
1097 C-w} to specify a new file name for that buffer. | |
1098 | |
1099 @findex recode-file-name | |
1100 If a mistake occurs when encoding a file name, use the command | |
1101 @kbd{M-x recode-file-name} to change the file name's coding | |
1102 system. This prompts for an existing file name, its old coding | |
1103 system, and the coding system to which you wish to convert. | |
1104 | |
1105 @node Terminal Coding | |
1106 @section Coding Systems for Terminal I/O | |
1107 | |
1108 @table @kbd | |
1109 @item C-x @key{RET} k @var{coding} @key{RET} | |
1110 Use coding system @var{coding} for keyboard input. | |
1111 | |
1112 @item C-x @key{RET} t @var{coding} @key{RET} | |
1113 Use coding system @var{coding} for terminal output. | |
1114 @end table | |
1115 | |
1116 @kindex C-x RET t | |
1117 @findex set-terminal-coding-system | |
1118 The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system}) | |
1119 specifies the coding system for terminal output. If you specify a | |
1120 character code for terminal output, all characters output to the | |
1121 terminal are translated into that coding system. | |
1122 | |
1123 This feature is useful for certain character-only terminals built to | |
1124 support specific languages or character sets---for example, European | |
1125 terminals that support one of the ISO Latin character sets. You need to | |
1126 specify the terminal coding system when using multibyte text, so that | |
1127 Emacs knows which characters the terminal can actually handle. | |
1128 | |
1129 By default, output to the terminal is not translated at all, unless | |
1130 Emacs can deduce the proper coding system from your terminal type or | |
1131 your locale specification (@pxref{Language Environments}). | |
1132 | |
1133 @kindex C-x RET k | |
1134 @findex set-keyboard-coding-system | |
1135 @vindex keyboard-coding-system | |
1136 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system}) | |
1137 or the variable @code{keyboard-coding-system} specifies the coding | |
1138 system for keyboard input. Character-code translation of keyboard | |
1139 input is useful for terminals with keys that send non-@acronym{ASCII} | |
1140 graphic characters---for example, some terminals designed for ISO | |
1141 Latin-1 or subsets of it. | |
1142 | |
1143 By default, keyboard input is translated based on your system locale | |
1144 setting. If your terminal does not really support the encoding | |
1145 implied by your locale (for example, if you find it inserts a | |
1146 non-@acronym{ASCII} character if you type @kbd{M-i}), you will need to set | |
1147 @code{keyboard-coding-system} to @code{nil} to turn off encoding. | |
1148 You can do this by putting | |
1149 | |
1150 @lisp | |
1151 (set-keyboard-coding-system nil) | |
1152 @end lisp | |
1153 | |
1154 @noindent | |
1155 in your @file{~/.emacs} file. | |
1156 | |
1157 There is a similarity between using a coding system translation for | |
1158 keyboard input, and using an input method: both define sequences of | |
1159 keyboard input that translate into single characters. However, input | |
1160 methods are designed to be convenient for interactive use by humans, and | |
1161 the sequences that are translated are typically sequences of @acronym{ASCII} | |
1162 printing characters. Coding systems typically translate sequences of | |
1163 non-graphic characters. | |
1164 | |
1129 @node Fontsets | 1165 @node Fontsets |
1130 @section Fontsets | 1166 @section Fontsets |
1131 @cindex fontsets | 1167 @cindex fontsets |
1132 | 1168 |
1133 A font for X typically defines shapes for a single alphabet or script. | 1169 A font for X Windows typically defines shapes for a single alphabet |
1134 Therefore, displaying the entire range of scripts that Emacs supports | 1170 or script. Therefore, displaying the entire range of scripts that |
1135 requires a collection of many fonts. In Emacs, such a collection is | 1171 Emacs supports requires a collection of many fonts. In Emacs, such a |
1136 called a @dfn{fontset}. A fontset is defined by a list of fonts, each | 1172 collection is called a @dfn{fontset}. A fontset is defined by a list |
1137 assigned to handle a range of character codes. | 1173 of fonts, each assigned to handle a range of character codes. |
1138 | 1174 |
1139 Each fontset has a name, like a font. The available X fonts are | 1175 Each fontset has a name, like a font. The available X fonts are |
1140 defined by the X server; fontsets, however, are defined within Emacs | 1176 defined by the X server; fontsets, however, are defined within Emacs |
1141 itself. Once you have defined a fontset, you can use it within Emacs by | 1177 itself. Once you have defined a fontset, you can use it within Emacs by |
1142 specifying its name, anywhere that you could use a single font. Of | 1178 specifying its name, anywhere that you could use a single font. Of |
1146 characters.@footnote{The Emacs installation instructions have information on | 1182 characters.@footnote{The Emacs installation instructions have information on |
1147 additional font support.} | 1183 additional font support.} |
1148 | 1184 |
1149 Emacs creates two fontsets automatically: the @dfn{standard fontset} | 1185 Emacs creates two fontsets automatically: the @dfn{standard fontset} |
1150 and the @dfn{startup fontset}. The standard fontset is most likely to | 1186 and the @dfn{startup fontset}. The standard fontset is most likely to |
1151 have fonts for a wide variety of non-@acronym{ASCII} characters; however, this is | 1187 have fonts for a wide variety of non-@acronym{ASCII} characters; |
1152 not the default for Emacs to use. (By default, Emacs tries to find a | 1188 however, this is not the default for Emacs to use. (By default, Emacs |
1153 font that has bold and italic variants.) You can specify use of the | 1189 tries to find a font that has bold and italic variants.) You can |
1154 standard fontset with the @samp{-fn} option, or with the @samp{Font} X | 1190 specify use of the standard fontset with the @samp{-fn} option, or |
1155 resource (@pxref{Font X}). For example, | 1191 with the @samp{Font} X resource (@pxref{Font X}). For example, |
1156 | 1192 |
1157 @example | 1193 @example |
1158 emacs -fn fontset-standard | 1194 emacs -fn fontset-standard |
1159 @end example | 1195 @end example |
1160 | 1196 |
1293 | 1329 |
1294 @node Undisplayable Characters | 1330 @node Undisplayable Characters |
1295 @section Undisplayable Characters | 1331 @section Undisplayable Characters |
1296 | 1332 |
1297 There may be some non-@acronym{ASCII} characters that your terminal cannot | 1333 There may be some non-@acronym{ASCII} characters that your terminal cannot |
1298 display. Most non-windowing terminals support just a single character | 1334 display. Most text-only terminals support just a single character |
1299 set (use the variable @code{default-terminal-coding-system} | 1335 set (use the variable @code{default-terminal-coding-system} |
1300 (@pxref{Specify Coding}) to tell Emacs which one); characters which | 1336 (@pxref{Terminal Coding}) to tell Emacs which one); characters which |
1301 can't be encoded in that coding system are displayed as @samp{?} by | 1337 can't be encoded in that coding system are displayed as @samp{?} by |
1302 default. | 1338 default. |
1303 | 1339 |
1304 Windowing terminals can display a broader range of characters, but | 1340 Graphical displays can display a broader range of characters, but |
1305 you may not have fonts installed for all of them; characters that have | 1341 you may not have fonts installed for all of them; characters that have |
1306 no font appear as a hollow box. | 1342 no font appear as a hollow box. |
1307 | 1343 |
1308 If you use Latin-1 characters but your terminal can't display | 1344 If you use Latin-1 characters but your terminal can't display |
1309 Latin-1, you can arrange to display mnemonic @acronym{ASCII} sequences | 1345 Latin-1, you can arrange to display mnemonic @acronym{ASCII} sequences |
1333 set-language-environment} and specify a suitable language environment | 1369 set-language-environment} and specify a suitable language environment |
1334 such as @samp{Latin-@var{n}}. | 1370 such as @samp{Latin-@var{n}}. |
1335 | 1371 |
1336 For more information about unibyte operation, see @ref{Enabling | 1372 For more information about unibyte operation, see @ref{Enabling |
1337 Multibyte}. Note particularly that you probably want to ensure that | 1373 Multibyte}. Note particularly that you probably want to ensure that |
1338 your initialization files are read as unibyte if they contain non-@acronym{ASCII} | 1374 your initialization files are read as unibyte if they contain |
1339 characters. | 1375 non-@acronym{ASCII} characters. |
1340 | 1376 |
1341 @vindex unibyte-display-via-language-environment | 1377 @vindex unibyte-display-via-language-environment |
1342 Emacs can also display those characters, provided the terminal or font | 1378 Emacs can also display those characters, provided the terminal or font |
1343 in use supports them. This works automatically. Alternatively, if you | 1379 in use supports them. This works automatically. Alternatively, if you |
1344 are using a window system, Emacs can also display single-byte characters | 1380 are using a window system, Emacs can also display single-byte characters |
1375 @item | 1411 @item |
1376 If your keyboard can generate character codes 128 (decimal) and up, | 1412 If your keyboard can generate character codes 128 (decimal) and up, |
1377 representing non-@acronym{ASCII} characters, you can type those character codes | 1413 representing non-@acronym{ASCII} characters, you can type those character codes |
1378 directly. | 1414 directly. |
1379 | 1415 |
1380 On a window system, you should not need to do anything special to use | 1416 On a graphical display, you should not need to do anything special to use |
1381 these keys; they should simply work. On a text-only terminal, you | 1417 these keys; they should simply work. On a text-only terminal, you |
1382 should use the command @code{M-x set-keyboard-coding-system} or the | 1418 should use the command @code{M-x set-keyboard-coding-system} or the |
1383 variable @code{keyboard-coding-system} to specify which coding system | 1419 variable @code{keyboard-coding-system} to specify which coding system |
1384 your keyboard uses (@pxref{Specify Coding}). Enabling this feature | 1420 your keyboard uses (@pxref{Terminal Coding}). Enabling this feature |
1385 will probably require you to use @kbd{ESC} to type Meta characters; | 1421 will probably require you to use @kbd{ESC} to type Meta characters; |
1386 however, on a console terminal or in @code{xterm}, you can arrange for | 1422 however, on a console terminal or in @code{xterm}, you can arrange for |
1387 Meta to be converted to @kbd{ESC} and still be able to type 8-bit | 1423 Meta to be converted to @kbd{ESC} and still be able to type 8-bit |
1388 characters present directly on the keyboard or using @kbd{Compose} or | 1424 characters present directly on the keyboard or using @kbd{Compose} or |
1389 @kbd{AltGr} keys. @xref{User Input}. | 1425 @kbd{AltGr} keys. @xref{User Input}. |
1415 @cindex charsets | 1451 @cindex charsets |
1416 | 1452 |
1417 Emacs groups all supported characters into disjoint @dfn{charsets}. | 1453 Emacs groups all supported characters into disjoint @dfn{charsets}. |
1418 Each character code belongs to one and only one charset. For | 1454 Each character code belongs to one and only one charset. For |
1419 historical reasons, Emacs typically divides an 8-bit character code | 1455 historical reasons, Emacs typically divides an 8-bit character code |
1420 for an extended version of @acronym{ASCII} into two charsets: @acronym{ASCII}, which | 1456 for an extended version of @acronym{ASCII} into two charsets: |
1421 covers the codes 0 through 127, plus another charset which covers the | 1457 @acronym{ASCII}, which covers the codes 0 through 127, plus another |
1422 ``right-hand part'' (the codes 128 and up). For instance, the | 1458 charset which covers the ``right-hand part'' (the codes 128 and up). |
1423 characters of Latin-1 include the Emacs charset @code{ascii} plus the | 1459 For instance, the characters of Latin-1 include the Emacs charset |
1424 Emacs charset @code{latin-iso8859-1}. | 1460 @code{ascii} plus the Emacs charset @code{latin-iso8859-1}. |
1425 | 1461 |
1426 Emacs characters belonging to different charsets may look the same, | 1462 Emacs characters belonging to different charsets may look the same, |
1427 but they are still different characters. For example, the letter | 1463 but they are still different characters. For example, the letter |
1428 @samp{o} with acute accent in charset @code{latin-iso8859-1}, used for | 1464 @samp{o} with acute accent in charset @code{latin-iso8859-1}, used for |
1429 Latin-1, is different from the letter @samp{o} with acute accent in | 1465 Latin-1, is different from the letter @samp{o} with acute accent in |