comparison man/mule.texi @ 68549:9aa281f8a64b

Minor clarifications.
Reduce the specific references to X Windows.
Refer to "graphical" terminals, rather than window systems.
(Text Coding): Renamed from Specify Coding.
(Communication Coding, File Name Coding, Terminal Coding): New nodes split out from Text Coding.
author Richard M. Stallman <rms@gnu.org>
date Thu, 02 Feb 2006 04:40:52 +0000
parents 3723093a21fd
children 99dedfb3d00e
68548:cd4235065942 68549:9aa281f8a64b
38 Emacs supports a wide variety of international character sets, 38 Emacs supports a wide variety of international character sets,
39 including European and Vietnamese variants of the Latin alphabet, as 39 including European and Vietnamese variants of the Latin alphabet, as
40 well as Cyrillic, Devanagari (for Hindi and Marathi), Ethiopic, Greek, 40 well as Cyrillic, Devanagari (for Hindi and Marathi), Ethiopic, Greek,
41 Han (for Chinese and Japanese), Hangul (for Korean), Hebrew, IPA, 41 Han (for Chinese and Japanese), Hangul (for Korean), Hebrew, IPA,
42 Kannada, Lao, Malayalam, Tamil, Thai, Tibetan, and Vietnamese scripts. 42 Kannada, Lao, Malayalam, Tamil, Thai, Tibetan, and Vietnamese scripts.
43 These features have been merged from the modified version of Emacs 43 Emacs also supports various encodings of these characters used by
44 known as MULE (for ``MULti-lingual Enhancement to GNU Emacs'')
45
46 Emacs also supports various encodings of these characters used by
47 other internationalized software, such as word processors and mailers. 44 other internationalized software, such as word processors and mailers.
48 45
49 Emacs allows editing text with international characters by supporting 46 Emacs allows editing text with international characters by supporting
50 all the related activities: 47 all the related activities:
51 48
55 pass non-@acronym{ASCII} text between Emacs and programs it invokes (such as 52 pass non-@acronym{ASCII} text between Emacs and programs it invokes (such as
56 compilers, spell-checkers, and mailers). Setting your language 53 compilers, spell-checkers, and mailers). Setting your language
57 environment (@pxref{Language Environments}) takes care of setting up the 54 environment (@pxref{Language Environments}) takes care of setting up the
58 coding systems and other options for a specific language or culture. 55 coding systems and other options for a specific language or culture.
59 Alternatively, you can specify how Emacs should encode or decode text 56 Alternatively, you can specify how Emacs should encode or decode text
60 for each command; see @ref{Specify Coding}. 57 for each command; see @ref{Text Coding}.
61 58
62 @item 59 @item
63 You can display non-@acronym{ASCII} characters encoded by the various scripts. 60 You can display non-@acronym{ASCII} characters encoded by the various
64 This works by using appropriate fonts on X and similar graphics 61 scripts. This works by using appropriate fonts on graphics displays
65 displays (@pxref{Defining Fontsets}), and by sending special codes to 62 (@pxref{Defining Fontsets}), and by sending special codes to text-only
66 text-only displays (@pxref{Specify Coding}). If some characters are 63 displays (@pxref{Terminal Coding}). If some characters are displayed
67 displayed incorrectly, refer to @ref{Undisplayable Characters}, which 64 incorrectly, refer to @ref{Undisplayable Characters}, which describes
68 describes possible problems and explains how to solve them. 65 possible problems and explains how to solve them.
69 66
70 @item 67 @item
71 You can insert non-@acronym{ASCII} characters or search for them. To do that, 68 You can insert non-@acronym{ASCII} characters or search for them. To do that,
72 you can specify an input method (@pxref{Select Input Method}) suitable 69 you can specify an input method (@pxref{Select Input Method}) suitable
73 for your language, or use the default input method set up when you set 70 for your language, or use the default input method set up when you set
74 your language environment. If 71 your language environment. If
75 your keyboard can produce non-@acronym{ASCII} characters, you can select an 72 your keyboard can produce non-@acronym{ASCII} characters, you can select an
76 appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs 73 appropriate keyboard coding system (@pxref{Terminal Coding}), and Emacs
77 will accept those characters. Latin-1 characters can also be input by 74 will accept those characters. Latin-1 characters can also be input by
78 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support, 75 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support,
79 C-x 8}. On X Window systems, your locale should be set to an 76 C-x 8}.
80 appropriate value to make sure Emacs interprets keyboard input 77
81 correctly; see @ref{Language Environments, locales}. 78 On X Window systems, your locale should be set to an appropriate value
79 to make sure Emacs interprets keyboard input correctly; see
80 @ref{Language Environments, locales}.
82 @end itemize 81 @end itemize
83 82
84 The rest of this chapter describes these issues in detail. 83 The rest of this chapter describes these issues in detail.
85 84
86 @menu 85 @menu
91 * Select Input Method:: Specifying your choice of input methods. 90 * Select Input Method:: Specifying your choice of input methods.
92 * Multibyte Conversion:: How single-byte characters convert to multibyte. 91 * Multibyte Conversion:: How single-byte characters convert to multibyte.
93 * Coding Systems:: Character set conversion when you read and 92 * Coding Systems:: Character set conversion when you read and
94 write files, and so on. 93 write files, and so on.
95 * Recognize Coding:: How Emacs figures out which conversion to use. 94 * Recognize Coding:: How Emacs figures out which conversion to use.
96 * Specify Coding:: Various ways to choose which conversion to use. 95 * Text Coding:: Choosing conversion to use for file text.
96 * Communications Coding:: Coding systems for interprocess communication.
97 * File Name Coding:: Coding systems for file @emph{names}.
98 * Terminal Coding:: Specifying coding systems for converting
99 terminal input and output.
97 * Fontsets:: Fontsets are collections of fonts 100 * Fontsets:: Fontsets are collections of fonts
98 that cover the whole spectrum of characters. 101 that cover the whole spectrum of characters.
99 * Defining Fontsets:: Defining a new fontset. 102 * Defining Fontsets:: Defining a new fontset.
100 * Undisplayable Characters:: When characters don't display. 103 * Undisplayable Characters:: When characters don't display.
101 * Single-Byte Character Support:: You can pick one European character set 104 * Single-Byte Character Support:: You can pick one European character set
104 @end menu 107 @end menu
105 108
106 @node International Chars 109 @node International Chars
107 @section Introduction to International Character Sets 110 @section Introduction to International Character Sets
108 111
109 The users of international character sets and scripts have established 112 The users of international character sets and scripts have
110 many more-or-less standard coding systems for storing files. Emacs 113 established many more-or-less standard coding systems for storing
111 internally uses a single multibyte character encoding, so that it can 114 files. Emacs internally uses a single multibyte character encoding,
112 intermix characters from all these scripts in a single buffer or string. 115 so that it can intermix characters from all these scripts in a single
113 This encoding represents each non-@acronym{ASCII} character as a sequence of bytes 116 buffer or string. This encoding represents each non-@acronym{ASCII}
114 in the range 0200 through 0377. Emacs translates between the multibyte 117 character as a sequence of bytes in the range 0200 through 0377.
115 character encoding and various other coding systems when reading and 118 Emacs translates between the multibyte character encoding and various
116 writing files, when exchanging data with subprocesses, and (in some 119 other coding systems when reading and writing files, when exchanging
117 cases) in the @kbd{C-q} command (@pxref{Multibyte Conversion}). 120 data with subprocesses, and (in some cases) in the @kbd{C-q} command
121 (@pxref{Multibyte Conversion}).
118 122
119 @kindex C-h h 123 @kindex C-h h
120 @findex view-hello-file 124 @findex view-hello-file
121 @cindex undisplayable characters 125 @cindex undisplayable characters
122 @cindex @samp{?} in display 126 @cindex @samp{?} in display
136 to multibyte characters, coding systems, and input methods. 140 to multibyte characters, coding systems, and input methods.
137 141
138 @node Enabling Multibyte 142 @node Enabling Multibyte
139 @section Enabling Multibyte Characters 143 @section Enabling Multibyte Characters
140 144
145 By default, Emacs starts in multibyte mode, because that allows you to
146 use all the supported languages and scripts without limitations.
147
141 @cindex turn multibyte support on or off 148 @cindex turn multibyte support on or off
142 You can enable or disable multibyte character support, either for 149 You can enable or disable multibyte character support, either for
143 Emacs as a whole, or for a single buffer. When multibyte characters are 150 Emacs as a whole, or for a single buffer. When multibyte characters
144 disabled in a buffer, then each byte in that buffer represents a 151 are disabled in a buffer, we call that @dfn{unibyte mode}. Then each
145 character, even codes 0200 through 0377. The old features for 152 byte in that buffer represents a character, even codes 0200 through
146 supporting the European character sets, ISO Latin-1 and ISO Latin-2, 153 0377.
147 work as they did in Emacs 19 and also work for the other ISO 8859 154
148 character sets. 155 The old features for supporting the European character sets, ISO
149 156 Latin-1 and ISO Latin-2, work in unibyte mode as they did in Emacs 19
150 However, there is no need to turn off multibyte character support to 157 and also work for the other ISO 8859 character sets. However, there
151 use ISO Latin; the Emacs multibyte character set includes all the 158 is no need to turn off multibyte character support to use ISO Latin;
152 characters in these character sets, and Emacs can translate 159 the Emacs multibyte character set includes all the characters in these
153 automatically to and from the ISO codes. 160 character sets, and Emacs can translate automatically to and from the
154 161 ISO codes.
155 By default, Emacs starts in multibyte mode, because that allows you to
156 use all the supported languages and scripts without limitations.
157 162
158 To edit a particular file in unibyte representation, visit it using 163 To edit a particular file in unibyte representation, visit it using
159 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in 164 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in
160 multibyte representation into a single-byte representation of the same 165 multibyte representation into a single-byte representation of the same
161 characters, the easiest way is to save the contents in a file, kill the 166 characters, the easiest way is to save the contents in a file, kill the
162 buffer, and find the file again with @code{find-file-literally}. You 167 buffer, and find the file again with @code{find-file-literally}. You
163 can also use @kbd{C-x @key{RET} c} 168 can also use @kbd{C-x @key{RET} c}
164 (@code{universal-coding-system-argument}) and specify @samp{raw-text} as 169 (@code{universal-coding-system-argument}) and specify @samp{raw-text} as
165 the coding system with which to find or save a file. @xref{Specify 170 the coding system with which to find or save a file. @xref{Text
166 Coding}. Finding a file as @samp{raw-text} doesn't disable format 171 Coding}. Finding a file as @samp{raw-text} doesn't disable format
167 conversion, uncompression and auto mode selection as 172 conversion, uncompression and auto mode selection as
168 @code{find-file-literally} does. 173 @code{find-file-literally} does.
169 174
170 @vindex enable-multibyte-characters 175 @vindex enable-multibyte-characters
207 @key{RET} c raw-text @key{RET}} immediately before loading it. 212 @key{RET} c raw-text @key{RET}} immediately before loading it.
208 213
209 The mode line indicates whether multibyte character support is enabled 214 The mode line indicates whether multibyte character support is enabled
210 in the current buffer. If it is, there are two or more characters (most 215 in the current buffer. If it is, there are two or more characters (most
211 often two dashes) before the colon near the beginning of the mode line. 216 often two dashes) before the colon near the beginning of the mode line.
212 When multibyte characters are not enabled, just one dash precedes the 217 When multibyte characters are not enabled, nothing precedes the colon
213 colon. 218 except a single dash.
214 219
215 @node Language Environments 220 @node Language Environments
216 @section Language Environments 221 @section Language Environments
217 @cindex language environments 222 @cindex language environments
218 223
312 317
313 @kindex C-h L 318 @kindex C-h L
314 @findex describe-language-environment 319 @findex describe-language-environment
315 To display information about the effects of a certain language 320 To display information about the effects of a certain language
316 environment @var{lang-env}, use the command @kbd{C-h L @var{lang-env} 321 environment @var{lang-env}, use the command @kbd{C-h L @var{lang-env}
317 @key{RET}} (@code{describe-language-environment}). This tells you which 322 @key{RET}} (@code{describe-language-environment}). This tells you
318 languages this language environment is useful for, and lists the 323 which languages this language environment is useful for, and lists the
319 character sets, coding systems, and input methods that go with it. It 324 character sets, coding systems, and input methods that go with it. It
320 also shows some sample text to illustrate scripts used in this language 325 also shows some sample text to illustrate scripts used in this
321 environment. By default, this command describes the chosen language 326 language environment. If you give an empty input for @var{lang-env},
322 environment. 327 this command describes the chosen language environment.
323 328
324 @vindex set-language-environment-hook 329 @vindex set-language-environment-hook
325 You can customize any language environment with the normal hook 330 You can customize any language environment with the normal hook
326 @code{set-language-environment-hook}. The command 331 @code{set-language-environment-hook}. The command
327 @code{set-language-environment} runs that hook after setting up the new 332 @code{set-language-environment} runs that hook after setting up the new
481 language environment that it is meant to be used with. The variable 486 language environment that it is meant to be used with. The variable
482 @code{current-input-method} records which input method is selected. 487 @code{current-input-method} records which input method is selected.
483 488
484 @findex toggle-input-method 489 @findex toggle-input-method
485 @kindex C-\ 490 @kindex C-\
486 Input methods use various sequences of @acronym{ASCII} characters to stand for 491 Input methods use various sequences of @acronym{ASCII} characters to
487 non-@acronym{ASCII} characters. Sometimes it is useful to turn off the input 492 stand for non-@acronym{ASCII} characters. Sometimes it is useful to
488 method temporarily. To do this, type @kbd{C-\} 493 turn off the input method temporarily. To do this, type @kbd{C-\}
489 (@code{toggle-input-method}). To reenable the input method, type 494 (@code{toggle-input-method}). To reenable the input method, type
490 @kbd{C-\} again. 495 @kbd{C-\} again.
491 496
492 If you type @kbd{C-\} and you have not yet selected an input method, 497 If you type @kbd{C-\} and you have not yet selected an input method,
493 it prompts for you to specify one. This has the same effect as using 498 it prompts for you to specify one. This has the same effect as using
672 predictable. For example, the coding system @code{iso-latin-1} has 677 predictable. For example, the coding system @code{iso-latin-1} has
673 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and 678 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and
674 @code{iso-latin-1-mac}. 679 @code{iso-latin-1-mac}.
675 680
676 The coding system @code{raw-text} is good for a file which is mainly 681 The coding system @code{raw-text} is good for a file which is mainly
677 @acronym{ASCII} text, but may contain byte values above 127 which are not meant to 682 @acronym{ASCII} text, but may contain byte values above 127 which are
678 encode non-@acronym{ASCII} characters. With @code{raw-text}, Emacs copies those 683 not meant to encode non-@acronym{ASCII} characters. With
679 byte values unchanged, and sets @code{enable-multibyte-characters} to 684 @code{raw-text}, Emacs copies those byte values unchanged, and sets
680 @code{nil} in the current buffer so that they will be interpreted 685 @code{enable-multibyte-characters} to @code{nil} in the current buffer
681 properly. @code{raw-text} handles end-of-line conversion in the usual 686 so that they will be interpreted properly. @code{raw-text} handles
682 way, based on the data encountered, and has the usual three variants to 687 end-of-line conversion in the usual way, based on the data
683 specify the kind of end-of-line conversion to use. 688 encountered, and has the usual three variants to specify the kind of
689 end-of-line conversion to use.
684 690
685 In contrast, the coding system @code{no-conversion} specifies no 691 In contrast, the coding system @code{no-conversion} specifies no
686 character code conversion at all---none for non-@acronym{ASCII} byte values and 692 character code conversion at all---none for non-@acronym{ASCII} byte values and
687 none for end of line. This is useful for reading or writing binary 693 none for end of line. This is useful for reading or writing binary
688 files, tar files, and other files that must be examined verbatim. It, 694 files, tar files, and other files that must be examined verbatim. It,
820 pattern, are decoded correctly. One of the builtin 826 pattern, are decoded correctly. One of the builtin
821 @code{auto-coding-functions} detects the encoding for XML files. 827 @code{auto-coding-functions} detects the encoding for XML files.
822 828
823 If Emacs recognizes the encoding of a file incorrectly, you can 829 If Emacs recognizes the encoding of a file incorrectly, you can
824 reread the file using the correct coding system by typing @kbd{C-x 830 reread the file using the correct coding system by typing @kbd{C-x
825 @key{RET} r @var{coding-system} 831 @key{RET} r @var{coding-system} @key{RET}}. To see what coding system
826 @key{RET}}. To see what coding system Emacs actually used to decode 832 Emacs actually used to decode the file, look at the coding system
827 the file, look at the coding system mnemonic letter near the left edge 833 mnemonic letter near the left edge of the mode line (@pxref{Mode
828 of the mode line (@pxref{Mode Line}), or type @kbd{C-h C @key{RET}}. 834 Line}), or type @kbd{C-h C @key{RET}}.
829 835
830 @findex unify-8859-on-decoding-mode 836 @findex unify-8859-on-decoding-mode
831 The command @code{unify-8859-on-decoding-mode} enables a mode that 837 The command @code{unify-8859-on-decoding-mode} enables a mode that
832 ``unifies'' the Latin alphabets when decoding text. This works by 838 ``unifies'' the Latin alphabets when decoding text. This works by
833 converting all non-@acronym{ASCII} Latin-@var{n} characters to either Latin-1 or 839 converting all non-@acronym{ASCII} Latin-@var{n} characters to either
834 Unicode characters. This way it is easier to use various 840 Latin-1 or Unicode characters. This way it is easier to use various
835 Latin-@var{n} alphabets together. In a future Emacs version we hope 841 Latin-@var{n} alphabets together. In a future Emacs version we hope
836 to move towards full Unicode support and complete unification of 842 to move towards full Unicode support and complete unification of
837 character sets. 843 character sets.
838 844
839 @vindex buffer-file-coding-system 845 @vindex buffer-file-coding-system
841 coding system in @code{buffer-file-coding-system} and uses that coding 847 coding system in @code{buffer-file-coding-system} and uses that coding
842 system, by default, for operations that write from this buffer into a 848 system, by default, for operations that write from this buffer into a
843 file. This includes the commands @code{save-buffer} and 849 file. This includes the commands @code{save-buffer} and
844 @code{write-region}. If you want to write files from this buffer using 850 @code{write-region}. If you want to write files from this buffer using
845 a different coding system, you can specify a different coding system for 851 a different coding system, you can specify a different coding system for
846 the buffer using @code{set-buffer-file-coding-system} (@pxref{Specify 852 the buffer using @code{set-buffer-file-coding-system} (@pxref{Text
847 Coding}). 853 Coding}).
848 854
849 You can insert any possible character into any Emacs buffer, but 855 You can insert any possible character into any Emacs buffer, but
850 most coding systems can only handle some of the possible characters. 856 most coding systems can only handle some of the possible characters.
851 This means that it is possible for you to insert characters that 857 This means that it is possible for you to insert characters that
899 system specified by the variable @code{rmail-file-coding-system}. The 905 system specified by the variable @code{rmail-file-coding-system}. The
900 default value is @code{nil}, which means that Rmail files are not 906 default value is @code{nil}, which means that Rmail files are not
901 translated (they are read and written in the Emacs internal character 907 translated (they are read and written in the Emacs internal character
902 code). 908 code).
903 909
904 @node Specify Coding 910 @node Text Coding
905 @section Specifying a Coding System 911 @section Specifying a Coding System for File Text
906 912
907 In cases where Emacs does not automatically choose the right coding 913 In cases where Emacs does not automatically choose the right coding
908 system, you can use these commands to specify one: 914 system for a file's contents, you can use these commands to specify
915 one:
909 916
910 @table @kbd 917 @table @kbd
911 @item C-x @key{RET} f @var{coding} @key{RET} 918 @item C-x @key{RET} f @var{coding} @key{RET}
912 Use coding system @var{coding} for saving or revisiting the visited 919 Use coding system @var{coding} for saving or revisiting the visited
913 file in the current buffer. 920 file in the current buffer.
917 command. 924 command.
918 925
919 @item C-x @key{RET} r @var{coding} @key{RET} 926 @item C-x @key{RET} r @var{coding} @key{RET}
920 Revisit the current file using the coding system @var{coding}. 927 Revisit the current file using the coding system @var{coding}.
921 928
922 @item C-x @key{RET} k @var{coding} @key{RET} 929 @item M-x recode-region @key{RET} @var{right} @key{RET} @var{wrong} @key{RET}
923 Use coding system @var{coding} for keyboard input. 930 Convert a region that was decoded using coding system @var{wrong},
924 931 decoding it using coding system @var{right} instead.
925 @item C-x @key{RET} t @var{coding} @key{RET}
926 Use coding system @var{coding} for terminal output.
927
928 @item C-x @key{RET} p @var{input-coding} @key{RET} @var{output-coding} @key{RET}
929 Use coding systems @var{input-coding} and @var{output-coding} for
930 subprocess input and output in the current buffer.
931
932 @item C-x @key{RET} x @var{coding} @key{RET}
933 Use coding system @var{coding} for transferring selections to and from
934 other programs through the window system.
935
936 @item C-x @key{RET} F @var{coding} @key{RET}
937 Use coding system @var{coding} for encoding and decoding file
938 @emph{names}. This affects the use of non-ASCII characters in file
939 names. It has no effect on reading and writing the @emph{contents} of
940 files.
941
942 @item C-x @key{RET} X @var{coding} @key{RET}
943 Use coding system @var{coding} for transferring @emph{one}
944 selection---the next one---to or from the window system.
945
946 @item M-x recode-region
947 Convert the region from a previous coding system to a new one.
948 @end table 932 @end table
949 933
950 @kindex C-x RET f 934 @kindex C-x RET f
951 @findex set-buffer-file-coding-system 935 @findex set-buffer-file-coding-system
952 The command @kbd{C-x @key{RET} f} 936 The command @kbd{C-x @key{RET} f}
976 contains characters that the coding system cannot handle. 960 contains characters that the coding system cannot handle.
977 961
978 Other file commands affected by a specified coding system include 962 Other file commands affected by a specified coding system include
979 @kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window variants 963 @kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window variants
980 of @kbd{C-x C-f}. @kbd{C-x @key{RET} c} also affects commands that 964 of @kbd{C-x C-f}. @kbd{C-x @key{RET} c} also affects commands that
981 start subprocesses, including @kbd{M-x shell} (@pxref{Shell}). 965 start subprocesses, including @kbd{M-x shell} (@pxref{Shell}). If the
982 966 immediately following command does not use the coding system, then
983 If the immediately following command does not use the coding system, 967 @kbd{C-x @key{RET} c} ultimately has no effect.
984 then @kbd{C-x @key{RET} c} ultimately has no effect.
985 968
986 An easy way to visit a file with no conversion is with the @kbd{M-x 969 An easy way to visit a file with no conversion is with the @kbd{M-x
987 find-file-literally} command. @xref{Visiting}. 970 find-file-literally} command. @xref{Visiting}.
988 971
989 @vindex default-buffer-file-coding-system 972 @vindex default-buffer-file-coding-system
998 @findex revert-buffer-with-coding-system 981 @findex revert-buffer-with-coding-system
999 If you visit a file with a wrong coding system, you can correct this 982 If you visit a file with a wrong coding system, you can correct this
1000 with @kbd{C-x @key{RET} r} (@code{revert-buffer-with-coding-system}). 983 with @kbd{C-x @key{RET} r} (@code{revert-buffer-with-coding-system}).
1001 This visits the current file again, using a coding system you specify. 984 This visits the current file again, using a coding system you specify.
1002 985
1003 @kindex C-x RET t 986 @findex recode-region
1004 @findex set-terminal-coding-system 987 If a piece of text has already been inserted into a buffer using the
1005 The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system}) 988 wrong coding system, you can redo the decoding of it using @kbd{M-x
1006 specifies the coding system for terminal output. If you specify a 989 recode-region}. This prompts you for the proper coding system, then
1007 character code for terminal output, all characters output to the 990 for the wrong coding system that was actually used, and does the
1008 terminal are translated into that coding system. 991 conversion. It first encodes the region using the wrong coding system,
1009 992 then decodes it again using the proper coding system.
1010 This feature is useful for certain character-only terminals built to 993
1011 support specific languages or character sets---for example, European 994 @node Communication Coding
1012 terminals that support one of the ISO Latin character sets. You need to 995 @section Coding Systems for Interprocess Communication
1013 specify the terminal coding system when using multibyte text, so that 996
1014 Emacs knows which characters the terminal can actually handle. 997 This section explains how to specify coding systems for use
1015 998 in communication with other processes.
1016 By default, output to the terminal is not translated at all, unless 999
1017 Emacs can deduce the proper coding system from your terminal type or 1000 @table @kbd
1018 your locale specification (@pxref{Language Environments}). 1001 @item C-x @key{RET} x @var{coding} @key{RET}
1019 1002 Use coding system @var{coding} for transferring selections to and from
1020 @kindex C-x RET k 1003 other programs through the window system.
1021 @findex set-keyboard-coding-system 1004
1022 @vindex keyboard-coding-system 1005 @item C-x @key{RET} X @var{coding} @key{RET}
1023 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system}) 1006 Use coding system @var{coding} for transferring @emph{one}
1024 or the variable @code{keyboard-coding-system} specifies the coding 1007 selection---the next one---to or from the window system.
1025 system for keyboard input. Character-code translation of keyboard 1008
1026 input is useful for terminals with keys that send non-@acronym{ASCII} 1009 @item C-x @key{RET} p @var{input-coding} @key{RET} @var{output-coding} @key{RET}
1027 graphic characters---for example, some terminals designed for ISO 1010 Use coding systems @var{input-coding} and @var{output-coding} for
1028 Latin-1 or subsets of it. 1011 subprocess input and output in the current buffer.
1029 1012
1030 By default, keyboard input is translated based on your system locale 1013 @item C-x @key{RET} c @var{coding} @key{RET}
1031 setting. If your terminal does not really support the encoding 1014 Specify coding system @var{coding} for the immediately following
1032 implied by your locale (for example, if you find it inserts a 1015 command.
1033 non-@acronym{ASCII} character if you type @kbd{M-i}), you will need to set 1016 @end table
1034 @code{keyboard-coding-system} to @code{nil} to turn off encoding.
1035 You can do this by putting
1036
1037 @lisp
1038 (set-keyboard-coding-system nil)
1039 @end lisp
1040
1041 @noindent
1042 in your @file{~/.emacs} file.
1043
1044 There is a similarity between using a coding system translation for
1045 keyboard input, and using an input method: both define sequences of
1046 keyboard input that translate into single characters. However, input
1047 methods are designed to be convenient for interactive use by humans, and
1048 the sequences that are translated are typically sequences of @acronym{ASCII}
1049 printing characters. Coding systems typically translate sequences of
1050 non-graphic characters.
1051 1017
1052 @kindex C-x RET x 1018 @kindex C-x RET x
1053 @kindex C-x RET X 1019 @kindex C-x RET X
1054 @findex set-selection-coding-system 1020 @findex set-selection-coding-system
1055 @findex set-next-selection-coding-system 1021 @findex set-next-selection-coding-system
1056 The command @kbd{C-x @key{RET} x} (@code{set-selection-coding-system}) 1022 The command @kbd{C-x @key{RET} x} (@code{set-selection-coding-system})
1057 specifies the coding system for sending selected text to the window 1023 specifies the coding system for sending selected text to other windowing
1058 system, and for receiving the text of selections made in other 1024 applications, and for receiving the text of selections made in other
1059 applications. This command applies to all subsequent selections, until 1025 applications. This command applies to all subsequent selections, until
1060 you override it by using the command again. The command @kbd{C-x 1026 you override it by using the command again. The command @kbd{C-x
1061 @key{RET} X} (@code{set-next-selection-coding-system}) specifies the 1027 @key{RET} X} (@code{set-next-selection-coding-system}) specifies the
1062 coding system for the next selection made in Emacs or read by Emacs. 1028 coding system for the next selection made in Emacs or read by Emacs.
1063 1029
1068 command applies to the current buffer; normally, each subprocess has its 1034 command applies to the current buffer; normally, each subprocess has its
1069 own buffer, and thus you can use this command to specify translation to 1035 own buffer, and thus you can use this command to specify translation to
1070 and from a particular subprocess by giving the command in the 1036 and from a particular subprocess by giving the command in the
1071 corresponding buffer. 1037 corresponding buffer.
1072 1038
1039 You can also use @kbd{C-x @key{RET} c} just before the command that
1040 runs or starts a subprocess, to specify the coding system to use for
1041 communication with that subprocess.
1042
1073 The default for translation of process input and output depends on the 1043 The default for translation of process input and output depends on the
1074 current language environment. 1044 current language environment.
1075
1076 @findex recode-region
1077 If a piece of text has already been inserted into a buffer using the
1078 wrong coding system, you can decode it again using @kbd{M-x
1079 recode-region}. This prompts you for the old coding system and the
1080 desired coding system, and acts on the text in the region.
1081
1082 @vindex file-name-coding-system
1083 @cindex file names with non-@acronym{ASCII} characters
1084 @findex set-file-name-coding-system
1085 @kindex C-x @key{RET} F
1086 The variable @code{file-name-coding-system} specifies a coding
1087 system to use for encoding file names. If you set the variable to a
1088 coding system name (as a Lisp symbol or a string), Emacs encodes file
1089 names using that coding system for all file operations. This makes it
1090 possible to use non-@acronym{ASCII} characters in file names---or, at
1091 least, those non-@acronym{ASCII} characters which the specified coding
1092 system can encode. Use @kbd{C-x @key{RET} F}
1093 (@code{set-file-name-coding-system}) to specify this interactively.
1094
1095 If @code{file-name-coding-system} is @code{nil}, Emacs uses a default
1096 coding system determined by the selected language environment. In the
1097 default language environment, any non-@acronym{ASCII} characters in file names are
1098 not encoded specially; they appear in the file system using the internal
1099 Emacs representation.
1100
1101 @strong{Warning:} if you change @code{file-name-coding-system} (or the
1102 language environment) in the middle of an Emacs session, problems can
1103 result if you have already visited files whose names were encoded using
1104 the earlier coding system and cannot be encoded (or are encoded
1105 differently) under the new coding system. If you try to save one of
1106 these buffers under the visited file name, saving may use the wrong file
1107 name, or it may get an error. If such a problem happens, use @kbd{C-x
1108 C-w} to specify a new file name for that buffer.
1109
1110 @findex recode-file-name
1111 If a mistake occurs when encoding a file name, use the command
1112 @kbd{M-x recode-file-name} to change the file name's coding
1113 system. This prompts for an existing file name, its old coding
1114 system, and the coding system to which you wish to convert.
1115 1045
1116 @vindex locale-coding-system 1046 @vindex locale-coding-system
1117 @cindex decoding non-@acronym{ASCII} keyboard input on X 1047 @cindex decoding non-@acronym{ASCII} keyboard input on X
1118 The variable @code{locale-coding-system} specifies a coding system 1048 The variable @code{locale-coding-system} specifies a coding system
1119 to use when encoding and decoding system strings such as system error 1049 to use when encoding and decoding system strings such as system error
1124 specified by one of the environment variables @env{LC_ALL}, 1054 specified by one of the environment variables @env{LC_ALL},
1125 @env{LC_CTYPE}, and @env{LANG}. (The first one, in the order 1055 @env{LC_CTYPE}, and @env{LANG}. (The first one, in the order
1126 specified above, whose value is nonempty is the one that determines 1056 specified above, whose value is nonempty is the one that determines
1127 the text representation.) 1057 the text representation.)
1128 1058
1059 @node File Name Coding
1060 @section Coding Systems for File Names
1061
1062 @table @kbd
1063 @item C-x @key{RET} F @var{coding} @key{RET}
1064 Use coding system @var{coding} for encoding and decoding file
1065 @emph{names}.
1066 @end table
1067
1068 @vindex file-name-coding-system
1069 @cindex file names with non-@acronym{ASCII} characters
1070 The variable @code{file-name-coding-system} specifies a coding
1071 system to use for encoding file names. It has no effect on reading
1072 and writing the @emph{contents} of files.
1073
1074 @findex set-file-name-coding-system
1075 @kindex C-x @key{RET} F
1076 If you set the variable to a coding system name (as a Lisp symbol or
1077 a string), Emacs encodes file names using that coding system for all
1078 file operations. This makes it possible to use non-@acronym{ASCII}
1079 characters in file names---or, at least, those non-@acronym{ASCII}
1080 characters which the specified coding system can encode. Use @kbd{C-x
1081 @key{RET} F} (@code{set-file-name-coding-system}) to specify this
1082 interactively.
1083
1084 If @code{file-name-coding-system} is @code{nil}, Emacs uses a
1085 default coding system determined by the selected language environment.
1086 In the default language environment, any non-@acronym{ASCII}
1087 characters in file names are not encoded specially; they appear in the
1088 file system using the internal Emacs representation.
1089
1090 @strong{Warning:} if you change @code{file-name-coding-system} (or the
1091 language environment) in the middle of an Emacs session, problems can
1092 result if you have already visited files whose names were encoded using
1093 the earlier coding system and cannot be encoded (or are encoded
1094 differently) under the new coding system. If you try to save one of
1095 these buffers under the visited file name, saving may use the wrong file
1096 name, or it may get an error. If such a problem happens, use @kbd{C-x
1097 C-w} to specify a new file name for that buffer.
1098
1099 @findex recode-file-name
1100 If a mistake occurs when encoding a file name, use the command
1101 @kbd{M-x recode-file-name} to change the file name's coding
1102 system. This prompts for an existing file name, its old coding
1103 system, and the coding system to which you wish to convert.
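
As a minimal sketch, assuming the file names on your system are
encoded in UTF-8 (an illustrative assumption, not a default stated
here), you could set the variable in your @file{~/.emacs} file:

@lisp
;; Assumption: file names on disk are UTF-8.  This affects file
;; names only, not the contents of files.
(setq file-name-coding-system 'utf-8)
@end lisp

@noindent
Because of the warning above, it is safer to do this at startup than
in the middle of a session.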
1104
1105 @node Terminal Coding
1106 @section Coding Systems for Terminal I/O
1107
1108 @table @kbd
1109 @item C-x @key{RET} k @var{coding} @key{RET}
1110 Use coding system @var{coding} for keyboard input.
1111
1112 @item C-x @key{RET} t @var{coding} @key{RET}
1113 Use coding system @var{coding} for terminal output.
1114 @end table
1115
1116 @kindex C-x RET t
1117 @findex set-terminal-coding-system
1118 The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system})
1119 specifies the coding system for terminal output. If you specify a
1120 character code for terminal output, all characters output to the
1121 terminal are translated into that coding system.
1122
1123 This feature is useful for certain character-only terminals built to
1124 support specific languages or character sets---for example, European
1125 terminals that support one of the ISO Latin character sets. You need to
1126 specify the terminal coding system when using multibyte text, so that
1127 Emacs knows which characters the terminal can actually handle.
1128
1129 By default, output to the terminal is not translated at all, unless
1130 Emacs can deduce the proper coding system from your terminal type or
1131 your locale specification (@pxref{Language Environments}).
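
If Emacs cannot deduce the coding system, you can set it yourself.  As
a minimal sketch, assuming a character-only terminal that displays ISO
Latin-1 (an illustrative assumption), you could put

@lisp
;; Assumption: the terminal displays ISO Latin-1.
(set-terminal-coding-system 'iso-latin-1)
@end lisp

@noindent
in your @file{~/.emacs} file.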
1132
1133 @kindex C-x RET k
1134 @findex set-keyboard-coding-system
1135 @vindex keyboard-coding-system
1136 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system})
1137 or the variable @code{keyboard-coding-system} specifies the coding
1138 system for keyboard input. Character-code translation of keyboard
1139 input is useful for terminals with keys that send non-@acronym{ASCII}
1140 graphic characters---for example, some terminals designed for ISO
1141 Latin-1 or subsets of it.
1142
1143 By default, keyboard input is translated based on your system locale
1144 setting. If your terminal does not really support the encoding
1145 implied by your locale (for example, if you find it inserts a
1146 non-@acronym{ASCII} character if you type @kbd{M-i}), you will need to set
1147 @code{keyboard-coding-system} to @code{nil} to turn off encoding.
1148 You can do this by putting
1149
1150 @lisp
1151 (set-keyboard-coding-system nil)
1152 @end lisp
1153
1154 @noindent
1155 in your @file{~/.emacs} file.
1156
1157 There is a similarity between using a coding system translation for
1158 keyboard input, and using an input method: both define sequences of
1159 keyboard input that translate into single characters. However, input
1160 methods are designed to be convenient for interactive use by humans, and
1161 the sequences that are translated are typically sequences of @acronym{ASCII}
1162 printing characters. Coding systems typically translate sequences of
1163 non-graphic characters.
1164
1129 @node Fontsets 1165 @node Fontsets
1130 @section Fontsets 1166 @section Fontsets
1131 @cindex fontsets 1167 @cindex fontsets
1132 1168
1133 A font for X typically defines shapes for a single alphabet or script. 1169 A font for X Windows typically defines shapes for a single alphabet
1134 Therefore, displaying the entire range of scripts that Emacs supports 1170 or script. Therefore, displaying the entire range of scripts that
1135 requires a collection of many fonts. In Emacs, such a collection is 1171 Emacs supports requires a collection of many fonts. In Emacs, such a
1136 called a @dfn{fontset}. A fontset is defined by a list of fonts, each 1172 collection is called a @dfn{fontset}. A fontset is defined by a list
1137 assigned to handle a range of character codes. 1173 of fonts, each assigned to handle a range of character codes.
1138 1174
1139 Each fontset has a name, like a font. The available X fonts are 1175 Each fontset has a name, like a font. The available X fonts are
1140 defined by the X server; fontsets, however, are defined within Emacs 1176 defined by the X server; fontsets, however, are defined within Emacs
1141 itself. Once you have defined a fontset, you can use it within Emacs by 1177 itself. Once you have defined a fontset, you can use it within Emacs by
1142 specifying its name, anywhere that you could use a single font. Of 1178 specifying its name, anywhere that you could use a single font. Of
1146 characters.@footnote{The Emacs installation instructions have information on 1182 characters.@footnote{The Emacs installation instructions have information on
1147 additional font support.} 1183 additional font support.}
1148 1184
1149 Emacs creates two fontsets automatically: the @dfn{standard fontset} 1185 Emacs creates two fontsets automatically: the @dfn{standard fontset}
1150 and the @dfn{startup fontset}. The standard fontset is most likely to 1186 and the @dfn{startup fontset}. The standard fontset is most likely to
1151 have fonts for a wide variety of non-@acronym{ASCII} characters; however, this is 1187 have fonts for a wide variety of non-@acronym{ASCII} characters;
1152 not the default for Emacs to use. (By default, Emacs tries to find a 1188 however, this is not the default for Emacs to use. (By default, Emacs
1153 font that has bold and italic variants.) You can specify use of the 1189 tries to find a font that has bold and italic variants.) You can
1154 standard fontset with the @samp{-fn} option, or with the @samp{Font} X 1190 specify use of the standard fontset with the @samp{-fn} option, or
1155 resource (@pxref{Font X}). For example, 1191 with the @samp{Font} X resource (@pxref{Font X}). For example,
1156 1192
1157 @example 1193 @example
1158 emacs -fn fontset-standard 1194 emacs -fn fontset-standard
1159 @end example 1195 @end example
1160 1196
1293 1329
1294 @node Undisplayable Characters 1330 @node Undisplayable Characters
1295 @section Undisplayable Characters 1331 @section Undisplayable Characters
1296 1332
1297 There may be some non-@acronym{ASCII} characters that your terminal cannot 1333 There may be some non-@acronym{ASCII} characters that your terminal cannot
1298 display. Most non-windowing terminals support just a single character 1334 display. Most text-only terminals support just a single character
1299 set (use the variable @code{default-terminal-coding-system} 1335 set (use the variable @code{default-terminal-coding-system}
1300 (@pxref{Specify Coding}) to tell Emacs which one); characters which 1336 (@pxref{Terminal Coding}) to tell Emacs which one); characters which
1301 can't be encoded in that coding system are displayed as @samp{?} by 1337 can't be encoded in that coding system are displayed as @samp{?} by
1302 default. 1338 default.
1303 1339
1304 Windowing terminals can display a broader range of characters, but 1340 Graphical displays can display a broader range of characters, but
1305 you may not have fonts installed for all of them; characters that have 1341 you may not have fonts installed for all of them; characters that have
1306 no font appear as a hollow box. 1342 no font appear as a hollow box.
1307 1343
1308 If you use Latin-1 characters but your terminal can't display 1344 If you use Latin-1 characters but your terminal can't display
1309 Latin-1, you can arrange to display mnemonic @acronym{ASCII} sequences 1345 Latin-1, you can arrange to display mnemonic @acronym{ASCII} sequences
1333 set-language-environment} and specify a suitable language environment 1369 set-language-environment} and specify a suitable language environment
1334 such as @samp{Latin-@var{n}}. 1370 such as @samp{Latin-@var{n}}.
1335 1371
1336 For more information about unibyte operation, see @ref{Enabling 1372 For more information about unibyte operation, see @ref{Enabling
1337 Multibyte}. Note particularly that you probably want to ensure that 1373 Multibyte}. Note particularly that you probably want to ensure that
1338 your initialization files are read as unibyte if they contain non-@acronym{ASCII} 1374 your initialization files are read as unibyte if they contain
1339 characters. 1375 non-@acronym{ASCII} characters.
1340 1376
1341 @vindex unibyte-display-via-language-environment 1377 @vindex unibyte-display-via-language-environment
1342 Emacs can also display those characters, provided the terminal or font 1378 Emacs can also display those characters, provided the terminal or font
1343 in use supports them. This works automatically. Alternatively, if you 1379 in use supports them. This works automatically. Alternatively, if you
1344 are using a window system, Emacs can also display single-byte characters 1380 are using a window system, Emacs can also display single-byte characters
1375 @item 1411 @item
1376 If your keyboard can generate character codes 128 (decimal) and up, 1412 If your keyboard can generate character codes 128 (decimal) and up,
1377 representing non-@acronym{ASCII} characters, you can type those character codes 1413 representing non-@acronym{ASCII} characters, you can type those character codes
1378 directly. 1414 directly.
1379 1415
1380 On a window system, you should not need to do anything special to use 1416 On a graphical display, you should not need to do anything special to use
1381 these keys; they should simply work. On a text-only terminal, you 1417 these keys; they should simply work. On a text-only terminal, you
1382 should use the command @code{M-x set-keyboard-coding-system} or the 1418 should use the command @code{M-x set-keyboard-coding-system} or the
1383 variable @code{keyboard-coding-system} to specify which coding system 1419 variable @code{keyboard-coding-system} to specify which coding system
1384 your keyboard uses (@pxref{Specify Coding}). Enabling this feature 1420 your keyboard uses (@pxref{Terminal Coding}). Enabling this feature
1385 will probably require you to use @kbd{ESC} to type Meta characters; 1421 will probably require you to use @kbd{ESC} to type Meta characters;
1386 however, on a console terminal or in @code{xterm}, you can arrange for 1422 however, on a console terminal or in @code{xterm}, you can arrange for
1387 Meta to be converted to @kbd{ESC} and still be able to type 8-bit 1423 Meta to be converted to @kbd{ESC} and still be able to type 8-bit
1388 characters present directly on the keyboard or using @kbd{Compose} or 1424 characters present directly on the keyboard or using @kbd{Compose} or
1389 @kbd{AltGr} keys. @xref{User Input}. 1425 @kbd{AltGr} keys. @xref{User Input}.
1415 @cindex charsets 1451 @cindex charsets
1416 1452
1417 Emacs groups all supported characters into disjoint @dfn{charsets}. 1453 Emacs groups all supported characters into disjoint @dfn{charsets}.
1418 Each character code belongs to one and only one charset. For 1454 Each character code belongs to one and only one charset. For
1419 historical reasons, Emacs typically divides an 8-bit character code 1455 historical reasons, Emacs typically divides an 8-bit character code
1420 for an extended version of @acronym{ASCII} into two charsets: @acronym{ASCII}, which 1456 for an extended version of @acronym{ASCII} into two charsets:
1421 covers the codes 0 through 127, plus another charset which covers the 1457 @acronym{ASCII}, which covers the codes 0 through 127, plus another
1422 ``right-hand part'' (the codes 128 and up). For instance, the 1458 charset which covers the ``right-hand part'' (the codes 128 and up).
1423 characters of Latin-1 include the Emacs charset @code{ascii} plus the 1459 For instance, the characters of Latin-1 include the Emacs charset
1424 Emacs charset @code{latin-iso8859-1}. 1460 @code{ascii} plus the Emacs charset @code{latin-iso8859-1}.
1425 1461
1426 Emacs characters belonging to different charsets may look the same, 1462 Emacs characters belonging to different charsets may look the same,
1427 but they are still different characters. For example, the letter 1463 but they are still different characters. For example, the letter
1428 @samp{o} with acute accent in charset @code{latin-iso8859-1}, used for 1464 @samp{o} with acute accent in charset @code{latin-iso8859-1}, used for
1429 Latin-1, is different from the letter @samp{o} with acute accent in 1465 Latin-1, is different from the letter @samp{o} with acute accent in