comparison man/mule.texi @ 89909:68c22ea6027c

Sync to HEAD
author Kenichi Handa <handa@m17n.org>
date Fri, 16 Apr 2004 12:51:06 +0000
parents 375f2633d815
children f2ebccfa87d4
89908:ee1402f7b568 89909:68c22ea6027c
47 Emacs allows editing text with international characters by supporting 47 Emacs allows editing text with international characters by supporting
48 all the related activities: 48 all the related activities:
49 49
50 @itemize @bullet 50 @itemize @bullet
51 @item 51 @item
52 You can visit files with non-ASCII characters, save non-ASCII text, and 52 You can visit files with non-@acronym{ASCII} characters, save non-@acronym{ASCII} text, and
53 pass non-ASCII text between Emacs and programs it invokes (such as 53 pass non-@acronym{ASCII} text between Emacs and programs it invokes (such as
54 compilers, spell-checkers, and mailers). Setting your language 54 compilers, spell-checkers, and mailers). Setting your language
55 environment (@pxref{Language Environments}) takes care of setting up the 55 environment (@pxref{Language Environments}) takes care of setting up the
56 coding systems and other options for a specific language or culture. 56 coding systems and other options for a specific language or culture.
57 Alternatively, you can specify how Emacs should encode or decode text 57 Alternatively, you can specify how Emacs should encode or decode text
58 for each command; see @ref{Specify Coding}. 58 for each command; see @ref{Specify Coding}.
59 59
60 @item 60 @item
61 You can display non-ASCII characters encoded by the various scripts. 61 You can display non-@acronym{ASCII} characters encoded by the various scripts.
62 This works by using appropriate fonts on X and similar graphics 62 This works by using appropriate fonts on X and similar graphics
63 displays (@pxref{Defining Fontsets}), and by sending special codes to 63 displays (@pxref{Defining Fontsets}), and by sending special codes to
64 text-only displays (@pxref{Specify Coding}). If some characters are 64 text-only displays (@pxref{Specify Coding}). If some characters are
65 displayed incorrectly, refer to @ref{Undisplayable Characters}, which 65 displayed incorrectly, refer to @ref{Undisplayable Characters}, which
66 describes possible problems and explains how to solve them. 66 describes possible problems and explains how to solve them.
67 67
68 @item 68 @item
69 You can insert non-ASCII characters or search for them. To do that, 69 You can insert non-@acronym{ASCII} characters or search for them. To do that,
70 you can specify an input method (@pxref{Select Input Method}) suitable 70 you can specify an input method (@pxref{Select Input Method}) suitable
71 for your language, or use the default input method set up when you set 71 for your language, or use the default input method set up when you set
72 your language environment. (Emacs input methods are part of the Leim 72 your language environment. (Emacs input methods are part of the Leim
73 package, which must be installed for you to be able to use them.) If 73 package, which must be installed for you to be able to use them.) If
74 your keyboard can produce non-ASCII characters, you can select an 74 your keyboard can produce non-@acronym{ASCII} characters, you can select an
75 appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs 75 appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs
76 will accept those characters. Latin-1 characters can also be input by 76 will accept those characters. Latin-1 characters can also be input by
77 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support, 77 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support,
78 C-x 8}. On X Window systems, your locale should be set to an 78 C-x 8}. On X Window systems, your locale should be set to an
79 appropriate value to make sure Emacs interprets keyboard input 79 appropriate value to make sure Emacs interprets keyboard input
108 108
109 The users of international character sets and scripts have established 109 The users of international character sets and scripts have established
110 many more-or-less standard coding systems for storing files. Emacs 110 many more-or-less standard coding systems for storing files. Emacs
111 internally uses a single multibyte character encoding, so that it can 111 internally uses a single multibyte character encoding, so that it can
112 intermix characters from all these scripts in a single buffer or string. 112 intermix characters from all these scripts in a single buffer or string.
113 This encoding represents each non-ASCII character as a sequence of bytes 113 This encoding represents each non-@acronym{ASCII} character as a sequence of bytes
114 in the range 0200 through 0377. Emacs translates between the multibyte 114 in the range 0200 through 0377. Emacs translates between the multibyte
115 character encoding and various other coding systems when reading and 115 character encoding and various other coding systems when reading and
116 writing files, when exchanging data with subprocesses, and (in some 116 writing files, when exchanging data with subprocesses, and (in some
117 cases) in the @kbd{C-q} command (@pxref{Multibyte Conversion}). 117 cases) in the @kbd{C-q} command (@pxref{Multibyte Conversion}).
118 118
185 in that buffer. 185 in that buffer.
186 186
187 @cindex Lisp files, and multibyte operation 187 @cindex Lisp files, and multibyte operation
188 @cindex multibyte operation, and Lisp files 188 @cindex multibyte operation, and Lisp files
189 @cindex unibyte operation, and Lisp files 189 @cindex unibyte operation, and Lisp files
190 @cindex init file, and non-ASCII characters 190 @cindex init file, and non-@acronym{ASCII} characters
191 @cindex environment variables, and non-ASCII characters 191 @cindex environment variables, and non-@acronym{ASCII} characters
192 With @samp{--unibyte}, multibyte strings are not created during 192 With @samp{--unibyte}, multibyte strings are not created during
193 initialization from the values of environment variables, 193 initialization from the values of environment variables,
194 @file{/etc/passwd} entries etc.@: that contain non-ASCII 8-bit 194 @file{/etc/passwd} entries etc.@: that contain non-@acronym{ASCII} 8-bit
195 characters. 195 characters.
196 196
197 Emacs normally loads Lisp files as multibyte, regardless of whether 197 Emacs normally loads Lisp files as multibyte, regardless of whether
198 you used @samp{--unibyte}. This includes the Emacs initialization 198 you used @samp{--unibyte}. This includes the Emacs initialization
199 file, @file{.emacs}, and the initialization files of Emacs packages 199 file, @file{.emacs}, and the initialization files of Emacs packages
280 @code{locale-charset-language-names} and @code{locale-language-names}, 280 @code{locale-charset-language-names} and @code{locale-language-names},
281 and selects the corresponding language environment if a match is found. 281 and selects the corresponding language environment if a match is found.
282 (The former variable overrides the latter.) It also adjusts the display 282 (The former variable overrides the latter.) It also adjusts the display
283 table and terminal coding system, the locale coding system, the 283 table and terminal coding system, the locale coding system, the
284 preferred coding system as needed for the locale, and---last but not 284 preferred coding system as needed for the locale, and---last but not
285 least---the way Emacs decodes non-ASCII characters sent by your keyboard. 285 least---the way Emacs decodes non-@acronym{ASCII} characters sent by your keyboard.
286 286
287 If you modify the @env{LC_ALL}, @env{LC_CTYPE}, or @env{LANG} 287 If you modify the @env{LC_ALL}, @env{LC_CTYPE}, or @env{LANG}
288 environment variables while running Emacs, you may want to invoke the 288 environment variables while running Emacs, you may want to invoke the
289 @code{set-locale-environment} function afterwards to readjust the 289 @code{set-locale-environment} function afterwards to readjust the
290 language environment from the new locale. 290 language environment from the new locale.
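
  As a minimal sketch of the readjustment described above, you could
change the locale from Lisp and then re-derive the language
environment.  The locale name used here is only an assumed value for
illustration; substitute one that your system actually provides.  Note
that a nonempty @env{LC_ALL} or @env{LC_CTYPE} would still take
precedence over @env{LANG}.

@lisp
;; Hypothetical locale name; use one installed on your system.
(setenv "LANG" "de_DE.ISO8859-1")
;; Readjust the language environment from the new locale.
(set-locale-environment)
@end lisp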
344 specifically for interactive input. In Emacs, typically each language 344 specifically for interactive input. In Emacs, typically each language
345 has its own input method; sometimes several languages which use the same 345 has its own input method; sometimes several languages which use the same
346 characters can share one input method. A few languages support several 346 characters can share one input method. A few languages support several
347 input methods. 347 input methods.
348 348
349 The simplest kind of input method works by mapping ASCII letters 349 The simplest kind of input method works by mapping @acronym{ASCII} letters
350 into another alphabet; this allows you to use one other alphabet 350 into another alphabet; this allows you to use one other alphabet
351 instead of ASCII. The Greek and Russian input methods 351 instead of @acronym{ASCII}. The Greek and Russian input methods
352 work this way. 352 work this way.
353 353
354 A more powerful technique is composition: converting sequences of 354 A more powerful technique is composition: converting sequences of
355 characters into one letter. Many European input methods use composition 355 characters into one letter. Many European input methods use composition
356 to produce a single non-ASCII letter from a sequence that consists of a 356 to produce a single non-@acronym{ASCII} letter from a sequence that consists of a
357 letter followed by accent characters (or vice versa). For example, some 357 letter followed by accent characters (or vice versa). For example, some
358 methods convert the sequence @kbd{a'} into a single accented letter. 358 methods convert the sequence @kbd{a'} into a single accented letter.
359 These input methods have no special commands of their own; all they do 359 These input methods have no special commands of their own; all they do
360 is compose sequences of printing characters. 360 is compose sequences of printing characters.
361 361
478 language environment that it is meant to be used with. The variable 478 language environment that it is meant to be used with. The variable
479 @code{current-input-method} records which input method is selected. 479 @code{current-input-method} records which input method is selected.
480 480
481 @findex toggle-input-method 481 @findex toggle-input-method
482 @kindex C-\ 482 @kindex C-\
483 Input methods use various sequences of ASCII characters to stand for 483 Input methods use various sequences of @acronym{ASCII} characters to stand for
484 non-ASCII characters. Sometimes it is useful to turn off the input 484 non-@acronym{ASCII} characters. Sometimes it is useful to turn off the input
485 method temporarily. To do this, type @kbd{C-\} 485 method temporarily. To do this, type @kbd{C-\}
486 (@code{toggle-input-method}). To reenable the input method, type 486 (@code{toggle-input-method}). To reenable the input method, type
487 @kbd{C-\} again. 487 @kbd{C-\} again.
488 488
489 If you type @kbd{C-\} and you have not yet selected an input method, 489 If you type @kbd{C-\} and you have not yet selected an input method,
532 To display a list of all the supported input methods, type @kbd{M-x 532 To display a list of all the supported input methods, type @kbd{M-x
533 list-input-methods}. The list gives information about each input 533 list-input-methods}. The list gives information about each input
534 method, including the string that stands for it in the mode line. 534 method, including the string that stands for it in the mode line.
535 535
536 @node Multibyte Conversion 536 @node Multibyte Conversion
537 @section Unibyte and Multibyte Non-ASCII characters 537 @section Unibyte and Multibyte Non-@acronym{ASCII} characters
538 538
539 When multibyte characters are enabled, character codes 0240 (octal) 539 When multibyte characters are enabled, character codes 0240 (octal)
540 through 0377 (octal) are not really legitimate in the buffer. The valid 540 through 0377 (octal) are not really legitimate in the buffer. The valid
541 non-ASCII printing characters have codes that start from 0400. 541 non-@acronym{ASCII} printing characters have codes that start from 0400.
542 542
543 If you type a self-inserting character in the range 0240 through 543 If you type a self-inserting character in the range 0240 through
544 0377, or if you use @kbd{C-q} to insert one, Emacs assumes you 544 0377, or if you use @kbd{C-q} to insert one, Emacs assumes you
545 intended to use one of the ISO Latin-@var{n} character sets, and 545 intended to use one of the ISO Latin-@var{n} character sets, and
546 converts it to the Emacs code representing that Latin-@var{n} 546 converts it to the Emacs code representing that Latin-@var{n}
588 creating the coding system for the codepage, you can use it as any 588 creating the coding system for the codepage, you can use it as any
589 other coding system. For example, to visit a file encoded in codepage 589 other coding system. For example, to visit a file encoded in codepage
590 850, type @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename} 590 850, type @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename}
591 @key{RET}}. 591 @key{RET}}.
592 592
593 In addition to converting various representations of non-ASCII 593 In addition to converting various representations of non-@acronym{ASCII}
594 characters, a coding system can perform end-of-line conversion. Emacs 594 characters, a coding system can perform end-of-line conversion. Emacs
595 handles three different conventions for how to separate lines in a file: 595 handles three different conventions for how to separate lines in a file:
596 newline, carriage-return linefeed, and just carriage-return. 596 newline, carriage-return linefeed, and just carriage-return.
597 597
598 @table @kbd 598 @table @kbd
659 predictable. For example, the coding system @code{iso-latin-1} has 659 predictable. For example, the coding system @code{iso-latin-1} has
660 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and 660 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and
661 @code{iso-latin-1-mac}. 661 @code{iso-latin-1-mac}.
662 662
663 The coding system @code{raw-text} is good for a file which is mainly 663 The coding system @code{raw-text} is good for a file which is mainly
664 ASCII text, but may contain byte values above 127 which are not meant to 664 @acronym{ASCII} text, but may contain byte values above 127 which are not meant to
665 encode non-ASCII characters. With @code{raw-text}, Emacs copies those 665 encode non-@acronym{ASCII} characters. With @code{raw-text}, Emacs copies those
666 byte values unchanged, and sets @code{enable-multibyte-characters} to 666 byte values unchanged, and sets @code{enable-multibyte-characters} to
667 @code{nil} in the current buffer so that they will be interpreted 667 @code{nil} in the current buffer so that they will be interpreted
668 properly. @code{raw-text} handles end-of-line conversion in the usual 668 properly. @code{raw-text} handles end-of-line conversion in the usual
669 way, based on the data encountered, and has the usual three variants to 669 way, based on the data encountered, and has the usual three variants to
670 specify the kind of end-of-line conversion to use. 670 specify the kind of end-of-line conversion to use.
671 671
672 In contrast, the coding system @code{no-conversion} specifies no 672 In contrast, the coding system @code{no-conversion} specifies no
673 character code conversion at all---none for non-ASCII byte values and 673 character code conversion at all---none for non-@acronym{ASCII} byte values and
674 none for end of line. This is useful for reading or writing binary 674 none for end of line. This is useful for reading or writing binary
675 files, tar files, and other files that must be examined verbatim. It, 675 files, tar files, and other files that must be examined verbatim. It,
676 too, sets @code{enable-multibyte-characters} to @code{nil}. 676 too, sets @code{enable-multibyte-characters} to @code{nil}.
677 677
678 The easiest way to edit a file with no conversion of any kind is with 678 The easiest way to edit a file with no conversion of any kind is with
679 the @kbd{M-x find-file-literally} command. This uses 679 the @kbd{M-x find-file-literally} command. This uses
680 @code{no-conversion}, and also suppresses other Emacs features that 680 @code{no-conversion}, and also suppresses other Emacs features that
681 might convert the file contents before you see them. @xref{Visiting}. 681 might convert the file contents before you see them. @xref{Visiting}.
682 682
683 The coding system @code{emacs-mule} means that the file contains 683 The coding system @code{emacs-mule} means that the file contains
684 non-ASCII characters stored with the internal Emacs encoding. It 684 non-@acronym{ASCII} characters stored with the internal Emacs encoding. It
685 handles end-of-line conversion based on the data encountered, and has 685 handles end-of-line conversion based on the data encountered, and has
686 the usual three variants to specify the kind of end-of-line conversion. 686 the usual three variants to specify the kind of end-of-line conversion.
687 687
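  As a rough sketch of how these coding systems can be requested from
Lisp, you can bind the variable @code{coding-system-for-read} around a
single visiting operation.  That variable is not described in this
section, and the file names below are hypothetical.

@lisp
;; Visit one file as Latin-1 with DOS (CRLF) end-of-line conversion.
(let ((coding-system-for-read 'iso-latin-1-dos))
  (find-file "report.txt"))

;; Visit a file with no character code or end-of-line conversion.
(let ((coding-system-for-read 'no-conversion))
  (find-file "data.bin"))
@end lisp
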
688 @node Recognize Coding 688 @node Recognize Coding
689 @section Recognizing Coding Systems 689 @section Recognizing Coding Systems
772 the buffer. 772 the buffer.
773 773
774 The default value of @code{inhibit-iso-escape-detection} is 774 The default value of @code{inhibit-iso-escape-detection} is
775 @code{nil}. We recommend that you not change it permanently, only for 775 @code{nil}. We recommend that you not change it permanently, only for
776 one specific operation. That's because many Emacs Lisp source files 776 one specific operation. That's because many Emacs Lisp source files
777 in the Emacs distribution contain non-ASCII characters encoded in the 777 in the Emacs distribution contain non-@acronym{ASCII} characters encoded in the
778 coding system @code{iso-2022-7bit}, and they won't be 778 coding system @code{iso-2022-7bit}, and they won't be
779 decoded correctly when you visit those files if you suppress the 779 decoded correctly when you visit those files if you suppress the
780 escape sequence detection. 780 escape sequence detection.
781 781
782 @vindex coding 782 @vindex coding
815 of the mode line (@pxref{Mode Line}), or type @kbd{C-h C @key{RET}}. 815 of the mode line (@pxref{Mode Line}), or type @kbd{C-h C @key{RET}}.
816 816
817 @findex unify-8859-on-decoding-mode 817 @findex unify-8859-on-decoding-mode
818 The command @code{unify-8859-on-decoding-mode} enables a mode that 818 The command @code{unify-8859-on-decoding-mode} enables a mode that
819 ``unifies'' the Latin alphabets when decoding text. This works by 819 ``unifies'' the Latin alphabets when decoding text. This works by
820 converting all non-ASCII Latin-@var{n} characters to either Latin-1 or 820 converting all non-@acronym{ASCII} Latin-@var{n} characters to either Latin-1 or
821 Unicode characters. This way it is easier to use various 821 Unicode characters. This way it is easier to use various
822 Latin-@var{n} alphabets together. In a future Emacs version we hope 822 Latin-@var{n} alphabets together. In a future Emacs version we hope
823 to move towards full Unicode support and complete unification of 823 to move towards full Unicode support and complete unification of
824 character sets. 824 character sets.
825 825
835 835
836 You can insert any possible character into any Emacs buffer, but 836 You can insert any possible character into any Emacs buffer, but
837 most coding systems can only handle some of the possible characters. 837 most coding systems can only handle some of the possible characters.
838 This means that it is possible for you to insert characters that 838 This means that it is possible for you to insert characters that
839 cannot be encoded with the coding system that will be used to save the 839 cannot be encoded with the coding system that will be used to save the
840 buffer. For example, you could start with an ASCII file and insert a 840 buffer. For example, you could start with an @acronym{ASCII} file and insert a
841 few Latin-1 characters into it, or you could edit a text file in 841 few Latin-1 characters into it, or you could edit a text file in
842 Polish encoded in @code{iso-8859-2} and add some Russian words to it. 842 Polish encoded in @code{iso-8859-2} and add some Russian words to it.
843 When you save the buffer, Emacs cannot use the current value of 843 When you save the buffer, Emacs cannot use the current value of
844 @code{buffer-file-coding-system}, because the characters you added 844 @code{buffer-file-coding-system}, because the characters you added
845 cannot be encoded by that coding system. 845 cannot be encoded by that coding system.
915 915
916 @item C-x @key{RET} x @var{coding} @key{RET} 916 @item C-x @key{RET} x @var{coding} @key{RET}
917 Use coding system @var{coding} for transferring selections to and from 917 Use coding system @var{coding} for transferring selections to and from
918 other programs through the window system. 918 other programs through the window system.
919 919
920 @item C-x @key{RET} F @var{coding} @key{RET}
921 Use coding system @var{coding} for encoding and decoding file
922 @emph{names}. This affects the use of non-ASCII characters in file
923 names. It has no effect on reading and writing the @emph{contents} of
924 files.
925
920 @item C-x @key{RET} X @var{coding} @key{RET} 926 @item C-x @key{RET} X @var{coding} @key{RET}
921 Use coding system @var{coding} for transferring @emph{one} 927 Use coding system @var{coding} for transferring @emph{one}
922 selection---the next one---to or from the window system. 928 selection---the next one---to or from the window system.
923 @end table 929 @end table
924 930
991 @vindex keyboard-coding-system 997 @vindex keyboard-coding-system
992 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system}) 998 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system})
993 or the Custom option @code{keyboard-coding-system} 999 or the Custom option @code{keyboard-coding-system}
994 specifies the coding system for keyboard input. Character-code 1000 specifies the coding system for keyboard input. Character-code
995 translation of keyboard input is useful for terminals with keys that 1001 translation of keyboard input is useful for terminals with keys that
996 send non-ASCII graphic characters---for example, some terminals designed 1002 send non-@acronym{ASCII} graphic characters---for example, some terminals designed
997 for ISO Latin-1 or subsets of it. 1003 for ISO Latin-1 or subsets of it.
998 1004
999 By default, keyboard input is translated based on your system locale 1005 By default, keyboard input is translated based on your system locale
1000 setting. If your terminal does not really support the encoding 1006 setting. If your terminal does not really support the encoding
1001 implied by your locale (for example, if you find it inserts a 1007 implied by your locale (for example, if you find it inserts a
1002 non-ASCII character if you type @kbd{M-i}), you will need to set 1008 non-@acronym{ASCII} character if you type @kbd{M-i}), you will need to set
1003 @code{keyboard-coding-system} to @code{nil} to turn off encoding. 1009 @code{keyboard-coding-system} to @code{nil} to turn off encoding.
1004 You can do this by putting 1010 You can do this by putting
1005 1011
1006 @lisp 1012 @lisp
1007 (set-keyboard-coding-system nil) 1013 (set-keyboard-coding-system nil)
1012 1018
1013 There is a similarity between using a coding system translation for 1019 There is a similarity between using a coding system translation for
1014 keyboard input, and using an input method: both define sequences of 1020 keyboard input, and using an input method: both define sequences of
1015 keyboard input that translate into single characters. However, input 1021 keyboard input that translate into single characters. However, input
1016 methods are designed to be convenient for interactive use by humans, and 1022 methods are designed to be convenient for interactive use by humans, and
1017 the sequences that are translated are typically sequences of ASCII 1023 the sequences that are translated are typically sequences of @acronym{ASCII}
1018 printing characters. Coding systems typically translate sequences of 1024 printing characters. Coding systems typically translate sequences of
1019 non-graphic characters. 1025 non-graphic characters.
1020 1026
1021 @kindex C-x RET x 1027 @kindex C-x RET x
1022 @kindex C-x RET X 1028 @kindex C-x RET X
1041 1047
1042 The default for translation of process input and output depends on the 1048 The default for translation of process input and output depends on the
1043 current language environment. 1049 current language environment.
1044 1050
1045 @vindex file-name-coding-system 1051 @vindex file-name-coding-system
1046 @cindex file names with non-ASCII characters 1052 @cindex file names with non-@acronym{ASCII} characters
1047 The variable @code{file-name-coding-system} specifies a coding system 1053 @findex set-file-name-coding-system
1048 to use for encoding file names. If you set the variable to a coding 1054 @kindex C-x @key{RET} F
1049 system name (as a Lisp symbol or a string), Emacs encodes file names 1055 The variable @code{file-name-coding-system} specifies a coding
1050 using that coding system for all file operations. This makes it 1056 system to use for encoding file names. If you set the variable to a
1051 possible to use non-ASCII characters in file names---or, at least, those 1057 coding system name (as a Lisp symbol or a string), Emacs encodes file
1052 non-ASCII characters which the specified coding system can encode. 1058 names using that coding system for all file operations. This makes it
1059 possible to use non-@acronym{ASCII} characters in file names---or, at
1060 least, those non-@acronym{ASCII} characters which the specified coding
1061 system can encode. Use @kbd{C-x @key{RET} F}
1062 (@code{set-file-name-coding-system}) to specify this interactively.
1053 1063
1054 If @code{file-name-coding-system} is @code{nil}, Emacs uses a default 1064 If @code{file-name-coding-system} is @code{nil}, Emacs uses a default
1055 coding system determined by the selected language environment. In the 1065 coding system determined by the selected language environment. In the
1056 default language environment, any non-ASCII characters in file names are 1066 default language environment, any non-@acronym{ASCII} characters in file names are
1057 not encoded specially; they appear in the file system using the internal 1067 not encoded specially; they appear in the file system using the internal
1058 Emacs representation. 1068 Emacs representation.
1059 1069
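  For instance, lines like the following in your init file would be one
way to set this variable.  The choice of @code{iso-latin-1} is just an
assumption for the example; pick a coding system that matches how your
file system stores names.

@lisp
;; Encode file names as Latin-1 (example choice of coding system).
(setq file-name-coding-system 'iso-latin-1)

;; Alternatively, use the default chosen by the language environment.
(setq file-name-coding-system nil)
@end lisp
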
1060 @strong{Warning:} if you change @code{file-name-coding-system} (or the 1070 @strong{Warning:} if you change @code{file-name-coding-system} (or the
1061 language environment) in the middle of an Emacs session, problems can 1071 language environment) in the middle of an Emacs session, problems can
1065 these buffers under the visited file name, saving may use the wrong file 1075 these buffers under the visited file name, saving may use the wrong file
1066 name, or it may get an error. If such a problem happens, use @kbd{C-x 1076 name, or it may get an error. If such a problem happens, use @kbd{C-x
1067 C-w} to specify a new file name for that buffer. 1077 C-w} to specify a new file name for that buffer.
1068 1078
1069 @vindex locale-coding-system 1079 @vindex locale-coding-system
1070 @cindex decoding non-ASCII keyboard input on X 1080 @cindex decoding non-@acronym{ASCII} keyboard input on X
1071 The variable @code{locale-coding-system} specifies a coding system 1081 The variable @code{locale-coding-system} specifies a coding system
1072 to use when encoding and decoding system strings such as system error 1082 to use when encoding and decoding system strings such as system error
1073 messages and @code{format-time-string} formats and time stamps. That 1083 messages and @code{format-time-string} formats and time stamps. That
1074 coding system is also used for decoding non-ASCII keyboard input on X 1084 coding system is also used for decoding non-@acronym{ASCII} keyboard input on X
1075 Window systems. You should choose a coding system that is compatible 1085 Window systems. You should choose a coding system that is compatible
1076 with the underlying system's text representation, which is normally 1086 with the underlying system's text representation, which is normally
1077 specified by one of the environment variables @env{LC_ALL}, 1087 specified by one of the environment variables @env{LC_ALL},
1078 @env{LC_CTYPE}, and @env{LANG}. (The first one, in the order 1088 @env{LC_CTYPE}, and @env{LANG}. (The first one, in the order
1079 specified above, whose value is nonempty is the one that determines 1089 specified above, whose value is nonempty is the one that determines
1099 characters.@footnote{The Emacs installation instructions have information on 1109 characters.@footnote{The Emacs installation instructions have information on
1100 additional font support.} 1110 additional font support.}
1101 1111
1102 Emacs creates two fontsets automatically: the @dfn{standard fontset} 1112 Emacs creates two fontsets automatically: the @dfn{standard fontset}
1103 and the @dfn{startup fontset}. The standard fontset is most likely to 1113 and the @dfn{startup fontset}. The standard fontset is most likely to
1104 have fonts for a wide variety of non-ASCII characters; however, this is 1114 have fonts for a wide variety of non-@acronym{ASCII} characters; however, this is
1105 not the default for Emacs to use. (By default, Emacs tries to find a 1115 not the default for Emacs to use. (By default, Emacs tries to find a
1106 font that has bold and italic variants.) You can specify use of the 1116 font that has bold and italic variants.) You can specify use of the
1107 standard fontset with the @samp{-fn} option, or with the @samp{Font} X 1117 standard fontset with the @samp{-fn} option, or with the @samp{Font} X
1108 resource (@pxref{Font X}). For example, 1118 resource (@pxref{Font X}). For example,
1109 1119
1135 Bold, italic, and bold-italic variants of the standard fontset are 1145 Bold, italic, and bold-italic variants of the standard fontset are
1136 created automatically. Their names have @samp{bold} instead of 1146 created automatically. Their names have @samp{bold} instead of
1137 @samp{medium}, or @samp{i} instead of @samp{r}, or both. 1147 @samp{medium}, or @samp{i} instead of @samp{r}, or both.
1138 1148
1139 @cindex startup fontset 1149 @cindex startup fontset
1140 If you specify a default ASCII font with the @samp{Font} resource or 1150 If you specify a default @acronym{ASCII} font with the @samp{Font} resource or
1141 the @samp{-fn} argument, Emacs generates a fontset from it 1151 the @samp{-fn} argument, Emacs generates a fontset from it
1142 automatically. This is the @dfn{startup fontset} and its name is 1152 automatically. This is the @dfn{startup fontset} and its name is
1143 @code{fontset-startup}. It does this by replacing the @var{foundry}, 1153 @code{fontset-startup}. It does this by replacing the @var{foundry},
1144 @var{family}, @var{add_style}, and @var{average_width} fields of the 1154 @var{family}, @var{add_style}, and @var{average_width} fields of the
1145 font name with @samp{*}, replacing @var{charset_registry} field with 1155 font name with @samp{*}, replacing @var{charset_registry} field with
1189 font to use for that character set. You can use this construct any 1199 font to use for that character set. You can use this construct any
1190 number of times in defining one fontset. 1200 number of times in defining one fontset.
1191 1201
1192 For the other character sets, Emacs chooses a font based on 1202 For the other character sets, Emacs chooses a font based on
1193 @var{fontpattern}. It replaces @samp{fontset-@var{alias}} with values 1203 @var{fontpattern}. It replaces @samp{fontset-@var{alias}} with values
1194 that describe the character set. For the ASCII character font, 1204 that describe the character set. For the @acronym{ASCII} character font,
1195 @samp{fontset-@var{alias}} is replaced with @samp{ISO8859-1}. 1205 @samp{fontset-@var{alias}} is replaced with @samp{ISO8859-1}.
1196 1206
1197 In addition, when several consecutive fields are wildcards, Emacs 1207 In addition, when several consecutive fields are wildcards, Emacs
1198 collapses them into a single wildcard. This is to prevent use of 1208 collapses them into a single wildcard. This is to prevent use of
1199 auto-scaled fonts. Fonts made by scaling larger fonts are not usable 1209 auto-scaled fonts. Fonts made by scaling larger fonts are not usable
1206 @example 1216 @example
1207 -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24 1217 -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24
1208 @end example 1218 @end example
1209 1219
1210 @noindent 1220 @noindent
1211 the font specification for ASCII characters would be this: 1221 the font specification for @acronym{ASCII} characters would be this:
1212 1222
1213 @example 1223 @example
1214 -*-fixed-medium-r-normal-*-24-*-ISO8859-1 1224 -*-fixed-medium-r-normal-*-24-*-ISO8859-1
1215 @end example 1225 @end example
1216 1226
1245 @xref{Font X}, for more information about font naming in X. 1255 @xref{Font X}, for more information about font naming in X.
1246 1256
1247 @node Undisplayable Characters 1257 @node Undisplayable Characters
1248 @section Undisplayable Characters 1258 @section Undisplayable Characters
1249 1259
1250 There may be some non-ASCII characters that your terminal cannot 1260 There may be some non-@acronym{ASCII} characters that your terminal cannot
1251 display. Most non-windowing terminals support just a single character 1261 display. Most non-windowing terminals support just a single character
1252 set (use the variable @code{default-terminal-coding-system} 1262 set (use the variable @code{default-terminal-coding-system}
1253 (@pxref{Specify Coding}) to tell Emacs which one); characters which 1263 (@pxref{Specify Coding}) to tell Emacs which one); characters which
1254 can't be encoded in that coding system are displayed as @samp{?} by 1264 can't be encoded in that coding system are displayed as @samp{?} by
1255 default. 1265 default.
1257 Windowing terminals can display a broader range of characters, but 1267 Windowing terminals can display a broader range of characters, but
1258 you may not have fonts installed for all of them; characters that have 1268 you may not have fonts installed for all of them; characters that have
1259 no font appear as a hollow box. 1269 no font appear as a hollow box.
1260 1270
1261 If you use Latin-1 characters but your terminal can't display 1271 If you use Latin-1 characters but your terminal can't display
1262 Latin-1, you can arrange to display mnemonic ASCII sequences 1272 Latin-1, you can arrange to display mnemonic @acronym{ASCII} sequences
1263 instead, e.g.@: @samp{"o} for o-umlaut. Load the library 1273 instead, e.g.@: @samp{"o} for o-umlaut. Load the library
1264 @file{iso-ascii} to do this. 1274 @file{iso-ascii} to do this.
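
As a minimal sketch, assuming you want this behavior in every session,
you could load the library from your init file.  Placing the call in
@file{.emacs} is an assumption made for illustration; the text above
only says to load the library.

@example
;; Load iso-ascii so that Latin-1 characters are shown as
;; mnemonic ASCII sequences on terminals that lack Latin-1.
(load-library "iso-ascii")
@end example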
1265 1275
1266 @vindex latin1-display 1276 @vindex latin1-display
1267 If your terminal can display Latin-1, you can display characters 1277 If your terminal can display Latin-1, you can display characters
1268 from other European character sets using a mixture of equivalent 1278 from other European character sets using a mixture of equivalent
1269 Latin-1 characters and ASCII mnemonics. Use the Custom option 1279 Latin-1 characters and @acronym{ASCII} mnemonics. Use the Custom option
1270 @code{latin1-display} to enable this. The mnemonic ASCII 1280 @code{latin1-display} to enable this. The mnemonic @acronym{ASCII}
1271 sequences mostly correspond to those of the prefix input methods. 1281 sequences mostly correspond to those of the prefix input methods.
1272 1282
1273 @node Single-Byte Character Support 1283 @node Single-Byte Character Support
1274 @section Single-byte Character Set Support 1284 @section Single-byte Character Set Support
1275 1285
1286 set-language-environment} and specify a suitable language environment 1296 set-language-environment} and specify a suitable language environment
1287 such as @samp{Latin-@var{n}}. 1297 such as @samp{Latin-@var{n}}.
1288 1298
1289 For more information about unibyte operation, see @ref{Enabling 1299 For more information about unibyte operation, see @ref{Enabling
1290 Multibyte}. Note particularly that you probably want to ensure that 1300 Multibyte}. Note particularly that you probably want to ensure that
1291 your initialization files are read as unibyte if they contain non-ASCII 1301 your initialization files are read as unibyte if they contain non-@acronym{ASCII}
1292 characters. 1302 characters.
1293 1303
1294 @vindex unibyte-display-via-language-environment 1304 @vindex unibyte-display-via-language-environment
1295 Emacs can also display those characters, provided the terminal or font 1305 Emacs can also display those characters, provided the terminal or font
1296 in use supports them. This works automatically. Alternatively, if you 1306 in use supports them. This works automatically. Alternatively, if you
1300 this, set the variable @code{unibyte-display-via-language-environment} 1310 this, set the variable @code{unibyte-display-via-language-environment}
1301 to a non-@code{nil} value. 1311 to a non-@code{nil} value.
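
For illustration only, setting the variable from an init file might
look like the following.  The value @code{t} is simply one
non-@code{nil} choice, and the init-file placement is an assumption.

@example
;; Any non-nil value enables the behavior described above;
;; t is the conventional choice.
(setq unibyte-display-via-language-environment t)
@end example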
1302 1312
1303 @cindex @code{iso-ascii} library 1313 @cindex @code{iso-ascii} library
1304 If your terminal does not support display of the Latin-1 character 1314 If your terminal does not support display of the Latin-1 character
1305 set, Emacs can display these characters as ASCII sequences which at 1315 set, Emacs can display these characters as @acronym{ASCII} sequences which at
1306 least give you a clear idea of what the characters are. To do this, 1316 least give you a clear idea of what the characters are. To do this,
1307 load the library @code{iso-ascii}. Similar libraries for other 1317 load the library @code{iso-ascii}. Similar libraries for other
1308 Latin-@var{n} character sets could be implemented, but we don't have 1318 Latin-@var{n} character sets could be implemented, but we don't have
1309 them yet. 1319 them yet.
1310 1320
1313 Normally non-ISO-8859 characters (decimal codes between 128 and 159 1323 Normally non-ISO-8859 characters (decimal codes between 128 and 159
1314 inclusive) are displayed as octal escapes. You can change this for 1324 inclusive) are displayed as octal escapes. You can change this for
1315 non-standard ``extended'' versions of ISO-8859 character sets by using the 1325 non-standard ``extended'' versions of ISO-8859 character sets by using the
1316 function @code{standard-display-8bit} in the @code{disp-table} library. 1326 function @code{standard-display-8bit} in the @code{disp-table} library.
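
A hedged sketch of such a call follows.  The range 128 through 159 is
taken from the paragraph above; whether your terminal or font shows
something useful for those codes is an assumption you need to check.

@example
;; standard-display-8bit is defined in the disp-table library.
(require 'disp-table)
;; Display codes 128 through 159 as the characters themselves
;; instead of as octal escapes.
(standard-display-8bit 128 159)
@end example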
1317 1327
1318 There are several ways you can input single-byte non-ASCII 1328 There are several ways you can input single-byte non-@acronym{ASCII}
1319 characters: 1329 characters:
1320 1330
1321 @itemize @bullet 1331 @itemize @bullet
1322 @cindex 8-bit input 1332 @cindex 8-bit input
1323 @item 1333 @item
1324 If your keyboard can generate character codes 128 (decimal) and up, 1334 If your keyboard can generate character codes 128 (decimal) and up,
1325 representing non-ASCII characters, you can type those character codes 1335 representing non-@acronym{ASCII} characters, you can type those character codes
1326 directly. 1336 directly.
1327 1337
1328 On a windowing terminal, you should not need to do anything special to 1338 On a windowing terminal, you should not need to do anything special to
1329 use these keys; they should simply work. On a text-only terminal, you 1339 use these keys; they should simply work. On a text-only terminal, you
1330 should use the command @code{M-x set-keyboard-coding-system} or the 1340 should use the command @code{M-x set-keyboard-coding-system} or the
1337 @kbd{Compose} or @kbd{AltGr} keys. @xref{User Input}. 1347 @kbd{Compose} or @kbd{AltGr} keys. @xref{User Input}.
1338 1348
1339 @item 1349 @item
1340 You can use an input method for the selected language environment. 1350 You can use an input method for the selected language environment.
1341 @xref{Input Methods}. When you use an input method in a unibyte buffer, 1351 @xref{Input Methods}. When you use an input method in a unibyte buffer,
1342 the non-ASCII character you specify with it is converted to unibyte. 1352 the non-@acronym{ASCII} character you specify with it is converted to unibyte.
1343 1353
1344 @kindex C-x 8 1354 @kindex C-x 8
1345 @cindex @code{iso-transl} library 1355 @cindex @code{iso-transl} library
1346 @cindex compose character 1356 @cindex compose character
1347 @cindex dead character 1357 @cindex dead character
1348 @item 1358 @item
1349 For Latin-1 only, you can use the 1359 For Latin-1 only, you can use the
1350 key @kbd{C-x 8} as a ``compose character'' prefix for entry of 1360 key @kbd{C-x 8} as a ``compose character'' prefix for entry of
1351 non-ASCII Latin-1 printing characters. @kbd{C-x 8} is good for 1361 non-@acronym{ASCII} Latin-1 printing characters. @kbd{C-x 8} is good for
1352 insertion (in the minibuffer as well as other buffers), for searching, 1362 insertion (in the minibuffer as well as other buffers), for searching,
1353 and in any other context where a key sequence is allowed. 1363 and in any other context where a key sequence is allowed.
1354 1364
1355 @kbd{C-x 8} works by loading the @code{iso-transl} library. Once that 1365 @kbd{C-x 8} works by loading the @code{iso-transl} library. Once that
1356 library is loaded, the @key{ALT} modifier key, if you have one, serves 1366 library is loaded, the @key{ALT} modifier key, if you have one, serves
1378 @cindex charsets 1388 @cindex charsets
1379 1389
1380 Emacs groups all supported characters into disjoint @dfn{charsets}. 1390 Emacs groups all supported characters into disjoint @dfn{charsets}.
1381 Each character code belongs to one and only one charset. For 1391 Each character code belongs to one and only one charset. For
1382 historical reasons, Emacs typically divides an 8-bit character code 1392 historical reasons, Emacs typically divides an 8-bit character code
1383 for an extended version of ASCII into two charsets: ASCII, which 1393 for an extended version of @acronym{ASCII} into two charsets: @acronym{ASCII}, which
1384 covers the codes 0 through 127, plus another charset which covers the 1394 covers the codes 0 through 127, plus another charset which covers the
1385 ``right-hand part'' (the codes 128 and up). For instance, the 1395 ``right-hand part'' (the codes 128 and up). For instance, the
1386 characters of Latin-1 include the Emacs charset @code{ascii} plus the 1396 characters of Latin-1 include the Emacs charset @code{ascii} plus the
1387 Emacs charset @code{latin-iso8859-1}. 1397 Emacs charset @code{latin-iso8859-1}.
1388 1398
1402 charset name and displays information about that charset, including 1412 charset name and displays information about that charset, including
1403 its internal representation within Emacs. 1413 its internal representation within Emacs.
1404 1414
1405 To find out which charset a character in the buffer belongs to, 1415 To find out which charset a character in the buffer belongs to,
1406 put point before it and type @kbd{C-u C-x =}. 1416 put point before it and type @kbd{C-u C-x =}.
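
The same question can be asked from Lisp.  As a small sketch, assuming
point is placed before the character of interest, you could evaluate
this with @kbd{M-:}:

@example
;; Return the charset of the character after point, as a symbol.
;; At the end of the buffer, following-char returns 0.
(char-charset (following-char))
@end example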
1417
1418 @ignore
1419 arch-tag: 310ba60d-31ef-4ce7-91f1-f282dd57b6b3
1420 @end ignore