comparison man/mule.texi @ 89909:68c22ea6027c
Sync to HEAD
author | Kenichi Handa <handa@m17n.org> |
---|---|
date | Fri, 16 Apr 2004 12:51:06 +0000 |
parents | 375f2633d815 |
children | f2ebccfa87d4 |
89908:ee1402f7b568 | 89909:68c22ea6027c |
---|---|
47 Emacs allows editing text with international characters by supporting | 47 Emacs allows editing text with international characters by supporting |
48 all the related activities: | 48 all the related activities: |
49 | 49 |
50 @itemize @bullet | 50 @itemize @bullet |
51 @item | 51 @item |
52 You can visit files with non-ASCII characters, save non-ASCII text, and | 52 You can visit files with non-@acronym{ASCII} characters, save non-@acronym{ASCII} text, and |
53 pass non-ASCII text between Emacs and programs it invokes (such as | 53 pass non-@acronym{ASCII} text between Emacs and programs it invokes (such as |
54 compilers, spell-checkers, and mailers). Setting your language | 54 compilers, spell-checkers, and mailers). Setting your language |
55 environment (@pxref{Language Environments}) takes care of setting up the | 55 environment (@pxref{Language Environments}) takes care of setting up the |
56 coding systems and other options for a specific language or culture. | 56 coding systems and other options for a specific language or culture. |
57 Alternatively, you can specify how Emacs should encode or decode text | 57 Alternatively, you can specify how Emacs should encode or decode text |
58 for each command; see @ref{Specify Coding}. | 58 for each command; see @ref{Specify Coding}. |
59 | 59 |
60 @item | 60 @item |
61 You can display non-ASCII characters encoded by the various scripts. | 61 You can display non-@acronym{ASCII} characters encoded by the various scripts. |
62 This works by using appropriate fonts on X and similar graphics | 62 This works by using appropriate fonts on X and similar graphics |
63 displays (@pxref{Defining Fontsets}), and by sending special codes to | 63 displays (@pxref{Defining Fontsets}), and by sending special codes to |
64 text-only displays (@pxref{Specify Coding}). If some characters are | 64 text-only displays (@pxref{Specify Coding}). If some characters are |
65 displayed incorrectly, refer to @ref{Undisplayable Characters}, which | 65 displayed incorrectly, refer to @ref{Undisplayable Characters}, which |
66 describes possible problems and explains how to solve them. | 66 describes possible problems and explains how to solve them. |
67 | 67 |
68 @item | 68 @item |
69 You can insert non-ASCII characters or search for them. To do that, | 69 You can insert non-@acronym{ASCII} characters or search for them. To do that, |
70 you can specify an input method (@pxref{Select Input Method}) suitable | 70 you can specify an input method (@pxref{Select Input Method}) suitable |
71 for your language, or use the default input method set up when you set | 71 for your language, or use the default input method set up when you set |
72 your language environment. (Emacs input methods are part of the Leim | 72 your language environment. (Emacs input methods are part of the Leim |
73 package, which must be installed for you to be able to use them.) If | 73 package, which must be installed for you to be able to use them.) If |
74 your keyboard can produce non-ASCII characters, you can select an | 74 your keyboard can produce non-@acronym{ASCII} characters, you can select an |
75 appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs | 75 appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs |
76 will accept those characters. Latin-1 characters can also be input by | 76 will accept those characters. Latin-1 characters can also be input by |
77 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support, | 77 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support, |
78 C-x 8}. On X Window systems, your locale should be set to an | 78 C-x 8}. On X Window systems, your locale should be set to an |
79 appropriate value to make sure Emacs interprets keyboard input | 79 appropriate value to make sure Emacs interprets keyboard input |
108 | 108 |
109 The users of international character sets and scripts have established | 109 The users of international character sets and scripts have established |
110 many more-or-less standard coding systems for storing files. Emacs | 110 many more-or-less standard coding systems for storing files. Emacs |
111 internally uses a single multibyte character encoding, so that it can | 111 internally uses a single multibyte character encoding, so that it can |
112 intermix characters from all these scripts in a single buffer or string. | 112 intermix characters from all these scripts in a single buffer or string. |
113 This encoding represents each non-ASCII character as a sequence of bytes | 113 This encoding represents each non-@acronym{ASCII} character as a sequence of bytes |
114 in the range 0200 through 0377. Emacs translates between the multibyte | 114 in the range 0200 through 0377. Emacs translates between the multibyte |
115 character encoding and various other coding systems when reading and | 115 character encoding and various other coding systems when reading and |
116 writing files, when exchanging data with subprocesses, and (in some | 116 writing files, when exchanging data with subprocesses, and (in some |
117 cases) in the @kbd{C-q} command (@pxref{Multibyte Conversion}). | 117 cases) in the @kbd{C-q} command (@pxref{Multibyte Conversion}). |
118 | 118 |
185 in that buffer. | 185 in that buffer. |
186 | 186 |
187 @cindex Lisp files, and multibyte operation | 187 @cindex Lisp files, and multibyte operation |
188 @cindex multibyte operation, and Lisp files | 188 @cindex multibyte operation, and Lisp files |
189 @cindex unibyte operation, and Lisp files | 189 @cindex unibyte operation, and Lisp files |
190 @cindex init file, and non-ASCII characters | 190 @cindex init file, and non-@acronym{ASCII} characters |
191 @cindex environment variables, and non-ASCII characters | 191 @cindex environment variables, and non-@acronym{ASCII} characters |
192 With @samp{--unibyte}, multibyte strings are not created during | 192 With @samp{--unibyte}, multibyte strings are not created during |
193 initialization from the values of environment variables, | 193 initialization from the values of environment variables, |
194 @file{/etc/passwd} entries etc.@: that contain non-ASCII 8-bit | 194 @file{/etc/passwd} entries etc.@: that contain non-@acronym{ASCII} 8-bit |
195 characters. | 195 characters. |
196 | 196 |
197 Emacs normally loads Lisp files as multibyte, regardless of whether | 197 Emacs normally loads Lisp files as multibyte, regardless of whether |
198 you used @samp{--unibyte}. This includes the Emacs initialization | 198 you used @samp{--unibyte}. This includes the Emacs initialization |
199 file, @file{.emacs}, and the initialization files of Emacs packages | 199 file, @file{.emacs}, and the initialization files of Emacs packages |
280 @code{locale-charset-language-names} and @code{locale-language-names}, | 280 @code{locale-charset-language-names} and @code{locale-language-names}, |
281 and selects the corresponding language environment if a match is found. | 281 and selects the corresponding language environment if a match is found. |
282 (The former variable overrides the latter.) It also adjusts the display | 282 (The former variable overrides the latter.) It also adjusts the display |
283 table and terminal coding system, the locale coding system, the | 283 table and terminal coding system, the locale coding system, the |
284 preferred coding system as needed for the locale, and---last but not | 284 preferred coding system as needed for the locale, and---last but not |
285 least---the way Emacs decodes non-ASCII characters sent by your keyboard. | 285 least---the way Emacs decodes non-@acronym{ASCII} characters sent by your keyboard. |
286 | 286 |
287 If you modify the @env{LC_ALL}, @env{LC_CTYPE}, or @env{LANG} | 287 If you modify the @env{LC_ALL}, @env{LC_CTYPE}, or @env{LANG} |
288 environment variables while running Emacs, you may want to invoke the | 288 environment variables while running Emacs, you may want to invoke the |
289 @code{set-locale-environment} function afterwards to readjust the | 289 @code{set-locale-environment} function afterwards to readjust the |
290 language environment from the new locale. | 290 language environment from the new locale. |
344 specifically for interactive input. In Emacs, typically each language | 344 specifically for interactive input. In Emacs, typically each language |
345 has its own input method; sometimes several languages which use the same | 345 has its own input method; sometimes several languages which use the same |
346 characters can share one input method. A few languages support several | 346 characters can share one input method. A few languages support several |
347 input methods. | 347 input methods. |
348 | 348 |
349 The simplest kind of input method works by mapping ASCII letters | 349 The simplest kind of input method works by mapping @acronym{ASCII} letters |
350 into another alphabet; this allows you to use one other alphabet | 350 into another alphabet; this allows you to use one other alphabet |
351 instead of ASCII. The Greek and Russian input methods | 351 instead of @acronym{ASCII}. The Greek and Russian input methods |
352 work this way. | 352 work this way. |
353 | 353 |
354 A more powerful technique is composition: converting sequences of | 354 A more powerful technique is composition: converting sequences of |
355 characters into one letter. Many European input methods use composition | 355 characters into one letter. Many European input methods use composition |
356 to produce a single non-ASCII letter from a sequence that consists of a | 356 to produce a single non-@acronym{ASCII} letter from a sequence that consists of a |
357 letter followed by accent characters (or vice versa). For example, some | 357 letter followed by accent characters (or vice versa). For example, some |
358 methods convert the sequence @kbd{a'} into a single accented letter. | 358 methods convert the sequence @kbd{a'} into a single accented letter. |
359 These input methods have no special commands of their own; all they do | 359 These input methods have no special commands of their own; all they do |
360 is compose sequences of printing characters. | 360 is compose sequences of printing characters. |
361 | 361 |
478 language environment that it is meant to be used with. The variable | 478 language environment that it is meant to be used with. The variable |
479 @code{current-input-method} records which input method is selected. | 479 @code{current-input-method} records which input method is selected. |
480 | 480 |
481 @findex toggle-input-method | 481 @findex toggle-input-method |
482 @kindex C-\ | 482 @kindex C-\ |
483 Input methods use various sequences of ASCII characters to stand for | 483 Input methods use various sequences of @acronym{ASCII} characters to stand for |
484 non-ASCII characters. Sometimes it is useful to turn off the input | 484 non-@acronym{ASCII} characters. Sometimes it is useful to turn off the input |
485 method temporarily. To do this, type @kbd{C-\} | 485 method temporarily. To do this, type @kbd{C-\} |
486 (@code{toggle-input-method}). To reenable the input method, type | 486 (@code{toggle-input-method}). To reenable the input method, type |
487 @kbd{C-\} again. | 487 @kbd{C-\} again. |
488 | 488 |
489 If you type @kbd{C-\} and you have not yet selected an input method, | 489 If you type @kbd{C-\} and you have not yet selected an input method, |
532 To display a list of all the supported input methods, type @kbd{M-x | 532 To display a list of all the supported input methods, type @kbd{M-x |
533 list-input-methods}. The list gives information about each input | 533 list-input-methods}. The list gives information about each input |
534 method, including the string that stands for it in the mode line. | 534 method, including the string that stands for it in the mode line. |
535 | 535 |
536 @node Multibyte Conversion | 536 @node Multibyte Conversion |
537 @section Unibyte and Multibyte Non-ASCII characters | 537 @section Unibyte and Multibyte Non-@acronym{ASCII} characters |
538 | 538 |
539 When multibyte characters are enabled, character codes 0240 (octal) | 539 When multibyte characters are enabled, character codes 0240 (octal) |
540 through 0377 (octal) are not really legitimate in the buffer. The valid | 540 through 0377 (octal) are not really legitimate in the buffer. The valid |
541 non-ASCII printing characters have codes that start from 0400. | 541 non-@acronym{ASCII} printing characters have codes that start from 0400. |
542 | 542 |
543 If you type a self-inserting character in the range 0240 through | 543 If you type a self-inserting character in the range 0240 through |
544 0377, or if you use @kbd{C-q} to insert one, Emacs assumes you | 544 0377, or if you use @kbd{C-q} to insert one, Emacs assumes you |
545 intended to use one of the ISO Latin-@var{n} character sets, and | 545 intended to use one of the ISO Latin-@var{n} character sets, and |
546 converts it to the Emacs code representing that Latin-@var{n} | 546 converts it to the Emacs code representing that Latin-@var{n} |
588 creating the coding system for the codepage, you can use it as any | 588 creating the coding system for the codepage, you can use it as any |
589 other coding system. For example, to visit a file encoded in codepage | 589 other coding system. For example, to visit a file encoded in codepage |
590 850, type @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename} | 590 850, type @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename} |
591 @key{RET}}. | 591 @key{RET}}. |
592 | 592 |
593 In addition to converting various representations of non-ASCII | 593 In addition to converting various representations of non-@acronym{ASCII} |
594 characters, a coding system can perform end-of-line conversion. Emacs | 594 characters, a coding system can perform end-of-line conversion. Emacs |
595 handles three different conventions for how to separate lines in a file: | 595 handles three different conventions for how to separate lines in a file: |
596 newline, carriage-return linefeed, and just carriage-return. | 596 newline, carriage-return linefeed, and just carriage-return. |
597 | 597 |
598 @table @kbd | 598 @table @kbd |
659 predictable. For example, the coding system @code{iso-latin-1} has | 659 predictable. For example, the coding system @code{iso-latin-1} has |
660 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and | 660 variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and |
661 @code{iso-latin-1-mac}. | 661 @code{iso-latin-1-mac}. |
662 | 662 |
663 The coding system @code{raw-text} is good for a file which is mainly | 663 The coding system @code{raw-text} is good for a file which is mainly |
664 ASCII text, but may contain byte values above 127 which are not meant to | 664 @acronym{ASCII} text, but may contain byte values above 127 which are not meant to |
665 encode non-ASCII characters. With @code{raw-text}, Emacs copies those | 665 encode non-@acronym{ASCII} characters. With @code{raw-text}, Emacs copies those |
666 byte values unchanged, and sets @code{enable-multibyte-characters} to | 666 byte values unchanged, and sets @code{enable-multibyte-characters} to |
667 @code{nil} in the current buffer so that they will be interpreted | 667 @code{nil} in the current buffer so that they will be interpreted |
668 properly. @code{raw-text} handles end-of-line conversion in the usual | 668 properly. @code{raw-text} handles end-of-line conversion in the usual |
669 way, based on the data encountered, and has the usual three variants to | 669 way, based on the data encountered, and has the usual three variants to |
670 specify the kind of end-of-line conversion to use. | 670 specify the kind of end-of-line conversion to use. |
671 | 671 |
672 In contrast, the coding system @code{no-conversion} specifies no | 672 In contrast, the coding system @code{no-conversion} specifies no |
673 character code conversion at all---none for non-ASCII byte values and | 673 character code conversion at all---none for non-@acronym{ASCII} byte values and |
674 none for end of line. This is useful for reading or writing binary | 674 none for end of line. This is useful for reading or writing binary |
675 files, tar files, and other files that must be examined verbatim. It, | 675 files, tar files, and other files that must be examined verbatim. It, |
676 too, sets @code{enable-multibyte-characters} to @code{nil}. | 676 too, sets @code{enable-multibyte-characters} to @code{nil}. |
677 | 677 |
678 The easiest way to edit a file with no conversion of any kind is with | 678 The easiest way to edit a file with no conversion of any kind is with |
679 the @kbd{M-x find-file-literally} command. This uses | 679 the @kbd{M-x find-file-literally} command. This uses |
680 @code{no-conversion}, and also suppresses other Emacs features that | 680 @code{no-conversion}, and also suppresses other Emacs features that |
681 might convert the file contents before you see them. @xref{Visiting}. | 681 might convert the file contents before you see them. @xref{Visiting}. |
682 | 682 |
683 The coding system @code{emacs-mule} means that the file contains | 683 The coding system @code{emacs-mule} means that the file contains |
684 non-ASCII characters stored with the internal Emacs encoding. It | 684 non-@acronym{ASCII} characters stored with the internal Emacs encoding. It |
685 handles end-of-line conversion based on the data encountered, and has | 685 handles end-of-line conversion based on the data encountered, and has |
686 the usual three variants to specify the kind of end-of-line conversion. | 686 the usual three variants to specify the kind of end-of-line conversion. |
687 | 687 |
688 @node Recognize Coding | 688 @node Recognize Coding |
689 @section Recognizing Coding Systems | 689 @section Recognizing Coding Systems |
772 the buffer. | 772 the buffer. |
773 | 773 |
774 The default value of @code{inhibit-iso-escape-detection} is | 774 The default value of @code{inhibit-iso-escape-detection} is |
775 @code{nil}. We recommend that you not change it permanently, only for | 775 @code{nil}. We recommend that you not change it permanently, only for |
776 one specific operation. That's because many Emacs Lisp source files | 776 one specific operation. That's because many Emacs Lisp source files |
777 in the Emacs distribution contain non-ASCII characters encoded in the | 777 in the Emacs distribution contain non-@acronym{ASCII} characters encoded in the |
778 coding system @code{iso-2022-7bit}, and they won't be | 778 coding system @code{iso-2022-7bit}, and they won't be |
779 decoded correctly when you visit those files if you suppress the | 779 decoded correctly when you visit those files if you suppress the |
780 escape sequence detection. | 780 escape sequence detection. |
781 | 781 |
782 @vindex coding | 782 @vindex coding |
815 of the mode line (@pxref{Mode Line}), or type @kbd{C-h C @key{RET}}. | 815 of the mode line (@pxref{Mode Line}), or type @kbd{C-h C @key{RET}}. |
816 | 816 |
817 @findex unify-8859-on-decoding-mode | 817 @findex unify-8859-on-decoding-mode |
818 The command @code{unify-8859-on-decoding-mode} enables a mode that | 818 The command @code{unify-8859-on-decoding-mode} enables a mode that |
819 ``unifies'' the Latin alphabets when decoding text. This works by | 819 ``unifies'' the Latin alphabets when decoding text. This works by |
820 converting all non-ASCII Latin-@var{n} characters to either Latin-1 or | 820 converting all non-@acronym{ASCII} Latin-@var{n} characters to either Latin-1 or |
821 Unicode characters. This way it is easier to use various | 821 Unicode characters. This way it is easier to use various |
822 Latin-@var{n} alphabets together. In a future Emacs version we hope | 822 Latin-@var{n} alphabets together. In a future Emacs version we hope |
823 to move towards full Unicode support and complete unification of | 823 to move towards full Unicode support and complete unification of |
824 character sets. | 824 character sets. |
825 | 825 |
835 | 835 |
836 You can insert any possible character into any Emacs buffer, but | 836 You can insert any possible character into any Emacs buffer, but |
837 most coding systems can only handle some of the possible characters. | 837 most coding systems can only handle some of the possible characters. |
838 This means that it is possible for you to insert characters that | 838 This means that it is possible for you to insert characters that |
839 cannot be encoded with the coding system that will be used to save the | 839 cannot be encoded with the coding system that will be used to save the |
840 buffer. For example, you could start with an ASCII file and insert a | 840 buffer. For example, you could start with an @acronym{ASCII} file and insert a |
841 few Latin-1 characters into it, or you could edit a text file in | 841 few Latin-1 characters into it, or you could edit a text file in |
842 Polish encoded in @code{iso-8859-2} and add some Russian words to it. | 842 Polish encoded in @code{iso-8859-2} and add some Russian words to it. |
843 When you save the buffer, Emacs cannot use the current value of | 843 When you save the buffer, Emacs cannot use the current value of |
844 @code{buffer-file-coding-system}, because the characters you added | 844 @code{buffer-file-coding-system}, because the characters you added |
845 cannot be encoded by that coding system. | 845 cannot be encoded by that coding system. |
915 | 915 |
916 @item C-x @key{RET} x @var{coding} @key{RET} | 916 @item C-x @key{RET} x @var{coding} @key{RET} |
917 Use coding system @var{coding} for transferring selections to and from | 917 Use coding system @var{coding} for transferring selections to and from |
918 other programs through the window system. | 918 other programs through the window system. |
919 | 919 |
| | 920 @item C-x @key{RET} F @var{coding} @key{RET} |
| | 921 Use coding system @var{coding} for encoding and decoding file |
| | 922 @emph{names}. This affects the use of non-ASCII characters in file |
| | 923 names. It has no effect on reading and writing the @emph{contents} of |
| | 924 files. |
| | 925 |
920 @item C-x @key{RET} X @var{coding} @key{RET} | 926 @item C-x @key{RET} X @var{coding} @key{RET} |
921 Use coding system @var{coding} for transferring @emph{one} | 927 Use coding system @var{coding} for transferring @emph{one} |
922 selection---the next one---to or from the window system. | 928 selection---the next one---to or from the window system. |
923 @end table | 929 @end table |
924 | 930 |
991 @vindex keyboard-coding-system | 997 @vindex keyboard-coding-system |
992 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system}) | 998 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system}) |
993 or the Custom option @code{keyboard-coding-system} | 999 or the Custom option @code{keyboard-coding-system} |
994 specifies the coding system for keyboard input. Character-code | 1000 specifies the coding system for keyboard input. Character-code |
995 translation of keyboard input is useful for terminals with keys that | 1001 translation of keyboard input is useful for terminals with keys that |
996 send non-ASCII graphic characters---for example, some terminals designed | 1002 send non-@acronym{ASCII} graphic characters---for example, some terminals designed |
997 for ISO Latin-1 or subsets of it. | 1003 for ISO Latin-1 or subsets of it. |
998 | 1004 |
999 By default, keyboard input is translated based on your system locale | 1005 By default, keyboard input is translated based on your system locale |
1000 setting. If your terminal does not really support the encoding | 1006 setting. If your terminal does not really support the encoding |
1001 implied by your locale (for example, if you find it inserts a | 1007 implied by your locale (for example, if you find it inserts a |
1002 non-ASCII character if you type @kbd{M-i}), you will need to set | 1008 non-@acronym{ASCII} character if you type @kbd{M-i}), you will need to set |
1003 @code{keyboard-coding-system} to @code{nil} to turn off encoding. | 1009 @code{keyboard-coding-system} to @code{nil} to turn off encoding. |
1004 You can do this by putting | 1010 You can do this by putting |
1005 | 1011 |
1006 @lisp | 1012 @lisp |
1007 (set-keyboard-coding-system nil) | 1013 (set-keyboard-coding-system nil) |
1012 | 1018 |
1013 There is a similarity between using a coding system translation for | 1019 There is a similarity between using a coding system translation for |
1014 keyboard input, and using an input method: both define sequences of | 1020 keyboard input, and using an input method: both define sequences of |
1015 keyboard input that translate into single characters. However, input | 1021 keyboard input that translate into single characters. However, input |
1016 methods are designed to be convenient for interactive use by humans, and | 1022 methods are designed to be convenient for interactive use by humans, and |
1017 the sequences that are translated are typically sequences of ASCII | 1023 the sequences that are translated are typically sequences of @acronym{ASCII} |
1018 printing characters. Coding systems typically translate sequences of | 1024 printing characters. Coding systems typically translate sequences of |
1019 non-graphic characters. | 1025 non-graphic characters. |
1020 | 1026 |
1021 @kindex C-x RET x | 1027 @kindex C-x RET x |
1022 @kindex C-x RET X | 1028 @kindex C-x RET X |
1041 | 1047 |
1042 The default for translation of process input and output depends on the | 1048 The default for translation of process input and output depends on the |
1043 current language environment. | 1049 current language environment. |
1044 | 1050 |
1045 @vindex file-name-coding-system | 1051 @vindex file-name-coding-system |
1046 @cindex file names with non-ASCII characters | 1052 @cindex file names with non-@acronym{ASCII} characters |
1047 The variable @code{file-name-coding-system} specifies a coding system | 1053 @findex set-file-name-coding-system |
1048 to use for encoding file names. If you set the variable to a coding | 1054 @kindex C-x @key{RET} F |
1049 system name (as a Lisp symbol or a string), Emacs encodes file names | 1055 The variable @code{file-name-coding-system} specifies a coding |
1050 using that coding system for all file operations. This makes it | 1056 system to use for encoding file names. If you set the variable to a |
1051 possible to use non-ASCII characters in file names---or, at least, those | 1057 coding system name (as a Lisp symbol or a string), Emacs encodes file |
1052 non-ASCII characters which the specified coding system can encode. | 1058 names using that coding system for all file operations. This makes it |
| | 1059 possible to use non-@acronym{ASCII} characters in file names---or, at |
| | 1060 least, those non-@acronym{ASCII} characters which the specified coding |
| | 1061 system can encode. Use @kbd{C-x @key{RET} F} |
| | 1062 (@code{set-file-name-coding-system}) to specify this interactively. |
1053 | 1063 |
1054 If @code{file-name-coding-system} is @code{nil}, Emacs uses a default | 1064 If @code{file-name-coding-system} is @code{nil}, Emacs uses a default |
1055 coding system determined by the selected language environment. In the | 1065 coding system determined by the selected language environment. In the |
1056 default language environment, any non-ASCII characters in file names are | 1066 default language environment, any non-@acronym{ASCII} characters in file names are |
1057 not encoded specially; they appear in the file system using the internal | 1067 not encoded specially; they appear in the file system using the internal |
1058 Emacs representation. | 1068 Emacs representation. |
1059 | 1069 |
1060 @strong{Warning:} if you change @code{file-name-coding-system} (or the | 1070 @strong{Warning:} if you change @code{file-name-coding-system} (or the |
1061 language environment) in the middle of an Emacs session, problems can | 1071 language environment) in the middle of an Emacs session, problems can |
1065 these buffers under the visited file name, saving may use the wrong file | 1075 these buffers under the visited file name, saving may use the wrong file |
1066 name, or it may get an error. If such a problem happens, use @kbd{C-x | 1076 name, or it may get an error. If such a problem happens, use @kbd{C-x |
1067 C-w} to specify a new file name for that buffer. | 1077 C-w} to specify a new file name for that buffer. |
1068 | 1078 |
1069 @vindex locale-coding-system | 1079 @vindex locale-coding-system |
1070 @cindex decoding non-ASCII keyboard input on X | 1080 @cindex decoding non-@acronym{ASCII} keyboard input on X |
1071 The variable @code{locale-coding-system} specifies a coding system | 1081 The variable @code{locale-coding-system} specifies a coding system |
1072 to use when encoding and decoding system strings such as system error | 1082 to use when encoding and decoding system strings such as system error |
1073 messages and @code{format-time-string} formats and time stamps. That | 1083 messages and @code{format-time-string} formats and time stamps. That |
1074 coding system is also used for decoding non-ASCII keyboard input on X | 1084 coding system is also used for decoding non-@acronym{ASCII} keyboard input on X |
1075 Window systems. You should choose a coding system that is compatible | 1085 Window systems. You should choose a coding system that is compatible |
1076 with the underlying system's text representation, which is normally | 1086 with the underlying system's text representation, which is normally |
1077 specified by one of the environment variables @env{LC_ALL}, | 1087 specified by one of the environment variables @env{LC_ALL}, |
1078 @env{LC_CTYPE}, and @env{LANG}. (The first one, in the order | 1088 @env{LC_CTYPE}, and @env{LANG}. (The first one, in the order |
1079 specified above, whose value is nonempty is the one that determines | 1089 specified above, whose value is nonempty is the one that determines |
1099 characters.@footnote{The Emacs installation instructions have information on | 1109 characters.@footnote{The Emacs installation instructions have information on |
1100 additional font support.} | 1110 additional font support.} |
1101 | 1111 |
1102 Emacs creates two fontsets automatically: the @dfn{standard fontset} | 1112 Emacs creates two fontsets automatically: the @dfn{standard fontset} |
1103 and the @dfn{startup fontset}. The standard fontset is most likely to | 1113 and the @dfn{startup fontset}. The standard fontset is most likely to |
1104 have fonts for a wide variety of non-ASCII characters; however, this is | 1114 have fonts for a wide variety of non-@acronym{ASCII} characters; however, this is |
1105 not the default for Emacs to use. (By default, Emacs tries to find a | 1115 not the default for Emacs to use. (By default, Emacs tries to find a |
1106 font that has bold and italic variants.) You can specify use of the | 1116 font that has bold and italic variants.) You can specify use of the |
1107 standard fontset with the @samp{-fn} option, or with the @samp{Font} X | 1117 standard fontset with the @samp{-fn} option, or with the @samp{Font} X |
1108 resource (@pxref{Font X}). For example, | 1118 resource (@pxref{Font X}). For example, |
1109 | 1119 |
1135 Bold, italic, and bold-italic variants of the standard fontset are | 1145 Bold, italic, and bold-italic variants of the standard fontset are |
1136 created automatically. Their names have @samp{bold} instead of | 1146 created automatically. Their names have @samp{bold} instead of |
1137 @samp{medium}, or @samp{i} instead of @samp{r}, or both. | 1147 @samp{medium}, or @samp{i} instead of @samp{r}, or both. |
1138 | 1148 |
1139 @cindex startup fontset | 1149 @cindex startup fontset |
1140 If you specify a default ASCII font with the @samp{Font} resource or | 1150 If you specify a default @acronym{ASCII} font with the @samp{Font} resource or |
1141 the @samp{-fn} argument, Emacs generates a fontset from it | 1151 the @samp{-fn} argument, Emacs generates a fontset from it |
1142 automatically. This is the @dfn{startup fontset} and its name is | 1152 automatically. This is the @dfn{startup fontset} and its name is |
1143 @code{fontset-startup}. It does this by replacing the @var{foundry}, | 1153 @code{fontset-startup}. It does this by replacing the @var{foundry}, |
1144 @var{family}, @var{add_style}, and @var{average_width} fields of the | 1154 @var{family}, @var{add_style}, and @var{average_width} fields of the |
1145 font name with @samp{*}, replacing @var{charset_registry} field with | 1155 font name with @samp{*}, replacing @var{charset_registry} field with |
1189 font to use for that character set. You can use this construct any | 1199 font to use for that character set. You can use this construct any |
1190 number of times in defining one fontset. | 1200 number of times in defining one fontset. |
1191 | 1201 |
1192 For the other character sets, Emacs chooses a font based on | 1202 For the other character sets, Emacs chooses a font based on |
1193 @var{fontpattern}. It replaces @samp{fontset-@var{alias}} with values | 1203 @var{fontpattern}. It replaces @samp{fontset-@var{alias}} with values |
1194 that describe the character set. For the ASCII character font, | 1204 that describe the character set. For the @acronym{ASCII} character font, |
1195 @samp{fontset-@var{alias}} is replaced with @samp{ISO8859-1}. | 1205 @samp{fontset-@var{alias}} is replaced with @samp{ISO8859-1}. |
1196 | 1206 |
1197 In addition, when several consecutive fields are wildcards, Emacs | 1207 In addition, when several consecutive fields are wildcards, Emacs |
1198 collapses them into a single wildcard. This is to prevent use of | 1208 collapses them into a single wildcard. This is to prevent use of |
1199 auto-scaled fonts. Fonts made by scaling larger fonts are not usable | 1209 auto-scaled fonts. Fonts made by scaling larger fonts are not usable |
1206 @example | 1216 @example |
1207 -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24 | 1217 -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24 |
1208 @end example | 1218 @end example |
1209 | 1219 |
1210 @noindent | 1220 @noindent |
1211 the font specification for ASCII characters would be this: | 1221 the font specification for @acronym{ASCII} characters would be this: |
1212 | 1222 |
1213 @example | 1223 @example |
1214 -*-fixed-medium-r-normal-*-24-*-ISO8859-1 | 1224 -*-fixed-medium-r-normal-*-24-*-ISO8859-1 |
1215 @end example | 1225 @end example |
1216 | 1226 |
1245 @xref{Font X}, for more information about font naming in X. | 1255 @xref{Font X}, for more information about font naming in X. |
1246 | 1256 |
1247 @node Undisplayable Characters | 1257 @node Undisplayable Characters |
1248 @section Undisplayable Characters | 1258 @section Undisplayable Characters |
1249 | 1259 |
1250 There may be a some non-ASCII characters that your terminal cannot | 1260 There may be a some non-@acronym{ASCII} characters that your terminal cannot |
1251 display. Most non-windowing terminals support just a single character | 1261 display. Most non-windowing terminals support just a single character |
1252 set (use the variable @code{default-terminal-coding-system} | 1262 set (use the variable @code{default-terminal-coding-system} |
1253 (@pxref{Specify Coding}) to tell Emacs which one); characters which | 1263 (@pxref{Specify Coding}) to tell Emacs which one); characters which |
1254 can't be encoded in that coding system are displayed as @samp{?} by | 1264 can't be encoded in that coding system are displayed as @samp{?} by |
1255 default. | 1265 default. |
1257 Windowing terminals can display a broader range of characters, but | 1267 Windowing terminals can display a broader range of characters, but |
1258 you may not have fonts installed for all of them; characters that have | 1268 you may not have fonts installed for all of them; characters that have |
1259 no font appear as a hollow box. | 1269 no font appear as a hollow box. |
1260 | 1270 |
1261 If you use Latin-1 characters but your terminal can't display | 1271 If you use Latin-1 characters but your terminal can't display |
1262 Latin-1, you can arrange to display mnemonic ASCII sequences | 1272 Latin-1, you can arrange to display mnemonic @acronym{ASCII} sequences |
1263 instead, e.g.@: @samp{"o} for o-umlaut. Load the library | 1273 instead, e.g.@: @samp{"o} for o-umlaut. Load the library |
1264 @file{iso-ascii} to do this. | 1274 @file{iso-ascii} to do this. |
1265 | 1275 |
1266 @vindex latin1-display | 1276 @vindex latin1-display |
1267 If your terminal can display Latin-1, you can display characters | 1277 If your terminal can display Latin-1, you can display characters |
1268 from other European character sets using a mixture of equivalent | 1278 from other European character sets using a mixture of equivalent |
1269 Latin-1 characters and ASCII mnemonics. Use the Custom option | 1279 Latin-1 characters and @acronym{ASCII} mnemonics. Use the Custom option |
1270 @code{latin1-display} to enable this. The mnemonic ASCII | 1280 @code{latin1-display} to enable this. The mnemonic @acronym{ASCII} |
1271 sequences mostly correspond to those of the prefix input methods. | 1281 sequences mostly correspond to those of the prefix input methods. |
1272 | 1282 |
1273 @node Single-Byte Character Support | 1283 @node Single-Byte Character Support |
1274 @section Single-byte Character Set Support | 1284 @section Single-byte Character Set Support |
1275 | 1285 |
1286 set-language-environment} and specify a suitable language environment | 1296 set-language-environment} and specify a suitable language environment |
1287 such as @samp{Latin-@var{n}}. | 1297 such as @samp{Latin-@var{n}}. |
1288 | 1298 |
1289 For more information about unibyte operation, see @ref{Enabling | 1299 For more information about unibyte operation, see @ref{Enabling |
1290 Multibyte}. Note particularly that you probably want to ensure that | 1300 Multibyte}. Note particularly that you probably want to ensure that |
1291 your initialization files are read as unibyte if they contain non-ASCII | 1301 your initialization files are read as unibyte if they contain non-@acronym{ASCII} |
1292 characters. | 1302 characters. |
1293 | 1303 |
1294 @vindex unibyte-display-via-language-environment | 1304 @vindex unibyte-display-via-language-environment |
1295 Emacs can also display those characters, provided the terminal or font | 1305 Emacs can also display those characters, provided the terminal or font |
1296 in use supports them. This works automatically. Alternatively, if you | 1306 in use supports them. This works automatically. Alternatively, if you |
1300 this, set the variable @code{unibyte-display-via-language-environment} | 1310 this, set the variable @code{unibyte-display-via-language-environment} |
1301 to a non-@code{nil} value. | 1311 to a non-@code{nil} value. |
1302 | 1312 |
1303 @cindex @code{iso-ascii} library | 1313 @cindex @code{iso-ascii} library |
1304 If your terminal does not support display of the Latin-1 character | 1314 If your terminal does not support display of the Latin-1 character |
1305 set, Emacs can display these characters as ASCII sequences which at | 1315 set, Emacs can display these characters as @acronym{ASCII} sequences which at |
1306 least give you a clear idea of what the characters are. To do this, | 1316 least give you a clear idea of what the characters are. To do this, |
1307 load the library @code{iso-ascii}. Similar libraries for other | 1317 load the library @code{iso-ascii}. Similar libraries for other |
1308 Latin-@var{n} character sets could be implemented, but we don't have | 1318 Latin-@var{n} character sets could be implemented, but we don't have |
1309 them yet. | 1319 them yet. |
1310 | 1320 |
1313 Normally non-ISO-8859 characters (decimal codes between 128 and 159 | 1323 Normally non-ISO-8859 characters (decimal codes between 128 and 159 |
1314 inclusive) are displayed as octal escapes. You can change this for | 1324 inclusive) are displayed as octal escapes. You can change this for |
1315 non-standard ``extended'' versions of ISO-8859 character sets by using the | 1325 non-standard ``extended'' versions of ISO-8859 character sets by using the |
1316 function @code{standard-display-8bit} in the @code{disp-table} library. | 1326 function @code{standard-display-8bit} in the @code{disp-table} library. |
1317 | 1327 |
1318 There are several ways you can input single-byte non-ASCII | 1328 There are several ways you can input single-byte non-@acronym{ASCII} |
1319 characters: | 1329 characters: |
1320 | 1330 |
1321 @itemize @bullet | 1331 @itemize @bullet |
1322 @cindex 8-bit input | 1332 @cindex 8-bit input |
1323 @item | 1333 @item |
1324 If your keyboard can generate character codes 128 (decimal) and up, | 1334 If your keyboard can generate character codes 128 (decimal) and up, |
1325 representing non-ASCII characters, you can type those character codes | 1335 representing non-@acronym{ASCII} characters, you can type those character codes |
1326 directly. | 1336 directly. |
1327 | 1337 |
1328 On a windowing terminal, you should not need to do anything special to | 1338 On a windowing terminal, you should not need to do anything special to |
1329 use these keys; they should simply work. On a text-only terminal, you | 1339 use these keys; they should simply work. On a text-only terminal, you |
1330 should use the command @code{M-x set-keyboard-coding-system} or the | 1340 should use the command @code{M-x set-keyboard-coding-system} or the |
1337 @kbd{Compose} or @kbd{AltGr} keys. @xref{User Input}. | 1347 @kbd{Compose} or @kbd{AltGr} keys. @xref{User Input}. |
1338 | 1348 |
1339 @item | 1349 @item |
1340 You can use an input method for the selected language environment. | 1350 You can use an input method for the selected language environment. |
1341 @xref{Input Methods}. When you use an input method in a unibyte buffer, | 1351 @xref{Input Methods}. When you use an input method in a unibyte buffer, |
1342 the non-ASCII character you specify with it is converted to unibyte. | 1352 the non-@acronym{ASCII} character you specify with it is converted to unibyte. |
1343 | 1353 |
1344 @kindex C-x 8 | 1354 @kindex C-x 8 |
1345 @cindex @code{iso-transl} library | 1355 @cindex @code{iso-transl} library |
1346 @cindex compose character | 1356 @cindex compose character |
1347 @cindex dead character | 1357 @cindex dead character |
1348 @item | 1358 @item |
1349 For Latin-1 only, you can use the | 1359 For Latin-1 only, you can use the |
1350 key @kbd{C-x 8} as a ``compose character'' prefix for entry of | 1360 key @kbd{C-x 8} as a ``compose character'' prefix for entry of |
1351 non-ASCII Latin-1 printing characters. @kbd{C-x 8} is good for | 1361 non-@acronym{ASCII} Latin-1 printing characters. @kbd{C-x 8} is good for |
1352 insertion (in the minibuffer as well as other buffers), for searching, | 1362 insertion (in the minibuffer as well as other buffers), for searching, |
1353 and in any other context where a key sequence is allowed. | 1363 and in any other context where a key sequence is allowed. |
1354 | 1364 |
1355 @kbd{C-x 8} works by loading the @code{iso-transl} library. Once that | 1365 @kbd{C-x 8} works by loading the @code{iso-transl} library. Once that |
1356 library is loaded, the @key{ALT} modifier key, if you have one, serves | 1366 library is loaded, the @key{ALT} modifier key, if you have one, serves |
1378 @cindex charsets | 1388 @cindex charsets |
1379 | 1389 |
1380 Emacs groups all supported characters into disjoint @dfn{charsets}. | 1390 Emacs groups all supported characters into disjoint @dfn{charsets}. |
1381 Each character code belongs to one and only one charset. For | 1391 Each character code belongs to one and only one charset. For |
1382 historical reasons, Emacs typically divides an 8-bit character code | 1392 historical reasons, Emacs typically divides an 8-bit character code |
1383 for an extended version of ASCII into two charsets: ASCII, which | 1393 for an extended version of @acronym{ASCII} into two charsets: @acronym{ASCII}, which |
1384 covers the codes 0 through 127, plus another charset which covers the | 1394 covers the codes 0 through 127, plus another charset which covers the |
1385 ``right-hand part'' (the codes 128 and up). For instance, the | 1395 ``right-hand part'' (the codes 128 and up). For instance, the |
1386 characters of Latin-1 include the Emacs charset @code{ascii} plus the | 1396 characters of Latin-1 include the Emacs charset @code{ascii} plus the |
1387 Emacs charset @code{latin-iso8859-1}. | 1397 Emacs charset @code{latin-iso8859-1}. |
1388 | 1398 |
1402 charset name and displays information about that charset, including | 1412 charset name and displays information about that charset, including |
1403 its internal representation within Emacs. | 1413 its internal representation within Emacs. |
1404 | 1414 |
1405 To find out which charset a character in the buffer belongs to, | 1415 To find out which charset a character in the buffer belongs to, |
1406 put point before it and type @kbd{C-u C-x =}. | 1416 put point before it and type @kbd{C-u C-x =}. |
1417 | |
1418 @ignore | |
1419 arch-tag: 310ba60d-31ef-4ce7-91f1-f282dd57b6b3 | |
1420 @end ignore |
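
The hunks above only wrap `ASCII` in `@acronym{}`, but the text they touch names several display settings. As a rough illustration, here is a minimal init-file sketch, assuming an Emacs of roughly the vintage of this changeset. Every library, function, and variable in it is taken from the manual text in the diff; the choice of `"Latin-1"` and the range 128 to 159 are only example values, and some of these libraries may be obsolete or missing in later Emacs releases.

```elisp
;; Sketch only: pick the lines that match your terminal; you are
;; unlikely to want all of them at once.

;; Select a language environment ("Latin-1" is just an example value).
(set-language-environment "Latin-1")

;; If the terminal cannot display Latin-1, show mnemonic ASCII
;; sequences instead (e.g. "o for o-umlaut).
(load-library "iso-ascii")

;; Display unibyte non-ASCII characters according to the language
;; environment.
(setq unibyte-display-via-language-environment t)

;; Display codes 128 through 159 as characters rather than as octal
;; escapes, for non-standard "extended" ISO-8859 character sets.
(require 'disp-table)
(standard-display-8bit 128 159)
```

To check which charset a displayed character ends up in, put point before it and type `C-u C-x =`, as the last hunk describes.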