comparison man/mule.texi @ 37584:9a7fd51a92b3
(International): Add an overview of Mule features, with pointers to
detailed descriptions.
(Enabling Multibyte): Describe how to switch a unibyte session to multibyte.
Mention that by default, all sessions are multibyte.
(Coding Systems): Make it clear that cpNNN are coding systems, and should
be used as such.
(Recognize Coding): Explain that Emacs decodes text as part of reading
it. Mention revert-buffer as a means to redecode a file.
author | Eli Zaretskii <eliz@gnu.org> |
---|---|
date | Sun, 06 May 2001 11:27:54 +0000 |
parents | 07200bf360ab |
children | 5a2458f097b0 |
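
The points this change makes about codepage coding systems and about re-decoding a misrecognized file can be illustrated outside Emacs. The following is only a rough sketch in Python, not Emacs code: Python's built-in codecs (`cp850`, `iso8859_1`, `iso8859_2`) stand in for Emacs coding systems, and the helper name `decode_as` is hypothetical. It shows that decoding happens when bytes are read, that a wrong guess silently yields wrong characters, and that the fix is to decode the same bytes again with the right coding system, which is roughly what `C-x RET c cp850 RET M-x revert-buffer RET` does for a visited file.

```python
# Illustration only: Python codecs stand in for Emacs coding systems.

def decode_as(data: bytes, coding_system: str) -> str:
    """Decode raw bytes with an explicitly chosen coding system (hypothetical helper)."""
    return data.decode(coding_system)

# "café" as a DOS program using codepage 850 would store it on disk.
raw = "caf\u00e9".encode("cp850")

# A wrong guess does not fail; it just produces the wrong last character.
wrong_guess = decode_as(raw, "iso8859_1")

# Decoding the same bytes again with the right coding system recovers the text.
redecoded = decode_as(raw, "cp850")

# Latin-1 and Latin-2 give different meanings to the same byte value,
# so the bytes alone cannot tell the two encodings apart.
ambiguous = b"\xe6"
as_latin1 = decode_as(ambiguous, "iso8859_1")
as_latin2 = decode_as(ambiguous, "iso8859_2")
```

Because both Latin decodings succeed, no detector can choose between them from the data alone; this is why the manual text says recognition works "once you have specified your preferences".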
37583:313d4c5de5ca | 37584:9a7fd51a92b3 |
---|---|
42 ``MULti-lingual Enhancement to GNU Emacs'') | 42 ``MULti-lingual Enhancement to GNU Emacs'') |
43 | 43 |
44 Emacs also supports various encodings of these characters used by | 44 Emacs also supports various encodings of these characters used by |
45 other internationalized software, such as word processors and mailers. | 45 other internationalized software, such as word processors and mailers. |
46 | 46 |
47 Emacs allows editing text with international characters by supporting | |
48 all the related activities: | |
49 | |
50 @itemize @bullet | |
51 @item | |
52 You can visit files with non-ASCII characters, save non-ASCII text, and | |
53 pass non-ASCII text between Emacs and programs it invokes (such as | |
54 compilers, spell-checkers, and mailers). Setting your language | |
55 environment (@pxref{Language Environments}) takes care of setting up the | |
56 coding systems and other options for a specific language or culture. | |
57 Alternatively, you can specify how Emacs should encode or decode text | |
58 for each command; see @ref{Specify Coding}. | |
59 | |
60 @item | |
61 You can display non-ASCII characters encoded by the various scripts. | |
62 This works by using appropriate fonts on X and similar graphics | |
63 displays (@pxref{Defining Fontsets}), and by sending special codes to | |
64 text-only displays (@pxref{Specify Coding}). If some characters are | |
65 displayed incorrectly, refer to @ref{Undisplayable Characters}, which | |
66 describes possible problems and explains how to solve them. | |
67 | |
68 @item | |
69 You can insert non-ASCII characters or search for them. To do that, | |
70 you can specify an input method (@pxref{Select Input Method}) suitable | |
71 for your language, or use the default input method set up when you set | |
72 your language environment. (Emacs input methods are part of the Leim | |
73 package, which must be installed for you to be able to use them.) If | |
74 your keyboard can produce non-ASCII characters, you can select an | |
75 appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs | |
76 will accept those characters. Latin-1 characters can also be input by | |
77 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support, | |
78 C-x 8}. | |
79 @end itemize | |
80 | |
81 The rest of this chapter describes these issues in detail. | |
82 | |
47 @menu | 83 @menu |
48 * International Intro:: Basic concepts of multibyte characters. | 84 * International Intro:: Basic concepts of multibyte characters. |
49 * Enabling Multibyte:: Controlling whether to use multibyte characters. | 85 * Enabling Multibyte:: Controlling whether to use multibyte characters. |
50 * Language Environments:: Setting things up for the language you use. | 86 * Language Environments:: Setting things up for the language you use. |
51 * Input Methods:: Entering text characters not on your keyboard. | 87 * Input Methods:: Entering text characters not on your keyboard. |
119 @end ignore | 155 @end ignore |
120 | 156 |
121 @node Enabling Multibyte | 157 @node Enabling Multibyte |
122 @section Enabling Multibyte Characters | 158 @section Enabling Multibyte Characters |
123 | 159 |
160 @cindex turn multibyte support on or off | |
124 You can enable or disable multibyte character support, either for | 161 You can enable or disable multibyte character support, either for |
125 Emacs as a whole, or for a single buffer. When multibyte characters are | 162 Emacs as a whole, or for a single buffer. When multibyte characters are |
126 disabled in a buffer, then each byte in that buffer represents a | 163 disabled in a buffer, then each byte in that buffer represents a |
127 character, even codes 0200 through 0377. The old features for | 164 character, even codes 0200 through 0377. The old features for |
128 supporting the European character sets, ISO Latin-1 and ISO Latin-2, | 165 supporting the European character sets, ISO Latin-1 and ISO Latin-2, |
131 | 168 |
132 However, there is no need to turn off multibyte character support to | 169 However, there is no need to turn off multibyte character support to |
133 use ISO Latin; the Emacs multibyte character set includes all the | 170 use ISO Latin; the Emacs multibyte character set includes all the |
134 characters in these character sets, and Emacs can translate | 171 characters in these character sets, and Emacs can translate |
135 automatically to and from the ISO codes. | 172 automatically to and from the ISO codes. |
173 | |
174 By default, Emacs starts in multibyte mode, because that allows you to | |
175 use all the supported languages and scripts without limitations. | |
136 | 176 |
137 To edit a particular file in unibyte representation, visit it using | 177 To edit a particular file in unibyte representation, visit it using |
138 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in | 178 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in |
139 multibyte representation into a single-byte representation of the same | 179 multibyte representation into a single-byte representation of the same |
140 characters, the easiest way is to save the contents in a file, kill the | 180 characters, the easiest way is to save the contents in a file, kill the |
150 @vindex default-enable-multibyte-characters | 190 @vindex default-enable-multibyte-characters |
151 To turn off multibyte character support by default, start Emacs with | 191 To turn off multibyte character support by default, start Emacs with |
152 the @samp{--unibyte} option (@pxref{Initial Options}), or set the | 192 the @samp{--unibyte} option (@pxref{Initial Options}), or set the |
153 environment variable @env{EMACS_UNIBYTE}. You can also customize | 193 environment variable @env{EMACS_UNIBYTE}. You can also customize |
154 @code{enable-multibyte-characters} or, equivalently, directly set the | 194 @code{enable-multibyte-characters} or, equivalently, directly set the |
155 variable @code{default-enable-multibyte-characters} in your init file to | 195 variable @code{default-enable-multibyte-characters} to @code{nil} in |
156 have basically the same effect as @samp{--unibyte}. | 196 your init file to have basically the same effect as @samp{--unibyte}. |
197 | |
198 @findex toggle-enable-multibyte-characters | |
199 To convert a unibyte session to a multibyte session, set | |
200 @code{default-enable-multibyte-characters} to @code{t}. Buffers which | |
201 were created in the unibyte session before you turn on multibyte support | |
202 will stay unibyte. You can turn on multibyte support in a specific | |
203 buffer by invoking the command @code{toggle-enable-multibyte-characters} | |
204 in that buffer. | |
157 | 205 |
158 @cindex Lisp files, and multibyte operation | 206 @cindex Lisp files, and multibyte operation |
159 @cindex multibyte operation, and Lisp files | 207 @cindex multibyte operation, and Lisp files |
160 @cindex unibyte operation, and Lisp files | 208 @cindex unibyte operation, and Lisp files |
161 @cindex init file, and non-ASCII characters | 209 @cindex init file, and non-ASCII characters |
525 language name. Some coding systems are used for several languages; | 573 language name. Some coding systems are used for several languages; |
526 their names usually start with @samp{iso}. There are also special | 574 their names usually start with @samp{iso}. There are also special |
527 coding systems @code{no-conversion}, @code{raw-text} and | 575 coding systems @code{no-conversion}, @code{raw-text} and |
528 @code{emacs-mule} which do not convert printing characters at all. | 576 @code{emacs-mule} which do not convert printing characters at all. |
529 | 577 |
578 @cindex international files from DOS/Windows systems | |
530 A special class of coding systems, collectively known as | 579 A special class of coding systems, collectively known as |
531 @dfn{codepages}, is designed to support text encoded by MS-Windows and | 580 @dfn{codepages}, is designed to support text encoded by MS-Windows and |
532 MS-DOS software. To use any of these systems, you need to create it | 581 MS-DOS software. To use any of these systems, you need to create it |
533 with @kbd{M-x codepage-setup}. @xref{MS-DOS and MULE}. | 582 with @kbd{M-x codepage-setup}. @xref{MS-DOS and MULE}. After |
583 creating the coding system for the codepage, you can use it as any | |
584 other coding system. For example, to visit a file encoded in codepage | |
585 850, type @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename} | |
586 @key{RET}}. | |
534 | 587 |
535 In addition to converting various representations of non-ASCII | 588 In addition to converting various representations of non-ASCII |
536 characters, a coding system can perform end-of-line conversion. Emacs | 589 characters, a coding system can perform end-of-line conversion. Emacs |
537 handles three different conventions for how to separate lines in a file: | 590 handles three different conventions for how to separate lines in a file: |
538 newline, carriage-return linefeed, and just carriage-return. | 591 newline, carriage-return linefeed, and just carriage-return. |
628 the usual three variants to specify the kind of end-of-line conversion. | 681 the usual three variants to specify the kind of end-of-line conversion. |
629 | 682 |
630 @node Recognize Coding | 683 @node Recognize Coding |
631 @section Recognizing Coding Systems | 684 @section Recognizing Coding Systems |
632 | 685 |
633 Most of the time, Emacs can recognize which coding system to use for | 686 Emacs tries to recognize which coding system to use for a given text |
634 any given file---once you have specified your preferences. | 687 as an integral part of reading that text. (This applies to files |
688 being read, output from subprocesses, text from X selections, etc.) | |
689 Emacs can select the right coding system automatically most of the | |
690 time---once you have specified your preferences. | |
635 | 691 |
636 Some coding systems can be recognized or distinguished by which byte | 692 Some coding systems can be recognized or distinguished by which byte |
637 sequences appear in the data. However, there are coding systems that | 693 sequences appear in the data. However, there are coding systems that |
638 cannot be distinguished, not even potentially. For example, there is no | 694 cannot be distinguished, not even potentially. For example, there is no |
639 way to distinguish between Latin-1 and Latin-2; they use the same byte | 695 way to distinguish between Latin-1 and Latin-2; they use the same byte |
735 overrides @samp{-*-coding:-*-} tags in the file itself. Emacs uses this | 791 overrides @samp{-*-coding:-*-} tags in the file itself. Emacs uses this |
736 feature for tar and archive files, to prevent Emacs from being confused | 792 feature for tar and archive files, to prevent Emacs from being confused |
737 by a @samp{-*-coding:-*-} tag in a member of the archive and thinking it | 793 by a @samp{-*-coding:-*-} tag in a member of the archive and thinking it |
738 applies to the archive file as a whole. | 794 applies to the archive file as a whole. |
739 | 795 |
796 If Emacs recognizes the encoding of a file incorrectly, you can | |
797 reread the file using the correct coding system by typing @kbd{C-x | |
798 @key{RET} c @var{coding-system} @key{RET} M-x revert-buffer | |
799 @key{RET}}. | |
800 | |
740 @vindex buffer-file-coding-system | 801 @vindex buffer-file-coding-system |
741 Once Emacs has chosen a coding system for a buffer, it stores that | 802 Once Emacs has chosen a coding system for a buffer, it stores that |
742 coding system in @code{buffer-file-coding-system} and uses that coding | 803 coding system in @code{buffer-file-coding-system} and uses that coding |
743 system, by default, for operations that write from this buffer into a | 804 system, by default, for operations that write from this buffer into a |
744 file. This includes the commands @code{save-buffer} and | 805 file. This includes the commands @code{save-buffer} and |