comparison man/mule.texi @ 37584:9a7fd51a92b3

(International): Add an overview of Mule features, with pointers to detailed description. (Enabling Multibyte): Describe how to switch a unibyte session to multibyte. Mention that by default, all sessions are multibyte. (Coding Systems): Make it clear that cpNNN are coding systems, and should be used as such. (Recognize Coding): Explain that Emacs decodes text as part of reading it. Mention revert-buffer as a means to redecode a file.
author Eli Zaretskii <eliz@gnu.org>
date Sun, 06 May 2001 11:27:54 +0000
parents 07200bf360ab
children 5a2458f097b0
37583:313d4c5de5ca 37584:9a7fd51a92b3
42 ``MULti-lingual Enhancement to GNU Emacs'') 42 ``MULti-lingual Enhancement to GNU Emacs'')
43 43
44 Emacs also supports various encodings of these characters used by 44 Emacs also supports various encodings of these characters used by
45 other internationalized software, such as word processors and mailers. 45 other internationalized software, such as word processors and mailers.
46 46
47 Emacs allows editing text with international characters by supporting
48 all the related activities:
49
50 @itemize @bullet
51 @item
52 You can visit files with non-ASCII characters, save non-ASCII text, and
53 pass non-ASCII text between Emacs and programs it invokes (such as
54 compilers, spell-checkers, and mailers). Setting your language
55 environment (@pxref{Language Environments}) takes care of setting up the
56 coding systems and other options for a specific language or culture.
57 Alternatively, you can specify how Emacs should encode or decode text
58 for each command; see @ref{Specify Coding}.
59
60 @item
61 You can display non-ASCII characters encoded by the various scripts.
62 This works by using appropriate fonts on X and similar graphics
63 displays (@pxref{Defining Fontsets}), and by sending special codes to
64 text-only displays (@pxref{Specify Coding}). If some characters are
65 displayed incorrectly, refer to @ref{Undisplayable Characters}, which
66 describes possible problems and explains how to solve them.
67
68 @item
69 You can insert non-ASCII characters or search for them. To do that,
70 you can specify an input method (@pxref{Select Input Method}) suitable
71 for your language, or use the default input method set up when you set
72 your language environment. (Emacs input methods are part of the Leim
73 package, which must be installed for you to be able to use them.) If
74 your keyboard can produce non-ASCII characters, you can select an
75 appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs
76 will accept those characters. Latin-1 characters can also be input by
77 using the @kbd{C-x 8} prefix, see @ref{Single-Byte Character Support,
78 C-x 8}.
79 @end itemize
80
81 The rest of this chapter describes these issues in detail.
82
47 @menu 83 @menu
48 * International Intro:: Basic concepts of multibyte characters. 84 * International Intro:: Basic concepts of multibyte characters.
49 * Enabling Multibyte:: Controlling whether to use multibyte characters. 85 * Enabling Multibyte:: Controlling whether to use multibyte characters.
50 * Language Environments:: Setting things up for the language you use. 86 * Language Environments:: Setting things up for the language you use.
51 * Input Methods:: Entering text characters not on your keyboard. 87 * Input Methods:: Entering text characters not on your keyboard.
119 @end ignore 155 @end ignore
120 156
121 @node Enabling Multibyte 157 @node Enabling Multibyte
122 @section Enabling Multibyte Characters 158 @section Enabling Multibyte Characters
123 159
160 @cindex turn multibyte support on or off
124 You can enable or disable multibyte character support, either for 161 You can enable or disable multibyte character support, either for
125 Emacs as a whole, or for a single buffer. When multibyte characters are 162 Emacs as a whole, or for a single buffer. When multibyte characters are
126 disabled in a buffer, then each byte in that buffer represents a 163 disabled in a buffer, then each byte in that buffer represents a
127 character, even codes 0200 through 0377. The old features for 164 character, even codes 0200 through 0377. The old features for
128 supporting the European character sets, ISO Latin-1 and ISO Latin-2, 165 supporting the European character sets, ISO Latin-1 and ISO Latin-2,
131 168
132 However, there is no need to turn off multibyte character support to 169 However, there is no need to turn off multibyte character support to
133 use ISO Latin; the Emacs multibyte character set includes all the 170 use ISO Latin; the Emacs multibyte character set includes all the
134 characters in these character sets, and Emacs can translate 171 characters in these character sets, and Emacs can translate
135 automatically to and from the ISO codes. 172 automatically to and from the ISO codes.
173
174 By default, Emacs starts in multibyte mode, because that allows you to
175 use all the supported languages and scripts without limitations.
136 176
137 To edit a particular file in unibyte representation, visit it using 177 To edit a particular file in unibyte representation, visit it using
138 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in 178 @code{find-file-literally}. @xref{Visiting}. To convert a buffer in
139 multibyte representation into a single-byte representation of the same 179 multibyte representation into a single-byte representation of the same
140 characters, the easiest way is to save the contents in a file, kill the 180 characters, the easiest way is to save the contents in a file, kill the
150 @vindex default-enable-multibyte-characters 190 @vindex default-enable-multibyte-characters
151 To turn off multibyte character support by default, start Emacs with 191 To turn off multibyte character support by default, start Emacs with
152 the @samp{--unibyte} option (@pxref{Initial Options}), or set the 192 the @samp{--unibyte} option (@pxref{Initial Options}), or set the
153 environment variable @env{EMACS_UNIBYTE}. You can also customize 193 environment variable @env{EMACS_UNIBYTE}. You can also customize
154 @code{enable-multibyte-characters} or, equivalently, directly set the 194 @code{enable-multibyte-characters} or, equivalently, directly set the
155 variable @code{default-enable-multibyte-characters} in your init file to 195 variable @code{default-enable-multibyte-characters} to @code{nil} in
156 have basically the same effect as @samp{--unibyte}. 196 your init file to have basically the same effect as @samp{--unibyte}.
197
198 @findex toggle-enable-multibyte-characters
199 To convert a unibyte session to a multibyte session, set
200 @code{default-enable-multibyte-characters} to @code{t}. Buffers which
201 were created in the unibyte session before you turn on multibyte support
202 will stay unibyte. You can turn on multibyte support in a specific
203 buffer by invoking the command @code{toggle-enable-multibyte-characters}
204 in that buffer.
157 205
158 @cindex Lisp files, and multibyte operation 206 @cindex Lisp files, and multibyte operation
159 @cindex multibyte operation, and Lisp files 207 @cindex multibyte operation, and Lisp files
160 @cindex unibyte operation, and Lisp files 208 @cindex unibyte operation, and Lisp files
161 @cindex init file, and non-ASCII characters 209 @cindex init file, and non-ASCII characters
525 language name. Some coding systems are used for several languages; 573 language name. Some coding systems are used for several languages;
526 their names usually start with @samp{iso}. There are also special 574 their names usually start with @samp{iso}. There are also special
527 coding systems @code{no-conversion}, @code{raw-text} and 575 coding systems @code{no-conversion}, @code{raw-text} and
528 @code{emacs-mule} which do not convert printing characters at all. 576 @code{emacs-mule} which do not convert printing characters at all.
529 577
578 @cindex international files from DOS/Windows systems
530 A special class of coding systems, collectively known as 579 A special class of coding systems, collectively known as
531 @dfn{codepages}, is designed to support text encoded by MS-Windows and 580 @dfn{codepages}, is designed to support text encoded by MS-Windows and
532 MS-DOS software. To use any of these systems, you need to create it 581 MS-DOS software. To use any of these systems, you need to create it
533 with @kbd{M-x codepage-setup}. @xref{MS-DOS and MULE}. 582 with @kbd{M-x codepage-setup}. @xref{MS-DOS and MULE}. After
583 creating the coding system for the codepage, you can use it as any
584 other coding system. For example, to visit a file encoded in codepage
585 850, type @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename}
586 @key{RET}}.
534 587
535 In addition to converting various representations of non-ASCII 588 In addition to converting various representations of non-ASCII
536 characters, a coding system can perform end-of-line conversion. Emacs 589 characters, a coding system can perform end-of-line conversion. Emacs
537 handles three different conventions for how to separate lines in a file: 590 handles three different conventions for how to separate lines in a file:
538 newline, carriage-return linefeed, and just carriage-return. 591 newline, carriage-return linefeed, and just carriage-return.
628 the usual three variants to specify the kind of end-of-line conversion. 681 the usual three variants to specify the kind of end-of-line conversion.
629 682
630 @node Recognize Coding 683 @node Recognize Coding
631 @section Recognizing Coding Systems 684 @section Recognizing Coding Systems
632 685
633 Most of the time, Emacs can recognize which coding system to use for 686 Emacs tries to recognize which coding system to use for a given text
634 any given file---once you have specified your preferences. 687 as an integral part of reading that text. (This applies to files
688 being read, output from subprocesses, text from X selections, etc.)
689 Emacs can select the right coding system automatically most of the
690 time---once you have specified your preferences.
635 691
636 Some coding systems can be recognized or distinguished by which byte 692 Some coding systems can be recognized or distinguished by which byte
637 sequences appear in the data. However, there are coding systems that 693 sequences appear in the data. However, there are coding systems that
638 cannot be distinguished, not even potentially. For example, there is no 694 cannot be distinguished, not even potentially. For example, there is no
639 way to distinguish between Latin-1 and Latin-2; they use the same byte 695 way to distinguish between Latin-1 and Latin-2; they use the same byte
735 overrides @samp{-*-coding:-*-} tags in the file itself. Emacs uses this 791 overrides @samp{-*-coding:-*-} tags in the file itself. Emacs uses this
736 feature for tar and archive files, to prevent Emacs from being confused 792 feature for tar and archive files, to prevent Emacs from being confused
737 by a @samp{-*-coding:-*-} tag in a member of the archive and thinking it 793 by a @samp{-*-coding:-*-} tag in a member of the archive and thinking it
738 applies to the archive file as a whole. 794 applies to the archive file as a whole.
739 795
796 If Emacs recognizes the encoding of a file incorrectly, you can
797 reread the file using the correct coding system by typing @kbd{C-x
798 @key{RET} c @var{coding-system} @key{RET} M-x revert-buffer
799 @key{RET}}.
800
740 @vindex buffer-file-coding-system 801 @vindex buffer-file-coding-system
741 Once Emacs has chosen a coding system for a buffer, it stores that 802 Once Emacs has chosen a coding system for a buffer, it stores that
742 coding system in @code{buffer-file-coding-system} and uses that coding 803 coding system in @code{buffer-file-coding-system} and uses that coding
743 system, by default, for operations that write from this buffer into a 804 system, by default, for operations that write from this buffer into a
744 file. This includes the commands @code{save-buffer} and 805 file. This includes the commands @code{save-buffer} and