--- a/doc/emacs/ChangeLog	Wed May 06 03:09:11 2009 +0000
+++ b/doc/emacs/ChangeLog	Wed May 06 03:55:12 2009 +0000
@@ -1,3 +1,19 @@
+2009-05-06  Chong Yidong  <cyd@stupidchicken.com>
+
+	* basic.texi (Inserting Text): Document ucs-insert.
+
+	* mule.texi (International Chars): Define "multibyte".  Note that
+	internal representation is unicode-based.  Simplify definition of raw
+	bytes.  Mention ucs-insert.
+	(Enabling Multibyte): Remove obsolete discussion.  Copyedits.
+	(Language Environments): Add language environments new to Emacs 23.
+	(Multibyte Conversion): Node deleted.
+	(Coding Systems): Remove obsolete unify-8859-on-decoding-mode.  Don't
+	mention obsolete emacs-mule coding system.
+	(Output Coding): Copyedits.
+
+	* emacs.texi (Top): Update node listing.
+
 2009-05-05  Per Starbäck  <per@starback.se>  (tiny change)

 	* trouble.texi (Lossage): Use new binding of view-emacs-problems.
--- a/doc/emacs/basic.texi	Wed May 06 03:09:11 2009 +0000
+++ b/doc/emacs/basic.texi	Wed May 06 03:55:12 2009 +0000
@@ -64,9 +64,11 @@
 For instance, @kbd{DEL} runs the command @code{delete-backward-char}
 by default (some modes bind it to a different command); it does not
 insert a literal @samp{DEL} character (@acronym{ASCII} character code
-127).  To insert a non-graphic character, first @dfn{quote} it by
-typing @kbd{C-q} (@code{quoted-insert}).  There are two ways to use
-@kbd{C-q}:
+127).
+
+  To insert a non-graphic character, or a character that your keyboard
+does not support, first @dfn{quote} it by typing @kbd{C-q}
+(@code{quoted-insert}).  There are two ways to use @kbd{C-q}:

 @itemize @bullet
 @item
@@ -87,32 +89,24 @@
 of overwriting with it.
 @end itemize

-@cindex 8-bit character codes
+@vindex read-quoted-char-radix
 @noindent
-If you specify a code in the octal range 0200 through 0377, @kbd{C-q}
-assumes that you intend to use some ISO 8859-@var{n} character set,
-and converts the specified code to the corresponding Emacs character
-code.  Your choice of language environment determines which of the ISO
-8859 character sets to use (@pxref{Language Environments}).  This
-feature is disabled if multibyte characters are disabled
-(@pxref{Enabling Multibyte}).
+To use decimal or hexadecimal instead of octal, set the variable
+@code{read-quoted-char-radix} to 10 or 16.  If the radix is greater
+than 10, some letters starting with @kbd{a} serve as part of a
+character code, just like digits.

-@vindex read-quoted-char-radix
-To use decimal or hexadecimal instead of octal, set the variable
-@code{read-quoted-char-radix} to 10 or 16.  If the radix is greater than
-10, some letters starting with @kbd{a} serve as part of a character
-code, just like digits.
-
-A numeric argument tells @kbd{C-q} how many copies of the quoted
+  A numeric argument tells @kbd{C-q} how many copies of the quoted
 character to insert (@pxref{Arguments}).

-@findex newline
-@findex self-insert
-  Customization information: @key{DEL} in most modes runs the command
-@code{delete-backward-char}; @key{RET} runs the command
-@code{newline}, and self-inserting printing characters run the command
-@code{self-insert}, which inserts whatever character you typed.  Some
-major modes rebind @key{DEL} to other commands.
+@findex ucs-insert
+@cindex Unicode
+  Instead of @kbd{C-q}, you can use @kbd{C-x 8 @key{RET}}
+(@code{ucs-insert}) to insert a character based on its Unicode name or
+code-point.  This command prompts for a character to insert, using
+the minibuffer; you can specify the character using either (i) the
+character's name in the Unicode standard, or (ii) the character's
+code-point in the Unicode standard.

 @node Moving Point
 @section Changing the Location of Point
--- a/doc/emacs/emacs.texi	Wed May 06 03:09:11 2009 +0000
+++ b/doc/emacs/emacs.texi	Wed May 06 03:55:12 2009 +0000
@@ -507,7 +507,6 @@
 * Language Environments::   Setting things up for the language you use.
 * Input Methods::           Entering text characters not on your keyboard.
 * Select Input Method::     Specifying your choice of input methods.
-* Multibyte Conversion::    How single-byte characters convert to multibyte.
 * Coding Systems::          Character set conversion when you read and
                               write files, and so on.
 * Recognize Coding::        How Emacs figures out which conversion to use.
--- a/doc/emacs/mule.texi	Wed May 06 03:09:11 2009 +0000
+++ b/doc/emacs/mule.texi	Wed May 06 03:55:12 2009 +0000
@@ -89,7 +89,6 @@
 * Language Environments::   Setting things up for the language you use.
 * Input Methods::           Entering text characters not on your keyboard.
 * Select Input Method::     Specifying your choice of input methods.
-* Multibyte Conversion::    How single-byte characters convert to multibyte.
 * Coding Systems::          Character set conversion when you read and
                               write files, and so on.
 * Recognize Coding::        How Emacs figures out which conversion to use.
@@ -115,14 +114,17 @@

   The users of international character sets and scripts have
 established many more-or-less standard coding systems for storing
-files.  Emacs internally uses a single multibyte character encoding,
-so that it can intermix characters from all these scripts in a single
-buffer or string.  This encoding represents each non-@acronym{ASCII}
-character as a sequence of bytes in the range 0200 through 0377.
-Emacs translates between the multibyte character encoding and various
-other coding systems when reading and writing files, when exchanging
-data with subprocesses, and (in some cases) in the @kbd{C-q} command
-(@pxref{Multibyte Conversion}).
+files.  These coding systems are typically @dfn{multibyte}, meaning
+that sequences of two or more bytes are used to represent individual
+non-@acronym{ASCII} characters.
+
+@cindex Unicode
+  Internally, Emacs uses its own multibyte character encoding, which
+is a superset of the @dfn{Unicode} standard.  This internal encoding
+allows characters from almost every known script to be intermixed in a
+single buffer or string.  Emacs translates between the multibyte
+character encoding and various other coding systems when reading and
+writing files, and when exchanging data with subprocesses.

 @kindex C-h h
 @findex view-hello-file
@@ -134,10 +136,14 @@
 displayed on your terminal, they appear as @samp{?} or as hollow boxes
 (@pxref{Undisplayable Characters}).

-  Keyboards, even in the countries where these character sets are used,
-generally don't have keys for all the characters in them.  So Emacs
-supports various @dfn{input methods}, typically one for each script or
-language, to make it convenient to type them.
+  Keyboards, even in the countries where these character sets are
+used, generally don't have keys for all the characters in them.  You
+can insert characters that your keyboard does not support using
+@kbd{C-q} (@code{quoted-insert}) or @kbd{C-x 8 @key{RET}}
+(@code{ucs-insert}).  @xref{Inserting Text}.  Emacs also supports
+various @dfn{input methods}, typically one for each script or
+language, which make it easier to type characters in the script.
+@xref{Input Methods}.

 @kindex C-x RET
   The prefix key @kbd{C-x @key{RET}} is used for commands that pertain
@@ -165,12 +171,12 @@
 (@pxref{Coding Systems}).  If the character's encoding is longer than
 one byte, Emacs shows @samp{file ...}.

-  However, if the character displayed is in the range 0200 through
-0377 octal, it may actually stand for an invalid UTF-8 byte read from
-a file.  In Emacs, that byte is represented as a sequence of 8-bit
-characters, but all of them together display as the original invalid
-byte, in octal code.  In this case, @kbd{C-x =} shows @samp{part of
-display ...} instead of @samp{file}.
+  As a special case, if the character lies in the range 128 (0200
+octal) through 159 (0237 octal), it stands for a ``raw'' byte that
+does not correspond to any specific displayable character.  Such a
+``character'' lies within the @code{eight-bit-control} character set,
+and is displayed as an escaped octal character code.  In this case,
+@kbd{C-x =} shows @samp{part of display ...} instead of @samp{file}.

 @cindex character set of character at point
 @cindex font of character at point
@@ -235,74 +241,62 @@
 @node Enabling Multibyte
 @section Enabling Multibyte Characters

-  By default, Emacs starts in multibyte mode, because that allows you to
-use all the supported languages and scripts without limitations.
+  By default, Emacs starts in multibyte mode: it stores the contents
+of buffers and strings using an internal encoding that represents
+non-@acronym{ASCII} characters using multibyte sequences.  Multibyte
+mode allows you to use all the supported languages and scripts without
+limitations.

 @cindex turn multibyte support on or off
-  You can enable or disable multibyte character support, either for
-Emacs as a whole, or for a single buffer.  When multibyte characters
-are disabled in a buffer, we call that @dfn{unibyte mode}.  Then each
-byte in that buffer represents a character, even codes 0200 through
-0377.
-
-  The old features for supporting the European character sets, ISO
-Latin-1 and ISO Latin-2, work in unibyte mode as they did in Emacs 19
-and also work for the other ISO 8859 character sets.  However, there
-is no need to turn off multibyte character support to use ISO Latin;
-the Emacs multibyte character set includes all the characters in these
-character sets, and Emacs can translate automatically to and from the
-ISO codes.
+  Under very special circumstances, you may want to disable multibyte
+character support, either for Emacs as a whole, or for a single
+buffer.  When multibyte characters are disabled in a buffer, we call
+that @dfn{unibyte mode}.  In unibyte mode, each character in the
+buffer has a character code ranging from 0 through 255 (0377 octal); 0
+through 127 (0177 octal) represent @acronym{ASCII} characters, and 128
+(0200 octal) through 255 (0377 octal) represent non-@acronym{ASCII}
+characters.

   To edit a particular file in unibyte representation, visit it using
-@code{find-file-literally}.  @xref{Visiting}.  To convert a buffer in
-multibyte representation into a single-byte representation of the same
-characters, the easiest way is to save the contents in a file, kill the
-buffer, and find the file again with @code{find-file-literally}.  You
-can also use @kbd{C-x @key{RET} c}
-(@code{universal-coding-system-argument}) and specify @samp{raw-text} as
-the coding system with which to find or save a file.  @xref{Text
-Coding}.  Finding a file as @samp{raw-text} doesn't disable format
-conversion, uncompression and auto mode selection as
-@code{find-file-literally} does.
+@code{find-file-literally}.  @xref{Visiting}.  You can convert a
+multibyte buffer to unibyte by saving it to a file, killing the
+buffer, and visiting the file again with @code{find-file-literally}.
+Alternatively, you can use @kbd{C-x @key{RET} c}
+(@code{universal-coding-system-argument}) and specify @samp{raw-text}
+as the coding system with which to visit or save a file.  @xref{Text
+Coding}.  Unlike @code{find-file-literally}, finding a file as
+@samp{raw-text} doesn't disable format conversion, uncompression, or
+auto mode selection.

 @vindex enable-multibyte-characters
 @vindex default-enable-multibyte-characters
+@cindex environment variables, and non-@acronym{ASCII} characters
   To turn off multibyte character support by default, start Emacs with
 the @samp{--unibyte} option (@pxref{Initial Options}), or set the
 environment variable @env{EMACS_UNIBYTE}.  You can also customize
 @code{enable-multibyte-characters} or, equivalently, directly set the
 variable @code{default-enable-multibyte-characters} to @code{nil} in
 your init file to have basically the same effect as @samp{--unibyte}.
-
-@findex toggle-enable-multibyte-characters
-  To convert a unibyte session to a multibyte session, set
-@code{default-enable-multibyte-characters} to @code{t}.  Buffers which
-were created in the unibyte session before you turn on multibyte support
-will stay unibyte.  You can turn on multibyte support in a specific
-buffer by invoking the command @code{toggle-enable-multibyte-characters}
-in that buffer.
+With @samp{--unibyte}, multibyte strings are not created during
+initialization from the values of environment variables,
+@file{/etc/passwd} entries etc., even if those contain
+non-@acronym{ASCII} characters.

 @cindex Lisp files, and multibyte operation
 @cindex multibyte operation, and Lisp files
 @cindex unibyte operation, and Lisp files
 @cindex init file, and non-@acronym{ASCII} characters
-@cindex environment variables, and non-@acronym{ASCII} characters
-  With @samp{--unibyte}, multibyte strings are not created during
-initialization from the values of environment variables,
-@file{/etc/passwd} entries etc.@: that contain non-@acronym{ASCII} 8-bit
-characters.
-
   Emacs normally loads Lisp files as multibyte, regardless of whether
-you used @samp{--unibyte}.  This includes the Emacs initialization file,
-@file{.emacs}, and the initialization files of Emacs packages such as
-Gnus.  However, you can specify unibyte loading for a particular Lisp
-file, by putting @w{@samp{-*-unibyte: t;-*-}} in a comment on the first
-line (@pxref{File Variables}).  Then that file is always loaded as
-unibyte text, even if you did not start Emacs with @samp{--unibyte}.
-The motivation for these conventions is that it is more reliable to
-always load any particular Lisp file in the same way.  However, you can
-load a Lisp file as unibyte, on any one occasion, by typing @kbd{C-x
-@key{RET} c raw-text @key{RET}} immediately before loading it.
+you used @samp{--unibyte}.  This includes the Emacs initialization
+file, @file{.emacs}, and the initialization files of Emacs packages
+such as Gnus.  However, you can specify unibyte loading for a
+particular Lisp file, by putting @w{@samp{-*-unibyte: t;-*-}} in a
+comment on the first line (@pxref{File Variables}).  Then that file is
+always loaded as unibyte text.  The motivation for these conventions
+is that it is more reliable to always load any particular Lisp file in
+the same way.  However, you can load a Lisp file as unibyte, on any
+one occasion, by typing @kbd{C-x @key{RET} c raw-text @key{RET}}
+immediately before loading it.

   The mode line indicates whether multibyte character support is
 enabled in the current buffer.  If it is, there are two or more
@@ -312,6 +306,14 @@
 are not enabled, nothing precedes the colon except a single dash.
 @xref{Mode Line}, for more details about this.

+@findex toggle-enable-multibyte-characters
+  To convert a unibyte session to a multibyte session, set
+@code{default-enable-multibyte-characters} to @code{t}.  Buffers which
+were created in the unibyte session before you turn on multibyte
+support will stay unibyte.  You can turn on multibyte support in a
+specific buffer by invoking the command
+@code{toggle-enable-multibyte-characters} in that buffer.
+
 @node Language Environments
 @section Language Environments
 @cindex language environments
@@ -319,43 +321,41 @@
   All supported character sets are supported in Emacs buffers whenever
 multibyte characters are enabled; there is no need to select a
 particular language in order to display its characters in an Emacs
-buffer.  However, it is important to select a @dfn{language environment}
-in order to set various defaults.  The language environment really
-represents a choice of preferred script (more or less) rather than a
-choice of language.
+buffer.  However, it is important to select a @dfn{language
+environment} in order to set various defaults.  Roughly speaking, the
+language environment represents a choice of preferred script rather
+than a choice of language.

   The language environment controls which coding systems to recognize
 when reading text (@pxref{Recognize Coding}).  This applies to files,
-incoming mail, netnews, and any other text you read into Emacs.  It may
-also specify the default coding system to use when you create a file.
-Each language environment also specifies a default input method.
+incoming mail, and any other text you read into Emacs.  It may also
+specify the default coding system to use when you create a file.  Each
+language environment also specifies a default input method.

 @findex set-language-environment
 @vindex current-language-environment
-  To select a language environment, you can customize the variable
+  To select a language environment, customize the variable
 @code{current-language-environment} or use the command @kbd{M-x
 set-language-environment}.  It makes no difference which buffer is
-current when you use this command, because the effects apply globally to
-the Emacs session.  The supported language environments include:
+current when you use this command, because the effects apply globally
+to the Emacs session.  The supported language environments include:

 @cindex Euro sign
 @cindex UTF-8
 @quotation
-ASCII, Belarusian, Brazilian Portuguese, Bulgarian, Chinese-BIG5,
-Chinese-CNS, Chinese-EUC-TW, Chinese-GB, Croatian, Cyrillic-ALT,
-Cyrillic-ISO, Cyrillic-KOI8, Czech, Devanagari, Dutch, English,
-Esperanto, Ethiopic, French, Georgian, German, Greek, Hebrew, IPA,
-Italian, Japanese, Kannada, Korean, Lao, Latin-1, Latin-2, Latin-3,
-Latin-4, Latin-5, Latin-6, Latin-7, Latin-8 (Celtic), Latin-9 (updated
-Latin-1 with the Euro sign), Latvian, Lithuanian, Malayalam, Polish,
-Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tajik, Tamil,
-Thai, Tibetan, Turkish, UTF-8 (for a setup which prefers Unicode
-characters and files encoded in UTF-8), Ukrainian, Vietnamese, Welsh,
-and Windows-1255 (for a setup which prefers Cyrillic characters and
-files encoded in Windows-1255).
-@tex
-\hbadness=10000\par  % just avoid underfull hbox warning
-@end tex
+ASCII, Belarusian, Bengali, Brazilian Portuguese, Bulgarian,
+Chinese-BIG5, Chinese-CNS, Chinese-EUC-TW, Chinese-GB, Chinese-GBK,
+Chinese-GB18030, Croatian, Cyrillic-ALT, Cyrillic-ISO, Cyrillic-KOI8,
+Czech, Devanagari, Dutch, English, Esperanto, Ethiopic, French,
+Georgian, German, Greek, Gujarati, Hebrew, IPA, Italian, Japanese,
+Kannada, Khmer, Korean, Lao, Latin-1, Latin-2, Latin-3, Latin-4,
+Latin-5, Latin-6, Latin-7, Latin-8 (Celtic), Latin-9 (updated Latin-1
+with the Euro sign), Latvian, Lithuanian, Malayalam, Oriya, Polish,
+Punjabi, Romanian, Russian, Sinhala, Slovak, Slovenian, Spanish,
+Swedish, TaiViet, Tajik, Tamil, Telugu, Thai, Tibetan, Turkish, UTF-8
+(for a setup which prefers Unicode characters and files encoded in
+UTF-8), Ukrainian, Vietnamese, Welsh, and Windows-1255 (for a setup
+which prefers Cyrillic characters and files encoded in Windows-1255).
 @end quotation

 @cindex fonts for various scripts
@@ -657,34 +657,6 @@
 list-input-methods}.  The list gives information about each input
 method, including the string that stands for it in the mode line.

-@node Multibyte Conversion
-@section Unibyte and Multibyte Non-@acronym{ASCII} characters
-
-  When multibyte characters are enabled, character codes 0240 (octal)
-through 0377 (octal) are not really legitimate in the buffer.  The valid
-non-@acronym{ASCII} printing characters have codes that start from 0400.
-
-  If you type a self-inserting character in the range 0240 through
-0377, or if you use @kbd{C-q} to insert one, Emacs assumes you
-intended to use one of the ISO Latin-@var{n} character sets, and
-converts it to the Emacs code representing that Latin-@var{n}
-character.  You select @emph{which} ISO Latin character set to use
-through your choice of language environment
-@iftex
-(see above).
-@end iftex
-@ifnottex
-(@pxref{Language Environments}).
-@end ifnottex
-If you do not specify a choice, the default is Latin-1.
-
-  If you insert a character in the range 0200 through 0237, which
-forms the @code{eight-bit-control} character set, it is inserted
-literally.  You should normally avoid doing this since buffers
-containing such characters have to be written out in either the
-@code{emacs-mule} or @code{raw-text} coding system, which is usually
-not what you want.
-
 @node Coding Systems
 @section Coding Systems
 @cindex coding systems
@@ -698,11 +670,11 @@
 terminal, and in exchanging data with subprocesses.

   Emacs assigns a name to each coding system.  Most coding systems are
-used for one language, and the name of the coding system starts with the
-language name.  Some coding systems are used for several languages;
-their names usually start with @samp{iso}.  There are also special
-coding systems @code{no-conversion}, @code{raw-text} and
-@code{emacs-mule} which do not convert printing characters at all.
+used for one language, and the name of the coding system starts with
+the language name.  Some coding systems are used for several
+languages; their names usually start with @samp{iso}.  There are also
+special coding systems, such as @code{no-conversion}, @code{raw-text},
+and @code{emacs-internal}.

 @cindex international files from DOS/Windows systems
   A special class of coding systems, collectively known as
@@ -814,37 +786,21 @@
 @code{no-conversion}, and also suppresses other Emacs features that
 might convert the file contents before you see them.  @xref{Visiting}.

-  The coding system @code{emacs-mule} means that the file contains
-non-@acronym{ASCII} characters stored with the internal Emacs encoding.  It
-handles end-of-line conversion based on the data encountered, and has
-the usual three variants to specify the kind of end-of-line conversion.
-
-@findex unify-8859-on-decoding-mode
-@anchor{Character Translation}
-  The @dfn{character translation} feature can modify the effect of
-various coding systems, by changing the internal Emacs codes that
-decoding produces.  For instance, the command
-@code{unify-8859-on-decoding-mode} enables a mode that ``unifies'' the
-Latin alphabets when decoding text.  This works by converting all
-non-@acronym{ASCII} Latin-@var{n} characters to either Latin-1 or
-Unicode characters.  This way it is easier to use various
-Latin-@var{n} alphabets together.  (In a future Emacs version we hope
-to move towards full Unicode support and complete unification of
-character sets.)
-
-@vindex enable-character-translation
-  If you set the variable @code{enable-character-translation} to
-@code{nil}, that disables all character translation (including
-@code{unify-8859-on-decoding-mode}).
+  The coding system @code{emacs-internal} (or @code{utf-8-emacs},
+which is equivalent) means that the file contains non-@acronym{ASCII}
+characters stored with the internal Emacs encoding.  This coding
+system handles end-of-line conversion based on the data encountered,
+and has the usual three variants to specify the kind of end-of-line
+conversion.

 @node Recognize Coding
 @section Recognizing Coding Systems

-  Emacs tries to recognize which coding system to use for a given text
-as an integral part of reading that text.  (This applies to files
-being read, output from subprocesses, text from X selections, etc.)
-Emacs can select the right coding system automatically most of the
-time---once you have specified your preferences.
+  Whenever Emacs reads a given piece of text, it tries to recognize
+which coding system to use.  This applies to files being read, output
+from subprocesses, text from X selections, etc.  Emacs can select the
+right coding system automatically most of the time---once you have
+specified your preferences.

   Some coding systems can be recognized or distinguished by which byte
 sequences appear in the data.  However, there are coding systems that
@@ -948,19 +904,17 @@
 @code{auto-coding-functions} detects the encoding for XML files.

 @vindex rmail-decode-mime-charset
+@vindex rmail-file-coding-system
   When you get new mail in Rmail, each message is translated
 automatically from the coding system it is written in, as if it were a
 separate file.  This uses the priority list of coding systems that you
 have specified.  If a MIME message specifies a character set, Rmail
 obeys that specification, unless @code{rmail-decode-mime-charset} is
-@code{nil}.
-
-@vindex rmail-file-coding-system
-  For reading and saving Rmail files themselves, Emacs uses the coding
-system specified by the variable @code{rmail-file-coding-system}.  The
-default value is @code{nil}, which means that Rmail files are not
-translated (they are read and written in the Emacs internal character
-code).
+@code{nil}.  For reading and saving Rmail files themselves, Emacs uses
+the coding system specified by the variable
+@code{rmail-file-coding-system}.  The default value is @code{nil},
+which means that Rmail files are not translated (they are read and
+written in the Emacs internal character code).

 @node Specify Coding
 @section Specifying a File's Coding System
@@ -984,13 +938,6 @@
 the coding explicitly in the file, that overrides
 @code{file-coding-system-alist}.

-  If you add the character @samp{!} at the end of the coding system
-name in @code{coding}, it disables any character translation
-(@pxref{Character Translation}) while decoding the file.  This is
-useful when you need to make sure that the character codes in the
-Emacs buffer will not vary due to changes in user settings; for
-instance, for the sake of strings in Emacs Lisp source files.
-
 @node Output Coding
 @section Choosing Coding Systems for Output

@@ -1004,22 +951,21 @@

   You can insert any character Emacs supports into any Emacs buffer,
 but most coding systems can only handle a subset of these characters.
-Therefore, you can insert characters that cannot be encoded with the
-coding system that will be used to save the buffer.  For example, you
-could start with an @acronym{ASCII} file and insert a few Latin-1
-characters into it, or you could edit a text file in Polish encoded in
-@code{iso-8859-2} and add some Russian words to it.  When you save
+Therefore, it's possible that the characters you insert cannot be
+encoded with the coding system that will be used to save the buffer.
+For example, you could visit a text file in Polish, encoded in
+@code{iso-8859-2}, and add some Russian words to it.  When you save
 that buffer, Emacs cannot use the current value of
 @code{buffer-file-coding-system}, because the characters you added
 cannot be encoded by that coding system.

   When that happens, Emacs tries the most-preferred coding system (set
 by @kbd{M-x prefer-coding-system} or @kbd{M-x
-set-language-environment}), and if that coding system can safely
-encode all of the characters in the buffer, Emacs uses it, and stores
-its value in @code{buffer-file-coding-system}.  Otherwise, Emacs
-displays a list of coding systems suitable for encoding the buffer's
-contents, and asks you to choose one of those coding systems.
+set-language-environment}).  If that coding system can safely encode
+all of the characters in the buffer, Emacs uses it, and stores its
+value in @code{buffer-file-coding-system}.  Otherwise, Emacs displays
+a list of coding systems suitable for encoding the buffer's contents,
+and asks you to choose one of those coding systems.

   If you insert the unsuitable characters in a mail message, Emacs
 behaves a bit differently.  It additionally checks whether the
@@ -1248,9 +1194,9 @@

   If @code{file-name-coding-system} is @code{nil}, Emacs uses a
 default coding system determined by the selected language environment.
-In the default language environment, any non-@acronym{ASCII}
-characters in file names are not encoded specially; they appear in the
-file system using the internal Emacs representation.
+In the default language environment, non-@acronym{ASCII} characters in
+file names are not encoded specially; they appear in the file system
+using the internal Emacs representation.

   @strong{Warning:} if you change @code{file-name-coding-system} (or the
 language environment) in the middle of an Emacs session, problems can
@@ -1317,7 +1263,7 @@
 @end lisp

 @noindent
-in your @file{~/.emacs} file.
+in your init file.

   There is a similarity between using a coding system translation for
 keyboard input, and using an input method: both define sequences of