Mercurial > emacs
changeset 61185:447c3b4db32f
(Coding System Basics): Describe about rondtrip
identity of coding systems.
author | Kenichi Handa <handa@m17n.org> |
---|---|
date | Fri, 01 Apr 2005 00:29:51 +0000 |
parents | fb431b536a04 |
children | a734483076f2 |
files | lispref/nonascii.texi |
diffstat | 1 files changed, 22 insertions(+), 0 deletions(-) |
--- a/lispref/nonascii.texi Thu Mar 31 23:17:51 2005 +0000
+++ b/lispref/nonascii.texi Fri Apr 01 00:29:51 2005 +0000
@@ -628,6 +628,28 @@
 conversion, but some of them leave the choice unspecified---to be
 chosen heuristically for each file, based on the data.
 
+In general, a coding system doesn't guarantee a roundtrip identity,
+i.e. decoding followed by encoding in the same coding system can
+result in the different byte sequence. But there are several coding
+systems that go guarantee that the result will be the same as what you
+originally decoded. They are:
+
+@quotation
+chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
+greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
+iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
+japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
+@end quotation
+
+Likewise, a coding systme doesn't guarantee the other way of roundtrip
+identity, i.e. encoding buffer text into a coding system followed by
+decoding again with the same coding system will produce the different
+buffer text. For instance, when you encode Latin-2 characters by
+@code{utf-8} and decode it back by the same coding system, you'll get
+Unicode charactes (of charset @code{mule-unicode-0100-24ff}), and when
+you encode Unicode characters by @code{iso-latin-2} and decode it back
+by the same coding system, you'll get Latin-2 characters.
+
 @cindex end of line conversion
 @dfn{End of line conversion} handles three different conventions used
 on various systems for representing end of line in files. The Unix
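
The decode-then-encode roundtrip property described in the added text can be illustrated outside Emacs. The following is a minimal sketch, not part of this changeset: it uses Python's built-in codecs as stand-ins for Emacs coding systems, and the helper name `decode_encode_roundtrips` is hypothetical. It assumes only that `latin-1` maps every byte to exactly one character, and that `utf-8-sig` accepts input without a byte order mark but always writes one.

```python
def decode_encode_roundtrips(data: bytes, codec: str) -> bool:
    """Return True if decoding `data` and re-encoding it with the
    same codec reproduces the original byte sequence."""
    return data.decode(codec).encode(codec) == data


# latin-1 assigns one character to every byte value, so any byte
# sequence survives decode followed by encode unchanged.
all_bytes = bytes(range(256))
latin1_ok = decode_encode_roundtrips(all_bytes, "latin-1")

# utf-8-sig decodes input that lacks a byte order mark, but always
# emits one when encoding, so the re-encoded bytes differ.
bom_ok = decode_encode_roundtrips(b"abc", "utf-8-sig")

print(latin1_ok, bom_ok)
```

A codec of the second kind is still usable for reading and writing files; it simply cannot promise to reproduce the original bytes, which is the distinction the new manual text draws.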