# HG changeset patch
# User Kenichi Handa
# Date 1112315391 0
# Node ID 447c3b4db32f79cb22f2573b119fa4dffc59dc94
# Parent  fb431b536a04d3041457d3b9f3822725632961ea
(Coding System Basics): Describe roundtrip identity of coding systems.

diff -r fb431b536a04 -r 447c3b4db32f lispref/nonascii.texi
--- a/lispref/nonascii.texi	Thu Mar 31 23:17:51 2005 +0000
+++ b/lispref/nonascii.texi	Fri Apr 01 00:29:51 2005 +0000
@@ -628,6 +628,28 @@
 conversion, but some of them leave the choice unspecified---to be chosen
 heuristically for each file, based on the data.
 
+In general, a coding system doesn't guarantee roundtrip identity,
+i.e.@: decoding followed by encoding in the same coding system can
+result in a different byte sequence.  But there are several coding
+systems that do guarantee that the result will be the same as what you
+originally decoded.  They are:
+
+@quotation
+chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
+greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
+iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
+japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
+@end quotation
+
+Likewise, a coding system doesn't guarantee roundtrip identity in the
+other direction, i.e.@: encoding buffer text with a coding system and
+then decoding it again with the same coding system can produce
+different buffer text.  For instance, when you encode Latin-2 characters
+with @code{utf-8} and decode them back with the same coding system, you
+get Unicode characters (of charset @code{mule-unicode-0100-24ff}), and
+when you encode Unicode characters with @code{iso-latin-2} and decode
+them back with the same coding system, you get Latin-2 characters.
+
 @cindex end of line conversion
 @dfn{End of line conversion} handles three different conventions used
 on various systems for representing end of line in files.  The Unix