comparison etc/PROBLEMS @ 65518:a3cb8f9ce434

Fix the paragraph describing the limitation of UTF-8/16/7.
author Kenichi Handa <handa@m17n.org>
date Thu, 15 Sep 2005 02:54:42 +0000
parents 601c1d04dcb1
children 50f2dd53cf9a fa0da9b57058
65517:3d5ac74b885b 65518:a3cb8f9ce434
839 mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\ 839 mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\
840 mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1 840 mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1
841 841
842 ** The UTF-8/16/7 coding systems don't encode CJK (Far Eastern) characters. 842 ** The UTF-8/16/7 coding systems don't encode CJK (Far Eastern) characters.
843 843
844 Emacs by default only supports the parts of the Unicode BMP whose code 844 Emacs directly supports the Unicode BMP whose code points are in the
845 points are in the ranges 0000-33ff and e000-ffff. This excludes: most 845 ranges 0000-33ff and e000-ffff, and indirectly supports the parts of
846 of CJK, Yi and Hangul, as well as everything outside the BMP. 846 CJK characters belonging to these legacy charsets:
847
848 GB2312, Big5, JISX0208, JISX0212, JISX0213-1, JISX0213-2, KSC5601
849
850 The latter support is done in Utf-Translate-Cjk mode (turned on by
851 default). Which Unicode CJK characters are decoded into which Emacs
852 charset is decided by the current language environment. For instance,
853 in Chinese-GB, most of them are decoded into chinese-gb2312.
847 854
848 If you read UTF-8 data with code points outside these ranges, the 855 If you read UTF-8 data with code points outside these ranges, the
849 characters appear in the buffer as raw bytes of the original UTF-8 856 characters appear in the buffer as raw bytes of the original UTF-8
850 (composed into a single quasi-character) and they will be written back 857 (composed into a single quasi-character) and they will be written back
851 correctly as UTF-8, assuming you don't break the composed sequences. 858 correctly as UTF-8, assuming you don't break the composed sequences.
852 If you read such characters from UTF-16 or UTF-7 data, they are 859 If you read such characters from UTF-16 or UTF-7 data, they are
853 substituted with the Unicode `replacement character', and you lose 860 substituted with the Unicode `replacement character', and you lose
854 information. 861 information.
855
856 To edit such UTF data, turn on Utf-Translate-Cjk mode, which makes
857 many common CJK characters available for encoding and decoding and can
858 be extended by updating the tables it uses. This also allows you to
859 save as UTF buffers containing characters decoded by the chinese-,
860 japanese- and korean- coding systems, e.g. cut and pasted from
861 elsewhere.
862 862
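The range limits quoted above (0000-33ff and e000-ffff) can be checked with a short sketch. This is an illustration in Python, not Emacs code: the names DIRECT_RANGES, in_direct_ranges and utf8_bytes are hypothetical, and the check covers only the directly supported ranges. It does not model the indirect CJK support that Utf-Translate-Cjk mode provides through the legacy charsets.

```python
# Ranges copied from the PROBLEMS text; all names here are hypothetical
# and exist only for this illustration.
DIRECT_RANGES = ((0x0000, 0x33FF), (0xE000, 0xFFFF))


def in_direct_ranges(code_point: int) -> bool:
    """Return True if code_point lies in one of the quoted BMP ranges."""
    return any(lo <= code_point <= hi for lo, hi in DIRECT_RANGES)


def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 byte sequence for a single character."""
    return ch.encode("utf-8")


if __name__ == "__main__":
    # Latin, Hiragana, a CJK ideograph, a Hangul syllable, a non-BMP character.
    for ch in ("A", "\u3042", "\u4e2d", "\uac00", "\U00010000"):
        cp = ord(ch)
        status = "inside" if in_direct_ranges(cp) else "outside"
        print(f"U+{cp:04X} {status} the quoted ranges, UTF-8 bytes {utf8_bytes(ch).hex()}")
```

Characters reported as outside the quoted ranges are the ones the text says either go through the legacy CJK charsets or, failing that, are kept as raw UTF-8 bytes (UTF-8 input) or replaced (UTF-16 and UTF-7 input).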
863 ** Mule-UCS loads very slowly. 863 ** Mule-UCS loads very slowly.
864 864
865 Changes to Emacs internals interact badly with Mule-UCS's `un-define' 865 Changes to Emacs internals interact badly with Mule-UCS's `un-define'
866 library, which is the usual interface to Mule-UCS. Apply the 866 library, which is the usual interface to Mule-UCS. Apply the