comparison etc/PROBLEMS @ 65518:a3cb8f9ce434
Fix the paragraph describing the limitation of
UTF-8/16/7.
author | Kenichi Handa <handa@m17n.org> |
---|---|
date | Thu, 15 Sep 2005 02:54:42 +0000 |
parents | 601c1d04dcb1 |
children | 50f2dd53cf9a fa0da9b57058 |
65517:3d5ac74b885b | 65518:a3cb8f9ce434 |
---|---|
839 mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\ | 839 mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\ |
840 mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1 | 840 mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1 |
841 | 841 |
842 ** The UTF-8/16/7 coding systems don't encode CJK (Far Eastern) characters. | 842 ** The UTF-8/16/7 coding systems don't encode CJK (Far Eastern) characters. |
843 | 843 |
844 Emacs by default only supports the parts of the Unicode BMP whose code | 844 Emacs directly supports the Unicode BMP whose code points are in the |
845 points are in the ranges 0000-33ff and e000-ffff. This excludes: most | 845 ranges 0000-33ff and e000-ffff, and indirectly supports the parts of |
846 of CJK, Yi and Hangul, as well as everything outside the BMP. | 846 CJK characters belonging to these legacy charsets: |
| 847 |
| 848 GB2312, Big5, JISX0208, JISX0212, JISX0213-1, JISX0213-2, KSC5601 |
| 849 |
| 850 The latter support is done in Utf-Translate-Cjk mode (turned on by |
| 851 default). Which Unicode CJK characters are decoded into which Emacs |
| 852 charset is decided by the current language environment. For instance, |
| 853 in Chinese-GB, most of them are decoded into chinese-gb2312. |
847 | 854 |
848 If you read UTF-8 data with code points outside these ranges, the | 855 If you read UTF-8 data with code points outside these ranges, the |
849 characters appear in the buffer as raw bytes of the original UTF-8 | 856 characters appear in the buffer as raw bytes of the original UTF-8 |
850 (composed into a single quasi-character) and they will be written back | 857 (composed into a single quasi-character) and they will be written back |
851 correctly as UTF-8, assuming you don't break the composed sequences. | 858 correctly as UTF-8, assuming you don't break the composed sequences. |
852 If you read such characters from UTF-16 or UTF-7 data, they are | 859 If you read such characters from UTF-16 or UTF-7 data, they are |
853 substituted with the Unicode `replacement character', and you lose | 860 substituted with the Unicode `replacement character', and you lose |
854 information. | 861 information. |
855 | |
856 To edit such UTF data, turn on Utf-Translate-Cjk mode, which makes | |
857 many common CJK characters available for encoding and decoding and can | |
858 be extended by updating the tables it uses. This also allows you to | |
859 save as UTF buffers containing characters decoded by the chinese-, | |
860 japanese- and korean- coding systems, e.g. cut and pasted from | |
861 elsewhere. | |
862 | 862 |
863 ** Mule-UCS loads very slowly. | 863 ** Mule-UCS loads very slowly. |
864 | 864 |
865 Changes to Emacs internals interact badly with Mule-UCS's `un-define' | 865 Changes to Emacs internals interact badly with Mule-UCS's `un-define' |
866 library, which is the usual interface to Mule-UCS. Apply the | 866 library, which is the usual interface to Mule-UCS. Apply the |