comparison etc/PROBLEMS @ 90228:fa0da9b57058

Revision: miles@gnu.org--gnu-2005/emacs--unicode--0--patch-82
Merge from emacs--cvs-trunk--0

Patches applied:

 * emacs--cvs-trunk--0 (patch 542-553)
   - Update from CVS
   - Merge from gnus--rel--5.10

 * gnus--rel--5.10 (patch 116-121)
   - Merge from emacs--cvs-trunk--0
   - Update from CVS
author Miles Bader <miles@gnu.org>
date Mon, 19 Sep 2005 10:20:33 +0000
parents 10fe5fadaf89 a3cb8f9ce434
children 7beb78bc1f8e
90227:10fe5fadaf89 90228:fa0da9b57058
 843  843   mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\
 844  844   mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1
 845  845
 846  846   ** The UTF-8/16/7 coding systems don't encode CJK (Far Eastern) characters.
 847  847
 848      - Emacs by default only supports the parts of the Unicode BMP whose code
 849      - points are in the ranges 0000-33ff and e000-ffff. This excludes: most
 850      - of CJK, Yi and Hangul, as well as everything outside the BMP.
      848 + Emacs directly supports the Unicode BMP whose code points are in the
      849 + ranges 0000-33ff and e000-ffff, and indirectly supports the parts of
      850 + CJK characters belonging to these legacy charsets:
      851 +
      852 + GB2312, Big5, JISX0208, JISX0212, JISX0213-1, JISX0213-2, KSC5601
      853 +
      854 + The latter support is done in Utf-Translate-Cjk mode (turned on by
      855 + default). Which Unicode CJK characters are decoded into which Emacs
      856 + charset is decided by the current language environment. For instance,
      857 + in Chinese-GB, most of them are decoded into chinese-gb2312.
 851  858
 852  859   If you read UTF-8 data with code points outside these ranges, the
 853  860   characters appear in the buffer as raw bytes of the original UTF-8
 854  861   (composed into a single quasi-character) and they will be written back
 855  862   correctly as UTF-8, assuming you don't break the composed sequences.
 856  863   If you read such characters from UTF-16 or UTF-7 data, they are
 857  864   substituted with the Unicode `replacement character', and you lose
 858  865   information.
 859      -
 860      - To edit such UTF data, turn on Utf-Translate-Cjk mode, which makes
 861      - many common CJK characters available for encoding and decoding and can
 862      - be extended by updating the tables it uses. This also allows you to
 863      - save as UTF buffers containing characters decoded by the chinese-,
 864      - japanese- and korean- coding systems, e.g. cut and pasted from
 865      - elsewhere.
 866  866
 867  867   ** Mule-UCS loads very slowly.
 868  868
 869  869   Changes to Emacs internals interact badly with Mule-UCS's `un-define'
 870  870   library, which is the usual interface to Mule-UCS. Apply the