annotate admin/notes/unicode @ 101976:3162f3263757
Author: Jason Rumney <jasonr@gnu.org>
Date: Thu, 12 Feb 2009 14:06:24 +0000
Parents: ce88a631c161
Children: 1d1d5d9bd884

-*-mode: text; coding: latin-1;-*-

Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009
  Free Software Foundation, Inc.
See the end of the file for license conditions.

Problems, fixmes and other unicode-related issues
-------------------------------------------------------------

Notes by fx to record various things of variable importance. handa
needs to check them -- don't take too seriously, especially with
regard to completeness.

 * SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has
   undesirable effects. E.g.:
   (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil
   (multibyte-string-p (concat [?£])) => nil
   (text-char-description ?£) => "M-#"

        These examples are all fixed by the change of 2002-10-14, but
        questionable uses of SINGLE_BYTE_CHAR_P still exist in the
        code (keymap.c and print.c).
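
As an illustration of why "fits in one byte" is a poor proxy for "is a
unibyte character", here is a small sketch. It is Python, not Emacs code,
chosen only because its str/bytes split makes the distinction visible; the
helper name looks_single_byte is hypothetical and merely mimics the kind of
test the note complains about.

```python
def looks_single_byte(ch: str) -> bool:
    """Naive test: does the code point fit in 8 bits? (Hypothetical helper.)"""
    return ord(ch) < 0x100


pound = "\u00a3"  # POUND SIGN, a Latin-1 character

# The naive test classifies the character as "single byte" ...
naive = looks_single_byte(pound)

# ... but how many bytes it needs depends entirely on the encoding.
latin1_len = len(pound.encode("latin-1"))
utf8_len = len(pound.encode("utf-8"))

print(naive, latin1_len, utf8_len)
```

The code point value alone says nothing about the representation, which is
why a test based on it can give surprising answers for Latin-1 characters.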

 * Rationalize character syntax and its relationship to the Unicode
   database. (Applies mainly to symbol and punctuation syntax.)

 * Fontset handling and customization need work. We want to relate
   fonts to scripts, probably based on the Unicode blocks. The
   presence of small-repertoire 10646-encoded fonts in XFree 4 is a
   pain, not currently worked round.

        With the change on 2002-07-26, multiple fonts can be
        specified in a fontset for a specific range of characters.
        Each range can also be specified by script. Before using
        ISO10646 fonts, Emacs checks their repertoires to avoid
        fonts that don't have a glyph for a specific character.

        fx has worked on fontset customization, but was stymied by
        basic problems with the way the default face is dealt with
        (and something else, I think). This needs revisiting.
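
The repertoire check described above can be pictured roughly as follows.
This is a Python sketch under stated assumptions, not the Emacs
implementation: the font names, the repertoire sets and the function
pick_font are all made up for illustration.

```python
from typing import Optional, Sequence, Set, Tuple

# Hypothetical data model: (font name, set of code points it has glyphs for).
Font = Tuple[str, Set[int]]


def pick_font(ch: str, candidates: Sequence[Font]) -> Optional[str]:
    """Return the first candidate whose repertoire contains CH, else None."""
    cp = ord(ch)
    for name, repertoire in candidates:
        if cp in repertoire:
            return name
    return None


# Two made-up 10646-encoded fonts: one small-repertoire, one broader.
small_font: Font = ("small-10646", set(range(0x20, 0x7F)))
wide_font: Font = (
    "wide-10646",
    set(range(0x20, 0x7F)) | set(range(0x0900, 0x0980)),  # adds Devanagari
)
candidates = [small_font, wide_font]

ascii_choice = pick_font("A", candidates)
devanagari_choice = pick_font("\u0915", candidates)  # DEVANAGARI LETTER KA
han_choice = pick_font("\u4e00", candidates)         # covered by neither font
```

Listing several fonts per range only helps if the lookup skips fonts that
lack the glyph, which is the point of checking repertoires first.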

 * Work is also needed on charset and coding system priorities.

 * The relevant bits of latin1-disp.el need porting (and probably
   re-naming/updating). See also cyril-util.el.

 * Quail files need more work now that the encoding is largely irrelevant.

 * What to do with the old coding categories stuff?

 * The preferred-coding-system property of charsets should probably be
   junked unless it can be made more useful now.

 * find-multibyte-characters needs looking at.

 * Implement Korean cp949/UHC, BIG5-HKSCS and any other important missing
   charsets.

 * Lazy-load tables for unify-charset somehow?

        Actually, Emacs clears out all charset maps and unify-map just
        before dumping, and they are loaded again on demand by the
        dumped emacs. But those maps (char tables) generated while
        temacs is running can't be removed from the dumped emacs.
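
The "clear before dumping, reload on demand" behaviour can be sketched
generically like this. It is a Python illustration only: the class LazyMap,
its methods and the toy loader are assumptions and do not correspond to
actual Emacs data structures.

```python
from typing import Callable, Dict, Optional


class LazyMap:
    """A code -> character map that is loaded only on first lookup."""

    def __init__(self, loader: Callable[[], Dict[int, int]]) -> None:
        self._loader = loader
        self._table: Optional[Dict[int, int]] = None
        self.load_count = 0  # bookkeeping for the demonstration below

    def lookup(self, code: int) -> Optional[int]:
        if self._table is None:          # load on demand
            self._table = self._loader()
            self.load_count += 1
        return self._table.get(code)

    def clear(self) -> None:
        """Drop the table, e.g. just before an image is dumped."""
        self._table = None


def toy_loader() -> Dict[int, int]:
    # Stand-in for reading a map file; maps one code to U+00A3.
    return {0xA3: 0x00A3}


m = LazyMap(toy_loader)
first = m.lookup(0xA3)    # triggers the first load
second = m.lookup(0xA3)   # served from the loaded table
m.clear()
third = m.lookup(0xA3)    # triggers a reload
```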

 * Translation tables for {en,de}code currently aren't supported.

        This should be fixed by the changes of 2002-10-14.

 * Defining CCL coding systems currently doesn't work.

        This should be fixed by the changes of 2003-01-30.

 * iso-2022 charsets get unified on i/o.

        With the change on 2003-01-06, decoding routines put a `charset'
        property on decoded text, and the iso-2022 encoder pays attention
        to it. Thus, for instance, reading and writing with
        iso-2022-7bit preserves the original designation sequences.
        Might the property name `preferred-charset' be better?

        We may have to utilize this property to choose a font.
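
To show what "unified on i/o" means in practice, the following sketch uses
Python's iso2022_jp codec as a stand-in (it is not the Emacs coding system,
and it has no notion of text properties). The input designates JIS X
0208-1978 with ESC $ @; after a decode/encode round trip the codec emits
ESC $ B instead, so the original designation is lost unless it is recorded
separately, which is the role the `charset' property plays in the note
above. The variable remembered is only an illustration of that idea.

```python
ESC = b"\x1b"

# Two kanji, designated as JIS X 0208-1978 (ESC $ @), then back to ASCII.
original = ESC + b"$@" + b"F|K\\" + ESC + b"(B"

# Decoding yields plain Unicode text; the source charset is not kept.
text = original.decode("iso2022_jp")

# Re-encoding picks the codec's own designation (ESC $ B, JIS X 0208-1983).
reencoded = text.encode("iso2022_jp")

# Sketch of the idea behind a `charset' property: remember the designation
# seen on input so an encoder could reuse it on output.
remembered = original[:3]

print(text, original, reencoded, remembered)
```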

 * Revisit locale processing: look at treating the language and
   charset parts separately. (Language should affect things like
   spelling and calendar, but that's not a Unicode issue.)
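
Treating the language and charset parts separately could start from a split
like the one below. This is a Python sketch assuming POSIX-style locale
names of the form language[_territory][.codeset][@modifier]; the names
LocaleParts and split_locale are hypothetical.

```python
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


def split_locale(name: str) -> LocaleParts:
    """Split language[_territory][.codeset][@modifier] into its parts."""
    rest, _, modifier = name.partition("@")
    rest, _, codeset = rest.partition(".")
    language, _, territory = rest.partition("_")
    return LocaleParts(
        language,
        territory or None,
        codeset or None,
        modifier or None,
    )


ja = split_locale("ja_JP.eucJP")
de = split_locale("de_DE@euro")
c = split_locale("C")
```

With the parts separated, the codeset can drive coding-system choice while
the language drives things like spelling and calendar.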

 * Handle Unicode combining characters usefully, e.g. diacritics, and
   handle more scripts specifically (à la Devanagari). There are
   issues with canonicalization.

 * Bidi is a separate issue with no support currently.

 * We need tabular input methods, e.g. for maths symbols. (Not
   specific to Unicode.)

 * Need multibyte text in menus, e.g. for the above. (Not specific to
   Unicode -- see Emacs etc/TODO, but now mostly works with gtk.)

 * There's currently no support for Unicode normalization.
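
For reference, this is what normalization and the canonicalization issue
with combining characters look like, shown with Python's standard
unicodedata module (an illustration of the Unicode behaviour, not of
anything Emacs provides).

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings render alike but differ as code point sequences.
same_as_is = composed == decomposed

# Normalizing both to one form makes them comparable.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

# A non-zero canonical combining class marks a combining character.
acute_class = unicodedata.combining("\u0301")
base_class = unicodedata.combining("e")
```

Without normalization, searching or comparing text can miss matches between
the composed and decomposed spellings of the same character.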

 * Populate char-width-table correctly for Unicode characters and
   worry about what happens when double-width charsets covering
   non-CJK characters are unified.
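
One possible starting point for width data is the East Asian Width property
from the Unicode database. The sketch below is Python using the standard
unicodedata module; the function display_width and its ambiguous_wide flag
are assumptions for illustration, and it deliberately ignores zero-width
combining characters.

```python
import unicodedata


def display_width(ch: str, ambiguous_wide: bool = False) -> int:
    """Column width of CH from its East Asian Width class (sketch only)."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):             # Wide or Fullwidth
        return 2
    if eaw == "A" and ambiguous_wide:  # Ambiguous: depends on context
        return 2
    return 1


latin = display_width("A")
han = display_width("\u4e00")                  # CJK ideograph
alpha_narrow = display_width("\u03b1")         # GREEK SMALL LETTER ALPHA
alpha_wide = display_width("\u03b1", ambiguous_wide=True)
```

Greek alpha is the kind of case the note worries about: it also appears in
double-width CJK charsets, so once charsets are unified its width has to be
decided by policy and not by which charset it came from.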

 * Emacs 20/21 .elc files are currently not loadable. It may or may
   not be possible to do this properly.

        With the change on 2002-07-24, elc files generated by Emacs
        20.3 and later are correctly loaded (including compressed
        files and those containing multibyte characters). But elc
        files generated by 20.2 and earlier are still not loadable.
        Is it really worth working on it?

 * Rmail won't work with non-ASCII text. Encoding issues for Babyl
   files need sorting out, but rms says Babyl will go before this is
   released.

 * Gnus still needs some attention, and we need to get changes
   accepted by Gnus maintainers...

 * There are type errors lurking, e.g. in
   Fcheck_coding_systems_region. Define ENABLE_CHECKING to find them.

 * You can grep the code for lots of fixmes.

 * Old auto-save files, and similar files, such as Gnus drafts,
   containing non-ASCII characters probably won't be re-read correctly.


This file is part of GNU Emacs.

GNU Emacs is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>.