annotate README.unicode @ 91952:c63fbebcd19c
*** empty log message ***
author | Kenichi Handa <handa@m17n.org> |
---|---|
date | Tue, 19 Feb 2008 07:42:51 +0000 |
parents | d9c3dce41f29 |
children |
rev | line source |
---|---|
89496 | 1 -*-mode: text; coding: latin-1;-*- |
2 | |
91564
9ee03576e1b0
Remove out-of-date comments that assume this is on a branch.
Glenn Morris <rgm@gnu.org>
parents:
90927
diff
changeset
|
3 Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 |
4 Free Software Foundation, Inc. |
5 See the end of the file for license conditions. |
6 |
91565 | 7 Problems, fixmes and other unicode-related issues |
89496 | 8 ------------------------------------------------------------- |
9 | |
10 Notes by fx to record various things of variable importance. handa | |
11 needs to check them -- don't take them too seriously, especially with | |
12 regard to completeness. | |
13 | |
14 * SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has | |
15 undesirable effects. E.g.: | |
16 (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil | |
17 (multibyte-string-p (concat [?£])) => nil | |
18 (text-char-description ?£) => "M-#" | |
19 | |
20 These examples are all fixed by the change of 2002-10-14, but | |
91827 | 21 questionable uses of SINGLE_BYTE_CHAR_P remain in the |
89837 | 22 code (keymap.c and print.c). |
89496 | 23 |
24 * Rationalize character syntax and its relationship to the Unicode | |
25 database. (Applies mainly to symbol and punctuation syntax.) | |
26 | |
27 * Fontset handling and customization need work. We want to relate | |
28 fonts to scripts, probably based on the Unicode blocks. The | |
29 presence of small-repertoire 10646-encoded fonts in XFree 4 is a | |
30 pain, not currently worked round. | |
31 | |
32 With the change on 2002-07-26, multiple fonts can be | |
33 specified in a fontset for a specific range of characters. | |
34 Each range can also be specified by script. Before using | |
35 ISO10646 fonts, Emacs checks their repertoires to avoid | |
36 fonts that lack a glyph for a specific character. | |
37 | |
89525 | 38 fx has worked on fontset customization, but was stymied by |
39 basic problems with the way the default face is dealt with | |
40 (and something else, I think). This needs revisiting. | |
41 | |
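A minimal sketch of what the 2002-07-26 change allows, written as an
init-file fragment. It assumes the five-argument form of
set-fontset-font (NAME TARGET FONT-SPEC FRAME ADD); the family name
"arial" is only a placeholder for illustration, not a font this file
recommends.

```elisp
;; Sketch only: "arial" is a hypothetical fallback family.
(when (fboundp 'set-fontset-font)
  ;; Primary font for the `latin' script in the default fontset.
  (set-fontset-font "fontset-default" 'latin
                    '("tahoma" . "unicode-bmp"))
  ;; A second font for the same script, appended so that it is
  ;; consulted after the first one.
  (set-fontset-font "fontset-default" 'latin
                    '("arial" . "unicode-bmp") nil 'append)
  ;; A target can also be an explicit range of characters, given
  ;; as a cons of the first and last code points.
  (set-fontset-font "fontset-default" '(#x0400 . #x04FF)
                    '("arial" . "unicode-bmp") nil 'append))
```

Whether a given font is actually used still depends on its
repertoire, as described above.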
89496 | 42 * Work is also needed on charset and coding system priorities. |
43 | |
44 * The relevant bits of latin1-disp.el need porting (and probably | |
45 re-naming/updating). See also cyril-util.el. | |
46 | |
89525 | 47 * Quail files need more work now the encoding is largely irrelevant. |
89496 | 48 |
49 * What to do with the old coding categories stuff? | |
50 | |
51 * The preferred-coding-system property of charsets should probably be | |
52 junked unless it can be made more useful now. | |
53 | |
54 * find-multibyte-characters needs looking at. | |
55 | |
56 * Implement Korean cp949/UHC, BIG5-HKSCS and any other important missing | |
57 charsets. | |
58 | |
59 * Lazy-load tables for unify-charset somehow? | |
60 | |
91827 | 61 Actually, Emacs clears out all charset maps and unify-map just |
62 before dumping, and they are loaded again on demand by the | |
89496 | 63 dumped emacs. But, those maps (char tables) generated while |
91827 | 64 temacs is running can't be removed from the dumped emacs. |
89496 | 65 |
66 * Translation tables for {en,de}code currently aren't supported. | |
67 | |
68 This should be fixed by the changes of 2002-10-14. | |
69 | |
70 * Defining CCL coding systems currently doesn't work. | |
71 | |
72 This should be fixed by the changes of 2003-01-30. | |
73 | |
74 * iso-2022 charsets get unified on i/o. | |
75 | |
76 With the change on 2003-01-06, decoding routines put a `charset' | |
77 property on decoded text, and the iso-2022 encoder pays attention | |
78 to it. Thus, for instance, reading and writing with | |
79 iso-2022-7bit preserves the original designation sequences. | |
80 Would the property name `preferred-charset' be better? | |
81 | |
82 We may have to use this property to choose a font. | |
83 | |
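One way to look at this property, as a hedged sketch:
my-first-charset is a hypothetical helper, not an Emacs function, and
the value it returns depends on the Emacs build, so no result is
shown.

```elisp
;; Hypothetical helper, not part of Emacs: decode BYTES with
;; CODING-SYSTEM and return the `charset' text property found on
;; the first decoded character, or nil if there is none.
(defun my-first-charset (bytes coding-system)
  (let ((text (decode-coding-string bytes coding-system)))
    (and (> (length text) 0)
         (get-text-property 0 'charset text))))

;; ESC $ B designates JIS X 0208, the two bytes $" encode one
;; hiragana character, and ESC ( B switches back to ASCII.
(my-first-charset "\e$B$\"\e(B" 'iso-2022-7bit)
```

Evaluating the last form in *scratch* shows which charset, if any,
the decoder recorded for that character.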
84 * Revisit locale processing: look at treating the language and | |
85 charset parts separately. (Language should affect things like | |
91827 | 86 spelling and calendar, but that's not a Unicode issue.) |
89496 | 87 |
88 * Handle Unicode combining characters usefully, e.g. diacritics, and | |
89 handle more scripts specifically (à la Devanagari). There are | |
90 issues with canonicalization. | |
91 | |
92 * Bidi is a separate issue with no support currently. | |
93 | |
94 * We need tabular input methods, e.g. for maths symbols. (Not | |
95 specific to Unicode.) | |
96 | |
97 * Need multibyte text in menus, e.g. for the above. (Not specific to | |
89525 | 98 Unicode -- see Emacs etc/TODO, but now mostly works with gtk.) |
89496 | 99 |
100 * There's currently no support for Unicode normalization. | |
101 | |
91827 | 102 * Populate char-width-table correctly for Unicode characters and |
89496 | 103 worry about what happens when double-width charsets covering |
104 non-CJK characters are unified. | |
105 | |
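To see what the table currently yields, a small sketch using
char-width, which consults char-width-table. The sample code points
are arbitrary choices for illustration and are written numerically so
that the snippet does not depend on this file's coding system.

```elisp
;; Print the display width Emacs assigns to a few sample
;; characters: ASCII a, pound sign, a hiragana, and a Han character.
(dolist (ch '(?a #x00A3 #x3042 #x4E2D))
  (message "U+%04X: width %d" ch (char-width ch)))
```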
106 * Emacs 20/21 .elc files are currently not loadable. It may or may | |
107 not be possible to do this properly. | |
108 | |
109 With the change on 2002-07-24, elc files generated by Emacs | |
110 20.3 and later are correctly loaded (including those | |
111 containing multibyte characters and compressed). But, elc | |
112 files generated by 20.2 and earlier are still not loadable. | |
113 Is it really worth working on it? | |
114 | |
115 * Rmail won't work with non-ASCII text. Encoding issues for Babyl | |
116 files need sorting out, but rms says Babyl will go before this is | |
117 released. | |
118 | |
119 * Gnus still needs some attention, and we need to get changes | |
120 accepted by Gnus maintainers... | |
121 | |
122 * There are type errors lurking, e.g. in | |
123 Fcheck_coding_systems_region. Define ENABLE_CHECKING to find them. | |
124 | |
125 * You can grep the code for lots of fixmes. | |
126 | |
127 * Old auto-save files, and similar files, such as Gnus drafts, | |
128 containing non-ASCII characters probably won't be re-read correctly. | |
90424 | 129 |
130 | |
131 | |
132 New font handling mechanism with font backend method | |
133 ---------------------------------------------------- | |
134 | |
135 Emacs now contains new code for handling fonts through multiple font |
90424 | 136 backends. The old font handling code still exists, entirely parallel |
137 to the new code, and the new code is used only when you configure | |
90927 | 138 Emacs with the argument "--enable-font-backend". |
90424 | 139 |
90611 | 140 Which font backends to use can be specified by the X resource |
141 "FontBackend". For instance, if you want to use Xft fonts only, | |
142 | |
143 Emacs.FontBackend: xft | |
144 | |
145 will work. If this resource is not set, Emacs tries to use all font | |
146 backends available on your graphic device. | |
147 | |
90424 | 148 The configure script, if invoked with "--enable-font-backend", checks |
90927 | 149 whether the freetype and fontconfig libraries exist. If both are |
90598 | 150 available, the macro "USE_FONT_BACKEND" is defined in src/config.h. In |
151 that case, the existence of the Xft library is checked too. | |
90424 | 152 |
153 The new files are: | |
90597 | 154 font.h -- header providing font-backend related structures |
155 (the most important are "struct font" and "struct | |
156 font_driver"), macros, etc. | |
90424 | 157 font.c -- main font handling code. |
158 xfont.c -- font-driver on X for X core fonts. | |
90597 | 159 ftfont.c -- generic font-driver for FreeType fonts providing |
160 device-independent methods of struct font_driver. | |
161 xftfont.c -- font-driver on X using Xft for FreeType fonts | |
162 utilizing methods provided by ftfont.c. | |
163 ftxfont.c -- font-driver on X directly using FreeType fonts | |
164 utilizing methods provided by ftfont.c. | |
90912 | 165 w32font.c -- font driver on w32 using Windows native fonts, |
166 corresponding to xfont.c | |
90424 | 167 |
90597 | 168 So we already have code for X. For the other systems (w32 and mac), |
90424 | 169 it seems that we need these files: |
90597 | 170 atmfont.c -- font-driver on mac using ATM fonts, corresponding |
171 to xfont.c | |
172 As BDF fonts are currently used on w32, we may also implement these: | |
173 bdffont.c -- generic font-driver for BDF fonts, corresponding to | |
174 ftfont.c | |
175 bdfw32font.c -- font-driver on w32 using BDF fonts, | |
176 corresponding to ftxfont.c | |
177 But, as FreeType already supports BDF fonts, if FreeType and | |
178 Fontconfig are also available on w32, what we need may be: | |
179 ftw32font.c -- font-driver on w32 directly using FreeType fonts | |
180 utilizing methods provided by ftfont.c. | |
90424 | 181 |
90912 | 182 And, for those to work, macterm.c and macfns.c must be changed in |
183 the same way as xterm.c and xfns.c (check the parts "#ifdef USE_FONT_BACKEND" | |
184 ... "#endif"). | |
90597 | 185 |
186 It might be interesting for Emacs to support a frame buffer directly and | |
187 have these font drivers: | |
90424 | 188 ftfbfont.c -- font-driver on FB for FreeType fonts. |
189 bdffbfont.c -- font-driver on FB for BDF fonts. | |
90705 | 190 |
91827 | 191 Note: The fontset-related code is not yet mature enough to work well with |
90705 | 192 the font backend method. So, for instance, even if you start Emacs |
193 with something like this: | |
90927 | 194 % emacs -fn tahoma |
90706 | 195 non-ASCII Latin characters will not be displayed with the font "tahoma". |
196 In such a case, please try this: | |
90705 | 197 |
198 (set-fontset-font "fontset-default" 'latin '("tahoma" . "unicode-bmp")) | |
199 |
200 |
201 This file is part of GNU Emacs. |
202 |
203 GNU Emacs is free software; you can redistribute it and/or modify |
204 it under the terms of the GNU General Public License as published by |
205 the Free Software Foundation; either version 3, or (at your option) |
206 any later version. |
207 |
208 GNU Emacs is distributed in the hope that it will be useful, |
209 but WITHOUT ANY WARRANTY; without even the implied warranty of |
210 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
211 GNU General Public License for more details. |
212 |
213 You should have received a copy of the GNU General Public License |
214 along with GNU Emacs; see the file COPYING. If not, write to the |
215 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, |
216 Boston, MA 02110-1301, USA. |