annotate admin/notes/unicode @ 99492:ee792794d888

(isearch-search-fun): Compare the length of the current search string with the length of the string from the previous search state to detect the situation when the user adds or removes characters in the search string. Use word-search-forward-lax and word-search-backward-lax in this case, and otherwise word-search-forward and word-search-backward.
author Juri Linkov <juri@jurta.org>
date Tue, 11 Nov 2008 19:43:09 +0000
parents cac099ec0724
children ce88a631c161
rev   line source
92006
850ec4b2f0bc Split off from README.unicode
Glenn Morris <rgm@gnu.org>
parents:
diff changeset
-*-mode: text; coding: latin-1;-*-

Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008
  Free Software Foundation, Inc.
See the end of the file for license conditions.

Problems, fixmes and other unicode-related issues
-------------------------------------------------------------

Notes by fx to record various things of variable importance.  handa
needs to check them -- don't take them too seriously, especially with
regard to completeness.

* SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has
  undesirable effects.  E.g.:
  (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil
  (multibyte-string-p (concat [?£])) => nil
  (text-char-description ?£) => "M-#"

    These examples are all fixed by the change of 2002-10-14, but
    questionable uses of SINGLE_BYTE_CHAR_P still exist in the
    code (keymap.c and print.c).
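
The ambiguity behind this item can be illustrated outside Emacs.  The
sketch below is Python, not Emacs code, and the helper name
fits_in_one_latin1_byte is hypothetical.  It only shows that a code
point below 256, such as the pound sign used in the examples above,
occupies one byte in Latin-1 but two bytes in UTF-8, so a "fits in one
byte" test says nothing reliable about how a string is represented.

```python
# Illustration only: plain Python, not Emacs internals.
def fits_in_one_latin1_byte(ch: str) -> bool:
    """Hypothetical helper: True when the code point is below 256."""
    return ord(ch) < 256

pound = "\u00a3"  # POUND SIGN, the character used in the examples above

latin1_bytes = pound.encode("latin-1")  # a single byte, 0xA3
utf8_bytes = pound.encode("utf-8")      # two bytes, 0xC2 0xA3

print(fits_in_one_latin1_byte(pound), len(latin1_bytes), len(utf8_bytes))
```

How Emacs itself stores such characters is not shown here; the point
is only that the code point value and the byte length are separate
questions.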

* Rationalize character syntax and its relationship to the Unicode
  database.  (Applies mainly to symbol and punctuation syntax.)

* Fontset handling and customization need work.  We want to relate
  fonts to scripts, probably based on the Unicode blocks.  The
  presence of small-repertoire 10646-encoded fonts in XFree 4 is a
  pain, not currently worked round.

    With the change on 2002-07-26, multiple fonts can be
    specified in a fontset for a specific range of characters.
    Each range can also be specified by script.  Before using
    ISO10646 fonts, Emacs checks their repertories to avoid
    fonts that don't have a glyph for a specific character.

    fx has worked on fontset customization, but was stymied by
    basic problems with the way the default face is dealt with
    (and something else, I think).  This needs revisiting.

* Work is also needed on charset and coding system priorities.

* The relevant bits of latin1-disp.el need porting (and probably
  re-naming/updating).  See also cyril-util.el.

* Quail files need more work now the encoding is largely irrelevant.

* What to do with the old coding categories stuff?

* The preferred-coding-system property of charsets should probably be
  junked unless it can be made more useful now.

* find-multibyte-characters needs looking at.

* Implement Korean cp949/UHC, BIG5-HKSCS and any other important missing
  charsets.

* Lazy-load tables for unify-charset somehow?

    Actually, Emacs clears out all charset maps and unify-map just
    before dumping, and they are loaded again on demand by the
    dumped emacs.  However, those maps (char tables) generated while
    temacs is running can't be removed from the dumped emacs.

* Translation tables for {en,de}code currently aren't supported.

    This should be fixed by the changes of 2002-10-14.

* Defining CCL coding systems currently doesn't work.

    This should be fixed by the changes of 2003-01-30.

* iso-2022 charsets get unified on i/o.

    With the change on 2003-01-06, decoding routines put a `charset'
    property on decoded text, and the iso-2022 encoder pays attention
    to it.  Thus, for instance, reading and writing with
    iso-2022-7bit preserves the original designation sequences.
    Might the property name `preferred-charset' be better?

    We may have to utilize this property to choose a font.
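
What "unified on i/o" costs can be sketched outside Emacs.  The
following is Python, not Emacs code, and uses Python's iso2022_jp
codec as a stand-in for any decoder that does not remember which
charset was designated.  The constant names are made up for this
example.  Text announced with the older JIS C 6226-1978 designation
(ESC $ @) decodes to the same characters as text announced with
JIS X 0208-1983 (ESC $ B), so re-encoding cannot reproduce the
original escape sequence unless something like the `charset' property
described above carries that information along.

```python
# Illustration only: Python's codec, not the Emacs iso-2022 coder.
ESC_JISX0208_1978 = b"\x1b$@"  # designate JIS C 6226-1978 to G0
ESC_JISX0208_1983 = b"\x1b$B"  # designate JIS X 0208-1983 to G0
ESC_ASCII = b"\x1b(B"          # designate ASCII to G0

text = "\u65e5\u672c"  # two kanji
encoded = text.encode("iso2022_jp")

# Build the same two characters, announced with the 1978 designation.
payload = encoded[len(ESC_JISX0208_1983):-len(ESC_ASCII)]
older = ESC_JISX0208_1978 + payload + ESC_ASCII

# Decode and re-encode: the characters survive, the designation may not.
round_tripped = older.decode("iso2022_jp").encode("iso2022_jp")

print(older)
print(round_tripped)
print(round_tripped == older)
```

Comparing the two printed byte strings shows whether the designation
sequence survived the round trip.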

* Revisit locale processing: look at treating the language and
  charset parts separately.  (Language should affect things like
  spelling and calendar, but that's not a Unicode issue.)

* Handle Unicode combining characters usefully, e.g. diacritics, and
  handle more scripts specifically (à la Devanagari).  There are
  issues with canonicalization.

* Bidi is a separate issue with no support currently.

* We need tabular input methods, e.g. for maths symbols.  (Not
  specific to Unicode.)

* Need multibyte text in menus, e.g. for the above.  (Not specific to
  Unicode -- see Emacs etc/TODO, but now mostly works with gtk.)

* There's currently no support for Unicode normalization.
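
For readers unfamiliar with the combining-character and normalization
items, here is a minimal sketch of the problem in Python (not Emacs
code), using only the standard unicodedata module.  The same visible
letter can be one precomposed code point or a base letter followed by
a combining mark; the two strings compare unequal until both are
brought to the same normalization form.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Without normalization the two spellings are different strings.
print(composed == decomposed)

# NFC composes, NFD decomposes; after either, the spellings agree.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)

# A non-zero combining class marks U+0301 as a combining character.
print(unicodedata.combining("\u0301"))
```

Search, comparison and display all have to decide which of these
forms they operate on, which is the kind of canonicalization issue
the notes refer to.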

* Populate char-width-table correctly for Unicode characters and
  worry about what happens when double-width charsets covering
  non-CJK characters are unified.
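
One possible source for such a table is the Unicode East Asian Width
property.  The sketch below is Python, not Emacs code; the helper
display_width and its ambiguous_is_wide parameter are hypothetical.
It shows why unification is awkward: a Greek letter is "ambiguous" in
Unicode, so its width depends on whether it is treated as coming from
a double-width East Asian charset or from a single-width one.
Combining characters and control characters are ignored here.

```python
import unicodedata

def display_width(ch: str, ambiguous_is_wide: bool = False) -> int:
    """Hypothetical helper: columns for one character, by East Asian Width."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):   # Wide and Fullwidth take two columns
        return 2
    if eaw == "A":          # Ambiguous: depends on the context
        return 2 if ambiguous_is_wide else 1
    return 1                # Narrow, Halfwidth and Neutral

for ch in ("A", "\u65e5", "\u03b1"):  # Latin A, a kanji, Greek alpha
    print(repr(ch), unicodedata.east_asian_width(ch),
          display_width(ch), display_width(ch, ambiguous_is_wide=True))
```

Once charsets are unified, the character alone no longer says which
of the two widths was intended.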

* Emacs 20/21 .elc files are currently not loadable.  It may or may
  not be possible to do this properly.

    With the change on 2002-07-24, elc files generated by Emacs
    20.3 and later are correctly loaded (including compressed
    files and those containing multibyte characters).  But elc
    files generated by 20.2 and earlier are still not loadable.
    Is it really worth working on it?

* Rmail won't work with non-ASCII text.  Encoding issues for Babyl
  files need sorting out, but rms says Babyl will go before this is
  released.

* Gnus still needs some attention, and we need to get changes
  accepted by Gnus maintainers...

* There are type errors lurking, e.g. in
  Fcheck_coding_systems_region.  Define ENABLE_CHECKING to find them.

* You can grep the code for lots of fixmes.

* Old auto-save files, and similar files, such as Gnus drafts,
  containing non-ASCII characters probably won't be re-read correctly.

This file is part of GNU Emacs.
94831
cac099ec0724 Switch to recommended form of GPLv3 permissions notice.
Glenn Morris <rgm@gnu.org>
parents: 92006
diff changeset
GNU Emacs is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.