annotate src/coding.c @ 88435:72f73971423b
Set `iso-8859-2' for `nonascii-translation'.
author | Kenichi Handa <handa@m17n.org> |
---|---|
date | Tue, 05 Mar 2002 01:06:51 +0000 |
parents | 6418a272b97e |
children | 3a34b722dd71 |
rev | line source |
---|---|
17052 | 1 /* Coding system handler (conversion, detection, etc.). |
20708 | 2 Copyright (C) 1995, 1997, 1998 Electrotechnical Laboratory, JAPAN. |
18269 | 3 Licensed to the Free Software Foundation. |
38518 | 4 Copyright (C) 2001 Free Software Foundation, Inc. |
88365 | 5 Copyright (C) 2001, 2002 |
6 National Institute of Advanced Industrial Science and Technology (AIST) | |
7 Registration Number H13PRO009 | |
17052 | 8 |
17071 | 9 This file is part of GNU Emacs. |
10 | |
11 GNU Emacs is free software; you can redistribute it and/or modify | |
12 it under the terms of the GNU General Public License as published by | |
13 the Free Software Foundation; either version 2, or (at your option) | |
14 any later version. | |
15 | |
16 GNU Emacs is distributed in the hope that it will be useful, | |
17 but WITHOUT ANY WARRANTY; without even the implied warranty of | |
18 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
19 GNU General Public License for more details. | |
20 | |
21 You should have received a copy of the GNU General Public License | |
22 along with GNU Emacs; see the file COPYING. If not, write to | |
23 the Free Software Foundation, Inc., 59 Temple Place - Suite 330, | |
24 Boston, MA 02111-1307, USA. */ | |
17052 | 25 |
26 /*** TABLE OF CONTENTS *** | |
27 | |
29005 | 28 0. General comments |
17052 | 29 1. Preamble |
88365 | 30 2. Emacs' internal format (emacs-utf-8) handlers |
31 3. UTF-8 handlers | |
32 4. UTF-16 handlers | |
33 5. Charset-base coding systems handlers | |
34 6. emacs-mule (old Emacs' internal format) handlers | |
35 7. ISO2022 handlers | |
36 8. Shift-JIS and BIG5 handlers | |
37 9. CCL handlers | |
38 10. C library functions | |
39 11. Emacs Lisp library functions | |
40 12. Postamble | |
17052 | 41 |
42 */ | |
43 | |
88365 | 44 /*** 0. General comments *** |
45 | |
46 | |
47 CODING SYSTEM | |
48 | |
49 A coding system is a mechanism for encoding one or more character | |
50 sets. Here's a list of coding system types supported by Emacs. | |
51 When we say "decode", we mean converting text encoded in some | |
52 coding system into Emacs' internal format (emacs-utf-8), and when we | |
53 say "encode", we mean converting emacs-utf-8 text into some | |
54 other coding system. | |
55 | |
56 Emacs represents a coding system by a Lisp symbol. Each symbol is a | |
57 key to the hash table Vcoding_system_hash_table. This hash table | |
58 associates the symbol with the corresponding detailed specifications. | |
59 | |
60 Before using a coding system for decoding and encoding, we set up a | |
61 structure of type `struct coding_system'. This structure keeps | |
62 various information about a specific code conversion (e.g. the | |
63 location of source and destination data). | |
64 | |
65 Coding systems are classified into the following types by how they | |
66 represent a character as a byte sequence. Here's a brief description | |
67 of each type. | |
68 | |
69 o Emacs' internal format (emacs-utf-8) | |
70 | |
71 The extended UTF-8 which allows eight-bit raw bytes mixed with | |
72 character codes. Emacs holds characters in buffers and strings by | |
73 this format. | |
74 | |
75 o UTF-8 | |
76 | |
77 o UTF-16 | |
78 | |
79 o Charset-base coding system | |
80 | |
81 A coding system defined by one or more (coded) character sets. | |
82 Decoding and encoding are done by code converter defined for each | |
83 character set. | |
84 | |
85 o Old Emacs' internal format (emacs-mule) | |
86 | |
87 The coding system adopted by old versions of Emacs (20 and 21). | |
88 | |
89 o ISO2022-base coding system | |
17052 | 90 |
91 The most famous coding system for multiple character sets. X's | |
88365 | 92 Compound Text, various EUCs (Extended Unix Code), and coding systems |
93 used in Internet communication such as ISO-2022-JP are all | |
94 variants of ISO2022. | |
95 | |
96 o SJIS (or Shift-JIS or MS-Kanji-Code) | |
97 | |
17052 | 98 A coding system to encode character sets: ASCII, JISX0201, and |
99 JISX0208. Widely used on PCs in Japan. Details are described in | |
88365 | 100 section 8. |
101 | |
102 o BIG5 | |
103 | |
104 A coding system to encode character sets: ASCII and Big5. Widely | |
105 used by Chinese (mainly in Taiwan and Hong Kong). Details are | |
106 described in section 8. In this file, when we write "big5" (all | |
107 lowercase), we mean the coding system, and when we write "Big5" | |
108 (capitalized), we mean the character set. | |
109 | |
110 o CCL | |
111 | |
112 If a user wants to decode/encode text in a coding system | |
113 not listed above, they can supply a decoder and an encoder for it as | |
114 CCL (Code Conversion Language) programs. Emacs executes the CCL | |
115 program while decoding/encoding. | |
116 | |
117 o Raw-text | |
118 | |
119 A coding system for text containing raw eight-bit data. Emacs | |
120 treats each byte of source text as a character (except for | |
121 end-of-line conversion). | |
122 | |
123 o No-conversion | |
124 | |
125 Like raw text, but without end-of-line conversion. | |
126 | |
127 | |
128 END-OF-LINE FORMAT | |
129 | |
130 How the end of a line is encoded depends on the system. For | |
131 instance, Unix's format is just one byte of LF (line-feed) code, | |
18766 | 132 whereas DOS's format is a two-byte sequence of `carriage-return' and |
20718 | 133 `line-feed' codes. MacOS's format is usually one byte of |
134 `carriage-return'. | |
17052 | 135 |
88365 | 136 Since text character encoding and end-of-line encoding are |
137 independent, any coding system described above can take any format | |
138 of end-of-line (except for no-conversion). | |
17052 | 139 |
140 */ | |
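The three end-of-line formats described above can be told apart by looking at the first line ending in a block of bytes. The following is a minimal sketch of that idea, not the detection routine used in this file; the enum, the function name `detect_eol_format`, and the choice to report a trailing lone CR as undecided are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical names; these are not identifiers from coding.c.  */
enum eol_format { EOL_UNDECIDED, EOL_UNIX, EOL_DOS, EOL_MAC };

/* Classify the first line ending found in SRC[0..LEN).  */
static enum eol_format
detect_eol_format (const unsigned char *src, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      if (src[i] == '\n')
        return EOL_UNIX;        /* bare LF */
      if (src[i] == '\r')
        {
          if (i + 1 == len)
            /* CR is the last byte: an LF may still follow in the
               next block, so do not decide yet (an assumption of
               this sketch).  */
            return EOL_UNDECIDED;
          return src[i + 1] == '\n' ? EOL_DOS : EOL_MAC;
        }
    }
  return EOL_UNDECIDED;         /* no line ending seen */
}
```

Because the end-of-line format is independent of the character encoding, a check like this can run on the raw bytes before any character decoding takes place.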
141 | |
88365 | 142 /* COMMON MACROS */ |
143 | |
144 | |
17052 | 145 /*** GENERAL NOTES on `detect_coding_XXX ()' functions *** |
146 | |
88365 | 147 These functions check if a byte sequence specified as a source in |
148 CODING conforms to the format of XXX. Return 1 if the data contains | |
149 a byte sequence which can be decoded into non-ASCII characters by | |
150 the coding system. Otherwise (i.e. the data contains only ASCII | |
151 characters or an invalid sequence) return 0. | |
152 | |
153 They also reset some bits of an integer pointed to by MASK. The macros | |
154 CATEGORY_MASK_XXX specify each bit of this integer. | |
155 | |
156 Below is the template of these functions. */ | |
157 | |
17052 | 158 #if 0 |
88365 | 159 static int |
160 detect_coding_XXX (coding, mask) | |
161 struct coding_system *coding; | |
162 int *mask; | |
17052 | 163 { |
88365 | 164 unsigned char *src = coding->source; |
165 unsigned char *src_end = coding->source + coding->src_bytes; | |
166 int multibytep = coding->src_multibyte; | |
167 int c; | |
168 int found = 0; | |
169 ...; | |
170 | |
171 while (1) | |
172 { | |
173 /* Get one byte from the source. If the source is exhausted, jump | |
174 to no_more_source:. */ | |
175 ONE_MORE_BYTE (c); | |
176 /* Check if it conforms to XXX. If not, break the loop. */ | |
177 } | |
178 /* As the data is invalid for XXX, reset the proper bits. */ | |
179 *mask &= ~CODING_CATEGORY_XXX; | |
180 return 0; | |
181 no_more_source: | |
182 /* The source is exhausted. */ | |
183 if (!found) | |
184 /* ASCII characters only. */ | |
185 return 0; | |
186 /* Some data should be decoded into non-ASCII characters. */ | |
187 *mask &= CODING_CATEGORY_XXX; | |
188 return 1; | |
17052 | 189 } |
190 #endif | |
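As one concrete instance of the detection template above, here is a hedged sketch for UTF-8. It is not the `detect_coding_utf_8` of this file: the function name, the mask constants, and the plain pointer/length arguments (in place of `struct coding_system`) are assumptions, and the validity check is simplified (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>

/* Hypothetical category bits standing in for CATEGORY_MASK_XXX.  */
#define SKETCH_MASK_UTF_8 0x01
#define SKETCH_MASK_OTHER 0x02

/* Return 1 if SRC holds at least one well-formed non-ASCII UTF-8
   sequence and nothing invalid, else 0.  Adjust *MASK as the
   template describes.  */
static int
sketch_detect_utf_8 (const unsigned char *src, size_t src_bytes, int *mask)
{
  const unsigned char *src_end = src + src_bytes;
  int found = 0;

  while (src < src_end)
    {
      unsigned char c = *src++;
      int trailing;

      if (c < 0x80)
        continue;                       /* ASCII */
      if (c >= 0xC2 && c <= 0xDF)
        trailing = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        trailing = 2;
      else if (c >= 0xF0 && c <= 0xF4)
        trailing = 3;
      else
        goto invalid;                   /* not a valid lead byte */

      while (trailing-- > 0)
        {
          if (src == src_end)
            goto no_more_source;        /* sequence cut off by end of data */
          if ((*src++ & 0xC0) != 0x80)
            goto invalid;               /* not a continuation byte */
        }
      found = 1;
    }

 no_more_source:
  if (!found)
    return 0;                           /* ASCII only: leave *MASK alone */
  *mask &= SKETCH_MASK_UTF_8;           /* keep only this category */
  return 1;

 invalid:
  *mask &= ~SKETCH_MASK_UTF_8;          /* rule this category out */
  return 0;
}
```

Note the three outcomes: invalid data clears the category bit, ASCII-only data leaves the mask untouched, and a positive match narrows the mask to this category alone.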
191 | |
192 /*** GENERAL NOTES on `decode_coding_XXX ()' functions *** | |
193 | |
88365 | 194 These functions decode a byte sequence specified as a source by |
195 CODING. The resulting multibyte text goes to a place pointed to by | |
196 CODING->charbuf, the length of which should not exceed | |
197 CODING->charbuf_size. | |
198 | |
199 These functions set the information of original and decoded texts in | |
200 CODING->consumed, CODING->consumed_char, and CODING->charbuf_used. | |
201 They also set CODING->result to one of CODING_RESULT_XXX indicating | |
202 how the decoding is finished. | |
203 | |
204 Below is the template of these functions. */ | |
205 | |
17052 | 206 #if 0 |
29005 | 207 static void |
88365 | 208 decode_coding_XXX (coding) |
17052 | 209 struct coding_system *coding; |
210 { | |
88365 | 211 unsigned char *src = coding->source + coding->consumed; |
212 unsigned char *src_end = coding->source + coding->src_bytes; | |
213 /* SRC_BASE remembers the start position in source in each loop. | |
214 The loop will be exited when there's not enough source text, or | |
215 when there's no room in CHARBUF for a decoded character. */ | |
216 unsigned char *src_base; | |
217 /* A buffer to produce decoded characters. */ | |
218 int *charbuf = coding->charbuf; | |
219 int *charbuf_end = charbuf + coding->charbuf_size; | |
220 int multibytep = coding->src_multibyte; | |
221 | |
222 while (1) | |
223 { | |
224 src_base = src; | |
225 if (charbuf >= charbuf_end) | |
226 /* No more room to produce a decoded character. */ | |
227 break; | |
228 ONE_MORE_BYTE (c); | |
229 /* Decode it. */ | |
230 } | |
231 | |
232 no_more_source: | |
233 if (src_base < src_end | |
234 && coding->mode & CODING_MODE_LAST_BLOCK) | |
235 /* If the source ends by partial bytes to construct a character, | |
236 treat them as eight-bit raw data. */ | |
237 while (src_base < src_end && charbuf < charbuf_end) | |
238 *charbuf++ = *src_base++; | |
239 /* Remember how many bytes and characters we consumed. If the | |
240 source is multibyte, the bytes and chars are not identical. */ | |
241 coding->consumed = coding->consumed_char = src_base - coding->source; | |
242 /* Remember how many characters we produced. */ | |
243 coding->charbuf_used = charbuf - coding->charbuf; | |
17052 | 244 } |
245 #endif | |
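The decoding template above can be made concrete with a small UTF-8 decoder. This is only a sketch under stated assumptions: `struct sketch_coding` is a hypothetical stand-in carrying just the members the template mentions, `last_block` replaces the `CODING_MODE_LAST_BLOCK` mode bit, and continuation bytes are assumed to be well formed (they are not validated).

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct coding_system.  */
struct sketch_coding
{
  const unsigned char *source;
  size_t src_bytes;
  size_t consumed;      /* source bytes decoded so far */
  int *charbuf;
  size_t charbuf_size;
  size_t charbuf_used;  /* characters produced by the last call */
  int last_block;       /* nonzero if no more source will arrive */
};

static void
sketch_decode_utf_8 (struct sketch_coding *coding)
{
  const unsigned char *src = coding->source + coding->consumed;
  const unsigned char *src_end = coding->source + coding->src_bytes;
  const unsigned char *src_base = src;
  int *charbuf = coding->charbuf;
  int *charbuf_end = charbuf + coding->charbuf_size;

  while (1)
    {
      int c, trailing;

      src_base = src;           /* start of the character being decoded */
      if (charbuf >= charbuf_end || src == src_end)
        break;                  /* no room left, or no source left */
      c = *src;
      if (c < 0x80)
        trailing = 0;
      else if ((c & 0xE0) == 0xC0)
        trailing = 1, c &= 0x1F;
      else if ((c & 0xF0) == 0xE0)
        trailing = 2, c &= 0x0F;
      else if ((c & 0xF8) == 0xF0)
        trailing = 3, c &= 0x07;
      else
        trailing = 0;           /* stray byte: passed through unchanged */
      if (src_end - src - 1 < trailing)
        break;                  /* incomplete character */
      src++;
      while (trailing-- > 0)
        c = (c << 6) | (*src++ & 0x3F);
      *charbuf++ = c;
    }

  /* In the last block, emit leftover partial bytes as raw data.  */
  if (src_base < src_end && coding->last_block)
    while (src_base < src_end && charbuf < charbuf_end)
      *charbuf++ = *src_base++;
  coding->consumed = src_base - coding->source;
  coding->charbuf_used = charbuf - coding->charbuf;
}
```

Since `consumed` only advances past complete characters, a caller can append more source and call the function again to resume at the start of the unfinished character.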
246 | |
247 /*** GENERAL NOTES on `encode_coding_XXX ()' functions *** | |
248 | |
88365 | 249 These functions encode text of SRC_BYTES bytes at SOURCE, in Emacs' |
250 internal multibyte format, using CODING. The resulting byte sequence | |
29005 | 251 goes to a place pointed to by DESTINATION, the length of which |
252 should not exceed DST_BYTES. | |
253 | |
88365 | 254 These functions set the information of original and encoded texts in |
255 the members produced, produced_char, consumed, and consumed_char of | |
256 the structure *CODING. They also set the member result to one of | |
257 CODING_RESULT_XXX indicating how the encoding finished. | |
258 | |
259 DST_BYTES zero means that the source area and destination area | |
260 overlap, which means that we can produce encoded text until it | |
261 reaches the head of the not-yet-encoded source text. | |
262 | |
263 Below is a template of these functions. */ | |
17052 | 264 #if 0 |
29005 | 265 static void |
88365 | 266 encode_coding_XXX (coding) |
17052 | 267 struct coding_system *coding; |
268 { | |
88365 | 269 int multibytep = coding->dst_multibyte; |
270 int *charbuf = coding->charbuf; | |
271 int *charbuf_end = coding->charbuf + coding->charbuf_used; | |
272 unsigned char *dst = coding->destination + coding->produced; | |
273 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
274 unsigned char *adjusted_dst_end = dst_end - _MAX_BYTES_PRODUCED_IN_LOOP_; | |
275 int produced_chars = 0; | |
276 | |
277 for (; charbuf < charbuf_end && dst < adjusted_dst_end; charbuf++) | |
278 { | |
279 int c = *charbuf; | |
280 /* Encode C into DST, and increment DST. */ | |
281 } | |
282 label_no_more_destination: | |
283 /* How many chars and bytes we produced. */ | |
284 coding->produced_char += produced_chars; | |
285 coding->produced = dst - coding->destination; | |
17052 | 286 } |
287 #endif | |
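The encoding template above reserves worst-case room before each loop iteration so that the loop body needs no per-byte bounds checks. A minimal sketch of that pattern for UTF-8 follows. The function name, its argument list, and `SKETCH_MAX_BYTES_PER_CHAR` are assumptions made for illustration, and input code points are assumed to be at most 0x10FFFF.

```c
#include <stddef.h>

/* Worst-case bytes one character can produce in the loop below
   (hypothetical counterpart of _MAX_BYTES_PRODUCED_IN_LOOP_).  */
#define SKETCH_MAX_BYTES_PER_CHAR 4

/* Encode up to CHARBUF_USED code points into DST.  Return the number
   of characters consumed and store the bytes written in *PRODUCED.  */
static size_t
sketch_encode_utf_8 (const int *charbuf, size_t charbuf_used,
                     unsigned char *dst, size_t dst_bytes,
                     size_t *produced)
{
  const int *p = charbuf;
  const int *charbuf_end = charbuf + charbuf_used;
  unsigned char *d = dst;
  unsigned char *dst_end = dst + dst_bytes;

  /* Comparing the remaining room, rather than computing
     dst_end - MAX, avoids forming a pointer before DST when the
     destination is shorter than MAX bytes.  */
  for (; p < charbuf_end && dst_end - d >= SKETCH_MAX_BYTES_PER_CHAR; p++)
    {
      unsigned int c = (unsigned int) *p;

      if (c < 0x80)
        *d++ = (unsigned char) c;
      else if (c < 0x800)
        {
          *d++ = (unsigned char) (0xC0 | (c >> 6));
          *d++ = (unsigned char) (0x80 | (c & 0x3F));
        }
      else if (c < 0x10000)
        {
          *d++ = (unsigned char) (0xE0 | (c >> 12));
          *d++ = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
          *d++ = (unsigned char) (0x80 | (c & 0x3F));
        }
      else
        {
          *d++ = (unsigned char) (0xF0 | (c >> 18));
          *d++ = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
          *d++ = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
          *d++ = (unsigned char) (0x80 | (c & 0x3F));
        }
    }

  *produced = (size_t) (d - dst);
  return (size_t) (p - charbuf);
}
```

The cost of this design is that the loop may stop while a few bytes of destination are still free; the caller is expected to supply a larger buffer and continue from the first unconsumed character.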
288 | |
289 | |
290 /*** 1. Preamble ***/ | |
291 | |
26088 | 292 #include <config.h> |
17052 | 293 #include <stdio.h> |
294 | |
295 #include "lisp.h" | |
296 #include "buffer.h" | |
88365 | 297 #include "character.h" |
17052 | 298 #include "charset.h" |
88365 | 299 #include "ccl.h" |
26847 | 300 #include "composite.h" |
17052 | 301 #include "coding.h" |
302 #include "window.h" | |
303 | |
88365 | 304 Lisp_Object Vcoding_system_hash_table; |
305 | |
306 Lisp_Object Qcoding_system, Qcoding_aliases, Qeol_type; | |
307 Lisp_Object Qunix, Qdos, Qmac; | |
17052 | 308 Lisp_Object Qbuffer_file_coding_system; |
309 Lisp_Object Qpost_read_conversion, Qpre_write_conversion; | |
88365 | 310 Lisp_Object Qdefault_char; |
19612 | 311 Lisp_Object Qno_conversion, Qundecided; |
88365 | 312 Lisp_Object Qcharset, Qiso_2022, Qutf_8, Qutf_16, Qshift_jis, Qbig5; |
313 Lisp_Object Qutf_16_be_nosig, Qutf_16_be, Qutf_16_le_nosig, Qutf_16_le; | |
314 Lisp_Object Qsignature, Qendian, Qbig, Qlittle; | |
19750 | 315 Lisp_Object Qcoding_system_history; |
22874 | 316 Lisp_Object Qvalid_codes; |
17052 | 317 |
318 extern Lisp_Object Qinsert_file_contents, Qwrite_region; | |
319 Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument; | |
320 Lisp_Object Qstart_process, Qopen_network_stream; | |
321 Lisp_Object Qtarget_idx; | |
322 | |
20718 | 323 Lisp_Object Vselect_safe_coding_system_function; |
324 | |
24200 | 325 /* Mnemonic string for each format of end-of-line. */ |
326 Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac; | |
327 /* Mnemonic string to indicate format of end-of-line is not yet | |
17052 | 328 decided. */ |
24200 | 329 Lisp_Object eol_mnemonic_undecided; |
17052 | 330 |
331 #ifdef emacs | |
332 | |
20105 | 333 Lisp_Object Vcoding_system_list, Vcoding_system_alist; |
334 | |
335 Lisp_Object Qcoding_system_p, Qcoding_system_error; | |
17052 | 336 |
20718 | 337 /* Coding system emacs-mule and raw-text are for converting only |
338 end-of-line format. */ | |
339 Lisp_Object Qemacs_mule, Qraw_text; | |
18650 | 340 |
17052 | 341 /* Coding-systems are handed between Emacs Lisp programs and C internal |
342 routines by the following three variables. */ | |
343 /* Coding-system for reading files and receiving data from process. */ | |
344 Lisp_Object Vcoding_system_for_read; | |
345 /* Coding-system for writing files and sending data to process. */ | |
346 Lisp_Object Vcoding_system_for_write; | |
347 /* Coding-system actually used in the latest I/O. */ | |
348 Lisp_Object Vlast_coding_system_used; | |
349 | |
19280 | 350 /* A vector of length 256 which contains information about special |
22529 | 351 Latin codes (especially for dealing with Microsoft codes). */ |
19365 | 352 Lisp_Object Vlatin_extra_code_table; |
19280 | 353 |
18650 | 354 /* Flag to inhibit code conversion of end-of-line format. */ |
355 int inhibit_eol_conversion; | |
356 | |
30204 | 357 /* Flag to inhibit ISO2022 escape sequence detection. */ |
358 int inhibit_iso_escape_detection; | |
359 | |
21574 | 360 /* Flag to make buffer-file-coding-system inherit from process-coding. */ |
361 int inherit_process_coding_system; | |
362 | |
19280 | 363 /* Coding system to be used to encode text for terminal display. */ |
17052 | 364 struct coding_system terminal_coding; |
365 | |
19280 | 366 /* Coding system to be used to encode text for terminal display when |
367 terminal coding system is nil. */ | |
368 struct coding_system safe_terminal_coding; | |
369 | |
370 /* Coding system of what is sent from terminal keyboard. */ | |
17052 | 371 struct coding_system keyboard_coding; |
372 | |
18180 | 373 Lisp_Object Vfile_coding_system_alist; |
374 Lisp_Object Vprocess_coding_system_alist; | |
375 Lisp_Object Vnetwork_coding_system_alist; | |
17052 | 376 |
26088 | 377 Lisp_Object Vlocale_coding_system; |
378 | |
17052 | 379 #endif /* emacs */ |
380 | |
22186 | 381 /* Flag to tell if we look up translation table on character code |
382 conversion. */ | |
22119 | 383 Lisp_Object Venable_character_translation; |
22186 | 384 /* Standard translation table to look up on decoding (reading). */ |
385 Lisp_Object Vstandard_translation_table_for_decode; | |
386 /* Standard translation table to look up on encoding (writing). */ | |
387 Lisp_Object Vstandard_translation_table_for_encode; | |
388 | |
389 Lisp_Object Qtranslation_table; | |
390 Lisp_Object Qtranslation_table_id; | |
391 Lisp_Object Qtranslation_table_for_decode; | |
392 Lisp_Object Qtranslation_table_for_encode; | |
17052 | 393 |
394 /* Alist of charsets vs revision number. */ | |
88365 | 395 static Lisp_Object Vcharset_revision_table; |
17052 | 396 |
18180 | 397 /* Default coding systems used for process I/O. */ |
398 Lisp_Object Vdefault_process_coding_system; | |
399 | |
26067 | 400 /* Global flag to tell that we can't call post-read-conversion and |
401 pre-write-conversion functions. Usually the value is zero, but it | |
402 is set to 1 temporarily while such functions are running. This is | |
403 to avoid infinite recursive call. */ | |
404 static int inhibit_pre_post_conversion; | |
405 | |
30487 | 406 /* Char-table containing safe coding systems of each character. */ |
407 Lisp_Object Vchar_coding_system_table; | |
408 Lisp_Object Qchar_coding_system; | |
409 | |
88365 | 410 /* Two special coding systems. */ |
411 Lisp_Object Vsjis_coding_system; | |
412 Lisp_Object Vbig5_coding_system; | |
413 | |
414 | |
415 static int detect_coding_utf_8 P_ ((struct coding_system *, int *)); | |
416 static void decode_coding_utf_8 P_ ((struct coding_system *)); | |
417 static int encode_coding_utf_8 P_ ((struct coding_system *)); | |
418 | |
419 static int detect_coding_utf_16 P_ ((struct coding_system *, int *)); | |
420 static void decode_coding_utf_16 P_ ((struct coding_system *)); | |
421 static int encode_coding_utf_16 P_ ((struct coding_system *)); | |
422 | |
423 static int detect_coding_iso_2022 P_ ((struct coding_system *, int *)); | |
424 static void decode_coding_iso_2022 P_ ((struct coding_system *)); | |
425 static int encode_coding_iso_2022 P_ ((struct coding_system *)); | |
426 | |
427 static int detect_coding_emacs_mule P_ ((struct coding_system *, int *)); | |
428 static void decode_coding_emacs_mule P_ ((struct coding_system *)); | |
429 static int encode_coding_emacs_mule P_ ((struct coding_system *)); | |
430 | |
431 static int detect_coding_sjis P_ ((struct coding_system *, int *)); | |
432 static void decode_coding_sjis P_ ((struct coding_system *)); | |
433 static int encode_coding_sjis P_ ((struct coding_system *)); | |
434 | |
435 static int detect_coding_big5 P_ ((struct coding_system *, int *)); | |
436 static void decode_coding_big5 P_ ((struct coding_system *)); | |
437 static int encode_coding_big5 P_ ((struct coding_system *)); | |
438 | |
439 static int detect_coding_ccl P_ ((struct coding_system *, int *)); | |
440 static void decode_coding_ccl P_ ((struct coding_system *)); | |
441 static int encode_coding_ccl P_ ((struct coding_system *)); | |
442 | |
443 static void decode_coding_raw_text P_ ((struct coding_system *)); | |
444 static int encode_coding_raw_text P_ ((struct coding_system *)); | |
445 | |
446 | |
447 /* ISO2022 section */ | |
448 | |
449 #define CODING_ISO_INITIAL(coding, reg) \ | |
450 (XINT (AREF (AREF (CODING_ID_ATTRS ((coding)->id), \ | |
451 coding_attr_iso_initial), \ | |
452 reg))) | |
453 | |
454 | |
455 #define CODING_ISO_REQUEST(coding, charset_id) \ | |
456 ((charset_id <= (coding)->max_charset_id \ | |
457 ? (coding)->safe_charsets[charset_id] \ | |
458 : -1)) | |
459 | |
460 | |
461 #define CODING_ISO_FLAGS(coding) \ | |
462 ((coding)->spec.iso_2022.flags) | |
463 #define CODING_ISO_DESIGNATION(coding, reg) \ | |
464 ((coding)->spec.iso_2022.current_designation[reg]) | |
465 #define CODING_ISO_INVOCATION(coding, plane) \ | |
466 ((coding)->spec.iso_2022.current_invocation[plane]) | |
467 #define CODING_ISO_SINGLE_SHIFTING(coding) \ | |
468 ((coding)->spec.iso_2022.single_shifting) | |
469 #define CODING_ISO_BOL(coding) \ | |
470 ((coding)->spec.iso_2022.bol) | |
471 #define CODING_ISO_INVOKED_CHARSET(coding, plane) \ | |
472 CODING_ISO_DESIGNATION ((coding), CODING_ISO_INVOCATION ((coding), (plane))) | |
473 | |
474 /* Control characters of ISO2022. */ | |
475 /* code */ /* function */ | |
476 #define ISO_CODE_LF 0x0A /* line-feed */ | |
477 #define ISO_CODE_CR 0x0D /* carriage-return */ | |
478 #define ISO_CODE_SO 0x0E /* shift-out */ | |
479 #define ISO_CODE_SI 0x0F /* shift-in */ | |
480 #define ISO_CODE_SS2_7 0x19 /* single-shift-2 for 7-bit code */ | |
481 #define ISO_CODE_ESC 0x1B /* escape */ | |
482 #define ISO_CODE_SS2 0x8E /* single-shift-2 */ | |
483 #define ISO_CODE_SS3 0x8F /* single-shift-3 */ | |
484 #define ISO_CODE_CSI 0x9B /* control-sequence-introducer */ | |
485 | |
486 /* Every 1-byte code of ISO2022 is classified into one of the | |
487 following classes. */ | |
488 enum iso_code_class_type | |
489 { | |
490 ISO_control_0, /* Control codes in the range | |
491 0x00..0x1F and 0x7F, except for the | |
492 following 5 codes. */ | |
493 ISO_carriage_return, /* ISO_CODE_CR (0x0D) */ | |
494 ISO_shift_out, /* ISO_CODE_SO (0x0E) */ | |
495 ISO_shift_in, /* ISO_CODE_SI (0x0F) */ | |
496 ISO_single_shift_2_7, /* ISO_CODE_SS2_7 (0x19) */ | |
497 ISO_escape, /* ISO_CODE_ESC (0x1B) */ | |
498 ISO_control_1, /* Control codes in the range | |
499 0x80..0x9F, except for the | |
500 following 3 codes. */ | |
501 ISO_single_shift_2, /* ISO_CODE_SS2 (0x8E) */ | |
502 ISO_single_shift_3, /* ISO_CODE_SS3 (0x8F) */ | |
503 ISO_control_sequence_introducer, /* ISO_CODE_CSI (0x9B) */ | |
504 ISO_0x20_or_0x7F, /* Codes of the values 0x20 or 0x7F. */ | |
505 ISO_graphic_plane_0, /* Graphic codes in the range 0x21..0x7E. */ | |
506 ISO_0xA0_or_0xFF, /* Codes of the values 0xA0 or 0xFF. */ | |
507 ISO_graphic_plane_1 /* Graphic codes in the range 0xA1..0xFE. */ | |
508 }; | |
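The classification described by the enum comments above can be written as a single function over a byte value. This is a sketch only: the `SK_` enum and `sketch_iso_classify` are hypothetical names, and the file itself may well use a lookup table instead. The comments list 0x7F under both ISO_control_0 and ISO_0x20_or_0x7F; the sketch assumes the more specific class wins.

```c
/* Hypothetical mirror of enum iso_code_class_type.  */
enum sketch_iso_class
{
  SK_CONTROL_0, SK_CARRIAGE_RETURN, SK_SHIFT_OUT, SK_SHIFT_IN,
  SK_SINGLE_SHIFT_2_7, SK_ESCAPE, SK_CONTROL_1, SK_SINGLE_SHIFT_2,
  SK_SINGLE_SHIFT_3, SK_CSI, SK_0X20_OR_0X7F, SK_GRAPHIC_PLANE_0,
  SK_0XA0_OR_0XFF, SK_GRAPHIC_PLANE_1
};

/* Classify one byte according to the ranges in the enum comments.  */
static enum sketch_iso_class
sketch_iso_classify (unsigned char c)
{
  /* Individually named codes take precedence over the ranges.  */
  switch (c)
    {
    case 0x0D: return SK_CARRIAGE_RETURN;
    case 0x0E: return SK_SHIFT_OUT;
    case 0x0F: return SK_SHIFT_IN;
    case 0x19: return SK_SINGLE_SHIFT_2_7;
    case 0x1B: return SK_ESCAPE;
    case 0x8E: return SK_SINGLE_SHIFT_2;
    case 0x8F: return SK_SINGLE_SHIFT_3;
    case 0x9B: return SK_CSI;
    case 0x20: case 0x7F: return SK_0X20_OR_0X7F;
    case 0xA0: case 0xFF: return SK_0XA0_OR_0XFF;
    default: break;
    }
  if (c < 0x20)
    return SK_CONTROL_0;        /* remaining 0x00..0x1F */
  if (c < 0x80)
    return SK_GRAPHIC_PLANE_0;  /* 0x21..0x7E */
  if (c < 0xA0)
    return SK_CONTROL_1;        /* remaining 0x80..0x9F */
  return SK_GRAPHIC_PLANE_1;    /* 0xA1..0xFE */
}
```

Note that line-feed (0x0A) has no class of its own and falls into the general control class.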
509 | |
510 /** The macros CODING_ISO_FLAG_XXX define the flag bits of the | |
511 `iso-flags' attribute of an iso2022 coding system. */ | |
512 | |
513 /* If set, produce long-form designation sequence (e.g. ESC $ ( A) | |
514 instead of the correct short-form sequence (e.g. ESC $ A). */ | |
515 #define CODING_ISO_FLAG_LONG_FORM 0x0001 | |
516 | |
517 /* If set, reset graphic planes and registers at end-of-line to the | |
518 initial state. */ | |
519 #define CODING_ISO_FLAG_RESET_AT_EOL 0x0002 | |
520 | |
521 /* If set, reset graphic planes and registers before any control | |
522 characters to the initial state. */ | |
523 #define CODING_ISO_FLAG_RESET_AT_CNTL 0x0004 | |
524 | |
525 /* If set, encode in a 7-bit environment. */ | |
526 #define CODING_ISO_FLAG_SEVEN_BITS 0x0008 | |
527 | |
528 /* If set, use locking-shift function. */ | |
529 #define CODING_ISO_FLAG_LOCKING_SHIFT 0x0010 | |
530 | |
531 /* If set, use single-shift function. Overrides | |
532 CODING_ISO_FLAG_LOCKING_SHIFT. */ | |
533 #define CODING_ISO_FLAG_SINGLE_SHIFT 0x0020 | |
534 | |
535 /* If set, use designation escape sequence. */ | |
536 #define CODING_ISO_FLAG_DESIGNATION 0x0040 | |
537 | |
538 /* If set, produce revision number sequence. */ | |
539 #define CODING_ISO_FLAG_REVISION 0x0080 | |
540 | |
541 /* If set, produce ISO6429's direction specifying sequence. */ | |
542 #define CODING_ISO_FLAG_DIRECTION 0x0100 | |
543 | |
544 /* If set, assume designation states are reset at beginning of line on | |
545 output. */ | |
546 #define CODING_ISO_FLAG_INIT_AT_BOL 0x0200 | |
547 | |
548 /* If set, designation sequence should be placed at beginning of line | |
549 on output. */ | |
550 #define CODING_ISO_FLAG_DESIGNATE_AT_BOL 0x0400 | |
551 | |
552 /* If set, do not encode unsafe characters on output. */ | |
553 #define CODING_ISO_FLAG_SAFE 0x0800 | |
554 | |
555 /* If set, extra latin codes (128..159) are accepted as a valid code | |
556 on input. */ | |
557 #define CODING_ISO_FLAG_LATIN_EXTRA 0x1000 | |
558 | |
559 #define CODING_ISO_FLAG_COMPOSITION 0x2000 | |
560 | |
561 #define CODING_ISO_FLAG_EUC_TW_SHIFT 0x4000 | |
562 | |
563 #define CODING_ISO_FLAG_FULL_SUPPORT 0x8000 | |
564 | |
565 /* A character to be produced on output if encoding of the original | |
566 character is prohibited by CODING_ISO_FLAG_SAFE. */ | |
567 #define CODING_INHIBIT_CHARACTER_SUBSTITUTION '?' | |
568 | |
569 | |
570 /* UTF-16 section */ | |
571 #define CODING_UTF_16_BOM(coding) \ | |
572 ((coding)->spec.utf_16.bom) | |
573 | |
574 #define CODING_UTF_16_ENDIAN(coding) \ | |
575 ((coding)->spec.utf_16.endian) | |
576 | |
577 #define CODING_UTF_16_SURROGATE(coding) \ | |
578 ((coding)->spec.utf_16.surrogate) | |
579 | |
580 | |
581 /* CCL section */ | |
582 #define CODING_CCL_DECODER(coding) \ | |
583 AREF (CODING_ID_ATTRS ((coding)->id), coding_attr_ccl_decoder) | |
584 #define CODING_CCL_ENCODER(coding) \ | |
585 AREF (CODING_ID_ATTRS ((coding)->id), coding_attr_ccl_encoder) | |
586 #define CODING_CCL_VALIDS(coding) \ | |
587 (XSTRING (AREF (CODING_ID_ATTRS ((coding)->id), coding_attr_ccl_valids)) \ | |
588 ->data) | |
589 | |
590 /* Index for each coding category in `coding_category_table' */ | |
591 | |
592 enum coding_category | |
593 { | |
594 coding_category_iso_7, | |
595 coding_category_iso_7_tight, | |
596 coding_category_iso_8_1, | |
597 coding_category_iso_8_2, | |
598 coding_category_iso_7_else, | |
599 coding_category_iso_8_else, | |
600 coding_category_utf_8, | |
601 coding_category_utf_16_auto, | |
602 coding_category_utf_16_be, | |
603 coding_category_utf_16_le, | |
604 coding_category_utf_16_be_nosig, | |
605 coding_category_utf_16_le_nosig, | |
606 coding_category_charset, | |
607 coding_category_sjis, | |
608 coding_category_big5, | |
609 coding_category_ccl, | |
610 coding_category_emacs_mule, | |
611 /* All above are targets of code detection. */ | |
612 coding_category_raw_text, | |
613 coding_category_undecided, | |
614 coding_category_max | |
615 }; | |
616 | |
617 /* Definitions of flag bits used in detect_coding_XXXX. */ | |
618 #define CATEGORY_MASK_ISO_7 (1 << coding_category_iso_7) | |
619 #define CATEGORY_MASK_ISO_7_TIGHT (1 << coding_category_iso_7_tight) | |
620 #define CATEGORY_MASK_ISO_8_1 (1 << coding_category_iso_8_1) | |
621 #define CATEGORY_MASK_ISO_8_2 (1 << coding_category_iso_8_2) | |
622 #define CATEGORY_MASK_ISO_7_ELSE (1 << coding_category_iso_7_else) | |
623 #define CATEGORY_MASK_ISO_8_ELSE (1 << coding_category_iso_8_else) | |
624 #define CATEGORY_MASK_UTF_8 (1 << coding_category_utf_8) | |
625 #define CATEGORY_MASK_UTF_16_BE (1 << coding_category_utf_16_be) | |
626 #define CATEGORY_MASK_UTF_16_LE (1 << coding_category_utf_16_le) | |
627 #define CATEGORY_MASK_UTF_16_BE_NOSIG (1 << coding_category_utf_16_be_nosig) | |
628 #define CATEGORY_MASK_UTF_16_LE_NOSIG (1 << coding_category_utf_16_le_nosig) | |
629 #define CATEGORY_MASK_CHARSET (1 << coding_category_charset) | |
630 #define CATEGORY_MASK_SJIS (1 << coding_category_sjis) | |
631 #define CATEGORY_MASK_BIG5 (1 << coding_category_big5) | |
632 #define CATEGORY_MASK_CCL (1 << coding_category_ccl) | |
633 #define CATEGORY_MASK_EMACS_MULE (1 << coding_category_emacs_mule) | |
634 | |
635 /* This value is returned if detect_coding_mask () finds nothing other | |
636 than ASCII characters. */ | |
637 #define CATEGORY_MASK_ANY \ | |
638 (CATEGORY_MASK_ISO_7 \ | |
639 | CATEGORY_MASK_ISO_7_TIGHT \ | |
640 | CATEGORY_MASK_ISO_8_1 \ | |
641 | CATEGORY_MASK_ISO_8_2 \ | |
642 | CATEGORY_MASK_ISO_7_ELSE \ | |
643 | CATEGORY_MASK_ISO_8_ELSE \ | |
644 | CATEGORY_MASK_UTF_8 \ | |
645 | CATEGORY_MASK_UTF_16_BE \ | |
646 | CATEGORY_MASK_UTF_16_LE \ | |
647 | CATEGORY_MASK_UTF_16_BE_NOSIG \ | |
648 | CATEGORY_MASK_UTF_16_LE_NOSIG \ | |
649 | CATEGORY_MASK_CHARSET \ | |
650 | CATEGORY_MASK_SJIS \ | |
651 | CATEGORY_MASK_BIG5 \ | |
652 | CATEGORY_MASK_CCL \ | |
653 | CATEGORY_MASK_EMACS_MULE) | |
654 | |
655 | |
656 #define CATEGORY_MASK_ISO_7BIT \ | |
657 (CATEGORY_MASK_ISO_7 | CATEGORY_MASK_ISO_7_TIGHT) | |
658 | |
659 #define CATEGORY_MASK_ISO_8BIT \ | |
660 (CATEGORY_MASK_ISO_8_1 | CATEGORY_MASK_ISO_8_2) | |
661 | |
662 #define CATEGORY_MASK_ISO_ELSE \ | |
663 (CATEGORY_MASK_ISO_7_ELSE | CATEGORY_MASK_ISO_8_ELSE) | |
664 | |
665 #define CATEGORY_MASK_ISO_ESCAPE \ | |
666 (CATEGORY_MASK_ISO_7 \ | |
667 | CATEGORY_MASK_ISO_7_TIGHT \ | |
668 | CATEGORY_MASK_ISO_7_ELSE \ | |
669 | CATEGORY_MASK_ISO_8_ELSE) | |
670 | |
671 #define CATEGORY_MASK_ISO \ | |
672 ( CATEGORY_MASK_ISO_7BIT \ | |
673 | CATEGORY_MASK_ISO_8BIT \ | |
674 | CATEGORY_MASK_ISO_ELSE) | |
675 | |
676 #define CATEGORY_MASK_UTF_16 \ | |
677 (CATEGORY_MASK_UTF_16_BE \ | |
678 | CATEGORY_MASK_UTF_16_LE \ | |
679 | CATEGORY_MASK_UTF_16_BE_NOSIG \ | |
680 | CATEGORY_MASK_UTF_16_LE_NOSIG) | |
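Each category contributes one bit, so a set of candidate categories is an ordinary integer mask that can be narrowed with `&` or have a category ruled out with `& ~`. The sketch below shows both operations on a shortened, hypothetical category list (`CAT_*`, `MASK`, `narrow_to` and `rule_out` are illustrative names, not identifiers from the file).

```c
#include <assert.h>

/* Hypothetical, shortened category list; the enum in the file has more
   entries.  */
enum category
{
  CAT_ISO_7, CAT_UTF_8, CAT_UTF_16_BE, CAT_UTF_16_LE, CAT_SJIS,
  CAT_MAX
};

#define MASK(cat)   (1 << (cat))
#define MASK_UTF_16 (MASK (CAT_UTF_16_BE) | MASK (CAT_UTF_16_LE))
#define MASK_ANY    ((1 << CAT_MAX) - 1)

/* A detector positively identified CAT: keep only that bit.  */
static int
narrow_to (int mask, enum category cat)
{
  assert (cat < CAT_MAX);
  return mask & MASK (cat);
}

/* A detector ruled CAT out: clear only that bit.  */
static int
rule_out (int mask, enum category cat)
{
  assert (cat < CAT_MAX);
  return mask & ~MASK (cat);
}
```

Starting from `MASK_ANY`, a sequence of detectors can call `rule_out` until one of them succeeds and calls `narrow_to`; grouped masks such as `MASK_UTF_16` then answer "is any UTF-16 variant still possible?" with a single `&`.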
681 | |
682 | |
683 /* List of symbols `coding-category-xxx' ordered by priority. This | |
684 variable is exposed to Emacs Lisp. */ | |
685 static Lisp_Object Vcoding_category_list; | |
686 | |
687 /* Table of coding categories (Lisp symbols). This variable is for | |
688 internal use only. */ | |
689 static Lisp_Object Vcoding_category_table; | |
690 | |
691 /* Table of coding-categories ordered by priority. */ | |
692 static enum coding_category coding_priorities[coding_category_max]; | |
693 | |
694 /* Nth element is a coding context for the coding system bound to the | |
695 Nth coding category. */ | |
696 static struct coding_system coding_categories[coding_category_max]; | |
697 | |
698 static int detected_mask[coding_category_raw_text] = | |
699 { CATEGORY_MASK_ISO, | |
700 CATEGORY_MASK_ISO, | |
701 CATEGORY_MASK_ISO, | |
702 CATEGORY_MASK_ISO, | |
703 CATEGORY_MASK_ISO, | |
704 CATEGORY_MASK_ISO, | |
705 CATEGORY_MASK_UTF_8, | |
706 CATEGORY_MASK_UTF_16, | |
707 CATEGORY_MASK_UTF_16, | |
708 CATEGORY_MASK_UTF_16, | |
709 CATEGORY_MASK_UTF_16, | |
710 CATEGORY_MASK_UTF_16, | |
711 CATEGORY_MASK_CHARSET, | |
712 CATEGORY_MASK_SJIS, | |
713 CATEGORY_MASK_BIG5, | |
714 CATEGORY_MASK_CCL, | |
715 CATEGORY_MASK_EMACS_MULE | |
716 }; | |
717 | |
718 /*** Commonly used macros and functions ***/ | |
719 | |
720 #ifndef min | |
721 #define min(a, b) ((a) < (b) ? (a) : (b)) | |
722 #endif | |
723 #ifndef max | |
724 #define max(a, b) ((a) > (b) ? (a) : (b)) | |
725 #endif | |
726 | |
727 #define CODING_GET_INFO(coding, attrs, eol_type, charset_list) \ | |
728 do { \ | |
729 attrs = CODING_ID_ATTRS (coding->id); \ | |
730 eol_type = CODING_ID_EOL_TYPE (coding->id); \ | |
731 if (VECTORP (eol_type)) \ | |
732 eol_type = Qunix; \ | |
733 charset_list = CODING_ATTR_CHARSET_LIST (attrs); \ | |
734 } while (0) | |
735 | |
736 | |
737 /* Safely get one byte from the source text pointed to by SRC, which ends | |
738 at SRC_END, and set C to that byte. If there are not enough bytes | |
739 in the source, jump to `no_more_source'. The caller | |
740 should declare and set these variables appropriately in advance: | |
741 src, src_base, src_end, multibytep, consumed_chars | |
742 */ | |
743 | |
744 #define ONE_MORE_BYTE(c) \ | |
745 do { \ | |
746 if (src == src_end) \ | |
747 { \ | |
748 if (src_base < src) \ | |
749 coding->result = CODING_RESULT_INSUFFICIENT_SRC; \ | |
750 goto no_more_source; \ | |
751 } \ | |
752 c = *src++; \ | |
753 if (multibytep && (c & 0x80)) \ | |
754 { \ | |
755 if ((c & 0xFE) != 0xC0) \ | |
756 error ("Undecodable char found"); \ | |
757 c = ((c & 1) << 6) | *src++; \ | |
758 } \ | |
759 consumed_chars++; \ | |
760 } while (0) | |
761 | |
762 | |
763 #define ONE_MORE_BYTE_NO_CHECK(c) \ | |
764 do { \ | |
765 c = *src++; \ | |
766 if (multibytep && (c & 0x80)) \ | |
767 { \ | |
768 if ((c & 0xFE) != 0xC0) \ | |
769 error ("Undecodable char found"); \ | |
770 c = ((c & 1) << 6) | *src++; \ | |
771 } \ | |
772 } while (0) | |
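The multibyte branch of the two macros above accepts only the two-byte form whose first byte is 0xC0 or 0xC1 and folds it back into a single 8-bit value. A standalone sketch of that arithmetic follows; `read_one_byte` is a hypothetical name, and unlike the macros it also checks that the second byte is actually available and reports failure through its return value instead of signalling an error.

```c
#include <assert.h>
#include <stddef.h>

/* Read one logical byte from P, where LEN bytes are available.  When
   MULTIBYTE is nonzero, a byte with the high bit set must start the
   two-byte form 0xC0/0xC1 + trailing byte.  Store the value in *OUT and
   return the number of bytes consumed, or 0 on invalid or short input.  */
static size_t
read_one_byte (const unsigned char *p, size_t len, int multibyte, int *out)
{
  assert (p != NULL && out != NULL);
  if (len == 0)
    return 0;

  int c = p[0];
  if (!multibyte || !(c & 0x80))
    {
      /* Unibyte source, or plain ASCII: the byte stands for itself.  */
      *out = c;
      return 1;
    }
  if ((c & 0xFE) != 0xC0 || len < 2)
    return 0;

  /* The low bit of the first byte selects 0x80..0xBF or 0xC0..0xFF.  */
  *out = ((c & 1) << 6) | p[1];
  return 2;
}
```

For example, the pair 0xC1 0xBF yields 0xFF and the pair 0xC0 0x80 yields 0x80, while a leading 0xE3 in a multibyte source is rejected.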
773 | |
774 | |
775 /* Store a byte C in the place pointed to by DST, advance DST to the | |
776 next free point, and increment PRODUCED_CHARS. The caller should | |
777 ensure that C is 0..127, and declare and set the variable `dst' | |
778 appropriately in advance. | |
779 */ | |
780 | |
781 | |
782 #define EMIT_ONE_ASCII_BYTE(c) \ | |
783 do { \ | |
784 produced_chars++; \ | |
785 *dst++ = (c); \ | |
786 } while (0) | |
787 | |
788 | |
789 /* Like EMIT_ONE_ASCII_BYTE but store two bytes; C1 and C2. */ | |
790 | |
791 #define EMIT_TWO_ASCII_BYTES(c1, c2) \ | |
792 do { \ | |
793 produced_chars += 2; \ | |
794 *dst++ = (c1), *dst++ = (c2); \ | |
795 } while (0) | |
796 | |
797 | |
798 /* Store a byte C in the place pointed to by DST, advance DST to the | |
799 next free point, and increment PRODUCED_CHARS. If MULTIBYTEP is | |
800 nonzero, store in an appropriate multibyte form. The caller should | |
801 declare and set the variables `dst' and `multibytep' appropriately | |
802 in advance. */ | |
803 | |
804 #define EMIT_ONE_BYTE(c) \ | |
805 do { \ | |
806 produced_chars++; \ | |
807 if (multibytep) \ | |
808 { \ | |
809 int ch = (c); \ | |
810 if (ch >= 0x80) \ | |
811 ch = BYTE8_TO_CHAR (ch); \ | |
812 CHAR_STRING_ADVANCE (ch, dst); \ | |
813 } \ | |
814 else \ | |
815 *dst++ = (c); \ | |
816 } while (0) | |
817 | |
818 | |
819 /* Like EMIT_ONE_BYTE, but emit two bytes; C1 and C2. */ | |
820 | |
821 #define EMIT_TWO_BYTES(c1, c2) \ | |
822 do { \ | |
823 produced_chars += 2; \ | |
824 if (multibytep) \ | |
825 { \ | |
826 CHAR_STRING_ADVANCE ((int) (c1), dst); \ | |
827 CHAR_STRING_ADVANCE ((int) (c2), dst); \ | |
828 } \ | |
829 else \ | |
830 { \ | |
831 *dst++ = (c1); \ | |
832 *dst++ = (c2); \ | |
833 } \ | |
834 } while (0) | |
835 | |
836 | |
837 #define EMIT_THREE_BYTES(c1, c2, c3) \ | |
838 do { \ | |
839 EMIT_ONE_BYTE (c1); \ | |
840 EMIT_TWO_BYTES (c2, c3); \ | |
841 } while (0) | |
842 | |
843 | |
844 #define EMIT_FOUR_BYTES(c1, c2, c3, c4) \ | |
845 do { \ | |
846 EMIT_TWO_BYTES (c1, c2); \ | |
847 EMIT_TWO_BYTES (c3, c4); \ | |
848 } while (0) | |
849 | |
850 | |
851 #define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \ | |
852 do { \ | |
853 charset_map_loaded = 0; \ | |
854 c = DECODE_CHAR (charset, code); \ | |
855 if (charset_map_loaded) \ | |
856 { \ | |
857 unsigned char *orig = coding->source; \ | |
858 EMACS_INT offset; \ | |
859 \ | |
860 coding_set_source (coding); \ | |
861 offset = coding->source - orig; \ | |
862 src += offset; \ | |
863 src_base += offset; \ | |
864 src_end += offset; \ | |
865 } \ | |
866 } while (0) | |
867 | |
868 | |
869 #define ASSURE_DESTINATION(bytes) \ | |
870 do { \ | |
871 if (dst + (bytes) >= dst_end) \ | |
872 { \ | |
873 int more_bytes = charbuf_end - charbuf + (bytes); \ | |
874 \ | |
875 dst = alloc_destination (coding, more_bytes, dst); \ | |
876 dst_end = coding->destination + coding->dst_bytes; \ | |
877 } \ | |
878 } while (0) | |
879 | |
880 | |
881 | |
882 static void | |
883 coding_set_source (coding) | |
884 struct coding_system *coding; | |
885 { | |
886 if (BUFFERP (coding->src_object)) | |
887 { | |
888 if (coding->src_pos < 0) | |
889 coding->source = GAP_END_ADDR + coding->src_pos_byte; | |
890 else | |
891 { | |
892 if (coding->src_pos < GPT | |
893 && coding->src_pos + coding->src_chars >= GPT) | |
894 move_gap_both (coding->src_pos, coding->src_pos_byte); | |
895 coding->source = BYTE_POS_ADDR (coding->src_pos_byte); | |
896 } | |
897 } | |
898 else if (STRINGP (coding->src_object)) | |
899 { | |
900 coding->source = (XSTRING (coding->src_object)->data | |
901 + coding->src_pos_byte); | |
902 } | |
903 else | |
904 /* Otherwise, the source is a C string and is never relocated | |
905 automatically. Thus we don't have to update anything. */ | |
906 ; | |
907 } | |
908 | |
909 static void | |
910 coding_set_destination (coding) | |
911 struct coding_system *coding; |
912 { |
88365 | 913 if (BUFFERP (coding->dst_object)) |
914 { | |
915 /* We are sure that coding->dst_pos_byte is before the gap of the | |
916 buffer. */ | |
917 coding->destination = (BUF_BEG_ADDR (XBUFFER (coding->dst_object)) | |
918 + coding->dst_pos_byte - 1); | |
919 if (coding->src_pos < 0) | |
920 /* The source and destination are in the same buffer. */ | |
921 coding->dst_bytes = (GAP_END_ADDR | |
922 - (coding->src_bytes - coding->consumed) | |
923 - coding->destination); | |
924 else | |
925 coding->dst_bytes = (BUF_GAP_END_ADDR (XBUFFER (coding->dst_object)) | |
926 - coding->destination); | |
927 } | |
928 else | |
929 /* Otherwise, the destination is a C string and is never relocated | |
930 automatically. Thus we don't have to update anything. */ | |
931 ; | |
932 } | |
933 | |
934 | |
935 static void | |
936 coding_alloc_by_realloc (coding, bytes) | |
937 struct coding_system *coding; | |
938 EMACS_INT bytes; | |
939 { | |
940 coding->destination = (unsigned char *) xrealloc (coding->destination, | |
941 coding->dst_bytes + bytes); | |
942 coding->dst_bytes += bytes; | |
943 } |
944 |
88365 | 945 static void |
946 coding_alloc_by_making_gap (coding, bytes) | |
947 struct coding_system *coding; | |
948 EMACS_INT bytes; | |
949 { | |
950 Lisp_Object this_buffer; | |
951 | |
952 this_buffer = Fcurrent_buffer (); | |
953 if (EQ (this_buffer, coding->dst_object)) | |
954 { | |
955 EMACS_INT add = coding->src_bytes - coding->consumed; | |
956 | |
957 GAP_SIZE -= add; ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add; | |
958 make_gap (bytes); | |
959 GAP_SIZE += add; ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add; | |
960 } | |
961 else | |
962 { | |
963 set_buffer_internal (XBUFFER (coding->dst_object)); | |
964 make_gap (bytes); | |
965 set_buffer_internal (XBUFFER (this_buffer)); | |
966 } | |
967 } | |
968 | |
969 | |
970 static unsigned char * | |
971 alloc_destination (coding, nbytes, dst) | |
972 struct coding_system *coding; | |
973 int nbytes; | |
974 unsigned char *dst; | |
975 { | |
976 EMACS_INT offset = dst - coding->destination; | |
977 | |
978 if (BUFFERP (coding->dst_object)) | |
979 coding_alloc_by_making_gap (coding, nbytes); | |
980 else | |
981 coding_alloc_by_realloc (coding, nbytes); | |
982 coding->result = CODING_RESULT_SUCCESS; | |
983 coding_set_destination (coding); | |
984 dst = coding->destination + offset; | |
985 return dst; | |
986 } | |
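`alloc_destination' remembers the write position as an offset from the start of the destination before growing it, then rebuilds the pointer afterwards, because growing may move the storage. The same idea for a plain heap buffer might look like the sketch below; `struct out_buf` and `grow_and_rebase` are hypothetical names, and the standard `realloc` stands in for the allocation routines used in the file.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable output buffer.  */
struct out_buf
{
  unsigned char *base;  /* start of the storage */
  size_t size;          /* bytes currently allocated */
};

/* Grow BUF by EXTRA bytes and return the write pointer that corresponds
   to DST in the possibly relocated storage.  Return NULL, leaving BUF
   unchanged, if the allocation fails.  */
static unsigned char *
grow_and_rebase (struct out_buf *buf, size_t extra, unsigned char *dst)
{
  assert (dst >= buf->base && dst <= buf->base + buf->size);

  /* Save the position as an offset; DST itself may dangle after realloc.  */
  size_t offset = (size_t) (dst - buf->base);
  unsigned char *p = realloc (buf->base, buf->size + extra);
  if (p == NULL)
    return NULL;

  buf->base = p;
  buf->size += extra;
  return buf->base + offset;
}
```

Any other pointer into the old storage, such as an end-of-buffer pointer, has to be recomputed in the same way after the call.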
987 | |
988 | |
989 /*** 2. Emacs' internal format (emacs-utf-8) ***/ | |
990 | |
991 | |
992 |
17052 | 993 |
88365 | 994 /*** 3. UTF-8 ***/ |
995 | |
996 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | |
997 Check if a text is encoded in UTF-8. If it is, narrow *MASK to | |
998 CATEGORY_MASK_UTF_8 and return 1, else return 0. */ | |
999 | |
1000 #define UTF_8_1_OCTET_P(c) ((c) < 0x80) | |
1001 #define UTF_8_EXTRA_OCTET_P(c) (((c) & 0xC0) == 0x80) | |
1002 #define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0) | |
1003 #define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0) | |
1004 #define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0) | |
1005 #define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8) | |
1006 | |
1007 static int | |
1008 detect_coding_utf_8 (coding, mask) | |
1009 struct coding_system *coding; | |
1010 int *mask; | |
1011 { | |
1012 unsigned char *src = coding->source, *src_base = src; | |
1013 unsigned char *src_end = coding->source + coding->src_bytes; | |
1014 int multibytep = coding->src_multibyte; | |
1015 int consumed_chars = 0; | |
1016 int found = 0; | |
1017 | |
1018 /* A coding system of this category is always ASCII compatible. */ | |
1019 src += coding->head_ascii; | |
1020 | |
1021 while (1) | |
1022 { | |
1023 int c, c1, c2, c3, c4; | |
1024 | |
1025 ONE_MORE_BYTE (c); | |
1026 if (UTF_8_1_OCTET_P (c)) | |
1027 continue; | |
1028 ONE_MORE_BYTE (c1); | |
1029 if (! UTF_8_EXTRA_OCTET_P (c1)) | |
1030 break; | |
1031 if (UTF_8_2_OCTET_LEADING_P (c)) | |
1032 { | |
1033 found++; | |
1034 continue; | |
1035 } | |
1036 ONE_MORE_BYTE (c2); | |
1037 if (! UTF_8_EXTRA_OCTET_P (c2)) | |
1038 break; | |
1039 if (UTF_8_3_OCTET_LEADING_P (c)) | |
1040 { | |
1041 found++; | |
1042 continue; | |
1043 } | |
1044 ONE_MORE_BYTE (c3); | |
1045 if (! UTF_8_EXTRA_OCTET_P (c3)) | |
1046 break; | |
1047 if (UTF_8_4_OCTET_LEADING_P (c)) | |
1048 { | |
1049 found++; | |
1050 continue; | |
1051 } | |
1052 ONE_MORE_BYTE (c4); | |
1053 if (! UTF_8_EXTRA_OCTET_P (c4)) | |
1054 break; | |
1055 if (UTF_8_5_OCTET_LEADING_P (c)) | |
1056 { | |
1057 found++; | |
1058 continue; | |
1059 } | |
1060 break; | |
1061 } | |
1062 *mask &= ~CATEGORY_MASK_UTF_8; | |
1063 return 0; | |
1064 | |
1065 no_more_source: | |
1066 if (! found) | |
1067 return 0; | |
1068 *mask &= CATEGORY_MASK_UTF_8; | |
1069 return 1; | |
1070 } | |
1071 | |
1072 | |
1073 static void | |
1074 decode_coding_utf_8 (coding) | |
1075 struct coding_system *coding; | |
1076 { | |
1077 unsigned char *src = coding->source + coding->consumed; | |
1078 unsigned char *src_end = coding->source + coding->src_bytes; | |
1079 unsigned char *src_base; | |
1080 int *charbuf = coding->charbuf; | |
1081 int *charbuf_end = charbuf + coding->charbuf_size; | |
1082 int consumed_chars = 0, consumed_chars_base; | |
1083 int multibytep = coding->src_multibyte; | |
1084 Lisp_Object attr, eol_type, charset_list; | |
1085 | |
1086 CODING_GET_INFO (coding, attr, eol_type, charset_list); | |
1087 | |
1088 while (1) | |
1089 { | |
1090 int c, c1, c2, c3, c4, c5; | |
1091 | |
1092 src_base = src; | |
1093 consumed_chars_base = consumed_chars; | |
1094 | |
1095 if (charbuf >= charbuf_end) | |
1096 break; | |
1097 | |
1098 ONE_MORE_BYTE (c1); | |
1099 if (UTF_8_1_OCTET_P(c1)) | |
1100 { | |
1101 c = c1; | |
1102 if (c == '\r') | |
1103 { | |
1104 if (EQ (eol_type, Qdos)) | |
1105 { | |
1106 if (src == src_end) | |
1107 goto no_more_source; | |
1108 if (*src == '\n') | |
1109 ONE_MORE_BYTE (c); | |
1110 } | |
1111 else if (EQ (eol_type, Qmac)) | |
1112 c = '\n'; | |
1113 } | |
1114 } | |
1115 else | |
1116 { | |
1117 ONE_MORE_BYTE (c2); | |
1118 if (! UTF_8_EXTRA_OCTET_P (c2)) | |
1119 goto invalid_code; | |
1120 if (UTF_8_2_OCTET_LEADING_P (c1)) | |
1121 c = ((c1 & 0x1F) << 6) | (c2 & 0x3F); | |
1122 else | |
1123 { | |
1124 ONE_MORE_BYTE (c3); | |
1125 if (! UTF_8_EXTRA_OCTET_P (c3)) | |
1126 goto invalid_code; | |
1127 if (UTF_8_3_OCTET_LEADING_P (c1)) | |
1128 c = (((c1 & 0xF) << 12) | |
1129 | ((c2 & 0x3F) << 6) | (c3 & 0x3F)); | |
1130 else | |
1131 { | |
1132 ONE_MORE_BYTE (c4); | |
1133 if (! UTF_8_EXTRA_OCTET_P (c4)) | |
1134 goto invalid_code; | |
1135 if (UTF_8_4_OCTET_LEADING_P (c1)) | |
1136 c = (((c1 & 0x7) << 18) | ((c2 & 0x3F) << 12) | |
1137 | ((c3 & 0x3F) << 6) | (c4 & 0x3F)); | |
1138 else | |
1139 { | |
1140 ONE_MORE_BYTE (c5); | |
1141 if (! UTF_8_EXTRA_OCTET_P (c5)) | |
1142 goto invalid_code; | |
1143 if (UTF_8_5_OCTET_LEADING_P (c1)) | |
1144 { | |
1145 c = (((c1 & 0x3) << 24) | ((c2 & 0x3F) << 18) | |
1146 | ((c3 & 0x3F) << 12) | ((c4 & 0x3F) << 6) | |
1147 | (c5 & 0x3F)); | |
1148 if (c > MAX_CHAR) | |
1149 goto invalid_code; | |
1150 } | |
1151 else | |
1152 goto invalid_code; | |
1153 } | |
1154 } | |
1155 } | |
1156 } | |
1157 | |
1158 *charbuf++ = c; | |
1159 continue; | |
1160 | |
1161 invalid_code: | |
1162 src = src_base; | |
1163 consumed_chars = consumed_chars_base; | |
1164 ONE_MORE_BYTE (c); | |
1165 *charbuf++ = ASCII_BYTE_P (c) ? c : BYTE8_TO_CHAR (c); | |
1166 coding->errors++; | |
1167 } | |
1168 | |
1169 no_more_source: | |
1170 coding->consumed_char += consumed_chars_base; | |
1171 coding->consumed = src_base - coding->source; | |
1172 coding->charbuf_used = charbuf - coding->charbuf; | |
1173 } | |
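The bit arithmetic of the decoder can be exercised in isolation. The sketch below decodes a single sequence with the same leading-octet tests and the same 6-bit accumulation; `utf8_decode_one` and `EXTRA_OCTET_P` are hypothetical names. Like the function above it accepts sequences of up to five octets and does not reject overlong forms, and it leaves the upper-bound check on the resulting character to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define EXTRA_OCTET_P(c) (((c) & 0xC0) == 0x80)

/* Decode one sequence starting at S, where LEN bytes are available.
   Store the code in *OUT and return the sequence length, or return 0
   if the bytes are invalid or truncated.  */
static size_t
utf8_decode_one (const unsigned char *s, size_t len, long *out)
{
  assert (s != NULL && out != NULL);
  if (len == 0)
    return 0;

  int c1 = s[0];
  size_t n;  /* total length announced by the leading octet */
  long c;    /* payload bits accumulated so far */

  if (c1 < 0x80)
    {
      *out = c1;
      return 1;
    }
  else if ((c1 & 0xE0) == 0xC0) { n = 2; c = c1 & 0x1F; }
  else if ((c1 & 0xF0) == 0xE0) { n = 3; c = c1 & 0x0F; }
  else if ((c1 & 0xF8) == 0xF0) { n = 4; c = c1 & 0x07; }
  else if ((c1 & 0xFC) == 0xF8) { n = 5; c = c1 & 0x03; }
  else
    return 0;

  if (len < n)
    return 0;
  for (size_t i = 1; i < n; i++)
    {
      if (!EXTRA_OCTET_P (s[i]))
        return 0;
      /* Each extra octet contributes its low six bits.  */
      c = (c << 6) | (s[i] & 0x3F);
    }
  *out = c;
  return n;
}
```

For instance, the bytes 0xE2 0x82 0xAC decode to 0x20AC, and 0xC3 followed by an ASCII byte is reported as invalid.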
1174 | |
1175 | |
1176 static int | |
1177 encode_coding_utf_8 (coding) | |
1178 struct coding_system *coding; | |
1179 { | |
1180 int multibytep = coding->dst_multibyte; | |
1181 int *charbuf = coding->charbuf; | |
1182 int *charbuf_end = charbuf + coding->charbuf_used; | |
1183 unsigned char *dst = coding->destination + coding->produced; | |
1184 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
1185 int produced_chars = 0; | |
1186 int c; | |
1187 | |
1188 if (multibytep) | |
1189 { | |
1190 int safe_room = MAX_MULTIBYTE_LENGTH * 2; | |
1191 | |
1192 while (charbuf < charbuf_end) | |
1193 { | |
1194 unsigned char str[MAX_MULTIBYTE_LENGTH], *p, *pend = str; | |
1195 | |
1196 ASSURE_DESTINATION (safe_room); | |
1197 c = *charbuf++; | |
1198 CHAR_STRING_ADVANCE (c, pend); | |
1199 for (p = str; p < pend; p++) | |
1200 EMIT_ONE_BYTE (*p); | |
1201 } | |
1202 } | |
1203 else | |
1204 { | |
1205 int safe_room = MAX_MULTIBYTE_LENGTH; | |
1206 | |
1207 while (charbuf < charbuf_end) | |
1208 { | |
1209 ASSURE_DESTINATION (safe_room); | |
1210 c = *charbuf++; | |
1211 dst += CHAR_STRING (c, dst); | |
1212 produced_chars++; | |
1213 } | |
1214 } | |
1215 coding->result = CODING_RESULT_SUCCESS; | |
1216 coding->produced_char += produced_chars; | |
1217 coding->produced = dst - coding->destination; | |
1218 return 0; | |
1219 } | |
1220 | |
1221 | |
1222 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | |
1223 Check if a text starts with a UTF-16 byte order mark, Big Endian | |
1224 (0xFE 0xFF) or Little Endian (0xFF 0xFE). If it does, narrow *MASK to | |
1225 CATEGORY_MASK_UTF_16_BE or CATEGORY_MASK_UTF_16_LE and return 1, | |
1226 else return 0. */ | |
1227 | |
1228 #define UTF_16_HIGH_SURROGATE_P(val) \ | |
1229 (((val) & 0xFC00) == 0xD800) | |
1230 | |
1231 #define UTF_16_LOW_SURROGATE_P(val) \ | |
1232 (((val) & 0xFC00) == 0xDC00) | |
1233 | |
1234 #define UTF_16_INVALID_P(val) \ | |
1235 (((val) == 0xFFFE) \ | |
1236 || ((val) == 0xFFFF) \ | |
1237 || UTF_16_LOW_SURROGATE_P (val)) | |
1238 | |
1239 | |
1240 static int | |
1241 detect_coding_utf_16 (coding, mask) | |
1242 struct coding_system *coding; | |
1243 int *mask; | |
1244 { | |
1245 unsigned char *src = coding->source, *src_base = src; | |
1246 unsigned char *src_end = coding->source + coding->src_bytes; | |
1247 int multibytep = coding->src_multibyte; | |
1248 int consumed_chars = 0; | |
1249 int c1, c2; | |
1250 | |
1251 ONE_MORE_BYTE (c1); | |
1252 ONE_MORE_BYTE (c2); | |
1253 | |
1254 if ((c1 == 0xFF) && (c2 == 0xFE)) | |
1255 { | |
1256 *mask &= CATEGORY_MASK_UTF_16_LE; | |
1257 return 1; | |
1258 } | |
1259 else if ((c1 == 0xFE) && (c2 == 0xFF)) | |
1260 { | |
1261 *mask &= CATEGORY_MASK_UTF_16_BE; | |
1262 return 1; | |
1263 } | |
1264 no_more_source: | |
1265 return 0; | |
1266 } | |
1267 | |
1268 static void | |
1269 decode_coding_utf_16 (coding) | |
1270 struct coding_system *coding; | |
1271 { | |
1272 unsigned char *src = coding->source + coding->consumed; | |
1273 unsigned char *src_end = coding->source + coding->src_bytes; | |
1274 unsigned char *src_base; |
88365 | 1275 int *charbuf = coding->charbuf; |
1276 int *charbuf_end = charbuf + coding->charbuf_size; | |
1277 int consumed_chars = 0, consumed_chars_base; | |
1278 int multibytep = coding->src_multibyte; | |
1279 enum utf_16_bom_type bom = CODING_UTF_16_BOM (coding); | |
1280 enum utf_16_endian_type endian = CODING_UTF_16_ENDIAN (coding); | |
1281 int surrogate = CODING_UTF_16_SURROGATE (coding); | |
1282 Lisp_Object attr, eol_type, charset_list; | |
1283 | |
1284 CODING_GET_INFO (coding, attr, eol_type, charset_list); | |
1285 | |
1286 if (bom != utf_16_without_bom) | |
1287 { | |
1288 int c, c1, c2; | |
1289 | |
1290 src_base = src; | |
1291 ONE_MORE_BYTE (c1); | |
1292 ONE_MORE_BYTE (c2); | |
1293 c = (c1 << 8) | c2; | |
1294 if (bom == utf_16_with_bom) | |
1295 { | |
1296 if (endian == utf_16_big_endian | |
1297 ? c != 0xFEFF : c != 0xFFFE) | |
1298 { | |
1299 /* We are sure that there's enough room at CHARBUF. */ | |
1300 *charbuf++ = c1; | |
1301 *charbuf++ = c2; | |
1302 coding->errors++; | |
1303 } | |
1304 } | |
1305 else | |
1306 { | |
1307 if (c == 0xFEFF) | |
1308 CODING_UTF_16_ENDIAN (coding) | |
1309 = endian = utf_16_big_endian; | |
1310 else if (c == 0xFFFE) | |
1311 CODING_UTF_16_ENDIAN (coding) | |
1312 = endian = utf_16_little_endian; | |
1313 else | |
1314 { | |
1315 CODING_UTF_16_ENDIAN (coding) | |
1316 = endian = utf_16_big_endian; | |
1317 src = src_base; | |
1318 } | |
1319 } | |
1320 CODING_UTF_16_BOM (coding) = utf_16_with_bom; | |
1321 } | |
1322 | |
1323 while (1) | |
1324 { | |
1325 int c, c1, c2; | |
1326 | |
1327 src_base = src; | |
1328 consumed_chars_base = consumed_chars; | |
1329 | |
1330 if (charbuf + 2 >= charbuf_end) | |
1331 break; | |
1332 | |
1333 ONE_MORE_BYTE (c1); | |
1334 ONE_MORE_BYTE (c2); | |
1335 c = (endian == utf_16_big_endian | |
1336 ? ((c1 << 8) | c2) : ((c2 << 8) | c1)); | |
1337 if (surrogate) | |
1338 { | |
1339 if (! UTF_16_LOW_SURROGATE_P (c)) | |
1340 { | |
1341 if (endian == utf_16_big_endian) | |
1342 c1 = surrogate >> 8, c2 = surrogate & 0xFF; | |
1343 else | |
1344 c1 = surrogate & 0xFF, c2 = surrogate >> 8; | |
1345 *charbuf++ = c1; | |
1346 *charbuf++ = c2; | |
1347 coding->errors++; | |
1348 if (UTF_16_HIGH_SURROGATE_P (c)) | |
1349 CODING_UTF_16_SURROGATE (coding) = surrogate = c; | |
1350 else | |
1351 *charbuf++ = c; | |
1352 } | |
1353 else | |
1354 { | |
1355 c = (((surrogate - 0xD800) << 10) | (c - 0xDC00)) + 0x10000; | |
1356 CODING_UTF_16_SURROGATE (coding) = surrogate = 0; | |
1357 *charbuf++ = c; | |
1358 } | |
1359 } | |
1360 else | |
1361 { | |
1362 if (UTF_16_HIGH_SURROGATE_P (c)) | |
1363 CODING_UTF_16_SURROGATE (coding) = surrogate = c; | |
1364 else | |
1365 *charbuf++ = c; | |
1366 } | |
1367 } | |
1368 | |
1369 no_more_source: | |
1370 coding->consumed_char += consumed_chars_base; | |
1371 coding->consumed = src_base - coding->source; | |
1372 coding->charbuf_used = charbuf - coding->charbuf; | |
1373 } | |
1374 | |
1375 static int | |
1376 encode_coding_utf_16 (coding) | |
1377 struct coding_system *coding; | |
1378 { | |
1379 int multibytep = coding->dst_multibyte; | |
1380 int *charbuf = coding->charbuf; | |
1381 int *charbuf_end = charbuf + coding->charbuf_used; | |
1382 unsigned char *dst = coding->destination + coding->produced; | |
1383 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
1384 int safe_room = 8; | |
1385 enum utf_16_bom_type bom = CODING_UTF_16_BOM (coding); | |
1386 int big_endian = CODING_UTF_16_ENDIAN (coding) == utf_16_big_endian; | |
1387 int produced_chars = 0; | |
1388 Lisp_Object attrs, eol_type, charset_list; | |
1389 int c; | |
1390 | |
1391 CODING_GET_INFO (coding, attrs, eol_type, charset_list); | |
1392 | |
1393 if (bom == utf_16_with_bom) | |
1394 { | |
1395 ASSURE_DESTINATION (safe_room); | |
1396 if (big_endian) | |
1397 EMIT_TWO_BYTES (0xFE, 0xFF); | |
1398 else | |
1399 EMIT_TWO_BYTES (0xFF, 0xFE); | |
1400 CODING_UTF_16_BOM (coding) = utf_16_without_bom; | |
1401 } | |
1402 | |
1403 while (charbuf < charbuf_end) | |
1404 { | |
1405 ASSURE_DESTINATION (safe_room); | |
1406 c = *charbuf++; | |
1407 if (c >= 0x110000) | |
1408 c = 0xFFFF; | |
1409 | |
1410 if (c < 0x10000) | |
1411 { | |
1412 if (big_endian) | |
1413 EMIT_TWO_BYTES (c >> 8, c & 0xFF); | |
1414 else | |
1415 EMIT_TWO_BYTES (c & 0xFF, c >> 8); | |
1416 } | |
1417 else | |
1418 { | |
1419 int c1, c2; | |
1420 | |
1421 c -= 0x10000; | |
1422 c1 = (c >> 10) + 0xD800; | |
1423 c2 = (c & 0x3FF) + 0xDC00; | |
1424 if (big_endian) | |
1425 EMIT_FOUR_BYTES (c1 >> 8, c1 & 0xFF, c2 >> 8, c2 & 0xFF); | |
1426 else | |
1427 EMIT_FOUR_BYTES (c1 & 0xFF, c1 >> 8, c2 & 0xFF, c2 >> 8); | |
1428 } | |
1429 } | |
1430 coding->result = CODING_RESULT_SUCCESS; | |
1431 coding->produced = dst - coding->destination; | |
1432 coding->produced_char += produced_chars; | |
1433 return 0; | |
1434 } | |
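The surrogate arithmetic used by the encoder, and its inverse on the decoding side, can be checked with a small round trip. This is a sketch under the standard UTF-16 definitions; `utf16_split`, `utf16_join` and `utf16_put` are hypothetical helper names, not functions from the file.

```c
#include <assert.h>

/* Split a code point in 0x10000..0x10FFFF into a surrogate pair.  */
static void
utf16_split (long c, unsigned *hi, unsigned *lo)
{
  assert (c >= 0x10000 && c <= 0x10FFFF);
  c -= 0x10000;
  *hi = (unsigned) (c >> 10) + 0xD800;    /* top 10 bits */
  *lo = (unsigned) (c & 0x3FF) + 0xDC00;  /* low 10 bits */
}

/* Combine a high and a low surrogate back into a code point.  */
static long
utf16_join (unsigned hi, unsigned lo)
{
  assert ((hi & 0xFC00) == 0xD800 && (lo & 0xFC00) == 0xDC00);
  return (((long) (hi - 0xD800) << 10) | (lo - 0xDC00)) + 0x10000;
}

/* Serialize one 16-bit unit into OUT in the requested byte order.  */
static void
utf16_put (unsigned unit, int big_endian, unsigned char out[2])
{
  assert (unit <= 0xFFFF);
  out[0] = big_endian ? unit >> 8 : unit & 0xFF;
  out[1] = big_endian ? unit & 0xFF : unit >> 8;
}
```

Serializing the unit 0xFEFF with `utf16_put` gives the bytes 0xFE 0xFF in big-endian order and 0xFF 0xFE in little-endian order, which is the pair of signatures that `detect_coding_utf_16' looks for.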
1435 | |
1436 | |
1437 /*** 6. Old Emacs' internal format (emacs-mule) ***/ | |
17052 | 1438 |
34888
b469d29c0815
(SAFE_ONE_MORE_BYTE): New macro.
Kenichi Handa <handa@m17n.org>
parents:
34813
diff
changeset
|
1439 /* Emacs' internal format for representation of multiple character |
b469d29c0815
(SAFE_ONE_MORE_BYTE): New macro.
Kenichi Handa <handa@m17n.org>
parents:
34813
diff
changeset
|
1440 sets is a kind of multi-byte encoding, i.e. characters are |
b469d29c0815
(SAFE_ONE_MORE_BYTE): New macro.
Kenichi Handa <handa@m17n.org>
parents:
34813
diff
changeset
|
1441 represented by variable-length sequences of one-byte codes. |
29005
b396df3a5181
(ONE_MORE_BYTE, TWO_MORE_BYTES): Set coding->resutl to

   ASCII characters and control characters (e.g. `tab', `newline') are
   represented by one-byte sequences which are their ASCII codes, in
   the range 0x00 through 0x7F.

   8-bit characters of the range 0x80..0x9F are represented by
   two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
   code + 0x20).

   8-bit characters of the range 0xA0..0xFF are represented by
   one-byte sequences which are their 8-bit code.

   The other characters are represented by a sequence of `base
   leading-code', optional `extended leading-code', and one or two
   `position-code's.  The length of the sequence is determined by the
   base leading-code.  Leading-code takes the range 0x81 through 0x9D,
   whereas extended leading-code and position-code take the range 0xA0
   through 0xFF.  See `charset.h' for more details about leading-code
   and position-code.

   --- CODE RANGE of Emacs' internal format ---
   character set        range
   -------------        -----
   ascii                0x00..0x7F
   eight-bit-control    LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
   eight-bit-graphic    0xA0..0xFF
   ELSE                 0x81..0x9D + [0xA0..0xFF]+
   ---------------------------------------------

   As this is the internal character representation, the format is
   usually not used externally (i.e. in a file or in data sent to a
   process).  But it is possible to have text externally in this
   format (e.g. by encoding with the coding system `emacs-mule').

   In that case, a sequence of one-byte codes has a slightly different
   form.

   First, all characters in eight-bit-control are represented by
   one-byte sequences which are their 8-bit code.

   Next, character composition data are represented by a byte
   sequence of the form: 0x80 METHOD BYTES CHARS COMPONENT ...,
   where,
        METHOD is 0xF0 plus one of the composition methods (enum
        composition_method),

        BYTES is 0xA0 plus the byte length of this composition data,

        CHARS is 0xA0 plus the number of characters composed by this
        data,

        COMPONENTs are characters in multibyte form or composition
        rules encoded as two bytes of ASCII codes.

   In addition, for backward compatibility, the following formats are
   also recognized as composition data on decoding.

   0x80 MSEQ ...
   0x80 0xFF MSEQ RULE MSEQ RULE ... MSEQ

   Here,
        MSEQ is a multibyte form but in this special format:
          ASCII: 0xA0 ASCII_CODE+0x80,
          other: LEADING_CODE+0x20 FOLLOWING-BYTE ...,
        RULE is a one-byte code in the range 0xA0..0xF0 that
        represents a composition rule.
  */
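
/* Illustration only, not part of coding.c.  A minimal sketch of the
   one-byte and two-byte cases described in the comment above.  It
   assumes that LEADING_CODE_8_BIT_CONTROL is 0x9E; the real value is
   defined in charset.h and is not shown in this section.  All
   `sketch_' names are hypothetical.  */

```c
#include <stddef.h>

/* Assumption: stands in for LEADING_CODE_8_BIT_CONTROL from charset.h.  */
#define SKETCH_LEADING_CODE_8_BIT_CONTROL 0x9E

/* Write the internal form of the raw byte B to OUT (room for 2 bytes)
   and return the number of bytes written.  */
static size_t
sketch_internal_form_of_byte (unsigned char b, unsigned char *out)
{
  if (b < 0x80 || b >= 0xA0)
    {
      /* ascii and eight-bit-graphic: one byte, the code itself.  */
      out[0] = b;
      return 1;
    }
  /* eight-bit-control: leading code, then the 8-bit code + 0x20.  */
  out[0] = SKETCH_LEADING_CODE_8_BIT_CONTROL;
  out[1] = (unsigned char) (b + 0x20);
  return 2;
}
```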

char emacs_mule_bytes[256];

/* Leading-code followed by extended leading-code.  */
#define LEADING_CODE_PRIVATE_11 0x9A /* for private DIMENSION1 of 1-column */
#define LEADING_CODE_PRIVATE_12 0x9B /* for private DIMENSION1 of 2-column */
#define LEADING_CODE_PRIVATE_21 0x9C /* for private DIMENSION2 of 1-column */
#define LEADING_CODE_PRIVATE_22 0x9D /* for private DIMENSION2 of 2-column */


/* Decode one character in emacs-mule form at the current source
   position of CODING.  If COMPOSITION is nonzero, the character is a
   component of an Emacs 20 style composition sequence.  Store the
   number of bytes and characters consumed in *NBYTES and *NCHARS.
   Return the character, -1 if the byte sequence is invalid, or -2 if
   the source is exhausted.  */

int
emacs_mule_char (coding, composition, nbytes, nchars)
     struct coding_system *coding;
     int composition;
     int *nbytes, *nchars;
{
  unsigned char *src = coding->source + coding->consumed;
  unsigned char *src_end = coding->source + coding->src_bytes;
  int multibytep = coding->src_multibyte;
  unsigned char *src_base = src;
  struct charset *charset;
  unsigned code;
  int c;
  int consumed_chars = 0;

  ONE_MORE_BYTE (c);
  if (composition)
    {
      c -= 0x20;
      if (c == 0x80)
        {
          ONE_MORE_BYTE (c);
          if (c < 0xA0)
            goto invalid_code;
          *nbytes = src - src_base;
          *nchars = consumed_chars;
          return (c - 0x80);
        }
    }

  switch (emacs_mule_bytes[c])
    {
    case 2:
      if (! (charset = emacs_mule_charset[c]))
        goto invalid_code;
      ONE_MORE_BYTE (c);
      code = c & 0x7F;
      break;

    case 3:
      if (c == LEADING_CODE_PRIVATE_11
          || c == LEADING_CODE_PRIVATE_12)
        {
          ONE_MORE_BYTE (c);
          if (! (charset = emacs_mule_charset[c]))
            goto invalid_code;
          ONE_MORE_BYTE (c);
          code = c & 0x7F;
        }
      else
        {
          if (! (charset = emacs_mule_charset[c]))
            goto invalid_code;
          ONE_MORE_BYTE (c);
          code = (c & 0x7F) << 7;
          ONE_MORE_BYTE (c);
          code |= c & 0x7F;
        }
      break;

    case 4:
      if (! (charset = emacs_mule_charset[c]))
        goto invalid_code;
      ONE_MORE_BYTE (c);
      code = (c & 0x7F) << 7;
      ONE_MORE_BYTE (c);
      code |= c & 0x7F;
      break;

    case 1:
      code = c;
      charset = CHARSET_FROM_ID (ASCII_BYTE_P (code) ? charset_ascii
                                 : code < 0xA0 ? charset_8_bit_control
                                 : charset_8_bit_graphic);
      break;

    default:
      abort ();
    }
  c = DECODE_CHAR (charset, code);
  if (c < 0)
    goto invalid_code;
  *nbytes = src - src_base;
  *nchars = consumed_chars;
  return c;

 no_more_source:
  return -2;

 invalid_code:
  return -1;
}
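
/* Illustration only, not part of coding.c.  The two-position-code
   branches above build CODE from seven bits of each byte.  A minimal
   sketch of that step in isolation, with a hypothetical function
   name:  */

```c
/* Combine two position-code bytes (each normally in 0xA0..0xFF) into
   one code point within a charset, seven bits per byte.  */
static unsigned
sketch_code_from_position_codes (unsigned char c1, unsigned char c2)
{
  return ((unsigned) (c1 & 0x7F) << 7) | (unsigned) (c2 & 0x7F);
}
```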


/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in `emacs-mule'.  */

static int
detect_coding_emacs_mule (coding, mask)
     struct coding_system *coding;
     int *mask;
{
  unsigned char *src = coding->source, *src_base = src;
  unsigned char *src_end = coding->source + coding->src_bytes;
  int multibytep = coding->src_multibyte;
  int consumed_chars = 0;
  int c;
  int found = 0;

  /* A coding system of this category is always ASCII compatible.  */
  src += coding->head_ascii;

  while (1)
    {
      ONE_MORE_BYTE (c);

      if (c == 0x80)
        {
          /* Perhaps the start of a composite character.  We simply
             skip it because analyzing it is too heavy for detection.
             But, at least, we check that the composite character
             consists of more than 4 bytes.  */
          unsigned char *src_base;

        repeat:
          src_base = src;
          do
            {
              ONE_MORE_BYTE (c);
            }
          while (c >= 0xA0);

          if (src - src_base <= 4)
            break;
          found = 1;
          if (c == 0x80)
            goto repeat;
        }

      if (c < 0x80)
        {
          if (c < 0x20
              && (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO))
            break;
        }
      else
        {
          unsigned char *src_base = src - 1;

          do
            {
              ONE_MORE_BYTE (c);
            }
          while (c >= 0xA0);
          if (src - src_base != emacs_mule_bytes[*src_base])
            break;
          found = 1;
        }
    }
  *mask &= ~CATEGORY_MASK_EMACS_MULE;
  return 0;

 no_more_source:
  if (!found)
    return 0;
  *mask &= CATEGORY_MASK_EMACS_MULE;
  return 1;
}


/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".  */

/* Decode a character represented as a component of a composition
   sequence of Emacs 20/21 style at SRC.  Store the character in BUF
   and advance SRC to the head of the next character (or an encoded
   composition rule).  If the source is exhausted, break out of the
   enclosing loop.  If SRC points to an invalid byte sequence, jump to
   the label invalid_code.  */

#define DECODE_EMACS_MULE_COMPOSITION_CHAR(buf)                   \
  if (1)                                                          \
    {                                                             \
      int c;                                                      \
      int nbytes, nchars;                                         \
                                                                  \
      if (src == src_end)                                         \
        break;                                                    \
      c = emacs_mule_char (coding, 1, &nbytes, &nchars);          \
      if (c < 0)                                                  \
        {                                                         \
          if (c == -2)                                            \
            break;                                                \
          goto invalid_code;                                      \
        }                                                         \
      *buf++ = c;                                                 \
      src += nbytes;                                              \
      consumed_chars += nchars;                                   \
    }                                                             \
  else


/* Decode a composition rule represented as a component of a
   composition sequence of Emacs 20 style at SRC.  Store the encoded
   rule in BUF.  If there is no byte left at SRC or SRC points to an
   invalid byte sequence, jump to the label invalid_code.  */

#define DECODE_EMACS_MULE_COMPOSITION_RULE(buf)                   \
  do {                                                            \
    int c, gref, nref;                                            \
                                                                  \
    if (src >= src_end)                                           \
      goto invalid_code;                                          \
    ONE_MORE_BYTE_NO_CHECK (c);                                   \
    c -= 0xA0;                                                    \
    if (c < 0 || c >= 81)                                         \
      goto invalid_code;                                          \
                                                                  \
    gref = c / 9, nref = c % 9;                                   \
    *buf++ = COMPOSITION_ENCODE_RULE (gref, nref);                \
  } while (0)
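
/* Illustration only, not part of coding.c.  The rule byte packs two
   reference points, each in 0..8, as (gref * 9 + nref) + 0xA0, which
   is why valid rule bytes span 0xA0..0xF0.  A minimal sketch of the
   unpacking step, with a hypothetical function name:  */

```c
/* Split the rule byte B into its two reference points.  Return 1 and
   store them in *GREF and *NREF if B is in 0xA0..0xF0, else return 0
   and leave both untouched.  */
static int
sketch_decode_rule_byte (unsigned char b, int *gref, int *nref)
{
  int c = b - 0xA0;

  if (c < 0 || c >= 81)
    return 0;
  *gref = c / 9;
  *nref = c % 9;
  return 1;
}
```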


#define ADD_COMPOSITION_DATA(buf, method, nchars)                 \
  do {                                                            \
    *buf++ = -5;                                                  \
    *buf++ = coding->produced_char + char_offset;                 \
    *buf++ = CODING_ANNOTATE_COMPOSITION_MASK;                    \
    *buf++ = method;                                              \
    *buf++ = nchars;                                              \
  } while (0)


#define DECODE_EMACS_MULE_21_COMPOSITION(c)                       \
  do {                                                            \
    /* Emacs 21 style format.  The byte C just read is METHOD;    \
       the next two bytes at SRC are BYTES and CHARS.  METHOD -   \
       0xF0 is the composition method, BYTES - 0xA0 is the byte   \
       length of this composition information, and CHARS - 0xA0   \
       is the number of characters composed by this               \
       composition.  */                                           \
    enum composition_method method = c - 0xF0;                    \
    int consumed_chars_limit;                                     \
    int nbytes, nchars;                                           \
                                                                  \
    ONE_MORE_BYTE (c);                                            \
    nbytes = c - 0xA0;                                            \
    if (nbytes < 3)                                               \
      goto invalid_code;                                          \
    ONE_MORE_BYTE (c);                                            \
    nchars = c - 0xA0;                                            \
    ADD_COMPOSITION_DATA (charbuf, method, nchars);               \
    consumed_chars_limit = consumed_chars_base + nbytes;          \
    if (method != COMPOSITION_RELATIVE)                           \
      {                                                           \
        int i = 0;                                                \
        while (consumed_chars < consumed_chars_limit)             \
          {                                                       \
            if (i % 2 && method != COMPOSITION_WITH_ALTCHARS)     \
              DECODE_EMACS_MULE_COMPOSITION_RULE (charbuf);       \
            else                                                  \
              DECODE_EMACS_MULE_COMPOSITION_CHAR (charbuf);       \
            i++;                                                  \
          }                                                       \
        if (consumed_chars < consumed_chars_limit)                \
          goto invalid_code;                                      \
      }                                                           \
  } while (0)
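
/* Illustration only, not part of coding.c.  A minimal sketch of how
   the three header bytes that follow 0x80 can be unpacked.  It
   assumes the four composition methods are numbered 0..3 (the enum
   lives in another file), and the check on the character count is an
   addition of this sketch.  All `sketch_' names are hypothetical.  */

```c
/* Hypothetical container for the three header fields.  */
struct sketch_composition_header
{
  int method;                   /* METHOD byte minus 0xF0 */
  int nbytes;                   /* BYTES byte minus 0xA0 */
  int nchars;                   /* CHARS byte minus 0xA0 */
};

/* Parse the three bytes at P (METHOD, BYTES, CHARS).  Return 1 and
   fill *H on success, or 0 if they cannot form an Emacs 21 style
   composition header.  */
static int
sketch_parse_composition_header (const unsigned char *p,
                                 struct sketch_composition_header *h)
{
  int method = p[0] - 0xF0;
  int nbytes = p[1] - 0xA0;
  int nchars = p[2] - 0xA0;

  if (method < 0 || method > 3)
    return 0;
  if (nbytes < 3 || nchars < 0)
    return 0;
  h->method = method;
  h->nbytes = nbytes;
  h->nchars = nchars;
  return 1;
}
```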


#define DECODE_EMACS_MULE_20_RELATIVE_COMPOSITION(c)              \
  do {                                                            \
    /* Emacs 20 style format for relative composition.  */        \
    /* Store multibyte form of characters to be composed.  */     \
    int components[MAX_COMPOSITION_COMPONENTS * 2 - 1];           \
    int *buf = components;                                        \
    int i, j;                                                     \
                                                                  \
    src = src_base;                                               \
    ONE_MORE_BYTE (c);          /* skip 0x80 */                   \
    for (i = 0; i < MAX_COMPOSITION_COMPONENTS; i++)              \
      DECODE_EMACS_MULE_COMPOSITION_CHAR (buf);                   \
    if (i < 2)                                                    \
      goto invalid_code;                                          \
    ADD_COMPOSITION_DATA (charbuf, COMPOSITION_RELATIVE, i);      \
    for (j = 0; j < i; j++)                                       \
      *charbuf++ = components[j];                                 \
  } while (0)


#define DECODE_EMACS_MULE_20_RULEBASE_COMPOSITION(c)              \
  do {                                                            \
    /* Emacs 20 style format for rule-base composition.  */       \
    /* Store multibyte form of characters to be composed.  */     \
    int components[MAX_COMPOSITION_COMPONENTS * 2 - 1];           \
    int *buf = components;                                        \
    int i, j;                                                     \
                                                                  \
    DECODE_EMACS_MULE_COMPOSITION_CHAR (buf);                     \
    for (i = 0; i < MAX_COMPOSITION_COMPONENTS; i++)              \
      {                                                           \
        DECODE_EMACS_MULE_COMPOSITION_RULE (buf);                 \
        DECODE_EMACS_MULE_COMPOSITION_CHAR (buf);                 \
      }                                                           \
    if (i < 1 || (buf - components) % 2 == 0)                     \
      goto invalid_code;                                          \
    if (charbuf + i + (i / 2) + 1 >= charbuf_end)                 \
      goto no_more_source;                                        \
    ADD_COMPOSITION_DATA (charbuf, COMPOSITION_WITH_RULE, i);     \
    for (j = 0; j < i; j++)                                       \
      *charbuf++ = components[j];                                 \
    for (j = 0; j < i; j += 2)                                    \
      *charbuf++ = components[j];                                 \
  } while (0)


static void
decode_coding_emacs_mule (coding)
     struct coding_system *coding;
{
  unsigned char *src = coding->source + coding->consumed;
  unsigned char *src_end = coding->source + coding->src_bytes;
  unsigned char *src_base;
  int *charbuf = coding->charbuf;
  int *charbuf_end = charbuf + coding->charbuf_size;
  int consumed_chars = 0, consumed_chars_base;
  int char_offset = 0;
  int multibytep = coding->src_multibyte;
  Lisp_Object attrs, eol_type, charset_list;

  CODING_GET_INFO (coding, attrs, eol_type, charset_list);

  while (1)
    {
      int c;

      src_base = src;
      consumed_chars_base = consumed_chars;

      if (charbuf >= charbuf_end)
        break;

      ONE_MORE_BYTE (c);

      if (c < 0x80)
        {
          if (c == '\r')
            {
              if (EQ (eol_type, Qdos))
                {
                  if (src == src_end)
                    goto no_more_source;
                  if (*src == '\n')
                    ONE_MORE_BYTE (c);
                }
              else if (EQ (eol_type, Qmac))
                c = '\n';
            }
          *charbuf++ = c;
          char_offset++;
        }
      else if (c == 0x80)
        {
          if (charbuf + 5 + (MAX_COMPOSITION_COMPONENTS * 2) - 1 > charbuf_end)
            break;
          ONE_MORE_BYTE (c);
          if (c - 0xF0 >= COMPOSITION_RELATIVE
              && c - 0xF0 <= COMPOSITION_WITH_RULE_ALTCHARS)
            DECODE_EMACS_MULE_21_COMPOSITION (c);
          else if (c < 0xC0)
            DECODE_EMACS_MULE_20_RELATIVE_COMPOSITION (c);
          else if (c == 0xFF)
            DECODE_EMACS_MULE_20_RULEBASE_COMPOSITION (c);
          else
            goto invalid_code;
        }
      else if (c < 0xA0 && emacs_mule_bytes[c] > 1)
        {
          int nbytes, nchars;
          src--;
          c = emacs_mule_char (coding, 0, &nbytes, &nchars);
          if (c < 0)
            {
              if (c == -2)
                break;
              goto invalid_code;
            }
          *charbuf++ = c;
          char_offset++;
        }
      continue;

    invalid_code:
      src = src_base;
      consumed_chars = consumed_chars_base;
      ONE_MORE_BYTE (c);
      *charbuf++ = ASCII_BYTE_P (c) ? c : BYTE8_TO_CHAR (c);
      coding->errors++;
    }

 no_more_source:
  coding->consumed_char += consumed_chars_base;
  coding->consumed = src_base - coding->source;
  coding->charbuf_used = charbuf - coding->charbuf;
}
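
/* Illustration only, not part of coding.c.  A minimal sketch of the
   end-of-line handling in the ASCII branch above, applied to a plain
   byte buffer.  Unlike the decoder, which stops at a CR that ends the
   input so that the next chunk can be examined, this sketch simply
   keeps a trailing CR.  The enum stands in for the Qdos and Qmac
   values of eol_type; all `sketch_' names are hypothetical.  */

```c
#include <stddef.h>

enum sketch_eol { SKETCH_EOL_UNIX, SKETCH_EOL_DOS, SKETCH_EOL_MAC };

/* Copy LEN bytes of ASCII text from SRC to DST (room for LEN bytes),
   converting line endings according to EOL.  Return the number of
   bytes stored in DST.  */
static size_t
sketch_decode_eol (const unsigned char *src, size_t len,
                   enum sketch_eol eol, unsigned char *dst)
{
  size_t i = 0, n = 0;

  while (i < len)
    {
      unsigned char c = src[i++];

      if (c == '\r')
        {
          if (eol == SKETCH_EOL_DOS)
            {
              /* CR LF collapses to LF; a lone CR is kept as is.  */
              if (i < len && src[i] == '\n')
                c = src[i++];
            }
          else if (eol == SKETCH_EOL_MAC)
            c = '\n';
        }
      dst[n++] = c;
    }
  return n;
}
```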


#define EMACS_MULE_LEADING_CODES(id, codes)                       \
  do {                                                            \
    if (id < 0xA0)                                                \
      codes[0] = id, codes[1] = 0;                                \
    else if (id < 0xE0)                                           \
      codes[0] = 0x9A, codes[1] = id;                             \
    else if (id < 0xF0)                                           \
      codes[0] = 0x9B, codes[1] = id;                             \
    else if (id < 0xF5)                                           \
      codes[0] = 0x9C, codes[1] = id;                             \
    else                                                          \
      codes[0] = 0x9D, codes[1] = id;                             \
  } while (0)
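
/* Illustration only, not part of coding.c.  The same mapping written
   as a function, which makes the four private leading codes
   (0x9A..0x9D) and their id ranges easier to test.  The function
   name is hypothetical.  */

```c
/* Store in CODES the leading code(s) for the emacs-mule charset id
   ID.  CODES[1] == 0 means there is no extended leading code.  */
static void
sketch_leading_codes (int id, unsigned char codes[2])
{
  if (id < 0xA0)
    {
      codes[0] = (unsigned char) id;
      codes[1] = 0;
    }
  else
    {
      codes[0] = (id < 0xE0 ? 0x9A
                  : id < 0xF0 ? 0x9B
                  : id < 0xF5 ? 0x9C
                  : 0x9D);
      codes[1] = (unsigned char) id;
    }
}
```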


static int
encode_coding_emacs_mule (coding)
     struct coding_system *coding;
{
  int multibytep = coding->dst_multibyte;
  int *charbuf = coding->charbuf;
  int *charbuf_end = charbuf + coding->charbuf_used;
  unsigned char *dst = coding->destination + coding->produced;
  unsigned char *dst_end = coding->destination + coding->dst_bytes;
  int safe_room = 8;
  int produced_chars = 0;
  Lisp_Object attrs, eol_type, charset_list;
  int c;

  CODING_GET_INFO (coding, attrs, eol_type, charset_list);

  while (charbuf < charbuf_end)
    {
      ASSURE_DESTINATION (safe_room);
      c = *charbuf++;
      if (ASCII_CHAR_P (c))
        EMIT_ONE_ASCII_BYTE (c);
      else
        {
          struct charset *charset;
          unsigned code;
          int dimension;
          int emacs_mule_id;
          unsigned char leading_codes[2];

          charset = char_charset (c, charset_list, &code);
          if (! charset)
            {
              c = coding->default_char;
              if (ASCII_CHAR_P (c))
                {
                  EMIT_ONE_ASCII_BYTE (c);
                  continue;
                }
              charset = char_charset (c, charset_list, &code);
            }
          dimension = CHARSET_DIMENSION (charset);
          emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
          EMACS_MULE_LEADING_CODES (emacs_mule_id, leading_codes);
          EMIT_ONE_BYTE (leading_codes[0]);
          if (leading_codes[1])
            EMIT_ONE_BYTE (leading_codes[1]);
          if (dimension == 1)
            EMIT_ONE_BYTE (code);
          else
            {
              EMIT_ONE_BYTE (code >> 8);
              EMIT_ONE_BYTE (code & 0xFF);
            }
        }
    }
  coding->result = CODING_RESULT_SUCCESS;
  coding->produced_char += produced_chars;
  coding->produced = dst - coding->destination;
  return 0;
}


/*** 7. ISO2022 handlers ***/

/* The following note describes the coding system ISO2022 briefly.
   Since the intention of this note is to help understand the
   functions in this file, some parts are NOT ACCURATE or OVERLY
   SIMPLIFIED.  For thorough understanding, please refer to the
   original document of ISO2022.

   ISO2022 provides many mechanisms to encode several character sets
   in 7-bit and 8-bit environments.  For 7-bit environments, all text
   is encoded using bytes less than 128.  This may make the encoded
   text a little bit longer, but the text passes more easily through
   several gateways, some of which strip off the MSB (Most Significant
   Bit).

   There are two kinds of character sets: control character set and
   graphic character set.  The former contains control characters such
   as `newline' and `escape' to provide control functions (control
   functions are also provided by escape sequences).  The latter
   contains graphic characters such as 'A' and '-'.  Emacs recognizes
   two control character sets and many graphic character sets.

   Graphic character sets are classified into one of the following
   four classes, according to the number of bytes (DIMENSION) and
   number of characters in one dimension (CHARS) of the set:
     - DIMENSION1_CHARS94
     - DIMENSION1_CHARS96
     - DIMENSION2_CHARS94
     - DIMENSION2_CHARS96

   In addition, each character set is assigned an identification tag,
   unique for each set, called "final character" (denoted as <F>
   hereafter).  The <F> of each character set is decided by ECMA(*)
   when it is registered in ISO.  The code range of <F> is 0x30..0x7F
   (0x30..0x3F are for private use only).

   Note (*): ECMA = European Computer Manufacturers Association

   Here are examples of graphic character sets [NAME(<F>)]:
     o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
     o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
     o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
     o DIMENSION2_CHARS96 -- none for the moment

24425 | 2042 A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR. |
17052 | 2043 C0 [0x00..0x1F] -- control character plane 0 |
2044 GL [0x20..0x7F] -- graphic character plane 0 | |
2045 C1 [0x80..0x9F] -- control character plane 1 | |
2046 GR [0xA0..0xFF] -- graphic character plane 1 | |
2047 | |
2048 A control character set is directly designated and invoked to C0 or | |
24425 | 2049 C1 by an escape sequence. The most common case is that: |
2050 - ISO646's control character set is designated/invoked to C0, and | |
2051 - ISO6429's control character set is designated/invoked to C1, | |
2052 and usually these designations/invocations are omitted in encoded | |
2053 text. In a 7-bit environment, only C0 can be used, and a control | |
2054 character for C1 is encoded by an appropriate escape sequence to | |
2055 fit into the environment. All control characters for C1 are | |
2056 defined to have corresponding escape sequences. | |
17052 | 2057 |
2058 A graphic character set is at first designated to one of four | |
2059 graphic registers (G0 through G3), then these graphic registers are | |
2060 invoked to GL or GR. These designations and invocations can be | |
2061 done independently. The most common case is that G0 is invoked to | |
24425 | 2062 GL, G1 is invoked to GR, and ASCII is designated to G0. Usually |
2063 these invocations and designations are omitted in encoded text. | |
2064 In a 7-bit environment, only GL can be used. | |
2065 | |
2066 When a graphic character set of CHARS94 is invoked to GL, codes | |
2067 0x20 and 0x7F of the GL area work as control characters SPACE and | |
2068 DEL respectively, and codes 0xA0 and 0xFF of the GR area should not | |
2069 be used. | |
17052 | 2070 |
2071 There are two ways of invocation: locking-shift and single-shift. | |
2072 With locking-shift, the invocation lasts until the next different | |
24425 | 2073 invocation, whereas with single-shift, the invocation affects the |
2074 following character only and doesn't affect the locking-shift | |
2075 state. Invocations are done by the following control characters or | |
2076 escape sequences: | |
17052 | 2077 |
2078 ---------------------------------------------------------------------- | |
24425 | 2079 abbrev function cntrl escape seq description |
17052 | 2080 ---------------------------------------------------------------------- |
24425 | 2081 SI/LS0 (shift-in) 0x0F none invoke G0 into GL |
2082 SO/LS1 (shift-out) 0x0E none invoke G1 into GL | |
2083 LS2 (locking-shift-2) none ESC 'n' invoke G2 into GL | |
2084 LS3 (locking-shift-3) none ESC 'o' invoke G3 into GL | |
2085 LS1R (locking-shift-1 right) none ESC '~' invoke G1 into GR (*) | |
2086 LS2R (locking-shift-2 right) none ESC '}' invoke G2 into GR (*) | |
2087 LS3R (locking-shift-3 right) none ESC '|' invoke G3 into GR (*) | |
2088 SS2 (single-shift-2) 0x8E ESC 'N' invoke G2 for one char | |
2089 SS3 (single-shift-3) 0x8F ESC 'O' invoke G3 for one char | |
17052 | 2090 ---------------------------------------------------------------------- |
24425 | 2091 (*) These are not used by any known coding system. |
2092 | |
2093 Control characters for these functions are defined by macros | |
2094 ISO_CODE_XXX in `coding.h'. | |
2095 | |
2096 Designations are done by the following escape sequences: | |
17052 | 2097 ---------------------------------------------------------------------- |
2098 escape sequence description | |
2099 ---------------------------------------------------------------------- | |
2100 ESC '(' <F> designate DIMENSION1_CHARS94<F> to G0 | |
2101 ESC ')' <F> designate DIMENSION1_CHARS94<F> to G1 | |
2102 ESC '*' <F> designate DIMENSION1_CHARS94<F> to G2 | |
2103 ESC '+' <F> designate DIMENSION1_CHARS94<F> to G3 | |
2104 ESC ',' <F> designate DIMENSION1_CHARS96<F> to G0 (*) | |
2105 ESC '-' <F> designate DIMENSION1_CHARS96<F> to G1 | |
2106 ESC '.' <F> designate DIMENSION1_CHARS96<F> to G2 | |
2107 ESC '/' <F> designate DIMENSION1_CHARS96<F> to G3 | |
2108 ESC '$' '(' <F> designate DIMENSION2_CHARS94<F> to G0 (**) | |
2109 ESC '$' ')' <F> designate DIMENSION2_CHARS94<F> to G1 | |
2110 ESC '$' '*' <F> designate DIMENSION2_CHARS94<F> to G2 | |
2111 ESC '$' '+' <F> designate DIMENSION2_CHARS94<F> to G3 | |
2112 ESC '$' ',' <F> designate DIMENSION2_CHARS96<F> to G0 (*) | |
2113 ESC '$' '-' <F> designate DIMENSION2_CHARS96<F> to G1 | |
2114 ESC '$' '.' <F> designate DIMENSION2_CHARS96<F> to G2 | |
2115 ESC '$' '/' <F> designate DIMENSION2_CHARS96<F> to G3 | |
2116 ---------------------------------------------------------------------- | |
2117 | |
2118 In this list, "DIMENSION1_CHARS94<F>" means a graphic character set | |
24425 | 2119 of dimension 1, chars 94, and final character <F>, etc... |
17052 | 2120 |
2121 Note (*): Although these designations are not allowed in ISO2022, | |
2122 Emacs accepts them on decoding, and produces them on encoding | |
24425 | 2123 CHARS96 character sets in a coding system which is characterized as |
17052 | 2124 7-bit environment, non-locking-shift, and non-single-shift. |
2125 | |
2126 Note (**): If <F> is '@', 'A', or 'B', the intermediate character | |
88365 | 2127 '(' must be omitted. We refer to this as "short-form" hereafter. |
2128 | |
2129 Now you may notice that there are a lot of ways for encoding the | |
24425 | 2130 same multilingual text in ISO2022. Actually, there exist many |
2131 coding systems such as Compound Text (used in X11's inter-client | |
88365 | 2132 communication), ISO-2022-JP (used in Japanese internet), ISO-2022-KR |
2133 (used in Korean internet), EUC (Extended UNIX Code, used in Asian | |
17052 | 2134 localized platforms), and all of these are variants of ISO2022. |
2135 | |
2136 In addition to the above, Emacs handles two more kinds of escape | |
2137 sequences: ISO6429's direction specification and Emacs' private | |
2138 sequence for specifying character composition. | |
2139 | |
24425 | 2140 ISO6429's direction specification takes the following form: |
17052 | 2141 o CSI ']' -- end of the current direction |
2142 o CSI '0' ']' -- end of the current direction | |
2143 o CSI '1' ']' -- start of left-to-right text | |
2144 o CSI '2' ']' -- start of right-to-left text | |
2145 The control character CSI (0x9B: control sequence introducer) is | |
24425 | 2146 abbreviated to the escape sequence ESC '[' in a 7-bit environment. |
2147 | |
2148 Character composition specification takes the following form: | |
26847 | 2149 o ESC '0' -- start relative composition |
2150 o ESC '1' -- end composition | |
2151 o ESC '2' -- start rule-base composition (*) | |
2152 o ESC '3' -- start relative composition with alternate chars (**) | |
2153 o ESC '4' -- start rule-base composition with alternate chars (**) | |
29005 | 2154 Since these are not standard escape sequences of any ISO standard, |
88365 | 2155 the use of them for these meanings is restricted to Emacs only. |
2156 | |
2157 (*) This form is used only in Emacs 20.5 and the older versions, | |
29005 | 2158 but the newer versions can safely decode it. |
88365 | 2159 (**) This form is used only in Emacs 21.1 and the newer versions, |
29005 | 2160 and the older versions can't decode it. |
2161 | |
88365 | 2162 Here's a list of example usages of these composition escape |
29005 | 2163 sequences (categorized by `enum composition_method'). |
2164 | |
2165 COMPOSITION_RELATIVE: | |
26847 | 2166 ESC 0 CHAR [ CHAR ] ESC 1 |
88365 | 2167 COMPOSITION_WITH_RULE: |
26847 | 2168 ESC 2 CHAR [ RULE CHAR ] ESC 1 |
29005 | 2169 COMPOSITION_WITH_ALTCHARS: |
26847 | 2170 ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 |
29005 | 2171 COMPOSITION_WITH_RULE_ALTCHARS: |
26847 | 2172 ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */ |
17052 | 2173 |
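The designation table in the comment above can be read as a small parser. The following is only a sketch under stated assumptions: `struct designation` and `parse_designation` are hypothetical names that do not exist in `coding.c`, and the final-character range check simply follows the 0x30..0x7F range quoted in the comment.

```c
#include <stddef.h>

/* Hypothetical result type, for illustration only.  */
struct designation
{
  int reg;        /* graphic register, 0..3 for G0..G3 */
  int dimension;  /* 1 or 2 */
  int chars96;    /* 0 for CHARS94, 1 for CHARS96 */
  int final;      /* final character <F> */
  size_t length;  /* bytes consumed, including ESC */
};

/* Parse a designation sequence at P, where LEN bytes are available.
   Return 1 and fill *D on success, 0 if P does not start with a
   designation sequence.  */
static int
parse_designation (const unsigned char *p, size_t len, struct designation *d)
{
  size_t i = 1;

  if (len < 3 || p[0] != 0x1B)
    return 0;
  d->dimension = 1;
  if (p[i] == '$')
    {
      d->dimension = 2;
      i++;
      /* Short form: ESC '$' <F> with <F> in '@'..'B' designates
         DIMENSION2_CHARS94<F> to G0.  */
      if (p[i] >= '@' && p[i] <= 'B')
        {
          d->reg = 0;
          d->chars96 = 0;
          d->final = p[i];
          d->length = i + 1;
          return 1;
        }
    }
  if (p[i] < '(' || p[i] > '/' || i + 1 >= len)
    return 0;
  /* '(' ')' '*' '+' select CHARS94; ',' '-' '.' '/' select CHARS96.
     Within each group of four, the position selects G0..G3.  */
  d->chars96 = p[i] >= ',';
  d->reg = (p[i] - '(') % 4;
  d->final = p[i + 1];
  if (d->final < 0x30 || d->final > 0x7F)
    return 0;
  d->length = i + 2;
  return 1;
}
```

For example, the bytes ESC '$' 'B' would be reported as a dimension-2 CHARS94 set designated to G0 in short form, and ESC '-' 'A' as a dimension-1 CHARS96 set designated to G1.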
2174 enum iso_code_class_type iso_code_class[256]; | |
2175 | |
88365 | 2176 #define SAFE_CHARSET_P(coding, id) \ |
2177 ((id) <= (coding)->max_charset_id \ | |
2178 && (coding)->safe_charsets[id] >= 0) | |
2179 | |
2180 | |
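A note on `SAFE_CHARSET_P`: the table it reads is filled with the byte 255 by `setup_iso_safe_charsets` and accessed through a `char *`, so the `>= 0` test appears to depend on plain `char` being signed (255 then reads back as -1). The sketch below shows a variant that compares against the fill byte explicitly. It is an illustration only; `struct safe_table`, `UNSAFE_MARK`, and `safe_charset_p` are hypothetical names, not part of `coding.c`.

```c
/* Hypothetical stand-in for the two fields SAFE_CHARSET_P reads.  */
struct safe_table
{
  int max_charset_id;
  const unsigned char *safe_charsets;  /* one byte per charset id */
};

/* Fill byte marking a charset that the coding system cannot handle.  */
#define UNSAFE_MARK 255

/* Return nonzero if charset ID has a register assigned in T.  The
   comparison against UNSAFE_MARK does not depend on whether plain
   `char' is signed.  */
static int
safe_charset_p (const struct safe_table *t, int id)
{
  return (id >= 0
          && id <= t->max_charset_id
          && t->safe_charsets[id] != UNSAFE_MARK);
}
```

Any other byte value in the table is taken to be the graphic register (0 through 3) that the charset may be designated to.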
2181 #define SHIFT_OUT_OK(category) \ | |
2182 (CODING_ISO_INITIAL (&coding_categories[category], 1) >= 0) | |
2183 | |
2184 static void | |
2185 setup_iso_safe_charsets (Lisp_Object attrs) | |
2186 { | |
2187 Lisp_Object charset_list, safe_charsets; | |
2188 Lisp_Object request; | |
2189 Lisp_Object reg_usage; | |
2190 Lisp_Object tail; | |
2191 int reg94, reg96; | |
2192 int flags = XINT (AREF (attrs, coding_attr_iso_flags)); | |
2193 int max_charset_id; | |
2194 | |
2195 charset_list = CODING_ATTR_CHARSET_LIST (attrs); | |
2196 if ((flags & CODING_ISO_FLAG_FULL_SUPPORT) | |
2197 && ! EQ (charset_list, Viso_2022_charset_list)) | |
2198 { | |
2199 CODING_ATTR_CHARSET_LIST (attrs) | |
2200 = charset_list = Viso_2022_charset_list; | |
2201 ASET (attrs, coding_attr_safe_charsets, Qnil); | |
2202 } | |
2203 | |
2204 if (STRINGP (AREF (attrs, coding_attr_safe_charsets))) | |
2205 return; | |
2206 | |
2207 max_charset_id = 0; | |
2208 for (tail = charset_list; CONSP (tail); tail = XCDR (tail)) | |
2209 { | |
2210 int id = XINT (XCAR (tail)); | |
2211 if (max_charset_id < id) | |
2212 max_charset_id = id; | |
2213 } | |
2214 | |
2215 safe_charsets = Fmake_string (make_number (max_charset_id + 1), | |
2216 make_number (255)); | |
2217 request = AREF (attrs, coding_attr_iso_request); | |
2218 reg_usage = AREF (attrs, coding_attr_iso_usage); | |
2219 reg94 = XINT (XCAR (reg_usage)); | |
2220 reg96 = XINT (XCDR (reg_usage)); | |
2221 | |
2222 for (tail = charset_list; CONSP (tail); tail = XCDR (tail)) | |
2223 { | |
2224 Lisp_Object id; | |
2225 Lisp_Object reg; | |
2226 struct charset *charset; | |
2227 | |
2228 id = XCAR (tail); | |
2229 charset = CHARSET_FROM_ID (XINT (id)); | |
2230 reg = Fcdr (Fassq (id, request)); | |
2231 if (! NILP (reg)) | |
2232 XSTRING (safe_charsets)->data[XINT (id)] = XINT (reg); | |
2233 else if (charset->iso_chars_96) | |
2234 { | |
2235 if (reg96 < 4) | |
2236 XSTRING (safe_charsets)->data[XINT (id)] = reg96; | |
2237 } | |
2238 else | |
2239 { | |
2240 if (reg94 < 4) | |
2241 XSTRING (safe_charsets)->data[XINT (id)] = reg94; | |
2242 } | |
2243 } | |
2244 ASET (attrs, coding_attr_safe_charsets, safe_charsets); | |
2245 } | |
2246 | |
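The register selection performed by `setup_iso_safe_charsets` can be restated without Lisp objects. This is a simplified sketch, not the Emacs implementation: `struct charset_info` and `fill_safe_charsets` are hypothetical, and the per-charset request is modelled as a plain field instead of an alist lookup.

```c
#define UNSAFE 255

/* Hypothetical plain-C description of one charset.  */
struct charset_info
{
  int id;             /* charset id, used as the table index */
  int chars96;        /* nonzero for a CHARS96 set */
  int requested_reg;  /* explicitly requested register, or -1 */
};

/* Fill SAFE (SIZE bytes) so that SAFE[id] is the graphic register for
   charset ID, or UNSAFE if the charset cannot be designated.  REG94
   and REG96 are the default registers for CHARS94 and CHARS96 sets; a
   value of 4 or more means "no default register".  */
static void
fill_safe_charsets (unsigned char *safe, int size,
                    const struct charset_info *list, int n,
                    int reg94, int reg96)
{
  int i;

  for (i = 0; i < size; i++)
    safe[i] = UNSAFE;
  for (i = 0; i < n; i++)
    {
      const struct charset_info *cs = &list[i];

      if (cs->id < 0 || cs->id >= size)
        continue;
      if (cs->requested_reg >= 0)
        /* An explicit request takes precedence over the defaults.  */
        safe[cs->id] = (unsigned char) cs->requested_reg;
      else if (cs->chars96)
        {
          if (reg96 < 4)
            safe[cs->id] = (unsigned char) reg96;
        }
      else if (reg94 < 4)
        safe[cs->id] = (unsigned char) reg94;
    }
}
```

Charsets that are absent from the list, or whose default register is unavailable, keep the `UNSAFE` mark, which is what the detector later uses to rule out a coding category.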
20718 | 2247 |
17052 | 2248 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". |
88365 | 2249 Check if a text is encoded in ISO2022. If it is, returns an |
17052 | 2250 integer in which the appropriate flag bits, any of: |
88365 | 2251 CATEGORY_MASK_ISO_7 |
2252 CATEGORY_MASK_ISO_7_TIGHT | |
2253 CATEGORY_MASK_ISO_8_1 | |
2254 CATEGORY_MASK_ISO_8_2 | |
2255 CATEGORY_MASK_ISO_7_ELSE | |
2256 CATEGORY_MASK_ISO_8_ELSE | |
17052 | 2257 are set. If a code which should never appear in ISO2022 is found, |
2258 returns 0. */ | |
2259 | |
34531 | 2260 static int |
88365 | 2261 detect_coding_iso_2022 (coding, mask) |
2262 struct coding_system *coding; | |
2263 int *mask; | |
17052 | 2264 { |
88365 | 2265 unsigned char *src = coding->source, *src_base = src; |
2266 unsigned char *src_end = coding->source + coding->src_bytes; | |
2267 int multibytep = coding->src_multibyte; | |
2268 int mask_iso = CATEGORY_MASK_ISO; | |
2269 int mask_found = 0, mask_8bit_found = 0; | |
23088 | 2270 int reg[4], shift_out = 0, single_shifting = 0; |
88365 | 2271 int id; |
2272 int c, c1; | |
2273 int consumed_chars = 0; | |
2274 int i; | |
2275 | |
2276 for (i = coding_category_iso_7; i <= coding_category_iso_8_else; i++) | |
2277 { | |
2278 struct coding_system *this = &(coding_categories[i]); | |
2279 Lisp_Object attrs, val; | |
2280 | |
2281 attrs = CODING_ID_ATTRS (this->id); | |
2282 if (CODING_ISO_FLAGS (this) & CODING_ISO_FLAG_FULL_SUPPORT | |
2283 && ! EQ (CODING_ATTR_SAFE_CHARSETS (attrs), Viso_2022_charset_list)) | |
2284 setup_iso_safe_charsets (attrs); | |
2285 val = CODING_ATTR_SAFE_CHARSETS (attrs); | |
2286 this->max_charset_id = XSTRING (val)->size - 1; | |
2287 this->safe_charsets = (char *) XSTRING (val)->data; | |
2288 } | |
2289 | |
2290 /* A coding system of this category is always ASCII compatible. */ | |
2291 src += coding->head_ascii; | |
2292 | |
2293 reg[0] = charset_ascii, reg[1] = reg[2] = reg[3] = -1; | |
2294 while (mask_iso && src < src_end) | |
2295 { | |
2296 ONE_MORE_BYTE (c); | |
17052 | 2297 switch (c) |
2298 { | |
2299 case ISO_CODE_ESC: | |
30204 | 2300 if (inhibit_iso_escape_detection) |
2301 break; | |
23088 | 2302 single_shifting = 0; |
88365 | 2303 ONE_MORE_BYTE (c); |
20718 | 2304 if (c >= '(' && c <= '/') |
19134 | 2305 { |
2306 /* Designation sequence for a charset of dimension 1. */ | |
88365 | 2307 ONE_MORE_BYTE (c1); |
20718 | 2308 if (c1 < ' ' || c1 >= 0x80 |
88365 | 2309 || (id = iso_charset_table[0][c >= ','][c1]) < 0) |
20718 | 2310 /* Invalid designation sequence. Just ignore. */ |
2311 break; | |
88365 | 2312 reg[(c - '(') % 4] = id; |
19134 | 2313 } |
2314 else if (c == '$') | |
17052 | 2315 { |
19134 | 2316 /* Designation sequence for a charset of dimension 2. */ |
88365 | 2317 ONE_MORE_BYTE (c); |
19134 | 2318 if (c >= '@' && c <= 'B') |
2319 /* Designation for JISX0208.1978, GB2312, or JISX0208. */ | |
88365 | 2320 reg[0] = id = iso_charset_table[1][0][c]; |
19134 | 2321 else if (c >= '(' && c <= '/') |
17320 | 2322 { |
88365 | 2323 ONE_MORE_BYTE (c1); |
20718 | 2324 if (c1 < ' ' || c1 >= 0x80 |
88365 | 2325 || (id = iso_charset_table[1][c >= ','][c1]) < 0) |
20718 | 2326 /* Invalid designation sequence. Just ignore. */ |
2327 break; | |
88365 | 2328 reg[(c - '(') % 4] = id; |
17320 | 2329 } |
19134 | 2330 else |
20718 | 2331 /* Invalid designation sequence. Just ignore. */ |
2332 break; | |
2333 } | |
23116 | 2334 else if (c == 'N' || c == 'O') |
20718 | 2335 { |
23116 | 2336 /* ESC <Fe> for SS2 or SS3. */ |
88365 | 2337 mask_iso &= CATEGORY_MASK_ISO_7_ELSE; |
20718 | 2338 break; |
2339 } | |
26847 | 2340 else if (c >= '0' && c <= '4') |
2341 { | |
2342 /* ESC <Fp> for start/end composition. */ | |
88365 | 2343 mask_found |= CATEGORY_MASK_ISO; |
26847 | 2344 break; |
2345 } | |
19134 | 2346 else |
88365 | 2347 { |
2348 /* Invalid escape sequence. */ | |
2349 mask_iso &= ~CATEGORY_MASK_ISO_ESCAPE; | |
2350 break; | |
2351 } | |
20718 | 2352 |
2353 /* We found a valid designation sequence for CHARSET. */ | |
88365 | 2354 mask_iso &= ~CATEGORY_MASK_ISO_8BIT; |
2355 if (SAFE_CHARSET_P (&coding_categories[coding_category_iso_7], | |
2356 id)) | |
2357 mask_found |= CATEGORY_MASK_ISO_7; | |
20718 | 2358 else |
88365 | 2359 mask_iso &= ~CATEGORY_MASK_ISO_7; |
2360 if (SAFE_CHARSET_P (&coding_categories[coding_category_iso_7_tight], | |
2361 id)) | |
2362 mask_found |= CATEGORY_MASK_ISO_7_TIGHT; | |
20718 | 2363 else |
88365 | 2364 mask_iso &= ~CATEGORY_MASK_ISO_7_TIGHT; |
2365 if (SAFE_CHARSET_P (&coding_categories[coding_category_iso_7_else], | |
2366 id)) | |
2367 mask_found |= CATEGORY_MASK_ISO_7_ELSE; | |
23116 | 2368 else |
88365 | 2369 mask_iso &= ~CATEGORY_MASK_ISO_7_ELSE; |
2370 if (SAFE_CHARSET_P (&coding_categories[coding_category_iso_8_else], | |
2371 id)) | |
2372 mask_found |= CATEGORY_MASK_ISO_8_ELSE; | |
23116 | 2373 else |
88365 | 2374 mask_iso &= ~CATEGORY_MASK_ISO_8_ELSE; |
17052 | 2375 break; |
2376 | |
2377 case ISO_CODE_SO: | |
30204 | 2378 if (inhibit_iso_escape_detection) |
2379 break; | |
23088 | 2380 single_shifting = 0; |
20718 | 2381 if (shift_out == 0 |
2382 && (reg[1] >= 0 | |
88365 | 2383 || SHIFT_OUT_OK (coding_category_iso_7_else) |
2384 || SHIFT_OUT_OK (coding_category_iso_8_else))) | |
20718 | 2385 { |
2386 /* Locking shift out. */ | |
88365 | 2387 mask_iso &= ~CATEGORY_MASK_ISO_7BIT; |
2388 mask_found |= CATEGORY_MASK_ISO_ELSE; | |
20718 | 2389 } |
17119 | 2390 break; |
88365 | 2391 |
20718 | 2392 case ISO_CODE_SI: |
30204 | 2393 if (inhibit_iso_escape_detection) |
2394 break; | |
23088 | 2395 single_shifting = 0; |
20718 | 2396 if (shift_out == 1) |
2397 { | |
2398 /* Locking shift in. */ | |
88365 | 2399 mask_iso &= ~CATEGORY_MASK_ISO_7BIT; |
2400 mask_found |= CATEGORY_MASK_ISO_ELSE; | |
20718 | 2401 } |
2402 break; | |
2403 | |
17052 | 2404 case ISO_CODE_CSI: |
23088 | 2405 single_shifting = 0; /* Fall through. */ |
17052 | 2406 case ISO_CODE_SS2: |
2407 case ISO_CODE_SS3: | |
19365
d9374f5ebd3a
(CODING_FLAG_ISO_LATIN_EXTRA): New macro.
Kenichi Handa <handa@m17n.org>
parents:
19285
diff
changeset
|
2408 { |
88365 | 2409 int newmask = CATEGORY_MASK_ISO_8_ELSE; |
19365
d9374f5ebd3a
(CODING_FLAG_ISO_LATIN_EXTRA): New macro.
Kenichi Handa <handa@m17n.org>
parents:
19285
diff
changeset
|
2410 |
30204
35aec8514228
(inhibit_iso_escape_detection): New variable.
Kenichi Handa <handa@m17n.org>
parents:
29985
diff
changeset
|
2411 if (inhibit_iso_escape_detection) |
35aec8514228
(inhibit_iso_escape_detection): New variable.
Kenichi Handa <handa@m17n.org>
parents:
29985
diff
changeset
|
2412 break; |
20150
402b6e5f4b58
(encode_designation_at_bol): Fix bug of finding graphic
Kenichi Handa <handa@m17n.org>
parents:
20105
diff
changeset
|
2413 if (c != ISO_CODE_CSI) |
402b6e5f4b58
(encode_designation_at_bol): Fix bug of finding graphic
Kenichi Handa <handa@m17n.org>
parents:
20105
diff
changeset
|
2414 { |
88365 | 2415 if (CODING_ISO_FLAGS (&coding_categories[coding_category_iso_8_1]) |
2416 & CODING_ISO_FLAG_SINGLE_SHIFT) | |
2417 newmask |= CATEGORY_MASK_ISO_8_1; | |
2418 if (CODING_ISO_FLAGS (&coding_categories[coding_category_iso_8_2]) | |
2419 & CODING_ISO_FLAG_SINGLE_SHIFT) | |
2420 newmask |= CATEGORY_MASK_ISO_8_2; | |
23088
45c36d636f66
(detect_coding_iso2022): Don't check the byte length of
Kenichi Handa <handa@m17n.org>
parents:
23082
diff
changeset
|
2421 single_shifting = 1; |
20150 | 2422 } |
19365 | 2423 if (VECTORP (Vlatin_extra_code_table) |
2424 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) |
2425 { |
88365 | 2426 if (CODING_ISO_FLAGS (&coding_categories[coding_category_iso_8_1]) |
2427 & CODING_ISO_FLAG_LATIN_EXTRA) | |
2428 newmask |= CATEGORY_MASK_ISO_8_1; | |
2429 if (CODING_ISO_FLAGS (&coding_categories[coding_category_iso_8_2]) | |
2430 & CODING_ISO_FLAG_LATIN_EXTRA) | |
2431 newmask |= CATEGORY_MASK_ISO_8_2; | |
19365 | 2432 } |
88365 | 2433 mask_iso &= newmask; |
20718 | 2434 mask_found |= newmask; |
19365 | 2435 } |
2436 break; |
17052 | 2437 |
2438 default: | |
2439 if (c < 0x80) | |
23088 | 2440 { |
2441 single_shifting = 0; |
2442 break; |
2443 } |
17052 | 2444 else if (c < 0xA0) |
19280 | 2445 { |
23088 | 2446 single_shifting = 0; |
88365 | 2447 mask_8bit_found = 1; |
19365 | 2448 if (VECTORP (Vlatin_extra_code_table) |
2449 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) |
19280 | 2450 { |
19365 | 2451 int newmask = 0; |
2452 |
88365 | 2453 if (CODING_ISO_FLAGS (&coding_categories[coding_category_iso_8_1]) |
2454 & CODING_ISO_FLAG_LATIN_EXTRA) | |
2455 newmask |= CATEGORY_MASK_ISO_8_1; | |
2456 if (CODING_ISO_FLAGS (&coding_categories[coding_category_iso_8_2]) | |
2457 & CODING_ISO_FLAG_LATIN_EXTRA) | |
2458 newmask |= CATEGORY_MASK_ISO_8_2; | |
2459 mask_iso &= newmask; | |
20718 | 2460 mask_found |= newmask; |
19280 | 2461 } |
19365 | 2462 else |
2463 return 0; |
19280 | 2464 } |
17052 | 2465 else |
2466 { | |
88365 | 2467 mask_iso &= ~(CATEGORY_MASK_ISO_7BIT |
2468 | CATEGORY_MASK_ISO_7_ELSE); | |
2469 mask_found |= CATEGORY_MASK_ISO_8_1; | |
2470 mask_8bit_found = 1; | |
23088 | 2471 /* Check the length of succeeding codes of the range |
2472 0xA0..0xFF. If the byte length is odd, we exclude |
88365 | 2473 CATEGORY_MASK_ISO_8_2. We can check this only |
23088 | 2474 when we are not single shifting. */ |
29005 | 2475 if (!single_shifting |
88365 | 2476 && mask_iso & CATEGORY_MASK_ISO_8_2) |
23088 | 2477 { |
29299 | 2478 int i = 1; |
29005 | 2479 while (src < src_end) |
2480 { |
88365 | 2481 ONE_MORE_BYTE (c); |
29005 | 2482 if (c < 0xA0) |
2483 break; |
2484 i++; |
2485 } |
2486 |
2487 if (i & 1 && src < src_end) |
88365 | 2488 mask_iso &= ~CATEGORY_MASK_ISO_8_2; |
23088 | 2489 else |
88365 | 2490 mask_found |= CATEGORY_MASK_ISO_8_2; |
23088 | 2491 } |
17052 | 2492 } |
2493 break; | |
2494 } | |
2495 } | |
88365 | 2496 no_more_source: |
2497 if (!mask_iso) | |
2498 { | |
2499 *mask &= ~CATEGORY_MASK_ISO; | |
2500 return 0; | |
2501 } | |
2502 if (!mask_found) | |
2503 return 0; | |
2504 *mask &= mask_iso & mask_found; | |
2505 if (! mask_8bit_found) | |
2506 *mask &= ~(CATEGORY_MASK_ISO_8BIT | CATEGORY_MASK_ISO_8_ELSE); | |
2507 return 1; | |
17052 | 2508 } |
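The odd-length check near the end of the detector above can be tried in isolation. The following is a minimal sketch, not code from coding.c: `high_run_length` and `run_may_be_two_byte` are hypothetical names, and the handling of a run that reaches the end of the buffer is a simplifying assumption (it is treated as inconclusive).

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of consecutive bytes in the range 0xA0..0xFF
   starting at BUF[0], looking at no more than LEN bytes.  */
static size_t
high_run_length (const unsigned char *buf, size_t len)
{
  size_t n = 0;

  assert (buf != NULL || len == 0);
  while (n < len && buf[n] >= 0xA0)
    n++;
  return n;
}

/* Return 1 if the run starting at BUF may consist only of 2-byte
   characters, 0 if its length is odd.  A run that reaches the end
   of the buffer is accepted, because more bytes may follow in a
   later block (an assumption of this sketch).  */
static int
run_may_be_two_byte (const unsigned char *buf, size_t len)
{
  size_t n = high_run_length (buf, len);

  if (n == len)
    return 1;
  return (n % 2) == 0;
}
```

A caller would drop the two-byte 8-bit category from its candidate mask whenever `run_may_be_two_byte` returns 0.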
2509 | |
2510 | |
2511 /* Set designation state into CODING. */ | |
88365 | 2512 #define DECODE_DESIGNATION(reg, dim, chars_96, final) \ |
2513 do { \ | |
2514 int id, prev; \ | |
2515 \ | |
2516 if (final < '0' || final >= 128 \ | |
2517 || ((id = ISO_CHARSET_TABLE (dim, chars_96, final)) < 0) \ | |
2518 || !SAFE_CHARSET_P (coding, id)) \ | |
2519 { \ | |
2520 CODING_ISO_DESIGNATION (coding, reg) = -2; \ | |
2521 goto invalid_code; \ | |
2522 } \ | |
2523 prev = CODING_ISO_DESIGNATION (coding, reg); \ | |
2524 CODING_ISO_DESIGNATION (coding, reg) = id; \ | |
2525 /* If there was an invalid designation to REG previously, and this \ | |
2526 designation is ASCII to REG, we should keep this designation \ | |
2527 sequence. */ \ | |
2528 if (prev == -2 && id == charset_ascii) \ | |
2529 goto invalid_code; \ | |
17052 | 2530 } while (0) |
2531 | |
88365 | 2532 |
2533 #define MAYBE_FINISH_COMPOSITION() \ | |
2534 do { \ | |
2535 int i; \ | |
2536 if (composition_state == COMPOSING_NO) \ | |
2537 break; \ | |
2538 /* It is assured that we have enough room for producing \ | |
2539 characters stored in the table `components'. */ \ | |
2540 if (charbuf + component_idx > charbuf_end) \ | |
2541 goto no_more_source; \ | |
2542 composition_state = COMPOSING_NO; \ | |
2543 if (method == COMPOSITION_RELATIVE \ | |
2544 || method == COMPOSITION_WITH_ALTCHARS) \ | |
2545 { \ | |
2546 for (i = 0; i < component_idx; i++) \ | |
2547 *charbuf++ = components[i]; \ | |
2548 char_offset += component_idx; \ | |
2549 } \ | |
2550 else \ | |
2551 { \ | |
2552 for (i = 0; i < component_idx; i += 2) \ | |
2553 *charbuf++ = components[i]; \ | |
2554 char_offset += (component_idx / 2) + 1; \ | |
2555 } \ | |
2556 } while (0) | |
2557 | |
20718 | 2558 |
34888 | 2559 /* Handle composition start sequence ESC 0, ESC 2, ESC 3, or ESC 4. |
2560 ESC 0 : relative composition : ESC 0 CHAR ... ESC 1 |
2561 ESC 2 : rulebase composition : ESC 2 CHAR RULE CHAR RULE ... CHAR ESC 1 |
88365 | 2562 ESC 3 : altchar composition : ESC 3 CHAR ... ESC 0 CHAR ... ESC 1 |
2563 ESC 4 : alt&rule composition : ESC 4 CHAR RULE ... CHAR ESC 0 CHAR ... ESC 1 | |
34888 | 2564 */ |
26847 | 2565 |
88365 | 2566 #define DECODE_COMPOSITION_START(c1) \ |
26847 | 2567 do { \ |
88365 | 2568 if (c1 == '0' \ |
2569 && composition_state == COMPOSING_COMPONENT_CHAR) \ | |
26847 | 2570 { \ |
88365 | 2571 component_len = component_idx; \ |
2572 composition_state = COMPOSING_CHAR; \ | |
26847 | 2573 } \ |
2574 else \ | |
2575 { \ | |
88365 | 2576 unsigned char *p; \ |
2577 \ | |
2578 MAYBE_FINISH_COMPOSITION (); \ | |
2579 if (charbuf + MAX_COMPOSITION_COMPONENTS > charbuf_end) \ | |
2580 goto no_more_source; \ | |
2581 for (p = src; p < src_end - 1; p++) \ | |
2582 if (*p == ISO_CODE_ESC && p[1] == '1') \ | |
2583 break; \ | |
2584 if (p == src_end - 1) \ | |
2585 { \ | |
2586 if (coding->mode & CODING_MODE_LAST_BLOCK) \ | |
2587 goto invalid_code; \ | |
2588 goto no_more_source; \ | |
2589 } \ | |
2590 \ | |
2591 /* This is surely the start of a composition. */ \ | |
2592 method = (c1 == '0' ? COMPOSITION_RELATIVE \ | |
2593 : c1 == '2' ? COMPOSITION_WITH_RULE \ | |
2594 : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \ | |
2595 : COMPOSITION_WITH_RULE_ALTCHARS); \ | |
2596 composition_state = (c1 <= '2' ? COMPOSING_CHAR \ | |
2597 : COMPOSING_COMPONENT_CHAR); \ | |
2598 component_idx = component_len = 0; \ | |
26847 | 2599 } \ |
2600 } while (0) | |
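Before committing to a composition, the macro above scans ahead for the terminating ESC 1 and decides between "invalid" and "need more input" depending on whether this is the last block. A minimal standalone sketch of that decision follows; `find_composition_end`, `classify_composition`, and the `scan_result` enum are hypothetical names, not part of coding.c.

```c
#include <assert.h>
#include <stddef.h>

#define ESC_BYTE 0x1B

enum scan_result { SCAN_FOUND, SCAN_NEED_MORE, SCAN_INVALID };

/* Return the offset of the first ESC '1' pair in BUF, or -1 if the
   pair does not occur within the first LEN bytes.  */
static long
find_composition_end (const unsigned char *buf, size_t len)
{
  size_t i;

  assert (buf != NULL || len == 0);
  for (i = 0; i + 1 < len; i++)
    if (buf[i] == ESC_BYTE && buf[i + 1] == '1')
      return (long) i;
  return -1;
}

/* Decide what to do with a composition start sequence.  If no end
   sequence is visible, the input is invalid only when LAST_BLOCK is
   nonzero; otherwise the caller should wait for more source.  */
static enum scan_result
classify_composition (const unsigned char *buf, size_t len, int last_block)
{
  if (find_composition_end (buf, len) >= 0)
    return SCAN_FOUND;
  return last_block ? SCAN_INVALID : SCAN_NEED_MORE;
}
```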
2601 | |
88365 | 2602 |
2603 /* Handle composition end sequence ESC 1. */ | |
2604 | |
2605 #define DECODE_COMPOSITION_END() \ | |
2606 do { \ | |
2607 int nchars = (component_len > 0 ? component_idx - component_len \ | |
2608 : method == COMPOSITION_RELATIVE ? component_idx \ | |
2609 : (component_idx + 1) / 2); \ | |
2610 int i; \ | |
2611 int *saved_charbuf = charbuf; \ | |
2612 \ | |
2613 ADD_COMPOSITION_DATA (charbuf, method, nchars); \ | |
2614 if (method != COMPOSITION_RELATIVE) \ | |
2615 { \ | |
2616 if (component_len == 0) \ | |
2617 for (i = 0; i < component_idx; i++) \ | |
2618 *charbuf++ = components[i]; \ | |
2619 else \ | |
2620 for (i = 0; i < component_len; i++) \ | |
2621 *charbuf++ = components[i]; \ | |
2622 *saved_charbuf = saved_charbuf - charbuf; \ | |
2623 } \ | |
2624 if (method == COMPOSITION_WITH_RULE) \ | |
2625 for (i = 0; i < component_idx; i += 2, char_offset++) \ | |
2626 *charbuf++ = components[i]; \ | |
2627 else \ | |
2628 for (i = component_len; i < component_idx; i++, char_offset++) \ | |
2629 *charbuf++ = components[i]; \ | |
2630 coding->annotated = 1; \ | |
2631 composition_state = COMPOSING_NO; \ | |
2632 } while (0) | |
2633 | |
2634 | |
26847 | 2635 /* Decode a composition rule from the byte C1 (and maybe one more byte |
2636 from SRC) and store one encoded composition rule in | |
2637 coding->cmp_data. */ | |
2638 | |
2639 #define DECODE_COMPOSITION_RULE(c1) \ | |
2640 do { \ | |
2641 (c1) -= 32; \ | |
2642 if (c1 < 81) /* old format (before ver.21) */ \ | |
2643 { \ | |
2644 int gref = (c1) / 9; \ | |
2645 int nref = (c1) % 9; \ | |
2646 if (gref == 4) gref = 10; \ | |
2647 if (nref == 4) nref = 10; \ | |
88365 | 2648 c1 = COMPOSITION_ENCODE_RULE (gref, nref); \ |
26847 | 2649 } \ |
29005 | 2650 else if (c1 < 93) /* new format (after ver.21) */ \ |
26847 | 2651 { \ |
2652 ONE_MORE_BYTE (c2); \ | |
88365 | 2653 c1 = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \ |
26847 | 2654 } \ |
88365 | 2655 else \ |
2656 c1 = 0; \ | |
26847 | 2657 } while (0) |
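The arithmetic in the rule decoder above is easier to follow as a plain function. This is a sketch under stated assumptions: `struct comp_rule` and `decode_rule` are hypothetical names, the result is returned as a (gref, nref) pair rather than packed with COMPOSITION_ENCODE_RULE, and the guard for bytes below 32 is an addition of this sketch.

```c
#include <assert.h>

/* Global and new reference points of a composition rule.  VALID is
   zero when the input byte does not encode a rule.  */
struct comp_rule { int gref; int nref; int valid; };

/* Decode one composition rule.  C1 is the first byte; C2 is the
   following byte, consulted only for the two-byte (new) format.  */
static struct comp_rule
decode_rule (int c1, int c2)
{
  struct comp_rule r = { 0, 0, 0 };
  int v;

  assert (c1 >= 0 && c1 <= 0xFF);
  v = c1 - 32;
  if (v < 0)
    return r;                   /* guard added by this sketch */
  if (v < 81)
    {
      /* Old one-byte format: a 9x9 grid in which index 4 stands
         for reference point 10.  */
      r.gref = v / 9;
      r.nref = v % 9;
      if (r.gref == 4)
        r.gref = 10;
      if (r.nref == 4)
        r.nref = 10;
      r.valid = 1;
    }
  else if (v < 93)
    {
      /* New two-byte format: each byte carries one reference point.  */
      r.gref = v - 81;
      r.nref = c2 - 32;
      r.valid = 1;
    }
  return r;
}
```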
2658 | |
2659 | |
17052 | 2660 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */ |
2661 | |
29005 | 2662 static void |
88365 | 2663 decode_coding_iso_2022 (coding) |
17052 | 2664 struct coding_system *coding; |
2665 { | |
88365 | 2666 unsigned char *src = coding->source + coding->consumed; |
2667 unsigned char *src_end = coding->source + coding->src_bytes; | |
2668 unsigned char *src_base; | |
2669 int *charbuf = coding->charbuf; | |
2670 int *charbuf_end = charbuf + coding->charbuf_size - 4; | |
2671 int consumed_chars = 0, consumed_chars_base; | |
2672 int char_offset = 0; | |
2673 int multibytep = coding->src_multibyte; | |
17052 | 2674 /* Charsets invoked to graphic plane 0 and 1 respectively. */ |
88365 | 2675 int charset_id_0 = CODING_ISO_INVOKED_CHARSET (coding, 0); |
2676 int charset_id_1 = CODING_ISO_INVOKED_CHARSET (coding, 1); | |
2677 struct charset *charset; | |
2678 int c; | |
2679 /* For handling composition sequence. */ | |
2680 #define COMPOSING_NO 0 | |
2681 #define COMPOSING_CHAR 1 | |
2682 #define COMPOSING_RULE 2 | |
2683 #define COMPOSING_COMPONENT_CHAR 3 | |
2684 #define COMPOSING_COMPONENT_RULE 4 | |
2685 | |
2686 int composition_state = COMPOSING_NO; | |
2687 enum composition_method method; | |
2688 int components[MAX_COMPOSITION_COMPONENTS * 2 + 1]; | |
2689 int component_idx; | |
2690 int component_len; | |
2691 Lisp_Object attrs, eol_type, charset_list; | |
2692 | |
2693 CODING_GET_INFO (coding, attrs, eol_type, charset_list); | |
2694 setup_iso_safe_charsets (attrs); | |
29005 | 2695 |
2696 while (1) |
2697 { |
2698 int c1, c2; |
2699 |
2700 src_base = src; |
88365 | 2701 consumed_chars_base = consumed_chars; |
2702 | |
2703 if (charbuf >= charbuf_end) | |
2704 break; | |
2705 | |
29005 | 2706 ONE_MORE_BYTE (c1); |
17052 | 2707 |
26847 | 2708 /* We produce no character or one character. */ |
17052 | 2709 switch (iso_code_class [c1]) |
2710 { | |
2711 case ISO_0x20_or_0x7F: | |
88365 | 2712 if (composition_state != COMPOSING_NO) |
26847 | 2713 { |
88365 | 2714 if (composition_state == COMPOSING_RULE |
2715 || composition_state == COMPOSING_COMPONENT_RULE) | |
2716 { | |
2717 DECODE_COMPOSITION_RULE (c1); | |
2718 components[component_idx++] = c1; | |
2719 composition_state--; | |
2720 continue; | |
2721 } | |
2722 else if (method == COMPOSITION_WITH_RULE) | |
2723 composition_state = COMPOSING_RULE; | |
2724 else if (method == COMPOSITION_WITH_RULE_ALTCHARS | |
2725 && composition_state == COMPOSING_COMPONENT_CHAR) | |
2726 composition_state = COMPOSING_COMPONENT_CHAR; | |
26847 | 2727 } |
88365 | 2728 if (charset_id_0 < 0 |
2729 || ! CHARSET_ISO_CHARS_96 (CHARSET_FROM_ID (charset_id_0))) | |
17052 | 2730 { |
2731 /* This is SPACE or DEL. */ | |
88365 | 2732 charset = CHARSET_FROM_ID (charset_ascii); |
17052 | 2733 break; |
2734 } | |
2735 /* This is a graphic character, we fall down ... */ | |
2736 | |
2737 case ISO_graphic_plane_0: | |
88365 | 2738 if (composition_state == COMPOSING_RULE) |
29005 | 2739 { |
2740 DECODE_COMPOSITION_RULE (c1); |
88365 | 2741 components[component_idx++] = c1; |
2742 composition_state = COMPOSING_CHAR; | |
29005 | 2743 } |
88365 | 2744 charset = CHARSET_FROM_ID (charset_id_0); |
17052 | 2745 break; |
2746 | |
2747 case ISO_0xA0_or_0xFF: | |
88365 | 2748 if (charset_id_1 < 0 |
2749 || ! CHARSET_ISO_CHARS_96 (CHARSET_FROM_ID (charset_id_1)) | |
2750 || CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SEVEN_BITS) | |
2751 goto invalid_code; | |
17052 | 2752 /* This is a graphic character, we fall down ... */ |
2753 | |
2754 case ISO_graphic_plane_1: | |
88365 | 2755 if (charset_id_1 < 0) |
2756 goto invalid_code; | |
2757 charset = CHARSET_FROM_ID (charset_id_1); | |
17052 | 2758 break; |
2759 | |
88365 | 2760 case ISO_carriage_return: |
2761 if (c1 == '\r') | |
2762 { | |
2763 if (EQ (eol_type, Qdos)) | |
2764 { | |
2765 if (src == src_end) | |
2766 goto no_more_source; | |
2767 if (*src == '\n') | |
2768 ONE_MORE_BYTE (c1); | |
2769 } | |
2770 else if (EQ (eol_type, Qmac)) | |
2771 c1 = '\n'; | |
2772 } | |
2773 /* fall through */ | |
2774 | |
29005 | 2775 case ISO_control_0: |
88365 | 2776 MAYBE_FINISH_COMPOSITION (); |
2777 charset = CHARSET_FROM_ID (charset_ascii); | |
17052 | 2778 break; |
2779 | |
29005 | 2780 case ISO_control_1: |
88365 | 2781 MAYBE_FINISH_COMPOSITION (); |
2782 goto invalid_code; | |
17052 | 2783 |
2784 case ISO_shift_out: | |
88365 | 2785 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_LOCKING_SHIFT) |
2786 || CODING_ISO_DESIGNATION (coding, 1) < 0) | |
2787 goto invalid_code; | |
2788 CODING_ISO_INVOCATION (coding, 0) = 1; | |
2789 charset_id_0 = CODING_ISO_INVOKED_CHARSET (coding, 0); | |
29005 | 2790 continue; |
17052 | 2791 |
2792 case ISO_shift_in: | |
88365 | 2793 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_LOCKING_SHIFT)) |
2794 goto invalid_code; | |
2795 CODING_ISO_INVOCATION (coding, 0) = 0; | |
2796 charset_id_0 = CODING_ISO_INVOKED_CHARSET (coding, 0); | |
29005 | 2797 continue; |
17052 | 2798 |
2799 case ISO_single_shift_2_7: | |
2800 case ISO_single_shift_2: | |
88365 | 2801 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SINGLE_SHIFT)) |
2802 goto invalid_code; | |
17052 | 2803 /* SS2 is handled as an escape sequence of ESC 'N' */ |
2804 c1 = 'N'; | |
2805 goto label_escape_sequence; | |
2806 | |
2807 case ISO_single_shift_3: | |
88365 | 2808 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SINGLE_SHIFT)) |
2809 goto invalid_code; | |
17052 | 2810 /* SS3 is handled as an escape sequence of ESC 'O' */ |
2811 c1 = 'O'; | |
2812 goto label_escape_sequence; | |
2813 | |
2814 case ISO_control_sequence_introducer: | |
2815 /* CSI is handled as an escape sequence of ESC '[' ... */ | |
2816 c1 = '['; | |
2817 goto label_escape_sequence; | |
2818 | |
2819 case ISO_escape: | |
2820 ONE_MORE_BYTE (c1); | |
2821 label_escape_sequence: | |
88365 | 2822 /* Escape sequences handled here are invocation, |
17052 | 2823 designation, direction specification, and character |
2824 composition specification. */ | |
2825 switch (c1) | |
2826 { | |
2827 case '&': /* revision of following character set */ | |
2828 ONE_MORE_BYTE (c1); | |
2829 if (!(c1 >= '@' && c1 <= '~')) | |
88365 | 2830 goto invalid_code; |
17052 | 2831 ONE_MORE_BYTE (c1); |
2832 if (c1 != ISO_CODE_ESC) | |
88365 | 2833 goto invalid_code; |
17052 | 2834 ONE_MORE_BYTE (c1); |
2835 goto label_escape_sequence; | |
2836 | |
2837 case '$': /* designation of 2-byte character set */ | |
88365 | 2838 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_DESIGNATION)) |
2839 goto invalid_code; | |
17052 | 2840 ONE_MORE_BYTE (c1); |
2841 if (c1 >= '@' && c1 <= 'B') | |
2842 { /* designation of JISX0208.1978, GB2312.1980, | |
23339 | 2843 or JISX0208.1980 */ |
88365 | 2844 DECODE_DESIGNATION (0, 2, 0, c1); |
17052 | 2845 } |
2846 else if (c1 >= 0x28 && c1 <= 0x2B) | |
2847 { /* designation of DIMENSION2_CHARS94 character set */ | |
2848 ONE_MORE_BYTE (c2); | |
88365 | 2849 DECODE_DESIGNATION (c1 - 0x28, 2, 0, c2); |
17052 | 2850 } |
2851 else if (c1 >= 0x2C && c1 <= 0x2F) | |
2852 { /* designation of DIMENSION2_CHARS96 character set */ | |
2853 ONE_MORE_BYTE (c2); | |
88365 | 2854 DECODE_DESIGNATION (c1 - 0x2C, 2, 1, c2); |
17052 | 2855 } |
2856 else | |
88365 | 2857 goto invalid_code; |
29005 | 2858 /* We must update these variables now. */ |
88365 | 2859 charset_id_0 = CODING_ISO_INVOKED_CHARSET (coding, 0); |
2860 charset_id_1 = CODING_ISO_INVOKED_CHARSET (coding, 1); | |
29005 | 2861 continue; |
17052 | 2862 |
2863 case 'n': /* invocation of locking-shift-2 */ | |
88365 | 2864 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_LOCKING_SHIFT) |
2865 || CODING_ISO_DESIGNATION (coding, 2) < 0) | |
2866 goto invalid_code; | |
2867 CODING_ISO_INVOCATION (coding, 0) = 2; | |
2868 charset_id_0 = CODING_ISO_INVOKED_CHARSET (coding, 0); | |
29005 | 2869 continue; |
17052 | 2870 |
2871 case 'o': /* invocation of locking-shift-3 */ | |
88365 | 2872 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_LOCKING_SHIFT) |
2873 || CODING_ISO_DESIGNATION (coding, 3) < 0) | |
2874 goto invalid_code; | |
2875 CODING_ISO_INVOCATION (coding, 0) = 3; | |
2876 charset_id_0 = CODING_ISO_INVOKED_CHARSET (coding, 0); | |
29005 | 2877 continue; |
17052 | 2878 |
2879 case 'N': /* invocation of single-shift-2 */ | |
88365 | 2880 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SINGLE_SHIFT) |
2881 || CODING_ISO_DESIGNATION (coding, 2) < 0) | |
2882 goto invalid_code; | |
2883 charset = CHARSET_FROM_ID (CODING_ISO_DESIGNATION (coding, 2)); | |
17052 | 2884 ONE_MORE_BYTE (c1); |
30578 | 2885 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0)) |
88365 | 2886 goto invalid_code; |
17052 | 2887 break; |
2888 | |
2889 case 'O': /* invocation of single-shift-3 */ | |
88365 | 2890 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SINGLE_SHIFT) |
2891 || CODING_ISO_DESIGNATION (coding, 3) < 0) | |
2892 goto invalid_code; | |
2893 charset = CHARSET_FROM_ID (CODING_ISO_DESIGNATION (coding, 3)); | |
17052 | 2894 ONE_MORE_BYTE (c1); |
30578 | 2895 if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0)) |
88365 | 2896 goto invalid_code; |
17052 | 2897 break; |
2898 | |
26847 | 2899 case '0': case '2': case '3': case '4': /* start composition */ |
88365 | 2900 if (! (coding->common_flags & CODING_ANNOTATE_COMPOSITION_MASK)) |
2901 goto invalid_code; | |
26847 | 2902 DECODE_COMPOSITION_START (c1); |
29005 | 2903 continue; |
17052 | 2904 |
26847 | 2905 case '1': /* end composition */ |
88365 | 2906 if (composition_state == COMPOSING_NO) |
2907 goto invalid_code; | |
2908 DECODE_COMPOSITION_END (); | |
29005 | 2909 continue; |
17052 | 2910 |
2911 case '[': /* specification of direction */ | |
88365 | 2912 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_DIRECTION)) |
2913 goto invalid_code; | |
17052 | 2914 /* For the moment, nested direction is not supported. |
20718 | 2915 So, `coding->mode & CODING_MODE_DIRECTION' zero means |
88365 | 2916 left-to-right, and nonzero means right-to-left. */ |
17052 | 2917 ONE_MORE_BYTE (c1); |
2918 switch (c1) | |
2919 { | |
2920 case ']': /* end of the current direction */ | |
20718 | 2921 coding->mode &= ~CODING_MODE_DIRECTION; |
17052 | 2922 |
2923 case '0': /* end of the current direction */ | |
2924 case '1': /* start of left-to-right direction */ | |
2925 ONE_MORE_BYTE (c1); | |
2926 if (c1 == ']') | |
20718 | 2927 coding->mode &= ~CODING_MODE_DIRECTION; |
17052 | 2928 else |
88365 | 2929 goto invalid_code; |
17052 | 2930 break; |
2931 | |
2932 case '2': /* start of right-to-left direction */ | |
2933 ONE_MORE_BYTE (c1); | |
2934 if (c1 == ']') | |
20718 | 2935 coding->mode |= CODING_MODE_DIRECTION; |
17052 | 2936 else |
88365 | 2937 goto invalid_code; |
17052 | 2938 break; |
2939 | |
2940 default: | |
88365 | 2941 goto invalid_code; |
17052 | 2942 } |
29005 | 2943 continue; |
17052 | 2944 |
2945 default: | |
88365 | 2946 if (! (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_DESIGNATION)) |
2947 goto invalid_code; | |
17052 | 2948 if (c1 >= 0x28 && c1 <= 0x2B) |
2949 { /* designation of DIMENSION1_CHARS94 character set */ | |
2950 ONE_MORE_BYTE (c2); | |
88365 | 2951 DECODE_DESIGNATION (c1 - 0x28, 1, 0, c2); |
17052 | 2952 } |
2953 else if (c1 >= 0x2C && c1 <= 0x2F) | |
2954 { /* designation of DIMENSION1_CHARS96 character set */ | |
2955 ONE_MORE_BYTE (c2); | |
88365 | 2956 DECODE_DESIGNATION (c1 - 0x2C, 1, 1, c2); |
17052 | 2957 } |
2958 else | |
88365 | 2959 goto invalid_code; |
29005 | 2960 /* We must update these variables now. */ |
88365 | 2961 charset_id_0 = CODING_ISO_INVOKED_CHARSET (coding, 0); |
2962 charset_id_1 = CODING_ISO_INVOKED_CHARSET (coding, 1); | |
29005 | 2963 continue; |
17052 | 2964 } |
29005 | 2965 } |
2966 |
2967 /* Now we know CHARSET and 1st position code C1 of a character. |
88365 | 2968 Produce a decoded character while getting 2nd position code |
2969 C2 if necessary. */ | |
2970 c1 &= 0x7F; | |
2971 if (CHARSET_DIMENSION (charset) > 1) | |
29005 | 2972 { |
2973 ONE_MORE_BYTE (c2); |
88365 | 2974 if (c2 < 0x20 || (c2 >= 0x80 && c2 < 0xA0)) |
29005 | 2975 /* C2 is not in a valid range. */ |
88365 | 2976 goto invalid_code; |
2977 c1 = (c1 << 8) | (c2 & 0x7F); | |
2978 if (CHARSET_DIMENSION (charset) > 2) | |
2979 { | |
2980 ONE_MORE_BYTE (c2); | |
2981 if (c2 < 0x20 || (c2 >= 0x80 && c2 < 0xA0)) | |
2982 /* C2 is not in a valid range. */ | |
2983 goto invalid_code; | |
2984 c1 = (c1 << 8) | (c2 & 0x7F); | |
2985 } | |
17052 | 2986 } |
88365 | 2987 |
2988 CODING_DECODE_CHAR (coding, src, src_base, src_end, charset, c1, c); | |
2989 if (c < 0) | |
2990 { | |
2991 MAYBE_FINISH_COMPOSITION (); | |
2992 for (; src_base < src; src_base++, char_offset++) | |
2993 { | |
2994 if (ASCII_BYTE_P (*src_base)) | |
2995 *charbuf++ = *src_base; | |
2996 else | |
2997 *charbuf++ = BYTE8_TO_CHAR (*src_base); | |
2998 } | |
2999 } | |
3000 else if (composition_state == COMPOSING_NO) | |
3001 { | |
3002 *charbuf++ = c; | |
3003 char_offset++; | |
3004 } | |
3005 else | |
3006 components[component_idx++] = c; | |
17052 | 3007 continue; |
3008 | |
88365 | 3009 invalid_code: |
3010 MAYBE_FINISH_COMPOSITION (); | |
3011 src = src_base; | |
3012 consumed_chars = consumed_chars_base; | |
3013 ONE_MORE_BYTE (c); | |
3014 *charbuf++ = ASCII_BYTE_P (c) ? c : BYTE8_TO_CHAR (c); | |
29005 | 3015 coding->errors++; |
88365 | 3016 } |
3017 | |
3018 no_more_source: | |
3019 coding->consumed_char += consumed_chars_base; | |
3020 coding->consumed = src_base - coding->source; | |
3021 coding->charbuf_used = charbuf - coding->charbuf; | |
17052 | 3022 } |
3023 | |
29005 | 3024 |
18766 | 3025 /* ISO2022 encoding stuff. */ |
17052 | 3026 |
3027 /* | |
18766 | 3028 It is not enough to say just "ISO2022" when encoding; we have to |
88365 | 3029 specify more details. In Emacs, each coding system of the ISO2022 |
17052 | 3030 variant has the following specifications: |
88365 | 3031 1. Initial designation to G0 thru G3. |
17052 | 3032 2. Allows short-form designation? |
3033 3. ASCII should be designated to G0 before control characters? | |
3034 4. ASCII should be designated to G0 at end of line? | |
3035 5. 7-bit environment or 8-bit environment? | |
3036 6. Use locking-shift? | |
3037 7. Use single-shift? | |
3038 And the following two are only for Japanese: | |
3039 8. Use ASCII in place of JISX0201-1976-Roman? | |
3040 9. Use JISX0208-1983 in place of JISX0208-1978? | |
88365 | 3041 These specifications are encoded in CODING_ISO_FLAGS (coding) as flag bits |
3042 defined by macros CODING_ISO_FLAG_XXX. See `coding.h' for more | |
18766 | 3043 details. |
17052 | 3044 */ |
3045 | |
3046 /* Produce codes (escape sequence) for designating CHARSET to graphic | |
29005 | 3047 register REG at DST, and increment DST. If <final-char> of CHARSET is |
3048 '@', 'A', or 'B' and the coding system CODING allows, produce | |
3049 designation sequence of short-form. */ | |
17052 | 3050 |
3051 #define ENCODE_DESIGNATION(charset, reg, coding) \ | |
3052 do { \ | |
88365 | 3053 unsigned char final_char = CHARSET_ISO_FINAL (charset); \ |
17052 | 3054 char *intermediate_char_94 = "()*+"; \ |
3055 char *intermediate_char_96 = ",-./"; \ | |
88365 | 3056 int revision = -1; \ |
3057 int c; \ | |
3058 \ | |
3059 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_REVISION) \ | |
3060 revision = XINT (CHARSET_ISO_REVISION (charset)); \ | |
3061 \ | |
3062 if (revision >= 0) \ | |
20150 | 3063 { \ |
88365 | 3064 EMIT_TWO_ASCII_BYTES (ISO_CODE_ESC, '&'); \ |
3065 EMIT_ONE_BYTE ('@' + revision); \ | |
17052 | 3066 } \ |
88365 | 3067 EMIT_ONE_ASCII_BYTE (ISO_CODE_ESC); \ |
17052 | 3068 if (CHARSET_DIMENSION (charset) == 1) \ |
3069 { \ | |
88365 | 3070 if (! CHARSET_ISO_CHARS_96 (charset)) \ |
3071 c = intermediate_char_94[reg]; \ | |
17052 | 3072 else \ |
88365 | 3073 c = intermediate_char_96[reg]; \ |
3074 EMIT_ONE_ASCII_BYTE (c); \ | |
17052 | 3075 } \ |
3076 else \ | |
3077 { \ | |
88365 | 3078 EMIT_ONE_ASCII_BYTE ('$'); \ |
3079 if (! CHARSET_ISO_CHARS_96 (charset)) \ | |
17052 | 3080 { \ |
88365 | 3081 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_LONG_FORM \ |
29005 | 3082 || reg != 0 \ |
3083 || final_char < '@' || final_char > 'B') \ | |
88365 | 3084 EMIT_ONE_ASCII_BYTE (intermediate_char_94[reg]); \ |
17052 | 3085 } \ |
3086 else \ | |
88365 | 3087 EMIT_ONE_ASCII_BYTE (intermediate_char_96[reg]); \ |
17052 | 3088 } \ |
88365 | 3089 EMIT_ONE_ASCII_BYTE (final_char); \ |
3090 \ | |
3091 CODING_ISO_DESIGNATION (coding, reg) = CHARSET_ID (charset); \ | |
17052 | 3092 } while (0) |
3093 | |
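The byte layout that ENCODE_DESIGNATION produces can be sketched as a standalone function. This is a minimal illustration, not code from coding.c: the name `designation_sequence` and its parameters are hypothetical, and the optional revision prefix (ESC & followed by '@' + revision) is left out.

```c
#include <stddef.h>

/* Hypothetical helper, not part of coding.c: write the ISO 2022
   designation sequence for a charset into BUF and return its length.
   DIMENSION is 1 or 2, CHARS_96 is nonzero for a 96-character set,
   REG is the graphic register (0..3 for G0..G3), FINAL_CHAR is the
   final byte of the charset, and LONG_FORM forces the intermediate
   byte even where the short form would be allowed.  BUF must have
   room for at least 4 bytes.  */
size_t
designation_sequence (unsigned char *buf, int dimension, int chars_96,
                      int reg, unsigned char final_char, int long_form)
{
  static const char intermediate_94[] = "()*+";
  static const char intermediate_96[] = ",-./";
  size_t n = 0;

  buf[n++] = 0x1B;              /* ESC */
  if (dimension == 1)
    buf[n++] = chars_96 ? intermediate_96[reg] : intermediate_94[reg];
  else
    {
      buf[n++] = '$';
      if (chars_96)
        buf[n++] = intermediate_96[reg];
      else if (long_form || reg != 0
               || final_char < '@' || final_char > 'B')
        buf[n++] = intermediate_94[reg];
      /* Otherwise use the short form "ESC $ F", which is only
         available for G0 with final byte '@', 'A', or 'B'.  */
    }
  buf[n++] = final_char;
  return n;
}
```

For example, a two-byte 94x94 set with final byte 'B' designated to G0 gives ESC $ B in short form and ESC $ ( B in long form, while a 96-character set with final byte 'A' designated to G1 gives ESC - A.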
88365 | 3094 |
17052 | 3095 /* The following two macros produce codes (control character or escape |
3096 sequence) for ISO2022 single-shift functions (single-shift-2 and | |
3097 single-shift-3). */ | |
3098 | |
88365 | 3099 #define ENCODE_SINGLE_SHIFT_2 \ |
3100 do { \ | |
3101 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SEVEN_BITS) \ | |
3102 EMIT_TWO_ASCII_BYTES (ISO_CODE_ESC, 'N'); \ | |
3103 else \ | |
3104 EMIT_ONE_BYTE (ISO_CODE_SS2); \ | |
3105 CODING_ISO_SINGLE_SHIFTING (coding) = 1; \ | |
17052 | 3106 } while (0) |
3107 | |
88365 | 3108 |
3109 #define ENCODE_SINGLE_SHIFT_3 \ | |
3110 do { \ | |
3111 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SEVEN_BITS) \ | |
3112 EMIT_TWO_ASCII_BYTES (ISO_CODE_ESC, 'O'); \ | |
3113 else \ | |
3114 EMIT_ONE_BYTE (ISO_CODE_SS3); \ | |
3115 CODING_ISO_SINGLE_SHIFTING (coding) = 1; \ | |
17052 | 3116 } while (0) |
3117 | |
88365 | 3118 |
17052 | 3119 /* The following four macros produce codes (control character or |
3120 escape sequence) for ISO2022 locking-shift functions (shift-in, | |
3121 shift-out, locking-shift-2, and locking-shift-3). */ | |
3122 | |
88365 | 3123 #define ENCODE_SHIFT_IN \ |
3124 do { \ | |
3125 EMIT_ONE_ASCII_BYTE (ISO_CODE_SI); \ | |
3126 CODING_ISO_INVOCATION (coding, 0) = 0; \ | |
17052 | 3127 } while (0) |
3128 | |
88365 | 3129 |
3130 #define ENCODE_SHIFT_OUT \ | |
3131 do { \ | |
3132 EMIT_ONE_ASCII_BYTE (ISO_CODE_SO); \ | |
3133 CODING_ISO_INVOCATION (coding, 0) = 1; \ | |
17052 | 3134 } while (0) |
3135 | |
88365 | 3136 |
3137 #define ENCODE_LOCKING_SHIFT_2 \ | |
3138 do { \ | |
3139 EMIT_TWO_ASCII_BYTES (ISO_CODE_ESC, 'n'); \ | |
3140 CODING_ISO_INVOCATION (coding, 0) = 2; \ | |
17052 | 3141 } while (0) |
3142 | |
88365 | 3143 |
3144 #define ENCODE_LOCKING_SHIFT_3 \ | |
3145 do { \ | |
3146 EMIT_TWO_ASCII_BYTES (ISO_CODE_ESC, 'o'); \ | |
3147 CODING_ISO_INVOCATION (coding, 0) = 3; \ | |
17052 | 3148 } while (0) |
3149 | |
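The single-shift and locking-shift macros above reduce to a small table of byte sequences. The sketch below restates that table as one function, assuming the standard ISO 2022 codes (SI, SO, ESC n for LS2, ESC o for LS3, and SS2/SS3 either as C1 bytes or as ESC N / ESC O in a 7-bit environment). The function name and parameters are hypothetical.

```c
#include <stddef.h>

enum { ESC = 0x1B, SO = 0x0E, SI = 0x0F, SS2 = 0x8E, SS3 = 0x8F };

/* Hypothetical helper, not part of coding.c: write into BUF the bytes
   that invoke graphic register REG (0..3) and return their count.
   SINGLE_SHIFT nonzero selects SS2/SS3 for G2/G3 (affecting only the
   next character); otherwise a locking shift is used.  SEVEN_BITS
   nonzero selects the escape-sequence form of SS2/SS3.  Returns 0 for
   an invalid REG.  BUF must have room for 2 bytes.  */
size_t
invocation_bytes (unsigned char *buf, int reg, int single_shift,
                  int seven_bits)
{
  switch (reg)
    {
    case 0:                     /* shift-in: G0 into GL */
      buf[0] = SI;
      return 1;
    case 1:                     /* shift-out: G1 into GL */
      buf[0] = SO;
      return 1;
    case 2:
    case 3:
      if (single_shift && !seven_bits)
        {
          buf[0] = reg == 2 ? SS2 : SS3;
          return 1;
        }
      buf[0] = ESC;
      if (single_shift)
        buf[1] = reg == 2 ? 'N' : 'O';  /* 7-bit SS2 / SS3 */
      else
        buf[1] = reg == 2 ? 'n' : 'o';  /* LS2 / LS3 */
      return 2;
    default:
      return 0;
    }
}
```

Keeping the table in one place makes it easy to see that LS2 and LS3 differ only in the final byte of the escape sequence.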
88365 | 3150 |
18766 | 3151 /* Produce codes for a DIMENSION1 character whose character set is |
3152 CHARSET and whose position-code is C1. Designation and invocation | |
17052 | 3153 sequences are also produced in advance if necessary. */ |
3154 | |
19285 | 3155 #define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \ |
3156 do { \ | |
88365 | 3157 int id = CHARSET_ID (charset); \ |
3158 if (CODING_ISO_SINGLE_SHIFTING (coding)) \ | |
19285 | 3159 { \ |
88365 | 3160 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SEVEN_BITS) \ |
3161 EMIT_ONE_ASCII_BYTE ((c1) & 0x7F); \ | |
19285 | 3162 else \ |
88365 | 3163 EMIT_ONE_BYTE ((c1) | 0x80); \ |
3164 CODING_ISO_SINGLE_SHIFTING (coding) = 0; \ | |
19285 | 3165 break; \ |
3166 } \ | |
88365 | 3167 else if (id == CODING_ISO_INVOKED_CHARSET (coding, 0)) \ |
19285 | 3168 { \ |
88365 | 3169 EMIT_ONE_ASCII_BYTE ((c1) & 0x7F); \ |
19285 | 3170 break; \ |
3171 } \ | |
88365 | 3172 else if (id == CODING_ISO_INVOKED_CHARSET (coding, 1)) \ |
19285 | 3173 { \ |
88365 | 3174 EMIT_ONE_BYTE ((c1) | 0x80); \ |
19285 | 3175 break; \ |
3176 } \ | |
3177 else \ | |
3178 /* Since CHARSET is not yet invoked to any graphic planes, we \ | |
3179 must invoke it, or, at first, designate it to some graphic \ | |
3180 register. Then repeat the loop to actually produce the \ | |
3181 character. */ \ | |
88365 | 3182 dst = encode_invocation_designation (charset, coding, dst, \ |
3183 &produced_chars); \ | |
17052 | 3184 } while (1) |
3185 | |
88365 | 3186 |
3187 /* Produce codes for a DIMENSION2 character whose character set is | |
3188 CHARSET and whose position-codes are C1 and C2. Designation and | |
3189 invocation codes are also produced in advance if necessary. */ | |
3190 | |
3191 #define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \ | |
24506 | 3192 do { \ |
88365 | 3193 int id = CHARSET_ID (charset); \ |
3194 if (CODING_ISO_SINGLE_SHIFTING (coding)) \ | |
3195 { \ | |
3196 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SEVEN_BITS) \ | |
3197 EMIT_TWO_ASCII_BYTES ((c1) & 0x7F, (c2) & 0x7F); \ | |
3198 else \ | |
3199 EMIT_TWO_BYTES ((c1) | 0x80, (c2) | 0x80); \ | |
3200 CODING_ISO_SINGLE_SHIFTING (coding) = 0; \ | |
3201 break; \ | |
3202 } \ | |
3203 else if (id == CODING_ISO_INVOKED_CHARSET (coding, 0)) \ | |
3204 { \ | |
3205 EMIT_TWO_ASCII_BYTES ((c1) & 0x7F, (c2) & 0x7F); \ | |
3206 break; \ | |
3207 } \ | |
3208 else if (id == CODING_ISO_INVOKED_CHARSET (coding, 1)) \ | |
3209 { \ | |
3210 EMIT_TWO_BYTES ((c1) | 0x80, (c2) | 0x80); \ | |
3211 break; \ | |
3212 } \ | |
3213 else \ | |
3214 /* Since CHARSET is not yet invoked to any graphic planes, we \ | |
3215 must invoke it, or, at first, designate it to some graphic \ | |
3216 register. Then repeat the loop to actually produce the \ | |
3217 character. */ \ | |
3218 dst = encode_invocation_designation (charset, coding, dst, \ | |
3219 &produced_chars); \ | |
3220 } while (1) | |
3221 | |
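Both ENCODE_ISO_CHARACTER_DIMENSION macros end up emitting the position codes either with the high bit cleared (charset invoked to graphic plane 0, GL) or with the high bit set (graphic plane 1, GR). A minimal sketch of just that step, with hypothetical names and no designation or invocation handling:

```c
#include <stddef.h>

enum plane { PLANE_GL, PLANE_GR };

/* Hypothetical helper, not part of coding.c: copy DIMENSION position
   codes from CODES to BUF, mapped into the given graphic plane.
   Returns the number of bytes written.  */
size_t
emit_position_codes (unsigned char *buf, const unsigned char *codes,
                     size_t dimension, enum plane plane)
{
  size_t i;

  for (i = 0; i < dimension; i++)
    buf[i] = (plane == PLANE_GL
              ? (unsigned char) (codes[i] & 0x7F)   /* 0x21..0x7E range */
              : (unsigned char) (codes[i] | 0x80)); /* 0xA1..0xFE range */
  return dimension;
}
```

With position codes 0x24 0x22, the GL form is 0x24 0x22 and the GR form is 0xA4 0xA2.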
3222 | |
3223 #define ENCODE_ISO_CHARACTER(charset, c) \ | |
3224 do { \ | |
3225 int code = ENCODE_CHAR ((charset),(c)); \ | |
3226 \ | |
3227 if (CHARSET_DIMENSION (charset) == 1) \ | |
3228 ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \ | |
3229 else \ | |
3230 ENCODE_ISO_CHARACTER_DIMENSION2 ((charset), code >> 8, code & 0xFF); \ | |
22119 | 3231 } while (0) |
17725 | 3232 |
30487 | 3233 |
17052 | 3234 /* Produce designation and invocation codes at a place pointed by DST |
88365 | 3235 to use CHARSET. The element `spec.iso_2022' of *CODING is updated. |
17052 | 3236 Return new DST. */ |
3237 | |
3238 unsigned char * | |
88365 | 3239 encode_invocation_designation (charset, coding, dst, p_nchars) |
3240 struct charset *charset; | |
17052 | 3241 struct coding_system *coding; |
3242 unsigned char *dst; | |
88365 | 3243 int *p_nchars; |
17052 | 3244 { |
88365 | 3245 int multibytep = coding->dst_multibyte; |
3246 int produced_chars = *p_nchars; | |
17052 | 3247 int reg; /* graphic register number */ |
88365 | 3248 int id = CHARSET_ID (charset); |
17052 | 3249 |
3250 /* At first, check designations. */ | |
3251 for (reg = 0; reg < 4; reg++) | |
88365 | 3252 if (id == CODING_ISO_DESIGNATION (coding, reg)) |
17052 | 3253 break; |
3254 | |
3255 if (reg >= 4) | |
3256 { | |
3257 /* CHARSET is not yet designated to any graphic registers. */ | |
3258 /* At first check the requested designation. */ | |
88365 | 3259 reg = CODING_ISO_REQUEST (coding, id); |
3260 if (reg < 0) | |
18002 | 3261 /* Since CHARSET requests no special designation, designate it |
3262 to graphic register 0. */ | |
17052 | 3263 reg = 0; |
3264 | |
3265 ENCODE_DESIGNATION (charset, reg, coding); | |
3266 } | |
3267 | |
88365 | 3268 if (CODING_ISO_INVOCATION (coding, 0) != reg |
3269 && CODING_ISO_INVOCATION (coding, 1) != reg) | |
17052 | 3270 { |
3271 /* Since the graphic register REG is not invoked to any graphic | |
3272 planes, invoke it to graphic plane 0. */ | |
3273 switch (reg) | |
3274 { | |
3275 case 0: /* graphic register 0 */ | |
3276 ENCODE_SHIFT_IN; | |
3277 break; | |
3278 | |
3279 case 1: /* graphic register 1 */ | |
3280 ENCODE_SHIFT_OUT; | |
3281 break; | |
3282 | |
3283 case 2: /* graphic register 2 */ | |
88365 | 3284 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SINGLE_SHIFT) |
17052 | 3285 ENCODE_SINGLE_SHIFT_2; |
3286 else | |
3287 ENCODE_LOCKING_SHIFT_2; | |
3288 break; | |
3289 | |
3290 case 3: /* graphic register 3 */ | |
88365 | 3291 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SINGLE_SHIFT) |
17052 | 3292 ENCODE_SINGLE_SHIFT_3; |
3293 else | |
3294 ENCODE_LOCKING_SHIFT_3; | |
3295 break; | |
3296 } | |
3297 } | |
29005 | 3298 |
88365 | 3299 *p_nchars = produced_chars; |
17052 | 3300 return dst; |
3301 } | |
3302 | |
3303 /* The following three macros produce codes for indicating direction | |
3304 of text. */ | |
88365 | 3305 #define ENCODE_CONTROL_SEQUENCE_INTRODUCER \ |
3306 do { \ | |
3307 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_SEVEN_BITS) \ | |
3308 EMIT_TWO_ASCII_BYTES (ISO_CODE_ESC, '['); \ | |
3309 else \ | |
3310 EMIT_ONE_BYTE (ISO_CODE_CSI); \ | |
17052 | 3311 } while (0) |
3312 | |
88365 | 3313 |
3314 #define ENCODE_DIRECTION_R2L() \ | |
3315 do { \ | |
3316 ENCODE_CONTROL_SEQUENCE_INTRODUCER; \ | |
3317 EMIT_TWO_ASCII_BYTES ('2', ']'); \ | |
3318 } while (0) | |
3319 | |
3320 | |
3321 #define ENCODE_DIRECTION_L2R() \ | |
3322 do { \ | |
3323 ENCODE_CONTROL_SEQUENCE_INTRODUCER; \ | |
3324 EMIT_TWO_ASCII_BYTES ('0', ']'); \ | |
3325 } while (0) | |
3326 | |
17052 | 3327 |
3328 /* Produce codes for designation and invocation to reset the graphic | |
3329 planes and registers to initial state. */ | |
88365 | 3330 #define ENCODE_RESET_PLANE_AND_REGISTER() \ |
3331 do { \ | |
3332 int reg; \ | |
3333 struct charset *charset; \ | |
3334 \ | |
3335 if (CODING_ISO_INVOCATION (coding, 0) != 0) \ | |
3336 ENCODE_SHIFT_IN; \ | |
3337 for (reg = 0; reg < 4; reg++) \ | |
3338 if (CODING_ISO_INITIAL (coding, reg) >= 0 \ | |
3339 && (CODING_ISO_DESIGNATION (coding, reg) \ | |
3340 != CODING_ISO_INITIAL (coding, reg))) \ | |
3341 { \ | |
3342 charset = CHARSET_FROM_ID (CODING_ISO_INITIAL (coding, reg)); \ | |
3343 ENCODE_DESIGNATION (charset, reg, coding); \ | |
3344 } \ | |
17052 | 3345 } while (0) |
3346 | |
88365 | 3347 |
17725 | 3348 /* Produce designation sequences of charsets in the line starting at |
29005 | 3349 CHARBUF to the place pointed to by DST, and return the updated DST. |
17725 | 3350 |
3351 If the current block ends before any end-of-line, we may fail to | |
20718 | 3352 find all the necessary designations. */ |
3353 | |
29005 | 3354 static unsigned char * |
88365 | 3355 encode_designation_at_bol (coding, charbuf, charbuf_end, dst) |
17119 | 3356 struct coding_system *coding; |
88365 | 3357 int *charbuf, *charbuf_end; |
3358 unsigned char *dst; | |
17119 | 3359 { |
88365 | 3360 struct charset *charset; |
17725 | 3361 /* Table of charsets to be designated to each graphic register. */ |
3362 int r[4]; | |
88365 | 3363 int c, found = 0, reg; |
3364 int produced_chars = 0; | |
3365 int multibytep = coding->dst_multibyte; | |
3366 Lisp_Object attrs; | |
3367 Lisp_Object charset_list; | |
3368 | |
3369 attrs = CODING_ID_ATTRS (coding->id); | |
3370 charset_list = CODING_ATTR_CHARSET_LIST (attrs); | |
3371 if (EQ (charset_list, Qiso_2022)) | |
3372 charset_list = Viso_2022_charset_list; | |
17725 | 3373 |
3374 for (reg = 0; reg < 4; reg++) | |
3375 r[reg] = -1; | |
3376 | |
29005 | 3377 while (charbuf < charbuf_end && found < 4) |
17119 | 3378 { |
88365 | 3379 int id; |
3380 | |
3381 c = *charbuf++; | |
29005 | 3382 if (c == '\n') |
3383 break; | |
88365 | 3384 charset = char_charset (c, charset_list, NULL); |
3385 id = CHARSET_ID (charset); | |
3386 reg = CODING_ISO_REQUEST (coding, id); | |
3387 if (reg >= 0 && r[reg] < 0) | |
17725 | 3388 { |
3389 found++; | |
88365 | 3390 r[reg] = id; |
17119 | 3391 } |
3392 } | |
17725 | 3393 |
3394 if (found) | |
3395 { | |
3396 for (reg = 0; reg < 4; reg++) | |
3397 if (r[reg] >= 0 | |
88365 | 3398 && CODING_ISO_DESIGNATION (coding, reg) != r[reg]) |
3399 ENCODE_DESIGNATION (CHARSET_FROM_ID (r[reg]), reg, coding); | |
17725 | 3400 } |
29005 | 3401 |
3402 return dst; | |
17119 | 3403 } |
3404 | |
17052 | 3405 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */ |
3406 | |
88365 | 3407 static int |
3408 encode_coding_iso_2022 (coding) | |
17052 | 3409 struct coding_system *coding; |
3410 { | |
88365 | 3411 int multibytep = coding->dst_multibyte; |
3412 int *charbuf = coding->charbuf; | |
3413 int *charbuf_end = charbuf + coding->charbuf_used; | |
3414 unsigned char *dst = coding->destination + coding->produced; | |
3415 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
3416 int safe_room = 16; | |
3417 int bol_designation | |
3418 = (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_DESIGNATE_AT_BOL | |
3419 && CODING_ISO_BOL (coding)); | |
3420 int produced_chars = 0; | |
3421 Lisp_Object attrs, eol_type, charset_list; | |
3422 int ascii_compatible; | |
29005 | 3423 int c; |
88365 | 3424 |
3425 CODING_GET_INFO (coding, attrs, eol_type, charset_list); | |
3426 | |
3427 ascii_compatible = ! NILP (CODING_ATTR_ASCII_COMPAT (attrs)); | |
3428 | |
3429 while (charbuf < charbuf_end) | |
3430 { | |
3431 ASSURE_DESTINATION (safe_room); | |
3432 | |
3433 if (bol_designation) | |
29005 | 3434 { |
88365 | 3435 unsigned char *dst_prev = dst; |
3436 | |
17725 | 3437 /* We have to produce designation sequences if any now. */ |
88365 | 3438 dst = encode_designation_at_bol (coding, charbuf, charbuf_end, dst); |
3439 bol_designation = 0; | |
3440 /* We are sure that designation sequences are all ASCII bytes. */ | |
3441 produced_chars += dst - dst_prev; | |
17119 | 3442 } |
3443 | |
88365 | 3444 c = *charbuf++; |
29005 | 3445 |
3446 /* Now encode the character C. */ | |
3447 if (c < 0x20 || c == 0x7F) | |
17052 | 3448 { |
88365 | 3449 if (c == '\n' |
3450 || (c == '\r' && EQ (eol_type, Qmac))) | |
29005 | 3451 { |
88365 | 3452 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_RESET_AT_EOL) |
3453 ENCODE_RESET_PLANE_AND_REGISTER (); | |
3454 if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_INIT_AT_BOL) | |
3455 { | |
3456 int i; | |
3457 | |
3458 for (i = 0; i < 4; i++) | |
3459 CODING_ISO_DESIGNATION (coding, i) | |
3460 = CODING_ISO_INITIAL (coding, i); | |
3461 } | |
3462 bol_designation | |
3463 = CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_DESIGNATE_AT_BOL; | |
19052 | 3464 } |
88365 | 3465 else if (CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_RESET_AT_CNTL) |
3466 ENCODE_RESET_PLANE_AND_REGISTER (); | |
3467 EMIT_ONE_ASCII_BYTE (c); | |
29005 | 3468 } |
88365 | 3469 else if (ASCII_CHAR_P (c)) |
29005 | 3470 { |
88365 | 3471 if (ascii_compatible) |
3472 EMIT_ONE_ASCII_BYTE (c); | |
3473 else | |
3474 ENCODE_ISO_CHARACTER (CHARSET_FROM_ID (charset_ascii), c); | |
17052 | 3475 } |
29005 | 3476 else |
88365 | 3477 { |
3478 struct charset *charset = char_charset (c, charset_list, NULL); | |
3479 | |
3480 if (!charset) | |
3481 { | |
3482 c = coding->default_char; | |
3483 charset = char_charset (c, charset_list, NULL); | |
3484 } | |
3485 ENCODE_ISO_CHARACTER (charset, c); | |
3486 } | |
3487 } | |
3488 | |
3489 if (coding->mode & CODING_MODE_LAST_BLOCK | |
3490 && CODING_ISO_FLAGS (coding) & CODING_ISO_FLAG_RESET_AT_EOL) | |
3491 { | |
3492 ASSURE_DESTINATION (safe_room); | |
3493 ENCODE_RESET_PLANE_AND_REGISTER (); | |
3494 } | |
3495 coding->result = CODING_RESULT_SUCCESS; | |
3496 CODING_ISO_BOL (coding) = bol_designation; | |
3497 coding->produced_char += produced_chars; | |
3498 coding->produced = dst - coding->destination; | |
3499 return 0; | |
17052 | 3500 } |
3501 | |
3502 | |
88365 | 3503 /*** 8,9. SJIS and BIG5 handlers ***/ |
3504 | |
3505 /* Although SJIS and BIG5 are not ISO coding systems, they are used | |
17052 | 3506 quite widely. So, for the moment, Emacs supports them in the bare |
3507 C code. But, in the future, they may be supported only by CCL. */ | |
3508 | |
3509 /* SJIS is a coding system encoding three character sets: ASCII, right | |
3510 half of JISX0201-Kana, and JISX0208. An ASCII character is encoded | |
3511 as is. A character of charset katakana-jisx0201 is encoded by | |
3512 "position-code + 0x80". A character of charset japanese-jisx0208 | |
3513 is encoded in two bytes, but its two position-codes are divided and | |
88365 | 3514 shifted so that they fit in the ranges below. |
17052 | 3515 |
3516 --- CODE RANGE of SJIS --- | |
3517 (character set) (range) | |
3518 ASCII 0x00 .. 0x7F | |
88365 | 3519 KATAKANA-JISX0201 0xA0 .. 0xDF |
24324 | 3520 JISX0208 (1st byte) 0x81 .. 0x9F and 0xE0 .. 0xEF |
23564 | 3521 (2nd byte) 0x40 .. 0x7E and 0x80 .. 0xFC |
17052 | 3522 ------------------------------- |
3523 | |
3524 */ | |
3525 | |
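The "divided and shifted" mapping between JISX0208 position codes and SJIS bytes can be written out explicitly. The following is a sketch of the conventional arithmetic, not the code Emacs uses; both function names are hypothetical, and the inputs are assumed to be already in range (0x21..0x7E for position codes, the SJIS ranges in the table above for bytes).

```c
/* Hypothetical helpers, not taken from coding.c.  */

/* Convert JIS X 0208 position codes J1, J2 (each 0x21..0x7E) to the
   two SJIS bytes *S1, *S2.  */
void
jisx0208_to_sjis (int j1, int j2, int *s1, int *s2)
{
  /* Two JIS rows share one SJIS lead byte, and the lead bytes skip
     the 0xA0..0xDF range used by JISX0201 katakana.  */
  *s1 = ((j1 - 0x21) >> 1) + (j1 <= 0x5E ? 0x81 : 0xC1);
  if (j1 & 1)
    /* Odd row: trail bytes 0x40..0x7E and 0x80..0x9E (0x7F skipped).  */
    *s2 = j2 + (j2 <= 0x5F ? 0x1F : 0x20);
  else
    /* Even row: trail bytes 0x9F..0xFC.  */
    *s2 = j2 + 0x7E;
}

/* Inverse mapping: convert SJIS bytes S1, S2 back to JIS X 0208
   position codes *J1, *J2.  */
void
sjis_to_jisx0208 (int s1, int s2, int *j1, int *j2)
{
  int row = (s1 - (s1 <= 0x9F ? 0x81 : 0xC1)) * 2;

  if (s2 >= 0x9F)
    {
      *j1 = row + 0x22;         /* even row of the pair */
      *j2 = s2 - 0x7E;
    }
  else
    {
      *j1 = row + 0x21;         /* odd row of the pair */
      *j2 = s2 - (s2 >= 0x80 ? 0x20 : 0x1F);
    }
}
```

For instance, position codes 0x24 0x22 map to SJIS bytes 0x82 0xA0, and the first position 0x21 0x21 maps to 0x81 0x40, the lower bound of both SJIS ranges.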
3526 /* BIG5 is a coding system encoding two character sets: ASCII and | |
3527 Big5. An ASCII character is encoded as is. Big5 is a two-byte | |
88365 | 3528 character set and is encoded in two bytes. |
17052 | 3529 |
3530 --- CODE RANGE of BIG5 --- | |
3531 (character set) (range) | |
3532 ASCII 0x00 .. 0x7F | |
3533 Big5 (1st byte) 0xA1 .. 0xFE | |
3534 (2nd byte) 0x40 .. 0x7E and 0xA1 .. 0xFE | |
3535 -------------------------- | |
3536 | |
88365 | 3537 */ |
17052 | 3538 |
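The BIG5 code ranges in the table above translate directly into byte predicates. The sketch below is a simplified, standalone check built only from those ranges; the names are hypothetical, and unlike the detector in this file it treats a lead byte at the very end of the buffer as invalid.

```c
#include <stddef.h>

/* Hypothetical helpers, not taken from coding.c.  */

static int
big5_lead_p (unsigned char b)
{
  return b >= 0xA1 && b <= 0xFE;
}

static int
big5_trail_p (unsigned char b)
{
  return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE);
}

/* Return 1 if BUF consists only of ASCII bytes and valid BIG5
   two-byte sequences and contains at least one such sequence;
   return 0 otherwise.  */
int
looks_like_big5 (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  int found = 0;

  while (i < len)
    {
      if (buf[i] < 0x80)
        {
          i++;                  /* ASCII is encoded as is */
          continue;
        }
      if (!big5_lead_p (buf[i])
          || i + 1 >= len
          || !big5_trail_p (buf[i + 1]))
        return 0;
      found = 1;
      i += 2;
    }
  return found;
}
```

Requiring at least one two-byte sequence mirrors the `found` flag in the detection functions: pure ASCII text is not evidence for BIG5.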
3539 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | |
3540 Check if a text is encoded in SJIS. If it is, narrow *MASK to | |
88365 | 3541 CATEGORY_MASK_SJIS and return 1, else return 0. */ |
17052 | 3542 |
34531 | 3543 static int |
88365 | 3544 detect_coding_sjis (coding, mask) |
3545 struct coding_system *coding; | |
3546 int *mask; | |
17052 | 3547 { |
88365 | 3548 unsigned char *src = coding->source, *src_base = src; |
3549 unsigned char *src_end = coding->source + coding->src_bytes; | |
3550 int multibytep = coding->src_multibyte; | |
3551 int consumed_chars = 0; | |
3552 int found = 0; | |
29005 | 3553 int c; |
88365 | 3554 |
3555 /* A coding system of this category is always ASCII compatible. */ | |
3556 src += coding->head_ascii; | |
29005 | 3557 |
3558 while (1) | |
17052 | 3559 { |
88365 | 3560 ONE_MORE_BYTE (c); |
36647 | 3561 if (c < 0x80) |
3562 continue; | |
88365 | 3563 if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xEF)) |
17052 | 3564 { |
88365 | 3565 ONE_MORE_BYTE (c); |
36647 | 3566 if (c < 0x40 || c == 0x7F || c > 0xFC) |
88365 | 3567 break; |
3568 found = 1; | |
17052 | 3569 } |
88365 | 3570 else if (c >= 0xA0 && c < 0xE0) |
3571 found = 1; | |
3572 else | |
3573 break; | |
3574 } | |
3575 *mask &= ~CATEGORY_MASK_SJIS; | |
3576 return 0; | |
3577 | |
3578 no_more_source: | |
3579 if (!found) | |
3580 return 0; | |
3581 *mask &= CATEGORY_MASK_SJIS; | |
3582 return 1; | |
17052 | 3583 } |
3584 | |
3585 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | |
3586 Check if a text is encoded in BIG5. If it is, narrow *MASK to | |
88365 | 3587 CATEGORY_MASK_BIG5 and return 1, else return 0. */ |
17052 | 3588 |
34531 | 3589 static int |
88365 | 3590 detect_coding_big5 (coding, mask) |
3591 struct coding_system *coding; | |
3592 int *mask; | |
17052 | 3593 { |
88365 | 3594 unsigned char *src = coding->source, *src_base = src; |
3595 unsigned char *src_end = coding->source + coding->src_bytes; | |
3596 int multibytep = coding->src_multibyte; | |
3597 int consumed_chars = 0; | |
3598 int found = 0; | |
29005 | 3599 int c; |
88365 | 3600 |
3601 /* A coding system of this category is always ASCII compatible. */ | |
3602 src += coding->head_ascii; | |
29005 | 3603 |
3604 while (1) |
28022 | 3605 { |
88365 | 3606 ONE_MORE_BYTE (c); |
3607 if (c < 0x80) | |
28022 | 3608 continue; |
88365 | 3609 if (c >= 0xA1) |
28022 | 3610 { |
88365 | 3611 ONE_MORE_BYTE (c); |
3612 if (c < 0x40 || (c >= 0x7F && c <= 0xA0)) | |
28022 | 3613 return 0; |
88365 | 3614 found = 1; |
28022 | 3615 } |
88365 | 3616 else |
3617 break; | |
3618 } | |
3619 *mask &= ~CATEGORY_MASK_BIG5; | |
28022 | 3620 return 0; |
88365 | 3621 |
3622 no_more_source: | |
3623 if (!found) | |
3624 return 0; | |
3625 *mask &= CATEGORY_MASK_BIG5; | |
3626 return 1; | |
28022 | 3627 } |
3628 |
17052 | 3629 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". |
3630 If SJIS_P is 1, decode SJIS text, else decode BIG5 text. */ |
3631 | |
29005 | 3632 static void |
88365 | 3633 decode_coding_sjis (coding) |
17052 | 3634 struct coding_system *coding; |
3635 { | |
88365 | 3636 unsigned char *src = coding->source + coding->consumed; |
3637 unsigned char *src_end = coding->source + coding->src_bytes; | |
29005 | 3638 unsigned char *src_base; |
88365 | 3639 int *charbuf = coding->charbuf; |
3640 int *charbuf_end = charbuf + coding->charbuf_size; | |
3641 int consumed_chars = 0, consumed_chars_base; | |
3642 int multibytep = coding->src_multibyte; | |
3643 struct charset *charset_roman, *charset_kanji, *charset_kana; | |
3644 Lisp_Object attrs, eol_type, charset_list, val; | |
3645 | |
3646 CODING_GET_INFO (coding, attrs, eol_type, charset_list); | |
3647 | |
3648 val = charset_list; | |
3649 charset_roman = CHARSET_FROM_ID (XINT (XCAR (val))), val = XCDR (val); | |
3650 charset_kana = CHARSET_FROM_ID (XINT (XCAR (val))), val = XCDR (val); |
3651 charset_kanji = CHARSET_FROM_ID (XINT (XCAR (val))); |
3652 | |
29005 | 3653 while (1) |
17052 | 3654 { |
88365 | 3655 int c, c1; |
29005 | 3656 |
3657 src_base = src; |
88365 | 3658 consumed_chars_base = consumed_chars; |
3659 | |
3660 if (charbuf >= charbuf_end) | |
3661 break; | |
3662 | |
3663 ONE_MORE_BYTE (c); | |
3664 | |
3665 if (c == '\r') | |
17052 | 3666 { |
88365 | 3667 if (EQ (eol_type, Qdos)) |
17052 | 3668 { |
88365 | 3669 if (src == src_end) |
3670 goto no_more_source; | |
3671 if (*src == '\n') | |
3672 ONE_MORE_BYTE (c); | |
17052 | 3673 } |
88365 | 3674 else if (EQ (eol_type, Qmac)) |
3675 c = '\n'; | |
17052 | 3676 } |
29005 | 3677 else |
88365 | 3678 { |
3679 struct charset *charset; | |
3680 | |
3681 if (c < 0x80) | |
3682 charset = charset_roman; | |
3683 else | |
17052 | 3684 { |
88365 | 3685 if (c >= 0xF0) |
3686 goto invalid_code; | |
3687 if (c < 0xA0 || c >= 0xE0) | |
20931 | 3688 { |
22616 | 3689 /* SJIS -> JISX0208 */ |
88365 | 3690 ONE_MORE_BYTE (c1); |
3691 if (c1 < 0x40 || c1 == 0x7F || c1 > 0xFC) | |
3692 goto invalid_code; | |
3693 c = (c << 8) | c1; | |
3694 SJIS_TO_JIS (c); | |
3695 charset = charset_kanji; | |
24870 | 3696 } |
20931 | 3697 else |
29005 | 3698 /* SJIS -> JISX0201-Kana */ |
88365 | 3699 charset = charset_kana; |
20931 | 3700 } |
88365 | 3701 CODING_DECODE_CHAR (coding, src, src_base, src_end, charset, c, c); |
3702 } | |
3703 *charbuf++ = c; | |
3704 continue; | |
3705 | |
3706 invalid_code: | |
3707 src = src_base; | |
3708 consumed_chars = consumed_chars_base; | |
3709 ONE_MORE_BYTE (c); | |
3710 *charbuf++ = ASCII_BYTE_P (c) ? c : BYTE8_TO_CHAR (c); | |
3711 coding->errors++; | |
3712 } | |
3713 | |
3714 no_more_source: | |
3715 coding->consumed_char += consumed_chars_base; | |
3716 coding->consumed = src_base - coding->source; | |
3717 coding->charbuf_used = charbuf - coding->charbuf; | |
3718 } | |
3719 | |
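The `SJIS_TO_JIS` macro used in `decode_coding_sjis` is defined outside this part of the file. As a rough sketch of the arithmetic it has to perform, the hypothetical function `sjis_to_jis` below applies the standard Shift_JIS to JIS X 0208 mapping to a 16-bit code. It assumes the lead and trail bytes have already been validated as in the loop above, and it may differ in detail from the real macro.

```c
/* Hypothetical helper (not part of coding.c).  Assumes CODE is
   (lead << 8) | trail with lead in 0x81..0x9F or 0xE0..0xEF and a
   valid trail byte.  Returns the JIS X 0208 code (row << 8) | cell.  */
unsigned
sjis_to_jis (unsigned code)
{
  unsigned s1 = code >> 8, s2 = code & 0xFF;
  unsigned j1, j2;

  if (s2 >= 0x9F)
    {
      /* Upper half of the trail range: even JIS row.  */
      j1 = s1 * 2 - (s1 >= 0xE0 ? 0x160 : 0xE0);
      j2 = s2 - 0x7E;
    }
  else
    {
      /* Lower half: odd JIS row.  Trail byte 0x7F is unused, so
         bytes above it are shifted by one more.  */
      j1 = s1 * 2 - (s1 >= 0xE0 ? 0x161 : 0xE1);
      j2 = s2 - (s2 >= 0x7F ? 0x20 : 0x1F);
    }
  return (j1 << 8) | j2;
}
```

Each Shift_JIS lead byte covers two JIS rows, which is why the lead byte is doubled and the trail byte decides between the odd and the even row.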
3720 static void | |
3721 decode_coding_big5 (coding) | |
3722 struct coding_system *coding; | |
3723 { | |
3724 unsigned char *src = coding->source + coding->consumed; | |
3725 unsigned char *src_end = coding->source + coding->src_bytes; | |
3726 unsigned char *src_base; | |
3727 int *charbuf = coding->charbuf; | |
3728 int *charbuf_end = charbuf + coding->charbuf_size; | |
3729 int consumed_chars = 0, consumed_chars_base; | |
3730 int multibytep = coding->src_multibyte; | |
3731 struct charset *charset_roman, *charset_big5; | |
3732 Lisp_Object attrs, eol_type, charset_list, val; | |
3733 | |
3734 CODING_GET_INFO (coding, attrs, eol_type, charset_list); | |
3735 val = charset_list; | |
3736 charset_roman = CHARSET_FROM_ID (XINT (XCAR (val))), val = XCDR (val); | |
3737 charset_big5 = CHARSET_FROM_ID (XINT (XCAR (val))); | |
3738 | |
3739 while (1) | |
3740 { | |
3741 int c, c1; | |
3742 | |
3743 src_base = src; | |
3744 consumed_chars_base = consumed_chars; | |
3745 | |
3746 if (charbuf >= charbuf_end) | |
3747 break; | |
3748 | |
3749 ONE_MORE_BYTE (c); | |
3750 | |
3751 if (c == '\r') | |
3752 { | |
3753 if (EQ (eol_type, Qdos)) | |
3754 { | |
3755 if (src == src_end) | |
3756 goto no_more_source; | |
3757 if (*src == '\n') | |
3758 ONE_MORE_BYTE (c); | |
3759 } | |
3760 else if (EQ (eol_type, Qmac)) | |
3761 c = '\n'; | |
3762 } | |
3763 else | |
3764 { | |
3765 struct charset *charset; | |
3766 if (c < 0x80) | |
3767 charset = charset_roman; | |
20931 | 3768 else |
3769 { |
22616 | 3770 /* BIG5 -> Big5 */ |
88365 | 3771 if (c < 0xA1 || c > 0xFE) |
3772 goto invalid_code; | |
3773 ONE_MORE_BYTE (c1); | |
3774 if (c1 < 0x40 || (c1 > 0x7E && c1 < 0xA1) || c1 > 0xFE) | |
3775 goto invalid_code; | |
3776 c = c << 8 | c1; | |
3777 charset = charset_big5; | |
20931 | 3778 } |
88365 | 3779 CODING_DECODE_CHAR (coding, src, src_base, src_end, charset, c, c); |
20931 | 3780 } |
29005 | 3781 |
88365 | 3782 *charbuf++ = c; |
20931 | 3783 continue; |
3784 |
88365 | 3785 invalid_code: |
17052 | 3786 src = src_base; |
88365 | 3787 consumed_chars = consumed_chars_base; |
3788 ONE_MORE_BYTE (c); | |
3789 *charbuf++ = ASCII_BYTE_P (c) ? c : BYTE8_TO_CHAR (c); | |
3790 coding->errors++; | |
3791 } | |
3792 | |
3793 no_more_source: | |
3794 coding->consumed_char += consumed_chars_base; | |
3795 coding->consumed = src_base - coding->source; | |
3796 coding->charbuf_used = charbuf - coding->charbuf; | |
17052 | 3797 } |
3798 | |
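The range checks that `decode_coding_big5` applies before handing a two-byte code to `CODING_DECODE_CHAR` can be isolated as follows. This is only an illustrative sketch: `big5_pair_to_code` is a hypothetical name, and the -1 return value stands in for the `goto invalid_code` path of the original.

```c
/* Hypothetical helper (not part of coding.c).  Returns the 16-bit
   Big5 code (lead << 8) | trail, or -1 if either byte falls outside
   the ranges accepted by decode_coding_big5.  */
int
big5_pair_to_code (unsigned lead, unsigned trail)
{
  if (lead < 0xA1 || lead > 0xFE)
    return -1;
  /* Valid trail bytes are 0x40..0x7E and 0xA1..0xFE.  */
  if (trail < 0x40 || (trail > 0x7E && trail < 0xA1) || trail > 0xFE)
    return -1;
  return (int) ((lead << 8) | trail);
}
```

Note that the decoder is stricter than `detect_coding_big5`, which accepts any trail byte of 0x40 or above except 0x7F..0xA0.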
3799 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". | |
29005 | 3800 This function can encode charsets `ascii', `katakana-jisx0201', |
3801 `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'. We |
3802 are sure that all these charsets are registered as official charsets |
17052 | 3803 (i.e. do not have extended leading-codes). Characters of other |
3804 charsets are produced without any encoding. If SJIS_P is 1, encode | |
3805 SJIS text, else encode BIG5 text. */ | |
3806 | |
88365 | 3807 static int |
3808 encode_coding_sjis (coding) | |
17052 | 3809 struct coding_system *coding; |
3810 { | |
88365 | 3811 int multibytep = coding->dst_multibyte; |
3812 int *charbuf = coding->charbuf; | |
3813 int *charbuf_end = charbuf + coding->charbuf_used; | |
3814 unsigned char *dst = coding->destination + coding->produced; | |
3815 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
3816 int safe_room = 4; | |
3817 int produced_chars = 0; | |
3818 Lisp_Object attrs, eol_type, charset_list, val; | |
3819 int ascii_compatible; | |
3820 struct charset *charset_roman, *charset_kanji, *charset_kana; | |
3821 int c; | |
3822 | |
3823 CODING_GET_INFO (coding, attrs, eol_type, charset_list); | |
3824 val = charset_list; | |
3825 charset_roman = CHARSET_FROM_ID (XINT (XCAR (val))), val = XCDR (val); | |
3826 charset_kana = CHARSET_FROM_ID (XINT (XCAR (val))), val = XCDR (val); | |
3827 charset_kanji = CHARSET_FROM_ID (XINT (XCAR (val))); | |
3828 | |
3829 ascii_compatible = ! NILP (CODING_ATTR_ASCII_COMPAT (attrs)); | |
3830 | |
3831 while (charbuf < charbuf_end) | |
3832 { | |
3833 ASSURE_DESTINATION (safe_room); | |
3834 c = *charbuf++; | |
29005 | 3835 /* Now encode the character C. */ |
88365 | 3836 if (ASCII_CHAR_P (c) && ascii_compatible) |
3837 EMIT_ONE_ASCII_BYTE (c); | |
29005 | 3838 else |
3839 { |
88365 | 3840 unsigned code; |
3841 struct charset *charset = char_charset (c, charset_list, &code); | |
3842 | |
3843 if (!charset) | |
3844 { | |
3845 c = coding->default_char; | |
3846 charset = char_charset (c, charset_list, &code); | |
3847 } | |
3848 if (code == CHARSET_INVALID_CODE (charset)) | |
3849 abort (); | |
3850 if (charset == charset_kanji) | |
29005 | 3851 { |
88365 | 3852 int c1, c2; |
3853 JIS_TO_SJIS (code); | |
3854 c1 = code >> 8, c2 = code & 0xFF; | |
3855 EMIT_TWO_BYTES (c1, c2); | |
3856 } | |
3857 else if (charset == charset_kana) | |
3858 EMIT_ONE_BYTE (code | 0x80); | |
3859 else | |
3860 EMIT_ONE_ASCII_BYTE (code & 0x7F); | |
3861 } | |
3862 } | |
3863 coding->result = CODING_RESULT_SUCCESS; | |
3864 coding->produced_char += produced_chars; | |
3865 coding->produced = dst - coding->destination; | |
3866 return 0; | |
3867 } | |
3868 | |
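`encode_coding_sjis` relies on the `JIS_TO_SJIS` macro, which is not shown in this part of the file. The hypothetical function `jis_to_sjis` below sketches the standard JIS X 0208 to Shift_JIS arithmetic under the assumption that both input bytes lie in 0x21..0x7E. It is an illustration of the mapping, not a copy of the macro.

```c
/* Hypothetical helper (not part of coding.c).  Assumes CODE is
   (row << 8) | cell with both bytes in 0x21..0x7E.  Returns the
   Shift_JIS code (lead << 8) | trail.  */
unsigned
jis_to_sjis (unsigned code)
{
  unsigned j1 = code >> 8, j2 = code & 0xFF;
  unsigned s1, s2;

  if (j1 & 1)
    {
      /* Odd row: lower half of the trail range, skipping 0x7F.  */
      s1 = j1 / 2 + (j1 < 0x5F ? 0x71 : 0xB1);
      s2 = j2 + (j2 >= 0x60 ? 0x20 : 0x1F);
    }
  else
    {
      /* Even row: upper half of the trail range.  */
      s1 = j1 / 2 + (j1 < 0x5F ? 0x70 : 0xB0);
      s2 = j2 + 0x7E;
    }
  return (s1 << 8) | s2;
}
```

The split at row 0x5F accounts for the gap between the two lead-byte ranges 0x81..0x9F and 0xE0..0xEF.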
3869 static int | |
3870 encode_coding_big5 (coding) | |
3871 struct coding_system *coding; | |
3872 { | |
3873 int multibytep = coding->dst_multibyte; | |
3874 int *charbuf = coding->charbuf; | |
3875 int *charbuf_end = charbuf + coding->charbuf_used; | |
3876 unsigned char *dst = coding->destination + coding->produced; | |
3877 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
3878 int safe_room = 4; | |
3879 int produced_chars = 0; | |
3880 Lisp_Object attrs, eol_type, charset_list, val; | |
3881 int ascii_compatible; | |
3882 struct charset *charset_roman, *charset_big5; | |
3883 int c; | |
3884 | |
3885 CODING_GET_INFO (coding, attrs, eol_type, charset_list); | |
3886 val = charset_list; | |
3887 charset_roman = CHARSET_FROM_ID (XINT (XCAR (val))), val = XCDR (val); | |
3888 charset_big5 = CHARSET_FROM_ID (XINT (XCAR (val))); | |
3889 ascii_compatible = ! NILP (CODING_ATTR_ASCII_COMPAT (attrs)); | |
3890 | |
3891 while (charbuf < charbuf_end) | |
3892 { | |
3893 ASSURE_DESTINATION (safe_room); | |
3894 c = *charbuf++; | |
3895 /* Now encode the character C. */ | |
3896 if (ASCII_CHAR_P (c) && ascii_compatible) | |
3897 EMIT_ONE_ASCII_BYTE (c); | |
3898 else | |
3899 { | |
3900 unsigned code; | |
3901 struct charset *charset = char_charset (c, charset_list, &code); | |
3902 | |
3903 if (! charset) | |
3904 { | |
3905 c = coding->default_char; | |
3906 charset = char_charset (c, charset_list, &code); | |
3907 } | |
3908 if (code == CHARSET_INVALID_CODE (charset)) | |
3909 abort (); | |
3910 if (charset == charset_big5) | |
3911 { | |
3912 int c1, c2; | |
3913 | |
3914 c1 = code >> 8, c2 = code & 0xFF; | |
3915 EMIT_TWO_BYTES (c1, c2); | |
29005 | 3916 } |
17052 | 3917 else |
88365 | 3918 EMIT_ONE_ASCII_BYTE (code & 0x7F); |
17052 | 3919 } |
88365 | 3920 } |
3921 coding->result = CODING_RESULT_SUCCESS; | |
3922 coding->produced_char += produced_chars; | |
3923 coding->produced = dst - coding->destination; | |
3924 return 0; | |
17052 | 3925 } |
3926 | |
3927 | |
88365 | 3928 /*** 10. CCL handlers ***/ |
22874 | 3929 |
3930 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". |
3931 Check if a text is encoded in a coding system whose |
3932 encoder/decoder are written as CCL programs. If it is, return |
88365 | 3933 CATEGORY_MASK_CCL, else return 0. */ |
22874 | 3934 |
34531 | 3935 static int |
88365 | 3936 detect_coding_ccl (coding, mask) |
3937 struct coding_system *coding; | |
3938 int *mask; | |
3939 { | |
3940 unsigned char *src = coding->source, *src_base = src; | |
3941 unsigned char *src_end = coding->source + coding->src_bytes; | |
3942 int multibytep = coding->src_multibyte; | |
3943 int consumed_chars = 0; | |
3944 int found = 0; | |
3945 unsigned char *valids = CODING_CCL_VALIDS (coding); | |
3946 int head_ascii = coding->head_ascii; | |
3947 Lisp_Object attrs; | |
3948 | |
3949 coding = &coding_categories[coding_category_ccl]; | |
3950 attrs = CODING_ID_ATTRS (coding->id); | |
3951 if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))) | |
3952 src += head_ascii; | |
3953 | |
3954 while (1) | |
3955 { | |
3956 int c; | |
3957 ONE_MORE_BYTE (c); | |
3958 if (! valids[c]) | |
3959 break; | |
3960 if (!found && valids[c] > 1) | |
3961 found = 1; | |
3962 } | |
3963 *mask &= ~CATEGORY_MASK_CCL; | |
3964 return 0; | |
3965 | |
3966 no_more_source: | |
3967 if (!found) | |
3968 return 0; | |
3969 *mask &= CATEGORY_MASK_CCL; | |
3970 return 1; | |
3971 } | |
3972 | |
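The table-driven scan in `detect_coding_ccl` can be sketched in isolation. The function name `scan_with_valids` and the explicit 256-entry table parameter are assumptions for illustration; in the original the table comes from `CODING_CCL_VALIDS (coding)`. As in the loop above, a zero entry rejects the input and an entry greater than 1 sets the `found` flag.

```c
#include <stddef.h>

/* Hypothetical helper (not part of coding.c).  VALIDS maps each byte
   value to 0 (never valid), 1 (valid), or a larger value (valid and
   counted as positive evidence).  Returns 1 only if every byte is
   valid and at least one byte has an entry greater than 1.  */
int
scan_with_valids (const unsigned char *buf, size_t len,
                  const unsigned char valids[256])
{
  int found = 0;
  size_t i;

  for (i = 0; i < len; i++)
    {
      unsigned char v = valids[buf[i]];

      if (!v)
        return 0;               /* byte not allowed in this coding */
      if (v > 1)
        found = 1;
    }
  return found;
}
```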
3973 static void | |
3974 decode_coding_ccl (coding) | |
3975 struct coding_system *coding; | |
22874 | 3976 { |
88365 | 3977 unsigned char *src = coding->source + coding->consumed; |
3978 unsigned char *src_end = coding->source + coding->src_bytes; | |
3979 int *charbuf = coding->charbuf; | |
3980 int *charbuf_end = charbuf + coding->charbuf_size; | |
3981 int consumed_chars = 0; | |
3982 int multibytep = coding->src_multibyte; | |
3983 struct ccl_program ccl; | |
3984 int source_charbuf[1024]; | |
3985 int source_byteidx[1024]; | |
3986 | |
3987 setup_ccl_program (&ccl, CODING_CCL_DECODER (coding)); | |
3988 | |
3989 while (src < src_end) | |
3990 { | |
3991 unsigned char *p = src; | |
3992 int *source, *source_end; | |
3993 int i = 0; | |
3994 | |
3995 if (multibytep) | |
3996 while (i < 1024 && p < src_end) | |
3997 { | |
3998 source_byteidx[i] = p - src; | |
3999 source_charbuf[i++] = STRING_CHAR_ADVANCE (p); | |
4000 } | |
4001 else | |
4002 while (i < 1024 && p < src_end) | |
4003 source_charbuf[i++] = *p++; | |
4004 | |
4005 if (p == src_end && coding->mode & CODING_MODE_LAST_BLOCK) | |
4006 ccl.last_block = 1; | |
4007 | |
4008 source = source_charbuf; | |
4009 source_end = source + i; | |
4010 while (source < source_end) | |
4011 { | |
4012 ccl_driver (&ccl, source, charbuf, | |
4013 source_end - source, charbuf_end - charbuf); | |
4014 source += ccl.consumed; | |
4015 charbuf += ccl.produced; | |
4016 if (ccl.status != CCL_STAT_SUSPEND_BY_DST) | |
4017 break; | |
4018 } | |
4019 if (source < source_end) | |
4020 src += source_byteidx[source - source_charbuf]; | |
4021 else | |
4022 src = p; | |
4023 consumed_chars += source - source_charbuf; | |
4024 | |
4025 if (ccl.status != CCL_STAT_SUSPEND_BY_SRC | |
4026 && ccl.status != CODING_RESULT_INSUFFICIENT_SRC) | |
4027 break; | |
4028 } | |
4029 | |
4030 switch (ccl.status) | |
4031 { | |
4032 case CCL_STAT_SUSPEND_BY_SRC: | |
4033 coding->result = CODING_RESULT_INSUFFICIENT_SRC; | |
4034 break; | |
4035 case CCL_STAT_SUSPEND_BY_DST: | |
4036 break; | |
4037 case CCL_STAT_QUIT: | |
4038 case CCL_STAT_INVALID_CMD: | |
4039 coding->result = CODING_RESULT_INTERRUPT; | |
4040 break; | |
4041 default: | |
4042 coding->result = CODING_RESULT_SUCCESS; | |
4043 break; | |
4044 } | |
4045 coding->consumed_char += consumed_chars; | |
4046 coding->consumed = src - coding->source; | |
4047 coding->charbuf_used = charbuf - coding->charbuf; | |
22874 | 4048 } |
4049 |
88365 | 4050 static int |
4051 encode_coding_ccl (coding) | |
4052 struct coding_system *coding; | |
4053 { | |
4054 struct ccl_program ccl; | |
4055 int multibytep = coding->dst_multibyte; | |
4056 int *charbuf = coding->charbuf; | |
4057 int *charbuf_end = charbuf + coding->charbuf_used; | |
4058 unsigned char *dst = coding->destination + coding->produced; | |
4059 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
4060 unsigned char *adjusted_dst_end = dst_end - 1; | |
4061 int destination_charbuf[1024]; | |
4062 int i, produced_chars = 0; | |
4063 | |
4064 setup_ccl_program (&ccl, CODING_CCL_ENCODER (coding)); | |
4065 | |
4066 ccl.last_block = coding->mode & CODING_MODE_LAST_BLOCK; | |
4067 ccl.dst_multibyte = coding->dst_multibyte; | |
4068 | |
4069 while (charbuf < charbuf_end && dst < adjusted_dst_end) | |
4070 { | |
4071 int dst_bytes = dst_end - dst; | |
4072 if (dst_bytes > 1024) | |
4073 dst_bytes = 1024; | |
4074 | |
4075 ccl_driver (&ccl, charbuf, destination_charbuf, | |
4076 charbuf_end - charbuf, dst_bytes); | |
4077 charbuf += ccl.consumed; | |
4078 if (multibytep) | |
4079 for (i = 0; i < ccl.produced; i++) | |
4080 EMIT_ONE_BYTE (destination_charbuf[i] & 0xFF); | |
4081 else | |
4082 { | |
4083 for (i = 0; i < ccl.produced; i++) | |
4084 *dst++ = destination_charbuf[i] & 0xFF; | |
4085 produced_chars += ccl.produced; | |
4086 } | |
4087 } | |
4088 | |
4089 switch (ccl.status) | |
4090 { | |
4091 case CCL_STAT_SUSPEND_BY_SRC: | |
4092 coding->result = CODING_RESULT_INSUFFICIENT_SRC; | |
4093 break; | |
4094 case CCL_STAT_SUSPEND_BY_DST: | |
4095 coding->result = CODING_RESULT_INSUFFICIENT_DST; | |
4096 break; | |
4097 case CCL_STAT_QUIT: | |
4098 case CCL_STAT_INVALID_CMD: | |
4099 coding->result = CODING_RESULT_INTERRUPT; | |
4100 break; | |
4101 default: | |
4102 coding->result = CODING_RESULT_SUCCESS; | |
4103 break; | |
4104 } | |
4105 | |
4106 coding->produced_char += produced_chars; | |
4107 coding->produced = dst - coding->destination; | |
4108 return 0; | |
4109 } | |
4110 | |
4111 | |
22874 | 4112 |
88365 | 4113 /*** 10, 11. no-conversion handlers ***/ |
17052 | 4114 |
29005 | 4115 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */ |
4116 |
4117 static void |
88365 | 4118 decode_coding_raw_text (coding) |
17052 | 4119 struct coding_system *coding; |
4120 { | |
88365 | 4121 coding->chars_at_source = 1; |
4122 coding->consumed_char = coding->src_chars; | |
4123 coding->consumed = coding->src_bytes; | |
4124 coding->result = CODING_RESULT_SUCCESS; | |
4125 } | |
4126 | |
4127 static int | |
4128 encode_coding_raw_text (coding) | |
4129 struct coding_system *coding; | |
4130 { | |
4131 int multibytep = coding->dst_multibyte; | |
4132 int *charbuf = coding->charbuf; | |
4133 int *charbuf_end = coding->charbuf + coding->charbuf_used; | |
4134 unsigned char *dst = coding->destination + coding->produced; | |
4135 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
4136 int produced_chars = 0; | |
29005 | 4137 int c; |
4138 |
88365 | 4139 if (multibytep) |
4140 { | |
4141 int safe_room = MAX_MULTIBYTE_LENGTH * 2; | |
4142 | |
4143 if (coding->src_multibyte) | |
4144 while (charbuf < charbuf_end) | |
4145 { | |
4146 ASSURE_DESTINATION (safe_room); | |
4147 c = *charbuf++; | |
4148 if (ASCII_CHAR_P (c)) | |
4149 EMIT_ONE_ASCII_BYTE (c); | |
4150 else if (CHAR_BYTE8_P (c)) | |
4151 { | |
4152 c = CHAR_TO_BYTE8 (c); | |
4153 EMIT_ONE_BYTE (c); | |
4154 } | |
4155 else | |
4156 { | |
4157 unsigned char str[MAX_MULTIBYTE_LENGTH], *p0 = str, *p1 = str; | |
4158 | |
4159 CHAR_STRING_ADVANCE (c, p1); | |
4160 while (p0 < p1) | |
4161 { EMIT_ONE_BYTE (*p0); p0++; } |
4162 } | |
4163 } | |
4164 else | |
4165 while (charbuf < charbuf_end) | |
4166 { | |
4167 ASSURE_DESTINATION (safe_room); | |
4168 c = *charbuf++; | |
29005 | 4169 EMIT_ONE_BYTE (c); |
88365 | 4170 } |
20718 | 4171 } |
4172 else |
4173 { |
88365 | 4174 if (coding->src_multibyte) |
20718 | 4175 { |
88365 | 4176 int safe_room = MAX_MULTIBYTE_LENGTH; |
4177 | |
4178 while (charbuf < charbuf_end) | |
4179 { | |
4180 ASSURE_DESTINATION (safe_room); | |
4181 c = *charbuf++; | |
4182 if (ASCII_CHAR_P (c)) | |
4183 *dst++ = c; | |
4184 else if (CHAR_BYTE8_P (c)) | |
4185 *dst++ = CHAR_TO_BYTE8 (c); | |
4186 else | |
4187 CHAR_STRING_ADVANCE (c, dst); | |
4188 produced_chars++; | |
4189 } | |
20718 | 4190 } |
20931 | 4191 else |
20718 | 4192 { |
88365 | 4193 ASSURE_DESTINATION (charbuf_end - charbuf); |
4194 while (charbuf < charbuf_end && dst < dst_end) | |
4195 *dst++ = *charbuf++; | |
4196 produced_chars = dst - (coding->destination + coding->produced); |
4197 } | |
4198 } | |
4199 coding->result = CODING_RESULT_SUCCESS; | |
4200 coding->produced_char += produced_chars; | |
4201 coding->produced = dst - coding->destination; | |
4202 return 0; | |
4203 } | |
4204 | |
4205 static int | |
4206 detect_coding_charset (coding, mask) | |
4207 struct coding_system *coding; | |
4208 int *mask; | |
4209 { | |
4210 unsigned char *src = coding->source, *src_base = src; | |
4211 unsigned char *src_end = coding->source + coding->src_bytes; | |
4212 int multibytep = coding->src_multibyte; | |
4213 int consumed_chars = 0; | |
4214 Lisp_Object attrs, valids; | |
4215 | |
4216 coding = &coding_categories[coding_category_charset]; | |
4217 attrs = CODING_ID_ATTRS (coding->id); | |
4218 valids = AREF (attrs, coding_attr_charset_valids); | |
4219 | |
4220 if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))) | |
4221 src += coding->head_ascii; | |
4222 | |
4223 while (1) | |
4224 { | |
4225 int c; | |
4226 | |
4227 ONE_MORE_BYTE (c); | |
4228 if (NILP (AREF (valids, c))) | |
4229 break; | |
4230 } | |
4231 *mask &= ~CATEGORY_MASK_CHARSET; | |
4232 return 0; | |
4233 | |
4234 no_more_source: | |
4235 *mask &= CATEGORY_MASK_CHARSET; | |
4236 return 1; | |
4237 } | |
4238 | |
4239 static void | |
4240 decode_coding_charset (coding) | |
4241 struct coding_system *coding; | |
4242 { | |
4243 unsigned char *src = coding->source + coding->consumed; | |
4244 unsigned char *src_end = coding->source + coding->src_bytes; | |
4245 unsigned char *src_base; | |
4246 int *charbuf = coding->charbuf; | |
4247 int *charbuf_end = charbuf + coding->charbuf_size; | |
4248 int consumed_chars = 0, consumed_chars_base; | |
4249 int multibytep = coding->src_multibyte; | |
4250 struct charset *charset; | |
4251 Lisp_Object attrs, eol_type, charset_list; | |
4252 | |
4253 CODING_GET_INFO (coding, attrs, eol_type, charset_list); | |
4254 charset = CHARSET_FROM_ID (XINT (XCAR (charset_list))); | |
4255 | |
4256 while (1) | |
4257 { | |
4258 int c, c1; | |
4259 | |
4260 src_base = src; | |
4261 consumed_chars_base = consumed_chars; | |
4262 | |
4263 if (charbuf >= charbuf_end) | |
4264 break; | |
4265 | |
4266 ONE_MORE_BYTE (c); |
4267 if (c == '\r') |
29005 | 4268 { |
88365 | 4269 if (EQ (eol_type, Qdos)) |
4270 { |
4271 if (src == src_end) |
4272 goto no_more_source; |
4273 if (*src == '\n') |
4274 ONE_MORE_BYTE (c); |
4275 } |
4276 else if (EQ (eol_type, Qmac)) |
4277 c = '\n'; |
29005 | 4278 } |
88365 | 4279 else |
29005 | 4280 { |
88365 | 4281 CODING_DECODE_CHAR (coding, src, src_base, src_end, charset, c, c); |
4282 if (c < 0) |
4283 goto invalid_code; |
29005 | 4284 } |
88365 | 4285 *charbuf++ = c; |
4286 continue; | |
4287 | |
4288 invalid_code: | |
4289 src = src_base; | |
4290 consumed_chars = consumed_chars_base; | |
4291 ONE_MORE_BYTE (c); | |
4292 *charbuf++ = ASCII_BYTE_P (c) ? c : BYTE8_TO_CHAR (c); | |
4293 coding->errors++; | |
4294 } | |
4295 | |
4296 no_more_source: | |
4297 coding->consumed_char += consumed_chars_base; | |
4298 coding->consumed = src_base - coding->source; | |
4299 coding->charbuf_used = charbuf - coding->charbuf; | |
4300 } | |
4301 | |
4302 static int | |
4303 encode_coding_charset (coding) | |
4304 struct coding_system *coding; | |
4305 { | |
4306 int multibytep = coding->dst_multibyte; | |
4307 int *charbuf = coding->charbuf; | |
4308 int *charbuf_end = charbuf + coding->charbuf_used; | |
4309 unsigned char *dst = coding->destination + coding->produced; | |
4310 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
4311 int safe_room = MAX_MULTIBYTE_LENGTH; | |
4312 int produced_chars = 0; | |
4313 struct charset *charset; | |
4314 Lisp_Object attrs, eol_type, charset_list; | |
4315 int ascii_compatible; | |
4316 int c; | |
4317 | |
4318 CODING_GET_INFO (coding, attrs, eol_type, charset_list); | |
4319 charset = CHARSET_FROM_ID (XINT (XCAR (charset_list))); | |
4320 ascii_compatible = ! NILP (CODING_ATTR_ASCII_COMPAT (attrs)); | |
4321 | |
4322 while (charbuf < charbuf_end) | |
4323 { | |
4324 unsigned code; | |
4325 | |
4326 ASSURE_DESTINATION (safe_room); | |
4327 c = *charbuf++; | |
4328 if (ascii_compatible && ASCII_CHAR_P (c)) | |
4329 EMIT_ONE_ASCII_BYTE (c); | |
4330 else if ((code = ENCODE_CHAR (charset, c)) | |
4331 != CHARSET_INVALID_CODE (charset)) | |
4332 EMIT_ONE_BYTE (code); | |
4333 else | |
4334 EMIT_ONE_BYTE (coding->default_char); | |
4335 } | |
4336 | |
4337 coding->result = CODING_RESULT_SUCCESS; | |
4338 coding->produced_char += produced_chars; | |
4339 coding->produced = dst - coding->destination; | |
4340 return 0; | |
17052 | 4341 } |
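The encoding loop above has three outcomes per character: an ASCII fast path, a charset lookup, and a fallback to the default character. A minimal standalone sketch of that shape follows. It assumes a hypothetical Latin-1-like charset in which the code equals the code point for 0x80..0xFF; `sk_encode_single_byte` and its parameters are illustrative names, not part of coding.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode N_CHARS code points from CHARS into DST,
   writing at most DST_SIZE bytes.  Returns the number of bytes produced.
   Assumption: the target charset maps code points 0x80..0xFF to the
   byte of the same value (Latin-1-like).  */
static size_t
sk_encode_single_byte (const int *chars, size_t n_chars,
                       unsigned char *dst, size_t dst_size,
                       unsigned char default_byte)
{
  size_t produced = 0;
  size_t i;

  assert (chars != NULL && dst != NULL);
  for (i = 0; i < n_chars && produced < dst_size; i++)
    {
      int c = chars[i];

      if (c >= 0 && c < 0x80)
        dst[produced++] = (unsigned char) c;   /* ASCII passes through */
      else if (c >= 0x80 && c <= 0xFF)
        dst[produced++] = (unsigned char) c;   /* encodable in the charset */
      else
        dst[produced++] = default_byte;        /* not encodable: substitute */
    }
  return produced;
}
```

Unlike the original, the sketch stops when the destination is full instead of growing it, which keeps the example self-contained.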
4342 | |
4343 | |
22874 | 4344 /*** 7. C library functions ***/ |
17052 | 4345 |
88365 | 4346 /* In Emacs Lisp, coding system is represented by a Lisp symbol which |
17052 | 4347 has a property `coding-system'. The value of this property is a |
88365 | 4348 vector of length 5 (called the coding-vector). Among the elements of |
17052 | 4349 this vector, the first (element[0]) and the fifth (element[4]) |
4350 carry important information for decoding/encoding. Before | |
4351 decoding/encoding, this information should be set in fields of a | |
4352 structure of type `coding_system'. | |
4353 | |
88365 | 4354 A value of property `coding-system' can be a symbol of another |
17052 | 4355 subsidiary coding-system. In that case, Emacs gets coding-vector |
4356 from that symbol. | |
4357 | |
4358 `element[0]' contains information to be set in `coding->type'. The | |
4359 values and their meanings are as follows: | |
4360 | |
17835 | 4361 0 -- coding_type_emacs_mule |
4362 1 -- coding_type_sjis | |
88365 | 4363 2 -- coding_type_iso_2022 |
17835 | 4364 3 -- coding_type_big5 |
4365 4 -- coding_type_ccl encoder/decoder written in CCL | |
4366 nil -- coding_type_no_conversion | |
4367 t -- coding_type_undecided (automatic conversion on decoding, | |
4368 no-conversion on encoding) | |
17052 | 4369 |
4370 `element[4]' contains information to be set in `coding->flags' and | |
4371 `coding->spec'. The meaning varies by `coding->type'. | |
4372 | |
88365 | 4373 If `coding->type' is `coding_type_iso_2022', element[4] is a vector |
17052 | 4374 of length 32 (of which the first 13 sub-elements are used now). |
4375 Meanings of these sub-elements are: | |
4376 | |
88365 | 4377 sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso_2022' |
17052 | 4378 If the value is an integer of valid charset, the charset is |
4379 assumed to be designated to graphic register N initially. | |
4380 | |
4381 If the value is negative, it is the negated value of a charset which | |
4382 reserves graphic register N, which means that the charset is | |
4383 not designated initially but should be designated to graphic | |
4384 register N just before encoding a character in that charset. | |
4385 | |
4386 If the value is nil, graphic register N is never used on | |
4387 encoding. | |
88365 | 4388 |
17052 | 4389 sub-element[N] where N is 4 through 11: to be set in `coding->flags' |
4390 Each value takes t or nil. See the section ISO2022 of | |
4391 `coding.h' for more information. | |
4392 | |
4393 If `coding->type' is `coding_type_big5', element[4] is t to denote | |
4394 BIG5-ETen or nil to denote BIG5-HKU. | |
4395 | |
4396 If `coding->type' takes any other value, element[4] is ignored. | |
4397 | |
88365 | 4398 Emacs Lisp's coding system also carries information about format of |
17052 | 4399 end-of-line in a value of property `eol-type'. If the value is an |
88365 | 4400 integer, 0 means eol_lf, 1 means eol_crlf, and 2 means eol_cr. If |
4401 it is not an integer, it should be a vector of subsidiary coding | |
4402 systems whose `eol-type' property has one of the above values. | |
17052 | 4403 |
4404 */ | |
4405 | |
88365 | 4406 /* Setup coding context CODING from information about CODING_SYSTEM. |
4407 If CODING_SYSTEM is nil, `no-conversion' is assumed. If | |
4408 CODING_SYSTEM is invalid, signal an error. */ | |
4409 | |
4410 void | |
17119 | 4411 setup_coding_system (coding_system, coding) |
4412 Lisp_Object coding_system; | |
17052 | 4413 struct coding_system *coding; |
4414 { | |
88365 | 4415 Lisp_Object attrs; |
4416 Lisp_Object eol_type; | |
4417 Lisp_Object coding_type; | |
20105 | 4418 Lisp_Object val; |
17052 | 4419 |
24460 | 4420 if (NILP (coding_system)) |
88365 | 4421 coding_system = Qno_conversion; |
4422 | |
4423 CHECK_CODING_SYSTEM_GET_ID (coding_system, coding->id); | |
4424 | |
4425 attrs = CODING_ID_ATTRS (coding->id); | |
4426 eol_type = CODING_ID_EOL_TYPE (coding->id); | |
4427 | |
4428 coding->mode = 0; | |
4429 coding->head_ascii = -1; | |
4430 coding->common_flags | |
4431 = (VECTORP (eol_type) ? CODING_REQUIRE_DETECTION_MASK : 0); | |
4432 | |
4433 val = CODING_ATTR_SAFE_CHARSETS (attrs); | |
4434 coding->max_charset_id = XSTRING (val)->size - 1; | |
4435 coding->safe_charsets = (char *) XSTRING (val)->data; | |
4436 coding->default_char = XINT (CODING_ATTR_DEFAULT_CHAR (attrs)); | |
4437 | |
4438 coding_type = CODING_ATTR_TYPE (attrs); | |
4439 if (EQ (coding_type, Qundecided)) | |
4440 { | |
4441 coding->detector = NULL; | |
4442 coding->decoder = decode_coding_raw_text; | |
4443 coding->encoder = encode_coding_raw_text; | |
4444 coding->common_flags |= CODING_REQUIRE_DETECTION_MASK; | |
4445 } | |
4446 else if (EQ (coding_type, Qiso_2022)) | |
4447 { | |
4448 int i; | |
4449 int flags = XINT (AREF (attrs, coding_attr_iso_flags)); | |
4450 | |
4451 /* Invoke graphic register 0 to plane 0. */ | |
4452 CODING_ISO_INVOCATION (coding, 0) = 0; | |
4453 /* Invoke graphic register 1 to plane 1 if we can use 8-bit. */ | |
4454 CODING_ISO_INVOCATION (coding, 1) | |
4455 = (flags & CODING_ISO_FLAG_SEVEN_BITS ? -1 : 1); | |
4456 /* Setup the initial status of designation. */ | |
4457 for (i = 0; i < 4; i++) | |
4458 CODING_ISO_DESIGNATION (coding, i) = CODING_ISO_INITIAL (coding, i); | |
4459 /* Not single shifting initially. */ | |
4460 CODING_ISO_SINGLE_SHIFTING (coding) = 0; | |
4461 /* Beginning of buffer should also be regarded as bol. */ | |
4462 CODING_ISO_BOL (coding) = 1; | |
4463 coding->detector = detect_coding_iso_2022; | |
4464 coding->decoder = decode_coding_iso_2022; | |
4465 coding->encoder = encode_coding_iso_2022; | |
4466 if (flags & CODING_ISO_FLAG_SAFE) | |
4467 coding->mode |= CODING_MODE_SAFE_ENCODING; | |
20227 | 4468 coding->common_flags |
88365 | 4469 |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK |
4470 | CODING_REQUIRE_FLUSHING_MASK); | |
4471 if (flags & CODING_ISO_FLAG_COMPOSITION) | |
4472 coding->common_flags |= CODING_ANNOTATE_COMPOSITION_MASK; | |
4473 if (flags & CODING_ISO_FLAG_FULL_SUPPORT) | |
20718 | 4474 { |
88365 | 4475 setup_iso_safe_charsets (attrs); |
4476 val = CODING_ATTR_SAFE_CHARSETS (attrs); | |
4477 coding->max_charset_id = XSTRING (val)->size - 1; | |
4478 coding->safe_charsets = (char *) XSTRING (val)->data; | |
20718 | 4479 } |
88365 | 4480 CODING_ISO_FLAGS (coding) = flags; |
4481 } | |
4482 else if (EQ (coding_type, Qcharset)) | |
4483 { | |
4484 coding->detector = detect_coding_charset; | |
4485 coding->decoder = decode_coding_charset; | |
4486 coding->encoder = encode_coding_charset; | |
4487 coding->common_flags | |
4488 |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK); | |
4489 } | |
4490 else if (EQ (coding_type, Qutf_8)) | |
4491 { | |
4492 coding->detector = detect_coding_utf_8; | |
4493 coding->decoder = decode_coding_utf_8; | |
4494 coding->encoder = encode_coding_utf_8; | |
34888 | 4495 coding->common_flags |
88365 | 4496 |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK); |
4497 } | |
4498 else if (EQ (coding_type, Qutf_16)) | |
4499 { | |
4500 val = AREF (attrs, coding_attr_utf_16_bom); | |
4501 CODING_UTF_16_BOM (coding) = (CONSP (val) ? utf_16_detect_bom | |
4502 : EQ (val, Qt) ? utf_16_with_bom | |
4503 : utf_16_without_bom); | |
4504 val = AREF (attrs, coding_attr_utf_16_endian); | |
4505 CODING_UTF_16_ENDIAN (coding) = (NILP (val) ? utf_16_big_endian | |
4506 : utf_16_little_endian); | |
4507 coding->detector = detect_coding_utf_16; | |
4508 coding->decoder = decode_coding_utf_16; | |
4509 coding->encoder = encode_coding_utf_16; | |
20227 | 4510 coding->common_flags |
88365 | 4511 |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK); |
4512 } | |
4513 else if (EQ (coding_type, Qccl)) | |
4514 { | |
4515 coding->detector = detect_coding_ccl; | |
4516 coding->decoder = decode_coding_ccl; | |
4517 coding->encoder = encode_coding_ccl; | |
20227 | 4518 coding->common_flags |
88365 | 4519 |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK |
4520 | CODING_REQUIRE_FLUSHING_MASK); | |
4521 } | |
4522 else if (EQ (coding_type, Qemacs_mule)) | |
4523 { | |
4524 coding->detector = detect_coding_emacs_mule; | |
4525 coding->decoder = decode_coding_emacs_mule; | |
4526 coding->encoder = encode_coding_emacs_mule; | |
4527 coding->common_flags | |
4528 |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK); | |
4529 if (! NILP (AREF (attrs, coding_attr_emacs_mule_full)) | |
4530 && ! EQ (CODING_ATTR_CHARSET_LIST (attrs), Vemacs_mule_charset_list)) | |
4531 { | |
4532 Lisp_Object tail, safe_charsets; | |
4533 int max_charset_id = 0; | |
4534 | |
4535 for (tail = Vemacs_mule_charset_list; CONSP (tail); | |
4536 tail = XCDR (tail)) | |
4537 if (max_charset_id < XFASTINT (XCAR (tail))) | |
4538 max_charset_id = XFASTINT (XCAR (tail)); | |
4539 safe_charsets = Fmake_string (make_number (max_charset_id + 1), | |
4540 make_number (255)); | |
4541 for (tail = Vemacs_mule_charset_list; CONSP (tail); | |
4542 tail = XCDR (tail)) | |
4543 XSTRING (safe_charsets)->data[XFASTINT (XCAR (tail))] = 0; | |
4544 coding->max_charset_id = max_charset_id; | |
4545 coding->safe_charsets = (char *) XSTRING (safe_charsets)->data; | |
4546 } | |
4547 } | |
4548 else if (EQ (coding_type, Qshift_jis)) | |
4549 { | |
4550 coding->detector = detect_coding_sjis; | |
4551 coding->decoder = decode_coding_sjis; | |
4552 coding->encoder = encode_coding_sjis; | |
4553 coding->common_flags | |
4554 |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK); | |
4555 } | |
4556 else if (EQ (coding_type, Qbig5)) | |
4557 { | |
4558 coding->detector = detect_coding_big5; | |
4559 coding->decoder = decode_coding_big5; | |
4560 coding->encoder = encode_coding_big5; | |
20227 | 4561 coding->common_flags |
88365 | 4562 |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK); |
4563 } | |
4564 else /* EQ (coding_type, Qraw_text) */ | |
4565 { | |
4566 coding->detector = NULL; | |
4567 coding->decoder = decode_coding_raw_text; | |
4568 coding->encoder = encode_coding_raw_text; | |
4569 coding->common_flags |= CODING_FOR_UNIBYTE_MASK; | |
4570 } | |
4571 | |
4572 return; | |
17052 | 4573 } |
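The if/else chain in setup_coding_system amounts to a lookup from coding type to a detector/decoder/encoder triple plus a set of common flags. The sketch below restates that mapping as a table, which may make the per-type differences easier to compare. All names (`sk_handler`, `sk_find_handler`, the `SK_` constants) and the bit values are hypothetical; the real masks live in coding.h. The sketch also ignores the type-specific state that the function initializes (ISO 2022 designations, UTF-16 BOM and endianness, safe charsets) and the extra detection flag set when `eol_type` is a vector.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values; the real masks are defined in coding.h.  */
enum
{
  SK_REQUIRE_DECODING = 1 << 0,
  SK_REQUIRE_ENCODING = 1 << 1,
  SK_REQUIRE_FLUSHING = 1 << 2,
  SK_REQUIRE_DETECTION = 1 << 3,
  SK_FOR_UNIBYTE = 1 << 4
};

#define SK_CODEC (SK_REQUIRE_DECODING | SK_REQUIRE_ENCODING)

struct sk_handler
{
  const char *type;     /* coding type name, e.g. "utf-8" */
  int has_detector;     /* 0 where the source sets coding->detector = NULL */
  int common_flags;     /* flags ORed into coding->common_flags */
};

static const struct sk_handler sk_handlers[] = {
  { "undecided", 0, SK_REQUIRE_DETECTION },
  { "iso-2022", 1, SK_CODEC | SK_REQUIRE_FLUSHING },
  { "charset", 1, SK_CODEC },
  { "utf-8", 1, SK_CODEC },
  { "utf-16", 1, SK_CODEC },
  { "ccl", 1, SK_CODEC | SK_REQUIRE_FLUSHING },
  { "emacs-mule", 1, SK_CODEC },
  { "shift-jis", 1, SK_CODEC },
  { "big5", 1, SK_CODEC },
  { "raw-text", 0, SK_FOR_UNIBYTE }   /* must stay last: the fallback */
};

/* Return the entry for TYPE, or the raw-text entry when TYPE is not
   listed, mirroring the final `else' branch of the original.  */
static const struct sk_handler *
sk_find_handler (const char *type)
{
  size_t n = sizeof sk_handlers / sizeof sk_handlers[0];
  size_t i;

  assert (type != NULL);
  for (i = 0; i + 1 < n; i++)
    if (strcmp (sk_handlers[i].type, type) == 0)
      return &sk_handlers[i];
  return &sk_handlers[n - 1];
}
```

Only iso-2022 and ccl request flushing, and only undecided and raw-text run without a detector.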
4574 | |
88365 | 4575 /* Return raw-text or one of its subsidiaries that has the same |
4576 eol_type as CODING-SYSTEM. */ | |
4577 | |
4578 Lisp_Object | |
4579 raw_text_coding_system (coding_system) | |
4580 Lisp_Object coding_system; | |
26847 | 4581 { |
88430 | 4582 Lisp_Object spec, attrs; |
88365 | 4583 Lisp_Object eol_type, raw_text_eol_type; |
4584 | |
4585 spec = CODING_SYSTEM_SPEC (coding_system); | |
4586 attrs = AREF (spec, 0); | |
4587 | |
4588 if (EQ (CODING_ATTR_TYPE (attrs), Qraw_text)) | |
4589 return coding_system; | |
4590 | |
4591 eol_type = AREF (spec, 2); | |
4592 if (VECTORP (eol_type)) | |
4593 return Qraw_text; | |
4594 spec = CODING_SYSTEM_SPEC (Qraw_text); | |
4595 raw_text_eol_type = AREF (spec, 2); | |
4596 return (EQ (eol_type, Qunix) ? AREF (raw_text_eol_type, 0) | |
4597 : EQ (eol_type, Qdos) ? AREF (raw_text_eol_type, 1) | |
4598 : AREF (raw_text_eol_type, 2)); | |
26847 | 4599 } |
4600 | |
88365 | 4601 |
4602 /* If CODING_SYSTEM doesn't specify end-of-line format but PARENT | |
4603 does, return the subsidiary that has the same eol-spec as | |
4604 PARENT. Otherwise, return CODING_SYSTEM. */ | |
4605 | |
4606 Lisp_Object | |
4607 coding_inherit_eol_type (coding_system, parent) | |
22616 | 4608 { |
88365 | 4609 Lisp_Object spec, attrs, eol_type; |
4610 | |
4611 spec = CODING_SYSTEM_SPEC (coding_system); | |
4612 attrs = AREF (spec, 0); | |
4613 eol_type = AREF (spec, 2); | |
4614 if (VECTORP (eol_type)) | |
4615 { | |
4616 Lisp_Object parent_spec; | |
4617 Lisp_Object parent_eol_type; | |
4618 | |
4619 parent_spec | |
4620 = CODING_SYSTEM_SPEC (buffer_defaults.buffer_file_coding_system); | |
4621 parent_eol_type = AREF (parent_spec, 2); | |
4622 if (EQ (parent_eol_type, Qunix)) | |
4623 coding_system = AREF (eol_type, 0); | |
4624 else if (EQ (parent_eol_type, Qdos)) | |
4625 coding_system = AREF (eol_type, 1); | |
4626 else if (EQ (parent_eol_type, Qmac)) | |
4627 coding_system = AREF (eol_type, 2); | |
4628 } | |
4629 return coding_system; | |
22616 | 4630 } |
4631 | |
17052 | 4632 /* Emacs has a mechanism to automatically detect a coding system if it |
4633 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But, | |
4634 it's impossible to distinguish some coding systems accurately | |
4635 because they use the same range of codes. So, at first, coding | |
4636 systems are grouped into the following categories: | |
4637 | |
17835 | 4638 o coding-category-emacs-mule |
17052 | 4639 |
4640 The category for a coding system which has the same code range | |
4641 as Emacs' internal format. Assigned the coding-system (Lisp | |
17835 | 4642 symbol) `emacs-mule' by default. |
17052 | 4643 |
4644 o coding-category-sjis | |
4645 | |
4646 The category for a coding system which has the same code range | |
4647 as SJIS. Assigned the coding-system (Lisp | |
18787 | 4648 symbol) `japanese-shift-jis' by default. |
17052 | 4649 |
4650 o coding-category-iso-7 | |
4651 | |
4652 The category for a coding system which has the same code range | |
18787 | 4653 as ISO2022 of 7-bit environment. This doesn't use any locking |
20718 | 4654 shift and single shift functions. This can encode/decode all |
4655 charsets. Assigned the coding-system (Lisp symbol) | |
4656 `iso-2022-7bit' by default. | |
4657 | |
4658 o coding-category-iso-7-tight | |
4659 | |
4660 Same as coding-category-iso-7 except that this can | |
4661 encode/decode only the specified charsets. | |
17052 | 4662 |
4663 o coding-category-iso-8-1 | |
4664 | |
4665 The category for a coding system which has the same code range | |
4666 as ISO2022 of 8-bit environment and graphic plane 1 used only | |
18787 | 4667 for DIMENSION1 charset. This doesn't use any locking shift |
4668 and single shift functions. Assigned the coding-system (Lisp | |
4669 symbol) `iso-latin-1' by default. | |
17052 | 4670 |
4671 o coding-category-iso-8-2 | |
4672 | |
4673 The category for a coding system which has the same code range | |
4674 as ISO2022 of 8-bit environment and graphic plane 1 used only | |
18787 | 4675 for DIMENSION2 charset. This doesn't use any locking shift |
4676 and single shift functions. Assigned the coding-system (Lisp | |
4677 symbol) `japanese-iso-8bit' by default. | |
4678 | |
4679 o coding-category-iso-7-else | |
17052 | 4680 |
4681 The category for a coding system which has the same code range | |
88365 | 4682 as ISO2022 of 7-bit environment but uses locking shift or |
18787 | 4683 single shift functions. Assigned the coding-system (Lisp |
4684 symbol) `iso-2022-7bit-lock' by default. | |
4685 | |
4686 o coding-category-iso-8-else | |
4687 | |
4688 The category for a coding system which has the same code range | |
88365 | 4689 as ISO2022 of 8-bit environment but uses locking shift or |
18787 | 4690 single shift functions. Assigned the coding-system (Lisp |
4691 symbol) `iso-2022-8bit-ss2' by default. | |
17052 | 4692 |
4693 o coding-category-big5 | |
4694 | |
4695 The category for a coding system which has the same code range | |
4696 as BIG5. Assigned the coding-system (Lisp symbol) | |
17119 | 4697 `cn-big5' by default. |
17052 | 4698 |
28022 | 4699 o coding-category-utf-8 |
4700 | |
4701 The category for a coding system which has the same code range | |
4702 as UTF-8 (cf. RFC2279). Assigned the coding-system (Lisp | |
4703 symbol) `utf-8' by default. | |
4704 | |
4705 o coding-category-utf-16-be | |
4706 | |
4707 The category for a coding system in which a text has an | |
4708 Unicode signature (cf. Unicode Standard) in the order of BIG | |
4709 endian at the head. Assigned the coding-system (Lisp symbol) | |
4710 `utf-16-be' by default. | |
4711 | |
4712 o coding-category-utf-16-le | |
4713 | |
4714 The category for a coding system in which a text has an | |
4715 Unicode signature (cf. Unicode Standard) in the order of | |
4716 LITTLE endian at the head. Assigned the coding-system (Lisp | |
4717 symbol) `utf-16-le' by default. | |
4718 | |
22874 | 4719 o coding-category-ccl |
4720 | |
4721 The category for a coding system of which encoder/decoder is | |
4722 written in CCL programs. The default value is nil, i.e., no | |
4723 coding system is assigned. | |
4724 | |
17052 | 4725 o coding-category-binary |
4726 | |
4727 The category for a coding system not categorized in any of the | |
4728 above. Assigned the coding-system (Lisp symbol) | |
17119 | 4729 `no-conversion' by default. |
17052 | 4730 |
4731 Each of them is a Lisp symbol and the value is an actual | |
88365 | 4732 `coding-system' (this is also a Lisp symbol) assigned by a user. |
17052 | 4733 What Emacs does actually is to detect a category of coding system. |
4734 Then, it uses a `coding-system' assigned to it. If Emacs can't | |
88365 | 4735 decide on a single possible category, it selects the category of the |
17052 | 4736 highest priority. Priorities of categories are also specified by a |
4737 user in a Lisp variable `coding-category-list'. | |
4738 | |
4739 */ | |
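The last paragraph of the comment above describes a tie-break: when several categories remain possible, the one that comes first in the user's priority list wins. A minimal sketch of that selection follows, assuming categories are small integers used as bit positions in a mask. `sk_pick_category` and its parameters are hypothetical names, not functions from coding.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: POSSIBLE has bit (1 << category) set for each
   category still possible; PRIORITIES lists category numbers from
   highest to lowest priority.  Return the first listed category whose
   bit is set, or -1 if none is.  */
static int
sk_pick_category (unsigned int possible, const int *priorities, size_t n)
{
  size_t i;

  assert (priorities != NULL);
  for (i = 0; i < n; i++)
    if (priorities[i] >= 0 && priorities[i] < 32
        && (possible & (1u << priorities[i])))
      return priorities[i];
  return -1;
}
```

With the priority order 3, 1, 2 and categories 1 and 2 both possible, the function returns 1.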
4740 | |
88365 | 4741 #define EOL_SEEN_NONE 0 |
4742 #define EOL_SEEN_LF 1 | |
4743 #define EOL_SEEN_CR 2 | |
4744 #define EOL_SEEN_CRLF 4 | |
4745 | |
4746 /* Detect how end-of-line of a text of length CODING->src_bytes | |
4747 pointed by CODING->source is encoded. Return one of | |
4748 EOL_SEEN_XXX. */ | |
17052 | 4749 |
19173 | 4750 #define MAX_EOL_CHECK_COUNT 3 |
4751 | |
20718 | 4752 static int |
88365 | 4753 detect_eol (coding, source, src_bytes) |
4754 struct coding_system *coding; | |
20718 | 4755 unsigned char *source; |
88365 | 4756 EMACS_INT src_bytes; |
17052 | 4757 { |
88365 | 4758 Lisp_Object attrs, coding_type; |
20718 | 4759 unsigned char *src = source, *src_end = src + src_bytes; |
17052 | 4760 unsigned char c; |
88365 | 4761 int total = 0; |
4762 int eol_seen = EOL_SEEN_NONE; | |
4763 | |
4764 attrs = CODING_ID_ATTRS (coding->id); | |
4765 coding_type = CODING_ATTR_TYPE (attrs); | |
4766 | |
4767 if (EQ (coding_type, Qutf_16)) | |
4768 { | |
4769 int msb, lsb; | |
4770 | |
4771 msb = coding->spec.utf_16.endian == utf_16_little_endian; | |
4772 lsb = 1 - msb; | |
4773 | |
4774 while (src + 1 < src_end) | |
17052 | 4775 { |
88365 | 4776 c = src[lsb]; |
4777 if (src[msb] == 0 && (c == '\n' || c == '\r')) | |
20718 | 4778 { |
88365 | 4779 int this_eol; |
4780 | |
4781 if (c == '\n') | |
4782 this_eol = EOL_SEEN_LF; | |
4783 else if (src + 3 >= src_end | |
4784 || src[msb + 2] != 0 | |
4785 || src[lsb + 2] != '\n') | |
4786 this_eol = EOL_SEEN_CR; | |
4787 else | |
4788 this_eol = EOL_SEEN_CRLF; | |
4789 | |
4790 if (eol_seen == EOL_SEEN_NONE) | |
4791 /* This is the first end-of-line. */ | |
4792 eol_seen = this_eol; | |
4793 else if (eol_seen != this_eol) | |
4794 { | |
4795 /* The type found differs from the one found before. */ | |
4796 eol_seen = EOL_SEEN_LF; | |
4797 break; | |
4798 } | |
4799 if (++total == MAX_EOL_CHECK_COUNT) | |
4800 break; | |
4801 } | |
4802 src += 2; | |
4803 } | |
4804 } | |
4805 else | |
4806 { | |
4807 while (src < src_end) | |
4808 { | |
4809 c = *src++; | |
4810 if (c == '\n' || c == '\r') | |
4811 { | |
4812 int this_eol; | |
4813 | |
4814 if (c == '\n') | |
4815 this_eol = EOL_SEEN_LF; | |
4816 else if (src >= src_end || *src != '\n') | |
4817 this_eol = EOL_SEEN_CR; | |
4818 else | |
4819 this_eol = EOL_SEEN_CRLF, src++; | |
4820 | |
4821 if (eol_seen == EOL_SEEN_NONE) | |
4822 /* This is the first end-of-line. */ | |
4823 eol_seen = this_eol; | |
4824 else if (eol_seen != this_eol) | |
4825 { | |
4826 /* The type found differs from the one found before. */ | |
4827 eol_seen = EOL_SEEN_LF; | |
4828 break; | |
4829 } | |
4830 if (++total == MAX_EOL_CHECK_COUNT) | |
4831 break; | |
20718 | 4832 } |
17052 | 4833 } |
4834 } | |
88365 | 4835 return eol_seen; |
17052 | 4836 } |
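The single-byte branch of detect_eol can be exercised in isolation. The sketch below reproduces its logic on a plain byte buffer: classify each line ending as LF, CR, or CRLF, fall back to LF as soon as two endings disagree, and stop after three consistent endings. `sk_detect_eol` and the `SK_` constants are hypothetical stand-ins for the function and the EOL_SEEN_ and MAX_EOL_CHECK_COUNT macros.

```c
#include <assert.h>
#include <stddef.h>

enum { SK_EOL_NONE = 0, SK_EOL_LF = 1, SK_EOL_CR = 2, SK_EOL_CRLF = 4 };
#define SK_MAX_EOL_CHECK 3

/* Hypothetical standalone version of the single-byte scan.  */
static int
sk_detect_eol (const unsigned char *src, size_t len)
{
  const unsigned char *end;
  int seen = SK_EOL_NONE;
  int total = 0;

  assert (src != NULL);
  end = src + len;
  while (src < end)
    {
      unsigned char c = *src++;
      int this_eol;

      if (c != '\n' && c != '\r')
        continue;
      if (c == '\n')
        this_eol = SK_EOL_LF;
      else if (src >= end || *src != '\n')
        this_eol = SK_EOL_CR;          /* lone CR */
      else
        {
          this_eol = SK_EOL_CRLF;      /* CR followed by LF */
          src++;
        }

      if (seen == SK_EOL_NONE)
        seen = this_eol;               /* first end-of-line */
      else if (seen != this_eol)
        {
          seen = SK_EOL_LF;            /* inconsistent: treat as LF */
          break;
        }
      if (++total == SK_MAX_EOL_CHECK)
        break;
    }
  return seen;
}
```

A buffer that mixes conventions, such as an LF line followed by a CRLF line, is reported as LF, so the bytes are left as they are on decoding.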
4837 | |
88365 | 4838 |
4839 static void | |
4840 adjust_coding_eol_type (coding, eol_seen) | |
4841 struct coding_system *coding; | |
4842 int eol_seen; | |
4843 { | |
88430 | 4844 Lisp_Object eol_type; |
88365 | 4845 |
4846 eol_type = CODING_ID_EOL_TYPE (coding->id); | |
4847 if (eol_seen & EOL_SEEN_LF) | |
4848 coding->id = CODING_SYSTEM_ID (AREF (eol_type, 0)); | |
4849 else if (eol_seen & EOL_SEEN_CRLF) | |
4850 coding->id = CODING_SYSTEM_ID (AREF (eol_type, 1)); | |
4851 else if (eol_seen & EOL_SEEN_CR) | |
4852 coding->id = CODING_SYSTEM_ID (AREF (eol_type, 2)); | |
4853 } | |
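adjust_coding_eol_type turns the detected end-of-line mask into an index into the vector of subsidiary coding systems (0 for LF, 1 for CRLF, 2 for CR). A minimal sketch of that mapping follows, assuming the mask tested is the detection result. `adj_eol_index` and the `ADJ_` constants are hypothetical names mirroring the EOL_SEEN_ macros.

```c
#include <assert.h>

enum { ADJ_EOL_NONE = 0, ADJ_EOL_LF = 1, ADJ_EOL_CR = 2, ADJ_EOL_CRLF = 4 };

/* Hypothetical helper: return the subsidiary index for EOL_SEEN, or -1
   when no end-of-line was seen and the coding system stays as is.  */
static int
adj_eol_index (int eol_seen)
{
  assert (eol_seen >= 0);
  if (eol_seen & ADJ_EOL_LF)
    return 0;                   /* LF (unix) subsidiary */
  if (eol_seen & ADJ_EOL_CRLF)
    return 1;                   /* CRLF (dos) subsidiary */
  if (eol_seen & ADJ_EOL_CR)
    return 2;                   /* CR (mac) subsidiary */
  return -1;
}
```

LF is tested first, so a mask with several bits set resolves to the LF subsidiary.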
4854 | |
4855 /* Detect how a text specified in CODING is encoded. If a coding | |
4856 system is detected, update fields of CODING by the detected coding | |
4857 system. */ | |
4858 | |
4859 void | |
4860 detect_coding (coding) | |
4861 struct coding_system *coding; | |
28022 | 4862 { |
88365 | 4863 unsigned char *src, *src_end; |
4864 Lisp_Object attrs, coding_type; | |
4865 | |
4866 coding->consumed = coding->consumed_char = 0; | |
4867 coding->produced = coding->produced_char = 0; | |
4868 coding_set_source (coding); | |
4869 | |
4870 src_end = coding->source + coding->src_bytes; | |
4871 | |
4872 /* If we have not yet decided the text encoding type, detect it | |
4873 now. */ | |
4874 if (EQ (CODING_ATTR_TYPE (CODING_ID_ATTRS (coding->id)), Qundecided)) | |
4875 { | |
4876 int mask = CATEGORY_MASK_ANY; | |
4877 int c, i; | |
4878 | |
4879 for (src = coding->source; src < src_end; src++) | |
28022 | 4880 { |
88365 | 4881 c = *src; |
4882 if (c & 0x80 || (c < 0x20 && (c == ISO_CODE_ESC | |
4883 || c == ISO_CODE_SI | |
4884 || c == ISO_CODE_SO))) | |
4885 break; | |
4886 } | |
4887 coding->head_ascii = src - (coding->source + coding->consumed); | |
4888 | |
4889 if (coding->head_ascii < coding->src_bytes) | |
4890 { | |
4891 int detected = 0; | |
4892 | |
4893 for (i = 0; i < coding_category_raw_text; i++) | |
28022 | 4894 { |
88365 | 4895 enum coding_category category = coding_priorities[i]; |
4896 struct coding_system *this = coding_categories + category; | |
4897 | |
4898 if (category >= coding_category_raw_text | |
4899 || detected & (1 << category)) | |
4900 continue; | |
4901 | |
4902 if (this->id < 0) | |
28022 | 4903 { |
88365 | 4904 /* No coding system of this category is defined. */ |
4905 mask &= ~(1 << category); | |
28022 | 4906 } |
4907 else |
4908 { |
88365 | 4909 detected |= detected_mask[category]; |
4910 if ((*(this->detector)) (coding, &mask)) | |
4911 break; | |
28022 | 4912 } |
4913 } |
88365 | 4914 if (! mask) |
4915 setup_coding_system (Qraw_text, coding); | |
4916 else if (mask != CATEGORY_MASK_ANY) | |
4917 for (i = 0; i < coding_category_raw_text; i++) | |
4918 { | |
4919 enum coding_category category = coding_priorities[i]; | |
4920 struct coding_system *this = coding_categories + category; | |
4921 | |
4922 if (mask & (1 << category)) | |
4923 { | |
4924 setup_coding_system (CODING_ID_NAME (this->id), coding); | |
4925 break; | |
4926 } | |
4927 } | |
4928 } | |
4929 } | |
4930 | |
4931 attrs = CODING_ID_ATTRS (coding->id); | |
4932 coding_type = CODING_ATTR_TYPE (attrs); | |
4933 | |
4934 /* If we have not yet decided the EOL type, detect it now. But the | |
4935 detection is impossible for a CCL-based coding system, in which | |
4936 case we detect the EOL type after decoding. */ | |
4937 if (VECTORP (CODING_ID_EOL_TYPE (coding->id)) | |
4938 && ! EQ (coding_type, Qccl)) | |
4939 { | |
4940 int eol_seen = detect_eol (coding, coding->source, coding->src_bytes); | |
4941 | |
4942 if (eol_seen != EOL_SEEN_NONE) | |
4943 adjust_coding_eol_type (coding, eol_seen); | |
4944 } | |
4945 } | |
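Before running any category detector, detect_coding measures how long the plain ASCII prefix of the source is, stopping at the first byte with the high bit set or at an ESC, SI or SO control code. A minimal sketch of that scan follows, assuming a flat byte buffer; `count_head_ascii` and the `CODE_*` macros are hypothetical names introduced for illustration, with the standard ASCII values for those control codes.

```c
#include <assert.h>
#include <stddef.h>

/* Control codes that can introduce ISO-2022 style shifts or escapes.  */
#define CODE_SO  0x0E
#define CODE_SI  0x0F
#define CODE_ESC 0x1B

/* Return the number of leading bytes of BUF that are plain ASCII and
   are not ESC, SI or SO.  */
static size_t
count_head_ascii (const unsigned char *buf, size_t len)
{
  size_t i;

  assert (buf != NULL);
  for (i = 0; i < len; i++)
    {
      unsigned char c = buf[i];

      if ((c & 0x80) || c == CODE_ESC || c == CODE_SI || c == CODE_SO)
        break;
    }
  return i;
}
```

When the returned count equals the buffer length, the text is pure ASCII and no detector needs to run, which matches the `head_ascii < src_bytes` test above.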
4946 | |
4947 | |
4948 static void | |
4949 decode_eol (coding) | |
4950 struct coding_system *coding; | |
4951 { | |
4952 if (VECTORP (CODING_ID_EOL_TYPE (coding->id))) | |
4953 { | |
4954 unsigned char *p = CHAR_POS_ADDR (coding->dst_pos); | |
4955 unsigned char *pend = p + coding->produced; | |
4956 int eol_seen = EOL_SEEN_NONE; | |
4957 | |
4958 for (; p < pend; p++) | |
4959 { | |
4960 if (*p == '\n') | |
4961 eol_seen |= EOL_SEEN_LF; | |
4962 else if (*p == '\r') | |
28022 | 4963 { |
88365 | 4964 if (p + 1 < pend && *(p + 1) == '\n') |
4965 { | |
4966 eol_seen |= EOL_SEEN_CRLF; | |
4967 p++; | |
4968 } | |
4969 else | |
4970 eol_seen |= EOL_SEEN_CR; | |
28022 | 4971 } |
4972 } |
88365 | 4973 if (eol_seen != EOL_SEEN_NONE) |
4974 adjust_coding_eol_type (coding, eol_seen); | |
4975 } | |
4976 | |
4977 if (EQ (CODING_ID_EOL_TYPE (coding->id), Qmac)) | |
4978 { | |
4979 unsigned char *p = CHAR_POS_ADDR (coding->dst_pos); | |
4980 unsigned char *pend = p + coding->produced; | |
4981 | |
4982 for (; p < pend; p++) | |
4983 if (*p == '\r') | |
4984 *p = '\n'; | |
4985 } | |
4986 else if (EQ (CODING_ID_EOL_TYPE (coding->id), Qdos)) | |
4987 { | |
4988 unsigned char *p, *pbeg, *pend; | |
4989 Lisp_Object undo_list; | |
4990 | |
4991 move_gap_both (coding->dst_pos + coding->produced_char, | |
4992 coding->dst_pos_byte + coding->produced); | |
4993 undo_list = current_buffer->undo_list; | |
4994 current_buffer->undo_list = Qt; | |
4995 del_range_2 (coding->dst_pos, coding->dst_pos_byte, GPT, GPT_BYTE, Qnil); | |
4996 current_buffer->undo_list = undo_list; | |
4997 pbeg = GPT_ADDR; | |
4998 pend = pbeg + coding->produced; | |
4999 | |
5000 for (p = pend - 1; p >= pbeg; p--) | |
5001 if (*p == '\r') | |
5002 { | |
5003 safe_bcopy ((char *) (p + 1), (char *) p, pend - p - 1); | |
5004 pend--; | |
5005 } | |
5006 coding->produced_char -= coding->produced - (pend - pbeg); | |
5007 coding->produced = pend - pbeg; | |
5008 insert_from_gap (coding->produced_char, coding->produced); | |
17052 | 5009 } |
5010 } | |
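The DOS branch of decode_eol deletes every CR byte, shifting the tail of the buffer once per hit. As an illustration only, the sketch below uses a different technique: a single-pass, two-index compaction that drops a CR only when an LF follows it. `crlf_to_lf` is a hypothetical helper, and it works on a plain byte array rather than on the buffer gap.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite BUF in place so that each CR LF pair becomes a single LF.
   A lone CR is kept.  Returns the new length.  */
static size_t
crlf_to_lf (unsigned char *buf, size_t len)
{
  size_t r, w = 0;

  assert (buf != NULL);
  for (r = 0; r < len; r++)
    {
      if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
        continue;               /* drop the CR; the LF is copied next */
      buf[w++] = buf[r];
    }
  return w;
}
```

Because the write index never overtakes the read index, no temporary buffer is needed and each byte is moved at most once.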
5011 | |
88365 | 5012 static void |
5013 translate_chars (coding, table) | |
17052 | 5014 struct coding_system *coding; |
88365 | 5015 Lisp_Object table; |
17052 | 5016 { |
88365 | 5017 int *charbuf = coding->charbuf; |
5018 int *charbuf_end = charbuf + coding->charbuf_used; | |
5019 int c; | |
5020 | |
5021 if (coding->chars_at_source) | |
5022 return; | |
5023 | |
5024 while (charbuf < charbuf_end) | |
5025 { | |
5026 c = *charbuf; | |
5027 if (c < 0) | |
5028 charbuf -= c; /* C is -LENGTH of an annotation; skip it. */ | |
5029 else | |
5030 *charbuf++ = translate_char (table, c); | |
5031 } | |
17052 | 5032 } |
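translate_chars walks an int buffer in which a negative entry marks the start of an annotation record that must be skipped rather than translated. A minimal sketch of that traversal follows, assuming the negative entry -N means the record occupies N ints including the marker. `map_chars` and `upcase_ascii` are hypothetical names, and the function pointer stands in for the translation table lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Apply FN to every character in BUF[0..USED).  A negative entry -N
   starts an annotation record of N ints (marker included), which is
   skipped untouched.  */
static void
map_chars (int *buf, size_t used, int (*fn) (int))
{
  int *p = buf, *end = buf + used;

  assert (buf != NULL && fn != NULL);
  while (p < end)
    {
      if (*p < 0)
        {
          ptrdiff_t n = -(ptrdiff_t) *p;

          if (n > end - p)
            break;              /* malformed record: stop */
          p += n;               /* jump over the whole record */
        }
      else
        {
          *p = fn (*p);
          p++;
        }
    }
}

/* Hypothetical translation used only for the example: ASCII upcase.  */
static int
upcase_ascii (int c)
{
  return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}
```

Storing the record length as a negative number lets one buffer carry both characters and metadata without a separate tag array, since valid character codes are never negative.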
5033 | |
88365 | 5034 static int |
5035 produce_chars (coding) | |
5036 struct coding_system *coding; | |
17052 | 5037 { |
88365 | 5038 unsigned char *dst = coding->destination + coding->produced; |
5039 unsigned char *dst_end = coding->destination + coding->dst_bytes; | |
5040 int produced; | |
5041 int produced_chars = 0; | |
5042 | |
5043 if (! coding->chars_at_source) | |
5044 { | |
5045 /* Characters are in coding->charbuf. */ | |
5046 int *buf = coding->charbuf; | |
5047 int *buf_end = buf + coding->charbuf_used; | |
5048 unsigned char *adjusted_dst_end; | |
5049 | |
5050 if (BUFFERP (coding->src_object) | |
5051 && EQ (coding->src_object, coding->dst_object)) | |
5052 dst_end = coding->source + coding->consumed; | |
5053 adjusted_dst_end = dst_end - MAX_MULTIBYTE_LENGTH; | |
5054 | |
5055 while (buf < buf_end) | |
5056 { | |
5057 int c = *buf++; | |
5058 | |
5059 if (dst >= adjusted_dst_end) | |
5060 { | |
5061 dst = alloc_destination (coding, | |
5062 buf_end - buf + MAX_MULTIBYTE_LENGTH, | |
5063 dst); | |
5064 dst_end = coding->destination + coding->dst_bytes; | |
5065 adjusted_dst_end = dst_end - MAX_MULTIBYTE_LENGTH; | |
5066 } | |
5067 if (c >= 0) | |
5068 { | |
5069 if (coding->dst_multibyte | |
5070 || ! CHAR_BYTE8_P (c)) | |
5071 CHAR_STRING_ADVANCE (c, dst); | |
5072 else | |
5073 *dst++ = CHAR_TO_BYTE8 (c); | |
5074 produced_chars++; | |
5075 } | |
5076 else | |
5077 /* This is annotation data; skip over it. */ | |
5078 buf -= c + 1; | |
5079 } | |
30833 | 5080 } |
5081 else |
5082 { |
88365 | 5083 int multibytep = coding->src_multibyte; |
5084 unsigned char *src = coding->source; | |
5085 unsigned char *src_end = src + coding->src_bytes; | |
5086 Lisp_Object eol_type; | |
5087 | |
5088 eol_type = CODING_ID_EOL_TYPE (coding->id); | |
5089 | |
5090 if (coding->src_multibyte != coding->dst_multibyte) | |
34892 | 5091 { |
88365 | 5092 if (coding->src_multibyte) |
34892 | 5093 { |
88365 | 5094 int consumed_chars; |
5095 | |
5096 while (1) | |
34892 | 5097 { |
88365 | 5098 unsigned char *src_base = src; |
5099 int c; | |
5100 | |
5101 ONE_MORE_BYTE (c); | |
5102 if (c == '\r') | |
5103 { | |
5104 if (EQ (eol_type, Qdos)) | |
5105 { | |
5106 if (src < src_end | |
5107 && *src == '\n') | |
5108 c = *src++; | |
5109 } | |
5110 else if (EQ (eol_type, Qmac)) | |
5111 c = '\n'; | |
5112 } | |
5113 if (dst == dst_end) | |
5114 { | |
5115 EMACS_INT offset = src - coding->source; | |
5116 | |
5117 dst = alloc_destination (coding, src_end - src + 1, dst); | |
5118 dst_end = coding->destination + coding->dst_bytes; | |
5119 coding_set_source (coding); | |
5120 src = coding->source + offset; | |
5121 src_end = coding->source + coding->src_bytes; | |
5122 } | |
5123 *dst++ = c; | |
5124 produced_chars++; | |
5125 } | |
5126 no_more_source: | |
5127 ; | |
5128 } | |
5129 else | |
5130 while (src < src_end) | |
5131 { | |
5132 int c = *src++; | |
5133 | |
5134 if (c == '\r') | |
5135 { | |
5136 if (EQ (eol_type, Qdos)) | |
5137 { | |
5138 if (src < src_end | |
5139 && *src == '\n') | |
5140 c = *src++; | |
5141 } | |
5142 else if (EQ (eol_type, Qmac)) | |
5143 c = '\n'; | |
5144 } | |
5145 if (dst >= dst_end - 1) | |
5146 { | |
5147 EMACS_INT offset = src - coding->source; | |
5148 | |
5149 dst = alloc_destination (coding, src_end - src + 2, dst); | |
5150 dst_end = coding->destination + coding->dst_bytes; | |
5151 coding_set_source (coding); | |
5152 src = coding->source + offset; | |
5153 src_end = coding->source + coding->src_bytes; | |
5154 } | |
5155 EMIT_ONE_BYTE (c); | |
5156 } | |
5157 } | |
5158 else | |
5159 { | |
5160 if (!EQ (coding->src_object, coding->dst_object)) | |
5161 { | |
5162 int require = coding->src_bytes - coding->dst_bytes; | |
5163 | |
5164 if (require > 0) | |
5165 { | |
5166 EMACS_INT offset = src - coding->source; | |
5167 | |
5168 dst = alloc_destination (coding, require, dst); | |
5169 coding_set_source (coding); | |
5170 src = coding->source + offset; | |
5171 src_end = coding->source + coding->src_bytes; | |
34892 | 5172 } |
5173 } |
88365 | 5174 produced_chars = coding->src_chars; |
5175 while (src < src_end) | |
34892 | 5176 { |
88365 | 5177 int c = *src++; |
5178 | |
5179 if (c == '\r') | |
5180 { | |
5181 if (EQ (eol_type, Qdos)) | |
5182 { | |
5183 if (src < src_end | |
5184 && *src == '\n') | |
5185 c = *src++; | |
5186 produced_chars--; | |
5187 } | |
5188 else if (EQ (eol_type, Qmac)) | |
5189 c = '\n'; | |
5190 } | |
5191 *dst++ = c; | |
34892 | 5192 } |
5193 } |
88365 | 5194 } |
5195 | |
5196 produced = dst - (coding->destination + coding->produced); | |
5197 if (BUFFERP (coding->dst_object)) | |
5198 insert_from_gap (produced_chars, produced); | |
5199 coding->produced += produced; | |
5200 coding->produced_char += produced_chars; | |
5201 return produced_chars; | |
5202 } | |
5203 | |
5204 /* [ -LENGTH CHAR_POS_OFFSET MASK METHOD COMP_LEN ] | |
5205 or | |
5206 [ -LENGTH CHAR_POS_OFFSET MASK METHOD COMP_LEN COMPONENTS... ] | |
5207 */ | |
5208 | |
5209 static INLINE void | |
5210 produce_composition (coding, charbuf) | |
5211 struct coding_system *coding; | |
5212 int *charbuf; | |
5213 { | |
5214 Lisp_Object buffer; | |
5215 int len; | |
5216 EMACS_INT pos; | |
5217 enum composition_method method; | |
5218 int cmp_len; | |
5219 Lisp_Object components; | |
5220 | |
5221 buffer = coding->dst_object; | |
5222 len = -charbuf[0]; | |
5223 pos = coding->dst_pos + charbuf[1]; | |
5224 method = (enum composition_method) (charbuf[3]); | |
5225 cmp_len = charbuf[4]; | |
5226 | |
5227 if (method == COMPOSITION_RELATIVE) | |
5228 components = Qnil; | |
5229 else | |
5230 { | |
5231 Lisp_Object args[MAX_COMPOSITION_COMPONENTS * 2 - 1]; | |
5232 int i; | |
5233 | |
5234 len -= 5; | |
5235 charbuf += 5; | |
5236 for (i = 0; i < len; i++) | |
5237 args[i] = make_number (charbuf[i]); | |
5238 components = (method == COMPOSITION_WITH_ALTCHARS | |
5239 ? Fstring (len, args) : Fvector (len, args)); | |
5240 } | |
5241 compose_text (pos, pos + cmp_len, components, Qnil, Qnil); | |
20718 | 5242 } |
5243 |
88365 | 5244 static int * |
5245 save_composition_data (buf, buf_end, prop) | |
5246 int *buf, *buf_end; | |
5247 Lisp_Object prop; | |
5248 { | |
5249 enum composition_method method = COMPOSITION_METHOD (prop); | |
5250 int cmp_len = COMPOSITION_LENGTH (prop); | |
5251 | |
5252 if (buf + 4 + (MAX_COMPOSITION_COMPONENTS * 2 - 1) > buf_end) | |
5253 return NULL; | |
5254 | |
5255 buf[1] = CODING_ANNOTATE_COMPOSITION_MASK; | |
5256 buf[2] = method; | |
5257 buf[3] = cmp_len; | |
5258 | |
5259 if (method == COMPOSITION_RELATIVE) | |
5260 buf[0] = 4; | |
5261 else | |
5262 { | |
5263 Lisp_Object components; | |
5264 int len, i; | |
5265 | |
5266 components = COMPOSITION_COMPONENTS (prop); | |
5267 if (VECTORP (components)) | |
5268 { | |
5269 len = XVECTOR (components)->size; | |
5270 for (i = 0; i < len; i++) | |
5271 buf[4 + i] = XINT (AREF (components, i)); | |
5272 } | |
5273 else if (STRINGP (components)) | |
5274 { | |
5275 int i_byte; | |
5276 | |
5277 len = XSTRING (components)->size; | |
5278 i = i_byte = 0; | |
5279 while (i < len) | |
5280 FETCH_STRING_CHAR_ADVANCE (buf[4 + i], components, i, i_byte); | |
5281 } | |
5282 else if (INTEGERP (components)) | |
5283 { | |
5284 len = 1; | |
5285 buf[4] = XINT (components); | |
5286 } | |
5287 else if (CONSP (components)) | |
5288 { | |
5289 for (len = 0; CONSP (components); | |
5290 len++, components = XCDR (components)) | |
5291 buf[4 + len] = XINT (XCAR (components)); | |
5292 } | |
5293 else | |
5294 abort (); | |
5295 buf[0] = 4 + len; | |
5296 } | |
5297 return (buf + buf[0]); | |
5298 } | |
5299 | |
5300 #define CHARBUF_SIZE 0x4000 | |
5301 | |
5302 #define ALLOC_CONVERSION_WORK_AREA(coding) \ | |
5303 do { \ | |
5304 int size = CHARBUF_SIZE; \ | |
5305 \ | |
5306 coding->charbuf = NULL; \ | |
5307 while (size > 1024) \ | |
5308 { \ | |
5309 coding->charbuf = (int *) alloca (sizeof (int) * size); \ | |
5310 if (coding->charbuf) \ | |
5311 break; \ | |
5312 size >>= 1; \ | |
5313 } \ | |
5314 if (! coding->charbuf) \ | |
5315 { \ | |
5316 coding->result = CODING_RESULT_INSUFFICIENT_MEM; \ | |
5317 return coding->result; \ | |
5318 } \ | |
5319 coding->charbuf_size = size; \ | |
5320 } while (0) | |
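The macro above retries with a halved size until an allocation succeeds or the size drops to the lower bound. Since alloca generally has no way to report failure, the sketch below shows the same halving strategy with malloc instead, which does return NULL. `alloc_work_area` and its parameters are hypothetical, and unlike the macro it accepts a request equal to the minimum.

```c
#include <assert.h>
#include <stdlib.h>

/* Try to allocate WANT ints, halving the request on failure but never
   going below MIN_COUNT.  Stores the count obtained in *GOT and
   returns the block, or NULL if every attempt failed.  The caller
   frees the block with free ().  */
static int *
alloc_work_area (size_t want, size_t min_count, size_t *got)
{
  size_t n;

  assert (got != NULL && min_count > 0);
  for (n = want; n >= min_count; n >>= 1)
    {
      int *p = malloc (sizeof (int) * n);

      if (p)
        {
          *got = n;
          return p;
        }
    }
  *got = 0;
  return NULL;
}
```

Recording the size actually obtained matters because the conversion loop must process the input in chunks no larger than the work area.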
5321 | |
29725 | 5322 |
5323 static void |
88365 | 5324 produce_annotation (coding) |
29725 | 5325 struct coding_system *coding; |
5326 { |
88365 | 5327 int *charbuf = coding->charbuf; |
5328 int *charbuf_end = charbuf + coding->charbuf_used; | |
5329 | |
5330 while (charbuf < charbuf_end) | |
5331 { | |
5332 if (*charbuf >= 0) | |
5333 charbuf++; | |
5334 else | |
29877 | 5335 { |
88365 | 5336 int len = -*charbuf; |
5337 switch (charbuf[2]) | |
29725 | 5338 { |
88365 | 5339 case CODING_ANNOTATE_COMPOSITION_MASK: |
5340 produce_composition (coding, charbuf); | |
5341 break; | |
5342 default: | |
5343 abort (); | |
29725 | 5344 } |
88365 | 5345 charbuf += len; |
29725 | 5346 } |
5347 } |
5348 } |
5349 |
88365 | 5350 /* Decode the data at CODING->src_object into CODING->dst_object. |
5351 CODING->src_object is a buffer, a string, or nil. | |
5352 CODING->dst_object is a buffer. | |
5353 | |
5354 If CODING->src_object is a buffer, it must be the current buffer. | |
5355 In this case, if CODING->src_pos is positive, it is a position of | |
5356 the source text in the buffer, otherwise, the source text is in the | |
5357 gap area of the buffer, and CODING->src_pos specifies the offset of | |
5358 the text from GPT (which must be the same as PT). If this is the | |
5359 same buffer as CODING->dst_object, CODING->src_pos must be | |
5360 negative. | |
5361 | |
5362 If CODING->src_object is a string, CODING->src_pos is an index to | |
5363 that string. | |
5364 | |
5365 If CODING->src_object is nil, CODING->source must already point to | |
5366 the non-relocatable memory area. In this case, CODING->src_pos is | |
5367 an offset from CODING->source. | |
5368 | |
5369 The decoded data is inserted at the current point of the buffer | |
5370 CODING->dst_object. | |
5371 */ | |
5372 | |
5373 static int | |
5374 decode_coding (coding) | |
20718 | 5375 struct coding_system *coding; |
5376 { |
88365 | 5377 Lisp_Object attrs; |
5378 | |
5379 if (BUFFERP (coding->src_object) | |
5380 && coding->src_pos > 0 | |
5381 && coding->src_pos < GPT | |
5382 && coding->src_pos + coding->src_chars > GPT) | |
5383 move_gap_both (coding->src_pos, coding->src_pos_byte); | |
5384 | |
5385 if (BUFFERP (coding->dst_object)) | |
5386 { | |
5387 if (current_buffer != XBUFFER (coding->dst_object)) | |
5388 set_buffer_internal (XBUFFER (coding->dst_object)); | |
5389 if (GPT != PT) | |
5390 move_gap_both (PT, PT_BYTE); | |
5391 } | |
5392 | |
5393 coding->consumed = coding->consumed_char = 0; | |
29005 | 5394 coding->produced = coding->produced_char = 0; |
88365 | 5395 coding->chars_at_source = 0; |
5396 coding->result = CODING_RESULT_SUCCESS; | |
29005 | 5397 coding->errors = 0; |
88365 | 5398 |
5399 ALLOC_CONVERSION_WORK_AREA (coding); | |
5400 | |
5401 attrs = CODING_ID_ATTRS (coding->id); | |
5402 | |
5403 do | |
5404 { | |
5405 coding_set_source (coding); | |
5406 coding->annotated = 0; | |
5407 (*(coding->decoder)) (coding); | |
5408 if (!NILP (CODING_ATTR_DECODE_TBL (attrs))) | |
5409 translate_chars (coding, CODING_ATTR_DECODE_TBL (attrs)); | |
5410 coding_set_destination (coding); | |
5411 produce_chars (coding); | |
5412 if (coding->annotated) | |
5413 produce_annotation (coding); | |
5414 } | |
5415 while (coding->consumed < coding->src_bytes | |
5416 && ! coding->result); | |
5417 | |
5418 if (EQ (CODING_ATTR_TYPE (CODING_ID_ATTRS (coding->id)), Qccl) | |
5419 && SYMBOLP (CODING_ID_EOL_TYPE (coding->id)) | |
5420 && ! EQ (CODING_ID_EOL_TYPE (coding->id), Qunix)) | |
5421 decode_eol (coding); | |
5422 | |
5423 coding->carryover_bytes = 0; | |
5424 if (coding->consumed < coding->src_bytes) | |
5425 { | |
5426 int nbytes = coding->src_bytes - coding->consumed; | |
5427 unsigned char *src; | |
5428 | |
5429 coding_set_source (coding); | |
5430 coding_set_destination (coding); | |
5431 src = coding->source + coding->consumed; | |
5432 | |
5433 if (coding->mode & CODING_MODE_LAST_BLOCK) | |
29725 | 5434 { |
88365 | 5435 /* Flush out unprocessed data as binary chars. We are sure |
5436 that the amount of data is smaller than the size of | |
5437 coding->charbuf. */ | |
5438 int *charbuf = coding->charbuf; | |
5439 | |
5440 while (nbytes-- > 0) | |
5441 { | |
5442 int c = *src++; | |
5443 *charbuf++ = (c & 0x80 ? - c : c); | |
5444 } | |
5445 produce_chars (coding); | |
29725 | 5446 } |
88365 | 5447 else |
20718 | 5448 { |
88365 | 5449 /* Record unprocessed bytes in coding->carryover. We are |
5450 sure that the amount of data is smaller than the size of | |
5451 coding->carryover. */ | |
5452 unsigned char *p = coding->carryover; | |
5453 | |
5454 coding->carryover_bytes = nbytes; | |
5455 while (nbytes-- > 0) | |
5456 *p++ = *src++; | |
20718 | 5457 } |
88365 | 5458 coding->consumed = coding->src_bytes; |
5459 } | |
5460 | |
5461 if (BUFFERP (coding->dst_object)) | |
5462 { | |
5463 record_insert (coding->dst_pos, coding->produced_char); | |
29005 | 5464 } |
5465 |
5466 return coding->result; |
20718 | 5467 } |
5468 |
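When decode_coding stops before the end of the source and the block is not the last one, the unconsumed tail is copied into a carryover area so that the next call can resume with it. A minimal sketch of that step follows. `struct carry`, `CARRYOVER_MAX` and `save_carryover` are hypothetical, and the explicit bounds check is an addition: the original relies on the tail always being shorter than `coding->carryover`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CARRYOVER_MAX 64        /* hypothetical capacity */

struct carry
{
  unsigned char bytes[CARRYOVER_MAX];
  size_t count;
};

/* Save the unconsumed tail SRC[CONSUMED..LEN) into C so that the next
   call can prepend it to new input.  Returns 0 on success, -1 if the
   tail does not fit.  */
static int
save_carryover (struct carry *c, const unsigned char *src,
                size_t len, size_t consumed)
{
  size_t rest;

  assert (c != NULL && src != NULL && consumed <= len);
  rest = len - consumed;
  if (rest > CARRYOVER_MAX)
    return -1;
  memcpy (c->bytes, src + consumed, rest);
  c->count = rest;
  return 0;
}
```

This is what allows a multibyte sequence split across two reads to be decoded correctly once the remaining bytes arrive.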
88365 | 5469 static void |
5470 consume_chars (coding) | |
20718 | 5471 struct coding_system *coding; |
5472 { |
88365 | 5473 int *buf = coding->charbuf; |
5474 /* -1 is to compensate for CRLF. */ | |
5475 int *buf_end = coding->charbuf + coding->charbuf_size - 1; | |
5476 unsigned char *src = coding->source + coding->consumed; | |
5477 int pos = coding->src_pos + coding->consumed_char; | |
5478 int end_pos = coding->src_pos + coding->src_chars; | |
5479 int multibytep = coding->src_multibyte; | |
5480 Lisp_Object eol_type; | |
5481 int c; | |
5482 int start, end, stop; | |
5483 Lisp_Object object, prop; | |
5484 | |
5485 eol_type = CODING_ID_EOL_TYPE (coding->id); | |
5486 if (VECTORP (eol_type)) | |
5487 eol_type = Qunix; | |
5488 | |
5489 object = coding->src_object; | |
5490 | |
5491 /* Note: composition handling is not yet implemented. */ | |
5492 coding->common_flags &= ~CODING_ANNOTATE_COMPOSITION_MASK; | |
5493 | |
5494 if (coding->common_flags & CODING_ANNOTATE_COMPOSITION_MASK | |
5495 && find_composition (pos, end_pos, &start, &end, &prop, object) | |
5496 && end <= end_pos | |
5497 && (start >= pos | |
5498 || (find_composition (end, end_pos, &start, &end, &prop, object) | |
5499 && end <= end_pos))) | |
5500 stop = start; | |
20718 | 5501 else |
88365 | 5502 stop = end_pos; |
5503 | |
5504 while (buf < buf_end) | |
5505 { | |
5506 if (pos == stop) | |
20718 | 5507 { |
88365 | 5508 int *p; |
5509 | |
5510 if (pos == end_pos) | |
5511 break; | |
5512 p = save_composition_data (buf, buf_end, prop); | |
5513 if (p == NULL) | |
5514 break; | |
5515 buf = p; | |
5516 if (find_composition (end, end_pos, &start, &end, &prop, object) | |
5517 && end <= end_pos) | |
5518 stop = start; | |
20718 | 5519 else |
88365 | 5520 stop = end_pos; |
5521 } | |
5522 | |
5523 if (! multibytep) | |
5524 c = *src++; | |
5525 else | |
5526 c = STRING_CHAR_ADVANCE (src); | |
5527 if ((c == '\r') && (coding->mode & CODING_MODE_SELECTIVE_DISPLAY)) | |
5528 c = '\n'; | |
5529 if (! EQ (eol_type, Qunix)) | |
5530 { | |
5531 if (c == '\n') | |
5532 { | |
5533 if (EQ (eol_type, Qdos)) | |
5534 *buf++ = '\r'; | |
5535 else | |
5536 c = '\r'; | |
5537 } | |
20718 | 5538 } |
88365 | 5539 *buf++ = c; |
5540 pos++; | |
5541 } | |
5542 | |
5543 coding->consumed = src - coding->source; | |
5544 coding->consumed_char = pos - coding->src_pos; | |
5545 coding->charbuf_used = buf - coding->charbuf; | |
5546 coding->chars_at_source = 0; | |
20718 | 5547 } |
5548 |
88365 | 5549 |
5550 /* Encode the text at CODING->src_object into CODING->dst_object. | |
5551 CODING->src_object is a buffer or a string. | |
5552 CODING->dst_object is a buffer or nil. | |
5553 | |
5554 If CODING->src_object is a buffer, it must be the current buffer. | |
5555 In this case, if CODING->src_pos is positive, it is a position of | |
5556 the source text in the buffer, otherwise, the source text is in the | |
5557 gap area of the buffer, and coding->src_pos specifies the offset of | |
5558 the text from GPT (which must be the same as PT). If this is the | |
5559 same buffer as CODING->dst_object, CODING->src_pos must be | |
5560 negative and CODING should not have `pre-write-conversion'. | |
5561 | |
5562 If CODING->src_object is a string, CODING should not have | |
5563 `pre-write-conversion'. | |
5564 | |
5565 If CODING->dst_object is a buffer, the encoded data is inserted at | |
5566 the current point of that buffer. | |
5567 | |
5568 If CODING->dst_object is nil, the encoded data is placed at the | |
5569 memory area specified by CODING->destination. */ | |
5570 | |
5571 static int | |
5572 encode_coding (coding) | |
20718 | 5573 struct coding_system *coding; |
5574 { |
88365 | 5575 Lisp_Object attrs; |
5576 | |
5577 attrs = CODING_ID_ATTRS (coding->id); | |
5578 | |
5579 if (BUFFERP (coding->dst_object)) | |
5580 { | |
5581 set_buffer_internal (XBUFFER (coding->dst_object)); | |
5582 coding->dst_multibyte | |
5583 = ! NILP (current_buffer->enable_multibyte_characters); | |
5584 } | |
5585 | |
5586 coding->consumed = coding->consumed_char = 0; | |
5587 coding->produced = coding->produced_char = 0; | |
5588 coding->result = CODING_RESULT_SUCCESS; | |
5589 coding->errors = 0; | |
5590 | |
5591 ALLOC_CONVERSION_WORK_AREA (coding); | |
5592 | |
5593 do { | |
5594 coding_set_source (coding); | |
5595 consume_chars (coding); | |
5596 | |
5597 if (!NILP (CODING_ATTR_ENCODE_TBL (attrs))) | |
5598 translate_chars (coding, CODING_ATTR_ENCODE_TBL (attrs)); | |
5599 | |
5600 coding_set_destination (coding); | |
5601 (*(coding->encoder)) (coding); | |
5602 } while (coding->consumed_char < coding->src_chars); | |
5603 | |
5604 if (BUFFERP (coding->dst_object)) | |
5605 insert_from_gap (coding->produced_char, coding->produced); | |
5606 | |
5607 return (coding->result); | |
5608 } | |
5609 | |
5610 /* Work buffer */ | |
5611 | |
5612 /* List of working buffers currently in use. */ | |
5613 Lisp_Object Vcode_conversion_work_buf_list; | |
5614 | |
5615 /* A working buffer used by the top level conversion. */ | |
5616 Lisp_Object Vcode_conversion_reused_work_buf; | |
5617 | |
5618 | |
5619 /* Return a working buffer that can be freely used by the following | |
5620 code conversion. MULTIBYTEP specifies the multibyteness of the | |
5621 buffer. */ | |
5622 | |
5623 Lisp_Object | |
5624 make_conversion_work_buffer (multibytep) | |
5625 int multibytep; | |
5626 { | |
5627 struct buffer *current = current_buffer; | |
5628 Lisp_Object buf; | |
5629 | |
5630 if (NILP (Vcode_conversion_work_buf_list)) | |
5631 { | |
5632 if (NILP (Vcode_conversion_reused_work_buf)) | |
5633 Vcode_conversion_reused_work_buf | |
5634 = Fget_buffer_create (build_string (" *code-conversion-work*")); | |
5635 Vcode_conversion_work_buf_list | |
5636 = Fcons (Vcode_conversion_reused_work_buf, Qnil); | |
20718 | 5637 } |
5638 else |
5639 { |
88365 | 5640 int depth = XINT (Flength (Vcode_conversion_work_buf_list)); |
5641 char str[128]; | |
5642 | |
5643 sprintf (str, " *code-conversion-work*<%d>", depth); | |
5644 Vcode_conversion_work_buf_list | |
5645 = Fcons (Fget_buffer_create (build_string (str)), | |
5646 Vcode_conversion_work_buf_list); | |
5647 } | |
5648 | |
5649 buf = XCAR (Vcode_conversion_work_buf_list); | |
5650 set_buffer_internal (XBUFFER (buf)); | |
5651 current_buffer->undo_list = Qt; | |
5652 Ferase_buffer (); | |
5653 Fset_buffer_multibyte (multibytep ? Qt : Qnil); | |
5654 set_buffer_internal (current); | |
5655 return buf; | |
20718 | 5656 } |
5657 | |
88365 | 5658 static struct coding_system *saved_coding; |
5659 | |
5660 Lisp_Object | |
5661 code_conversion_restore (info) | |
5662 Lisp_Object info; | |
26067 | 5663 { |
88365 | 5664 int depth = XINT (Flength (Vcode_conversion_work_buf_list)); |
5665 Lisp_Object buf; | |
5666 | |
5667 if (depth > 0) | |
5668 { | |
5669 buf = XCAR (Vcode_conversion_work_buf_list); | |
5670 Vcode_conversion_work_buf_list = XCDR (Vcode_conversion_work_buf_list); | |
5671 if (depth > 1 && !NILP (Fbuffer_live_p (buf))) | |
5672 Fkill_buffer (buf); | |
5673 } | |
5674 | |
5675 if (saved_coding->dst_object == Qt | |
5676 && saved_coding->destination) | |
5677 xfree (saved_coding->destination); | |
5678 | |
5679 return save_excursion_restore (info); | |
26067 | 5680 } |
5681 | |
88365 | 5682 |
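The work-buffer bookkeeping in `make_conversion_work_buffer` and `code_conversion_restore` above can be sketched outside Emacs as a small stack of names. This is only an illustration under stated assumptions: `push_work_buffer`, `pop_work_buffer`, `work_names`, and the fixed `MAX_DEPTH` are hypothetical stand-ins for the Lisp list `Vcode_conversion_work_buf_list`, not part of coding.c.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16   /* hypothetical limit; the Lisp list has none */
#define NAME_LEN  64

/* Hypothetical stand-in for Vcode_conversion_work_buf_list.  */
static char work_names[MAX_DEPTH][NAME_LEN];
static int work_depth = 0;

/* Push a work-buffer name.  The first buffer gets the plain name and
   is meant to be reused; nested ones get a "<DEPTH>" suffix.
   Returns the name, or NULL if the stack is full.  */
const char *
push_work_buffer (void)
{
  if (work_depth >= MAX_DEPTH)
    return NULL;
  if (work_depth == 0)
    snprintf (work_names[0], NAME_LEN, " *code-conversion-work*");
  else
    snprintf (work_names[work_depth], NAME_LEN,
              " *code-conversion-work*<%d>", work_depth);
  return work_names[work_depth++];
}

/* Pop the most recent work buffer.  Returns 1 when the popped buffer
   was a nested one (the listing kills those), 0 when it was the
   reusable top-level buffer or the stack was already empty.  */
int
pop_work_buffer (void)
{
  int depth = work_depth;

  if (depth > 0)
    work_depth--;
  return depth > 1;
}
```

Pushing twice yields the plain name and then the `<1>` variant; popping twice reports that only the nested buffer would be killed.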
5683 int | |
5684 decode_coding_gap (coding, chars, bytes) | |
26847 | 5685 struct coding_system *coding; |
88365 | 5686 EMACS_INT chars, bytes; |
5687 { | |
5688 int count = specpdl_ptr - specpdl; | |
5689 | |
5690 saved_coding = coding; | |
5691 record_unwind_protect (code_conversion_restore, save_excursion_save ()); | |
5692 | |
5693 coding->src_object = Fcurrent_buffer (); | |
5694 coding->src_chars = chars; | |
5695 coding->src_bytes = bytes; | |
5696 coding->src_pos = -chars; | |
5697 coding->src_pos_byte = -bytes; | |
5698 coding->src_multibyte = chars < bytes; | |
5699 coding->dst_object = coding->src_object; | |
5700 coding->dst_pos = PT; | |
5701 coding->dst_pos_byte = PT_BYTE; | |
5702 | |
5703 if (CODING_REQUIRE_DETECTION (coding)) | |
5704 detect_coding (coding); | |
5705 | |
5706 decode_coding (coding); | |
5707 | |
5708 unbind_to (count, Qnil); | |
5709 return coding->result; | |
5710 } | |
5711 | |
5712 int | |
5713 encode_coding_gap (coding, chars, bytes) | |
5714 struct coding_system *coding; | |
5715 EMACS_INT chars, bytes; | |
26847 | 5716 { |
88365 | 5717 int count = specpdl_ptr - specpdl; |
5718 Lisp_Object buffer; | |
5719 | |
5720 saved_coding = coding; | |
5721 record_unwind_protect (code_conversion_restore, save_excursion_save ()); | |
5722 | |
5723 buffer = Fcurrent_buffer (); | |
5724 coding->src_object = buffer; | |
5725 coding->src_chars = chars; | |
5726 coding->src_bytes = bytes; | |
5727 coding->src_pos = -chars; | |
5728 coding->src_pos_byte = -bytes; | |
5729 coding->src_multibyte = chars < bytes; | |
5730 coding->dst_object = coding->src_object; | |
5731 coding->dst_pos = PT; | |
5732 coding->dst_pos_byte = PT_BYTE; | |
5733 | |
5734 encode_coding (coding); | |
5735 | |
5736 unbind_to (count, Qnil); | |
5737 return coding->result; | |
26847 | 5738 } |
5739 | |
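Both gap functions above set `src_multibyte` from the test `chars < bytes`. A minimal sketch of why that test works, assuming a UTF-8-style representation in which every non-ASCII character occupies more than one byte (an assumption made for this example; the listing itself does not spell out the internal format). The helper names `count_chars` and `looks_multibyte` are hypothetical.

```c
#include <stddef.h>

/* Count characters in a well-formed UTF-8 byte sequence by counting
   every byte that is not a continuation byte (10xxxxxx).  */
size_t
count_chars (const unsigned char *s, size_t nbytes)
{
  size_t chars = 0;
  size_t i;

  for (i = 0; i < nbytes; i++)
    if ((s[i] & 0xC0) != 0x80)
      chars++;
  return chars;
}

/* Same test as `chars < bytes' in the listing: fewer characters than
   bytes means at least one character took several bytes.  */
int
looks_multibyte (size_t chars, size_t bytes)
{
  return chars < bytes;
}
```

Pure ASCII text has equal character and byte counts, so the test is false; a single two-byte character is enough to make it true.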
88365 | 5740 |
5741 /* Decode the text in the range FROM/FROM_BYTE and TO/TO_BYTE in | |
5742 SRC_OBJECT into DST_OBJECT by coding context CODING. | |
5743 | |
5744 SRC_OBJECT is a buffer, a string, or Qnil. | |
5745 | |
5746 If it is a buffer, the text is at point of the buffer. FROM and TO | |
5747 are positions in the buffer. | |
5748 | |
5749 If it is a string, the text is at the beginning of the string. | |
5750 FROM and TO are indices to the string. | |
5751 | |
5752 If it is nil, the text is at coding->source. FROM and TO are | |
5753 indices to coding->source. | |
5754 | |
5755 DST_OBJECT is a buffer, Qt, or Qnil. | |
5756 | |
5757 If it is a buffer, the decoded text is inserted at point of the | |
5758 buffer. If the buffer is the same as SRC_OBJECT, the source text | |
5759 is deleted. | |
5760 | |
5761 If it is Qt, a string is made from the decoded text, and | |
5762 set in CODING->dst_object. | |
5763 | |
5764 If it is Qnil, the decoded text is stored at CODING->destination. | |
5765 The caller must allocate CODING->dst_bytes bytes at | |
5766 CODING->destination by xmalloc. If the decoded text is longer than | |
5767 CODING->dst_bytes, CODING->destination is relocated by xrealloc. | |
5768 */ | |
26847 | 5769 |
29275 | 5770 void |
88365 | 5771 decode_coding_object (coding, src_object, from, from_byte, to, to_byte, |
5772 dst_object) | |
26847 | 5773 struct coding_system *coding; |
88365 | 5774 Lisp_Object src_object; |
5775 EMACS_INT from, from_byte, to, to_byte; | |
5776 Lisp_Object dst_object; | |
26847 | 5777 { |
88365 | 5778 int count = specpdl_ptr - specpdl; |
5779 unsigned char *destination; | |
5780 EMACS_INT dst_bytes; | |
5781 EMACS_INT chars = to - from; | |
5782 EMACS_INT bytes = to_byte - from_byte; | |
5783 Lisp_Object attrs; | |
5784 | |
5785 saved_coding = coding; | |
5786 record_unwind_protect (code_conversion_restore, save_excursion_save ()); | |
5787 | |
5788 if (NILP (dst_object)) | |
5789 { | |
5790 destination = coding->destination; | |
5791 dst_bytes = coding->dst_bytes; | |
5792 } | |
5793 | |
5794 coding->src_object = src_object; | |
5795 coding->src_chars = chars; | |
5796 coding->src_bytes = bytes; | |
5797 coding->src_multibyte = chars < bytes; | |
5798 | |
5799 if (STRINGP (src_object)) | |
5800 { | |
5801 coding->src_pos = from; | |
5802 coding->src_pos_byte = from_byte; | |
5803 } | |
5804 else if (BUFFERP (src_object)) | |
5805 { | |
5806 set_buffer_internal (XBUFFER (src_object)); | |
5807 if (from != GPT) | |
5808 move_gap_both (from, from_byte); | |
5809 if (EQ (src_object, dst_object)) | |
26847 | 5810 { |
88365 | 5811 TEMP_SET_PT_BOTH (from, from_byte); |
5812 del_range_both (from, from_byte, to, to_byte, 1); | |
5813 coding->src_pos = -chars; | |
5814 coding->src_pos_byte = -bytes; | |
20931 | 5815 } |
42661 | 5816 else |
5817 { | |
88365 | 5818 coding->src_pos = from; |
5819 coding->src_pos_byte = from_byte; | |
29985 | 5820 } |
88365 | 5821 } |
5822 | |
5823 if (CODING_REQUIRE_DETECTION (coding)) | |
5824 detect_coding (coding); | |
5825 attrs = CODING_ID_ATTRS (coding->id); | |
5826 | |
5827 if (! NILP (CODING_ATTR_POST_READ (attrs)) | |
5828 || EQ (dst_object, Qt)) | |
5829 { | |
5830 coding->dst_object = make_conversion_work_buffer (1); | |
5831 coding->dst_pos = BEG; | |
5832 coding->dst_pos_byte = BEG_BYTE; | |
5833 coding->dst_multibyte = 1; | |
5834 } | |
5835 else if (BUFFERP (dst_object)) | |
5836 { | |
5837 coding->dst_object = dst_object; | |
5838 coding->dst_pos = BUF_PT (XBUFFER (dst_object)); | |
5839 coding->dst_pos_byte = BUF_PT_BYTE (XBUFFER (dst_object)); | |
5840 coding->dst_multibyte | |
5841 = ! NILP (XBUFFER (dst_object)->enable_multibyte_characters); | |
5842 } | |
5843 else | |
5844 { | |
5845 coding->dst_object = Qnil; | |
5846 coding->dst_multibyte = 1; | |
5847 } | |
5848 | |
5849 decode_coding (coding); | |
5850 | |
5851 if (BUFFERP (coding->dst_object)) | |
5852 set_buffer_internal (XBUFFER (coding->dst_object)); | |
5853 | |
5854 if (! NILP (CODING_ATTR_POST_READ (attrs))) | |
5855 { | |
5856 struct gcpro gcpro1, gcpro2; | |
5857 EMACS_INT prev_Z = Z, prev_Z_BYTE = Z_BYTE; | |
5858 Lisp_Object val; | |
5859 | |
5860 GCPRO2 (coding->src_object, coding->dst_object); | |
5861 val = call1 (CODING_ATTR_POST_READ (attrs), | |
5862 make_number (coding->produced_char)); | |
5863 UNGCPRO; | |
5864 CHECK_NATNUM (val); | |
5865 coding->produced_char += Z - prev_Z; | |
5866 coding->produced += Z_BYTE - prev_Z_BYTE; | |
5867 } | |
5868 | |
5869 if (EQ (dst_object, Qt)) | |
5870 { | |
5871 coding->dst_object = Fbuffer_string (); | |
5872 } | |
5873 else if (NILP (dst_object) && BUFFERP (coding->dst_object)) | |
5874 { | |
5875 set_buffer_internal (XBUFFER (coding->dst_object)); | |
5876 if (dst_bytes < coding->produced) | |
42105 | 5877 { |
88365 | 5878 destination |
5879 = (unsigned char *) xrealloc (destination, coding->produced); | |
5880 if (! destination) | |
20718 | 5881 { |
88365 | 5882 coding->result = CODING_RESULT_INSUFFICIENT_DST; |
5883 unbind_to (count, Qnil); | |
5884 return; | |
20718 | 5885 } |
88365 | 5886 if (BEGV < GPT && GPT < BEGV + coding->produced_char) |
5887 move_gap_both (BEGV, BEGV_BYTE); | |
5888 bcopy (BEGV_ADDR, destination, coding->produced); | |
5889 coding->destination = destination; | |
23279 | 5890 } |
88365 | 5891 } |
5892 | |
5893 unbind_to (count, Qnil); | |
5894 } | |
5895 | |
5896 | |
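The Qnil-destination case described in the comment before `decode_coding_object` (caller-allocated storage that is enlarged when the decoded text does not fit) can be sketched with plain `realloc`. This is a simplified illustration, not the function's actual logic: `store_result` and its parameters are hypothetical, the sketch always copies the result (the listing copies only inside the reallocation branch), and it also updates the recorded capacity.

```c
#include <stdlib.h>
#include <string.h>

/* Copy PRODUCED bytes from SRC into *DST, enlarging *DST first when
   its capacity *DST_BYTES is too small.  Returns 0 on success, or -1
   if the allocation fails, in which case *DST is left untouched.  */
int
store_result (unsigned char **dst, size_t *dst_bytes,
              const unsigned char *src, size_t produced)
{
  if (*dst_bytes < produced)
    {
      unsigned char *grown = realloc (*dst, produced);

      if (!grown)
        return -1;
      *dst = grown;
      *dst_bytes = produced;
    }
  memcpy (*dst, src, produced);
  return 0;
}
```

A caller that starts with a 2-byte buffer and stores a 5-byte result ends up with a 5-byte buffer; storing a shorter result afterwards leaves the capacity unchanged.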
5897 void | |
5898 encode_coding_object (coding, src_object, from, from_byte, to, to_byte, | |
5899 dst_object) | |
5900 struct coding_system *coding; | |
5901 Lisp_Object src_object; | |
5902 EMACS_INT from, from_byte, to, to_byte; | |
5903 Lisp_Object dst_object; | |
5904 { | |
5905 int count = specpdl_ptr - specpdl; | |
5906 EMACS_INT chars = to - from; | |
5907 EMACS_INT bytes = to_byte - from_byte; | |
5908 Lisp_Object attrs; | |
5909 | |
5910 saved_coding = coding; | |
5911 record_unwind_protect (code_conversion_restore, save_excursion_save ()); | |
5912 | |
5913 coding->src_object = src_object; | |
5914 coding->src_chars = chars; | |
5915 coding->src_bytes = bytes; | |
5916 coding->src_multibyte = chars < bytes; | |
5917 | |
5918 attrs = CODING_ID_ATTRS (coding->id); | |
5919 | |
5920 if (! NILP (CODING_ATTR_PRE_WRITE (attrs))) | |
21062 | 5921 { |
21140 | 5922 Lisp_Object val; |
23514 | 5923 |
88365 | 5924 coding->src_object = make_conversion_work_buffer (coding->src_multibyte); |
5925 set_buffer_internal (XBUFFER (coding->src_object)); | |
5926 if (STRINGP (src_object)) | |
5927 insert_from_string (src_object, from, from_byte, chars, bytes, 0); | |
5928 else if (BUFFERP (src_object)) | |
5929 insert_from_buffer (XBUFFER (src_object), from, chars, 0); | |
5930 else | |
5931 insert_1_both (coding->source + from, chars, bytes, 0, 0, 0); | |
5932 | |
5933 if (EQ (src_object, dst_object)) | |
5934 { | |
5935 set_buffer_internal (XBUFFER (src_object)); | |
5936 del_range_both (from, from_byte, to, to_byte, 1); | |
5937 set_buffer_internal (XBUFFER (coding->src_object)); | |
5938 } | |
5939 | |
5940 val = call2 (CODING_ATTR_PRE_WRITE (attrs), | |
5941 make_number (1), make_number (chars)); | |
5942 CHECK_NATNUM (val); | |
5943 if (BEG != GPT) | |
5944 move_gap_both (BEG, BEG_BYTE); | |
5945 coding->src_chars = Z - BEG; | |
5946 coding->src_bytes = Z_BYTE - BEG_BYTE; | |
5947 coding->src_pos = BEG; | |
5948 coding->src_pos_byte = BEG_BYTE; | |
5949 coding->src_multibyte = Z < Z_BYTE; | |
5950 } | |
5951 else if (STRINGP (src_object)) | |
5952 { | |
5953 coding->src_pos = from; | |
5954 coding->src_pos_byte = from_byte; | |
5955 } | |
5956 else if (BUFFERP (src_object)) | |
5957 { | |
5958 set_buffer_internal (XBUFFER (src_object)); | |
5959 if (from != GPT) | |
5960 move_gap_both (from, from_byte); | |
5961 if (EQ (src_object, dst_object)) | |
5962 { | |
5963 del_range_both (from, from_byte, to, to_byte, 1); | |
5964 coding->src_pos = -chars; | |
5965 coding->src_pos_byte = -bytes; | |
5966 } | |
23514 | 5967 else |
88365 | 5968 { |
5969 coding->src_pos = from; | |
5970 coding->src_pos_byte = from_byte; | |
5971 } | |
5972 } | |
5973 | |
5974 if (BUFFERP (dst_object)) | |
5975 { | |
5976 coding->dst_object = dst_object; | |
5977 coding->dst_pos = BUF_PT (XBUFFER (dst_object)); | |
5978 coding->dst_pos_byte = BUF_PT_BYTE (XBUFFER (dst_object)); | |
5979 coding->dst_multibyte | |
5980 = ! NILP (XBUFFER (dst_object)->enable_multibyte_characters); | |
5981 } | |
5982 else if (EQ (dst_object, Qt)) | |
5983 { | |
5984 coding->dst_object = Qnil; | |
5985 coding->destination = (unsigned char *) xmalloc (coding->src_chars); | |
5986 coding->dst_bytes = coding->src_chars; | |
5987 coding->dst_multibyte = 0; | |
5988 } | |
5989 else | |
5990 { | |
5991 coding->dst_object = Qnil; | |
5992 coding->dst_multibyte = 0; | |
5993 } | |
5994 | |
5995 encode_coding (coding); | |
5996 | |
5997 if (EQ (dst_object, Qt)) | |
5998 { | |
5999 if (BUFFERP (coding->dst_object)) | |
6000 coding->dst_object = Fbuffer_string (); | |
6001 else | |
6002 { | |
6003 coding->dst_object | |
6004 = make_unibyte_string ((char *) coding->destination, | |
6005 coding->produced); | |
6006 xfree (coding->destination); | |
6007 } | |
6008 } | |
6009 | |
6010 unbind_to (count, Qnil); | |
20718 | 6011 } |
6012 | |
29005 | 6013 |
6014 Lisp_Object | |
88365 | 6015 preferred_coding_system () |
20718 | 6016 { |
88365 | 6017 int id = coding_categories[coding_priorities[0]].id; |
6018 | |
6019 return CODING_ID_NAME (id); | |
20718 | 6020 } |
6021 | |
17052 | 6022 |
6023 #ifdef emacs | |
22874 | 6024 /*** 8. Emacs Lisp library functions ***/ |
17052 | 6025 |
6026 DEFUN ("coding-system-p", Fcoding_system_p, Scoding_system_p, 1, 1, 0, | |
40713 | 6027 doc: /* Return t if OBJECT is nil or a coding-system. |
88365 | 6028 See the documentation of `define-coding-system' for information |
40713 | 6029 about coding-system objects. */) |
6030 (obj) | |
17052 | 6031 Lisp_Object obj; |
6032 { | |
88365 | 6033 return ((NILP (obj) || CODING_SYSTEM_P (obj)) ? Qt : Qnil); |
17052 | 6034 } |
6035 | |
17717 | 6036 DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system, |
6037 Sread_non_nil_coding_system, 1, 1, 0, | |
40713 | 6038 doc: /* Read a coding system from the minibuffer, prompting with string PROMPT. */) |
6039 (prompt) | |
17052 | 6040 Lisp_Object prompt; |
6041 { | |
17119 | 6042 Lisp_Object val; |
17717 | 6043 do |
6044 { | |
20105 | 6045 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil, |
6046 Qt, Qnil, Qcoding_system_history, Qnil, Qnil); | |
17717 | 6047 } |
6048 while (XSTRING (val)->size == 0); | |
17119 | 6049 return (Fintern (val, Qnil)); |
17052 | 6050 } |
6051 | |
19758 | 6052 DEFUN ("read-coding-system", Fread_coding_system, Sread_coding_system, 1, 2, 0, |
40713 | 6053 doc: /* Read a coding system from the minibuffer, prompting with string PROMPT. |
6054 If the user enters null input, return second argument DEFAULT-CODING-SYSTEM. */) | |
6055 (prompt, default_coding_system) | |
19758 | 6056 Lisp_Object prompt, default_coding_system; |
17052 | 6057 { |
19747 | 6058 Lisp_Object val; |
19758 | 6059 if (SYMBOLP (default_coding_system)) |
6060 XSETSTRING (default_coding_system, XSYMBOL (default_coding_system)->name); | |
20105 | 6061 val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil, |
19758 | 6062 Qt, Qnil, Qcoding_system_history, |
6063 default_coding_system, Qnil); | |
17119 | 6064 return (XSTRING (val)->size == 0 ? Qnil : Fintern (val, Qnil)); |
17052 | 6065 } |
6066 | |
6067 DEFUN ("check-coding-system", Fcheck_coding_system, Scheck_coding_system, | |
6068 1, 1, 0, | |
40713 | 6069 doc: /* Check validity of CODING-SYSTEM. |
6070 If valid, return CODING-SYSTEM, else signal a `coding-system-error' error. | |
6071 It is valid if it is a symbol with a non-nil `coding-system' property. | |
6072 The value of property should be a vector of length 5. */) | |
88365 | 6073 (coding_system) |
17052 | 6074 Lisp_Object coding_system; |
6075 { | |
40656 | 6076 CHECK_SYMBOL (coding_system); |
17052 | 6077 if (!NILP (Fcoding_system_p (coding_system))) |
6078 return coding_system; | |
6079 while (1) | |
18180 | 6080 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil)); |
17052 | 6081 } |
88365 | 6082 |
20680 | 6083 |
20718 | 6084 Lisp_Object |
88365 | 6085 detect_coding_system (src, src_bytes, highest, multibytep, coding_system) |
20718 | 6086 unsigned char *src; |
6087 int src_bytes, highest; | |
34531 | 6088 int multibytep; |
88365 | 6089 Lisp_Object coding_system; |
17052 | 6090 { |
88365 | 6091 unsigned char *src_end = src + src_bytes; |
6092 int mask = CATEGORY_MASK_ANY; | |
6093 int detected = 0; | |
6094 int c, i; | |
6095 Lisp_Object attrs, eol_type; | |
6096 Lisp_Object val; | |
6097 struct coding_system coding; | |
6098 | |
6099 if (NILP (coding_system)) | |
6100 coding_system = Qundecided; | |
6101 setup_coding_system (coding_system, &coding); | |
6102 attrs = CODING_ID_ATTRS (coding.id); | |
6103 eol_type = CODING_ID_EOL_TYPE (coding.id); | |
6104 | |
6105 coding.source = src; | |
6106 coding.src_bytes = src_bytes; | |
6107 coding.src_multibyte = multibytep; | |
6108 coding.consumed = 0; | |
6109 | |
6110 if (XINT (CODING_ATTR_CATEGORY (attrs)) != coding_category_undecided) | |
6111 { | |
6112 mask = 1 << XINT (CODING_ATTR_CATEGORY (attrs)); | |
6113 } | |
6114 else | |
6115 { | |
6116 coding_system = Qnil; | |
6117 for (; src < src_end; src++) | |
17052 | 6118 { |
88365 | 6119 c = *src; |
6120 if (c & 0x80 || (c < 0x20 && (c == ISO_CODE_ESC | |
6121 || c == ISO_CODE_SI | |
6122 || c == ISO_CODE_SO))) | |
20718 | 6123 break; |
17052 | 6124 } |
88365 | 6125 coding.head_ascii = src - coding.source; |
6126 | |
6127 if (src < src_end) | |
6128 for (i = 0; i < coding_category_raw_text; i++) | |
6129 { | |
6130 enum coding_category category = coding_priorities[i]; | |
6131 struct coding_system *this = coding_categories + category; | |
6132 | |
6133 if (category >= coding_category_raw_text | |
6134 || detected & (1 << category)) | |
6135 continue; | |
6136 | |
6137 if (this->id < 0) | |
6138 { | |
6139 /* No coding system of this category is defined. */ | |
6140 mask &= ~(1 << category); | |
6141 } | |
6142 else | |
6143 { | |
6144 detected |= detected_mask[category]; | |
6145 if ((*(coding_categories[category].detector)) (&coding, &mask) | |
6146 && highest) | |
6147 { | |
6148 mask &= detected_mask[category]; | |
6149 break; | |
6150 } | |
6151 } | |
6152 } | |
6153 } | |
6154 | |
6155 if (!mask) | |
6156 val = Fcons (make_number (coding_category_raw_text), Qnil); | |
6157 else if (mask == CATEGORY_MASK_ANY) | |
6158 val = Fcons (make_number (coding_category_undecided), Qnil); | |
6159 else if (highest) | |
6160 { | |
6161 for (i = 0; i < coding_category_raw_text; i++) | |
6162 if (mask & (1 << coding_priorities[i])) | |
6163 { | |
6164 val = Fcons (make_number (coding_priorities[i]), Qnil); | |
6165 break; | |
6166 } | |
6167 } | |
6168 else | |
6169 { | |
6170 val = Qnil; | |
6171 for (i = coding_category_raw_text - 1; i >= 0; i--) | |
6172 if (mask & (1 << coding_priorities[i])) | |
6173 val = Fcons (make_number (coding_priorities[i]), val); | |
6174 } | |
6175 | |
6176 { | |
6177 int one_byte_eol = -1, two_byte_eol = -1; | |
6178 Lisp_Object tail; | |
6179 | |
6180 for (tail = val; CONSP (tail); tail = XCDR (tail)) | |
6181 { | |
6182 struct coding_system *this | |
6183 = (NILP (coding_system) ? coding_categories + XINT (XCAR (tail)) | |
6184 : &coding); | |
6185 int this_eol; | |
6186 | |
6187 attrs = CODING_ID_ATTRS (this->id); | |
6188 eol_type = CODING_ID_EOL_TYPE (this->id); | |
6189 XSETCAR (tail, CODING_ID_NAME (this->id)); | |
6190 if (VECTORP (eol_type)) | |
6191 { | |
6192 if (EQ (CODING_ATTR_TYPE (attrs), Qutf_16)) | |
6193 { | |
6194 if (two_byte_eol < 0) | |
6195 two_byte_eol = detect_eol (this, coding.source, src_bytes); | |
6196 this_eol = two_byte_eol; | |
6197 } | |
6198 else | |
6199 { | |
6200 if (one_byte_eol < 0) | |
6201 one_byte_eol = detect_eol (this, coding.source, src_bytes); | |
6202 this_eol = one_byte_eol; | |
6203 } | |
6204 if (this_eol == EOL_SEEN_LF) | |
6205 XSETCAR (tail, AREF (eol_type, 0)); | |
6206 else if (this_eol == EOL_SEEN_CRLF) | |
6207 XSETCAR (tail, AREF (eol_type, 1)); | |
6208 else if (this_eol == EOL_SEEN_CR) | |
6209 XSETCAR (tail, AREF (eol_type, 2)); | |
6210 } | |
6211 } | |
6212 } | |
6213 | |
25662 | 6214 return (highest ? XCAR (val) : val); |
42104 | 6215 } |
20718 | 6216 |
88365 | 6217 |
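Before running any detector, `detect_coding_system` above skips the leading run of plain ASCII and records its length in `coding.head_ascii`. The loop stops at the first byte with the high bit set or at one of the ISO-2022 control codes ESC, SI, or SO. A self-contained sketch of that scan follows; the function name `head_ascii_length` is hypothetical, and the macro values are the standard ASCII codes for those controls, assumed here to match the definitions used by coding.c.

```c
#include <stddef.h>

/* Standard ASCII control codes; assumed to match the ISO_CODE_*
   macros referenced in the listing.  */
#define ISO_CODE_SO  0x0E   /* shift out */
#define ISO_CODE_SI  0x0F   /* shift in */
#define ISO_CODE_ESC 0x1B   /* escape */

/* Return the number of leading bytes of SRC that are plain ASCII and
   are not ESC, SI, or SO.  Detection proper would start at that
   offset; a result equal to SRC_BYTES means nothing needs detecting.  */
size_t
head_ascii_length (const unsigned char *src, size_t src_bytes)
{
  size_t i;

  for (i = 0; i < src_bytes; i++)
    {
      unsigned char c = src[i];

      if ((c & 0x80)
          || c == ISO_CODE_ESC
          || c == ISO_CODE_SI
          || c == ISO_CODE_SO)
        break;
    }
  return i;
}
```

For the bytes `a b ESC $ B` the result is 2, since ESC may start an ISO-2022 designation; for text that begins with an 8-bit byte the result is 0.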
20718 | 6218 DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region, |
6219 2, 3, 0, | |
40713 | 6220 doc: /* Detect coding system of the text in the region between START and END. |
6221 Return a list of possible coding systems ordered by priority. | |
6222 | |
6223 If only ASCII characters are found, it returns a list of single element | |
6224 `undecided' or its subsidiary coding system according to a detected | |
6225 end-of-line format. | |
6226 | |
6227 If optional argument HIGHEST is non-nil, return the coding system of | |
6228 highest priority. */) | |
6229 (start, end, highest) | |
20718 | 6230 Lisp_Object start, end, highest; |
17052 | 6231 { |
20718 | 6232 int from, to; |
6233 int from_byte, to_byte; | |
6234 | |
40656 | 6235 CHECK_NUMBER_COERCE_MARKER (start); |
6236 CHECK_NUMBER_COERCE_MARKER (end); | |
20718 | 6237 |
6238 validate_region (&start, &end); | |
6239 from = XINT (start), to = XINT (end); | |
6240 from_byte = CHAR_TO_BYTE (from); | |
6241 to_byte = CHAR_TO_BYTE (to); | |
6242 | |
6243 if (from < GPT && to >= GPT) | |
6244 move_gap_both (to, to_byte); | |
88365 | 6245 |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6246 return detect_coding_system (BYTE_POS_ADDR (from_byte), |
88365 | 6247 to_byte - from_byte, |
34531
37f85e931855
(ONE_MORE_BYTE_CHECK_MULTIBYTE): New macro.
Kenichi Handa <handa@m17n.org>
parents:
34197
diff
changeset
|
6248 !NILP (highest), |
37f85e931855
(ONE_MORE_BYTE_CHECK_MULTIBYTE): New macro.
Kenichi Handa <handa@m17n.org>
parents:
34197
diff
changeset
|
6249 !NILP (current_buffer |
88365 | 6250 ->enable_multibyte_characters), |
6251 Qnil); | |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6252 } |
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6253 |
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6254 DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string, |
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6255 1, 2, 0, |
40713
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6256 doc: /* Detect coding system of the text in STRING. |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6257 Return a list of possible coding systems ordered by priority. |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6258 |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6259 If only ASCII characters are found, it returns a list of single element |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6260 `undecided' or its subsidiary coding system according to a detected |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6261 end-of-line format. |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6262 |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6263 If optional argument HIGHEST is non-nil, return the coding system of |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6264 highest priority. */) |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6265 (string, highest) |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6266 Lisp_Object string, highest; |
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6267 { |
40656
cdfd4d09b79a
Update usage of CHECK_ macros (remove unused second argument).
Pavel Janík <Pavel@Janik.cz>
parents:
40461
diff
changeset
|
6268 CHECK_STRING (string); |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6269 |
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6270 return detect_coding_system (XSTRING (string)->data, |
88365 | 6271 STRING_BYTES (XSTRING (string)), |
34531
37f85e931855
(ONE_MORE_BYTE_CHECK_MULTIBYTE): New macro.
Kenichi Handa <handa@m17n.org>
parents:
34197
diff
changeset
|
6272 !NILP (highest), |
88365 | 6273 STRING_MULTIBYTE (string), |
6274 Qnil); | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6275 } |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6276 |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6277 |
88365 | 6278 static INLINE int |
6279 char_encodable_p (c, attrs) | |
6280 int c; | |
6281 Lisp_Object attrs; | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6282 { |
88365 | 6283 Lisp_Object tail; |
6284 struct charset *charset; | |
6285 | |
6286 for (tail = CODING_ATTR_CHARSET_LIST (attrs); | |
6287 CONSP (tail); tail = XCDR (tail)) | |
6288 { | |
6289 charset = CHARSET_FROM_ID (XINT (XCAR (tail))); | |
6290 if (CHAR_CHARSET_P (c, charset)) | |
6291 break; | |
6292 } | |
6293 return (! NILP (tail)); | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6294 } |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6295 |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6296 |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6297 /* Return a list of coding systems that safely encode the text between |
88365 | 6298 START and END. If EXCLUDE is non-nil, it is a list of coding |
6299 systems not to check. The returned list doesn't contain any such | |
6300 coding systems. In any case, if the text contains only ASCII or is | |
6301 unibyte, return t. */ | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6302 |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6303 DEFUN ("find-coding-systems-region-internal", |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6304 Ffind_coding_systems_region_internal, |
88365 | 6305 Sfind_coding_systems_region_internal, 2, 3, 0, |
40713
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6306 doc: /* Internal use only. */) |
88365 | 6307 (start, end, exclude) |
6308 Lisp_Object start, end, exclude; | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6309 { |
88365 | 6310 Lisp_Object coding_attrs_list, safe_codings; |
6311 EMACS_INT start_byte, end_byte; | |
6312 unsigned char *p, *pbeg, *pend; | |
6313 int c; | |
6314 Lisp_Object tail, elt; | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6315 |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6316 if (STRINGP (start)) |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6317 { |
88365 | 6318 if (!STRING_MULTIBYTE (start) |
6319 || XSTRING (start)->size == STRING_BYTES (XSTRING (start))) | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6320 return Qt; |
88365 | 6321 start_byte = 0; |
6322 end_byte = STRING_BYTES (XSTRING (start)); | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6323 } |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6324 else |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6325 { |
40656
cdfd4d09b79a
Update usage of CHECK_ macros (remove unused second argument).
Pavel Janík <Pavel@Janik.cz>
parents:
40461
diff
changeset
|
6326 CHECK_NUMBER_COERCE_MARKER (start); |
cdfd4d09b79a
Update usage of CHECK_ macros (remove unused second argument).
Pavel Janík <Pavel@Janik.cz>
parents:
40461
diff
changeset
|
6327 CHECK_NUMBER_COERCE_MARKER (end); |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6328 if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end)) |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6329 args_out_of_range (start, end); |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6330 if (NILP (current_buffer->enable_multibyte_characters)) |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6331 return Qt; |
88365 | 6332 start_byte = CHAR_TO_BYTE (XINT (start)); |
6333 end_byte = CHAR_TO_BYTE (XINT (end)); | |
6334 if (XINT (end) - XINT (start) == end_byte - start_byte) | |
6335 return Qt; | |
6336 | |
6337 if (XINT (start) < GPT && XINT (end) > GPT) | |
6338 { | |
6339 if ((GPT - XINT (start)) < (XINT (end) - GPT)) | |
6340 move_gap_both (XINT (start), start_byte); | |
6341 else | |
6342 move_gap_both (XINT (end), end_byte); | |
6343 } | |
6344 } | |
6345 | |
6346 coding_attrs_list = Qnil; | |
6347 for (tail = Vcoding_system_list; CONSP (tail); tail = XCDR (tail)) | |
6348 if (NILP (exclude) | |
6349 || NILP (Fmemq (XCAR (tail), exclude))) | |
6350 { | |
6351 Lisp_Object attrs; | |
6352 | |
6353 attrs = AREF (CODING_SYSTEM_SPEC (XCAR (tail)), 0); | |
6354 if (EQ (XCAR (tail), CODING_ATTR_BASE_NAME (attrs)) | |
6355 && ! EQ (CODING_ATTR_TYPE (attrs), Qundecided)) | |
6356 coding_attrs_list = Fcons (attrs, coding_attrs_list); | |
6357 } | |
6358 | |
6359 if (STRINGP (start)) | |
6360 p = pbeg = XSTRING (start)->data; | |
6361 else | |
6362 p = pbeg = BYTE_POS_ADDR (start_byte); | |
6363 pend = p + (end_byte - start_byte); | |
6364 | |
6365 while (p < pend && ASCII_BYTE_P (*p)) p++; | |
6366 while (p < pend && ASCII_BYTE_P (*(pend - 1))) pend--; | |
6367 | |
6368 while (p < pend) | |
6369 { | |
6370 if (ASCII_BYTE_P (*p)) | |
6371 p++; | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6372 else |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6373 { |
88365 | 6374 c = STRING_CHAR_ADVANCE (p); |
6375 | |
6376 charset_map_loaded = 0; | |
6377 for (tail = coding_attrs_list; CONSP (tail);) | |
6378 { | |
6379 elt = XCAR (tail); | |
6380 if (NILP (elt)) | |
6381 tail = XCDR (tail); | |
6382 else if (char_encodable_p (c, elt)) | |
6383 tail = XCDR (tail); | |
6384 else if (CONSP (XCDR (tail))) | |
6385 { | |
6386 XSETCAR (tail, XCAR (XCDR (tail))); | |
6387 XSETCDR (tail, XCDR (XCDR (tail))); | |
6388 } | |
6389 else | |
6390 { | |
6391 XSETCAR (tail, Qnil); | |
6392 tail = XCDR (tail); | |
6393 } | |
6394 } | |
6395 if (charset_map_loaded) | |
6396 { | |
6397 EMACS_INT p_offset = p - pbeg, pend_offset = pend - pbeg; | |
6398 | |
6399 if (STRINGP (start)) | |
6400 pbeg = XSTRING (start)->data; | |
6401 else | |
6402 pbeg = BYTE_POS_ADDR (start_byte); | |
6403 p = pbeg + p_offset; | |
6404 pend = pbeg + pend_offset; | |
6405 } | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6406 } |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6407 } |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6408 |
88365 | 6409 safe_codings = Qnil; |
6410 for (tail = coding_attrs_list; CONSP (tail); tail = XCDR (tail)) | |
6411 if (! NILP (XCAR (tail))) | |
6412 safe_codings = Fcons (CODING_ATTR_BASE_NAME (XCAR (tail)), safe_codings); | |
6413 | |
6414 return safe_codings; | |
6415 } | |
6416 | |
6417 | |
6418 DEFUN ("check-coding-systems-region", Fcheck_coding_systems_region, | |
6419 Scheck_coding_systems_region, 3, 3, 0, | |
6420 doc: /* Check if the region is encodable by coding systems. | |
6421 | |
6422 START and END are buffer positions specifying the region. | |
6423 CODING-SYSTEM-LIST is a list of coding systems to check. | |
6424 | |
6425 The value is an alist ((CODING-SYSTEM POS0 POS1 ...) ...), where | |
6426 CODING-SYSTEM is a member of CODING-SYSTEM-LIST and can't encode the | |
6427 whole region, POS0, POS1, ... are buffer positions where non-encodable | |
6428 characters are found. | |
6429 | |
6430 If all coding systems in CODING-SYSTEM-LIST can encode the region, the | |
6431 value is nil. | |
6432 | |
6433 START may be a string. In that case, check if the string is | |
6434 encodable, and the value contains indices to the string instead of | |
6435 buffer positions. END is ignored. */) | |
6436 (start, end, coding_system_list) | |
6437 Lisp_Object start, end, coding_system_list; | |
6438 { | |
6439 Lisp_Object list; | |
6440 EMACS_INT start_byte, end_byte; | |
6441 int pos; | |
6442 unsigned char *p, *pbeg, *pend; | |
6443 int c; | |
6444 Lisp_Object tail, elt; | |
6445 | |
6446 if (STRINGP (start)) | |
6447 { | |
6448 if (!STRING_MULTIBYTE (start) | |
6449 || XSTRING (start)->size == STRING_BYTES (XSTRING (start))) | |
6450 return Qnil; | |
6451 start_byte = 0; | |
6452 end_byte = STRING_BYTES (XSTRING (start)); | |
6453 pos = 0; | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6454 } |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6455 else |
88365 | 6456 { |
6457 CHECK_NUMBER_COERCE_MARKER (start); | |
6458 CHECK_NUMBER_COERCE_MARKER (end); | |
6459 if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end)) | |
6460 args_out_of_range (start, end); | |
6461 if (NILP (current_buffer->enable_multibyte_characters)) | |
6462 return Qnil; | |
6463 start_byte = CHAR_TO_BYTE (XINT (start)); | |
6464 end_byte = CHAR_TO_BYTE (XINT (end)); | |
6465 if (XINT (end) - XINT (start) == end_byte - start_byte) | |
6466 return Qnil; | |
6467 | |
6468 if (XINT (start) < GPT && XINT (end) > GPT) | |
6469 { | |
6470 if ((GPT - XINT (start)) < (XINT (end) - GPT)) | |
6471 move_gap_both (XINT (start), start_byte); | |
6472 else | |
6473 move_gap_both (XINT (end), end_byte); | |
6474 } | |
6475 pos = XINT (start); | |
6476 } | |
6477 | |
6478 list = Qnil; | |
6479 for (tail = coding_system_list; CONSP (tail); tail = XCDR (tail)) | |
6480 { | |
6481 elt = XCAR (tail); | |
6482 list = Fcons (Fcons (elt, Fcons (AREF (CODING_SYSTEM_SPEC (elt), 0), | |
6483 Qnil)), | |
6484 list); | |
6485 } | |
6486 | |
6487 if (STRINGP (start)) | |
6488 p = pbeg = XSTRING (start)->data; | |
6489 else | |
6490 p = pbeg = BYTE_POS_ADDR (start_byte); | |
6491 pend = p + (end_byte - start_byte); | |
6492 | |
6493 while (p < pend && ASCII_BYTE_P (*p)) p++, pos++; | |
6494 while (p < pend && ASCII_BYTE_P (*(pend - 1))) pend--; | |
6495 | |
6496 while (p < pend) | |
6497 { | |
6498 if (ASCII_BYTE_P (*p)) | |
6499 p++; | |
6500 else | |
6501 { | |
6502 c = STRING_CHAR_ADVANCE (p); | |
6503 | |
6504 charset_map_loaded = 0; | |
6505 for (tail = list; CONSP (tail); tail = XCDR (tail)) | |
6506 { | |
6507 elt = XCDR (XCAR (tail)); | |
6508 if (! char_encodable_p (c, XCAR (elt))) | |
6509 XSETCDR (elt, Fcons (make_number (pos), XCDR (elt))); | |
6510 } | |
6511 if (charset_map_loaded) | |
6512 { | |
6513 EMACS_INT p_offset = p - pbeg, pend_offset = pend - pbeg; | |
6514 | |
6515 if (STRINGP (start)) | |
6516 pbeg = XSTRING (start)->data; | |
6517 else | |
6518 pbeg = BYTE_POS_ADDR (start_byte); | |
6519 p = pbeg + p_offset; | |
6520 pend = pbeg + pend_offset; | |
6521 } | |
6522 } | |
6523 pos++; | |
6524 } | |
6525 | |
6526 tail = list; | |
6527 list = Qnil; | |
6528 for (; CONSP (tail); tail = XCDR (tail)) | |
6529 { | |
6530 elt = XCAR (tail); | |
6531 if (CONSP (XCDR (XCDR (elt)))) | |
6532 list = Fcons (Fcons (XCAR (elt), Fnreverse (XCDR (XCDR (elt)))), | |
6533 list); | |
6534 } | |
6535 | |
6536 return list; | |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6537 } |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6538 |
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
6539 |
88365 | 6540 |
20803
0fa2183c587d
(ENCODE_ISO_CHARACTER): Pay attention to
Kenichi Handa <handa@m17n.org>
parents:
20794
diff
changeset
|
6541 Lisp_Object |
88365 | 6542 code_convert_region (start, end, coding_system, dst_object, encodep, norecord) |
6543 Lisp_Object start, end, coding_system, dst_object; | |
6544 int encodep, norecord; | |
20680
dd46027e8412
(code_convert_region): Always count chars inserted
Richard M. Stallman <rms@gnu.org>
parents:
20668
diff
changeset
|
6545 { |
dd46027e8412
(code_convert_region): Always count chars inserted
Richard M. Stallman <rms@gnu.org>
parents:
20668
diff
changeset
|
6546 struct coding_system coding; |
88365 | 6547 EMACS_INT from, from_byte, to, to_byte; |
6548 Lisp_Object src_object; | |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6549 |
40656
cdfd4d09b79a
Update usage of CHECK_ macros (remove unused second argument).
Pavel Janík <Pavel@Janik.cz>
parents:
40461
diff
changeset
|
6550 CHECK_NUMBER_COERCE_MARKER (start); |
cdfd4d09b79a
Update usage of CHECK_ macros (remove unused second argument).
Pavel Janík <Pavel@Janik.cz>
parents:
40461
diff
changeset
|
6551 CHECK_NUMBER_COERCE_MARKER (end); |
88365 | 6552 if (NILP (coding_system)) |
6553 coding_system = Qno_conversion; | |
6554 else | |
6555 CHECK_CODING_SYSTEM (coding_system); | |
6556 src_object = Fcurrent_buffer (); | |
6557 if (NILP (dst_object)) | |
6558 dst_object = src_object; | |
6559 else if (! EQ (dst_object, Qt)) | |
6560 CHECK_BUFFER (dst_object); | |
20680
dd46027e8412
(code_convert_region): Always count chars inserted
Richard M. Stallman <rms@gnu.org>
parents:
20668
diff
changeset
|
6561 |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6562 validate_region (&start, &end); |
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6563 from = XFASTINT (start); |
88365 | 6564 from_byte = CHAR_TO_BYTE (from); |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6565 to = XFASTINT (end); |
88365 | 6566 to_byte = CHAR_TO_BYTE (to); |
6567 | |
6568 setup_coding_system (coding_system, &coding); | |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6569 coding.mode |= CODING_MODE_LAST_BLOCK; |
88365 | 6570 |
6571 if (encodep) | |
6572 encode_coding_object (&coding, src_object, from, from_byte, to, to_byte, | |
6573 dst_object); | |
6574 else | |
6575 decode_coding_object (&coding, src_object, from, from_byte, to, to_byte, | |
6576 dst_object); | |
6577 if (! norecord) | |
6578 Vlast_coding_system_used = CODING_ID_NAME (coding.id); | |
6579 | |
6580 if (coding.result != CODING_RESULT_SUCCESS) | |
6581 error ("Code conversion error: %d", coding.result); | |
6582 | |
6583 return (BUFFERP (dst_object) | |
6584 ? make_number (coding.produced_char) | |
6585 : coding.dst_object); | |
20803
0fa2183c587d
(ENCODE_ISO_CHARACTER): Pay attention to
Kenichi Handa <handa@m17n.org>
parents:
20794
diff
changeset
|
6586 } |
0fa2183c587d
(ENCODE_ISO_CHARACTER): Pay attention to
Kenichi Handa <handa@m17n.org>
parents:
20794
diff
changeset
|
6587 |
88365 | 6588 |
20803
0fa2183c587d
(ENCODE_ISO_CHARACTER): Pay attention to
Kenichi Handa <handa@m17n.org>
parents:
20794
diff
changeset
|
6589 DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region, |
88365 | 6590 3, 4, "r\nzCoding system: ", |
40713
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6591 doc: /* Decode the current region from the specified coding system. |
88365 | 6592 When called from a program, takes four arguments: |
6593 START, END, CODING-SYSTEM, and DESTINATION. | |
6594 START and END are buffer positions. | |
6595 | |
6596 Optional 4th argument DESTINATION specifies where the decoded text goes. | |
6597 If nil, the region between START and END is replaced by the decoded text. | |
6598 If a buffer, the decoded text is inserted in that buffer. | |
6599 If t, the decoded text is returned. | |
6600 | |
40713
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6601 This function sets `last-coding-system-used' to the precise coding system |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6602 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6603 not fully specified.) |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6604 It returns the length of the decoded text. */) |
88365 | 6605 (start, end, coding_system, destination) |
6606 Lisp_Object start, end, coding_system, destination; | |
20803
0fa2183c587d
(ENCODE_ISO_CHARACTER): Pay attention to
Kenichi Handa <handa@m17n.org>
parents:
20794
diff
changeset
|
6607 { |
88365 | 6608 return code_convert_region (start, end, coding_system, destination, 0, 0); |
20680
dd46027e8412
(code_convert_region): Always count chars inserted
Richard M. Stallman <rms@gnu.org>
parents:
20668
diff
changeset
|
6609 } |
dd46027e8412
(code_convert_region): Always count chars inserted
Richard M. Stallman <rms@gnu.org>
parents:
20668
diff
changeset
|
6610 |
dd46027e8412
(code_convert_region): Always count chars inserted
Richard M. Stallman <rms@gnu.org>
parents:
20668
diff
changeset
|
6611 DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region, |
88365 | 6612 3, 4, "r\nzCoding system: ", |
6613 doc: /* Encode the current region by specified coding system. | |
40713
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6614 When called from a program, takes three arguments: |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6615 START, END, and CODING-SYSTEM. START and END are buffer positions. |
88365 | 6616 |
6617 Optional 4th argument DESTINATION specifies where the encoded text goes. | |
6618 If nil, the region between START and END is replaced by the encoded text. | |
6619 If a buffer, the encoded text is inserted in that buffer. | |
6620 If t, the encoded text is returned. | |
6621 | |
40713
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6622 This function sets `last-coding-system-used' to the precise coding system |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6623 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6624 not fully specified.) |
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
6625 It returns the length of the encoded text. */) |
88365 | 6626 (start, end, coding_system, destination) |
6627 Lisp_Object start, end, coding_system, destination; | |
20680
dd46027e8412
(code_convert_region): Always count chars inserted
Richard M. Stallman <rms@gnu.org>
parents:
20668
diff
changeset
|
6628 { |
88365 | 6629 return code_convert_region (start, end, coding_system, destination, 1, 0); |
17052 | 6630 } |
6631 | |
20803
0fa2183c587d
(ENCODE_ISO_CHARACTER): Pay attention to
Kenichi Handa <handa@m17n.org>
parents:
20794
diff
changeset
|
6632 Lisp_Object |
88365 | 6633 code_convert_string (string, coding_system, dst_object, |
6634 encodep, nocopy, norecord) | |
6635 Lisp_Object string, coding_system, dst_object; | |
6636 int encodep, nocopy, norecord; | |
17052 | 6637 { |
6638 struct coding_system coding; | |
88365 | 6639 EMACS_INT chars, bytes; |
17052 | 6640 |
40656
cdfd4d09b79a
Update usage of CHECK_ macros (remove unused second argument).
Pavel Janík <Pavel@Janik.cz>
parents:
40461
diff
changeset
|
6641 CHECK_STRING (string); |
88365 | 6642 if (NILP (coding_system)) |
6643 { | |
6644 if (! norecord) | |
6645 Vlast_coding_system_used = Qno_conversion; | |
6646 if (NILP (dst_object)) | |
6647 return (nocopy ? string : Fcopy_sequence (string)); | |
6648 } | |
17052 | 6649 |
17119
2cfb31c15ced
(create_process, Fopen_network_stream): Typo in indexes
Kenichi Handa <handa@m17n.org>
parents:
17071
diff
changeset
|
6650 if (NILP (coding_system)) |
88365 | 6651 coding_system = Qno_conversion; |
6652 else | |
6653 CHECK_CODING_SYSTEM (coding_system); | |
6654 if (NILP (dst_object)) | |
6655 dst_object = Qt; | |
6656 else if (! EQ (dst_object, Qt)) | |
6657 CHECK_BUFFER (dst_object); | |
6658 | |
6659 setup_coding_system (coding_system, &coding); | |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
6660 coding.mode |= CODING_MODE_LAST_BLOCK; |
88365 | 6661 chars = XSTRING (string)->size; |
6662 bytes = STRING_BYTES (XSTRING (string)); | |
6663 if (encodep) | |
6664 encode_coding_object (&coding, string, 0, 0, chars, bytes, dst_object); | |
6665 else | |
6666 decode_coding_object (&coding, string, 0, 0, chars, bytes, dst_object); | |
6667 if (! norecord) | |
6668 Vlast_coding_system_used = CODING_ID_NAME (coding.id); | |
6669 | |
6670 if (coding.result != CODING_RESULT_SUCCESS) | |
6671 error ("Code conversion error: %d", coding.result); | |
6672 | |
6673 return (BUFFERP (dst_object) | |
6674 ? make_number (coding.produced_char) | |
6675 : coding.dst_object); | |
20803
0fa2183c587d
(ENCODE_ISO_CHARACTER): Pay attention to
Kenichi Handa <handa@m17n.org>
parents:
20794
diff
changeset
|
6676 } |
0fa2183c587d
(ENCODE_ISO_CHARACTER): Pay attention to
Kenichi Handa <handa@m17n.org>
parents:
20794
diff
changeset
|
6677 |
0fa2183c587d
(ENCODE_ISO_CHARACTER): Pay attention to
Kenichi Handa <handa@m17n.org>
parents:
20794
diff
changeset
|
6678 |
22341
572ba933a4bf
(code_convert_string_norecord): New function.
Karl Heuer <kwzh@gnu.org>
parents:
22329
diff
changeset
|
6679 /* Encode or decode STRING according to CODING_SYSTEM. |
26847 | 6680 Do not set Vlast_coding_system_used. |
6681 | |
6682 This function is called only from macros DECODE_FILE and | |
6683 ENCODE_FILE, thus we ignore character composition. */ | |
22341
572ba933a4bf
(code_convert_string_norecord): New function.
Karl Heuer <kwzh@gnu.org>
parents:
22329
diff
changeset
|
6684 |
572ba933a4bf
(code_convert_string_norecord): New function.
Karl Heuer <kwzh@gnu.org>
parents:
22329
diff
changeset
|
6685 Lisp_Object |
572ba933a4bf
(code_convert_string_norecord): New function.
Karl Heuer <kwzh@gnu.org>
parents:
22329
diff
changeset
|
6686 code_convert_string_norecord (string, coding_system, encodep) |
572ba933a4bf
(code_convert_string_norecord): New function.
Karl Heuer <kwzh@gnu.org>
parents:
22329
diff
changeset
|
6687 Lisp_Object string, coding_system; |
572ba933a4bf
(code_convert_string_norecord): New function.
Karl Heuer <kwzh@gnu.org>
parents:
22329
diff
changeset
|
6688 int encodep; |
572ba933a4bf
(code_convert_string_norecord): New function.
Karl Heuer <kwzh@gnu.org>
parents:
22329
diff
changeset
|
6689 { |
88430
6418a272b97e
* coding.c: Delete unused variables.
Kenichi Handa <handa@m17n.org>
parents:
88365
diff
changeset
|
6690 return code_convert_string (string, coding_system, Qt, encodep, 0, 1); |
22341
572ba933a4bf
(code_convert_string_norecord): New function.
Karl Heuer <kwzh@gnu.org>
parents:
22329
diff
changeset
|
6691 } |
88365 | 6692 |
6693 | |
6694 DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string, | |
6695 2, 4, 0, | |
6696 doc: /* Decode STRING which is encoded in CODING-SYSTEM, and return the result. | |
6697 | |
6698 Optional third arg NOCOPY non-nil means it is OK to return STRING itself | |
6699 if the decoding operation is trivial. | |
6700 | |
6701 Optional fourth arg BUFFER non-nil means that the decoded text is | |
6702 inserted in BUFFER instead of returned as a string. In this case, | |
6703 the return value is BUFFER. | |
6704 | |
6705 This function sets `last-coding-system-used' to the precise coding system | |
6706 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is | |
6707 not fully specified.) */) | |
6708 (string, coding_system, nocopy, buffer) | |
6709 Lisp_Object string, coding_system, nocopy, buffer; | |
6710 { | |
6711 return code_convert_string (string, coding_system, buffer, | |
6712 0, ! NILP (nocopy), 0); | |
6713 } | |
6714 | |
6715 DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string, | |
6716 2, 4, 0, | |
6717 doc: /* Encode STRING to CODING-SYSTEM, and return the result. | |
6718 | |
6719 Optional third arg NOCOPY non-nil means it is OK to return STRING | |
6720 itself if the encoding operation is trivial. | |
6721 | |
6722 Optional fourth arg BUFFER non-nil means that the encoded text is | |
6723 inserted in BUFFER instead of returned as a string. In this case, | |
6724 the return value is BUFFER. | |
6725 | |
6726 This function sets `last-coding-system-used' to the precise coding system | |
6727 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is | |
6728 not fully specified.) */) | |
6729 (string, coding_system, nocopy, buffer) | |
6730 Lisp_Object string, coding_system, nocopy, buffer; | |
6731 { | |
6732 return code_convert_string (string, coding_system, buffer, | |
6733 1, ! NILP (nocopy), 0); | |
6734 } | |
6735 | |
20680 | 6736 |
17052 | 6737 DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0, |
40713 | 6738 doc: /* Decode a Japanese character which has CODE in shift_jis encoding. |
6739 Return the corresponding character. */) | |
6740 (code) | |
17052 | 6741 Lisp_Object code; |
6742 { | |
88365 | 6743 Lisp_Object spec, attrs, val; |
6744 struct charset *charset_roman, *charset_kanji, *charset_kana, *charset; | |
6745 int c; | |
6746 | |
6747 CHECK_NATNUM (code); | |
6748 c = XFASTINT (code); | |
6749 CHECK_CODING_SYSTEM_GET_SPEC (Vsjis_coding_system, spec); | |
6750 attrs = AREF (spec, 0); | |
6751 | |
6752 if (ASCII_BYTE_P (c) | |
6753 && ! NILP (CODING_ATTR_ASCII_COMPAT (attrs))) | |
6754 return code; | |
6755 | |
6756 val = CODING_ATTR_CHARSET_LIST (attrs); | |
6757 charset_roman = CHARSET_FROM_ID (XINT (XCAR (val))), val = XCDR (val); | |
6758 charset_kanji = CHARSET_FROM_ID (XINT (XCAR (val))), val = XCDR (val); | |
6759 charset_kana = CHARSET_FROM_ID (XINT (XCAR (val))); | |
6760 | |
6761 if (c <= 0x7F) | |
6762 charset = charset_roman; | |
6763 else if (c >= 0xA0 && c < 0xDF) | |
6764 { | |
6765 charset = charset_kana; | |
6766 c -= 0x80; | |
24065 | 6767 } |
6768 else | |
6769 { | |
88365 | 6770 int s1 = c >> 8, s2 = c & 0xFF; |
6771 | |
6772 if (s1 < 0x81 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF | |
6773 || s2 < 0x40 || s2 == 0x7F || s2 > 0xFC) | |
6774 error ("Invalid code: %d", code); | |
6775 SJIS_TO_JIS (c); | |
6776 charset = charset_kanji; | |
6777 } | |
6778 c = DECODE_CHAR (charset, c); | |
6779 if (c < 0) | |
6780 error ("Invalid code: %d", code); | |
6781 return make_number (c); | |
17052 | 6782 } |
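The range checks and the SJIS_TO_JIS step in `decode-sjis-char' can be illustrated outside Emacs. The following is a minimal standalone sketch, not the macro used by this file: the helper names `sjis_code_valid_p' and `sjis_to_jis' are hypothetical, the trail byte is assumed to be the full low byte of the code, and the arithmetic is the commonly used Shift_JIS to JIS X 0208 mapping.

```c
#include <stdbool.h>

/* Hypothetical helper: true if CODE is a well-formed two-byte
   Shift_JIS code.  Lead byte in 0x81..0x9F or 0xE0..0xEF, trail
   byte in 0x40..0xFC except 0x7F.  */
bool
sjis_code_valid_p (unsigned code)
{
  unsigned s1 = code >> 8, s2 = code & 0xFF;

  if (s1 < 0x81 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF)
    return false;
  return s2 >= 0x40 && s2 != 0x7F && s2 <= 0xFC;
}

/* Hypothetical helper: convert a valid two-byte Shift_JIS code to
   the corresponding JIS X 0208 code (both bytes in 0x21..0x7E).  */
unsigned
sjis_to_jis (unsigned code)
{
  unsigned s1 = code >> 8, s2 = code & 0xFF;
  unsigned j1, j2;

  if (s2 >= 0x9F)
    {
      /* Trail bytes 0x9F..0xFC select the even JIS row of the pair.  */
      j1 = s1 * 2 - (s1 >= 0xE0 ? 0x160 : 0xE0);
      j2 = s2 - 0x7E;
    }
  else
    {
      /* Trail bytes 0x40..0x9E select the odd row; 0x7F is skipped.  */
      j1 = s1 * 2 - (s1 >= 0xE0 ? 0x161 : 0xE1);
      j2 = s2 - (s2 >= 0x7F ? 0x20 : 0x1F);
    }
  return (j1 << 8) | j2;
}
```

A caller would check `sjis_code_valid_p (code)' first and only then call `sjis_to_jis (code)', mirroring the order of the validity test and the conversion in the function above.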
6783 | |
88365 | 6784 |
17052 | 6785 DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0, |
40713 | 6786 doc: /* Encode a Japanese character CHAR to shift_jis encoding. |
6787 Return the corresponding code in SJIS. */) | |
6788 (ch) | |
88365 | 6789 Lisp_Object ch; |
17052 | 6790 { |
88365 | 6791 Lisp_Object spec, attrs, charset_list; |
6792 int c; | |
6793 struct charset *charset; | |
6794 unsigned code; | |
6795 | |
6796 CHECK_CHARACTER (ch); | |
6797 c = XFASTINT (ch); | |
6798 CHECK_CODING_SYSTEM_GET_SPEC (Vsjis_coding_system, spec); | |
6799 attrs = AREF (spec, 0); | |
6800 | |
6801 if (ASCII_CHAR_P (c) | |
6802 && ! NILP (CODING_ATTR_ASCII_COMPAT (attrs))) | |
6803 return ch; | |
6804 | |
6805 charset_list = CODING_ATTR_CHARSET_LIST (attrs); | |
6806 charset = char_charset (c, charset_list, &code); | |
6807 if (code == CHARSET_INVALID_CODE (charset)) | |
6808 error ("Can't encode by shift_jis encoding: %d", c); | |
6809 JIS_TO_SJIS (code); | |
6810 | |
6811 return make_number (code); | |
17052 | 6812 } |
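The JIS_TO_SJIS step in `encode-sjis-char' is the inverse mapping. A minimal standalone sketch follows; the helper name `jis_to_sjis' is hypothetical, and the arithmetic is the commonly used JIS X 0208 to Shift_JIS formula rather than a copy of the macro in this file.

```c
/* Hypothetical helper: convert a JIS X 0208 code (both bytes in
   0x21..0x7E) to the corresponding two-byte Shift_JIS code.  */
unsigned
jis_to_sjis (unsigned code)
{
  unsigned j1 = code >> 8, j2 = code & 0xFF;
  unsigned s1, s2;

  if (j1 & 1)
    {
      /* Odd JIS rows map to trail bytes 0x40..0x9E, skipping 0x7F.  */
      s1 = j1 / 2 + (j1 < 0x5F ? 0x71 : 0xB1);
      s2 = j2 + (j2 >= 0x60 ? 0x20 : 0x1F);
    }
  else
    {
      /* Even JIS rows map to trail bytes 0x9F..0xFC.  */
      s1 = j1 / 2 + (j1 < 0x5F ? 0x70 : 0xB0);
      s2 = j2 + 0x7E;
    }
  return (s1 << 8) | s2;
}
```

Two adjacent JIS rows share one Shift_JIS lead byte, which is why the parity of the row decides the trail-byte range.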
6813 | |
6814 DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0, | |
40713 | 6815 doc: /* Decode a Big5 character which has CODE in BIG5 coding system. |
6816 Return the corresponding character. */) | |
6817 (code) | |
17052 | 6818 Lisp_Object code; |
6819 { | |
88365 | 6820 Lisp_Object spec, attrs, val; |
6821 struct charset *charset_roman, *charset_big5, *charset; | |
6822 int c; | |
6823 | |
6824 CHECK_NATNUM (code); | |
6825 c = XFASTINT (code); | |
6826 CHECK_CODING_SYSTEM_GET_SPEC (Vbig5_coding_system, spec); | |
6827 attrs = AREF (spec, 0); | |
6828 | |
6829 if (ASCII_BYTE_P (c) | |
6830 && ! NILP (CODING_ATTR_ASCII_COMPAT (attrs))) | |
6831 return code; | |
6832 | |
6833 val = CODING_ATTR_CHARSET_LIST (attrs); | |
6834 charset_roman = CHARSET_FROM_ID (XINT (XCAR (val))), val = XCDR (val); | |
6835 charset_big5 = CHARSET_FROM_ID (XINT (XCAR (val))); | |
6836 | |
6837 if (c <= 0x7F) | |
6838 charset = charset_roman; | |
24324 | 6839 else |
6840 { | |
88365 | 6841 int b1 = c >> 8, b2 = c & 0xFF; |
6842 if (b1 < 0xA1 || b1 > 0xFE | |
6843 || b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE) | |
6844 error ("Invalid code: %d", code); | |
6845 charset = charset_big5; | |
6846 } | |
6847 c = DECODE_CHAR (charset, (unsigned )c); | |
6848 if (c < 0) | |
6849 error ("Invalid code: %d", code); | |
6850 return make_number (c); | |
17052 | 6851 } |
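The byte-range test in `decode-big5-char' can be read as a small predicate. This is a sketch under the assumption that the second byte is the full low byte of the code; the helper name `big5_code_valid_p' is hypothetical and does not exist in this file.

```c
#include <stdbool.h>

/* Hypothetical helper: true if CODE is a well-formed two-byte Big5
   code.  Lead byte in 0xA1..0xFE, trail byte in 0x40..0x7E or
   0xA1..0xFE.  */
bool
big5_code_valid_p (unsigned code)
{
  unsigned b1 = code >> 8, b2 = code & 0xFF;

  if (b1 < 0xA1 || b1 > 0xFE)
    return false;
  return (b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0xA1 && b2 <= 0xFE);
}
```

The predicate only checks the shape of the code. Whether a character is actually assigned to it is a separate question, which the function above leaves to DECODE_CHAR.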
6852 | |
6853 DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0, | |
40713 | 6854 doc: /* Encode the Big5 character CHAR to BIG5 coding system. |
6855 Return the corresponding character code in Big5. */) | |
6856 (ch) | |
17052 | 6857 Lisp_Object ch; |
6858 { | |
88365 | 6859 Lisp_Object spec, attrs, charset_list; |
6860 struct charset *charset; | |
6861 int c; | |
6862 unsigned code; | |
6863 | |
6864 CHECK_CHARACTER (ch); | |
6865 c = XFASTINT (ch); | |
6866 CHECK_CODING_SYSTEM_GET_SPEC (Vbig5_coding_system, spec); | |
6867 attrs = AREF (spec, 0); | |
6868 if (ASCII_CHAR_P (c) | |
6869 && ! NILP (CODING_ATTR_ASCII_COMPAT (attrs))) | |
6870 return ch; | |
6871 | |
6872 charset_list = CODING_ATTR_CHARSET_LIST (attrs); | |
6873 charset = char_charset (c, charset_list, &code); | |
6874 if (code == CHARSET_INVALID_CODE (charset)) | |
6875 error ("Can't encode by Big5 encoding: %d", c); | |
6876 | |
6877 return make_number (code); | |
17052 | 6878 } |
88365 | 6879 |
20680 | 6880 |
18002 | 6881 DEFUN ("set-terminal-coding-system-internal", |
6882 Fset_terminal_coding_system_internal, | |
40713 | 6883 Sset_terminal_coding_system_internal, 1, 1, 0, |
6884 doc: /* Internal use only. */) | |
6885 (coding_system) | |
17052 | 6886 { |
40656 | 6887 CHECK_SYMBOL (coding_system); |
88365 | 6888 setup_coding_system (Fcheck_coding_system (coding_system), |
6889 &terminal_coding); | |
6890 | |
20150 | 6891 /* We had better not send unsafe characters to the terminal. */ |
88365 | 6892 terminal_coding.mode |= CODING_MODE_SAFE_ENCODING; |
6893 /* Character composition should be disabled. */ | |
6894 terminal_coding.common_flags &= ~CODING_ANNOTATE_COMPOSITION_MASK; | |
29005 | 6895 terminal_coding.src_multibyte = 1; |
6896 terminal_coding.dst_multibyte = 0; | |
17052 | 6897 return Qnil; |
6898 } | |
6899 | |
19280 | 6900 DEFUN ("set-safe-terminal-coding-system-internal", |
6901 Fset_safe_terminal_coding_system_internal, | |
40713 | 6902 Sset_safe_terminal_coding_system_internal, 1, 1, 0, |
41006 | 6903 doc: /* Internal use only. */) |
40713 | 6904 (coding_system) |
19280 | 6905 { |
40656 | 6906 CHECK_SYMBOL (coding_system); |
19280 | 6907 setup_coding_system (Fcheck_coding_system (coding_system), |
6908 &safe_terminal_coding); | |
88365 | 6909 /* Character composition should be disabled. */ |
6910 safe_terminal_coding.common_flags &= ~CODING_ANNOTATE_COMPOSITION_MASK; | |
29005 | 6911 safe_terminal_coding.src_multibyte = 1; |
6912 safe_terminal_coding.dst_multibyte = 0; | |
19280 | 6913 return Qnil; |
6914 } | |
6915 | |
17052 | 6916 DEFUN ("terminal-coding-system", |
6917 Fterminal_coding_system, Sterminal_coding_system, 0, 0, 0, | |
40713 | 6918 doc: /* Return coding system specified for terminal output. */) |
6919 () | |
17052 | 6920 { |
88365 | 6921 return CODING_ID_NAME (terminal_coding.id); |
17052 | 6922 } |
6923 | |
18002 | 6924 DEFUN ("set-keyboard-coding-system-internal", |
6925 Fset_keyboard_coding_system_internal, | |
40713 | 6926 Sset_keyboard_coding_system_internal, 1, 1, 0, |
6927 doc: /* Internal use only. */) | |
6928 (coding_system) | |
17052 | 6929 Lisp_Object coding_system; |
6930 { | |
40656 | 6931 CHECK_SYMBOL (coding_system); |
88365 | 6932 setup_coding_system (Fcheck_coding_system (coding_system), |
6933 &keyboard_coding); | |
6934 /* Character composition should be disabled. */ | |
6935 keyboard_coding.common_flags &= ~CODING_ANNOTATE_COMPOSITION_MASK; | |
17052 | 6936 return Qnil; |
6937 } | |
6938 | |
6939 DEFUN ("keyboard-coding-system", | |
6940 Fkeyboard_coding_system, Skeyboard_coding_system, 0, 0, 0, | |
40713 | 6941 doc: /* Return coding system specified for decoding keyboard input. */) |
6942 () | |
17052 | 6943 { |
88365 | 6944 return CODING_ID_NAME (keyboard_coding.id); |
17052 | 6945 } |
6946 | |
6947 | |
18536 | 6948 DEFUN ("find-operation-coding-system", Ffind_operation_coding_system, |
6949 Sfind_operation_coding_system, 1, MANY, 0, | |
40713 | 6950 doc: /* Choose a coding system for an operation based on the target name. |
6951 The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM). | |
6952 DECODING-SYSTEM is the coding system to use for decoding | |
6953 \(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system | |
6954 for encoding (in case OPERATION does encoding). | |
6955 | |
6956 The first argument OPERATION specifies an I/O primitive: | |
6957 For file I/O, `insert-file-contents' or `write-region'. | |
6958 For process I/O, `call-process', `call-process-region', or `start-process'. | |
6959 For network I/O, `open-network-stream'. | |
6960 | |
6961 The remaining arguments should be the same arguments that were passed | |
6962 to the primitive. Depending on which primitive, one of those arguments | |
6963 is selected as the TARGET. For example, if OPERATION does file I/O, | |
6964 whichever argument specifies the file name is TARGET. | |
6965 | |
6966 TARGET has a meaning which depends on OPERATION: | |
6967 For file I/O, TARGET is a file name. | |
6968 For process I/O, TARGET is a process name. | |
6969 For network I/O, TARGET is a service name or a port number. | |
6970 | |
6971 This function looks up what is specified for TARGET in | |
6972 `file-coding-system-alist', `process-coding-system-alist', | |
6973 or `network-coding-system-alist' depending on OPERATION. | |
6974 They may specify a coding system, a cons of coding systems, | |
6975 or a function symbol to call. | |
6976 In the last case, we call the function with one argument, | |
6977 which is a list of all the arguments given to this function. | |
6978 | |
6979 usage: (find-operation-coding-system OPERATION ARGUMENTS ...) */) | |
6980 (nargs, args) | |
17052 | 6981 int nargs; |
6982 Lisp_Object *args; | |
6983 { | |
6984 Lisp_Object operation, target_idx, target, val; | |
6985 register Lisp_Object chain; | |
6986 | |
6987 if (nargs < 2) | |
6988 error ("Too few arguments"); | |
6989 operation = args[0]; | |
6990 if (!SYMBOLP (operation) | |
6991 || !INTEGERP (target_idx = Fget (operation, Qtarget_idx))) | |
88365 | 6992 error ("Invalid first argument"); |
17052 | 6993 if (nargs < 1 + XINT (target_idx)) |
6994 error ("Too few arguments for operation: %s", | |
6995 XSYMBOL (operation)->name->data); | |
6996 target = args[XINT (target_idx) + 1]; | |
6997 if (!(STRINGP (target) | |
6998 || (EQ (operation, Qopen_network_stream) && INTEGERP (target)))) | |
88365 | 6999 error ("Invalid %dth argument", XINT (target_idx) + 1); |
17052 | 7000 |
18613 | 7001 chain = ((EQ (operation, Qinsert_file_contents) |
7002 || EQ (operation, Qwrite_region)) | |
18180 | 7003 ? Vfile_coding_system_alist |
18613 | 7004 : (EQ (operation, Qopen_network_stream) |
18180 | 7005 ? Vnetwork_coding_system_alist |
7006 : Vprocess_coding_system_alist)); | |
17052 | 7007 if (NILP (chain)) |
7008 return Qnil; | |
7009 | |
25662 | 7010 for (; CONSP (chain); chain = XCDR (chain)) |
17052 | 7011 { |
19747 | 7012 Lisp_Object elt; |
88365 | 7013 |
25662 | 7014 elt = XCAR (chain); |
17052 | 7015 if (CONSP (elt) |
7016 && ((STRINGP (target) | |
25662 | 7017 && STRINGP (XCAR (elt)) |
7018 && fast_string_match (XCAR (elt), target) >= 0) | |
7019 || (INTEGERP (target) && EQ (target, XCAR (elt))))) | |
18180 | 7020 { |
25662 | 7021 val = XCDR (elt); |
19763 | 7022 /* Here, if VAL is both a valid coding system and a valid |
7023 function symbol, we return VAL as a coding system. */ | |
18180 | 7024 if (CONSP (val)) |
7025 return val; | |
7026 if (! SYMBOLP (val)) | |
7027 return Qnil; | |
7028 if (! NILP (Fcoding_system_p (val))) | |
7029 return Fcons (val, val); | |
19763 | 7030 if (! NILP (Ffboundp (val))) |
7031 { | |
7032 val = call1 (val, Flist (nargs, args)); | |
7033 if (CONSP (val)) | |
7034 return val; | |
7035 if (SYMBOLP (val) && ! NILP (Fcoding_system_p (val))) | |
7036 return Fcons (val, val); | |
7037 } | |
18180 | 7038 return Qnil; |
7039 } | |
17052 | 7040 } |
7041 return Qnil; | |
7042 } | |
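The lookup loop in `find-operation-coding-system' is a first-match scan over an alist of (PATTERN . VALUE) entries. A rough standalone analogue is sketched below. It makes several assumptions: `struct coding_rule' and `find_coding_rule' are hypothetical names, the table is a plain C array instead of a Lisp alist, and POSIX extended regular expressions stand in for `fast_string_match', whose regexp syntax differs.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for one alist element.  */
struct coding_rule
{
  const char *pattern;   /* POSIX extended regexp matched against TARGET */
  const char *decoding;  /* coding system name used for decoding */
  const char *encoding;  /* coding system name used for encoding */
};

/* Return the first of the N rules whose pattern matches TARGET, or
   NULL if none matches.  Earlier rules take precedence.  */
const struct coding_rule *
find_coding_rule (const struct coding_rule *rules, size_t n,
                  const char *target)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      regex_t re;
      int matched;

      if (regcomp (&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
        continue;               /* skip patterns that fail to compile */
      matched = regexec (&re, target, 0, NULL, 0) == 0;
      regfree (&re);
      if (matched)
        return &rules[i];
    }
  return NULL;
}
```

As in the function above, the scan stops at the first matching entry, so the order of the table decides which rule wins when several patterns match.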
7043 | |
88365 | 7044 DEFUN ("set-coding-system-priority", Fset_coding_system_priority, |
7045 Sset_coding_system_priority, 1, MANY, 0, | |
7046 doc: /* Assign higher priority to the coding systems given as arguments. */) | |
7047 (nargs, args) | |
7048 int nargs; | |
7049 Lisp_Object *args; | |
7050 { | |
7051 int i, j; | |
7052 int changed[coding_category_max]; | |
7053 enum coding_category priorities[coding_category_max]; | |
7054 | |
7055 bzero (changed, sizeof changed); | |
7056 | |
7057 for (i = j = 0; i < nargs; i++) | |
7058 { | |
7059 enum coding_category category; | |
7060 Lisp_Object spec, attrs; | |
7061 | |
7062 CHECK_CODING_SYSTEM_GET_SPEC (args[i], spec); | |
7063 attrs = AREF (spec, 0); | |
7064 category = XINT (CODING_ATTR_CATEGORY (attrs)); | |
7065 if (changed[category]) | |
7066 /* Ignore this coding system because a coding system of the | |
7067 same category already had a higher priority. */ | |
7068 continue; | |
7069 changed[category] = 1; | |
7070 priorities[j++] = category; | |
7071 if (coding_categories[category].id >= 0 | |
7072 && ! EQ (args[i], CODING_ID_NAME (coding_categories[category].id))) | |
7073 setup_coding_system (args[i], &coding_categories[category]); | |
7074 } | |
7075 | |
7076 /* Now the top J priorities are decided. Fill the remaining slots with the | |
7077 other categories, keeping their original relative order. */ | |
7078 | |
7079 for (i = j, j = 0; i < coding_category_max; i++, j++) | |
7080 { | |
7081 while (j < coding_category_max | |
7082 && changed[coding_priorities[j]]) | |
7083 j++; | |
7084 if (j == coding_category_max) | |
7085 abort (); | |
7086 priorities[i] = coding_priorities[j]; | |
7087 } | |
7088 | |
7089 bcopy (priorities, coding_priorities, sizeof priorities); | |
7090 return Qnil; | |
7091 } | |
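The reordering done by `set-coding-system-priority' can be sketched on plain integers. This is an illustration only: `N_CATEGORIES' and `raise_priorities' are hypothetical names, categories are small integers rather than `enum coding_category' values, and PRIORITIES is assumed to be a permutation of 0..N_CATEGORIES-1.

```c
#include <string.h>

enum { N_CATEGORIES = 5 };   /* hypothetical; stands in for coding_category_max */

/* Move the categories listed in REQUESTED (N_REQUESTED entries, each
   in 0..N_CATEGORIES-1) to the front of PRIORITIES, dropping
   duplicates, and keep the remaining categories in their previous
   relative order.  */
void
raise_priorities (int priorities[N_CATEGORIES],
                  const int *requested, int n_requested)
{
  int changed[N_CATEGORIES] = { 0 };
  int result[N_CATEGORIES];
  int i, j = 0;

  for (i = 0; i < n_requested; i++)
    {
      int category = requested[i];

      if (changed[category])
        continue;               /* an earlier argument already claimed it */
      changed[category] = 1;
      result[j++] = category;
    }

  /* Append the untouched categories in their original order.  */
  for (i = 0; i < N_CATEGORIES; i++)
    if (! changed[priorities[i]])
      result[j++] = priorities[i];

  memcpy (priorities, result, sizeof result);
}
```

For example, starting from the order 0 1 2 3 4 and requesting 3, 1, 3 gives 3 1 0 2 4: the repeated 3 is ignored and the rest keep their order.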
7092 | |
7093 DEFUN ("coding-system-priority-list", Fcoding_system_priority_list, | |
7094 Scoding_system_priority_list, 0, 1, 0, | |
7095 doc: /* Return a list of coding systems ordered by their priorities. */) | |
7096 (highestp) | |
7097 Lisp_Object highestp; | |
20718 | 7098 { |
7099 int i; | |
88365 | 7100 Lisp_Object val; |
7101 | |
7102 for (i = 0, val = Qnil; i < coding_category_max; i++) | |
7103 { | |
7104 enum coding_category category = coding_priorities[i]; | |
7105 int id = coding_categories[category].id; | |
7106 Lisp_Object attrs; | |
7107 | |
7108 if (id < 0) | |
7109 continue; | |
7110 attrs = CODING_ID_ATTRS (id); | |
7111 if (! NILP (highestp)) | |
7112 return CODING_ATTR_BASE_NAME (attrs); | |
7113 val = Fcons (CODING_ATTR_BASE_NAME (attrs), val); | |
7114 } | |
7115 return Fnreverse (val); | |
7116 } | |
7117 | |
7118 static Lisp_Object | |
7119 make_subsidiaries (base) | |
7120 Lisp_Object base; | |
7121 { | |
7122 Lisp_Object subsidiaries; | |
7123 char *suffixes[] = { "-unix", "-dos", "-mac" }; | |
7124 int base_name_len = STRING_BYTES (XSYMBOL (base)->name); | |
7125 char *buf = (char *) alloca (base_name_len + 6); | |
7126 int i; | |
7127 | |
7128 bcopy (XSYMBOL (base)->name->data, buf, base_name_len); | |
7129 subsidiaries = Fmake_vector (make_number (3), Qnil); | |
7130 for (i = 0; i < 3; i++) | |
7131 { | |
7132 bcopy (suffixes[i], buf + base_name_len, strlen (suffixes[i]) + 1); | |
7133 ASET (subsidiaries, i, intern (buf)); | |
7134 } | |
7135 return subsidiaries; | |
7136 } | |
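The name construction in `make_subsidiaries' amounts to copying the base name once per suffix and appending the suffix with its terminating NUL. A standalone sketch with ordinary C strings follows; `make_subsidiary_names' and `SUBSIDIARY_NAME_SIZE' are hypothetical, and fixed-size buffers replace the interned Lisp symbols.

```c
#include <string.h>

enum { SUBSIDIARY_NAME_SIZE = 64 };  /* hypothetical fixed buffer size */

/* Fill NAMES with BASE followed by "-unix", "-dos" and "-mac".
   Return 0 on success, or -1 if BASE is too long for the buffers.  */
int
make_subsidiary_names (const char *base,
                       char names[3][SUBSIDIARY_NAME_SIZE])
{
  static const char *const suffixes[] = { "-unix", "-dos", "-mac" };
  size_t base_len = strlen (base);
  int i;

  /* The longest suffix, "-unix", needs 5 bytes plus the final NUL.  */
  if (base_len + 6 > SUBSIDIARY_NAME_SIZE)
    return -1;
  for (i = 0; i < 3; i++)
    {
      memcpy (names[i], base, base_len);
      memcpy (names[i] + base_len, suffixes[i], strlen (suffixes[i]) + 1);
    }
  return 0;
}
```

The `+ 6' mirrors the buffer size computed in the function above, where the longest suffix determines how much room is reserved after the base name.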
7137 | |
7138 | |
7139 DEFUN ("define-coding-system-internal", Fdefine_coding_system_internal, | |
7140 Sdefine_coding_system_internal, coding_arg_max, MANY, 0, | |
7141 doc: /* For internal use only. */) | |
7142 (nargs, args) | |
7143 int nargs; | |
7144 Lisp_Object *args; | |
7145 { | |
7146 Lisp_Object name; | |
7147 Lisp_Object spec_vec; /* [ ATTRS ALIASES EOL_TYPE ] */ | |
7148 Lisp_Object attrs; /* Vector of attributes. */ | |
7149 Lisp_Object eol_type; | |
7150 Lisp_Object aliases; | |
7151 Lisp_Object coding_type, charset_list, safe_charsets; | |
7152 enum coding_category category; | |
7153 Lisp_Object tail, val; | |
7154 int max_charset_id = 0; | |
7155 int i; | |
7156 | |
7157 if (nargs < coding_arg_max) | |
7158 goto short_args; | |
7159 | |
7160 attrs = Fmake_vector (make_number (coding_attr_last_index), Qnil); | |
7161 | |
7162 name = args[coding_arg_name]; | |
7163 CHECK_SYMBOL (name); | |
7164 CODING_ATTR_BASE_NAME (attrs) = name; | |
7165 | |
7166 val = args[coding_arg_mnemonic]; | |
7167 if (! STRINGP (val)) | |
7168 CHECK_CHARACTER (val); | |
7169 CODING_ATTR_MNEMONIC (attrs) = val; | |
7170 | |
7171 coding_type = args[coding_arg_coding_type]; | |
7172 CHECK_SYMBOL (coding_type); | |
7173 CODING_ATTR_TYPE (attrs) = coding_type; | |
7174 | |
7175 charset_list = args[coding_arg_charset_list]; | |
7176 if (SYMBOLP (charset_list)) | |
7177 { | |
7178 if (EQ (charset_list, Qiso_2022)) | |
7179 { | |
7180 if (! EQ (coding_type, Qiso_2022)) | |
7181 error ("Invalid charset-list"); | |
7182 charset_list = Viso_2022_charset_list; | |
7183 } | |
7184 else if (EQ (charset_list, Qemacs_mule)) | |
7185 { | |
7186 if (! EQ (coding_type, Qemacs_mule)) | |
7187 error ("Invalid charset-list"); | |
7188 charset_list = Vemacs_mule_charset_list; | |
7189 } | |
7190 for (tail = charset_list; CONSP (tail); tail = XCDR (tail)) | |
7191 if (max_charset_id < XFASTINT (XCAR (tail))) | |
7192 max_charset_id = XFASTINT (XCAR (tail)); | |
7193 } | |
7194 else | |
7195 { | |
7196 charset_list = Fcopy_sequence (charset_list); | |
7197 for (tail = charset_list; !NILP (tail); tail = Fcdr (tail)) | |
7198 { | |
7199 struct charset *charset; | |
7200 | |
7201 val = Fcar (tail); | |
7202 CHECK_CHARSET_GET_CHARSET (val, charset); | |
7203 if (EQ (coding_type, Qiso_2022) | |
7204 ? CHARSET_ISO_FINAL (charset) < 0 | |
7205 : EQ (coding_type, Qemacs_mule) | |
7206 ? CHARSET_EMACS_MULE_ID (charset) < 0 | |
7207 : 0) | |
7208 error ("Can't handle charset `%s'", | |
7209 XSYMBOL (CHARSET_NAME (charset))->name->data); | |
7210 | |
7211 XCAR (tail) = make_number (charset->id); | |
7212 if (max_charset_id < charset->id) | |
7213 max_charset_id = charset->id; | |
7214 } | |
7215 } | |
7216 CODING_ATTR_CHARSET_LIST (attrs) = charset_list; | |
7217 | |
7218 safe_charsets = Fmake_string (make_number (max_charset_id + 1), | |
7219 make_number (255)); | |
7220 for (tail = charset_list; CONSP (tail); tail = XCDR (tail)) | |
7221 XSTRING (safe_charsets)->data[XFASTINT (XCAR (tail))] = 0; | |
7222 CODING_ATTR_SAFE_CHARSETS (attrs) = safe_charsets; | |
7223 | |
7224 val = args[coding_arg_decode_translation_table]; | |
7225 if (! NILP (val)) | |
7226 CHECK_CHAR_TABLE (val); | |
7227 CODING_ATTR_DECODE_TBL (attrs) = val; | |
7228 | |
7229 val = args[coding_arg_encode_translation_table]; | |
7230 if (! NILP (val)) | |
7231 CHECK_CHAR_TABLE (val); | |
7232 CODING_ATTR_ENCODE_TBL (attrs) = val; | |
7233 | |
7234 val = args[coding_arg_post_read_conversion]; | |
7235 CHECK_SYMBOL (val); | |
7236 CODING_ATTR_POST_READ (attrs) = val; | |
7237 | |
7238 val = args[coding_arg_pre_write_conversion]; | |
7239 CHECK_SYMBOL (val); | |
7240 CODING_ATTR_PRE_WRITE (attrs) = val; | |
7241 | |
7242 val = args[coding_arg_default_char]; | |
7243 if (NILP (val)) | |
7244 CODING_ATTR_DEFAULT_CHAR (attrs) = make_number (' '); | |
7245 else | |
7246 { | |
7247 CHECK_CHARACTER (val); | |
7248 CODING_ATTR_DEFAULT_CHAR (attrs) = val; | |
7249 } | |
7250 | |
7251 val = args[coding_arg_plist]; | |
7252 CHECK_LIST (val); | |
7253 CODING_ATTR_PLIST (attrs) = val; | |
7254 | |
7255 if (EQ (coding_type, Qcharset)) | |
7256 { | |
7257 val = Fmake_vector (make_number (256), Qnil); | |
7258 | |
7259 for (tail = charset_list; CONSP (tail); tail = XCDR (tail)) | |
7260 { | |
7261 struct charset *charset = CHARSET_FROM_ID (XINT (XCAR (tail))); | |
7262 | |
7263 for (i = charset->code_space[0]; i <= charset->code_space[1]; i++) | |
7264 if (NILP (AREF (val, i))) | |
7265 ASET (val, i, XCAR (tail)); | |
7266 } | |
7267 ASET (attrs, coding_attr_charset_valids, val); | |
7268 category = coding_category_charset; | |
7269 } | |
7270 else if (EQ (coding_type, Qccl)) | |
7271 { | |
7272 Lisp_Object valids; | |
7273 | |
7274 if (nargs < coding_arg_ccl_max) | |
7275 goto short_args; | |
7276 | |
7277 val = args[coding_arg_ccl_decoder]; | |
7278 CHECK_CCL_PROGRAM (val); | |
7279 if (VECTORP (val)) | |
7280 val = Fcopy_sequence (val); | |
7281 ASET (attrs, coding_attr_ccl_decoder, val); | |
7282 | |
7283 val = args[coding_arg_ccl_encoder]; | |
7284 CHECK_CCL_PROGRAM (val); | |
7285 if (VECTORP (val)) | |
7286 val = Fcopy_sequence (val); | |
7287 ASET (attrs, coding_attr_ccl_encoder, val); | |
7288 | |
7289 val = args[coding_arg_ccl_valids]; | |
7290 valids = Fmake_string (make_number (256), make_number (0)); | |
7291 for (tail = val; !NILP (tail); tail = Fcdr (tail)) | |
7292 { | |
7293 val = Fcar (tail); | |
7294 if (INTEGERP (val)) | |
7295 ASET (valids, XINT (val), 1); | |
7296 else | |
7297 { | |
7298 int from, to; | |
7299 | |
7300 CHECK_CONS (val); | |
7301 CHECK_NUMBER (XCAR (val)); | |
7302 CHECK_NUMBER (XCDR (val)); | |
7303 from = XINT (XCAR (val)); | |
7304 to = XINT (XCDR (val)); | |
7305 for (i = from; i <= to; i++) | |
7306 ASET (valids, i, 1); | |
7307 } | |
7308 } | |
7309 ASET (attrs, coding_attr_ccl_valids, valids); | |
7310 | |
7311 category = coding_category_ccl; | |
7312 } | |
7313 else if (EQ (coding_type, Qutf_16)) | |
7314 { | |
7315 Lisp_Object bom, endian; | |
7316 | |
7317 if (nargs < coding_arg_utf16_max) | |
7318 goto short_args; | |
7319 | |
7320 bom = args[coding_arg_utf16_bom]; | |
7321 if (! NILP (bom) && ! EQ (bom, Qt)) | |
22874 | 7322 { |
88365 | 7323 CHECK_CONS (bom); |
7324 CHECK_CODING_SYSTEM (XCAR (bom)); | |
7325 CHECK_CODING_SYSTEM (XCDR (bom)); | |
7326 } | |
7327 ASET (attrs, coding_attr_utf_16_bom, bom); | |
7328 | |
7329 endian = args[coding_arg_utf16_endian]; | |
7330 ASET (attrs, coding_attr_utf_16_endian, endian); | |
7331 | |
7332 category = (CONSP (bom) | |
7333 ? coding_category_utf_16_auto | |
7334 : NILP (bom) | |
7335 ? (NILP (endian) | |
7336 ? coding_category_utf_16_be_nosig | |
7337 : coding_category_utf_16_le_nosig) | |
7338 : (NILP (endian) | |
7339 ? coding_category_utf_16_be | |
7340 : coding_category_utf_16_le)); | |
7341 } | |
7342 else if (EQ (coding_type, Qiso_2022)) | |
7343 { | |
7344 Lisp_Object initial, reg_usage, request, flags; | |
7345 struct charset *charset; | |
88430 | 7346 int i, id; |
88365 | 7347 |
7348 if (nargs < coding_arg_iso2022_max) | |
7349 goto short_args; | |
7350 | |
7351 initial = Fcopy_sequence (args[coding_arg_iso2022_initial]); | |
7352 CHECK_VECTOR (initial); | |
7353 for (i = 0; i < 4; i++) | |
7354 { | |
7355 val = Faref (initial, make_number (i)); | |
7356 if (! NILP (val)) | |
7357 { | |
7358 CHECK_CHARSET_GET_ID (val, id); | |
7359 ASET (initial, i, make_number (id)); | |
7360 } | |
7361 else | |
7362 ASET (initial, i, make_number (-1)); | |
7363 } | |
7364 | |
7365 reg_usage = args[coding_arg_iso2022_reg_usage]; | |
7366 CHECK_CONS (reg_usage); | |
7367 CHECK_NATNUM (XCAR (reg_usage)); | |
7368 CHECK_NATNUM (XCDR (reg_usage)); | |
7369 | |
7370 request = Fcopy_sequence (args[coding_arg_iso2022_request]); | |
7371 for (tail = request; ! NILP (tail); tail = Fcdr (tail)) | |
7372 { | |
7373 int id; | |
7374 | |
7375 val = Fcar (tail); | |
7376 CHECK_CONS (val); | |
7377 CHECK_CHARSET_GET_ID (XCAR (val), id); | |
7378 CHECK_NATNUM (XCDR (val)); | |
7379 if (XINT (XCDR (val)) >= 4) | |
7380 error ("Invalid graphic register number: %d", XINT (XCDR (val))); | |
7381 XCAR (val) = make_number (id); | |
7382 } | |
7383 | |
7384 flags = args[coding_arg_iso2022_flags]; | |
7385 CHECK_NATNUM (flags); | |
7386 i = XINT (flags); | |
7387 if (EQ (args[coding_arg_charset_list], Qiso_2022)) | |
7388 flags = make_number (i | CODING_ISO_FLAG_FULL_SUPPORT); | |
7389 | |
7390 ASET (attrs, coding_attr_iso_initial, initial); | |
7391 ASET (attrs, coding_attr_iso_usage, reg_usage); | |
7392 ASET (attrs, coding_attr_iso_request, request); | |
7393 ASET (attrs, coding_attr_iso_flags, flags); | |
7394 setup_iso_safe_charsets (attrs); | |
7395 | |
7396 if (i & CODING_ISO_FLAG_SEVEN_BITS) | |
7397 category = ((i & (CODING_ISO_FLAG_LOCKING_SHIFT | |
7398 | CODING_ISO_FLAG_SINGLE_SHIFT)) | |
7399 ? coding_category_iso_7_else | |
7400 : EQ (args[coding_arg_charset_list], Qiso_2022) | |
7401 ? coding_category_iso_7 | |
7402 : coding_category_iso_7_tight); | |
7403 else | |
7404 { | |
7405 int id = XINT (AREF (initial, 1)); | |
7406 | |
7407 category = (((i & (CODING_ISO_FLAG_LOCKING_SHIFT | |
7408 | CODING_ISO_FLAG_SINGLE_SHIFT)) | |
7409 || EQ (args[coding_arg_charset_list], Qiso_2022) | |
7410 || id < 0) | |
7411 ? coding_category_iso_8_else | |
7412 : (CHARSET_DIMENSION (CHARSET_FROM_ID (id)) == 1) | |
7413 ? coding_category_iso_8_1 | |
7414 : coding_category_iso_8_2); | |
22874 | 7415 } |
88365 | 7416 } |
7417 else if (EQ (coding_type, Qemacs_mule)) | |
7418 { | |
7419 if (EQ (args[coding_arg_charset_list], Qemacs_mule)) | |
7420 ASET (attrs, coding_attr_emacs_mule_full, Qt); | |
7421 | |
7422 category = coding_category_emacs_mule; | |
7423 } | |
7424 else if (EQ (coding_type, Qshift_jis)) | |
7425 { | |
7426 | |
7427 struct charset *charset; | |
7428 | |
7429 if (XINT (Flength (charset_list)) != 3) | |
7430 error ("There should be just three charsets"); | |
7431 | |
7432 charset = CHARSET_FROM_ID (XINT (XCAR (charset_list))); | |
7433 if (CHARSET_DIMENSION (charset) != 1) | |
7434 error ("Dimension of charset %s is not one", | |
7435 XSYMBOL (CHARSET_NAME (charset))->name->data); | |
7436 | |
7437 charset_list = XCDR (charset_list); | |
7438 charset = CHARSET_FROM_ID (XINT (XCAR (charset_list))); | |
7439 if (CHARSET_DIMENSION (charset) != 1) | |
7440 error ("Dimension of charset %s is not one", | |
7441 XSYMBOL (CHARSET_NAME (charset))->name->data); | |
7442 | |
7443 charset_list = XCDR (charset_list); | |
7444 charset = CHARSET_FROM_ID (XINT (XCAR (charset_list))); | |
7445 if (CHARSET_DIMENSION (charset) != 2) | |
7446 error ("Dimension of charset %s is not two", | |
7447 XSYMBOL (CHARSET_NAME (charset))->name->data); | |
7448 | |
7449 category = coding_category_sjis; | |
7450 Vsjis_coding_system = name; | |
7451 } | |
7452 else if (EQ (coding_type, Qbig5)) | |
7453 { | |
7454 struct charset *charset; | |
7455 | |
7456 if (XINT (Flength (charset_list)) != 2) | |
7457 error ("There should be just two charsets"); | |
7458 | |
7459 charset = CHARSET_FROM_ID (XINT (XCAR (charset_list))); | |
7460 if (CHARSET_DIMENSION (charset) != 1) | |
7461 error ("Dimension of charset %s is not one", | |
7462 XSYMBOL (CHARSET_NAME (charset))->name->data); | |
7463 | |
7464 charset_list = XCDR (charset_list); | |
7465 charset = CHARSET_FROM_ID (XINT (XCAR (charset_list))); | |
7466 if (CHARSET_DIMENSION (charset) != 2) | |
7467 error ("Dimension of charset %s is not two", | |
7468 XSYMBOL (CHARSET_NAME (charset))->name->data); | |
7469 | |
7470 category = coding_category_big5; | |
7471 Vbig5_coding_system = name; | |
7472 } | |
7473 else if (EQ (coding_type, Qraw_text)) | |
7474 category = coding_category_raw_text; | |
7475 else if (EQ (coding_type, Qutf_8)) | |
7476 category = coding_category_utf_8; | |
7477 else if (EQ (coding_type, Qundecided)) | |
7478 category = coding_category_undecided; | |
7479 else | |
7480 error ("Invalid coding system type: %s", | |
7481 XSYMBOL (coding_type)->name->data); | |
7482 | |
7483 CODING_ATTR_CATEGORY (attrs) = make_number (category); | |
7484 | |
7485 eol_type = args[coding_arg_eol_type]; | |
7486 if (! NILP (eol_type) | |
7487 && ! EQ (eol_type, Qunix) | |
7488 && ! EQ (eol_type, Qdos) | |
7489 && ! EQ (eol_type, Qmac)) | |
7490 error ("Invalid eol-type"); | |
7491 | |
7492 aliases = Fcons (name, Qnil); | |
7493 | |
7494 if (NILP (eol_type)) | |
7495 { | |
7496 eol_type = make_subsidiaries (name); | |
7497 for (i = 0; i < 3; i++) | |
22874 | 7498 { |
88365 | 7499 Lisp_Object this_spec, this_name, this_aliases, this_eol_type; |
7500 | |
7501 this_name = AREF (eol_type, i); | |
7502 this_aliases = Fcons (this_name, Qnil); | |
7503 this_eol_type = (i == 0 ? Qunix : i == 1 ? Qdos : Qmac); | |
7504 this_spec = Fmake_vector (make_number (3), attrs); | |
7505 ASET (this_spec, 1, this_aliases); | |
7506 ASET (this_spec, 2, this_eol_type); | |
7507 Fputhash (this_name, this_spec, Vcoding_system_hash_table); | |
7508 Vcoding_system_list = Fcons (this_name, Vcoding_system_list); | |
7509 Vcoding_system_alist = Fcons (Fcons (Fsymbol_name (this_name), Qnil), | |
7510 Vcoding_system_alist); | |
22874 | 7511 } |
20718 | 7512 } |
22874 | 7513 |
88365 | 7514 spec_vec = Fmake_vector (make_number (3), attrs); |
7515 ASET (spec_vec, 1, aliases); | |
7516 ASET (spec_vec, 2, eol_type); | |
7517 | |
7518 Fputhash (name, spec_vec, Vcoding_system_hash_table); | |
7519 Vcoding_system_list = Fcons (name, Vcoding_system_list); | |
7520 Vcoding_system_alist = Fcons (Fcons (Fsymbol_name (name), Qnil), | |
7521 Vcoding_system_alist); | |
7522 | |
7523 { | |
7524 int id = coding_categories[category].id; | |
7525 | |
7526 if (id < 0 || EQ (name, CODING_ID_NAME (id))) | |
7527 setup_coding_system (name, &coding_categories[category]); | |
7528 } | |
7529 | |
7530 return Qnil; | |
7531 | |
7532 short_args: | |
7533 return Fsignal (Qwrong_number_of_arguments, | |
7534 Fcons (intern ("define-coding-system-internal"), | |
7535 make_number (nargs))); | |
7536 } | |
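The UTF-16 branch of `define-coding-system-internal` above picks a detection category from two arguments: the BOM specification and the endian flag. The following is a minimal standalone sketch of that nested conditional, not code from coding.c. The enum names and `utf16_category_for` are hypothetical stand-ins, and the Lisp values are reduced to an enum (nil, t, or a cons) and a boolean (endian non-nil).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three shapes the BOM argument can take:
   nil, t, or a cons of two coding systems.  */
enum bom_spec { BOM_NIL, BOM_T, BOM_CONS };

/* Hypothetical stand-ins for the coding_category_utf_16_* values.  */
enum utf16_category
{
  UTF16_AUTO,
  UTF16_BE,
  UTF16_LE,
  UTF16_BE_NOSIG,
  UTF16_LE_NOSIG
};

/* Mirror the nested conditional in the listing: a cons-valued BOM means
   auto-detection, a nil BOM selects a "nosig" category, and t selects a
   category with a signature.  A nil endian argument selects big endian.  */
enum utf16_category
utf16_category_for (enum bom_spec bom, bool endian_non_nil)
{
  assert (bom == BOM_NIL || bom == BOM_T || bom == BOM_CONS);
  if (bom == BOM_CONS)
    return UTF16_AUTO;
  if (bom == BOM_NIL)
    return endian_non_nil ? UTF16_LE_NOSIG : UTF16_BE_NOSIG;
  return endian_non_nil ? UTF16_LE : UTF16_BE;
}
```

Writing the decision as early returns rather than a nested `?:` chain is only a readability choice for the sketch; the mapping is the same as in the listing.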
7537 | |
7538 DEFUN ("define-coding-system-alias", Fdefine_coding_system_alias, | |
7539 Sdefine_coding_system_alias, 2, 2, 0, | |
7540 doc: /* Define ALIAS as an alias for CODING-SYSTEM. */) | |
7541 (alias, coding_system) | |
7542 Lisp_Object alias, coding_system; | |
7543 { | |
7544 Lisp_Object spec, aliases, eol_type; | |
7545 | |
7546 CHECK_SYMBOL (alias); | |
7547 CHECK_CODING_SYSTEM_GET_SPEC (coding_system, spec); | |
7548 aliases = AREF (spec, 1); | |
7549 while (!NILP (XCDR (aliases))) | |
7550 aliases = XCDR (aliases); | |
7551 XCDR (aliases) = Fcons (alias, Qnil); | |
7552 | |
7553 eol_type = AREF (spec, 2); | |
7554 if (VECTORP (eol_type)) | |
7555 { | |
7556 Lisp_Object subsidiaries; | |
7557 int i; | |
7558 | |
7559 subsidiaries = make_subsidiaries (alias); | |
7560 for (i = 0; i < 3; i++) | |
7561 Fdefine_coding_system_alias (AREF (subsidiaries, i), | |
7562 AREF (eol_type, i)); | |
7563 | |
7564 ASET (spec, 2, subsidiaries); | |
7565 } | |
7566 | |
7567 Fputhash (alias, spec, Vcoding_system_hash_table); | |
7568 | |
20718 | 7569 return Qnil; |
7570 } |
7571 |
88365 | 7572 DEFUN ("coding-system-base", Fcoding_system_base, Scoding_system_base, |
7573 1, 1, 0, | |
7574 doc: /* Return the base of CODING-SYSTEM. | |
7575 An alias or a subsidiary coding system is not a base coding system. */) | |
7576 (coding_system) | |
7577 Lisp_Object coding_system; | |
7578 { | |
7579 Lisp_Object spec, attrs; | |
7580 | |
7581 if (NILP (coding_system)) | |
7582 return (Qno_conversion); | |
7583 CHECK_CODING_SYSTEM_GET_SPEC (coding_system, spec); | |
7584 attrs = AREF (spec, 0); | |
7585 return CODING_ATTR_BASE_NAME (attrs); | |
7586 } | |
7587 | |
7588 DEFUN ("coding-system-plist", Fcoding_system_plist, Scoding_system_plist, | |
7589 1, 1, 0, | |
7590 doc: /* Return the property list of CODING-SYSTEM. */) | |
7591 (coding_system) | |
7592 Lisp_Object coding_system; | |
22226 | 7593 { |
88365 | 7594 Lisp_Object spec, attrs; |
7595 | |
7596 if (NILP (coding_system)) | |
7597 coding_system = Qno_conversion; | |
7598 CHECK_CODING_SYSTEM_GET_SPEC (coding_system, spec); | |
7599 attrs = AREF (spec, 0); | |
7600 return CODING_ATTR_PLIST (attrs); | |
7601 } | |
7602 | |
7603 | |
7604 DEFUN ("coding-system-aliases", Fcoding_system_aliases, Scoding_system_aliases, | |
7605 1, 1, 0, | |
7606 doc: /* Return the list of aliases of CODING-SYSTEM. | |
7607 A base coding system is one made by `define-coding-system'. | |
7608 An alias or a subsidiary coding system is not a base coding system. */) | |
7609 (coding_system) | |
7610 Lisp_Object coding_system; | |
7611 { | |
7612 Lisp_Object spec; | |
7613 | |
7614 if (NILP (coding_system)) | |
7615 coding_system = Qno_conversion; | |
7616 CHECK_CODING_SYSTEM_GET_SPEC (coding_system, spec); | |
7617 return AREF (spec, 1); /* Slot 1 holds the aliases; slot 2 is the eol-type. */ | |
7618 } | |
7619 | |
7620 DEFUN ("coding-system-eol-type", Fcoding_system_eol_type, | |
7621 Scoding_system_eol_type, 1, 1, 0, | |
7622 doc: /* Return eol-type of CODING-SYSTEM. | |
7623 An eol-type is an integer 0, 1, or 2, or a vector of coding systems. | |
7624 | |
7625 Integer values 0, 1, and 2 indicate the end-of-line format: LF, CRLF, | |
7626 and CR respectively. | |
7627 | |
7628 A vector value indicates that the end-of-line format should be | |
7629 detected automatically. The Nth element of the vector is the subsidiary | |
7630 coding system whose eol-type is N. */) | |
7631 (coding_system) | |
7632 Lisp_Object coding_system; | |
7633 { | |
7634 Lisp_Object spec, eol_type; | |
7635 int n; | |
7636 | |
7637 if (NILP (coding_system)) | |
7638 coding_system = Qno_conversion; | |
7639 if (! CODING_SYSTEM_P (coding_system)) | |
7640 return Qnil; | |
7641 spec = CODING_SYSTEM_SPEC (coding_system); | |
7642 eol_type = AREF (spec, 2); | |
7643 if (VECTORP (eol_type)) | |
7644 return Fcopy_sequence (eol_type); | |
7645 n = EQ (eol_type, Qunix) ? 0 : EQ (eol_type, Qdos) ? 1 : 2; | |
7646 return make_number (n); | |
22226 | 7647 } |
7648 |
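The eol-type convention used by `coding-system-eol-type` above (0 for LF, 1 for CRLF, 2 for CR, matching `unix`, `dos`, and `mac`) can be illustrated outside Emacs. The following is a small sketch under stated assumptions: plain C strings stand in for the Lisp symbols, and `eol_type_index`, `eol_type_name`, and `eol_terminator` are hypothetical helpers, not functions in coding.c.

```c
#include <assert.h>
#include <string.h>

/* Index N corresponds to eol-type N: 0 = unix, 1 = dos, 2 = mac.  */
static const char *const eol_names[3] = { "unix", "dos", "mac" };
static const char *const eol_bytes[3] = { "\n", "\r\n", "\r" };

/* Return the eol-type index for NAME.  Like the conditional in the
   listing, anything that is neither "unix" nor "dos" maps to 2.  */
int
eol_type_index (const char *name)
{
  assert (name != NULL);
  return strcmp (name, "unix") == 0 ? 0 : strcmp (name, "dos") == 0 ? 1 : 2;
}

/* Return the symbolic name for an eol-type index in the range 0..2.  */
const char *
eol_type_name (int n)
{
  assert (n >= 0 && n < 3);
  return eol_names[n];
}

/* Return the line terminator bytes for an eol-type index in 0..2.  */
const char *
eol_terminator (int n)
{
  assert (n >= 0 && n < 3);
  return eol_bytes[n];
}
```

Note that the fall-through to 2 means the sketch, like the listing, relies on the eol-type having been validated earlier.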
17052 | 7649 #endif /* emacs */ |
7650 | |
7651 | |
22874 | 7652 /*** 9. Post-amble ***/ |
17052 | 7653 |
21514 | 7654 void |
17052 | 7655 init_coding_once () |
7656 { | |
7657 int i; | |
7658 | |
88365 | 7659 for (i = 0; i < coding_category_max; i++) |
7660 { | |
7661 coding_categories[i].id = -1; | |
7662 coding_priorities[i] = i; | |
7663 } | |
17052 | 7664 |
7665 /* ISO 2022-specific initialization. */ | |
7666 for (i = 0; i < 0x20; i++) | |
29005 | 7667 iso_code_class[i] = ISO_control_0; |
17052 | 7668 for (i = 0x21; i < 0x7F; i++) |
7669 iso_code_class[i] = ISO_graphic_plane_0; | |
7670 for (i = 0x80; i < 0xA0; i++) | |
29005 | 7671 iso_code_class[i] = ISO_control_1; |
17052 | 7672 for (i = 0xA1; i < 0xFF; i++) |
7673 iso_code_class[i] = ISO_graphic_plane_1; | |
7674 iso_code_class[0x20] = iso_code_class[0x7F] = ISO_0x20_or_0x7F; | |
7675 iso_code_class[0xA0] = iso_code_class[0xFF] = ISO_0xA0_or_0xFF; | |
7676 iso_code_class[ISO_CODE_CR] = ISO_carriage_return; | |
7677 iso_code_class[ISO_CODE_SO] = ISO_shift_out; | |
7678 iso_code_class[ISO_CODE_SI] = ISO_shift_in; | |
7679 iso_code_class[ISO_CODE_SS2_7] = ISO_single_shift_2_7; | |
7680 iso_code_class[ISO_CODE_ESC] = ISO_escape; | |
7681 iso_code_class[ISO_CODE_SS2] = ISO_single_shift_2; | |
7682 iso_code_class[ISO_CODE_SS3] = ISO_single_shift_3; | |
7683 iso_code_class[ISO_CODE_CSI] = ISO_control_sequence_introducer; | |
7684 | |
26067 | 7685 inhibit_pre_post_conversion = 0; |
88365 | 7686 |
7687 for (i = 0; i < 256; i++) | |
7688 { | |
7689 emacs_mule_bytes[i] = 1; | |
7690 } | |
17119 | 7691 } |
7692 |
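The byte-class table filled in by `init_coding_once` above follows a simple pattern: four range loops, then individual overrides for special bytes. The following is a reduced, self-contained sketch of that pattern, not the real table. The enum is a hypothetical subset of the classes, only the ESC override is reproduced, and the value 0x1B for ESC is the ASCII code, assumed here to be what `ISO_CODE_ESC` names.

```c
#include <assert.h>

/* Hypothetical subset of the byte classes used in the listing.  */
enum iso_class
{
  CLASS_CONTROL_0,
  CLASS_GRAPHIC_PLANE_0,
  CLASS_CONTROL_1,
  CLASS_GRAPHIC_PLANE_1,
  CLASS_0X20_OR_0X7F,
  CLASS_0XA0_OR_0XFF,
  CLASS_ESCAPE
};

static enum iso_class iso_classes[256];

/* Fill the table with the same ranges as the listing.  The loops skip
   0x20, 0x7F, 0xA0, and 0xFF, which are assigned separately below.  */
void
init_iso_classes (void)
{
  int i;

  for (i = 0; i < 0x20; i++)
    iso_classes[i] = CLASS_CONTROL_0;
  for (i = 0x21; i < 0x7F; i++)
    iso_classes[i] = CLASS_GRAPHIC_PLANE_0;
  for (i = 0x80; i < 0xA0; i++)
    iso_classes[i] = CLASS_CONTROL_1;
  for (i = 0xA1; i < 0xFF; i++)
    iso_classes[i] = CLASS_GRAPHIC_PLANE_1;
  iso_classes[0x20] = iso_classes[0x7F] = CLASS_0X20_OR_0X7F;
  iso_classes[0xA0] = iso_classes[0xFF] = CLASS_0XA0_OR_0XFF;
  /* Override applied after the range loops, as in the listing.  */
  iso_classes[0x1B] = CLASS_ESCAPE;
}

/* Look up the class of one byte; call init_iso_classes first.  */
enum iso_class
iso_class_of (unsigned char byte)
{
  return iso_classes[byte];
}
```

The overrides must come after the range loops, since each one replaces an entry that a loop has already set to a generic control class.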
7693 #ifdef emacs |
7694 |
21514 | 7695 void |
17119 | 7696 syms_of_coding () |
7697 { |
88365 | 7698 staticpro (&Vcoding_system_hash_table); |
7699 Vcoding_system_hash_table = Fmakehash (Qeq); | |
7700 | |
7701 staticpro (&Vsjis_coding_system); | |
7702 Vsjis_coding_system = Qnil; | |
7703 | |
7704 staticpro (&Vbig5_coding_system); | |
7705 Vbig5_coding_system = Qnil; | |
7706 | |
7707 staticpro (&Vcode_conversion_work_buf_list); | |
7708 Vcode_conversion_work_buf_list = Qnil; | |
7709 | |
7710 staticpro (&Vcode_conversion_reused_work_buf); | |
7711 Vcode_conversion_reused_work_buf = Qnil; | |
7712 | |
7713 DEFSYM (Qcharset, "charset"); | |
7714 DEFSYM (Qtarget_idx, "target-idx"); | |
7715 DEFSYM (Qcoding_system_history, "coding-system-history"); | |
19750 | 7716 Fset (Qcoding_system_history, Qnil); |
7717 |
18650 | 7718 /* Target FILENAME is the first argument. */ |
17119 | 7719 Fput (Qinsert_file_contents, Qtarget_idx, make_number (0)); |
18650 | 7720 /* Target FILENAME is the third argument. */ |
17119 | 7721 Fput (Qwrite_region, Qtarget_idx, make_number (2)); |
7722 |
88365 | 7723 DEFSYM (Qcall_process, "call-process"); |
18650 | 7724 /* Target PROGRAM is the first argument. */ |
17119 | 7725 Fput (Qcall_process, Qtarget_idx, make_number (0)); |
7726 |
88365 | 7727 DEFSYM (Qcall_process_region, "call-process-region"); |
18650 | 7728 /* Target PROGRAM is the third argument. */ |
17119 | 7729 Fput (Qcall_process_region, Qtarget_idx, make_number (2)); |
7730 |
88365 | 7731 DEFSYM (Qstart_process, "start-process"); |
18650 | 7732 /* Target PROGRAM is the third argument. */ |
17119 | 7733 Fput (Qstart_process, Qtarget_idx, make_number (2)); |
7734 |
88365 | 7735 DEFSYM (Qopen_network_stream, "open-network-stream"); |
18650 | 7736 /* Target SERVICE is the fourth argument. */ |
17119 | 7737 Fput (Qopen_network_stream, Qtarget_idx, make_number (3)); |
7738 |
88365 | 7739 DEFSYM (Qcoding_system, "coding-system"); |
7740 DEFSYM (Qcoding_aliases, "coding-aliases"); | |
7741 | |
7742 DEFSYM (Qeol_type, "eol-type"); | |
7743 DEFSYM (Qunix, "unix"); | |
7744 DEFSYM (Qdos, "dos"); | |
7745 DEFSYM (Qmac, "mac"); | |
7746 | |
7747 DEFSYM (Qbuffer_file_coding_system, "buffer-file-coding-system"); | |
7748 DEFSYM (Qpost_read_conversion, "post-read-conversion"); | |
7749 DEFSYM (Qpre_write_conversion, "pre-write-conversion"); | |
7750 DEFSYM (Qdefault_char, "default-char"); | |
7751 DEFSYM (Qundecided, "undecided"); | |
7752 DEFSYM (Qno_conversion, "no-conversion"); | |
7753 DEFSYM (Qraw_text, "raw-text"); | |
7754 | |
7755 DEFSYM (Qiso_2022, "iso-2022"); | |
7756 | |
7757 DEFSYM (Qutf_8, "utf-8"); | |
7758 | |
7759 DEFSYM (Qutf_16, "utf-16"); | |
7760 DEFSYM (Qutf_16_be, "utf-16-be"); | |
7761 DEFSYM (Qutf_16_be_nosig, "utf-16-be-nosig"); | |
7762 DEFSYM (Qutf_16_le, "utf-16-le"); | |
7763 DEFSYM (Qutf_16_le_nosig, "utf-16-le-nosig"); | |
7764 DEFSYM (Qsignature, "signature"); | |
7765 DEFSYM (Qendian, "endian"); | |
7766 DEFSYM (Qbig, "big"); | |
7767 DEFSYM (Qlittle, "little"); | |
7768 | |
7769 DEFSYM (Qshift_jis, "shift-jis"); | |
7770 DEFSYM (Qbig5, "big5"); | |
7771 | |
7772 DEFSYM (Qcoding_system_p, "coding-system-p"); | |
7773 | |
7774 DEFSYM (Qcoding_system_error, "coding-system-error"); | |
17052 | 7775 Fput (Qcoding_system_error, Qerror_conditions, |
7776 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil))); | |
7777 Fput (Qcoding_system_error, Qerror_message, | |
18650 | 7778 build_string ("Invalid coding system")); |
17052 | 7779 |
30487 | 7780 /* Intern this now in case it isn't already done. |
7781 Setting this variable twice is harmless. |
7782 But don't staticpro it here--that is done in alloc.c. */ |
7783 Qchar_table_extra_slots = intern ("char-table-extra-slots"); |
88365 | 7784 |
7785 DEFSYM (Qtranslation_table, "translation-table"); | |
7786 Fput (Qtranslation_table, Qchar_table_extra_slots, make_number (1)); | |
7787 DEFSYM (Qtranslation_table_id, "translation-table-id"); | |
7788 DEFSYM (Qtranslation_table_for_decode, "translation-table-for-decode"); | |
7789 DEFSYM (Qtranslation_table_for_encode, "translation-table-for-encode"); | |
7790 | |
7791 DEFSYM (Qchar_coding_system, "char-coding-system"); | |
7792 | |
41678 | 7793 Fput (Qchar_coding_system, Qchar_table_extra_slots, make_number (2)); |
20150 | 7794 |
88365 | 7795 DEFSYM (Qvalid_codes, "valid-codes"); |
7796 | |
7797 DEFSYM (Qemacs_mule, "emacs-mule"); | |
7798 | |
7799 Vcoding_category_table | |
7800 = Fmake_vector (make_number (coding_category_max), Qnil); | |
7801 staticpro (&Vcoding_category_table); | |
7802 /* The following are targets of code detection. */ | |
7803 ASET (Vcoding_category_table, coding_category_iso_7, | |
7804 intern ("coding-category-iso-7")); | |
7805 ASET (Vcoding_category_table, coding_category_iso_7_tight, | |
7806 intern ("coding-category-iso-7-tight")); | |
7807 ASET (Vcoding_category_table, coding_category_iso_8_1, | |
7808 intern ("coding-category-iso-8-1")); | |
7809 ASET (Vcoding_category_table, coding_category_iso_8_2, | |
7810 intern ("coding-category-iso-8-2")); | |
7811 ASET (Vcoding_category_table, coding_category_iso_7_else, | |
7812 intern ("coding-category-iso-7-else")); | |
7813 ASET (Vcoding_category_table, coding_category_iso_8_else, | |
7814 intern ("coding-category-iso-8-else")); | |
7815 ASET (Vcoding_category_table, coding_category_utf_8, | |
7816 intern ("coding-category-utf-8")); | |
7817 ASET (Vcoding_category_table, coding_category_utf_16_be, | |
7818 intern ("coding-category-utf-16-be")); | |
7819 ASET (Vcoding_category_table, coding_category_utf_16_le, | |
7820 intern ("coding-category-utf-16-le")); | |
7821 ASET (Vcoding_category_table, coding_category_utf_16_be_nosig, | |
7822 intern ("coding-category-utf-16-be-nosig")); | |
7823 ASET (Vcoding_category_table, coding_category_utf_16_le_nosig, | |
7824 intern ("coding-category-utf-16-le-nosig")); | |
7825 ASET (Vcoding_category_table, coding_category_charset, | |
7826 intern ("coding-category-charset")); | |
7827 ASET (Vcoding_category_table, coding_category_sjis, | |
7828 intern ("coding-category-sjis")); | |
7829 ASET (Vcoding_category_table, coding_category_big5, | |
7830 intern ("coding-category-big5")); | |
7831 ASET (Vcoding_category_table, coding_category_ccl, | |
7832 intern ("coding-category-ccl")); | |
7833 ASET (Vcoding_category_table, coding_category_emacs_mule, | |
7834 intern ("coding-category-emacs-mule")); | |
7835 /* The following are NOT targets of code detection. */ | |
7836 ASET (Vcoding_category_table, coding_category_raw_text, | |
7837 intern ("coding-category-raw-text")); | |
7838 ASET (Vcoding_category_table, coding_category_undecided, | |
7839 intern ("coding-category-undecided")); | |
7840 | |
7841 { | |
7842 Lisp_Object args[coding_arg_max]; | |
7843 Lisp_Object plist[14]; | |
7844 int i; | |
7845 | |
7846 for (i = 0; i < coding_arg_max; i++) | |
7847 args[i] = Qnil; | |
7848 | |
7849 plist[0] = intern (":name"); | |
7850 plist[1] = args[coding_arg_name] = Qno_conversion; | |
7851 plist[2] = intern (":mnemonic"); | |
7852 plist[3] = args[coding_arg_mnemonic] = make_number ('='); | |
7853 plist[4] = intern (":coding-type"); | |
7854 plist[5] = args[coding_arg_coding_type] = Qraw_text; | |
7855 plist[6] = intern (":ascii-compatible-p"); | |
7856 plist[7] = args[coding_arg_ascii_compatible_p] = Qt; | |
7857 plist[8] = intern (":default-char"); | |
7858 plist[9] = args[coding_arg_default_char] = make_number (0); | |
7859 plist[10] = intern (":docstring"); | |
7860 plist[11] = build_string ("Do no conversion.\n\ | |
7861 \n\ | |
7862 When you visit a file with this coding, the file is read into a\n\ | |
7863 unibyte buffer as is, thus each byte of a file is treated as a\n\ | |
7864 character."); | |
7865 plist[12] = intern (":eol-type"); | |
7866 plist[13] = args[coding_arg_eol_type] = Qunix; | |
7867 args[coding_arg_plist] = Flist (14, plist); | |
7868 Fdefine_coding_system_internal (coding_arg_max, args); | |
7869 } | |
7870 | |
7871 setup_coding_system (Qno_conversion, &keyboard_coding); | |
7872 setup_coding_system (Qno_conversion, &terminal_coding); | |
7873 setup_coding_system (Qno_conversion, &safe_terminal_coding); | |
20718 | 7874 |
17052 | 7875 defsubr (&Scoding_system_p); |
7876 defsubr (&Sread_coding_system); | |
7877 defsubr (&Sread_non_nil_coding_system); | |
7878 defsubr (&Scheck_coding_system); | |
7879 defsubr (&Sdetect_coding_region); | |
20718 | 7880 defsubr (&Sdetect_coding_string); |
30487 | 7881 defsubr (&Sfind_coding_systems_region_internal); |
88365 | 7882 defsubr (&Scheck_coding_systems_region); |
17052 | 7883 defsubr (&Sdecode_coding_region); |
7884 defsubr (&Sencode_coding_region); | |
7885 defsubr (&Sdecode_coding_string); | |
7886 defsubr (&Sencode_coding_string); | |
7887 defsubr (&Sdecode_sjis_char); | |
7888 defsubr (&Sencode_sjis_char); | |
7889 defsubr (&Sdecode_big5_char); | |
7890 defsubr (&Sencode_big5_char); | |
18002 | 7891 defsubr (&Sset_terminal_coding_system_internal); |
19280 | 7892 defsubr (&Sset_safe_terminal_coding_system_internal); |
17052 | 7893 defsubr (&Sterminal_coding_system); |
18002 | 7894 defsubr (&Sset_keyboard_coding_system_internal); |
17052 | 7895 defsubr (&Skeyboard_coding_system); |
18536 | 7896 defsubr (&Sfind_operation_coding_system); |
88365 | 7897 defsubr (&Sset_coding_system_priority); |
7898 defsubr (&Sdefine_coding_system_internal); | |
7899 defsubr (&Sdefine_coding_system_alias); | |
7900 defsubr (&Scoding_system_base); | |
7901 defsubr (&Scoding_system_plist); | |
7902 defsubr (&Scoding_system_aliases); | |
7903 defsubr (&Scoding_system_eol_type); | |
7904 defsubr (&Scoding_system_priority_list); | |
17052 | 7905 |
20105 | 7906 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list, |
40713 | 7907 doc: /* List of coding systems. |
7908 | |
7909 Do not alter the value of this variable manually. This variable should be | |
88365 | 7910 updated by the functions `define-coding-system' and |
40713 | 7911 `define-coding-system-alias'. */); |
20105 | 7912 Vcoding_system_list = Qnil; |
7913 | |
7914 DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist, | |
40713 | 7915 doc: /* Alist of coding system names. |
7916 Each element is a one-element list containing a coding system name. | |
7917 This variable is passed to `completing-read' as the TABLE argument. | |
7918 | |
7919 Do not alter the value of this variable manually. This variable should be | |
7920 updated by the functions `make-coding-system' and | |
7921 `define-coding-system-alias'. */); | |
20105 | 7922 Vcoding_system_alist = Qnil; |
7923 | |
17052 | 7924 DEFVAR_LISP ("coding-category-list", &Vcoding_category_list, |
40713 | 7925 doc: /* List of coding-categories (symbols) ordered by priority. |
7926 | |
7927 When detecting a coding system, Emacs tries the code detection algorithms | |
7928 associated with each coding-category one by one in this order. When | |
7929 one algorithm agrees with the byte sequence of the source text, the coding | |
7930 system bound to the corresponding coding-category is selected. */); | |
17052 | 7931 { |
7932 int i; | |
7933 | |
7934 Vcoding_category_list = Qnil; | |
88365 | 7935 for (i = coding_category_max - 1; i >= 0; i--) |
17052 | 7936 Vcoding_category_list |
20718 | 7937 = Fcons (XVECTOR (Vcoding_category_table)->contents[i], |
7938 Vcoding_category_list); | |
17052 | 7939 } |
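The loop above walks the category table from the last index down to 0 and conses each entry onto the front of the list, so the finished list is in ascending index order. A small standalone sketch of that idiom, assuming a hand-rolled cons cell rather than Emacs's `Fcons` (the `sample_` names are hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cons cell; Emacs uses Lisp_Object and Fcons instead.  */
struct sample_cons
{
  const char *car;
  struct sample_cons *cdr;
};

/* Illustrative table contents, not the real category table.  */
const char *const sample_categories[] = {
  "category-a", "category-b", "category-c",
};

/* Prepend CAR to LIST and return the new head.  */
struct sample_cons *
sample_cons_new (const char *car, struct sample_cons *list)
{
  struct sample_cons *cell = malloc (sizeof *cell);

  assert (cell != NULL);
  cell->car = car;
  cell->cdr = list;
  return cell;
}

/* Build a list whose order matches TABLE by prepending from the end.
   Cells are never freed here; this is only a sketch.  */
struct sample_cons *
sample_list_from_table (const char *const *table, int n)
{
  struct sample_cons *list = NULL;
  int i;

  for (i = n - 1; i >= 0; i--)
    list = sample_cons_new (table[i], list);
  return list;
}
```

Prepending in reverse avoids keeping a tail pointer: each step is O(1) and the head of the result is always `table[0]`.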
7940 | |
7941 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read, | |
40713 | 7942 doc: /* Specify the coding system for read operations. |
7943 It is useful to bind this variable with `let', but do not set it globally. | |
7944 If the value is a coding system, it is used for decoding on read operations. | |
7945 If not, an appropriate element from one of the coding system alists is used. | |
7946 There are three such tables: `file-coding-system-alist', | |
7947 `process-coding-system-alist', and `network-coding-system-alist'. */); | |
17052 | 7948 Vcoding_system_for_read = Qnil; |
7949 | |
7950 DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write, | |
40713 | 7951 doc: /* Specify the coding system for write operations. |
7952 Programs bind this variable with `let', but you should not set it globally. | |
7953 If the value is a coding system, it is used for encoding of output, | |
7954 when writing it to a file and when sending it to a file or subprocess. | |
7955 | |
7956 If this does not specify a coding system, an appropriate element | |
7957 from one of the coding system alists is used. | |
7958 There are three such tables: `file-coding-system-alist', | |
7959 `process-coding-system-alist', and `network-coding-system-alist'. | |
7960 For output to files, if the above procedure does not specify a coding system, | |
7961 the value of `buffer-file-coding-system' is used. */); | |
17052 | 7962 Vcoding_system_for_write = Qnil; |
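The docstring above describes a fallback order for file output: the bound value of `coding-system-for-write`, then a matching alist element, then `buffer-file-coding-system`. The sketch below restates that order in standalone C, assuming coding systems are plain strings and NULL stands for "not specified"; the function name and parameters are hypothetical, not an Emacs API.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the coding system for writing a file, in the order the
   docstring gives.  NULL means "does not specify a coding system".  */
const char *
sample_choose_write_coding (const char *for_write,
			    const char *alist_match,
			    const char *buffer_file_coding)
{
  /* Assumption of this sketch: the last fallback is always set.  */
  assert (buffer_file_coding != NULL);

  if (for_write != NULL)
    return for_write;		/* explicit binding wins */
  if (alist_match != NULL)
    return alist_match;		/* element found in one of the alists */
  return buffer_file_coding;	/* last resort for file output */
}
```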
7963 | |
7964 DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used, | |
88365 | 7965 doc: /* |
7966 Coding system used in the latest file or process I/O. */); | |
17052 | 7967 Vlast_coding_system_used = Qnil; |
7968 | |
18650 | 7969 DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion, |
88365 | 7970 doc: /* |
7971 *Non-nil means always inhibit code conversion of end-of-line format. | |
40713 | 7972 See info node `Coding Systems' and info node `Text and Binary' concerning |
7973 such conversion. */); | |
18650 | 7974 inhibit_eol_conversion = 0; |
7975 | |
21574 | 7976 DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system, |
88365 | 7977 doc: /* |
7978 Non-nil means process buffer inherits coding system of process output. | |
40713 | 7979 Bind it to t if the process output is to be treated as if it were a file |
7980 read from some filesystem. */); | |
21574 | 7981 inherit_process_coding_system = 0; |
7982 | |
18180 | 7983 DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist, |
88365 | 7984 doc: /* |
7985 Alist to decide a coding system to use for a file I/O operation. | |
40713 | 7986 The format is ((PATTERN . VAL) ...), |
7987 where PATTERN is a regular expression matching a file name, and | |
7988 VAL is a coding system, a cons of coding systems, or a function symbol. | |
7989 If VAL is a coding system, it is used for both decoding and encoding | |
7990 the file contents. | |
7991 If VAL is a cons of coding systems, the car part is used for decoding, | |
7992 and the cdr part is used for encoding. | |
7993 If VAL is a function symbol, the function must return a coding system | |
41678 | 7994 or a cons of coding systems which are used as above. The function gets |
7995 the arguments with which `find-operation-coding-system' was called. | |
40713 | 7996 |
7997 See also the function `find-operation-coding-system' | |
7998 and the variable `auto-coding-alist'. */); | |
18180 | 7999 Vfile_coding_system_alist = Qnil; |
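The `((PATTERN . VAL) ...)` format documented above pairs a regular expression with a decoding and an encoding choice. The sketch below models such a table in standalone C. It assumes POSIX extended regexps from `<regex.h>` (Emacs uses its own regexp syntax), assumes the first matching entry wins, and uses hypothetical `sample_` names and illustrative rule contents throughout.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* One (PATTERN . (DECODE . ENCODE)) entry; a single coding system
   would simply repeat the same name in both fields.  */
struct sample_coding_rule
{
  const char *pattern;
  const char *decode;
  const char *encode;
};

/* Illustrative rules only; not the defaults of any Emacs version.  */
const struct sample_coding_rule sample_rules[] = {
  { "\\.txt$", "utf-8", "utf-8" },
  { "\\.log$", "iso-8859-1", "utf-8" },
};

/* Return the first rule whose pattern matches TARGET, or NULL.
   Patterns that fail to compile are skipped.  */
const struct sample_coding_rule *
sample_find_rule (const struct sample_coding_rule *rules, size_t n,
		  const char *target)
{
  size_t i;

  assert (target != NULL);
  for (i = 0; i < n; i++)
    {
      regex_t re;
      int matched;

      if (regcomp (&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
	continue;
      matched = regexec (&re, target, 0, NULL, 0) == 0;
      regfree (&re);
      if (matched)
	return &rules[i];
    }
  return NULL;
}
```

A caller would read `decode` when reading the file and `encode` when writing it, which corresponds to the car and cdr parts described in the docstring.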
8000 | |
8001 DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist, | |
88365 | 8002 doc: /* |
8003 Alist to decide a coding system to use for a process I/O operation. | |
40713 | 8004 The format is ((PATTERN . VAL) ...), |
8005 where PATTERN is a regular expression matching a program name, and | |
8006 VAL is a coding system, a cons of coding systems, or a function symbol. | |
8007 If VAL is a coding system, it is used for both decoding what is received | |
8008 from the program and encoding what is sent to the program. | |
8009 If VAL is a cons of coding systems, the car part is used for decoding, | |
8010 and the cdr part is used for encoding. | |
8011 If VAL is a function symbol, the function must return a coding system | |
8012 or a cons of coding systems which are used as above. | |
8013 | |
8014 See also the function `find-operation-coding-system'. */); | |
18180 | 8015 Vprocess_coding_system_alist = Qnil; |
8016 | |
8017 DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist, | |
88365 | 8018 doc: /* |
8019 Alist to decide a coding system to use for a network I/O operation. | |
40713 | 8020 The format is ((PATTERN . VAL) ...), |
8021 where PATTERN is a regular expression matching a network service name | |
8022 or is a port number to connect to, and | |
8023 VAL is a coding system, a cons of coding systems, or a function symbol. | |
8024 If VAL is a coding system, it is used for both decoding what is received | |
8025 from the network stream and encoding what is sent to the network stream. | |
8026 If VAL is a cons of coding systems, the car part is used for decoding, | |
8027 and the cdr part is used for encoding. | |
8028 If VAL is a function symbol, the function must return a coding system | |
8029 or a cons of coding systems which are used as above. | |
8030 | |
8031 See also the function `find-operation-coding-system'. */); | |
18180 | 8032 Vnetwork_coding_system_alist = Qnil; |
17052 | 8033 |
26088 | 8034 DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system, |
41026 | 8035 doc: /* Coding system to use with system messages. |
8036 Also used for decoding keyboard input on the X Window System. */); | |
26088 | 8037 Vlocale_coding_system = Qnil; |
8038 | |
29182 | 8039 /* The eol mnemonics are reset in startup.el system-dependently. */ |
24200 | 8040 DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix, |
88365 | 8041 doc: /* |
8042 *String displayed in mode line for UNIX-like (LF) end-of-line format. */); | |
24200 | 8043 eol_mnemonic_unix = build_string (":"); |
8044 | |
8045 DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos, | |
88365 | 8046 doc: /* |
8047 *String displayed in mode line for DOS-like (CRLF) end-of-line format. */); | |
24200 | 8048 eol_mnemonic_dos = build_string ("\\"); |
8049 | |
8050 DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac, | |
88365 | 8051 doc: /* |
8052 *String displayed in mode line for MAC-like (CR) end-of-line format. */); | |
24200 | 8053 eol_mnemonic_mac = build_string ("/"); |
8054 | |
8055 DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided, | |
88365 | 8056 doc: /* |
8057 *String displayed in mode line when end-of-line format is not yet determined. */); | |
24200 | 8058 eol_mnemonic_undecided = build_string (":"); |
17052 | 8059 |
22119 | 8060 DEFVAR_LISP ("enable-character-translation", &Venable_character_translation, |
88365 | 8061 doc: /* |
8062 *Non-nil enables character translation while encoding and decoding. */); | |
22119 | 8063 Venable_character_translation = Qt; |
8064 | |
22186 | 8065 DEFVAR_LISP ("standard-translation-table-for-decode", |
40713 | 8066 &Vstandard_translation_table_for_decode, |
8067 doc: /* Table for translating characters while decoding. */); | |
22186 | 8068 Vstandard_translation_table_for_decode = Qnil; |
8069 | |
8070 DEFVAR_LISP ("standard-translation-table-for-encode", | |
40713 | 8071 &Vstandard_translation_table_for_encode, |
8072 doc: /* Table for translating characters while encoding. */); | |
22186 | 8073 Vstandard_translation_table_for_encode = Qnil; |
17052 | 8074 |
88365 | 8075 DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_table, |
40713 | 8076 doc: /* Alist of charsets vs revision numbers. |
8077 While encoding, if a charset (car part of an element) is found, | |
88365 | 8078 designate it with the escape sequence identifying revision (cdr part |
8079 of the element). */); | |
8080 Vcharset_revision_table = Qnil; | |
18180 | 8081 |
8082 DEFVAR_LISP ("default-process-coding-system", | |
8083 &Vdefault_process_coding_system, | |
40713 | 8084 doc: /* Cons of coding systems used for process I/O by default. |
8085 The car part is used for decoding process output, | |
8086 the cdr part is used for encoding text to be sent to a process. */); | |
18180 | 8087 Vdefault_process_coding_system = Qnil; |
19280 | 8088 |
19365 | 8089 DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table, |
88365 | 8090 doc: /* |
8091 Table of extra Latin codes in the range 128..159 (inclusive). | |
40713 | 8092 This is a vector of length 256. |
8093 If the Nth element is non-nil, the existence of code N in a file | |
8094 \(or output of a subprocess) doesn't prevent it from being detected as | |
8095 a coding system of an ISO 2022 variant which has the flag | |
8096 `accept-latin-extra-code' set to t (e.g. iso-latin-1) when reading a file | |
8097 or reading output of a subprocess. | |
8098 Only the 128th through 159th elements have a meaning. */); | |
19365 | 8099 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil); |
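The docstring above says the table has 256 slots but only elements 128 through 159 carry meaning. The standalone sketch below shows one way a lookup could honor that, assuming a plain `bool` array in place of the Lisp vector; the `sample_` names and the chosen table entries are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table: codes 0x91 and 0x92 are marked acceptable.
   Slot 0x41 is also set, but it lies outside 128..159 and must be
   ignored by the lookup.  */
const bool sample_extra_table[256] = {
  [0x41] = true,
  [0x91] = true,
  [0x92] = true,
};

/* Return true if CODE is in 128..159 and marked in TABLE.  */
bool
sample_latin_extra_code_p (const bool *table, int code)
{
  assert (table != NULL);
  if (code < 128 || code > 159)
    return false;		/* elements outside 128..159 have no meaning */
  return table[code];
}
```

Allocating all 256 slots lets the byte value index the table directly, at the cost of unused entries outside the meaningful range.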
20718 | 8100 |
8101 DEFVAR_LISP ("select-safe-coding-system-function", | |
8102 &Vselect_safe_coding_system_function, | |
88365 | 8103 doc: /* |
8104 Function to call to select safe coding system for encoding a text. | |
40713
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
8105 |
8106 If set, this function is called to force a user to select a proper |
8107 coding system which can encode the text when the default coding |
8108 system used in an operation can't encode the text. |
8109 |
8110 The default value is `select-safe-coding-system' (which see). */); |
20718
c600dea3b06b
Vselect_safe_coding_system_function): New variable.
Kenichi Handa <handa@m17n.org>
parents:
20708
diff
changeset
|
8111 Vselect_safe_coding_system_function = Qnil; |
8112 |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
8113 DEFVAR_LISP ("char-coding-system-table", &Vchar_coding_system_table, |
88365 | 8114 doc: /* |
8115 Char-table containing safe coding systems of each character. | |
40713
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
8116 Each element excludes generic coding systems that can |
8117 encode any character. They are in the first extra slot. */); |
30487
6165da9c89c6
(Qsafe_charsets): This variable deleted.
Kenichi Handa <handa@m17n.org>
parents:
30384
diff
changeset
|
8118 Vchar_coding_system_table = Fmake_char_table (Qchar_coding_system, Qnil); |
8119 |
30292
14a9937df1f5
(syms_of_coding): Fix typo in spelling of variable
Gerd Moellmann <gerd@gnu.org>
parents:
30263
diff
changeset
|
8120 DEFVAR_BOOL ("inhibit-iso-escape-detection", |
30204
35aec8514228
(inhibit_iso_escape_detection): New variable.
Kenichi Handa <handa@m17n.org>
parents:
29985
diff
changeset
|
8121 &inhibit_iso_escape_detection, |
88365 | 8122 doc: /* |
8123 If non-nil, Emacs ignores ISO2022 escape sequences during code detection. | |
40713
42351475da08
Change doc-string comments to `new style' [w/`doc:' keyword].
Pavel Janík <Pavel@Janik.cz>
parents:
40656
diff
changeset
|
8124 |
8125 By default, on reading a file, Emacs tries to detect how the text is |
8126 encoded. This code detection is sensitive to escape sequences. If |
8127 the sequence is valid as ISO2022, the code is determined as one of |
8128 the ISO2022 encodings, and the file is decoded by the corresponding |
8129 coding system (e.g. `iso-2022-7bit'). |
8130 |
8131 However, there may be cases where you want to read escape sequences in |
8132 a file as is. In such a case, you can set this variable to non-nil. |
8133 Then, as the code detection ignores any escape sequences, no file is |
8134 detected as encoded in some ISO2022 encoding. The result is that all |
8135 escape sequences become visible in a buffer. |
8136 |
8137 The default value is nil, and it is strongly recommended not to change |
8138 it. That is because many Emacs Lisp source files that contain |
8139 non-ASCII characters are encoded by the coding system `iso-2022-7bit' |
8140 in Emacs's distribution, and they won't be decoded correctly on |
8141 reading if you suppress escape sequence detection. |
8142 |
8143 The other way to read escape sequences in a file without decoding is |
8144 to explicitly specify some coding system that doesn't use ISO2022 |
8145 escape sequences (e.g. `latin-1') on reading by \\[universal-coding-system-argument]. */); |
30204
35aec8514228
(inhibit_iso_escape_detection): New variable.
Kenichi Handa <handa@m17n.org>
parents:
29985
diff
changeset
|
8146 inhibit_iso_escape_detection = 0; |
17052 | 8147 } |
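The effect described in the `inhibit-iso-escape-detection' doc string can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the detector in this file: `toy_detect', `toy_coding' and `looks_like_iso_designation' are hypothetical names, and the rule "ESC followed by `$' or `(' starts an ISO 2022 designation" is a deliberate simplification of the real validity checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for this toy model only.  */
enum toy_coding { TOY_UNDECIDED, TOY_ISO_2022 };

/* Simplifying assumption: ESC (0x1B) followed by '$' or '(' is
   treated as the start of an ISO 2022 designation sequence.  */
static bool
looks_like_iso_designation (const unsigned char *buf, size_t len, size_t i)
{
  return buf[i] == 0x1B
         && i + 1 < len
         && (buf[i + 1] == '$' || buf[i + 1] == '(');
}

/* Toy detector.  When INHIBIT is true, escape sequences are ignored,
   so no input is ever classified as ISO 2022.  */
static enum toy_coding
toy_detect (const unsigned char *buf, size_t len, bool inhibit)
{
  assert (buf != NULL || len == 0);
  if (inhibit)
    return TOY_UNDECIDED;
  for (size_t i = 0; i < len; i++)
    if (looks_like_iso_designation (buf, len, i))
      return TOY_ISO_2022;
  return TOY_UNDECIDED;
}
```

With the flag off, a buffer containing `ESC $ B' is classified as ISO 2022 in this model; with the flag on, the same bytes stay undecided and would show up literally in the buffer.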
8148 | |
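The `latin-extra-code-table' doc string says that only elements 128 through 159 carry meaning. A minimal standalone sketch of that lookup rule follows, assuming a plain array of flags in place of the Lisp vector; `byte_blocks_detection' and `latin_extra_codes_acceptable' are hypothetical helpers, and treating an unlisted code in that range as blocking detection is this model's reading of the doc string, not a copy of the real logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the 256-element vector: one flag per byte.  */
enum { LATIN_EXTRA_TABLE_SIZE = 256 };

/* In this model, a byte blocks detection only if it lies in 128..159
   and its table entry is false.  Entries outside that range are
   never consulted.  */
static bool
byte_blocks_detection (const bool table[LATIN_EXTRA_TABLE_SIZE],
                       unsigned char c)
{
  assert (table != NULL);
  if (c < 128 || c > 159)
    return false;
  return !table[c];
}

/* Return true if no byte in BUF[0..LEN) blocks detection.  */
static bool
latin_extra_codes_acceptable (const bool table[LATIN_EXTRA_TABLE_SIZE],
                              const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (byte_blocks_detection (table, buf[i]))
      return false;
  return true;
}
```

Setting an entry outside 128..159 has no effect here, which mirrors the statement that those elements have no meaning.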
26088
b7aa6ac26872
Add support for large files, 64-bit Solaris, system locale codings.
Paul Eggert <eggert@twinsun.com>
parents:
26067
diff
changeset
|
8149 char * |
8150 emacs_strerror (error_number) |
8151 int error_number; |
8152 { |
8153 char *str; |
8154 |
26526
b7438760079b
* callproc.c (strerror): Remove decl.
Paul Eggert <eggert@twinsun.com>
parents:
26240
diff
changeset
|
8155 synchronize_system_messages_locale (); |
26088
b7aa6ac26872
Add support for large files, 64-bit Solaris, system locale codings.
Paul Eggert <eggert@twinsun.com>
parents:
26067
diff
changeset
|
8156 str = strerror (error_number); |
8157 |
8158 if (! NILP (Vlocale_coding_system)) |
8159 { |
8160 Lisp_Object dec = code_convert_string_norecord (build_string (str), |
8161 Vlocale_coding_system, |
8162 0); |
8163 str = (char *) XSTRING (dec)->data; |
8164 } |
8165 |
8166 return str; |
8167 } |
8168 |
17052 | 8169 #endif /* emacs */ |
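The pattern used by emacs_strerror above (fetch the system message, then decode it when a locale coding system is set) can be sketched outside Emacs as follows. This is an illustration under assumptions, not the code in this file: `strerror_decoded', `decode_fn' and `identity_decode' are hypothetical, and the sketch copies the result into a caller-supplied buffer because the string returned by strerror may be overwritten by a later call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder: converts SRC from the locale encoding into
   DST (capacity DSTSIZE).  Returns 0 on success, -1 on failure.  */
typedef int (*decode_fn) (const char *src, char *dst, size_t dstsize);

/* Copy the message for ERROR_NUMBER into BUF.  A non-NULL DECODE
   models a non-nil locale coding system: the raw message is passed
   through it.  Returns BUF, or NULL if the result does not fit.  */
static char *
strerror_decoded (int error_number, decode_fn decode,
                  char *buf, size_t bufsize)
{
  const char *msg = strerror (error_number);
  size_t len;

  assert (buf != NULL && bufsize > 0);
  if (decode != NULL)
    return decode (msg, buf, bufsize) == 0 ? buf : NULL;

  len = strlen (msg);
  if (len >= bufsize)
    return NULL;
  memcpy (buf, msg, len + 1);
  return buf;
}

/* Trivial decoder for illustration: copies the bytes unchanged.  */
static int
identity_decode (const char *src, char *dst, size_t dstsize)
{
  size_t len = strlen (src);
  if (len >= dstsize)
    return -1;
  memcpy (dst, src, len + 1);
  return 0;
}
```

A caller would pass a real conversion routine in place of `identity_decode'; passing NULL corresponds to the case where no locale coding system is configured and the message is used as is.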