comparison lispref/nonascii.texi @ 29265:69f20c18d6eb
*** empty log message ***
author | Kenichi Handa <handa@m17n.org> |
---|---|
date | Sun, 28 May 2000 23:54:22 +0000 |
parents | ac620ff5fd5d |
children | d831c2ad9313 |
comparison
29264:1e4a5ffdacf5 | 29265:69f20c18d6eb |
---|---|
155 @defvar nonascii-insert-offset | 155 @defvar nonascii-insert-offset |
156 This variable specifies the amount to add to a non-@sc{ascii} character | 156 This variable specifies the amount to add to a non-@sc{ascii} character |
157 when converting unibyte text to multibyte. It also applies when | 157 when converting unibyte text to multibyte. It also applies when |
158 @code{self-insert-command} inserts a character in the unibyte | 158 @code{self-insert-command} inserts a character in the unibyte |
159 non-@sc{ascii} range, 128 through 255. However, the function | 159 non-@sc{ascii} range, 128 through 255. However, the function |
160 @code{insert-char} does not perform this conversion. | 160 @code{insert} and @code{insert-char} do not perform this conversion. |
161 | 161 |
162 The right value to use to select character set @var{cs} is @code{(- | 162 The right value to use to select character set @var{cs} is @code{(- |
163 (make-char @var{cs}) 128)}. If the value of | 163 (make-char @var{cs}) 128)}. If the value of |
164 @code{nonascii-insert-offset} is zero, then conversion actually uses the | 164 @code{nonascii-insert-offset} is zero, then conversion actually uses the |
165 value for the Latin 1 character set, rather than zero. | 165 value for the Latin 1 character set, rather than zero. |
167 | 167 |
168 @defvar nonascii-translation-table | 168 @defvar nonascii-translation-table |
169 This variable provides a more general alternative to | 169 This variable provides a more general alternative to |
170 @code{nonascii-insert-offset}. You can use it to specify independently | 170 @code{nonascii-insert-offset}. You can use it to specify independently |
171 how to translate each code in the range of 128 through 255 into a | 171 how to translate each code in the range of 128 through 255 into a |
172 multibyte character. The value should be a vector, or @code{nil}. | 172 multibyte character. The value should be a char-table, or @code{nil}. |
173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}. | 173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}. |
174 @end defvar | 174 @end defvar |
175 | 175 |
176 @defun string-make-unibyte string | 176 @defun string-make-unibyte string |
177 This function converts the text of @var{string} to unibyte | 177 This function converts the text of @var{string} to unibyte |
198 | 198 |
199 This function leaves the buffer contents unchanged when viewed as a | 199 This function leaves the buffer contents unchanged when viewed as a |
200 sequence of bytes. As a consequence, it can change the contents viewed | 200 sequence of bytes. As a consequence, it can change the contents viewed |
201 as characters; a sequence of two bytes which is treated as one character | 201 as characters; a sequence of two bytes which is treated as one character |
202 in multibyte representation will count as two characters in unibyte | 202 in multibyte representation will count as two characters in unibyte |
203 representation. | 203 representation. Character codes 128 through 159 are an exception. They |
| | 204 are represented by one byte in a unibyte buffer, but when the buffer is |
| | 205 set to multibyte, they are converted to two-byte sequences, and vice |
| | 206 versa. |
204 | 207 |
205 This function sets @code{enable-multibyte-characters} to record which | 208 This function sets @code{enable-multibyte-characters} to record which |
206 representation is in use. It also adjusts various data in the buffer | 209 representation is in use. It also adjusts various data in the buffer |
207 (including overlays, text properties and markers) so that they cover the | 210 (including overlays, text properties and markers) so that they cover the |
208 same text as they did before. | 211 same text as they did before. |
242 really proper in multibyte text, but they can occur if you do explicit | 245 really proper in multibyte text, but they can occur if you do explicit |
243 encoding and decoding (@pxref{Explicit Encoding}). Some other character | 246 encoding and decoding (@pxref{Explicit Encoding}). Some other character |
244 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes | 247 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes |
245 0 through 127 are truly legitimate in both representations. | 248 0 through 127 are truly legitimate in both representations. |
246 | 249 |
247 @defun char-valid-p charcode | 250 @defun char-valid-p charcode &optional genericp |
248 This returns @code{t} if @var{charcode} is valid for either one of the two | 251 This returns @code{t} if @var{charcode} is valid for either one of the two |
249 text representations. | 252 text representations. |
250 | 253 |
251 @example | 254 @example |
252 (char-valid-p 65) | 255 (char-valid-p 65) |
254 (char-valid-p 256) | 257 (char-valid-p 256) |
255 @result{} nil | 258 @result{} nil |
256 (char-valid-p 2248) | 259 (char-valid-p 2248) |
257 @result{} t | 260 @result{} t |
258 @end example | 261 @end example |
| | 262 |
| | 263 If the optional argument @var{genericp} is non-nil, this function |
| | 264 returns @code{t} if @var{charcode} is a generic character |
| | 265 (@pxref{Generic Character}). |
259 @end defun | 266 @end defun |
260 | 267 |
261 @node Character Sets | 268 @node Character Sets |
262 @section Character Sets | 269 @section Character Sets |
263 @cindex character sets | 270 @cindex character sets |
297 @defun charset-plist charset | 304 @defun charset-plist charset |
298 @tindex charset-plist | 305 @tindex charset-plist |
299 This function returns the charset property list of the character set | 306 This function returns the charset property list of the character set |
300 @var{charset}. Although @var{charset} is a symbol, this is not the same | 307 @var{charset}. Although @var{charset} is a symbol, this is not the same |
301 as the property list of that symbol. Charset properties are used for | 308 as the property list of that symbol. Charset properties are used for |
302 special purposes within Emacs; for example, @code{x-charset-registry} | 309 special purposes within Emacs; for example, |
303 helps determine which fonts to use (@pxref{Font Selection}). | 310 @code{preferred-coding-system} helps determine which coding system to |
| | 311 use to encode characters in a charset. |
304 @end defun | 312 @end defun |
305 | 313 |
306 @node Chars and Bytes | 314 @node Chars and Bytes |
307 @section Characters and Bytes | 315 @section Characters and Bytes |
308 @cindex bytes and characters | 316 @cindex bytes and characters |
310 @cindex introduction sequence | 318 @cindex introduction sequence |
311 @cindex dimension (of character set) | 319 @cindex dimension (of character set) |
312 In multibyte representation, each character occupies one or more | 320 In multibyte representation, each character occupies one or more |
313 bytes. Each character set has an @dfn{introduction sequence}, which is | 321 bytes. Each character set has an @dfn{introduction sequence}, which is |
314 normally one or two bytes long. (Exception: the @sc{ascii} character | 322 normally one or two bytes long. (Exception: the @sc{ascii} character |
315 set has a zero-length introduction sequence.) The introduction sequence | 323 set and the @sc{eight-bit-graphic} character set have a zero-length |
316 is the beginning of the byte sequence for any character in the character | 324 introduction sequence.) The introduction sequence is the beginning of |
317 set. The rest of the character's bytes distinguish it from the other | 325 the byte sequence for any character in the character set. The rest of |
318 characters in the same character set. Depending on the character set, | 326 the character's bytes distinguish it from the other characters in the |
319 there are either one or two distinguishing bytes; the number of such | 327 same character set. Depending on the character set, there are either |
320 bytes is called the @dfn{dimension} of the character set. | 328 one or two distinguishing bytes; the number of such bytes is called the |
| | 329 @dfn{dimension} of the character set. |
321 | 330 |
322 @defun charset-dimension charset | 331 @defun charset-dimension charset |
323 This function returns the dimension of @var{charset}; at present, the | 332 This function returns the dimension of @var{charset}; at present, the |
324 dimension is always 1 or 2. | 333 dimension is always 1 or 2. |
325 @end defun | 334 @end defun |
355 @example | 364 @example |
356 (split-char 2248) | 365 (split-char 2248) |
357 @result{} (latin-iso8859-1 72) | 366 @result{} (latin-iso8859-1 72) |
358 (split-char 65) | 367 (split-char 65) |
359 @result{} (ascii 65) | 368 @result{} (ascii 65) |
360 @end example | 369 (split-char 128) |
361 | 370 @result{} (eight-bit-control 128) |
362 Unibyte non-@sc{ascii} characters are considered as part of | |
363 the @code{ascii} character set: | |
364 | |
365 @example | |
366 (split-char 192) | |
367 @result{} (ascii 192) | |
368 @end example | 371 @end example |
369 @end defun | 372 @end defun |
370 | 373 |
371 @defun make-char charset &rest byte-values | 374 @defun make-char charset &rest byte-values |
372 This function returns the character in character set @var{charset} | 375 This function returns the character in character set @var{charset} |
393 @example | 396 @example |
394 (make-char 'latin-iso8859-1) | 397 (make-char 'latin-iso8859-1) |
395 @result{} 2176 | 398 @result{} 2176 |
396 (char-valid-p 2176) | 399 (char-valid-p 2176) |
397 @result{} nil | 400 @result{} nil |
| | 401 (char-valid-p 2176 t) |
| | 402 @result{} t |
398 (split-char 2176) | 403 (split-char 2176) |
399 @result{} (latin-iso8859-1 0) | 404 @result{} (latin-iso8859-1 0) |
400 @end example | 405 @end example |
| | 406 |
| | 407 The character sets @sc{ascii}, @sc{eight-bit-control}, and |
| | 408 @sc{eight-bit-graphic} don't have corresponding generic characters. |
401 | 409 |
402 @node Scanning Charsets | 410 @node Scanning Charsets |
403 @section Scanning for Character Sets | 411 @section Scanning for Character Sets |
404 | 412 |
405 Sometimes it is useful to find out which character sets appear in a | 413 Sometimes it is useful to find out which character sets appear in a |
597 However, @code{buffer-file-coding-system} does not affect sending text | 605 However, @code{buffer-file-coding-system} does not affect sending text |
598 to a subprocess. | 606 to a subprocess. |
599 @end defvar | 607 @end defvar |
600 | 608 |
601 @defvar save-buffer-coding-system | 609 @defvar save-buffer-coding-system |
602 This variable specifies the coding system for saving the buffer---but it | 610 This variable specifies the coding system for saving the buffer (by |
603 is not used for @code{write-region}. | 611 overriding @code{buffer-file-coding-system}). Note that it is not used |
| | 612 for @code{write-region}. |
604 | 613 |
605 When a command to save the buffer starts out to use | 614 When a command to save the buffer starts out to use |
606 @code{save-buffer-coding-system}, and that coding system cannot handle | 615 @code{buffer-file-coding-system} (or @code{save-buffer-coding-system}), |
| | 616 and that coding system cannot handle |
607 the actual text in the buffer, the command asks the user to choose | 617 the actual text in the buffer, the command asks the user to choose |
608 another coding system. After that happens, the command also updates | 618 another coding system. After that happens, the command also updates |
609 @code{save-buffer-coding-system} to represent the coding system that the | 619 @code{buffer-file-coding-system} to represent the coding system that the |
610 user specified. | 620 user specified. |
611 @end defvar | 621 @end defvar |
612 | 622 |
613 @defvar last-coding-system-used | 623 @defvar last-coding-system-used |
614 I/O operations for files and subprocesses set this variable to the | 624 I/O operations for files and subprocesses set this variable to the |
630 Here are the Lisp facilities for working with coding systems: | 640 Here are the Lisp facilities for working with coding systems: |
631 | 641 |
632 @defun coding-system-list &optional base-only | 642 @defun coding-system-list &optional base-only |
633 This function returns a list of all coding system names (symbols). If | 643 This function returns a list of all coding system names (symbols). If |
634 @var{base-only} is non-@code{nil}, the value includes only the | 644 @var{base-only} is non-@code{nil}, the value includes only the |
635 base coding systems. Otherwise, it includes variant coding systems as well. | 645 base coding systems. Otherwise, it includes alias and variant coding |
| | 646 systems as well. |
636 @end defun | 647 @end defun |
637 | 648 |
638 @defun coding-system-p object | 649 @defun coding-system-p object |
639 This function returns @code{t} if @var{object} is a coding system | 650 This function returns @code{t} if @var{object} is a coding system |
640 name. | 651 name. |