comparison lispref/nonascii.texi @ 29265:69f20c18d6eb

*** empty log message ***
author Kenichi Handa <handa@m17n.org>
date Sun, 28 May 2000 23:54:22 +0000
parents ac620ff5fd5d
children d831c2ad9313
29264:1e4a5ffdacf5 29265:69f20c18d6eb
155 @defvar nonascii-insert-offset 155 @defvar nonascii-insert-offset
156 This variable specifies the amount to add to a non-@sc{ascii} character 156 This variable specifies the amount to add to a non-@sc{ascii} character
157 when converting unibyte text to multibyte. It also applies when 157 when converting unibyte text to multibyte. It also applies when
158 @code{self-insert-command} inserts a character in the unibyte 158 @code{self-insert-command} inserts a character in the unibyte
159 non-@sc{ascii} range, 128 through 255. However, the function 159 non-@sc{ascii} range, 128 through 255. However, the function
160 @code{insert-char} does not perform this conversion. 160 @code{insert} and @code{insert-char} do not perform this conversion.
161 161
162 The right value to use to select character set @var{cs} is @code{(- 162 The right value to use to select character set @var{cs} is @code{(-
163 (make-char @var{cs}) 128)}. If the value of 163 (make-char @var{cs}) 128)}. If the value of
164 @code{nonascii-insert-offset} is zero, then conversion actually uses the 164 @code{nonascii-insert-offset} is zero, then conversion actually uses the
165 value for the Latin 1 character set, rather than zero. 165 value for the Latin 1 character set, rather than zero.
167 167
168 @defvar nonascii-translation-table 168 @defvar nonascii-translation-table
169 This variable provides a more general alternative to 169 This variable provides a more general alternative to
170 @code{nonascii-insert-offset}. You can use it to specify independently 170 @code{nonascii-insert-offset}. You can use it to specify independently
171 how to translate each code in the range of 128 through 255 into a 171 how to translate each code in the range of 128 through 255 into a
172 multibyte character. The value should be a vector, or @code{nil}. 172 multibyte character. The value should be a char-table, or @code{nil}.
173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}. 173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
174 @end defvar 174 @end defvar
175 175
176 @defun string-make-unibyte string 176 @defun string-make-unibyte string
177 This function converts the text of @var{string} to unibyte 177 This function converts the text of @var{string} to unibyte
198 198
199 This function leaves the buffer contents unchanged when viewed as a 199 This function leaves the buffer contents unchanged when viewed as a
200 sequence of bytes. As a consequence, it can change the contents viewed 200 sequence of bytes. As a consequence, it can change the contents viewed
201 as characters; a sequence of two bytes which is treated as one character 201 as characters; a sequence of two bytes which is treated as one character
202 in multibyte representation will count as two characters in unibyte 202 in multibyte representation will count as two characters in unibyte
203 representation. 203 representation. Character codes 128 through 159 are an exception. They
204 are represented by one byte in a unibyte buffer, but when the buffer is
205 set to multibyte, they are converted to two-byte sequences, and vice
206 versa.
204 207
205 This function sets @code{enable-multibyte-characters} to record which 208 This function sets @code{enable-multibyte-characters} to record which
206 representation is in use. It also adjusts various data in the buffer 209 representation is in use. It also adjusts various data in the buffer
207 (including overlays, text properties and markers) so that they cover the 210 (including overlays, text properties and markers) so that they cover the
208 same text as they did before. 211 same text as they did before.
242 really proper in multibyte text, but they can occur if you do explicit 245 really proper in multibyte text, but they can occur if you do explicit
243 encoding and decoding (@pxref{Explicit Encoding}). Some other character 246 encoding and decoding (@pxref{Explicit Encoding}). Some other character
244 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes 247 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes
245 0 through 127 are truly legitimate in both representations. 248 0 through 127 are truly legitimate in both representations.
246 249
247 @defun char-valid-p charcode 250 @defun char-valid-p charcode &optional genericp
248 This returns @code{t} if @var{charcode} is valid for either one of the two 251 This returns @code{t} if @var{charcode} is valid for either one of the two
249 text representations. 252 text representations.
250 253
251 @example 254 @example
252 (char-valid-p 65) 255 (char-valid-p 65)
254 (char-valid-p 256) 257 (char-valid-p 256)
255 @result{} nil 258 @result{} nil
256 (char-valid-p 2248) 259 (char-valid-p 2248)
257 @result{} t 260 @result{} t
258 @end example 261 @end example
262
263 If the optional argument @var{genericp} is non-nil, this function
264 returns @code{t} if @var{charcode} is a generic character
265 (@pxref{Generic Character}).
259 @end defun 266 @end defun
260 267
261 @node Character Sets 268 @node Character Sets
262 @section Character Sets 269 @section Character Sets
263 @cindex character sets 270 @cindex character sets
297 @defun charset-plist charset 304 @defun charset-plist charset
298 @tindex charset-plist 305 @tindex charset-plist
299 This function returns the charset property list of the character set 306 This function returns the charset property list of the character set
300 @var{charset}. Although @var{charset} is a symbol, this is not the same 307 @var{charset}. Although @var{charset} is a symbol, this is not the same
301 as the property list of that symbol. Charset properties are used for 308 as the property list of that symbol. Charset properties are used for
302 special purposes within Emacs; for example, @code{x-charset-registry} 309 special purposes within Emacs; for example,
303 helps determine which fonts to use (@pxref{Font Selection}). 310 @code{preferred-coding-system} helps determine which coding system to
311 use to encode characters in a charset.
304 @end defun 312 @end defun
305 313
306 @node Chars and Bytes 314 @node Chars and Bytes
307 @section Characters and Bytes 315 @section Characters and Bytes
308 @cindex bytes and characters 316 @cindex bytes and characters
310 @cindex introduction sequence 318 @cindex introduction sequence
311 @cindex dimension (of character set) 319 @cindex dimension (of character set)
312 In multibyte representation, each character occupies one or more 320 In multibyte representation, each character occupies one or more
313 bytes. Each character set has an @dfn{introduction sequence}, which is 321 bytes. Each character set has an @dfn{introduction sequence}, which is
314 normally one or two bytes long. (Exception: the @sc{ascii} character 322 normally one or two bytes long. (Exception: the @sc{ascii} character
315 set has a zero-length introduction sequence.) The introduction sequence 323 set and the @sc{eight-bit-graphic} character set have a zero-length
316 is the beginning of the byte sequence for any character in the character 324 introduction sequence.) The introduction sequence is the beginning of
317 set. The rest of the character's bytes distinguish it from the other 325 the byte sequence for any character in the character set. The rest of
318 characters in the same character set. Depending on the character set, 326 the character's bytes distinguish it from the other characters in the
319 there are either one or two distinguishing bytes; the number of such 327 same character set. Depending on the character set, there are either
320 bytes is called the @dfn{dimension} of the character set. 328 one or two distinguishing bytes; the number of such bytes is called the
329 @dfn{dimension} of the character set.
321 330
322 @defun charset-dimension charset 331 @defun charset-dimension charset
323 This function returns the dimension of @var{charset}; at present, the 332 This function returns the dimension of @var{charset}; at present, the
324 dimension is always 1 or 2. 333 dimension is always 1 or 2.
325 @end defun 334 @end defun
355 @example 364 @example
356 (split-char 2248) 365 (split-char 2248)
357 @result{} (latin-iso8859-1 72) 366 @result{} (latin-iso8859-1 72)
358 (split-char 65) 367 (split-char 65)
359 @result{} (ascii 65) 368 @result{} (ascii 65)
360 @end example 369 (split-char 128)
361 370 @result{} (eight-bit-control 128)
362 Unibyte non-@sc{ascii} characters are considered as part of
363 the @code{ascii} character set:
364
365 @example
366 (split-char 192)
367 @result{} (ascii 192)
368 @end example 371 @end example
369 @end defun 372 @end defun
370 373
371 @defun make-char charset &rest byte-values 374 @defun make-char charset &rest byte-values
372 This function returns the character in character set @var{charset} 375 This function returns the character in character set @var{charset}
393 @example 396 @example
394 (make-char 'latin-iso8859-1) 397 (make-char 'latin-iso8859-1)
395 @result{} 2176 398 @result{} 2176
396 (char-valid-p 2176) 399 (char-valid-p 2176)
397 @result{} nil 400 @result{} nil
401 (char-valid-p 2176 t)
402 @result{} t
398 (split-char 2176) 403 (split-char 2176)
399 @result{} (latin-iso8859-1 0) 404 @result{} (latin-iso8859-1 0)
400 @end example 405 @end example
406
407 The character sets @sc{ascii}, @sc{eight-bit-control}, and
408 @sc{eight-bit-graphic} don't have corresponding generic characters.
401 409
402 @node Scanning Charsets 410 @node Scanning Charsets
403 @section Scanning for Character Sets 411 @section Scanning for Character Sets
404 412
405 Sometimes it is useful to find out which character sets appear in a 413 Sometimes it is useful to find out which character sets appear in a
597 However, @code{buffer-file-coding-system} does not affect sending text 605 However, @code{buffer-file-coding-system} does not affect sending text
598 to a subprocess. 606 to a subprocess.
599 @end defvar 607 @end defvar
600 608
601 @defvar save-buffer-coding-system 609 @defvar save-buffer-coding-system
602 This variable specifies the coding system for saving the buffer---but it 610 This variable specifies the coding system for saving the buffer (by
603 is not used for @code{write-region}. 611 overriding @code{buffer-file-coding-system}). Note that it is not used
612 for @code{write-region}.
604 613
605 When a command to save the buffer starts out to use 614 When a command to save the buffer starts out to use
606 @code{save-buffer-coding-system}, and that coding system cannot handle 615 @code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),
616 and that coding system cannot handle
607 the actual text in the buffer, the command asks the user to choose 617 the actual text in the buffer, the command asks the user to choose
608 another coding system. After that happens, the command also updates 618 another coding system. After that happens, the command also updates
609 @code{save-buffer-coding-system} to represent the coding system that the 619 @code{buffer-file-coding-system} to represent the coding system that the
610 user specified. 620 user specified.
611 @end defvar 621 @end defvar
612 622
613 @defvar last-coding-system-used 623 @defvar last-coding-system-used
614 I/O operations for files and subprocesses set this variable to the 624 I/O operations for files and subprocesses set this variable to the
630 Here are the Lisp facilities for working with coding systems: 640 Here are the Lisp facilities for working with coding systems:
631 641
632 @defun coding-system-list &optional base-only 642 @defun coding-system-list &optional base-only
633 This function returns a list of all coding system names (symbols). If 643 This function returns a list of all coding system names (symbols). If
634 @var{base-only} is non-@code{nil}, the value includes only the 644 @var{base-only} is non-@code{nil}, the value includes only the
635 base coding systems. Otherwise, it includes variant coding systems as well. 645 base coding systems. Otherwise, it includes alias and variant coding
646 systems as well.
636 @end defun 647 @end defun
637 648
638 @defun coding-system-p object 649 @defun coding-system-p object
639 This function returns @code{t} if @var{object} is a coding system 650 This function returns @code{t} if @var{object} is a coding system
640 name. 651 name.