comparison lispref/nonascii.texi @ 32523:4881cd839f12

*** empty log message ***
author Gerd Moellmann <gerd@gnu.org>
date Mon, 16 Oct 2000 11:43:01 +0000
parents d831c2ad9313
children 67b6bdbd95c6
32522:fedf4de246a1 32523:4881cd839f12
58 @dfn{leading codes}. The second and subsequent bytes of a multibyte 58 @dfn{leading codes}. The second and subsequent bytes of a multibyte
59 character are always in the range 160 through 255 (octal 0240 through 59 character are always in the range 160 through 255 (octal 0240 through
60 0377); these values are @dfn{trailing codes}. 60 0377); these values are @dfn{trailing codes}.
61 61
62 Some sequences of bytes are not valid in multibyte text: for example, 62 Some sequences of bytes are not valid in multibyte text: for example,
63 a single isolated byte in the range 128 through 159 is not allowed. 63 a single isolated byte in the range 128 through 159 is not allowed. But
64 But character codes 128 through 159 can appear in multibyte text, 64 character codes 128 through 159 can appear in multibyte text,
65 represented as two-byte sequences. None of the character codes 128 65 represented as two-byte sequences. All the character codes 128 through
66 through 255 normally appear in ordinary multibyte text, but they do 66 255 are possible (though slightly abnormal) in multibyte text; they
67 appear in multibyte buffers and strings when you do explicit encoding 67 appear in multibyte buffers and strings when you do explicit encoding
68 and decoding (@pxref{Explicit Encoding}). 68 and decoding (@pxref{Explicit Encoding}).
69 69
70 In a buffer, the buffer-local value of the variable 70 In a buffer, the buffer-local value of the variable
71 @code{enable-multibyte-characters} specifies the representation used. 71 @code{enable-multibyte-characters} specifies the representation used.
133 alternative, to convert the buffer contents to multibyte, is not 133 alternative, to convert the buffer contents to multibyte, is not
134 acceptable because the buffer's representation is a choice made by the 134 acceptable because the buffer's representation is a choice made by the
135 user that cannot be overridden automatically. 135 user that cannot be overridden automatically.
136 136
137 Converting unibyte text to multibyte text leaves @sc{ascii} characters 137 Converting unibyte text to multibyte text leaves @sc{ascii} characters
138 unchanged, and likewise 128 through 159. It converts the non-@sc{ascii} 138 unchanged, and likewise character codes 128 through 159. It converts
139 codes 160 through 255 by adding the value @code{nonascii-insert-offset} 139 the non-@sc{ascii} codes 160 through 255 by adding the value
140 to each character code. By setting this variable, you specify which 140 @code{nonascii-insert-offset} to each character code. By setting this
141 character set the unibyte characters correspond to (@pxref{Character 141 variable, you specify which character set the unibyte characters
142 Sets}). For example, if @code{nonascii-insert-offset} is 2048, which is 142 correspond to (@pxref{Character Sets}). For example, if
143 @code{(- (make-char 'latin-iso8859-1) 128)}, then the unibyte 143 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char
144 non-@sc{ascii} characters correspond to Latin 1. If it is 2688, which 144 'latin-iso8859-1) 128)}, then the unibyte non-@sc{ascii} characters
145 is @code{(- (make-char 'greek-iso8859-7) 128)}, then they correspond to 145 correspond to Latin 1. If it is 2688, which is @code{(- (make-char
146 Greek letters. 146 'greek-iso8859-7) 128)}, then they correspond to Greek letters.
147 147
148 Converting multibyte text to unibyte is simpler: it discards all but 148 Converting multibyte text to unibyte is simpler: it discards all but
149 the low 8 bits of each character code. If @code{nonascii-insert-offset} 149 the low 8 bits of each character code. If @code{nonascii-insert-offset}
150 has a reasonable value, corresponding to the beginning of some character 150 has a reasonable value, corresponding to the beginning of some character
151 set, this conversion is the inverse of the other: converting unibyte 151 set, this conversion is the inverse of the other: converting unibyte
240 The unibyte and multibyte text representations use different character 240 The unibyte and multibyte text representations use different character
241 codes. The valid character codes for unibyte representation range from 241 codes. The valid character codes for unibyte representation range from
242 0 to 255---the values that can fit in one byte. The valid character 242 0 to 255---the values that can fit in one byte. The valid character
243 codes for multibyte representation range from 0 to 524287, but not all 243 codes for multibyte representation range from 0 to 524287, but not all
244 values in that range are valid. The values 128 through 255 are not 244 values in that range are valid. The values 128 through 255 are not
245 really proper in multibyte text, but they can occur if you do explicit 245 entirely proper in multibyte text, but they can occur if you do explicit
246 encoding and decoding (@pxref{Explicit Encoding}). Some other character 246 encoding and decoding (@pxref{Explicit Encoding}). Some other character
247 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes 247 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes
248 0 through 127 are truly legitimate in both representations. 248 0 through 127 are completely legitimate in both representations.
249 249
250 @defun char-valid-p charcode &optional genericp 250 @defun char-valid-p charcode &optional genericp
251 This returns @code{t} if @var{charcode} is valid for either one of the two 251 This returns @code{t} if @var{charcode} is valid for either one of the two
252 text representations. 252 text representations.
253 253
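
The conversion rules described in the hunk at lines 137 through 151 can be modelled outside Emacs. The following is a minimal Python sketch of that arithmetic only, not Emacs code: the function names and the constant name are hypothetical, and it covers just the Latin 1 offset of 2048 quoted in the text.

```python
# Hypothetical model of the arithmetic described in the manual text.
# Nothing here calls Emacs; the names are invented for illustration.

# Value the text gives for Latin 1: (- (make-char 'latin-iso8859-1) 128)
LATIN_1_OFFSET = 2048


def unibyte_to_multibyte(code, offset=LATIN_1_OFFSET):
    """Map a unibyte character code (0..255) to a multibyte character code."""
    if not 0 <= code <= 255:
        raise ValueError("unibyte character codes range from 0 to 255")
    if code < 160:
        # ASCII (0..127) and codes 128..159 are left unchanged.
        return code
    # Non-ASCII codes 160..255 are shifted by the offset.
    return code + offset


def multibyte_to_unibyte(code):
    """Keep only the low 8 bits of a character code, as the text describes."""
    return code & 0xFF


if __name__ == "__main__":
    for unibyte in (65, 130, 233):
        multibyte = unibyte_to_multibyte(unibyte)
        print(unibyte, multibyte, multibyte_to_unibyte(multibyte))
```

With the offset 2048, code 233 maps to 2281 and back to 233. The round trip works here because 2048 is a multiple of 256, so keeping the low 8 bits removes exactly the offset that was added.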