comparison lispref/nonascii.texi @ 32523:4881cd839f12
*** empty log message ***
author | Gerd Moellmann <gerd@gnu.org> |
---|---|
date | Mon, 16 Oct 2000 11:43:01 +0000 |
parents | d831c2ad9313 |
children | 67b6bdbd95c6 |
comparison
32522:fedf4de246a1 | 32523:4881cd839f12 |
---|---|
58 @dfn{leading codes}. The second and subsequent bytes of a multibyte | 58 @dfn{leading codes}. The second and subsequent bytes of a multibyte |
59 character are always in the range 160 through 255 (octal 0240 through | 59 character are always in the range 160 through 255 (octal 0240 through |
60 0377); these values are @dfn{trailing codes}. | 60 0377); these values are @dfn{trailing codes}. |
61 | 61 |
62 Some sequences of bytes are not valid in multibyte text: for example, | 62 Some sequences of bytes are not valid in multibyte text: for example, |
63 a single isolated byte in the range 128 through 159 is not allowed. | 63 a single isolated byte in the range 128 through 159 is not allowed. But |
64 But character codes 128 through 159 can appear in multibyte text, | 64 character codes 128 through 159 can appear in multibyte text, |
65 represented as two-byte sequences. None of the character codes 128 | 65 represented as two-byte sequences. All the character codes 128 through |
66 through 255 normally appear in ordinary multibyte text, but they do | 66 255 are possible (though slightly abnormal) in multibyte text; they |
67 appear in multibyte buffers and strings when you do explicit encoding | 67 appear in multibyte buffers and strings when you do explicit encoding |
68 and decoding (@pxref{Explicit Encoding}). | 68 and decoding (@pxref{Explicit Encoding}). |
69 | 69 |
70 In a buffer, the buffer-local value of the variable | 70 In a buffer, the buffer-local value of the variable |
71 @code{enable-multibyte-characters} specifies the representation used. | 71 @code{enable-multibyte-characters} specifies the representation used. |
133 alternative, to convert the buffer contents to multibyte, is not | 133 alternative, to convert the buffer contents to multibyte, is not |
134 acceptable because the buffer's representation is a choice made by the | 134 acceptable because the buffer's representation is a choice made by the |
135 user that cannot be overridden automatically. | 135 user that cannot be overridden automatically. |
136 | 136 |
137 Converting unibyte text to multibyte text leaves @sc{ascii} characters | 137 Converting unibyte text to multibyte text leaves @sc{ascii} characters |
138 unchanged, and likewise 128 through 159. It converts the non-@sc{ascii} | 138 unchanged, and likewise character codes 128 through 159. It converts |
139 codes 160 through 255 by adding the value @code{nonascii-insert-offset} | 139 the non-@sc{ascii} codes 160 through 255 by adding the value |
140 to each character code. By setting this variable, you specify which | 140 @code{nonascii-insert-offset} to each character code. By setting this |
141 character set the unibyte characters correspond to (@pxref{Character | 141 variable, you specify which character set the unibyte characters |
142 Sets}). For example, if @code{nonascii-insert-offset} is 2048, which is | 142 correspond to (@pxref{Character Sets}). For example, if |
143 @code{(- (make-char 'latin-iso8859-1) 128)}, then the unibyte | 143 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char |
144 non-@sc{ascii} characters correspond to Latin 1. If it is 2688, which | 144 'latin-iso8859-1) 128)}, then the unibyte non-@sc{ascii} characters |
145 is @code{(- (make-char 'greek-iso8859-7) 128)}, then they correspond to | 145 correspond to Latin 1. If it is 2688, which is @code{(- (make-char |
146 Greek letters. | 146 'greek-iso8859-7) 128)}, then they correspond to Greek letters. |
147 | 147 |
148 Converting multibyte text to unibyte is simpler: it discards all but | 148 Converting multibyte text to unibyte is simpler: it discards all but |
149 the low 8 bits of each character code. If @code{nonascii-insert-offset} | 149 the low 8 bits of each character code. If @code{nonascii-insert-offset} |
150 has a reasonable value, corresponding to the beginning of some character | 150 has a reasonable value, corresponding to the beginning of some character |
151 set, this conversion is the inverse of the other: converting unibyte | 151 set, this conversion is the inverse of the other: converting unibyte |
240 The unibyte and multibyte text representations use different character | 240 The unibyte and multibyte text representations use different character |
241 codes. The valid character codes for unibyte representation range from | 241 codes. The valid character codes for unibyte representation range from |
242 0 to 255---the values that can fit in one byte. The valid character | 242 0 to 255---the values that can fit in one byte. The valid character |
243 codes for multibyte representation range from 0 to 524287, but not all | 243 codes for multibyte representation range from 0 to 524287, but not all |
244 values in that range are valid. The values 128 through 255 are not | 244 values in that range are valid. The values 128 through 255 are not |
245 really proper in multibyte text, but they can occur if you do explicit | 245 entirely proper in multibyte text, but they can occur if you do explicit |
246 encoding and decoding (@pxref{Explicit Encoding}). Some other character | 246 encoding and decoding (@pxref{Explicit Encoding}). Some other character |
247 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes | 247 codes cannot occur at all in multibyte text. Only the @sc{ascii} codes |
248 0 through 127 are truly legitimate in both representations. | 248 0 through 127 are completely legitimate in both representations. |
249 | 249 |
250 @defun char-valid-p charcode &optional genericp | 250 @defun char-valid-p charcode &optional genericp |
251 This returns @code{t} if @var{charcode} is valid for either one of the two | 251 This returns @code{t} if @var{charcode} is valid for either one of the two |
252 text representations. | 252 text representations. |
253 | 253 |
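
The conversion rules described in the hunk at lines 137 through 149 are plain arithmetic on character codes, so they can be illustrated outside Emacs. The following is a minimal Python sketch, not Emacs's implementation: the function names `unibyte_to_multibyte` and `multibyte_to_unibyte` are hypothetical, and the offset is passed as an ordinary argument rather than read from the variable `nonascii-insert-offset`. The round trip is shown only for the Latin 1 offset 2048 quoted in the text.

```python
def unibyte_to_multibyte(code: int, nonascii_insert_offset: int) -> int:
    """Model the unibyte-to-multibyte rule quoted in the manual text.

    Codes 0 through 127 (ASCII) and 128 through 159 are left unchanged;
    codes 160 through 255 have the offset added.
    """
    if not 0 <= code <= 255:
        raise ValueError("unibyte character codes range from 0 to 255")
    if code < 160:
        return code
    return code + nonascii_insert_offset


def multibyte_to_unibyte(code: int) -> int:
    """Model the reverse rule: keep only the low 8 bits of the code."""
    return code & 0xFF


# Offset for Latin 1 as given in the text: (- (make-char 'latin-iso8859-1) 128)
LATIN_1_OFFSET = 2048

converted = unibyte_to_multibyte(233, LATIN_1_OFFSET)
restored = multibyte_to_unibyte(converted)
```

With the offset 2048, unibyte code 233 becomes 2281, and keeping the low 8 bits of 2281 yields 233 again, which matches the text's statement that the two conversions are inverses for this offset.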