comparison lispref/strings.texi @ 88155:d7ddb3e565de
sync with trunk
author | Henrik Enberg <henrik.enberg@telia.com> |
---|---|
date | Mon, 16 Jan 2006 00:03:54 +0000 |
parents | 23a1cea22d13 |
children |
88154:8ce476d3ba36 | 88155:d7ddb3e565de |
---|---|
1 @c -*-texinfo-*- | 1 @c -*-texinfo-*- |
2 @c This is part of the GNU Emacs Lisp Reference Manual. | 2 @c This is part of the GNU Emacs Lisp Reference Manual. |
3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999 | 3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2002, 2003, |
4 @c Free Software Foundation, Inc. | 4 @c 2004, 2005 Free Software Foundation, Inc. |
5 @c See the file elisp.texi for copying conditions. | 5 @c See the file elisp.texi for copying conditions. |
6 @setfilename ../info/strings | 6 @setfilename ../info/strings |
7 @node Strings and Characters, Lists, Numbers, Top | 7 @node Strings and Characters, Lists, Numbers, Top |
8 @comment node-name, next, previous, up | 8 @comment node-name, next, previous, up |
9 @chapter Strings and Characters | 9 @chapter Strings and Characters |
42 used. Thus, strings really contain integers. | 42 used. Thus, strings really contain integers. |
43 | 43 |
44 The length of a string (like any array) is fixed, and cannot be | 44 The length of a string (like any array) is fixed, and cannot be |
45 altered once the string exists. Strings in Lisp are @emph{not} | 45 altered once the string exists. Strings in Lisp are @emph{not} |
46 terminated by a distinguished character code. (By contrast, strings in | 46 terminated by a distinguished character code. (By contrast, strings in |
47 C are terminated by a character with @sc{ascii} code 0.) | 47 C are terminated by a character with @acronym{ASCII} code 0.) |
48 | 48 |
49 Since strings are arrays, and therefore sequences as well, you can | 49 Since strings are arrays, and therefore sequences as well, you can |
50 operate on them with the general array and sequence functions. | 50 operate on them with the general array and sequence functions. |
51 (@xref{Sequences Arrays Vectors}.) For example, you can access or | 51 (@xref{Sequences Arrays Vectors}.) For example, you can access or |
52 change individual characters in a string using the functions @code{aref} | 52 change individual characters in a string using the functions @code{aref} |
53 and @code{aset} (@pxref{Array Functions}). | 53 and @code{aset} (@pxref{Array Functions}). |
54 | 54 |
55 There are two text representations for non-@sc{ascii} characters in | 55 There are two text representations for non-@acronym{ASCII} characters in |
56 Emacs strings (and in buffers): unibyte and multibyte (@pxref{Text | 56 Emacs strings (and in buffers): unibyte and multibyte (@pxref{Text |
57 Representations}). An @sc{ascii} character always occupies one byte in a | 57 Representations}). An @acronym{ASCII} character always occupies one byte in a |
58 string; in fact, when a string is all @sc{ascii}, there is no real | 58 string; in fact, when a string is all @acronym{ASCII}, there is no real |
59 difference between the unibyte and multibyte representations. | 59 difference between the unibyte and multibyte representations. |
60 For most Lisp programming, you don't need to be concerned with these two | 60 For most Lisp programming, you don't need to be concerned with these two |
61 representations. | 61 representations. |
62 | 62 |
63 Sometimes key sequences are represented as strings. When a string is | 63 Sometimes key sequences are represented as strings. When a string is |
64 a key sequence, string elements in the range 128 to 255 represent meta | 64 a key sequence, string elements in the range 128 to 255 represent meta |
65 characters (which are large integers) rather than character | 65 characters (which are large integers) rather than character |
66 codes in the range 128 to 255. | 66 codes in the range 128 to 255. |
67 | 67 |
68 Strings cannot hold characters that have the hyper, super or alt | 68 Strings cannot hold characters that have the hyper, super or alt |
69 modifiers; they can hold @sc{ascii} control characters, but no other | 69 modifiers; they can hold @acronym{ASCII} control characters, but no other |
70 control characters. They do not distinguish case in @sc{ascii} control | 70 control characters. They do not distinguish case in @acronym{ASCII} control |
71 characters. If you want to store such characters in a sequence, such as | 71 characters. If you want to store such characters in a sequence, such as |
72 a key sequence, you must use a vector instead of a string. | 72 a key sequence, you must use a vector instead of a string. |
73 @xref{Character Type}, for more information about the representation of meta | 73 @xref{Character Type}, for more information about the representation of meta |
74 and other modifiers for keyboard input characters. | 74 and other modifiers for keyboard input characters. |
75 | 75 |
76 Strings are useful for holding regular expressions. You can also | 76 Strings are useful for holding regular expressions. You can also |
77 match regular expressions against strings (@pxref{Regexp Search}). The | 77 match regular expressions against strings with @code{string-match} |
78 functions @code{match-string} (@pxref{Simple Match Data}) and | 78 (@pxref{Regexp Search}). The functions @code{match-string} |
79 @code{replace-match} (@pxref{Replacing Match}) are useful for | 79 (@pxref{Simple Match Data}) and @code{replace-match} (@pxref{Replacing |
80 decomposing and modifying strings based on regular expression matching. | 80 Match}) are useful for decomposing and modifying strings after |
81 matching regular expressions against them. | |
81 | 82 |
82 Like a buffer, a string can contain text properties for the characters | 83 Like a buffer, a string can contain text properties for the characters |
83 in it, as well as the characters themselves. @xref{Text Properties}. | 84 in it, as well as the characters themselves. @xref{Text Properties}. |
84 All the Lisp primitives that copy text from strings to buffers or other | 85 All the Lisp primitives that copy text from strings to buffers or other |
85 strings also copy the properties of the characters being copied. | 86 strings also copy the properties of the characters being copied. |
170 @noindent | 171 @noindent |
171 In this example, the index for @samp{e} is @minus{}3, the index for | 172 In this example, the index for @samp{e} is @minus{}3, the index for |
172 @samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1. | 173 @samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1. |
173 Therefore, @samp{e} and @samp{f} are included, and @samp{g} is excluded. | 174 Therefore, @samp{e} and @samp{f} are included, and @samp{g} is excluded. |
174 | 175 |
175 When @code{nil} is used as an index, it stands for the length of the | 176 When @code{nil} is used for @var{end}, it stands for the length of the |
176 string. Thus, | 177 string. Thus, |
177 | 178 |
178 @example | 179 @example |
179 @group | 180 @group |
180 (substring "abcdefg" -3 nil) | 181 (substring "abcdefg" -3 nil) |
206 @example | 207 @example |
207 (substring [a b (c) "d"] 1 3) | 208 (substring [a b (c) "d"] 1 3) |
208 @result{} [b (c)] | 209 @result{} [b (c)] |
209 @end example | 210 @end example |
210 | 211 |
211 A @code{wrong-type-argument} error is signaled if either @var{start} or | 212 A @code{wrong-type-argument} error is signaled if @var{start} is not |
212 @var{end} is not an integer or @code{nil}. An @code{args-out-of-range} | 213 an integer or if @var{end} is neither an integer nor @code{nil}. An |
213 error is signaled if @var{start} indicates a character following | 214 @code{args-out-of-range} error is signaled if @var{start} indicates a |
214 @var{end}, or if either integer is out of range for @var{string}. | 215 character following @var{end}, or if either integer is out of range |
216 for @var{string}. | |
215 | 217 |
216 Contrast this function with @code{buffer-substring} (@pxref{Buffer | 218 Contrast this function with @code{buffer-substring} (@pxref{Buffer |
217 Contents}), which returns a string containing a portion of the text in | 219 Contents}), which returns a string containing a portion of the text in |
218 the current buffer. The beginning of a string is at index 0, but the | 220 the current buffer. The beginning of a string is at index 0, but the |
219 beginning of a buffer is at index 1. | 221 beginning of a buffer is at index 1. |
222 @end defun | |
223 | |
224 @defun substring-no-properties string &optional start end | |
225 This works like @code{substring} but discards all text properties from | |
226 the value. Also, @var{start} may be omitted or @code{nil}, which is | |
227 equivalent to 0. Thus, @w{@code{(substring-no-properties | |
228 @var{string})}} returns a copy of @var{string}, with all text | |
229 properties removed. | |
220 @end defun | 230 @end defun |
221 | 231 |
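The difference between `substring` and the newly documented `substring-no-properties` can be sketched as follows. This is only an illustration: the variable name `sample` and the `face` property are assumptions chosen for the example, not part of the changeset.

```elisp
;; Illustrative sketch; `sample' is a hypothetical name.
(let ((sample (propertize "hello" 'face 'bold)))
  (list
   ;; `substring' copies the text properties along with the characters.
   (get-text-property 0 'face (substring sample 1 3))
   ;; `substring-no-properties' returns the same characters without them.
   (text-properties-at 0 (substring-no-properties sample 1 3))
   ;; With START and END omitted, the result is a property-free copy.
   (substring-no-properties sample)))
;; => (bold nil "hello")
```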
222 @defun concat &rest sequences | 232 @defun concat &rest sequences |
223 @cindex copying strings | 233 @cindex copying strings |
224 @cindex concatenating strings | 234 @cindex concatenating strings |
253 printed form is with @code{format} (@pxref{Formatting Strings}) or | 263 printed form is with @code{format} (@pxref{Formatting Strings}) or |
254 @code{number-to-string} (@pxref{String Conversion}). | 264 @code{number-to-string} (@pxref{String Conversion}). |
255 | 265 |
256 For information about other concatenation functions, see the | 266 For information about other concatenation functions, see the |
257 description of @code{mapconcat} in @ref{Mapping Functions}, | 267 description of @code{mapconcat} in @ref{Mapping Functions}, |
258 @code{vconcat} in @ref{Vectors}, and @code{append} in @ref{Building | 268 @code{vconcat} in @ref{Vector Functions}, and @code{append} in @ref{Building |
259 Lists}. | 269 Lists}. |
260 @end defun | 270 @end defun |
261 | 271 |
262 @defun split-string string separators | 272 @defun split-string string &optional separators omit-nulls |
263 This function splits @var{string} into substrings at matches for the regular | 273 This function splits @var{string} into substrings at matches for the |
264 expression @var{separators}. Each match for @var{separators} defines a | 274 regular expression @var{separators}. Each match for @var{separators} |
265 splitting point; the substrings between the splitting points are made | 275 defines a splitting point; the substrings between the splitting points |
266 into a list, which is the value returned by @code{split-string}. | 276 are made into a list, which is the value returned by |
277 @code{split-string}. | |
278 | |
279 If @var{omit-nulls} is @code{nil}, the result contains null strings | |
280 whenever there are two consecutive matches for @var{separators}, or a | |
281 match is adjacent to the beginning or end of @var{string}. If | |
282 @var{omit-nulls} is @code{t}, these null strings are omitted from the | |
283 result list. | |
284 | |
267 If @var{separators} is @code{nil} (or omitted), | 285 If @var{separators} is @code{nil} (or omitted), |
268 the default is @code{"[ \f\t\n\r\v]+"}. | 286 the default is the value of @code{split-string-default-separators}. |
269 | 287 |
270 For example, | 288 As a special case, when @var{separators} is @code{nil} (or omitted), |
289 null strings are always omitted from the result. Thus: | |
290 | |
291 @example | |
292 (split-string " two words ") | |
293 @result{} ("two" "words") | |
294 @end example | |
295 | |
296 The result is not @samp{("" "two" "words" "")}, which would rarely be | |
297 useful. If you need such a result, use an explicit value for | |
298 @var{separators}: | |
299 | |
300 @example | |
301 (split-string " two words " | |
302 split-string-default-separators) | |
303 @result{} ("" "two" "words" "") | |
304 @end example | |
305 | |
306 More examples: | |
271 | 307 |
272 @example | 308 @example |
273 (split-string "Soup is good food" "o") | 309 (split-string "Soup is good food" "o") |
274 @result{} ("S" "up is g" "" "d f" "" "d") | 310 @result{} ("S" "up is g" "" "d f" "" "d") |
311 (split-string "Soup is good food" "o" t) | |
312 @result{} ("S" "up is g" "d f" "d") | |
275 (split-string "Soup is good food" "o+") | 313 (split-string "Soup is good food" "o+") |
276 @result{} ("S" "up is g" "d f" "d") | 314 @result{} ("S" "up is g" "d f" "d") |
277 @end example | 315 @end example |
278 | 316 |
279 When there is a match adjacent to the beginning or end of the string, | 317 Empty matches do count, except that @code{split-string} will not look |
280 this does not cause a null string to appear at the beginning or end | 318 for a final empty match when it already reached the end of the string |
281 of the list: | 319 using a non-empty match or when @var{string} is empty: |
282 | 320 |
283 @example | 321 @example |
284 (split-string "out to moo" "o+") | 322 (split-string "aooob" "o*") |
285 @result{} ("ut t" " m") | 323 @result{} ("" "a" "" "b" "") |
286 @end example | 324 (split-string "ooaboo" "o*") |
287 | 325 @result{} ("" "" "a" "b" "") |
288 Empty matches do count, when not adjacent to another match: | 326 (split-string "" "") |
289 | 327 @result{} ("") |
290 @example | 328 @end example |
291 (split-string "Soup is good food" "o*") | 329 |
292 @result{}("S" "u" "p" " " "i" "s" " " "g" "d" " " "f" "d") | 330 However, when @var{separators} can match the empty string, |
293 (split-string "Nice doggy!" "") | 331 @var{omit-nulls} is usually @code{t}, so that the subtleties in the |
294 @result{}("N" "i" "c" "e" " " "d" "o" "g" "g" "y" "!") | 332 three previous examples are rarely relevant: |
295 @end example | 333 |
296 @end defun | 334 @example |
335 (split-string "Soup is good food" "o*" t) | |
336 @result{} ("S" "u" "p" " " "i" "s" " " "g" "d" " " "f" "d") | |
337 (split-string "Nice doggy!" "" t) | |
338 @result{} ("N" "i" "c" "e" " " "d" "o" "g" "g" "y" "!") | |
339 (split-string "" "" t) | |
340 @result{} nil | |
341 @end example | |
342 | |
343 Somewhat odd, but predictable, behavior can occur for certain | |
344 ``non-greedy'' values of @var{separators} that can prefer empty | |
345 matches over non-empty matches. Again, such values rarely occur in | |
346 practice: | |
347 | |
348 @example | |
349 (split-string "ooo" "o*" t) | |
350 @result{} nil | |
351 (split-string "ooo" "\\|o+" t) | |
352 @result{} ("o" "o" "o") | |
353 @end example | |
354 @end defun | |
355 | |
356 @defvar split-string-default-separators | |
357 The default value of @var{separators} for @code{split-string}. Its | |
358 usual value is @w{@samp{"[ \f\t\n\r\v]+"}}. | |
359 @end defvar | |
297 | 360 |
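As a rough illustration of the new `omit-nulls` argument described in this hunk, consider a comma-separated record with empty fields. The input strings are made up for the example.

```elisp
;; Hypothetical input: a comma-separated record with empty fields.
(split-string "a,,b," ",")
;; => ("a" "" "b" "")      ; null strings are kept by default

(split-string "a,,b," "," t)
;; => ("a" "b")            ; OMIT-NULLS drops them

(split-string "  a  b  ")
;; => ("a" "b")            ; default separators always omit nulls
```

Passing `t` for `omit-nulls` is usually the simpler choice when empty fields carry no meaning. Keep the default when the position of each field matters.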
298 @node Modifying Strings | 361 @node Modifying Strings |
299 @section Modifying Strings | 362 @section Modifying Strings |
300 | 363 |
301 The most basic way to alter the contents of an existing string is with | 364 The most basic way to alter the contents of an existing string is with |
316 an error if @var{obj} doesn't fit within @var{string}'s actual length, | 379 an error if @var{obj} doesn't fit within @var{string}'s actual length, |
317 or if any new character requires a different number of bytes from the | 380 or if any new character requires a different number of bytes from the |
318 character currently present at that point in @var{string}. | 381 character currently present at that point in @var{string}. |
319 @end defun | 382 @end defun |
320 | 383 |
384 To clear out a string that contained a password, use | |
385 @code{clear-string}: | |
386 | |
387 @defun clear-string string | |
388 This clears the contents of @var{string} to zeros. | |
389 It may also change @var{string}'s length and convert it to | |
390 a unibyte string. | |
391 @end defun | |
392 | |
321 @need 2000 | 393 @need 2000 |
322 @node Text Comparison | 394 @node Text Comparison |
323 @section Comparison of Characters and Strings | 395 @section Comparison of Characters and Strings |
324 @cindex string equality | 396 @cindex string equality |
325 | 397 |
337 @end example | 409 @end example |
338 @end defun | 410 @end defun |
339 | 411 |
340 @defun string= string1 string2 | 412 @defun string= string1 string2 |
341 This function returns @code{t} if the characters of the two strings | 413 This function returns @code{t} if the characters of the two strings |
342 match exactly. | 414 match exactly. Symbols are also allowed as arguments, in which case |
415 their print names are used. | |
343 Case is always significant, regardless of @code{case-fold-search}. | 416 Case is always significant, regardless of @code{case-fold-search}. |
344 | 417 |
345 @example | 418 @example |
346 (string= "abc" "abc") | 419 (string= "abc" "abc") |
347 @result{} t | 420 @result{} t |
353 | 426 |
354 The function @code{string=} ignores the text properties of the two | 427 The function @code{string=} ignores the text properties of the two |
355 strings. When @code{equal} (@pxref{Equality Predicates}) compares two | 428 strings. When @code{equal} (@pxref{Equality Predicates}) compares two |
356 strings, it uses @code{string=}. | 429 strings, it uses @code{string=}. |
357 | 430 |
358 If the strings contain non-@sc{ascii} characters, and one is unibyte | 431 For technical reasons, a unibyte and a multibyte string are |
359 while the other is multibyte, then they cannot be equal. @xref{Text | 432 @code{equal} if and only if they contain the same sequence of |
433 character codes and all these codes are either in the range 0 through | |
434 127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}). | |
435 However, when a unibyte string gets converted to a multibyte string, | |
436 all characters with codes in the range 160 through 255 get converted | |
437 to characters with higher codes, whereas @acronym{ASCII} characters | |
438 remain unchanged. Thus, a unibyte string and its conversion to | |
439 multibyte are only @code{equal} if the string is all @acronym{ASCII}. | |
440 Character codes 160 through 255 are not entirely proper in multibyte | |
441 text, even though they can occur. As a consequence, the situation | |
442 where a unibyte and a multibyte string are @code{equal} without both | |
443 being all @acronym{ASCII} is a technical oddity that very few Emacs | |
444 Lisp programmers ever get confronted with. @xref{Text | |
360 Representations}. | 445 Representations}. |
361 @end defun | 446 @end defun |
362 | 447 |
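The points made for `string=` in this hunk (symbols accepted, case always significant, text properties ignored) can be checked with a few calls. The arguments are arbitrary examples.

```elisp
(string= 'abc "abc")
;; => t    ; a symbol is compared by its print name

(let ((case-fold-search t))
  (string= "abc" "ABC"))
;; => nil  ; `case-fold-search' has no effect

(string= (propertize "abc" 'face 'bold) "abc")
;; => t    ; text properties are ignored
```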
363 @defun string-equal string1 string2 | 448 @defun string-equal string1 string2 |
364 @code{string-equal} is another name for @code{string=}. | 449 @code{string-equal} is another name for @code{string=}. |
375 @var{string2}, then @var{string1} is greater, and this function returns | 460 @var{string2}, then @var{string1} is greater, and this function returns |
376 @code{nil}. If the two strings match entirely, the value is @code{nil}. | 461 @code{nil}. If the two strings match entirely, the value is @code{nil}. |
377 | 462 |
378 Pairs of characters are compared according to their character codes. | 463 Pairs of characters are compared according to their character codes. |
379 Keep in mind that lower case letters have higher numeric values in the | 464 Keep in mind that lower case letters have higher numeric values in the |
380 @sc{ascii} character set than their upper case counterparts; digits and | 465 @acronym{ASCII} character set than their upper case counterparts; digits and |
381 many punctuation characters have a lower numeric value than upper case | 466 many punctuation characters have a lower numeric value than upper case |
382 letters. An @sc{ascii} character is less than any non-@sc{ascii} | 467 letters. An @acronym{ASCII} character is less than any non-@acronym{ASCII} |
383 character; a unibyte non-@sc{ascii} character is always less than any | 468 character; a unibyte non-@acronym{ASCII} character is always less than any |
384 multibyte non-@sc{ascii} character (@pxref{Text Representations}). | 469 multibyte non-@acronym{ASCII} character (@pxref{Text Representations}). |
385 | 470 |
386 @example | 471 @example |
387 @group | 472 @group |
388 (string< "abc" "abd") | 473 (string< "abc" "abd") |
389 @result{} t | 474 @result{} t |
411 @result{} nil | 496 @result{} nil |
412 (string< "" "") | 497 (string< "" "") |
413 @result{} nil | 498 @result{} nil |
414 @end group | 499 @end group |
415 @end example | 500 @end example |
501 | |
502 Symbols are also allowed as arguments, in which case their print names | |
503 are used. | |
416 @end defun | 504 @end defun |
417 | 505 |
418 @defun string-lessp string1 string2 | 506 @defun string-lessp string1 string2 |
419 @code{string-lessp} is another name for @code{string<}. | 507 @code{string-lessp} is another name for @code{string<}. |
420 @end defun | 508 @end defun |
426 the end of the string). The specified part of @var{string2} runs from | 514 the end of the string). The specified part of @var{string2} runs from |
427 index @var{start2} up to index @var{end2} (@code{nil} means the end of | 515 index @var{start2} up to index @var{end2} (@code{nil} means the end of |
428 the string). | 516 the string). |
429 | 517 |
430 The strings are both converted to multibyte for the comparison | 518 The strings are both converted to multibyte for the comparison |
431 (@pxref{Text Representations}) so that a unibyte string can be equal to | 519 (@pxref{Text Representations}) so that a unibyte string and its |
432 a multibyte string. If @var{ignore-case} is non-@code{nil}, then case | 520 conversion to multibyte are always regarded as equal. If |
433 is ignored, so that upper case letters can be equal to lower case letters. | 521 @var{ignore-case} is non-@code{nil}, then case is ignored, so that |
522 upper case letters can be equal to lower case letters. | |
434 | 523 |
435 If the specified portions of the two strings match, the value is | 524 If the specified portions of the two strings match, the value is |
436 @code{t}. Otherwise, the value is an integer which indicates how many | 525 @code{t}. Otherwise, the value is an integer which indicates how many |
437 leading characters agree, and which string is less. Its absolute value | 526 leading characters agree, and which string is less. Its absolute value |
438 is one plus the number of characters that agree at the beginning of the | 527 is one plus the number of characters that agree at the beginning of the |
439 two strings. The sign is negative if @var{string1} (or its specified | 528 two strings. The sign is negative if @var{string1} (or its specified |
440 portion) is less. | 529 portion) is less. |
441 @end defun | 530 @end defun |
442 | 531 |
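The return-value convention of `compare-strings` is easier to see with concrete arguments. This is a sketch with arbitrary strings. The start indices are written as explicit integers rather than relying on a default.

```elisp
;; Three leading characters agree and string1 is less: -(3 + 1).
(compare-strings "abcdef" 0 nil "abcxyz" 0 nil)
;; => -4

;; Same strings swapped: string1 is now greater.
(compare-strings "abcxyz" 0 nil "abcdef" 0 nil)
;; => 4

;; With IGNORE-CASE non-nil, the two strings match.
(compare-strings "HELLO" 0 nil "hello" 0 nil t)
;; => t

;; Only the specified portions are compared: "abc" against "abc".
(compare-strings "xxabcxx" 2 5 "abc" 0 nil)
;; => t
```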
443 @defun assoc-ignore-case key alist | 532 @defun assoc-string key alist &optional case-fold |
444 This function works like @code{assoc}, except that @var{key} must be a | 533 This function works like @code{assoc}, except that @var{key} must be a |
445 string, and comparison is done using @code{compare-strings}, ignoring | 534 string, and comparison is done using @code{compare-strings}. If |
446 case differences. @xref{Association Lists}. | 535 @var{case-fold} is non-@code{nil}, it ignores case differences. |
447 @end defun | 536 Unlike @code{assoc}, this function can also match elements of the alist |
448 | 537 that are strings rather than conses. In particular, @var{alist} can |
449 @defun assoc-ignore-representation key alist | 538 be a list of strings rather than an actual alist. |
450 This function works like @code{assoc}, except that @var{key} must be a | 539 @xref{Association Lists}. |
451 string, and comparison is done using @code{compare-strings}. | |
452 Case differences are significant. | |
453 @end defun | 540 @end defun |
454 | 541 |
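A short sketch of `assoc-string`, which this changeset documents in place of `assoc-ignore-case` and `assoc-ignore-representation`. The alist contents are invented for the example.

```elisp
;; With CASE-FOLD non-nil, case differences are ignored.
(assoc-string "FOO" '(("foo" . 1) ("bar" . 2)) t)
;; => ("foo" . 1)

;; Without CASE-FOLD, the comparison is case-sensitive.
(assoc-string "FOO" '(("foo" . 1) ("bar" . 2)))
;; => nil

;; Elements may be plain strings rather than conses.
(assoc-string "bar" '("foo" "bar"))
;; => "bar"
```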
455 See also @code{compare-buffer-substrings} in @ref{Comparing Text}, for | 542 See also @code{compare-buffer-substrings} in @ref{Comparing Text}, for |
456 a way to compare text in buffers. The function @code{string-match}, | 543 a way to compare text in buffers. The function @code{string-match}, |
457 which matches a regular expression against a string, can be used | 544 which matches a regular expression against a string, can be used |
461 @comment node-name, next, previous, up | 548 @comment node-name, next, previous, up |
462 @section Conversion of Characters and Strings | 549 @section Conversion of Characters and Strings |
463 @cindex conversion of strings | 550 @cindex conversion of strings |
464 | 551 |
465 This section describes functions for conversions between characters, | 552 This section describes functions for conversions between characters, |
466 strings and integers. @code{format} and @code{prin1-to-string} | 553 strings and integers. @code{format} (@pxref{Formatting Strings}) |
554 and @code{prin1-to-string} | |
467 (@pxref{Output Functions}) can also convert Lisp objects into strings. | 555 (@pxref{Output Functions}) can also convert Lisp objects into strings. |
468 @code{read-from-string} (@pxref{Input Functions}) can ``convert'' a | 556 @code{read-from-string} (@pxref{Input Functions}) can ``convert'' a |
469 string representation of a Lisp object into an object. The functions | 557 string representation of a Lisp object into an object. The functions |
470 @code{string-make-multibyte} and @code{string-make-unibyte} convert the | 558 @code{string-make-multibyte} and @code{string-make-unibyte} convert the |
471 text representation of a string (@pxref{Converting Representations}). | 559 text representation of a string (@pxref{Converting Representations}). |
484 | 572 |
485 @defun string-to-char string | 573 @defun string-to-char string |
486 @cindex string to character | 574 @cindex string to character |
487 This function returns the first character in @var{string}. If the | 575 This function returns the first character in @var{string}. If the |
488 string is empty, the function returns 0. The value is also 0 when the | 576 string is empty, the function returns 0. The value is also 0 when the |
489 first character of @var{string} is the null character, @sc{ascii} code | 577 first character of @var{string} is the null character, @acronym{ASCII} code |
490 0. | 578 0. |
491 | 579 |
492 @example | 580 @example |
493 (string-to-char "ABC") | 581 (string-to-char "ABC") |
494 @result{} 65 | 582 @result{} 65 |
515 negative. | 603 negative. |
516 | 604 |
517 @example | 605 @example |
518 (number-to-string 256) | 606 (number-to-string 256) |
519 @result{} "256" | 607 @result{} "256" |
608 @group | |
520 (number-to-string -23) | 609 (number-to-string -23) |
521 @result{} "-23" | 610 @result{} "-23" |
611 @end group | |
522 (number-to-string -23.5) | 612 (number-to-string -23.5) |
523 @result{} "-23.5" | 613 @result{} "-23.5" |
524 @end example | 614 @end example |
525 | 615 |
526 @cindex int-to-string | 616 @cindex int-to-string |
530 @end defun | 620 @end defun |
531 | 621 |
532 @defun string-to-number string &optional base | 622 @defun string-to-number string &optional base |
533 @cindex string to number | 623 @cindex string to number |
534 This function returns the numeric value of the characters in | 624 This function returns the numeric value of the characters in |
535 @var{string}. If @var{base} is non-@code{nil}, integers are converted | 625 @var{string}. If @var{base} is non-@code{nil}, it must be an integer |
536 in that base. If @var{base} is @code{nil}, then base ten is used. | 626 between 2 and 16 (inclusive), and integers are converted in that base. |
537 Floating point conversion always uses base ten; we have not implemented | 627 If @var{base} is @code{nil}, then base ten is used. Floating point |
538 other radices for floating point numbers, because that would be much | 628 conversion only works in base ten; we have not implemented other |
539 more work and does not seem useful. If @var{string} looks like an | 629 radices for floating point numbers, because that would be much more |
540 integer but its value is too large to fit into a Lisp integer, | 630 work and does not seem useful. If @var{string} looks like an integer |
631 but its value is too large to fit into a Lisp integer, | |
541 @code{string-to-number} returns a floating point result. | 632 @code{string-to-number} returns a floating point result. |
542 | 633 |
543 The parsing skips spaces and tabs at the beginning of @var{string}, then | 634 The parsing skips spaces and tabs at the beginning of @var{string}, |
544 reads as much of @var{string} as it can interpret as a number. (On some | 635 then reads as much of @var{string} as it can interpret as a number in |
545 systems it ignores other whitespace at the beginning, not just spaces | 636 the given base. (On some systems it ignores other whitespace at the |
546 and tabs.) If the first character after the ignored whitespace is | 637 beginning, not just spaces and tabs.) If the first character after |
547 neither a digit, nor a plus or minus sign, nor the leading dot of a | 638 the ignored whitespace is neither a digit in the given base, nor a |
548 floating point number, this function returns 0. | 639 plus or minus sign, nor the leading dot of a floating point number, |
640 this function returns 0. | |
549 | 641 |
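The effect of the `base` argument, and of the "digit in the given base" wording added here, might be illustrated like this. The inputs are arbitrary.

```elisp
(string-to-number "ff" 16)
;; => 255

(string-to-number "101" 2)
;; => 5

;; Leading whitespace is skipped; parsing stops at the first
;; character that cannot be part of the number.
(string-to-number "  -17 apples")
;; => -17

;; "9" is not a digit in base 8, so nothing can be parsed.
(string-to-number "9" 8)
;; => 0
```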
550 @example | 642 @example |
551 (string-to-number "256") | 643 (string-to-number "256") |
552 @result{} 256 | 644 @result{} 256 |
553 (string-to-number "25 is a perfect square.") | 645 (string-to-number "25 is a perfect square.") |
600 @var{string} and then replacing any format specification | 692 @var{string} and then replacing any format specification |
601 in the copy with encodings of the corresponding @var{objects}. The | 693 in the copy with encodings of the corresponding @var{objects}. The |
602 arguments @var{objects} are the computed values to be formatted. | 694 arguments @var{objects} are the computed values to be formatted. |
603 | 695 |
604 The characters in @var{string}, other than the format specifications, | 696 The characters in @var{string}, other than the format specifications, |
605 are copied directly into the output; starting in Emacs 21, if they have | 697 are copied directly into the output; if they have text properties, |
606 text properties, these are copied into the output also. | 698 these are copied into the output also. |
607 @end defun | 699 @end defun |
608 | 700 |
609 @cindex @samp{%} in format | 701 @cindex @samp{%} in format |
610 @cindex format specification | 702 @cindex format specification |
611 A format specification is a sequence of characters beginning with a | 703 A format specification is a sequence of characters beginning with a |
624 If @var{string} contains more than one format specification, the | 716 If @var{string} contains more than one format specification, the |
625 format specifications correspond to successive values from | 717 format specifications correspond to successive values from |
626 @var{objects}. Thus, the first format specification in @var{string} | 718 @var{objects}. Thus, the first format specification in @var{string} |
627 uses the first such value, the second format specification uses the | 719 uses the first such value, the second format specification uses the |
628 second such value, and so on. Any extra format specifications (those | 720 second such value, and so on. Any extra format specifications (those |
629 for which there are no corresponding values) cause unpredictable | 721 for which there are no corresponding values) cause an error. Any |
630 behavior. Any extra values to be formatted are ignored. | 722 extra values to be formatted are ignored. |
631 | 723 |
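The revised wording, in which extra format specifications cause an error while extra values are ignored, can be sketched as below. The handler's return symbol `too-few-arguments` is a hypothetical marker chosen for this example, not an Emacs error symbol.

```elisp
;; An extra value is silently ignored.
(format "%s and %s" "one" "two" "ignored")
;; => "one and two"

;; A format specification with no corresponding value signals an error.
(condition-case nil
    (format "%s and %s" "one")
  (error 'too-few-arguments))   ; hypothetical marker symbol
;; => too-few-arguments
```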
632 Certain format specifications require values of particular types. If | 724 Certain format specifications require values of particular types. If |
633 you supply a value that doesn't fit the requirements, an error is | 725 you supply a value that doesn't fit the requirements, an error is |
634 signaled. | 726 signaled. |
635 | 727 |
641 made without quoting (that is, using @code{princ}, not | 733 made without quoting (that is, using @code{princ}, not |
642 @code{prin1}---@pxref{Output Functions}). Thus, strings are represented | 734 @code{prin1}---@pxref{Output Functions}). Thus, strings are represented |
643 by their contents alone, with no @samp{"} characters, and symbols appear | 735 by their contents alone, with no @samp{"} characters, and symbols appear |
644 without @samp{\} characters. | 736 without @samp{\} characters. |
645 | 737 |
646 Starting in Emacs 21, if the object is a string, its text properties are | 738 If the object is a string, its text properties are |
647 copied into the output. The text properties of the @samp{%s} itself | 739 copied into the output. The text properties of the @samp{%s} itself |
648 are also copied, but those of the object take priority. | 740 are also copied, but those of the object take priority. |
649 | |
650 If there is no corresponding object, the empty string is used. | |
651 | 741 |
652 @item %S | 742 @item %S |
653 Replace the specification with the printed representation of the object, | 743 Replace the specification with the printed representation of the object, |
654 made with quoting (that is, using @code{prin1}---@pxref{Output | 744 made with quoting (that is, using @code{prin1}---@pxref{Output |
655 Functions}). Thus, strings are enclosed in @samp{"} characters, and | 745 Functions}). Thus, strings are enclosed in @samp{"} characters, and |
656 @samp{\} characters appear where necessary before special characters. | 746 @samp{\} characters appear where necessary before special characters. |
657 | 747 |
658 If there is no corresponding object, the empty string is used. | |
659 | |
660 @item %o | 748 @item %o |
661 @cindex integer to octal | 749 @cindex integer to octal |
662 Replace the specification with the base-eight representation of an | 750 Replace the specification with the base-eight representation of an |
663 integer. | 751 integer. |
664 | 752 |
712 @result{} "The octal value of 18 is 22, | 800 @result{} "The octal value of 18 is 22, |
713 and the hex value is 12." | 801 and the hex value is 12." |
714 @end group | 802 @end group |
715 @end example | 803 @end example |
716 | 804 |
717 @cindex numeric prefix | |
718 @cindex field width | 805 @cindex field width |
719 @cindex padding | 806 @cindex padding |
720 All the specification characters allow an optional numeric prefix | 807 All the specification characters allow an optional ``width'', which |
721 between the @samp{%} and the character. The optional numeric prefix | 808 is a digit-string between the @samp{%} and the character. If the |
722 defines the minimum width for the object. If the printed representation | 809 printed representation of the object contains fewer characters than |
723 of the object contains fewer characters than this, then it is padded. | 810 this width, then it is padded. The padding is on the left if the |
724 The padding is on the left if the prefix is positive (or starts with | 811 width is positive (or starts with zero) and on the right if the |
725 zero) and on the right if the prefix is negative. The padding character | 812 width is negative. The padding character is normally a space, but if |
726 is normally a space, but if the numeric prefix starts with a zero, zeros | 813 the width starts with a zero, zeros are used for padding. Some of |
727 are used for padding. Here are some examples of padding: | 814 these conventions are ignored for specification characters for which |
815 they do not make sense. That is, @samp{%s}, @samp{%S} and @samp{%c} | |
816 accept a width starting with 0, but still pad with @emph{spaces} on | |
817 the left. Also, @samp{%%} accepts a width, but ignores it. Here are | |
818 some examples of padding: | |
728 | 819 |
729 @example | 820 @example |
730 (format "%06d is padded on the left with zeros" 123) | 821 (format "%06d is padded on the left with zeros" 123) |
731 @result{} "000123 is padded on the left with zeros" | 822 @result{} "000123 is padded on the left with zeros" |
732 | 823 |
733 (format "%-6d is padded on the right" 123) | 824 (format "%-6d is padded on the right" 123) |
734 @result{} "123 is padded on the right" | 825 @result{} "123 is padded on the right" |
735 @end example | 826 @end example |
736 | 827 |
737 @code{format} never truncates an object's printed representation, no | 828 If the width is too small, @code{format} does not truncate the |
738 matter what width you specify. Thus, you can use a numeric prefix to | 829 object's printed representation. Thus, you can use a width to specify |
739 specify a minimum spacing between columns with no risk of losing | 830 a minimum spacing between columns with no risk of losing information. |
740 information. | |
741 | 831 |
742 In the following three examples, @samp{%7s} specifies a minimum width | 832 In the following three examples, @samp{%7s} specifies a minimum width |
743 of 7. In the first case, the string inserted in place of @samp{%7s} has | 833 of 7. In the first case, the string inserted in place of @samp{%7s} has |
744 only 3 letters, so 4 blank spaces are inserted for padding. In the | 834 only 3 letters, so 4 blank spaces are inserted for padding. In the |
745 second case, the string @code{"specification"} is 13 letters wide but is | 835 second case, the string @code{"specification"} is 13 letters wide but is |
762 (format "The word `%-7s' actually has %d letters in it." | 852 (format "The word `%-7s' actually has %d letters in it." |
763 "foo" (length "foo")) | 853 "foo" (length "foo")) |
764 @result{} "The word `foo ' actually has 3 letters in it." | 854 @result{} "The word `foo ' actually has 3 letters in it." |
765 @end group | 855 @end group |
766 @end smallexample | 856 @end smallexample |
857 | |
858 @cindex precision in format specifications | |
859 All the specification characters allow an optional ``precision'' | |
860 before the character (after the width, if present). The precision is | |
861 a decimal-point @samp{.} followed by a digit-string. For the | |
862 floating-point specifications (@samp{%e}, @samp{%f}, @samp{%g}), the | |
863 precision specifies how many decimal places to show; if zero, the | |
864 decimal-point itself is also omitted. For @samp{%s} and @samp{%S}, | |
865 the precision truncates the string to the given width, so | |
866 @samp{%.3s} shows only the first three characters of the | |
867 representation for @var{object}. Precision is ignored for other | |
868 specification characters. | |
869 | |
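The precision rules above can be sketched with a few calls. The values are chosen only for illustration, and the last form combines a width with a precision; the trailing @samp{|} is there to make the padding visible.

```elisp
;; Precision sets the number of decimal places for floating-point output.
(format "%.2f" 3.14159)
;; ⇒ "3.14"

;; A precision of zero also drops the decimal point.
(format "%.0f" 3.0)
;; ⇒ "3"

;; For %s, the precision truncates the printed representation.
(format "%.3s" "specification")
;; ⇒ "spe"

;; Width comes first, then the precision: pad to 8 columns, 3 decimals.
(format "%8.3f|" 3.14159)
;; ⇒ "   3.142|"
```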
870 @cindex flags in format specifications | |
871 Immediately after the @samp{%} and before the optional width and | |
872 precision, you can put certain ``flag'' characters. | |
873 | |
874 A space character inserts a space for positive numbers (otherwise | |
875 nothing is inserted for positive numbers). This flag is ignored | |
876 except for @samp{%d}, @samp{%e}, @samp{%f}, @samp{%g}. | |
877 | |
878 The flag @samp{#} indicates ``alternate form''. For @samp{%o} it | |
879 ensures that the result begins with a 0. For @samp{%x} and @samp{%X} | |
880 the result is prefixed with @samp{0x} or @samp{0X}. For @samp{%e}, | |
881 @samp{%f}, and @samp{%g} a decimal point is always shown even if the | |
882 precision is zero. | |
767 | 883 |
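A short sketch of the two flags described above, using arbitrary sample values. The results follow the usual @code{printf}-style conventions that the text describes.

```elisp
;; The space flag reserves a column for the sign of a positive number.
(format "% d" 42)
;; ⇒ " 42"
(format "% d" -42)
;; ⇒ "-42"

;; The # flag selects the alternate form.
(format "%#o" 8)        ; octal result begins with 0
;; ⇒ "010"
(format "%#x" 255)      ; hexadecimal result is prefixed with 0x
;; ⇒ "0xff"
(format "%#X" 255)      ; upper-case variant is prefixed with 0X
;; ⇒ "0XFF"
(format "%#.0f" 3.0)    ; decimal point kept even with zero precision
;; ⇒ "3."
```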
768 @node Case Conversion | 884 @node Case Conversion |
769 @comment node-name, next, previous, up | 885 @comment node-name, next, previous, up |
770 @section Case Conversion in Lisp | 886 @section Case Conversion in Lisp |
771 @cindex upper case | 887 @cindex upper case |
774 @cindex case conversion in Lisp | 890 @cindex case conversion in Lisp |
775 | 891 |
776 The character case functions change the case of single characters or | 892 The character case functions change the case of single characters or |
777 of the contents of strings. The functions normally convert only | 893 of the contents of strings. The functions normally convert only |
778 alphabetic characters (the letters @samp{A} through @samp{Z} and | 894 alphabetic characters (the letters @samp{A} through @samp{Z} and |
779 @samp{a} through @samp{z}, as well as non-@sc{ascii} letters); other | 895 @samp{a} through @samp{z}, as well as non-@acronym{ASCII} letters); other |
780 characters are not altered. You can specify a different case | 896 characters are not altered. You can specify a different case |
781 conversion mapping by specifying a case table (@pxref{Case Tables}). | 897 conversion mapping by specifying a case table (@pxref{Case Tables}). |
782 | 898 |
783 These functions do not modify the strings that are passed to them as | 899 These functions do not modify the strings that are passed to them as |
784 arguments. | 900 arguments. |
785 | 901 |
786 The examples below use the characters @samp{X} and @samp{x} which have | 902 The examples below use the characters @samp{X} and @samp{x} which have |
787 @sc{ascii} codes 88 and 120 respectively. | 903 @acronym{ASCII} codes 88 and 120 respectively. |
788 | 904 |
789 @defun downcase string-or-char | 905 @defun downcase string-or-char |
790 This function converts a character or a string to lower case. | 906 This function converts a character or a string to lower case. |
791 | 907 |
792 When the argument to @code{downcase} is a string, the function creates | 908 When the argument to @code{downcase} is a string, the function creates |
842 | 958 |
843 When the argument to @code{capitalize} is a character, @code{capitalize} | 959 When the argument to @code{capitalize} is a character, @code{capitalize} |
844 has the same result as @code{upcase}. | 960 has the same result as @code{upcase}. |
845 | 961 |
846 @example | 962 @example |
963 @group | |
847 (capitalize "The cat in the hat") | 964 (capitalize "The cat in the hat") |
848 @result{} "The Cat In The Hat" | 965 @result{} "The Cat In The Hat" |
849 | 966 @end group |
967 | |
968 @group | |
850 (capitalize "THE 77TH-HATTED CAT") | 969 (capitalize "THE 77TH-HATTED CAT") |
851 @result{} "The 77th-Hatted Cat" | 970 @result{} "The 77th-Hatted Cat" |
971 @end group | |
852 | 972 |
853 @group | 973 @group |
854 (capitalize ?x) | 974 (capitalize ?x) |
855 @result{} 88 | 975 @result{} 88 |
856 @end group | 976 @end group |
857 @end example | 977 @end example |
858 @end defun | 978 @end defun |
859 | 979 |
860 @defun upcase-initials string | 980 @defun upcase-initials string-or-char |
861 This function capitalizes the initials of the words in @var{string}, | 981 If @var{string-or-char} is a string, this function capitalizes the |
862 without altering any letters other than the initials. It returns a new | 982 initials of the words in @var{string-or-char}, without altering any |
863 string whose contents are a copy of @var{string}, in which each word has | 983 letters other than the initials. It returns a new string whose |
984 contents are a copy of @var{string-or-char}, in which each word has | |
864 had its initial letter converted to upper case. | 985 had its initial letter converted to upper case. |
865 | 986 |
866 The definition of a word is any sequence of consecutive characters that | 987 The definition of a word is any sequence of consecutive characters that |
867 are assigned to the word constituent syntax class in the current syntax | 988 are assigned to the word constituent syntax class in the current syntax |
868 table (@pxref{Syntax Class Table}). | 989 table (@pxref{Syntax Class Table}). |
990 | |
991 When the argument to @code{upcase-initials} is a character, | |
992 @code{upcase-initials} has the same result as @code{upcase}. | |
869 | 993 |
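The character case, and the difference from @code{capitalize}, can be sketched as follows. This assumes the revised behavior in which @code{upcase-initials} accepts a character; the sample string is arbitrary.

```elisp
;; With a character argument, upcase-initials behaves like upcase.
(upcase-initials ?x)
;; ⇒ 88
(upcase ?x)
;; ⇒ 88

;; capitalize lowers the rest of each word; upcase-initials leaves it alone.
(capitalize "the CAT")
;; ⇒ "The Cat"
(upcase-initials "the CAT")
;; ⇒ "The CAT"
```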
870 @example | 994 @example |
871 @group | 995 @group |
872 (upcase-initials "The CAT in the hAt") | 996 (upcase-initials "The CAT in the hAt") |
873 @result{} "The CAT In The HAt" | 997 @result{} "The CAT In The HAt" |
919 the same canonical equivalent character. For example, since @samp{a} | 1043 the same canonical equivalent character. For example, since @samp{a} |
920 and @samp{A} are related by case-conversion, they should have the same | 1044 and @samp{A} are related by case-conversion, they should have the same |
921 canonical equivalent character (which should be either @samp{a} for both | 1045 canonical equivalent character (which should be either @samp{a} for both |
922 of them, or @samp{A} for both of them). | 1046 of them, or @samp{A} for both of them). |
923 | 1047 |
924 The extra table @var{equivalences} is a map that cyclicly permutes | 1048 The extra table @var{equivalences} is a map that cyclically permutes |
925 each equivalence class (of characters with the same canonical | 1049 each equivalence class (of characters with the same canonical |
926 equivalent). (For ordinary @sc{ascii}, this would map @samp{a} into | 1050 equivalent). (For ordinary @acronym{ASCII}, this would map @samp{a} into |
927 @samp{A} and @samp{A} into @samp{a}, and likewise for each set of | 1051 @samp{A} and @samp{A} into @samp{a}, and likewise for each set of |
928 equivalent characters.) | 1052 equivalent characters.) |
929 | 1053 |
930 When you construct a case table, you can provide @code{nil} for | 1054 When you construct a case table, you can provide @code{nil} for |
931 @var{canonicalize}; then Emacs fills in this slot from the lower case | 1055 @var{canonicalize}; then Emacs fills in this slot from the lower case |
958 @defun set-case-table table | 1082 @defun set-case-table table |
959 This sets the current buffer's case table to @var{table}. | 1083 This sets the current buffer's case table to @var{table}. |
960 @end defun | 1084 @end defun |
961 | 1085 |
962 The following three functions are convenient subroutines for packages | 1086 The following three functions are convenient subroutines for packages |
963 that define non-@sc{ascii} character sets. They modify the specified | 1087 that define non-@acronym{ASCII} character sets. They modify the specified |
964 case table @var{case-table}; they also modify the standard syntax table. | 1088 case table @var{case-table}; they also modify the standard syntax table. |
965 @xref{Syntax Tables}. Normally you would use these functions to change | 1089 @xref{Syntax Tables}. Normally you would use these functions to change |
966 the standard case table. | 1090 the standard case table. |
967 | 1091 |
968 @defun set-case-syntax-pair uc lc case-table | 1092 @defun set-case-syntax-pair uc lc case-table |
982 | 1106 |
983 @deffn Command describe-buffer-case-table | 1107 @deffn Command describe-buffer-case-table |
984 This command displays a description of the contents of the current | 1108 This command displays a description of the contents of the current |
985 buffer's case table. | 1109 buffer's case table. |
986 @end deffn | 1110 @end deffn |
1111 | |
1112 @ignore | |
1113 arch-tag: 700b8e95-7aa5-4b52-9eb3-8f2e1ea152b4 | |
1114 @end ignore |