@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/syntax
@node Syntax Tables, Abbrevs, Searching and Matching, Top
@chapter Syntax Tables
@cindex parsing
@cindex syntax table
@cindex text parsing

A @dfn{syntax table} specifies the syntactic textual function of each
character. This information is used by the parsing commands, the
complex movement commands, and others to determine where words, symbols,
and other syntactic constructs begin and end. The current syntax table
controls the meaning of the word motion functions (@pxref{Word Motion})
and the list motion functions (@pxref{List Motion}) as well as the
functions in this chapter.

@menu
* Basics: Syntax Basics.     Basic concepts of syntax tables.
* Desc: Syntax Descriptors.  How characters are classified.
* Syntax Table Functions::   How to create, examine and alter syntax tables.
* Motion and Syntax::        Moving over characters with certain syntaxes.
* Parsing Expressions::      Parsing balanced expressions
                               using the syntax table.
* Standard Syntax Tables::   Syntax tables used by various major modes.
* Syntax Table Internals::   How syntax table information is stored.
@end menu

@node Syntax Basics
@section Syntax Table Concepts

@ifinfo
A @dfn{syntax table} provides Emacs with the information that
determines the syntactic use of each character in a buffer. This
information is used by the parsing commands, the complex movement
commands, and others to determine where words, symbols, and other
syntactic constructs begin and end. The current syntax table controls
the meaning of the word motion functions (@pxref{Word Motion}) and the
list motion functions (@pxref{List Motion}) as well as the functions in
this chapter.
@end ifinfo

A syntax table is a vector of 256 elements; it contains one entry for
each of the 256 possible character codes of an 8-bit byte. Each element
is an integer that encodes the syntax of the character in question.

Syntax tables are used only for moving across text, not for the Emacs
Lisp reader. Emacs Lisp uses built-in syntactic rules when reading Lisp
expressions, and these rules cannot be changed.

Each buffer has its own major mode, and each major mode has its own
idea of the syntactic class of various characters. For example, in Lisp
mode, the character @samp{;} begins a comment, but in C mode, it
terminates a statement. To support these variations, Emacs makes the
choice of syntax table local to each buffer. Typically, each major
mode has its own syntax table and installs that table in each buffer
that uses that mode. Changing this table alters the syntax in all
those buffers as well as in any buffers subsequently put in that mode.
Occasionally several similar modes share one syntax table.
@xref{Example Major Modes}, for an example of how to set up a syntax
table.
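
The arrangement described above can be sketched as follows. This is a
minimal illustration, not the definition used by any real mode: the
names @code{my-mode-syntax-table} and @code{my-mode-install-syntax} are
hypothetical, and treating @samp{;} as a comment starter is assumed
only for the sake of the example.

@example
@group
;; @r{A hypothetical mode's table: start from the standard entries,}
;; @r{then override the characters the mode treats specially.}
(defvar my-mode-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?\; "<" table)   ; @r{comment starter}
    (modify-syntax-entry ?\n ">" table)   ; @r{comment ender}
    table)
  "Syntax table for the hypothetical My mode.")
@end group

@group
(defun my-mode-install-syntax ()
  "Install `my-mode-syntax-table' in the current buffer."
  (set-syntax-table my-mode-syntax-table))
@end group
@end example

Because every buffer that calls the installer shares the same table
object, a later @code{modify-syntax-entry} on
@code{my-mode-syntax-table} takes effect in all of them.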

A syntax table can inherit the data for some characters from the
standard syntax table, while specifying other characters itself. The
``inherit'' syntax class means ``inherit this character's syntax from
the standard syntax table.'' Most major modes' syntax tables inherit
the syntax of character codes 0 through 31 and 128 through 255. This is
useful with character sets such as ISO Latin-1 that have additional
alphabetic characters in the range 128 to 255. Changing the standard
syntax for these characters is then enough to affect all major modes
that inherit them.

@defun syntax-table-p object
This function returns @code{t} if @var{object} is a vector of 256
elements. This means that the vector may be a syntax table. However,
according to this test, any vector of length 256 is considered to be a
syntax table, no matter what its contents.
@end defun
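
As a brief illustration (a sketch, assuming it is evaluated in any
ordinary buffer), the predicate accepts the tables returned by the
functions in this chapter and rejects an object that is not a vector:

@example
@group
(syntax-table-p (syntax-table))
     @result{} t
(syntax-table-p (make-syntax-table))
     @result{} t
(syntax-table-p "not a table")
     @result{} nil
@end group
@end example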

@node Syntax Descriptors
@section Syntax Descriptors
@cindex syntax classes

This section describes the syntax classes and flags that denote the
syntax of a character, and how they are represented as a @dfn{syntax
descriptor}, which is a Lisp string that you pass to
@code{modify-syntax-entry} to specify the desired syntax.

Emacs defines a number of @dfn{syntax classes}. Each syntax table
puts each character into one class. There is no necessary relationship
between the class of a character in one syntax table and its class in
any other table.

Each class is designated by a mnemonic character, which serves as the
name of the class when you need to specify a class. Usually the
designator character is one that is frequently in that class; however,
its meaning as a designator is unvarying and independent of what syntax
that character currently has.

@cindex syntax descriptor
A syntax descriptor is a Lisp string that specifies a syntax class, a
matching character (used only for the parenthesis classes) and flags.
The first character is the designator for a syntax class. The second
character is the character to match; if it is unused, put a space there.
Then come the characters for any desired flags. If no matching
character or flags are needed, one character is sufficient.

For example, the descriptor for the character @samp{*} in C mode is
@samp{@w{. 23}} (i.e., punctuation, matching character slot unused,
second character of a comment-starter, first character of a
comment-ender), and the descriptor for @samp{/} is @samp{@w{. 14}} (i.e.,
punctuation, matching character slot unused, first character of a
comment-starter, second character of a comment-ender).

@menu
* Syntax Class Table::      Table of syntax classes.
* Syntax Flags::            Additional flags each character can have.
@end menu

@node Syntax Class Table
@subsection Table of Syntax Classes

Here is a table of syntax classes, the characters that stand for them,
their meanings, and examples of their use.

@deffn {Syntax class} @w{whitespace character}
@dfn{Whitespace characters} (designated with @w{@samp{@ }} or @samp{-})
separate symbols and words from each other. Typically, whitespace
characters have no other syntactic significance, and multiple whitespace
characters are syntactically equivalent to a single one. Space, tab,
newline and formfeed are almost always classified as whitespace.
@end deffn

@deffn {Syntax class} @w{word constituent}
@dfn{Word constituents} (designated with @samp{w}) are parts of normal
English words and are typically used in variable and command names in
programs. All upper- and lower-case letters, and the digits, are typically
word constituents.
@end deffn

@deffn {Syntax class} @w{symbol constituent}
@dfn{Symbol constituents} (designated with @samp{_}) are the extra
characters that are used in variable and command names along with word
constituents. For example, the symbol constituents class is used in
Lisp mode to indicate that certain characters may be part of symbol
names even though they are not part of English words. These characters
are @samp{$&*+-_<>}. In standard C, the only non-word-constituent
character that is valid in symbols is underscore (@samp{_}).
@end deffn

@deffn {Syntax class} @w{punctuation character}
@dfn{Punctuation characters} (@samp{.}) are those characters that are
used as punctuation in English, or are used in some way in a programming
language to separate symbols from one another. Most programming
language modes, including Emacs Lisp mode, have no characters in this
class since the few characters that are not symbol or word constituents
all have other uses.
@end deffn

@deffn {Syntax class} @w{open parenthesis character}
@deffnx {Syntax class} @w{close parenthesis character}
@cindex parenthesis syntax
Open and close @dfn{parenthesis characters} are characters used in
dissimilar pairs to surround sentences or expressions. Such a grouping
is begun with an open parenthesis character and terminated with a close.
Each open parenthesis character matches a particular close parenthesis
character, and vice versa. Normally, Emacs indicates momentarily the
matching open parenthesis when you insert a close parenthesis.
@xref{Blinking}.

The class of open parentheses is designated with @samp{(}, and that of
close parentheses with @samp{)}.

In English text, and in C code, the parenthesis pairs are @samp{()},
@samp{[]}, and @samp{@{@}}. In Emacs Lisp, the delimiters for lists and
vectors (@samp{()} and @samp{[]}) are classified as parenthesis
characters.
@end deffn

@deffn {Syntax class} @w{string quote}
@dfn{String quote characters} (designated with @samp{"}) are used in
many languages, including Lisp and C, to delimit string constants. The
same string quote character appears at the beginning and the end of a
string. Such quoted strings do not nest.

The parsing facilities of Emacs consider a string as a single token.
The usual syntactic meanings of the characters in the string are
suppressed.

The Lisp modes have two string quote characters: double-quote (@samp{"})
and vertical bar (@samp{|}). @samp{|} is not used in Emacs Lisp, but it
is used in Common Lisp. C also has two string quote characters:
double-quote for strings, and single-quote (@samp{'}) for character
constants.

English text has no string quote characters because English is not a
programming language. Although quotation marks are used in English,
we do not want them to turn off the usual syntactic properties of
other characters in the quotation.
@end deffn

@deffn {Syntax class} @w{escape}
An @dfn{escape character} (designated with @samp{\}) starts an escape
sequence such as is used in C string and character constants. The
character @samp{\} belongs to this class in both C and Lisp. (In C, it
is used thus only inside strings, but it turns out to cause no trouble
to treat it this way throughout C code.)

Characters in this class count as part of words if
@code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}.
@end deffn

@deffn {Syntax class} @w{character quote}
A @dfn{character quote character} (designated with @samp{/}) quotes the
following character so that it loses its normal syntactic meaning. This
differs from an escape character in that only the character immediately
following is ever affected.

Characters in this class count as part of words if
@code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}.

This class is not currently used in any standard Emacs modes.
@end deffn

@deffn {Syntax class} @w{paired delimiter}
@dfn{Paired delimiter characters} (designated with @samp{$}) are like
string quote characters except that the syntactic properties of the
characters between the delimiters are not suppressed. Only @TeX{} mode
uses a paired delimiter presently: the @samp{$} that both enters and
leaves math mode.
@end deffn

@deffn {Syntax class} @w{expression prefix}
An @dfn{expression prefix operator} (designated with @samp{'}) is used
for syntactic operators that are part of an expression if they appear
next to one. These characters in Lisp include the apostrophe, @samp{'}
(used for quoting), the comma, @samp{,} (used in macros), and @samp{#}
(used in the read syntax for certain data types).
@end deffn

@deffn {Syntax class} @w{comment starter}
@deffnx {Syntax class} @w{comment ender}
@cindex comment syntax
The @dfn{comment starter} and @dfn{comment ender} characters are used in
various languages to delimit comments. These classes are designated
with @samp{<} and @samp{>}, respectively.

English text has no comment characters. In Lisp, the semicolon
(@samp{;}) starts a comment and a newline or formfeed ends one.
@end deffn

@deffn {Syntax class} @w{inherit}
This syntax class does not specify a syntax. It says to look in the
standard syntax table to find the syntax of this character. The
designator for this syntax code is @samp{@@}.
@end deffn

@node Syntax Flags
@subsection Syntax Flags
@cindex syntax flags

In addition to the classes, entries for characters in a syntax table
can include flags. There are six possible flags, represented by the
characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b} and
@samp{p}.

All the flags except @samp{p} are used to describe multi-character
comment delimiters. The digit flags indicate that a character can
@emph{also} be part of a comment sequence, in addition to the syntactic
properties associated with its character class. The flags are
independent of the class and each other for the sake of characters such
as @samp{*} in C mode, which is a punctuation character, @emph{and} the
second character of a start-of-comment sequence (@samp{/*}), @emph{and}
the first character of an end-of-comment sequence (@samp{*/}).

The flags for a character @var{c} are:

@itemize @bullet
@item
@samp{1} means @var{c} is the start of a two-character comment-start
sequence.

@item
@samp{2} means @var{c} is the second character of such a sequence.

@item
@samp{3} means @var{c} is the start of a two-character comment-end
sequence.

@item
@samp{4} means @var{c} is the second character of such a sequence.

@item
@c Emacs 19 feature
@samp{b} means that @var{c} as a comment delimiter belongs to the
alternative ``b'' comment style.

Emacs supports two comment styles simultaneously in any one syntax
table. This is for the sake of C++. Each style of comment syntax has
its own comment-start sequence and its own comment-end sequence. Each
comment must stick to one style or the other; thus, if it starts with
the comment-start sequence of style ``b'', it must also end with the
comment-end sequence of style ``b''.

The two comment-start sequences must begin with the same character; only
the second character may differ. Mark the second character of the
``b''-style comment-start sequence with the @samp{b} flag.

A comment-end sequence (one or two characters) applies to the ``b''
style if its first character has the @samp{b} flag set; otherwise, it
applies to the ``a'' style.

The appropriate comment syntax settings for C++ are as follows:

@table @asis
@item @samp{/}
@samp{124b}
@item @samp{*}
@samp{23}
@item newline
@samp{>b}
@end table

This defines four comment-delimiting sequences:

@table @asis
@item @samp{/*}
This is a comment-start sequence for ``a'' style because the
second character, @samp{*}, does not have the @samp{b} flag.

@item @samp{//}
This is a comment-start sequence for ``b'' style because the second
character, @samp{/}, does have the @samp{b} flag.

@item @samp{*/}
This is a comment-end sequence for ``a'' style because the first
character, @samp{*}, does not have the @samp{b} flag.

@item newline
This is a comment-end sequence for ``b'' style, because the newline
character has the @samp{b} flag.
@end table

@item
@c Emacs 19 feature
@samp{p} identifies an additional ``prefix character'' for Lisp syntax.
These characters are treated as whitespace when they appear between
expressions. When they appear within an expression, they are handled
according to their usual syntax codes.

The function @code{backward-prefix-chars} moves back over these
characters, as well as over characters whose primary syntax class is
prefix (@samp{'}). @xref{Motion and Syntax}.
@end itemize
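
The C++ settings listed above can be tried out with a short sketch. It
assumes a scratch buffer created with @code{with-temp-buffer} (a
convenience that is not described in this chapter) and builds a
throwaway table rather than the one a real C++ mode would install.

@example
@group
(with-temp-buffer
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?/  ". 124b" table)
    (modify-syntax-entry ?*  ". 23"   table)
    (modify-syntax-entry ?\n "> b"    table)
    (set-syntax-table table))
  ;; @r{One comment of each style, followed by ordinary text.}
  (insert "/* a */ // b\nx")
  (goto-char (point-min))
  (forward-comment (buffer-size))
  (char-to-string (char-after (point))))
     @result{} "x"
@end group
@end example

Both comments are skipped, so point ends up just before the @samp{x}.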

@node Syntax Table Functions
@section Syntax Table Functions

In this section we describe functions for creating, accessing and
altering syntax tables.

@defun make-syntax-table
This function creates a new syntax table. Character codes 0 through
31 and 128 through 255 are set up to inherit from the standard syntax
table. The other character codes are set up by copying what the
standard syntax table says about them.

Most major mode syntax tables are created in this way.
@end defun

@defun copy-syntax-table &optional table
This function constructs a copy of @var{table} and returns it. If
@var{table} is not supplied (or is @code{nil}), it returns a copy of the
standard syntax table. Otherwise, an error is signaled if @var{table} is
not a syntax table.
@end defun
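
The following sketch illustrates that a copy is independent of the
table it was made from. It assumes a scratch buffer made by
@code{with-temp-buffer} (not described in this chapter) whose table
gives @samp{_} symbol-constituent syntax, as the standard syntax table
does.

@example
@group
(with-temp-buffer
  (let ((copy (copy-syntax-table (syntax-table)))
        before after)
    ;; @r{Change the copy only; the buffer's table is untouched.}
    (modify-syntax-entry ?_ "w" copy)
    (setq before (char-to-string (char-syntax ?_)))
    ;; @r{The change becomes visible once the copy is installed.}
    (set-syntax-table copy)
    (setq after (char-to-string (char-syntax ?_)))
    (list before after)))
     @result{} ("_" "w")
@end group
@end example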

@deffn Command modify-syntax-entry char syntax-descriptor &optional table
This function sets the syntax entry for @var{char} according to
@var{syntax-descriptor}. The syntax is changed only for @var{table},
which defaults to the current buffer's syntax table, and not in any
other syntax table. The argument @var{syntax-descriptor} specifies the
desired syntax; this is a string beginning with a class designator
character, and optionally containing a matching character and flags as
well. @xref{Syntax Descriptors}.

This function always returns @code{nil}. The old syntax information in
the table for this character is discarded.

An error is signaled if the first character of the syntax descriptor is not
one of the syntax class designator characters (@pxref{Syntax Class
Table}). An error is also signaled if @var{char} is not a character.

@example
@group
@exdent @r{Examples:}

;; @r{Put the space character in class whitespace.}
(modify-syntax-entry ?\  " ")
     @result{} nil
@end group

@group
;; @r{Make @samp{$} an open parenthesis character,}
;; @r{with @samp{^} as its matching close.}
(modify-syntax-entry ?$ "(^")
     @result{} nil
@end group

@group
;; @r{Make @samp{^} a close parenthesis character,}
;; @r{with @samp{$} as its matching open.}
(modify-syntax-entry ?^ ")$")
     @result{} nil
@end group

@group
;; @r{Make @samp{/} a punctuation character,}
;; @r{the first character of a start-comment sequence,}
;; @r{and the second character of an end-comment sequence.}
;; @r{This is used in C mode.}
(modify-syntax-entry ?/ ". 14")
     @result{} nil
@end group
@end example
@end deffn

@defun char-syntax character
This function returns the syntax class of @var{character}, represented
by its mnemonic designator character. This @emph{only} returns the
class, not any matching parenthesis or flags.

An error is signaled if @var{character} is not a character.

The following examples apply to C mode. The first example shows that
the syntax class of space is whitespace (represented by a space). The
second example shows that the syntax of @samp{/} is punctuation. This
does not show the fact that it is also part of comment-start and -end
sequences. The third example shows that open parenthesis is in the class
of open parentheses. This does not show the fact that it has a matching
character, @samp{)}.

@example
@group
(char-to-string (char-syntax ?\ ))
     @result{} " "
@end group

@group
(char-to-string (char-syntax ?/))
     @result{} "."
@end group

@group
(char-to-string (char-syntax ?\())
     @result{} "("
@end group
@end example
@end defun

@defun set-syntax-table table
This function makes @var{table} the syntax table for the current buffer.
It returns @var{table}.
@end defun

@defun syntax-table
This function returns the current syntax table, which is the table for
the current buffer.
@end defun
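
These two functions can be combined to consult another table
temporarily. A minimal sketch follows; the helper name
@code{my-char-class-in} is hypothetical, and the example assumes that
@code{emacs-lisp-mode-syntax-table} classifies @samp{;} as a comment
starter.

@example
@group
(defun my-char-class-in (table char)
  "Return the syntax class designator of CHAR according to TABLE."
  (let ((old (syntax-table)))
    (unwind-protect
        (progn
          (set-syntax-table table)
          (char-syntax char))
      ;; @r{Restore the buffer's own table even after an error.}
      (set-syntax-table old))))
@end group

@group
(char-to-string
 (my-char-class-in emacs-lisp-mode-syntax-table ?\;))
     @result{} "<"
@end group
@end example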

@node Motion and Syntax
@section Motion and Syntax

This section describes functions for moving across characters in
certain syntax classes. None of these functions exists in Emacs
version 18 or earlier.

@defun skip-syntax-forward syntaxes &optional limit
This function moves point forward across characters having syntax classes
mentioned in @var{syntaxes}. It stops when it encounters the end of
the buffer, or position @var{limit} (if specified), or a character it is
not supposed to skip.
@ignore @c may want to change this.
The return value is the distance traveled, which is a nonnegative
integer.
@end ignore
@end defun

@defun skip-syntax-backward syntaxes &optional limit
This function moves point backward across characters whose syntax
classes are mentioned in @var{syntaxes}. It stops when it encounters
the beginning of the buffer, or position @var{limit} (if specified), or a
character it is not supposed to skip.
@ignore @c may want to change this.
The return value indicates the distance traveled. It is an integer that
is zero or less.
@end ignore
@end defun
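
As a small sketch of how these functions behave, the example below
skips a symbol, then the whitespace after it, and finally shows the
effect of @var{limit}. It assumes a scratch buffer made by
@code{with-temp-buffer} (not described in this chapter) that uses the
standard syntax table, in which letters are word constituents and
@samp{_} is a symbol constituent.

@example
@group
(with-temp-buffer
  (insert "foo_bar   baz")
  (goto-char (point-min))
  (let (after-symbol after-space limited)
    ;; @r{Skip word and symbol constituents: @samp{foo_bar}.}
    (skip-syntax-forward "w_")
    (setq after-symbol (point))
    ;; @r{Skip the run of whitespace that follows.}
    (skip-syntax-forward " ")
    (setq after-space (point))
    ;; @r{Start over, but do not move past position 4.}
    (goto-char (point-min))
    (skip-syntax-forward "w_" 4)
    (setq limited (point))
    (list after-symbol after-space limited)))
     @result{} (8 11 4)
@end group
@end example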

@defun backward-prefix-chars
This function moves point backward over any number of characters with
expression prefix syntax. This includes both characters in the
expression prefix syntax class, and characters with the @samp{p} flag.
@end defun

@node Parsing Expressions
@section Parsing Balanced Expressions

Here are several functions for parsing and scanning balanced
expressions, also known as @dfn{sexps}, in which parentheses match in
pairs. The syntax table controls the interpretation of characters, so
these functions can be used for Lisp expressions when in Lisp mode and
for C expressions when in C mode. @xref{List Motion}, for convenient
higher-level functions for moving over balanced expressions.

@defun parse-partial-sexp start limit &optional target-depth stop-before state stop-comment
This function parses a sexp in the current buffer starting at
@var{start}, not scanning past @var{limit}. It stops at position
@var{limit} or when certain criteria described below are met, and sets
point to the location where parsing stops. It returns a value
describing the status of the parse at the point where it stops.

If @var{state} is @code{nil}, @var{start} is assumed to be at the top
level of parenthesis structure, such as the beginning of a function
definition. Alternatively, you might wish to resume parsing in the
middle of the structure. To do this, you must provide a @var{state}
argument that describes the initial status of parsing.

@cindex parenthesis depth
If the third argument @var{target-depth} is non-@code{nil}, parsing
stops if the depth in parentheses becomes equal to @var{target-depth}.
The depth starts at 0, or at whatever is given in @var{state}.

If the fourth argument @var{stop-before} is non-@code{nil}, parsing
stops when it comes to any character that starts a sexp. If
@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
start of a comment.

@cindex parse state
The fifth argument @var{state} is an eight-element list of the same
form as the value of this function, described below. The return value
of one call may be used to initialize the state of the parse on another
call to @code{parse-partial-sexp}.

The result is a list of eight elements describing the final state of
the parse:

@enumerate 0
@item
The depth in parentheses, counting from 0.

@item
@cindex innermost containing parentheses
The character position of the start of the innermost parenthetical
grouping containing the stopping point; @code{nil} if none.

@item
@cindex previous complete subexpression
The character position of the start of the last complete subexpression
terminated; @code{nil} if none.

@item
@cindex inside string
Non-@code{nil} if inside a string. More precisely, this is the
character that will terminate the string.

@item
@cindex inside comment
@code{t} if inside a comment (of either style).

@item
@cindex quote character
@code{t} if point is just after a quote character.

@item
The minimum parenthesis depth encountered during this scan.

@item
@code{t} if inside a comment of style ``b''.
@end enumerate

Elements 0, 3, 4, 5 and 7 are significant in the argument @var{state}.

@cindex indenting with parentheses
This function is most often used to compute indentation for languages
that have nested parentheses.
@end defun
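
The sketch below shows how the returned state might be inspected, and
how the value of one call can be passed as @var{state} to a later call.
It assumes a scratch buffer made by @code{with-temp-buffer} (not
described in this chapter) and uses @code{emacs-lisp-mode-syntax-table}
so that the text is parsed as Lisp. The sample text ends inside an
unterminated string, so element 3 is the code of the double-quote
character, 34.

@example
@group
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(defun f (x) \"unterminated")
  (let* ((whole (parse-partial-sexp (point-min) (point-max)))
         ;; @r{Parse in two steps, resuming from the first result.}
         (head (parse-partial-sexp (point-min) 13))
         (tail (parse-partial-sexp 13 (point-max) nil nil head)))
    (list (nth 0 whole)     ; @r{depth in parentheses}
          (nth 1 whole)     ; @r{start of innermost grouping}
          (nth 3 whole)     ; @r{string terminator}
          (nth 0 tail)
          (nth 3 tail))))
     @result{} (1 1 34 1 34)
@end group
@end example

The one-step and two-step parses agree on the depth and on the string
terminator, which is what makes incremental parsing with @var{state}
possible.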

@defun scan-lists from count depth
This function scans forward @var{count} balanced parenthetical groupings
from character position @var{from}. It returns the character position
where the scan stops.

If @var{depth} is nonzero, parenthesis depth counting begins from that
value. The only candidates for stopping are places where the depth in
parentheses becomes zero; @code{scan-lists} counts @var{count} such
places and then stops. Thus, a positive value for @var{depth} means go
out @var{depth} levels of parenthesis.

Scanning ignores comments if @code{parse-sexp-ignore-comments} is
non-@code{nil}.

If the scan reaches the beginning or end of the buffer (or its
accessible portion), and the depth is not zero, an error is signaled.
If the depth is zero but the count is not used up, @code{nil} is
returned.
@end defun

@defun scan-sexps from count
This function scans forward @var{count} sexps from character position
@var{from}. It returns the character position where the scan stops.

Scanning ignores comments if @code{parse-sexp-ignore-comments} is
non-@code{nil}.

If the scan reaches the beginning or end of (the accessible part of) the
buffer in the middle of a parenthetical grouping, an error is signaled.
If it reaches the beginning or end between groupings but before
@var{count} is used up, @code{nil} is returned.
@end defun
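
A short sketch of both functions follows. It assumes a scratch buffer
made by @code{with-temp-buffer} (not described in this chapter) with
@code{emacs-lisp-mode-syntax-table} installed. In the sample text, the
outer list occupies positions 1 through 11 and the inner list positions
4 through 8.

@example
@group
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a (b c) d) e")
  (list (scan-sexps 1 1)      ; @r{over the whole outer list}
        (scan-lists 1 1 0)    ; @r{the same, as a list scan}
        (scan-lists 5 1 1)    ; @r{out of the inner list}
        (scan-sexps 12 2)))   ; @r{only one sexp is left}
     @result{} (12 12 9 nil)
@end group
@end example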

@defvar parse-sexp-ignore-comments
@cindex skipping comments
If the value is non-@code{nil}, then comments are treated as
whitespace by the functions in this section and by @code{forward-sexp}.

In older Emacs versions, this feature worked only when the comment
terminator was something like @samp{*/}, and appeared only to end a
comment. In languages where newlines terminate comments, it was
necessary to make this variable @code{nil}, since not every newline is
the end of a comment. This limitation no longer exists.
@end defvar

You can use @code{forward-comment} to move forward or backward over
one comment or several comments.

@defun forward-comment count
This function moves point forward across @var{count} comments (backward,
if @var{count} is negative). If it finds anything other than a comment
or whitespace, it stops, leaving point at the place where it stopped.
It also stops after satisfying @var{count}.
@end defun

To move forward over all comments and whitespace following point, use
@code{(forward-comment (buffer-size))}. @code{(buffer-size)} is a good
argument to use, because the number of comments in the buffer cannot
exceed that many.
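
The idiom can be sketched as follows, in both directions. The example
assumes a scratch buffer made by @code{with-temp-buffer} (not described
in this chapter) with @code{emacs-lisp-mode-syntax-table} installed, so
that @samp{;} starts a comment and newline ends one. In the sample
text, the code begins at position 22.

@example
@group
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert ";; first\n  ;; second\n(code)")
  (goto-char (point-min))
  (let (forward backward)
    ;; @r{Skip every comment and all whitespace after point.}
    (forward-comment (buffer-size))
    (setq forward (point))
    ;; @r{Now skip back over the same text.}
    (forward-comment (- (buffer-size)))
    (setq backward (point))
    (list forward backward)))
     @result{} (22 1)
@end group
@end example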

@node Standard Syntax Tables
@section Some Standard Syntax Tables

Each of the major modes in Emacs has its own syntax table. Here are
several of them:

@defun standard-syntax-table
This function returns the standard syntax table, which is the syntax
table used in Fundamental mode.
@end defun

@defvar text-mode-syntax-table
The value of this variable is the syntax table used in Text mode.
@end defvar

@defvar c-mode-syntax-table
The value of this variable is the syntax table for C-mode buffers.
@end defvar

@defvar emacs-lisp-mode-syntax-table
The value of this variable is the syntax table used in Emacs Lisp mode
by editing commands. (It has no effect on the Lisp @code{read}
function.)
@end defvar

@node Syntax Table Internals
@section Syntax Table Internals
@cindex syntax table internals

Each element of a syntax table is an integer that encodes the syntax
of one character: the syntax class, possible matching character, and
flags. Lisp programs don't usually work with the elements directly; the
Lisp-level syntax table functions usually work with syntax descriptors
(@pxref{Syntax Descriptors}).

The low 8 bits of each element of a syntax table indicate the
syntax class.

@table @asis
@item @i{Integer}
@i{Class}
@item 0
whitespace
@item 1
punctuation
@item 2
word
@item 3
symbol
@item 4
open parenthesis
@item 5
close parenthesis
@item 6
expression prefix
@item 7
string quote
@item 8
paired delimiter
@item 9
escape
@item 10
character quote
@item 11
comment-start
@item 12
comment-end
@item 13
inherit
@end table

The next 8 bits are the matching opposite parenthesis (if the
character has parenthesis syntax); otherwise, they are not meaningful.
The next 6 bits are the flags.
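
The layout just described can be illustrated with a little arithmetic.
This is only a sketch of the bit fields as this section states them:
the helper names @code{my-syntax-pack} and @code{my-syntax-unpack} are
hypothetical, the flags are treated as an opaque 6-bit number because
the order of the individual flag bits is not given here, and other
Emacs versions may store syntax entries differently.

@example
@group
(defun my-syntax-pack (class match flags)
  "Combine CLASS, MATCH and FLAGS into one integer."
  ;; @r{Class in bits 0--7, matching character in bits 8--15,}
  ;; @r{flags in the six bits above that.}
  (logior class (ash match 8) (ash flags 16)))
@end group

@group
(defun my-syntax-unpack (entry)
  "Return a list (CLASS MATCH FLAGS) extracted from ENTRY."
  (list (logand entry 255)
        (logand (ash entry -8) 255)
        (logand (ash entry -16) 63)))
@end group

@group
;; @r{An open parenthesis (class 4) matched by @samp{)}, code 41.}
(my-syntax-pack 4 ?\) 0)
     @result{} 10500
(my-syntax-unpack 10500)
     @result{} (4 41 0)
@end group
@end example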