annotate lisp/emacs-lisp/sregex.el @ 90618:b7ce72709298

(get_composition_id): Pay attention to TAB component.
author Kenichi Handa <handa@m17n.org>
date Mon, 16 Oct 2006 07:53:52 +0000
parents c5406394f567
children 6588c6259dfb
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
1 ;;; sregex.el --- symbolic regular expressions
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
2
64751
5b1a238fcbb4 Update years in copyright notice; nfc.
Thien-Thi Nguyen <ttn@gnuvola.org>
parents: 64085
diff changeset
3 ;; Copyright (C) 1997, 1998, 2000, 2002, 2003, 2004,
68648
067115a6e738 Update years in copyright notice; nfc.
Thien-Thi Nguyen <ttn@gnuvola.org>
parents: 64751
diff changeset
4 ;; 2005, 2006 Free Software Foundation, Inc.
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
5
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
6 ;; Author: Bob Glickstein <bobg+sregex@zanshin.com>
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
7 ;; Maintainer: Bob Glickstein <bobg+sregex@zanshin.com>
29212
f35b1d67aa8f Add finder keywords.
Dave Love <fx@gnu.org>
parents: 29069
diff changeset
8 ;; Keywords: extensions
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
9
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
10 ;; This file is part of GNU Emacs.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
11
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
12 ;; GNU Emacs is free software; you can redistribute it and/or modify
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
13 ;; it under the terms of the GNU General Public License as published by
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
14 ;; the Free Software Foundation; either version 2, or (at your option)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
15 ;; any later version.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
16
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
17 ;; GNU Emacs is distributed in the hope that it will be useful,
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
18 ;; but WITHOUT ANY WARRANTY; without even the implied warranty of
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
19 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
20 ;; GNU General Public License for more details.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
21
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
22 ;; You should have received a copy of the GNU General Public License
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
23 ;; along with GNU Emacs; see the file COPYING. If not, write to the
64085
18a818a2ee7c Update FSF's address.
Lute Kamstra <lute@gnu.org>
parents: 52401
diff changeset
24 ;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
18a818a2ee7c Update FSF's address.
Lute Kamstra <lute@gnu.org>
parents: 52401
diff changeset
25 ;; Boston, MA 02110-1301, USA.
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
26
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
27 ;;; Commentary:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
28
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
29 ;; This package allows you to write regular expressions using a
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
30 ;; totally new, Lisp-like syntax.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
31
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
32 ;; A "symbolic regular expression" (sregex for short) is a Lisp form
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
33 ;; that, when evaluated, produces the string form of the specified
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
34 ;; regular expression. Here's a simple example:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
35
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
36 ;; (sregexq (or "Bob" "Robert")) => "Bob\\|Robert"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
37
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
38 ;; As you can see, an sregex is specified by placing one or more
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
39 ;; special clauses in a call to `sregexq'. The clause in this case is
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
40 ;; the `or' of two strings (not to be confused with the Lisp function
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
41 ;; `or'). The list of allowable clauses appears below.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
42
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
43 ;; With sregex, it is never necessary to "escape" magic characters
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
44 ;; that are meant to be taken literally; that happens automatically.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
45 ;; For example:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
46
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
47 ;; (sregexq "M*A*S*H") => "M\\*A\\*S\\*H"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
48
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
49 ;; It is also unnecessary to "group" parts of the expression together
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
50 ;; to overcome operator precedence; that also happens automatically.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
51 ;; For example:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
52
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
53 ;; (sregexq (opt (or "Bob" "Robert"))) => "\\(?:Bob\\|Robert\\)?"
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
54
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
55 ;; It *is* possible to group parts of the expression in order to refer
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
56 ;; to them with numbered backreferences:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
57
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
58 ;; (sregexq (group (or "Go" "Run"))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
59 ;; ", Spot, "
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
60 ;; (backref 1)) => "\\(Go\\|Run\\), Spot, \\1"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
61
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
62 ;; `sregexq' is a macro. Each time it is used, it constructs a simple
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
63 ;; Lisp expression that then invokes a moderately complex engine to
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
64 ;; interpret the sregex and render the string form. Because of this,
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
65 ;; I don't recommend sprinkling calls to `sregexq' throughout your
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
66 ;; code, the way one normally does with string regexes (which are
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
67 ;; cheap to evaluate). Instead, it's wiser to precompute the regexes
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
68 ;; you need wherever possible instead of repeatedly constructing the
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
69 ;; same ones over and over. Example:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
70
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
71 ;; (let ((field-regex (sregexq (opt "resent-")
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
72 ;; (or "to" "cc" "bcc"))))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
73 ;; ...
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
74 ;; (while ...
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
75 ;; ...
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
76 ;; (re-search-forward field-regex ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
77 ;; ...))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
78
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
79 ;; The arguments to `sregexq' are automatically quoted, but the
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
80 ;; flipside of this is that it is not straightforward to include
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
81 ;; computed (i.e., non-constant) values in `sregexq' expressions. So
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
82 ;; `sregex' is a function that is like `sregexq' but which does not
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
83 ;; automatically quote its values. Literal sregex clauses must be
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
84 ;; explicitly quoted like so:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
85
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
86 ;; (sregex '(or "Bob" "Robert")) => "Bob\\|Robert"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
87
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
88 ;; but computed clauses can be included easily, allowing for the reuse
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
89 ;; of common clauses:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
90
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
91 ;; (let ((dotstar '(0+ any))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
92 ;; (whitespace '(1+ (syntax ?-)))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
93 ;; (digits '(1+ (char (?0 . ?9)))))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
94 ;; (sregex 'bol dotstar ":" whitespace digits)) => "^.*:\\s-+[0-9]+"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
95
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
96 ;; To use this package in a Lisp program, simply (require 'sregex).
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
97
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
98 ;; Here are the clauses allowed in an `sregex' or `sregexq'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
99 ;; expression:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
100
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
101 ;; - a string
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
102 ;; This stands for the literal string. If it contains
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
103 ;; metacharacters, they will be escaped in the resulting regex
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
104 ;; (using `regexp-quote').
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
105
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
106 ;; - the symbol `any'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
107 ;; This stands for ".", a regex matching any character except
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
108 ;; newline.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
109
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
110 ;; - the symbol `bol'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
111 ;; Stands for "^", matching the empty string at the beginning of a line
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
112
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
113 ;; - the symbol `eol'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
114 ;; Stands for "$", matching the empty string at the end of a line
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
115
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
116 ;; - (group CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
117 ;; Groups the given CLAUSEs using "\\(" and "\\)".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
118
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
119 ;; - (sequence CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
120
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
121 ;; Groups the given CLAUSEs; may or may not use "\\(?:" and "\\)".
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
122 ;; Clauses grouped by `sequence' do not count for purposes of
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
123 ;; numbering backreferences. Use `sequence' in situations like
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
124 ;; this:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
125
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
126 ;; (sregexq (or "dog" "cat"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
127 ;; (sequence (opt "sea ") "monkey")))
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
128 ;; => "dog\\|cat\\|\\(?:sea \\)?monkey"
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
129
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
130 ;; where a single `or' alternate needs to contain multiple
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
131 ;; subclauses.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
132
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
133 ;; - (backref N)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
134 ;; Matches the same string previously matched by the Nth "group" in
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
135 ;; the same sregex. N is a positive integer.
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
136
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
137 ;; - (or CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
138 ;; Matches any one of the CLAUSEs by separating them with "\\|".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
139
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
140 ;; - (0+ CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
141 ;; Concatenates the given CLAUSEs and matches zero or more
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
142 ;; occurrences by appending "*".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
143
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
144 ;; - (1+ CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
145 ;; Concatenates the given CLAUSEs and matches one or more
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
146 ;; occurrences by appending "+".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
147
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
148 ;; - (opt CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
149 ;; Concatenates the given CLAUSEs and matches zero or one occurrence
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
150 ;; by appending "?".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
151
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
152 ;; - (repeat MIN MAX CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
153 ;; Concatenates the given CLAUSEs and constructs a regex matching at
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
154 ;; least MIN occurrences and at most MAX occurrences. MIN must be a
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
155 ;; non-negative integer. MAX must be a non-negative integer greater
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
156 ;; than or equal to MIN; or MAX can be nil to mean "infinity."
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
157
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
158 ;; - (char CHAR-CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
159 ;; Creates a "character class" matching one character from the given
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
160 ;; set. See below for how to construct a CHAR-CLAUSE.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
161
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
162 ;; - (not-char CHAR-CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
163 ;; Creates a "character class" matching any one character not in the
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
164 ;; given set. See below for how to construct a CHAR-CLAUSE.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
165
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
166 ;; - the symbol `bot'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
167 ;; Stands for "\\`", matching the empty string at the beginning of
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
168 ;; text (beginning of a string or of a buffer).
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
169
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
170 ;; - the symbol `eot'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
171 ;; Stands for "\\'", matching the empty string at the end of text.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
172
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
173 ;; - the symbol `point'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
174 ;; Stands for "\\=", matching the empty string at point.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
175
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
176 ;; - the symbol `word-boundary'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
177 ;; Stands for "\\b", matching the empty string at the beginning or
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
178 ;; end of a word.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
179
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
180 ;; - the symbol `not-word-boundary'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
181 ;; Stands for "\\B", matching the empty string not at the beginning
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
182 ;; or end of a word.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
183
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
184 ;; - the symbol `bow'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
185 ;; Stands for "\\<", matching the empty string at the beginning of a
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
186 ;; word.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
187
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
188 ;; - the symbol `eow'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
189 ;; Stands for "\\>", matching the empty string at the end of a word.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
190
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
191 ;; - the symbol `wordchar'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
192 ;; Stands for the regex "\\w", matching a word-constituent character
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
193 ;; (as determined by the current syntax table)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
194
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
195 ;; - the symbol `not-wordchar'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
196 ;; Stands for the regex "\\W", matching a non-word-constituent
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
197 ;; character.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
198
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
199 ;; - (syntax CODE)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
200 ;; Stands for the regex "\\sCODE", where CODE is a syntax table code
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
201 ;; (a single character). Matches any character with the requested
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
202 ;; syntax.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
203
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
204 ;; - (not-syntax CODE)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
205 ;; Stands for the regex "\\SCODE", where CODE is a syntax table code
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
206 ;; (a single character). Matches any character without the
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
207 ;; requested syntax.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
208
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
209 ;; - (regex REGEX)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
210 ;; This is a "trapdoor" for including ordinary regular expression
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
211 ;; strings in the result. Some regular expressions are clearer when
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
212 ;; written the old way: "[a-z]" vs. (sregexq (char (?a . ?z))), for
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
213 ;; instance. However, see the note under "Bugs," below.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
214
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
215 ;; Each CHAR-CLAUSE that is passed to (char ...) and (not-char ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
216 ;; has one of the following forms:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
217
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
218 ;; - a character
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
219 ;; Adds that character to the set.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
220
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
221 ;; - a string
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
222 ;; Adds all the characters in the string to the set.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
223
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
224 ;; - A pair (MIN . MAX)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
225 ;; Where MIN and MAX are characters, adds the range of characters
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
226 ;; from MIN through MAX to the set.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
227
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
228 ;;; To do:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
229
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
230 ;; An earlier version of this package could optionally translate the
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
231 ;; symbolic regex into other languages' syntaxes, e.g. Perl. For
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
232 ;; instance, with Perl syntax selected, (sregexq (or "ab" "cd")) would
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
233 ;; yield "ab|cd" instead of "ab\\|cd". It might be useful to restore
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
234 ;; such a facility.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
235
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
236 ;; - handle multibyte chars in sregex--char-aux
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
237 ;; - add support for character classes ([:blank:], ...)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
238 ;; - add support for non-greedy operators *? and +?
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
239 ;; - bug: (sregexq (opt (opt ?a))) returns "a??" which is a non-greedy "a?"
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
240
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
241 ;;; Bugs:
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
242
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
243 ;;; Code:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
244
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
245 (eval-when-compile (require 'cl))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
246
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
247 ;; Compatibility code for when we didn't have shy-groups
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
248 (defvar sregex--current-sregex nil)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
249 (defun sregex-info () nil)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
250 (defmacro sregex-save-match-data (&rest forms) (cons 'save-match-data forms))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
251 (defun sregex-replace-match (r &optional f l str subexp x)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
252 (replace-match r f l str subexp))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
253 (defun sregex-match-string (c &optional i x) (match-string c i))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
254 (defun sregex-match-string-no-properties (count &optional in-string sregex)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
255 (match-string-no-properties count in-string))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
256 (defun sregex-match-beginning (count &optional sregex) (match-beginning count))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
257 (defun sregex-match-end (count &optional sregex) (match-end count))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
258 (defun sregex-match-data (&optional sregex) (match-data))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
259 (defun sregex-backref-num (n &optional sregex) n)
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
260
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
261
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
262 (defun sregex (&rest exps)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
263 "Symbolic regular expression interpreter.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
264 This is exactly like `sregexq' (q.v.) except that it evaluates all its
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
265 arguments, so literal sregex clauses must be quoted. For example:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
266
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
267 (sregex '(or \"Bob\" \"Robert\")) => \"Bob\\\\|Robert\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
268
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
269 An argument-evaluating sregex interpreter lets you reuse sregex
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
270 subexpressions:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
271
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
272 (let ((dotstar '(0+ any))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
273 (whitespace '(1+ (syntax ?-)))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
274 (digits '(1+ (char (?0 . ?9)))))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
275 (sregex 'bol dotstar \":\" whitespace digits)) => \"^.*:\\\\s-+[0-9]+\""
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
276 (sregex--sequence exps nil))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
277
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
278 (defmacro sregexq (&rest exps)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
279 "Symbolic regular expression interpreter.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
280 This macro allows you to specify a regular expression (regexp) in
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
281 symbolic form, and converts it into the string form required by Emacs's
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
282 regex functions such as `re-search-forward' and `looking-at'. Here is
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
283 a simple example:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
284
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
285 (sregexq (or \"Bob\" \"Robert\")) => \"Bob\\\\|Robert\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
286
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
287 As you can see, an sregex is specified by placing one or more special
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
288 clauses in a call to `sregexq'. The clause in this case is the `or'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
289 of two strings (not to be confused with the Lisp function `or'). The
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
290 list of allowable clauses appears below.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
291
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
292 With `sregex', it is never necessary to \"escape\" magic characters
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
293 that are meant to be taken literally; that happens automatically.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
294 For example:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
295
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
296 (sregexq \"M*A*S*H\") => \"M\\\\*A\\\\*S\\\\*H\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
297
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
298 It is also unnecessary to \"group\" parts of the expression together
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
299 to overcome operator precedence; that also happens automatically.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
300 For example:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
301
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
302 (sregexq (opt (or \"Bob\" \"Robert\"))) => \"\\\\(Bob\\\\|Robert\\\\)?\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
303
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
304 It *is* possible to group parts of the expression in order to refer
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
305 to them with numbered backreferences:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
306
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
307 (sregexq (group (or \"Go\" \"Run\"))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
308 \", Spot, \"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
309 (backref 1)) => \"\\\\(Go\\\\|Run\\\\), Spot, \\\\1\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
310
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
311 If `sregexq' needs to introduce its own grouping parentheses, it will
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
312 automatically renumber your backreferences:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
313
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
314 (sregexq (opt \"resent-\")
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
315 (group (or \"to\" \"cc\" \"bcc\"))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
316 \": \"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
317 (backref 1)) => \"\\\\(resent-\\\\)?\\\\(to\\\\|cc\\\\|bcc\\\\): \\\\2\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
318
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
319 `sregexq' is a macro. Each time it is used, it constructs a simple
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
320 Lisp expression that then invokes a moderately complex engine to
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
321 interpret the sregex and render the string form. Because of this, I
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
322 don't recommend sprinkling calls to `sregexq' throughout your code,
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
323 the way one normally does with string regexes (which are cheap to
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
324 evaluate). Instead, it's wiser to precompute the regexes you need
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
325 wherever possible instead of repeatedly constructing the same ones
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
326 over and over. Example:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
327
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
328 (let ((field-regex (sregexq (opt \"resent-\")
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
329 (or \"to\" \"cc\" \"bcc\"))))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
330 ...
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
331 (while ...
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
332 ...
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
333 (re-search-forward field-regex ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
334 ...))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
335
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
336 The arguments to `sregexq' are automatically quoted, but the
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
337 flipside of this is that it is not straightforward to include
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
338 computed (i.e., non-constant) values in `sregexq' expressions. So
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
339 `sregex' is a function that is like `sregexq' but which does not
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
340 automatically quote its values. Literal sregex clauses must be
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
341 explicitly quoted like so:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
342
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
343 (sregex '(or \"Bob\" \"Robert\")) => \"Bob\\\\|Robert\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
344
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
345 but computed clauses can be included easily, allowing for the reuse
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
346 of common clauses:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
347
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
348 (let ((dotstar '(0+ any))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
349 (whitespace '(1+ (syntax ?-)))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
350 (digits '(1+ (char (?0 . ?9)))))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
351 (sregex 'bol dotstar \":\" whitespace digits)) => \"^.*:\\\\s-+[0-9]+\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
352
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
353 Here are the clauses allowed in an `sregex' or `sregexq' expression:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
354
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
355 - a string
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
356 This stands for the literal string. If it contains
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
357 metacharacters, they will be escaped in the resulting regex
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
358 (using `regexp-quote').
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
359
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
360 - the symbol `any'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
361 This stands for \".\", a regex matching any character except
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
362 newline.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
363
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
364 - the symbol `bol'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
365 Stands for \"^\", matching the empty string at the beginning of a line
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
366
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
367 - the symbol `eol'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
368 Stands for \"$\", matching the empty string at the end of a line
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
369
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
370 - (group CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
371 Groups the given CLAUSEs using \"\\\\(\" and \"\\\\)\".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
372
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
373 - (sequence CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
374
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
375 Groups the given CLAUSEs; may or may not use \"\\\\(\" and \"\\\\)\".
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
376 Clauses grouped by `sequence' do not count for purposes of
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
377 numbering backreferences. Use `sequence' in situations like
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
378 this:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
379
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
380 (sregexq (or \"dog\" \"cat\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
381 (sequence (opt \"sea \") \"monkey\")))
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
382 => \"dog\\\\|cat\\\\|\\\\(?:sea \\\\)?monkey\"
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
383
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
384 where a single `or' alternate needs to contain multiple
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
385 subclauses.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
386
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
387 - (backref N)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
388 Matches the same string previously matched by the Nth \"group\" in
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
389 the same sregex. N is a positive integer.
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
390
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
391 - (or CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
392 Matches any one of the CLAUSEs by separating them with \"\\\\|\".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
393
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
394 - (0+ CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
395 Concatenates the given CLAUSEs and matches zero or more
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
396 occurrences by appending \"*\".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
397
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
398 - (1+ CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
399 Concatenates the given CLAUSEs and matches one or more
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
400 occurrences by appending \"+\".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
401
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
402 - (opt CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
403 Concatenates the given CLAUSEs and matches zero or one occurrence
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
404 by appending \"?\".
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
405
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
406 - (repeat MIN MAX CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
407 Concatenates the given CLAUSEs and constructs a regex matching at
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
408 least MIN occurrences and at most MAX occurrences. MIN must be a
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
409 non-negative integer. MAX must be a non-negative integer greater
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
410 than or equal to MIN; or MAX can be nil to mean \"infinity.\"
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
411
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
412 - (char CHAR-CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
413 Creates a \"character class\" matching one character from the given
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
414 set. See below for how to construct a CHAR-CLAUSE.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
415
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
416 - (not-char CHAR-CLAUSE ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
417 Creates a \"character class\" matching any one character not in the
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
418 given set. See below for how to construct a CHAR-CLAUSE.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
419
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
420 - the symbol `bot'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
421 Stands for \"\\\\`\", matching the empty string at the beginning of
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
422 text (beginning of a string or of a buffer).
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
423
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
424 - the symbol `eot'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
425 Stands for \"\\\\'\", matching the empty string at the end of text.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
426
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
427 - the symbol `point'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
428 Stands for \"\\\\=\", matching the empty string at point.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
429
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
430 - the symbol `word-boundary'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
431 Stands for \"\\\\b\", matching the empty string at the beginning or
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
432 end of a word.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
433
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
434 - the symbol `not-word-boundary'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
435 Stands for \"\\\\B\", matching the empty string not at the beginning
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
436 or end of a word.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
437
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
438 - the symbol `bow'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
439 Stands for \"\\\\\\=<\", matching the empty string at the beginning of a
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
440 word.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
441
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
442 - the symbol `eow'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
443 Stands for \"\\\\\\=>\", matching the empty string at the end of a word.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
444
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
445 - the symbol `wordchar'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
446 Stands for the regex \"\\\\w\", matching a word-constituent character
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
447 (as determined by the current syntax table)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
448
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
449 - the symbol `not-wordchar'
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
450 Stands for the regex \"\\\\W\", matching a non-word-constituent
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
451 character.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
452
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
453 - (syntax CODE)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
454 Stands for the regex \"\\\\sCODE\", where CODE is a syntax table code
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
455 (a single character). Matches any character with the requested
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
456 syntax.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
457
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
458 - (not-syntax CODE)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
459 Stands for the regex \"\\\\SCODE\", where CODE is a syntax table code
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
460 (a single character). Matches any character without the
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
461 requested syntax.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
462
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
463 - (regex REGEX)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
464 This is a \"trapdoor\" for including ordinary regular expression
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
465 strings in the result. Some regular expressions are clearer when
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
466 written the old way: \"[a-z]\" vs. (sregexq (char (?a . ?z))), for
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
467 instance.
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
468
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
469 Each CHAR-CLAUSE that is passed to (char ...) and (not-char ...)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
470 has one of the following forms:
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
471
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
472 - a character
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
473 Adds that character to the set.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
474
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
475 - a string
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
476 Adds all the characters in the string to the set.
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
477
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
478 - A pair (MIN . MAX)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
479 Where MIN and MAX are characters, adds the range of characters
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
480 from MIN through MAX to the set."
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
481 `(apply 'sregex ',exps))
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
482
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
483 (defun sregex--engine (exp combine)
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
484 (cond
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
485 ((stringp exp)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
486 (if (and combine
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
487 (eq combine 'suffix)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
488 (/= (length exp) 1))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
489 (concat "\\(?:" (regexp-quote exp) "\\)")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
490 (regexp-quote exp)))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
491 ((symbolp exp)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
492 (ecase exp
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
493 (any ".")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
494 (bol "^")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
495 (eol "$")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
496 (wordchar "\\w")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
497 (not-wordchar "\\W")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
498 (bot "\\`")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
499 (eot "\\'")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
500 (point "\\=")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
501 (word-boundary "\\b")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
502 (not-word-boundary "\\B")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
503 (bow "\\<")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
504 (eow "\\>")))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
505 ((consp exp)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
506 (funcall (intern (concat "sregex--"
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
507 (symbol-name (car exp))))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
508 (cdr exp)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
509 combine))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
510 (t (error "Invalid expression: %s" exp))))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
511
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
512 (defun sregex--sequence (exps combine)
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
513 (if (= (length exps) 1) (sregex--engine (car exps) combine)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
514 (let ((re (mapconcat
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
515 (lambda (e) (sregex--engine e 'concat))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
516 exps "")))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
517 (if (eq combine 'suffix)
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
518 (concat "\\(?:" re "\\)")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
519 re))))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
520
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
521 (defun sregex--or (exps combine)
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
522 (if (= (length exps) 1) (sregex--engine (car exps) combine)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
523 (let ((re (mapconcat
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
524 (lambda (e) (sregex--engine e 'or))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
525 exps "\\|")))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
526 (if (not (eq combine 'or))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
527 (concat "\\(?:" re "\\)")
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
528 re))))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
529
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
530 (defun sregex--group (exps combine) (concat "\\(" (sregex--sequence exps nil) "\\)"))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
531
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
532 (defun sregex--backref (exps combine) (concat "\\" (int-to-string (car exps))))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
533 (defun sregex--opt (exps combine) (concat (sregex--sequence exps 'suffix) "?"))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
534 (defun sregex--0+ (exps combine) (concat (sregex--sequence exps 'suffix) "*"))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
535 (defun sregex--1+ (exps combine) (concat (sregex--sequence exps 'suffix) "+"))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
536
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
537 (defun sregex--char (exps combine) (sregex--char-aux nil exps))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
538 (defun sregex--not-char (exps combine) (sregex--char-aux t exps))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
539
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
540 (defun sregex--syntax (exps combine) (format "\\s%c" (car exps)))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
541 (defun sregex--not-syntax (exps combine) (format "\\S%c" (car exps)))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
542
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
543 (defun sregex--regex (exps combine)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
544 (if combine (concat "\\(?:" (car exps) "\\)") (car exps)))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
545
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
546 (defun sregex--repeat (exps combine)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
547 (let* ((min (or (pop exps) 0))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
548 (minstr (number-to-string min))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
549 (max (pop exps)))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
550 (concat (sregex--sequence exps 'suffix)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
551 (concat "\\{" minstr ","
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
552 (when max (number-to-string max)) "\\}"))))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
553
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
554 (defun sregex--char-range (start end)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
555 (let ((startc (char-to-string start))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
556 (endc (char-to-string end)))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
557 (cond
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
558 ((> end (+ start 2)) (concat startc "-" endc))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
559 ((> end (+ start 1)) (concat startc (char-to-string (1+ start)) endc))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
560 ((> end start) (concat startc endc))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
561 (t startc))))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
562
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
563 (defun sregex--char-aux (complement args)
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
564 ;; regex-opt does the same, we should join effort.
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
565 (let ((chars (make-bool-vector 256 nil))) ; Yeah, right!
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
566 (dolist (arg args)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
567 (cond ((integerp arg) (aset chars arg t))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
568 ((stringp arg) (mapcar (lambda (c) (aset chars c t)) arg))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
569 ((consp arg)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
570 (let ((start (car arg))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
571 (end (cdr arg)))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
572 (when (> start end)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
573 (let ((tmp start)) (setq start end) (setq end tmp)))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
574 ;; now start <= end
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
575 (let ((i start))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
576 (while (<= i end)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
577 (aset chars i t)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
578 (setq i (1+ i))))))))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
579 ;; now chars is a map of the characters in the class
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
580 (let ((caret (aref chars ?^))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
581 (dash (aref chars ?-))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
582 (class (if (aref chars ?\]) "]" "")))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
583 (aset chars ?^ nil)
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
584 (aset chars ?- nil)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
585 (aset chars ?\] nil)
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
586
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
587 (let (start end)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
588 (dotimes (i 256)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
589 (if (aref chars i)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
590 (progn
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
591 (unless start (setq start i))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
592 (setq end i)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
593 (aset chars i nil))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
594 (when start
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
595 (setq class (concat class (sregex--char-range start end)))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
596 (setq start nil))))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
597 (if start
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
598 (setq class (concat class (sregex--char-range start end)))))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
599
29069
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
600 (if (> (length class) 0)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
601 (setq class (concat class (if caret "^") (if dash "-")))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
602 (setq class (concat class (if dash "-") (if caret "^"))))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
603 (if (and (not complement) (= (length class) 1))
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
604 (regexp-quote class)
cb028b1d6345 Rewritten to take advantage of shy-groups and
Stefan Monnier <monnier@iro.umontreal.ca>
parents: 22974
diff changeset
605 (concat "[" (if complement "^") class "]")))))
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
606
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
607 (provide 'sregex)
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
608
52401
695cf19ef79e Add arch taglines
Miles Bader <miles@gnu.org>
parents: 38436
diff changeset
609 ;;; arch-tag: 460c1f5a-eb6e-42ec-a451-ffac78bdf492
22537
7947a4ea28a8 Initial revision
Dan Nicolaescu <done@ece.arizona.edu>
parents:
diff changeset
610 ;;; sregex.el ends here