annotate lisp/nxml/nxml-rap.el @ 107521:54f3a4d055ee
Document font-use-system-font.
* cmdargs.texi (Font X): Move most content to Fonts.
* frames.texi (Fonts): New node. Document font-use-system-font.
* emacs.texi (Top):
* xresources.texi (Table of Resources):
* mule.texi (Defining Fontsets, Charsets): Update xrefs.
| author | Chong Yidong <cyd@stupidchicken.com> |
|---|---|
| date | Sat, 20 Mar 2010 13:24:06 -0400 |
| parents | 1d1d5d9bd884 |
| children | 376148b31b5e |
| rev | line source |
|---|---|
| 86361 | 1 ;;; nxml-rap.el --- low-level support for random access parsing for nXML mode |
| 2 | |
| 106815 | 3 ;; Copyright (C) 2003, 2004, 2007, 2008, 2009, 2010 Free Software Foundation, Inc. |
| 86361 | 4 |
| 5 ;; Author: James Clark | |
| 6 ;; Keywords: XML | |
| 7 | |
| 86542 | 8 ;; This file is part of GNU Emacs. |
| 9 | |
| 94666 | 10 ;; GNU Emacs is free software: you can redistribute it and/or modify |
| 86542 | 11 ;; it under the terms of the GNU General Public License as published by |
| 94666 | 12 ;; the Free Software Foundation, either version 3 of the License, or |
| 13 ;; (at your option) any later version. | |
| 86361 | 14 |
| 86542 | 15 ;; GNU Emacs is distributed in the hope that it will be useful, |
| 16 ;; but WITHOUT ANY WARRANTY; without even the implied warranty of | |
| 17 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
| 18 ;; GNU General Public License for more details. | |
| 86361 | 19 |
| 86542 | 20 ;; You should have received a copy of the GNU General Public License |
| 94666 | 21 ;; along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. |
| 86361 | 22 |
| 23 ;;; Commentary: | |
| 24 | |
| 25 ;; This uses xmltok.el to do XML parsing. The fundamental problem is | |
| 26 ;; how to handle changes. We don't want to maintain a complete parse | |
| 27 ;; tree. We also don't want to reparse from the start of the document | |
| 28 ;; on every keystroke. However, it is not possible in general to | |
| 29 ;; parse an XML document correctly starting at a random point in the | |
| 30 ;; middle. The main problems are comments, CDATA sections and | |
| 31 ;; processing instructions: these can all contain things that are | |
| 32 ;; indistinguishable from elements. Literals in the prolog are also a | |
| 33 ;; problem. Attribute value literals are not a problem because | |
| 34 ;; attribute value literals cannot contain less-than signs. | |
| 35 ;; | |
| 36 ;; Our strategy is to keep track of just the problematic things. | |
| 37 ;; Specifically, we keep track of all comments, CDATA sections and | |
| 38 ;; processing instructions in the instance. We do this by marking all | |
| 39 ;; except the first character of these with a non-nil nxml-inside text | |
| 40 ;; property. The value of the nxml-inside property is comment, | |
| 41 ;; cdata-section or processing-instruction. The first character does | |
| 42 ;; not have the nxml-inside property so we can find the beginning of | |
| 43 ;; the construct by looking for a change in a text property value | |
| 44 ;; (Emacs provides primitives for this). We use text properties | |
| 45 ;; rather than overlays, since the implementation of overlays doesn't | |
| 46 ;; look like it scales to large numbers of overlays in a buffer. | |
| 47 ;; | |
| 48 ;; We don't in fact track all these constructs, but only track them in | |
| 49 ;; some initial part of the instance. The variable `nxml-scan-end' | |
| 50 ;; contains the limit of where we have scanned up to for them. | |
| 51 ;; | |
| 52 ;; Thus to parse some random point in the file we first ensure that we | |
| 53 ;; have scanned up to that point. Then we search backwards for a | |
| 54 ;; <. Then we check whether the < has an nxml-inside property. If it | |
| 55 ;; does we go backwards to first character that does not have an | |
| 56 ;; nxml-inside property (this character must be a <). Then we start | |
| 57 ;; parsing forward from the < we have found. | |
| 58 ;; | |
| 59 ;; The prolog has to be parsed specially, so we also keep track of the | |
| 60 ;; end of the prolog in `nxml-prolog-end'. The prolog is reparsed on | |
| 61 ;; every change to the prolog. This won't work well if people try to | |
| 62 ;; edit huge internal subsets. Hopefully that will be rare. | |
| 63 ;; | |
| 64 ;; We keep track of the changes by adding to the buffer's | |
| 65 ;; after-change-functions hook. Scanning is also done as a | |
| 66 ;; prerequisite to fontification by adding to fontification-functions | |
| 67 ;; (in the same way as jit-lock). This means that scanning for these | |
| 68 ;; constructs had better be quick. Fortunately it is. Firstly, the | |
| 69 ;; typical proportion of comments, CDATA sections and processing | |
| 70 ;; instructions is small relative to other things. Secondly, to scan | |
| 71 ;; we just search for the regexp <[!?]. | |
| 72 ;; | |
| 73 ;; One problem is unclosed comments, processing instructions and CDATA | |
| 74 ;; sections. Suppose, for example, we encounter a <!-- but there's no | |
| 75 ;; matching -->. This is not an unexpected situation if the user is | |
| 76 ;; creating a comment. It is not helpful to treat the whole of the | |
| 77 ;; file starting from the <!-- onwards as a single unclosed comment | |
| 78 ;; token. Instead we treat just the <!-- as a piece of not well-formed | |
| 79 ;; markup and continue. The problem is that if at some later stage a | |
| 80 ;; --> gets added to the buffer after the unclosed <!--, we will need | |
| 81 ;; to reparse the buffer starting from the <!--. We need to keep | |
| 82 ;; track of these reparse dependencies; they are called dependent | |
| 83 ;; regions in the code. | |
| 84 | |
| 85 ;;; Code: | |
| 86 | |
| 87 (require 'xmltok) | |
| 88 (require 'nxml-util) | |
| 89 | |
| 90 (defvar nxml-prolog-end nil | |
| 91 "Integer giving position following end of the prolog.") | |
| 92 (make-variable-buffer-local 'nxml-prolog-end) | |
| 93 | |
| 94 (defvar nxml-scan-end nil | |
| 95 "Marker giving position up to which we have scanned. | |
| 96 nxml-scan-end must be >= nxml-prolog-end. Furthermore, nxml-scan-end | |
| 96496 | 97 must not be an inside position in the following sense. A position is |
| 86361 | 98 inside if the following character is a part of, but not the first |
| 99 character of, a CDATA section, comment or processing instruction. | |
| 100 Furthermore all positions >= nxml-prolog-end and < nxml-scan-end that | |
| 96496 | 101 are inside positions must have a non-nil `nxml-inside' property whose |
| 102 value is a symbol specifying what it is inside. Any characters with a | |
| 103 non-nil `fontified' property must have position < nxml-scan-end and | |
| 104 the correct face. Dependent regions must also be established for any | |
| 86361 | 105 unclosed constructs starting before nxml-scan-end. |
| 96496 | 106 There must be no `nxml-inside' properties after nxml-scan-end.") |
| 86361 | 107 (make-variable-buffer-local 'nxml-scan-end) |
| 108 | |
| 109 (defsubst nxml-get-inside (pos) | |
| 110 (get-text-property pos 'nxml-inside)) | |
| 111 | |
| 112 (defsubst nxml-clear-inside (start end) | |
| 95598 | 113 (nxml-debug-clear-inside start end) |
| 86361 | 114 (remove-text-properties start end '(nxml-inside nil))) |
| 115 | |
| 116 (defsubst nxml-set-inside (start end type) | |
| 95598 | 117 (nxml-debug-set-inside start end) |
| 86361 | 118 (put-text-property start end 'nxml-inside type)) |
| 119 | |
| 120 (defun nxml-inside-end (pos) | |
| 121 "Return the end of the inside region containing POS. | |
| 122 Return nil if the character at POS is not inside." | |
| 123 (if (nxml-get-inside pos) | |
| 124 (or (next-single-property-change pos 'nxml-inside) | |
| 125 (point-max)) | |
| 126 nil)) | |
| 127 | |
| 128 (defun nxml-inside-start (pos) | |
| 129 "Return the start of the inside region containing POS. | |
| 130 Return nil if the character at POS is not inside." | |
| 131 (if (nxml-get-inside pos) | |
| 132 (or (previous-single-property-change (1+ pos) 'nxml-inside) | |
| 133 (point-min)) | |
| 134 nil)) | |
| 135 | |
| 136 ;;; Change management | |
| 137 | |
| 138 (defun nxml-scan-after-change (start end) | |
| 139 "Restore `nxml-scan-end' invariants after a change. | |
| 140 The change happened between START and END. | |
| 141 Return position after which lexical state is unchanged. | |
| 96496 | 142 END must be > `nxml-prolog-end'. START must be outside |
| 95598 | 143 any 'inside' regions and at the beginning of a token." |
| 86361 | 144 (if (>= start nxml-scan-end) |
| 145 nxml-scan-end | |
| 146 (let ((inside-remove-start start) | |
| 147 xmltok-errors | |
| 148 xmltok-dependent-regions) | |
| 149 (while (or (when (xmltok-forward-special (min end nxml-scan-end)) | |
| 150 (when (memq xmltok-type | |
| 151 '(comment | |
| 152 cdata-section | |
| 153 processing-instruction)) | |
| 154 (nxml-clear-inside inside-remove-start | |
| 155 (1+ xmltok-start)) | |
| 156 (nxml-set-inside (1+ xmltok-start) | |
| 157 (point) | |
| 158 xmltok-type) | |
| 159 (setq inside-remove-start (point))) | |
| 160 (if (< (point) (min end nxml-scan-end)) | |
| 161 t | |
| 162 (setq end (point)) | |
| 163 nil)) | |
| 164 ;; The end of the change was inside but is now outside. | |
| 165 ;; Imagine something really weird like | |
| 166 ;; <![CDATA[foo <!-- bar ]]> <![CDATA[ stuff --> <!-- ]]> --> | |
| 167 ;; and suppose we deleted "<