comparison doc/wiki2docbook/html2db/index.src.html @ 1773:2ae81598b254
scripts for converting wiki documentation to docbook
author | nadvornik |
---|---|
date | Sun, 22 Nov 2009 09:12:22 +0000 |
parents | |
children |
1772:9f3b7a089caf | 1773:2ae81598b254 |
---|---|
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" | |
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [ | |
3 <!ENTITY html2db "<code>html2db.xsl</code>"> | |
4 ]> | |
5 <html xmlns:x="http://www.w3.org/1999/xhtml" | |
6 xmlns:db="urn:docbook"> | |
7 <head> | |
8 <title>This title is ignored</title> | |
9 </head> | |
10 <body> | |
11 | |
12 <h1>html2db.xsl</h1> | |
13 | |
14 <!-- The xmlns attribute escapes into the Docbook namespace --> | |
15 <articleinfo xmlns="urn:docbook"> | |
16 <author> | |
17 <firstname>Oliver</firstname> | |
18 <surname>Steele</surname> | |
19 </author> | |
20 <revhistory> | |
21 <revision> | |
22 <revnumber>1</revnumber> | |
23 <date>2004-07-30</date> | |
24 </revision> | |
25 <revision> | |
26 <revnumber>1.0.1</revnumber> | |
27 <date>2004-08-01</date> | |
28 <revdescription><para>Editorial changes to the | |
29 readme.</para></revdescription> | |
30 </revision> | |
31 </revhistory> | |
32 <date>2004-07-30</date> | |
33 </articleinfo> | |
34 | |
35 <h2>Overview</h2> | |
36 | |
37 <p>&html2db; converts an XHTML source document into a Docbook output | |
38 document. It provides features for customizing the generation of the | |
39 output, so that the output can be tuned by annotating | |
40 the source, rather than hand-editing the output. This makes it useful | |
41 in a processing pipeline where the source documents are maintained in | |
42 HTML, although it can be used as a one-time conversion tool | |
43 too.</p> | |
44 | |
45 <p>This document is an example of &html2db; used in conjunction with | |
46 the Docbook XSL stylesheets. The <a href="index.src.html">source | |
47 file</a> is an XHTML file with some embedded Docbook elements and | |
48 processing instructions. &html2db; compiles it into a <a | |
49 href="index.xml">Docbook document</a>, which can be used to generate | |
50 this output file (which includes a Table of Contents), a <a | |
51 href="docs/index.html">chunked HTML file</a>, a <a | |
52 href="html2db.pdf">PDF</a>, or other formats.</p> | |
53 | |
54 <h2>Features</h2> | |
55 <dl> | |
56 <dt>XSLT implementation</dt> | |
57 <dd>This tool is designed to be embedded within an XSLT processing | |
58 pipeline. <code>html2db.xsl</code> can be used in a custom | |
59 stylesheet or integrated into a larger system. See <a | |
60 href="#embedding">Overriding</a>.</dd> | |
61 | |
62 <dt>Customizable</dt> | |
63 <dd>The output can be customized by means of additional markup in | |
64 the XHTML source. See the section on <a | |
65 href="#customization">customization</a>.</dd> | |
66 | |
67 <dt>Creates outline structure</dt> | |
68 <dd><code>h1</code>, <code>h2</code>, etc. are turned into nested | |
69 <code>section</code> and <code>title</code> elements (as opposed to | |
70 bridge heads).</dd> | |
71 | |
72 <dt>Accepts a wide variety of XHTML</dt> | |
73 <dd>In particular, &html2db; automatically wraps <dfn>naked item | |
74 text</dfn> (text that is not enclosed in a <code><p></code>) | |
75 inside a table cell or list item. Naked text is a common property of | |
76 XHTML documents, but needs to be clothed to create valid | |
77 Docbook.<db:footnote><p>This feature is limited. See <a | |
78 href="#implicit-blocks">Implicit Blocks</a>.</p></db:footnote></dd> | |
79 | |
80 </dl> | |
81 | |
82 <h2>Requirements</h2> | |
83 <ul> | |
84 <li>Java: JRE or JDK 1.3 or greater.</li> | |
85 <li>Xalan 2.5.0.</li> | |
86 <li>Familiarity with installing and running JAR files.</li> | |
87 </ul> | |
88 | |
89 <p>&html2db; might work with earlier versions of Java and Xalan, and | |
90 it might work with other XSLT processors such as Saxon and | |
91 xsltproc.</p> | |
92 | |
93 <h2>License</h2> | |
94 <p>This software is released under the Open Source <a href="http://www.opensource.org/licenses/artistic-license.php">Artistic License</a>.</p> | |
95 | |
96 <h2>Installation</h2> | |
97 <ul> | |
98 <li>Install JRE 1.3 or higher.</li> | |
99 <li>Install Xalan, if necessary.</li> | |
100 <li>Download <code>html2db-1.zip</code> from <a href="http://osteele.com/sources/html2db.zip">http://osteele.com/sources/html2db-1.zip</a>.</li> | |
101 <li>Unzip <code>html2db-1.zip</code>.</li> | |
102 </ul> | |
103 | |
104 <h2>Usage</h2> | |
105 <p>Use Xalan to process an XHTML source file into a Docbook file:</p> | |
106 | |
107 <pre class="example"> | |
108 java org.apache.xalan.xslt.Process -XSL html2db.xsl -IN doc.html > doc.xml | |
109 </pre> | |
110 | |
111 <p>See <a href="index.src.html"><code>index.src.html</code></a> for an | |
112 example of an input file.</p> | |
113 | |
114 <p>If your source files are in HTML, not XHTML, you may find the <a | |
115 href="http://tidy.sourceforge.net/">Tidy</a> tool useful. This is a | |
116 tool that converts from HTML to XHTML, and can be added to the front | |
117 of your processing pipeline.</p> | |
118 | |
119 <p>(If you need to process HTML and you don't know or can't figure out | |
120 from context what a processing pipeline is, &html2db; is probably not | |
121 the right tool for you, and you should look for a local XML or Java | |
122 guru or for a commercially supported product.)</p> | |
123 | |
124 <h2>Specification</h2> | |
125 | |
126 <h3>XHTML Elements</h3> | |
127 <p><code>code/i</code> stands for "an <code>i</code> element | |
128 immediately within a <code>code</code> element". This notation is | |
129 from XPath.</p> | |
130 | |
131 <p>XHTML elements must be in the XHTML namespace, | |
132 <code>http://www.w3.org/1999/xhtml</code>.</p> | |
133 | |
134 <table> | |
135 <tr> | |
136 <th>XHTML</th> | |
137 <th>Docbook</th> | |
138 <th>Notes</th> | |
139 </tr> | |
140 | |
141 <tr> | |
142 <td><code>b</code>, <code>i</code>, <code>em</code>, <code>strong</code></td> | |
143 <td><code>emphasis</code></td> | |
144 <td>The <code>role</code> attribute is the original tag name</td> | |
145 </tr> | |
146 | |
147 <tr> | |
148 <td><code>dfn</code></td> | |
149 <td><code>glossterm</code>, and also <code>primary</code> <code>indexterm</code></td> | |
150 </tr> | |
151 | |
152 <tr> | |
153 <td><code>code/i</code>, <code>tt/i</code>, <code>pre/i</code></td> | |
154 <td><code>replaceable</code></td> | |
155 <td>In practice, <code>i</code> within monospace content is usually used to mean replaceable text. If you're using it for emphasis, use <code>em</code> instead.</td> | |
156 </tr> | |
157 | |
158 <tr> | |
159 <td><code>pre</code>, <code>body/code</code></td> | |
160 <td><code>programlisting</code></td> | |
161 </tr> | |
162 | |
163 <tr> | |
164 <td><code>img</code></td> | |
165 <td><code>inlinemediaobject/imageobject/imagedata</code></td> | |
166 <td>In an inline context.</td> | |
167 </tr> | |
168 | |
169 <tr> | |
170 <td><code>img</code></td> | |
171 <td><code>[informal]figure/mediaobject/imageobject/imagedata</code></td> | |
172 <td>If it has a <code>title</code> attribute or <code>db:title</code> it's wrapped in a <code>figure</code>. Otherwise it's wrapped in an <code>informalfigure</code>.</td> | |
173 </tr> | |
174 | |
175 <tr> | |
176 <td><code>table</code></td> | |
177 <td><code>[informal]table</code></td> | |
178 <td>XHTML <code>table</code> becomes Docbook <code>table</code> if it has a <code>summary</code> attribute; <code>informaltable</code> otherwise.</td> | |
179 </tr> | |
180 | |
181 <tr> | |
182 <td><code>ul</code></td> | |
183 <td><code>itemizedlist</code></td> | |
184 <td>But see the processing instruction <a href="#simplelist">below</a>.</td> | |
185 </tr> | |
186 </table> | |
187 | |
188 | |
189 | |
190 <h3>Links</h3> | |
191 <table summary="Link Translation"> | |
192 <tr> | |
193 <th>XHTML</th> | |
194 <th>Docbook</th> | |
195 <th>Notes</th> | |
196 </tr> | |
197 | |
198 <tr> | |
199 <td><code><a name="<var>name</var>"></code></td> | |
200 <td><code><anchor id="{$anchor-id-prefix}<var>name</var>"></code></td> | |
201 <td>An anchor within a <code>h<var>n</var></code> element is attached to the enclosing <code>section</code> as an <code>id</code> attribute instead.</td> | |
202 </tr> | |
203 | |
204 <tr> | |
205 <td><code><a href="#<var>name</var>"></code></td> | |
206 <td><code><link linkend="{$anchor-id-prefix}<var>name</var>"></code></td> | |
207 </tr> | |
208 | |
209 <tr> | |
210 <td><code><a href="<var>url</var>"></code></td> | |
211 <td><code><ulink url="<var>url</var>"></code></td> | |
212 </tr> | |
213 | |
214 <tr> | |
215 <td><code><a href="mailto:<var>address</var>"></code></td> | |
216 <td><code><email><var>address</var></email></code></td> | |
217 </tr> | |
218 | |
219 </table> | |
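The link rules in the table above can be sketched in a few lines of Python. This is only an illustration of the mapping, not the stylesheet's own code: the function name `translate_link` is hypothetical, the sketch assumes the `mailto:` row refers to the `href` attribute, and it does not model the special case of an anchor inside a heading.

```python
from xml.sax.saxutils import escape, quoteattr


def translate_link(attrs, anchor_id_prefix=""):
    """Return the Docbook start tag (or element) for the attributes of an XHTML a element.

    attrs is a dict of the a element's attributes; anchor_id_prefix plays the
    role of the $anchor-id-prefix stylesheet parameter.
    """
    if "name" in attrs:
        # <a name="x"> becomes an anchor with a prefixed id.
        return "<anchor id=%s/>" % quoteattr(anchor_id_prefix + attrs["name"])
    href = attrs.get("href", "")
    if href.startswith("#"):
        # Internal links point at the prefixed id of the target.
        return "<link linkend=%s>" % quoteattr(anchor_id_prefix + href[1:])
    if href.startswith("mailto:"):
        # Assumption: mailto links are recognized on href.
        return "<email>%s</email>" % escape(href[len("mailto:"):])
    # Anything else is treated as an external URL.
    return "<ulink url=%s>" % quoteattr(href)


for attrs in ({"name": "intro"}, {"href": "#intro"}, {"href": "http://example.com/"}):
    print(translate_link(attrs, "ch1."))
```

Passing the same prefix for both the `name` and `href="#..."` cases is what keeps internal links resolvable after several sources are merged into one book.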
220 | |
221 <h3 id="tables">Tables</h3> | |
222 | |
223 <p>XHTML <code>table</code> support is minimal. &html2db; changes the | |
224 element names and counts the columns (this is necessary to get table | |
225 footnotes to span all the columns), but it does not attempt to deal | |
226 with tables in their full generality.</p> | |
227 | |
228 <p>An XHTML <code>table</code> with a <code>summary</code> attribute | |
229 generates a <code>table</code>, whose <code>title</code> is the value | |
230 of that summary. An XHTML <code>table</code> without a | |
231 <code>summary</code> generates an <code>informaltable</code>.</p> | |
232 | |
233 <p>Any <code>tr</code>s that contain <code>th</code>s are pulled to | |
234 the top of the table, and placed inside a <code>thead</code>. Other | |
235 <code>tr</code>s are placed inside a <code>tbody</code>. This matches | |
236 the common XHTML <code>table</code> pattern, where the first row is | |
237 a header row.</p> | |
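The row handling described above amounts to a stable partition of the rows. A minimal Python sketch, assuming un-namespaced element names for brevity and a hypothetical helper name `split_rows` (the real stylesheet does this with XSLT templates):

```python
import xml.etree.ElementTree as ET


def split_rows(table):
    """Partition the tr children of a table into (header rows, body rows).

    A row counts as a header row if it contains at least one th cell.
    Document order is preserved within each group.
    """
    rows = table.findall("tr")
    head = [tr for tr in rows if tr.find("th") is not None]
    body = [tr for tr in rows if tr.find("th") is None]
    return head, body


table = ET.fromstring(
    "<table>"
    "<tr><td>a</td><td>1</td></tr>"
    "<tr><th>Name</th><th>Value</th></tr>"
    "<tr><td>b</td><td>2</td></tr>"
    "</table>"
)
head, body = split_rows(table)
print(len(head), "header row(s),", len(body), "body row(s)")
```

Note that a header row found in the middle of the source table, as in this input, still ends up at the top of the output.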
238 | |
239 <h3 id="implicit-blocks">Implicit Blocks</h3> | |
240 <p>XHTML allows <code>li</code>, <code>dd</code>, and <code>td</code> | |
241 elements to contain either inline text (for instance, | |
242 <code><li>a list item</li></code>) or block structure | |
243 (<code><li><p>a block</p></li></code>). The | |
244 corresponding Docbook elements require block structure, such as | |
245 <code>para</code>.</p> | |
246 | |
247 <p>&html2db; provides limited support for wrapping naked text in | |
248 these positions in <code>para</code> elements. If a list item or | |
249 table cell item directly contains text, all text up to the position of | |
250 the first element (or all text, if there is no element) is wrapped in | |
251 <code>para</code>. This handles the simple case of an item that | |
252 directly contains text, and also the case of an item that contains | |
253 text followed by blocks such as paragraphs.</p> | |
254 | |
255 <p>Note that this algorithm is easily confused. It doesn't | |
256 distinguish between block and inline XHTML elements, so it will only | |
257 wrap the first word in <code><li>some <b>bold</b> | |
258 text</li></code>, leading to badly formatted output. The | |
259 workaround is to wrap troublesome content in explicit | |
260 <code><p></code> tags.</p> | |
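The wrapping rule and its failure mode can be reproduced with a short Python sketch. It is an approximation of the behaviour described above, not the stylesheet's code: `wrap_leading_text` is a hypothetical name, element names are un-namespaced, and child elements are copied through without being translated to Docbook.

```python
import xml.etree.ElementTree as ET


def wrap_leading_text(item):
    """Build a listitem whose leading text (up to the first child) is wrapped in para."""
    out = ET.Element("listitem")
    if item.text and item.text.strip():
        # Only the text before the first child element is wrapped.
        para = ET.SubElement(out, "para")
        para.text = item.text.strip()
    for child in item:
        # Children, and any text that follows them, are passed along unwrapped.
        out.append(child)
    return out


def convert(source):
    return ET.tostring(wrap_leading_text(ET.fromstring(source)), encoding="unicode")


print(convert("<li>a list item</li>"))
print(convert("<li>some <b>bold</b> text</li>"))
```

In the second call only the word before the `b` element ends up inside `para`; the rest of the sentence is left outside it, which is the confusion described above.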
261 | |
262 <h3 id="docbook-elements">Docbook Elements</h3> | |
263 | |
264 <p>Elements from the Docbook namespace are passed through as is. | |
265 There are two ways to include a Docbook element in your XHTML | |
266 source:</p> | |
267 | |
268 <dl> | |
269 <dt>Global prefix</dt> | |
270 <dd><p>A <dfn>fake Docbook namespace</dfn><db:footnote><p>The fake | |
271 Docbook namespace is <code>urn:docbook</code>. Docbook doesn't really | |
272 have a namespace, and if it did, it wouldn't be this one. See <a | |
273 href="#docbook-namespace">Docbook namespace</a> for a discussion of | |
274 this issue.</p></db:footnote> | |
275 | |
276 declaration may be added to the document root element. Anywhere in | |
277 the document, the prefix from this namespace declaration may be used | |
278 to include a Docbook element. This is useful if a document contains | |
279 many Docbook elements, such as <code>footnote</code> or | |
280 <code>glossterm</code>, interspersed with XHTML. (In this case it may | |
281 be more convenient to allow these elements in the XHTML namespace and | |
282 add a customization layer that translates them to Docbook elements, | |
283 however. See <a href="#customization">Customization</a>.)</p> | |
284 | |
285 <pre class="example"><![CDATA[ | |
286 <html xmlns="http://www.w3.org/1999/xhtml" | |
287 xmlns:db="urn:docbook"> | |
288 ... | |
289 <p>Some text<db:footnote>and a footnote</db:footnote>.</p> | |
290 ]]></pre></dd> | |
291 | |
292 <dt>Local namespace</dt> | |
293 <dd><p>A Docbook element may be introduced along with a prefix-less | |
294 namespace declaration. This is useful for embedding a Docbook | |
295 document fragment (a hierarchy of elements that all use Docbook tags) | |
296 within an XHTML document.</p> | |
297 | |
298 <pre class="example"><![CDATA[ | |
299 ... | |
300 <articleinfo xmlns="urn:docbook"> | |
301 <author> | |
302 <firstname>...</firstname> | |
303 ... | |
304 ]]></pre></dd> | |
305 </dl> | |
306 | |
307 <p>The source to <a href="index.src.html">this document</a> | |
308 illustrates both of these techniques.</p> | |
309 | |
310 <p class="note">Both these techniques will cause your document to be | |
311 invalid as XHTML. In order to validate an XHTML document that | |
312 contains Docbook elements, you will need to create a custom schema. | |
313 Technically, you then ought to place your document in a different | |
314 namespace, but this will cause &html2db; not to recognize it!</p> | |
315 | |
316 | |
317 <h3>Output Processing Instructions</h3> | |
318 | |
319 <p>&html2db; adds a few processing instructions to the output file. | |
320 The Docbook XSL stylesheets ignore these, but if you write a | |
321 customization layer for Docbook XSL, you can use the information in | |
322 these processing instructions to customize the HTML output. This can | |
323 be used, for example, to set the <code>a</code> <code>onclick</code> | |
324 and <code>target</code> attributes in the HTML files that Docbook XSL | |
325 creates to the same values they had in the input document.</p> | |
326 | |
327 <dl> | |
328 <dt><code><?html2db attribute="<var>name</var>" value="<var>value</var>"?></code></dt> | |
329 <dd>Placed inside a link element to capture the value of the <code>a</code> <code>target</code> and <code>onclick</code> attributes. <var>name</var> is the name of the attribute (<code>target</code> or <code>onclick</code>), and <var>value</var> is its value, with <code>"</code> and <code>\</code> replaced by <code>\"</code> and <code>\\</code>, respectively.</dd> | |
330 | |
331 <dt><code><?html2db element="br"?></code></dt> | |
332 <dd>Represents the location of an XHTML <code>br</code> element in the | |
333 source document.</dd> | |
334 | |
335 </dl> | |
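The escaping rule for the <var>value</var> field can be sketched in Python as follows. The function names are hypothetical, and the order of the two replacements is an assumption: the text above does not state it, but escaping backslashes first is the usual way to keep the result unambiguous.

```python
def escape_pi_value(value):
    """Escape a value for the html2db attribute processing instruction.

    Backslashes are doubled first, so the backslashes introduced for
    quotes in the second step are not doubled again.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


def html2db_attribute_pi(name, value):
    """Format the processing instruction that records an a element attribute."""
    return '<?html2db attribute="%s" value="%s"?>' % (name, escape_pi_value(value))


print(html2db_attribute_pi("target", "_blank"))
print(html2db_attribute_pi("onclick", 'go("x")'))
```

A customization layer that reads these instructions would reverse the two replacements in the opposite order to recover the original attribute value.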
336 | |
337 <p>You can also include <code><?db2html?></code> processing | |
338 instructions in the HTML source document, and they will be copied | |
339 through to the Docbook output file unchanged (as will all other | |
340 processing instructions).</p> | |
341 | |
342 | |
343 <h2 id="customization">Customization</h2> | |
344 <h3>XSLT Parameters</h3> | |
345 <dl> | |
346 <dt><code><xsl:param name="anchor-id-prefix" select="''"/></code></dt> | |
347 <dd>Prefixed to every id generated from <code><a name=></code> | |
348 and <code><a href="#"></code>. This is useful to avoid | |
349 collisions between multiple documents that are compiled into the | |
350 same book. For instance, if a number of XHTML sources are assembled | |
351 into chapters of a book, you can style each source file with a prefix of | |
352 <code><var>docid</var>.</code> where <var>docid</var> is a unique id | |
353 for each source file.</dd> | |
354 | |
355 <dt><code><xsl:param name="document-root" select="'article'"/></code></dt> | |
356 <dd>The default document root. This can be overridden by | |
357 <code><?html2db class="<var>name</var>"?></code> within the | |
358 document itself, and defaults to <code>article</code>.</dd> | |
359 </dl> | |
360 | |
361 <h3 id="processing-instructions">Processing instructions</h3> | |
362 <p>Use the <code><?html2db?></code> processing instruction to | |
363 customize the transformation of the XHTML source to Docbook:</p> | |
364 | |
365 <table> | |
366 <tr> | |
367 <th>Processing instruction</th> | |
368 <th>Content</th> | |
369 <th>Effect</th> | |
370 </tr> | |
371 | |
372 <tr> | |
373 <td><code><?html2db class="<var>xxx</var>"?></code></td> | |
374 <td><code>body</code></td> | |
375 <td>Sets the output document root to <var>xxx</var>. Useful for | |
376 translating to <code>preface</code>, <code>appendix</code>, or <code>chapter</code>; the default is | |
377 <var>$document-root</var>.</td> | |
378 </tr> | |
379 | |
380 <tr id="simplelist"> | |
381 <td><code><?html2db class="simplelist"?></code></td> | |
382 <td><code>ul</code></td> | |
383 <td>Creates a vertical <code>simplelist</code>.<db:footnote><db:para>Note that the | |
384 current implementation simply checks for the presence of <em>any</em> | |
385 <code>html2db</code> processing instruction.</db:para></db:footnote></td> | |
386 </tr> | |
387 | |
388 | |
389 <tr> | |
390 <td><code><?html2db rowsep="1"?></code></td> | |
391 <td><code>[informal]table</code></td> | |
392 <td>Sets the <code>rowsep</code> attribute on the generated <code>table</code>.<db:footnote><db:para>Note that the current implementation simply checks for the presence of <em>any</em> <code>html2db</code> processing instruction that begins with <code>rowsep</code>, and assumes the value is <code>1</code>.</db:para></db:footnote></td> | |
393 </tr> | |
394 </table> | |
395 | |
396 <h3 id="embedding">Overriding the built-in templates</h3> | |
397 <p>For cases where the previous techniques don't allow for enough | |
398 customization, you can override the builtin templates. You will need | |
399 to know XSLT in order to do this, and you will need to write a new | |
400 stylesheet that uses the <code>xsl:import</code> element to import | |
401 <code>html2db.xsl</code>.</p> | |
402 | |
403 <p>The <a href="examples.xsl"><code>example.xsl</code></a> stylesheet | |
404 is an example customization layer. It recognizes the <code><div | |
405 class="abstract"></code> and <code><p class="note"></code> | |
406 classes in the <a href="index.src.html">source</a> for this document, | |
407 and generates the corresponding Docbook elements.</p> | |
408 | |
409 | |
410 <h2>FAQ</h2> | |
411 <h3>Why generate Docbook?</h3> | |
412 <p>The primary reason to use Docbook as an <em>output</em> format is | |
413 to take advantage of the Docbook XSL stylesheets. These are a | |
414 well-designed, well-documented set of XSL stylesheets that provide a | |
415 variety of publishing features that would be difficult to recreate | |
416 from scratch for HTML:</p> | |
417 | |
418 <ul> | |
419 <li>Automatic Table-of-Contents generation</li> | |
420 <li>Automatic part, chapter, and section numbering.</li> | |
421 <li>Creation of single-page, multi-page, PDF, and WinHelp files from the same source document.</li> | |
422 <li>Navigation headers, footers, and metadata for multi-page HTML | |
423 documents.</li> | |
424 <li>Link resolution and link target text insertion across multiple pages and numbered targets.</li> | |
425 <li>Figure, example, and table numbering, and tables of these.</li> | |
426 <li>Index and glossary tools.</li> | |
427 </ul> | |
428 | |
429 <h3>Why write in XHTML?</h3> | |
430 | |
431 <p>Given that Docbook is so great, why not write in it?</p> | |
432 | |
433 <p>Where there are not legacy concerns, Docbook is probably a better | |
434 choice for structured or technical documentation.</p> | |
435 | |
436 <p>Where the only legacy concern is the documents themselves, and not | |
437 the tools and skill sets of documentation contributors, you should | |
438 consider using an (X)HTML convertor to perform a one-time conversion | |
439 of your documentation source into Docbook, and then switching | |
440 development to the result files. You can use this stylesheet to | |
441 perform this conversion, or evaluate other tools, many of which are | |
442 probably appropriate for this purpose.</p> | |
443 | |
444 <p>Often there are other legacy concerns: the availability of cheap | |
445 (including free) and usable HTML editors and editing modes; and the | |
446 fact that it's easier to teach people XHTML than Docbook. If either | |
447 of these is an issue in your organization, you may want to maintain | |
448 documentation sources in XHTML instead of Docbook.</p> | |
449 | |
450 <p>For example, at <a href="http://www.laszlosystems.com/">Laszlo</a>, | |
451 most developers contribute directly to the documentation. Requiring | |
452 that developers learn Docbook, or that they wait on the doc team to | |
453 get content into the docs, would discourage this.</p> | |
454 | |
455 <h3>Why not use an existing convertor?</h3> | |
456 | |
457 <p>This isn't the first (X)HTML to Docbook convertor. Why not use one | |
458 of the existing ones?</p> | |
459 | |
460 <p>Each of the HTML to Docbook convertors that I could find had at least some | |
461 of the following limitations, some of which stemmed from their | |
462 intended use as one-time-only convertors for legacy documents:</p> | |
463 | |
464 <ul> | |
465 <li>Many only operated on a subset of HTML, and relied upon hand | |
466 editing of the output to clean up mistakes. This made them impossible | |
467 to use as part of a processing pipeline, where the source is | |
468 <em>maintained</em> in XHTML.</li> | |
469 | |
470 <li>There was no way to customize the output, except by (1) hand | |
471 editing, or (2) writing a post-processing stylesheet, which didn't | |
472 have access to the information in the XHTML source document.</li> | |
473 | |
474 <li>Many of them were difficult or impossible to customize and | |
475 extend. They were closed-source, or written in Java or Perl (which I | |
476 find to be difficult languages to use for customizing this kind of | |
477 thing) and embedded in a larger system.</li> | |
478 | |
479 <li>They didn't take full advantage of the Docbook tag set and content | |
480 model to represent document structure. For instance, they didn't | |
481 generate nested <code>section</code> elements to represent | |
482 <code>h1</code> <code>h2</code> sequences, or <code>table</code> to | |
483 represent tables with <code>summary</code> attributes.</li> | |
484 </ul> | |
485 | |
486 <h3>I got this error. What does it mean?</h3> | |
487 <dl> | |
488 <dt>Q. <code>Fatal Error! The element type "br" must be terminated by the matching end-tag "</br>". | |
489 </code></dt> | |
490 <dd>A. Your document is HTML, not <em>X</em>HTML. You need to fix it, or run it through Tidy first.</dd> | |
491 | |
492 <dt>Q. My output document is empty except for the <code><?xml version="1.0" encoding="UTF-8"?></code> line.</dt> | |
493 <dd>A. The document is missing a namespace declaration. See the <a href="index.src.html">example</a> for an example.</dd> | |
494 | |
495 <dt>Q. Some of the headers and document sections are repeated multiple times.</dt> | |
496 <dd>A. The document has out-of-sequence headers, such as <code>h1</code> followed by <code>h3</code> (instead of <code>h2</code>). This won't work.</dd> | |
497 | |
498 <dt>Q. <code>Fatal Error! The prefix "db" for element "db:footnote" is not bound.</code></dt> | |
499 <dd>A. You haven't declared the <code>db</code> namespace prefix. See the <a href="index.src.html">example</a> for an example.</dd> | |
500 | |
501 </dl> | |
502 | |
503 | |
504 <h2>Implementation Notes</h2> | |
505 | |
506 <h3>Bugs</h3> | |
507 <ul> | |
508 <li>Improperly sequenced <code>h<var>n</var></code> (for example | |
509 <code>h1</code> followed by <code>h3</code>, instead of | |
510 <code>h2</code>) will result in duplicate text.</li> | |
511 </ul> | |
512 | |
513 | |
514 <h3>Limitations</h3> | |
515 <ul> | |
516 <li>The <code>id</code> attribute is only preserved for certain | |
517 elements (at least <code>h<var>n</var></code>, images, paragraphs, and | |
518 tables). It ought to be preserved for all of them.</li> | |
519 <li>Only the <a href="#tables">very simplest</a> table format is | |
520 implemented.</li> | |
521 <li>Always uses compact lists.</li> | |
522 <li>The string matching for <code><?html2db | |
523 class="<var>classname</var>"?></code> requires an exact match | |
524 (spaces and all).</li> | |
525 <li>The <a href="#implicit-blocks">implicit blocks</a> code is easily | |
526 confused, as documented in that section. This is | |
527 easy to fix now that I understand the difference between block and | |
528 inline elements (I didn't when I was implementing this), but I | |
529 probably won't do so until I run into the problem again.</li> | |
530 | |
531 </ul> | |
532 | |
533 | |
534 | |
535 | |
536 <h3>Wishlist</h3> | |
537 <ul> | |
538 <li>Allow <code><?html2db attribute-name="<var>name</var>" | |
539 value="<var>value</var>"?></code> at any position, to set arbitrary | |
540 Docbook attributes on the generated element.</li> | |
541 | |
542 <li>Use a different technique from the <a href="#docbook-elements">fake | |
543 namespace prefix</a> to name Docbook elements in the source, that | |
544 preserves the XHTML validity of the source file. For example, an | |
545 option to transform <code><div class="db:footnote"></code> into | |
546 <code><footnote></code>, or to use a processing instruction | |
547 (<code><div><?html2db classname="footnote"?></code>).</li> | |
548 | |
549 <li>Parse DC metadata from XHTML <code>html/head/meta</code>.</li> | |
550 | |
551 <li>Add an option to use <code>html/head/title</code> instead of | |
552 <code>html/body/h1[1]</code> for top title.</li> | |
553 | |
554 <li>Allow an <code>id</code> on every element.</li> | |
555 | |
556 <li>Add an option to translate the XHTML <code>class</code> into a | |
557 Docbook <code>role</code>.</li> | |
558 | |
559 <li>Preserve more of the whitespace from the source document &mdash; especially within lists and tables &mdash; in order to make it easier to debug the output document.</li> | |
560 </ul> | |
561 <h3>Support</h3> | |
562 <p>This is a work in progress. It serves my needs, but doesn't | |
563 attempt to be much more general than that. If you run into anything | |
564 it can't handle, please send a note, or better yet, a patch, to <a | |
565 href="mailto:steele@osteele.com">steele@osteele.com</a>. I can't | |
566 promise to address problems (I have a day job too), but knowing what | |
567 people have run into will help me prioritize my work when I do have | |
568 time to work on this.</p> | |
569 | |
570 | |
571 | |
572 | |
573 | |
574 <h3>Design Notes</h3> | |
575 <h4 id="docbook-namespace">The Docbook Namespace</h4> | |
576 <p>&html2db; accepts elements in the "Docbook namespace" in XHTML | |
577 source. This namespace is <code>urn:docbook</code>.</p> | |
578 | |
579 <p>This isn't technically correct. Docbook doesn't really have a | |
580 namespace, and if it did, it wouldn't be this one. <a | |
581 href="http://www.faqs.org/rfcs/rfc3151.html">RFC 3151</a> suggests | |
582 <code>urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN</code> as the | |
583 Docbook namespace.</p> | |
584 | |
585 <p>There are two problems with the RFC 3151 namespace. First, it's long | |
586 and hard to remember. Second, it's limited to Docbook v4.1.2 &mdash; | |
587 but &html2db; works with other versions of Docbook too, which would | |
588 presumably have other namespaces. I think it's more useful to | |
589 <em>under</em>specify the Docbook version in the spec for this tool. | |
590 Docbook itself underspecifies the version completely, by avoiding a | |
591 namespace at all, but when mixing Docbook and XHTML elements I find it | |
592 useful to be <em>more</em> specific than that.</p> | |
593 | |
594 <h3>History</h3> | |
595 <p>The original version of &html2db; was written by <a | |
596 href="http://osteele.com">Oliver Steele</a>, as part of the <a | |
597 href="http://laszlosystems.com">Laszlo Systems, Inc.</a> documentation | |
598 effort. We had a set of custom stylesheets that formatted and added | |
599 linking information to programming-language elements such as | |
600 <code>classname</code> and <code>tagname</code>, and added | |
601 a Table of Contents to chapter documentation, and numbered examples.</p> | |
602 | |
603 <p>As the documentation set grew, the doc team (John Sundman) | |
604 requested features such as inter-chapter navigation, callouts, and | |
605 index and glossary elements. I was able to beat all of these back | |
606 except for navigation, which seemed critical. After a few days trying | |
607 to implement this, I decided it would be simpler to convert the subset | |
608 of XHTML that we used into a subset of Docbook, and use the latter to | |
609 add navigation. (Once this was done, the other features came for | |
610 free.)</p> | |
611 | |
612 <p>During my August 2004 "sabbatical", I factored the general html2db | |
613 code out from the Laszlo-specific code, refactored and otherwise | |
614 cleaned it up, and wrote this documentation.</p> | |
615 | |
616 <h3>Credits</h3> | |
617 <p>&html2db; was written by <a href="http://osteele.com">Oliver Steele</a>, as part of the <a href="http://laszlosystems.com">Laszlo Systems, Inc.</a> documentation effort.</p> | |
618 | |
619 </body> | |
620 </html> |