comparison doc/wiki2docbook/html2db/index.src.html @ 1773:2ae81598b254

scripts for converting wiki documentation to docbook
author nadvornik
date Sun, 22 Nov 2009 09:12:22 +0000
1772:9f3b7a089caf 1773:2ae81598b254
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
3 <!ENTITY html2db "<code>html2db.xsl</code>">
4 ]>
5 <html xmlns:x="http://www.w3.org/1999/xhtml"
6 xmlns:db="urn:docbook">
7 <head>
8 <title>This title is ignored</title>
9 </head>
10 <body>
11
12 <h1>html2db.xsl</h1>
13
14 <!-- The xmlns attribute escapes into the Docbook namespace -->
15 <articleinfo xmlns="urn:docbook">
16 <author>
17 <firstname>Oliver</firstname>
18 <surname>Steele</surname>
19 </author>
20 <revhistory>
21 <revision>
22 <revnumber>1</revnumber>
23 <date>2004-07-30</date>
24 </revision>
25 <revision>
26 <revnumber>1.0.1</revnumber>
27 <date>2004-08-01</date>
28 <revdescription><para>Editorial changes to the
29 readme.</para></revdescription>
30 </revision>
31 </revhistory>
32 <date>2004-07-30</date>
33 </articleinfo>
34
35 <h2>Overview</h2>
36
37 <p>&html2db; converts an XHTML source document into a Docbook output
38 document. It provides features for customizing the generation of the
39 output, so that the output can be tuned by annotating
40 the source, rather than hand-editing the output. This makes it useful
41 in a processing pipeline where the source documents are maintained in
42 HTML, although it can be used as a one-time conversion tool
43 too.</p>
44
45 <p>This document is an example of &html2db; used in conjunction with
46 the Docbook XSL stylesheets. The <a href="index.src.html">source
47 file</a> is an XHTML file with some embedded Docbook elements and
48 processing instructions. &html2db; compiles it into a <a
49 href="index.xml">Docbook document</a>, which can be used to generate
50 this output file (which includes a Table of Contents), a <a
51 href="docs/index.html">chunked HTML file</a>, a <a
52 href="html2db.pdf">PDF</a>, or other formats.</p>
53
54 <h2>Features</h2>
55 <dl>
56 <dt>XSLT implementation</dt>
57 <dd>This tool is designed to be embedded within an XSLT processing
58 pipeline. <code>html2db.xsl</code> can be used in a custom
59 stylesheet or integrated into a larger system. See <a
60 href="#embedding">Overriding</a>.</dd>
61
62 <dt>Customizable</dt>
63 <dd>The output can be customized by means of additional markup in
64 the XHTML source. See the section on <a
65 href="#customization">customization</a>.</dd>
66
67 <dt>Creates outline structure</dt>
68 <dd><code>h1</code>, <code>h2</code>, etc. are turned into nested
69 <code>section</code> and <code>title</code> elements (as opposed to
70 bridge heads).</dd>
71
72 <dt>Accepts a wide variety of XHTML</dt>
73 <dd>In particular, &html2db; automatically wraps <dfn>naked item
74 text</dfn> (text that is not enclosed in a <code>&lt;p&gt;</code>)
75 inside a table cell or list item. Naked text is a common property of
76 XHTML documents, but needs to be clothed to create valid
77 Docbook.<db:footnote><p>This feature is limited. See <a
78 href="#implicit-blocks">Implicit Blocks</a>.</p></db:footnote></dd>
79
80 </dl>
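
<p>As a rough illustration of the outline feature described above,
consider a source fragment like the following. The headings and text
are invented for this example.</p>

<pre class="example"><![CDATA[
<body>
  <h1>User Guide</h1>
  <h2>Installing</h2>
  <p>Unpack the archive.</p>
  <h3>On Linux</h3>
  <p>Use your package manager.</p>
</body>
]]></pre>

<p>With the default document root, this would be expected to produce
nested sections along these lines. This is a sketch of the general
shape only; whitespace and any generated attributes may differ.</p>

<pre class="example"><![CDATA[
<article>
  <title>User Guide</title>
  <section>
    <title>Installing</title>
    <para>Unpack the archive.</para>
    <section>
      <title>On Linux</title>
      <para>Use your package manager.</para>
    </section>
  </section>
</article>
]]></pre>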
81
82 <h2>Requirements</h2>
83 <ul>
84 <li>Java: JRE or JDK 1.3 or greater.</li>
85 <li>Xalan 2.5.0.</li>
86 <li>Familiarity with installing and running JAR files.</li>
87 </ul>
88
89 <p>&html2db; might work with earlier versions of Java and Xalan, and
90 it might work with other XSLT processors such as Saxon and
91 xsltproc.</p>
92
93 <h2>License</h2>
94 <p>This software is released under the Open Source <a href="http://www.opensource.org/licenses/artistic-license.php">Artistic License</a>.</p>
95
96 <h2>Installation</h2>
97 <ul>
98 <li>Install JRE 1.3 or higher.</li>
99 <li>Install Xalan, if necessary.</li>
100 <li>Download <code>html2db-1.zip</code> from <a href="http://osteele.com/sources/html2db.zip">http://osteele.com/sources/html2db-1.zip</a>.</li>
101 <li>Unzip <code>html2db-1.zip</code>.</li>
102 </ul>
103
104 <h2>Usage</h2>
105 <p>Use Xalan to process an XHTML source file into a Docbook file:</p>
106
107 <pre class="example">
108 java org.apache.xalan.xslt.Process -XSL html2db.xsl -IN doc.html &gt; doc.xml
109 </pre>
110
111 <p>See <a href="index.src.html"><code>index.src.html</code></a> for an
112 example of an input file.</p>
113
114 <p>If your source files are in HTML, not XHTML, you may find the <a
115 href="http://tidy.sourceforge.net/">Tidy</a> tool useful. This is a
116 tool that converts from HTML to XHTML, and can be added to the front
117 of your processing pipeline.</p>
118
119 <p>(If you need to process HTML and you don't know or can't figure out
120 from context what a processing pipeline is, &html2db; is probably not
121 the right tool for you, and you should look for a local XML or Java
122 guru or for a commercially supported product.)</p>
123
124 <h2>Specification</h2>
125
126 <h3>XHTML Elements</h3>
127 <p><code>code/i</code> stands for "an <code>i</code> element
128 immediately within a <code>code</code> element". This notation is
129 from XPath.</p>
130
131 <p>XHTML elements must be in the XHTML namespace,
132 <code>http://www.w3.org/1999/xhtml</code>.</p>
133
134 <table>
135 <tr>
136 <th>XHTML</th>
137 <th>Docbook</th>
138 <th>Notes</th>
139 </tr>
140
141 <tr>
142 <td><code>b</code>, <code>i</code>, <code>em</code>, <code>strong</code></td>
143 <td><code>emphasis</code></td>
144 <td>The <code>role</code> attribute is the original tag name</td>
145 </tr>
146
147 <tr>
148 <td><code>dfn</code></td>
149 <td><code>glossterm</code>, and also <code>primary</code> <code>indexterm</code></td>
150 </tr>
151
152 <tr>
153 <td><code>code/i</code>, <code>tt/i</code>, <code>pre/i</code></td>
154 <td><code>replaceable</code></td>
155 <td>In practice, <code>i</code> within monospace content is usually used to mean replaceable text. If you're using it for emphasis, use <code>em</code> instead.</td>
156 </tr>
157
158 <tr>
159 <td><code>pre</code>, <code>body/code</code></td>
160 <td><code>programlisting</code></td>
161 </tr>
162
163 <tr>
164 <td><code>img</code></td>
165 <td><code>inlinemediaobject/imageobject/imagedata</code></td>
166 <td>In an inline context.</td>
167 </tr>
168
169 <tr>
170 <td><code>img</code></td>
171 <td><code>[informal]figure/mediaobject/imageobject/imagedata</code></td>
172 <td>If it has a <code>title</code> attribute or <code>db:title</code> it's wrapped in a <code>figure</code>. Otherwise it's wrapped in an <code>informalfigure</code>.</td>
173 </tr>
174
175 <tr>
176 <td><code>table</code></td>
177 <td><code>[informal]table</code></td>
178 <td>XHTML <code>table</code> becomes Docbook <code>table</code> if it has a <code>summary</code> attribute; <code>informaltable</code> otherwise.</td>
179 </tr>
180
181 <tr>
182 <td><code>ul</code></td>
183 <td><code>itemizedlist</code></td>
184 <td>But see the processing instruction <a href="#simplelist">below</a>.</td>
185 </tr>
186 </table>
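
<p>A small sketch of two rows from the table above, using invented
content. It assumes that <code>p</code> becomes <code>para</code>,
which the table does not list explicitly.</p>

<pre class="example"><![CDATA[
<!-- XHTML source -->
<p>This step is <em>required</em>.</p>
<pre>copy <i>source</i> <i>target</i></pre>

<!-- Expected Docbook output (approximate) -->
<para>This step is <emphasis role="em">required</emphasis>.</para>
<programlisting>copy <replaceable>source</replaceable> <replaceable>target</replaceable></programlisting>
]]></pre>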
187
188
189
190 <h3>Links</h3>
191 <table summary="Link Translation">
192 <tr>
193 <th>XHTML</th>
194 <th>Docbook</th>
195 <th>Notes</th>
196 </tr>
197
198 <tr>
199 <td><code>&lt;a name="<var>name</var>"&gt;</code></td>
200 <td><code>&lt;anchor id="{$anchor-id-prefix}<var>name</var>"&gt;</code></td>
201 <td>An anchor within a <code>h<var>n</var></code> element is attached to the enclosing <code>section</code> as an <code>id</code> attribute instead.</td>
202 </tr>
203
204 <tr>
205 <td><code>&lt;a href="#<var>name</var>"&gt;</code></td>
206 <td><code>&lt;link linkend="{$anchor-id-prefix}<var>name</var>"&gt;</code></td>
207 </tr>
208
209 <tr>
210 <td><code>&lt;a href="<var>url</var>"&gt;</code></td>
211 <td><code>&lt;ulink url="<var>url</var>"&gt;</code></td>
212 </tr>
213
214 <tr>
215 <td><code>&lt;a href="mailto:<var>address</var>"&gt;</code></td>
216 <td><code>&lt;email&gt;<var>address</var>&lt;/email&gt;</code></td>
217 </tr>
218
219 </table>
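
<p>For instance, assuming the stylesheet is run with
<code>anchor-id-prefix</code> set to the hypothetical value
<code>guide.</code>, the first three rows of the table would apply to
a source fragment roughly as follows. The anchor name and link text
are made up for this sketch.</p>

<pre class="example"><![CDATA[
<!-- XHTML source -->
<p><a name="install"/>See <a href="#install">the installation notes</a>
or the <a href="http://osteele.com">author's site</a>.</p>

<!-- Expected Docbook output (approximate) -->
<para><anchor id="guide.install"/>See <link linkend="guide.install">the installation notes</link>
or the <ulink url="http://osteele.com">author's site</ulink>.</para>
]]></pre>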
220
221 <h3 id="tables">Tables</h3>
222
223 <p>XHTML <code>table</code> support is minimal. &html2db; changes the
224 element names and counts the columns (this is necessary to get table
225 footnotes to span all the columns), but it does not attempt to deal
226 with tables in their full generality.</p>
227
228 <p>An XHTML <code>table</code> with a <code>summary</code> attribute
229 generates a <code>table</code>, whose <code>title</code> is the value
230 of that summary. An XHTML <code>table</code> without a
231 <code>summary</code> generates an <code>informaltable</code>.</p>
232
233 <p>Any <code>tr</code>s that contain <code>th</code>s are pulled to
234 the top of the table, and placed inside a <code>thead</code>. Other
235 <code>tr</code>s are placed inside a <code>tbody</code>. This matches
236 the common XHTML <code>table</code> pattern, where the first row is
237 a header row.</p>
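
<p>The rules above can be sketched with a two-column table. The
<code>tgroup</code>, <code>row</code>, and <code>entry</code> elements
shown in the output are the usual Docbook table structure and are an
assumption about what the stylesheet emits; cell contents are shown
unwrapped for brevity, although they may be wrapped in
<code>para</code> in practice.</p>

<pre class="example"><![CDATA[
<!-- XHTML source: the summary attribute supplies the title -->
<table summary="Supported platforms">
  <tr><th>Platform</th><th>Supported</th></tr>
  <tr><td>Linux</td><td>Yes</td></tr>
</table>

<!-- Expected Docbook output (approximate) -->
<table>
  <title>Supported platforms</title>
  <tgroup cols="2">
    <thead>
      <row><entry>Platform</entry><entry>Supported</entry></row>
    </thead>
    <tbody>
      <row><entry>Linux</entry><entry>Yes</entry></row>
    </tbody>
  </tgroup>
</table>
]]></pre>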
238
239 <h3 id="implicit-blocks">Implicit Blocks</h3>
240 <p>XHTML allows <code>li</code>, <code>dd</code>, and <code>td</code>
241 elements to contain either inline text (for instance,
242 <code>&lt;li&gt;a list item&lt;/li&gt;</code>) or block structure
243 (<code>&lt;li&gt;&lt;p&gt;a block&lt;/p&gt;&lt;/li&gt;</code>). The
244 corresponding Docbook elements require block structure, such as
245 <code>para</code>.</p>
246
247 <p>&html2db; provides limited support for wrapping naked text in
248 these positions in <code>para</code> elements. If a list item or
249 table cell item directly contains text, all text up to the position of
250 the first element (or all text, if there is no element) is wrapped in
251 <code>para</code>. This handles the simple case of an item that
252 directly contains text, and also the case of an item that contains
253 text followed by blocks such as paragraphs.</p>
254
255 <p>Note that this algorithm is easily confused. It doesn't
256 distinguish between block and inline XHTML elements, so it will only
257 wrap the first word in <code>&lt;li&gt;some &lt;b&gt;bold&lt;/b&gt;
258 text&lt;/li&gt;</code>, leading to badly formatted output. The
259 workaround is to wrap troublesome content in explicit
260 <code>&lt;p&gt;</code> tags.</p>
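
<p>As a sketch of the cases described in this section, the list below
contains an item with only text, an item with text followed by a
paragraph, and an item that uses the explicit <code>&lt;p&gt;</code>
workaround. The content is invented, and list attributes in the output
are omitted.</p>

<pre class="example"><![CDATA[
<!-- XHTML source -->
<ul>
  <li>Plain text only</li>
  <li>Intro text<p>Followed by a paragraph.</p></li>
  <li><p>some <b>bold</b> text</p></li>
</ul>

<!-- Expected Docbook output (approximate) -->
<itemizedlist>
  <listitem><para>Plain text only</para></listitem>
  <listitem><para>Intro text</para><para>Followed by a paragraph.</para></listitem>
  <listitem><para>some <emphasis role="b">bold</emphasis> text</para></listitem>
</itemizedlist>
]]></pre>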
261
262 <h3 id="docbook-elements">Docbook Elements</h3>
263
264 <p>Elements from the Docbook namespace are passed through as is.
265 There are two ways to include a Docbook element in your XHTML
266 source:</p>
267
268 <dl>
269 <dt>Global prefix</dt>
270 <dd><p>A <dfn>fake Docbook namespace</dfn><db:footnote><p>The fake
271 Docbook namespace is <code>urn:docbook</code>. Docbook doesn't really
272 have a namespace, and if it did, it wouldn't be this one. See <a
273 href="#docbook-namespace">Docbook namespace</a> for a discussion of
274 this issue.</p></db:footnote>
275
276 declaration may be added to the document root element. Anywhere in
277 the document, the prefix from this namespace declaration may be used
278 to include a Docbook element. This is useful if a document contains
279 many Docbook elements, such as <code>footnote</code> or
280 <code>glossterm</code>, interspersed with XHTML. (In this case it may
281 be more convenient to allow these elements in the XHTML namespace and
282 add a customization layer that translates them to Docbook elements,
283 however. See <a href="#customization">Customization</a>.)</p>
284
285 <pre class="example"><![CDATA[
286 <html xmlns="http://www.w3.org/1999/xhtml"
287 xmlns:db="urn:docbook">
288 ...
289 <p>Some text<db:footnote>and a footnote</db:footnote>.</p>
290 ]]></pre></dd>
291
292 <dt>Local namespace</dt>
293 <dd><p>A Docbook element may be introduced along with a prefix-less
294 namespace declaration. This is useful for embedding a Docbook
295 document fragment (a hierarchy of elements that all use Docbook tags)
296 within an XHTML document.</p>
297
298 <pre class="example"><![CDATA[
299 ...
300 <articleinfo xmlns="urn:docbook">
301 <author>
302 <firstname>...</firstname>
303 ...
304 ]]></pre></dd>
305 </dl>
306
307 <p>The source to <a href="index.src.html">this document</a>
308 illustrates both of these techniques.</p>
309
310 <p class="note">Both these techniques will cause your document to be
311 invalid as XHTML. In order to validate an XHTML document that
312 contains Docbook elements, you will need to create a custom schema.
313 Technically, you then ought to place your document in a different
314 namespace, but this will cause &html2db; not to recognize it!</p>
315
316
317 <h3>Output Processing Instructions</h3>
318
319 <p>&html2db; adds a few processing instructions to the output file.
320 The Docbook XSL stylesheets ignore these, but if you write a
321 customization layer for Docbook XSL, you can use the information in
322 these processing instructions to customize the HTML output. This can
323 be used, for example, to set the <code>a</code> <code>onclick</code>
324 and <code>target</code> attributes in the HTML files that Docbook XSL
325 creates to the same values they had in the input document.</p>
326
327 <dl>
328 <dt><code>&lt;?html2db attribute="<var>name</var>" value="<var>value</var>"?&gt;</code></dt>
329 <dd>Placed inside a link element to capture the value of the <code>a</code> <code>target</code> and <code>onclick</code> attributes. <var>name</var> is the name of the attribute (<code>target</code> or <code>onclick</code>), and <var>value</var> is its value, with <code>"</code> and <code>\</code> replaced by <code>\"</code> and <code>\\</code>, respectively.</dd>
330
331 <dt><code>&lt;?html2db element="br"?&gt;</code></dt>
332 <dd>Represents the location of an XHTML <code>br</code> element in the
333 source document.</dd>
334
335 </dl>
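
<p>A minimal sketch of a Docbook XSL customization layer that consumes
one of these processing instructions. The import URL is an assumed
location for the Docbook XSL HTML stylesheet, and the sketch assumes
the instruction is reached by the default template processing; adjust
both to your installation.</p>

<pre class="example"><![CDATA[
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed location of the Docbook XSL HTML stylesheet -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>

  <!-- Turn <?html2db element="br"?> back into an HTML line break -->
  <xsl:template match="processing-instruction('html2db')[starts-with(., 'element=&quot;br&quot;')]">
    <br/>
  </xsl:template>
</xsl:stylesheet>
]]></pre>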
336
337 <p>You can also include <code>&lt;?dbhtml?&gt;</code> processing
338 instructions in the HTML source document, and they will be copied
339 through to the Docbook output file unchanged (as will all other
340 processing instructions).</p>
341
342
343 <h2 id="customization">Customization</h2>
344 <h3>XSLT Parameters</h3>
345 <dl>
346 <dt><code>&lt;xsl:param name="anchor-id-prefix" select="''"/&gt;</code></dt>
347 <dd>Prefixed to every id generated from <code>&lt;a name=&gt;</code>
348 and <code>&lt;a href="#"&gt;</code>. This is useful to avoid
349 collisions between multiple documents that are compiled into the
350 same book. For instance, if a number of XHTML sources are assembled
351 into chapters of a book, you can style each source file with a prefix of
352 <code><var>docid</var>.</code> where <var>docid</var> is a unique id
353 for each source file.</dd>
354
355 <dt><code>&lt;xsl:param name="document-root" select="'article'"/&gt;</code></dt>
356 <dd>The default document root. This can be overridden by
357 <code>&lt;?html2db class="<var>name</var>"?&gt;</code> within the
358 document itself, and defaults to <code>article</code>.</dd>
359 </dl>
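
<p>One way to set these parameters is from a small wrapper stylesheet
that imports <code>html2db.xsl</code>. This is a sketch; the values
<code>install.</code> and <code>chapter</code> are placeholders chosen
for illustration.</p>

<pre class="example"><![CDATA[
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="html2db.xsl"/>

  <!-- Definitions in the importing stylesheet take precedence over the imported defaults -->
  <xsl:param name="anchor-id-prefix" select="'install.'"/>
  <xsl:param name="document-root" select="'chapter'"/>
</xsl:stylesheet>
]]></pre>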
360
361 <h3 id="processing-instructions">Processing instructions</h3>
362 <p>Use the <code>&lt;?html2db?&gt;</code> processing instruction to
363 customize the transformation of the XHTML source to Docbook:</p>
364
365 <table>
366 <tr>
367 <th>Processing instruction</th>
368 <th>Content</th>
369 <th>Effect</th>
370 </tr>
371
372 <tr>
373 <td><code>&lt;?html2db class="<var>xxx</var>"?&gt;</code></td>
374 <td><code>body</code></td>
375 <td>Sets the output document root to <var>xxx</var>. Useful for
376 translating to <code>preface</code>, <code>appendix</code>, or <code>chapter</code>; the default is
377 <var>$document-root</var>.</td>
378 </tr>
379
380 <tr id="simplelist">
381 <td><code>&lt;?html2db class="simplelist"?&gt;</code></td>
382 <td><code>ul</code></td>
383 <td>Creates a vertical <code>simplelist</code>.<db:footnote><db:para>Note that the
384 current implementation simply checks for the presence of <em>any</em>
385 <code>html2db</code> processing instruction.</db:para></db:footnote></td>
386 </tr>
387
388
389 <tr>
390 <td><code>&lt;?html2db rowsep="1"?&gt;</code></td>
391 <td><code>[informal]table</code></td>
392 <td>Sets the <code>rowsep</code> attribute on the generated <code>table</code>.<db:footnote><db:para>Note that the current implementation simply checks for the presence of <em>any</em> <code>html2db</code> processing instruction that begins with <code>rowsep</code>, and assumes the value is <code>1</code>.</db:para></db:footnote></td>
393 </tr>
394 </table>
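
<p>The following sketch shows the three processing instructions from
the table in an invented source fragment. Placing each instruction as
the first child of the element named in the Content column is an
assumption; the instructions are written exactly as they appear in the
table.</p>

<pre class="example"><![CDATA[
<body>
  <?html2db class="chapter"?>
  <h1>Installation</h1>

  <ul>
    <?html2db class="simplelist"?>
    <li>Linux</li>
    <li>Windows</li>
  </ul>

  <table>
    <?html2db rowsep="1"?>
    <tr><th>Platform</th><th>Supported</th></tr>
    <tr><td>Linux</td><td>Yes</td></tr>
  </table>
</body>
]]></pre>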
395
396 <h3 id="embedding">Overriding the built-in templates</h3>
397 <p>For cases where the previous techniques don't allow for enough
398 customization, you can override the builtin templates. You will need
399 to know XSLT in order to do this, and you will need to write a new
400 stylesheet that uses the <code>xsl:import</code> element to import
401 <code>html2db.xsl</code>.</p>
402
403 <p>The <a href="examples.xsl"><code>example.xsl</code></a> stylesheet
404 is an example customization layer. It recognizes the <code>&lt;div
405 class="abstract"&gt;</code> and <code>&lt;p class="note"&gt;</code>
406 classes in the <a href="index.src.html">source</a> for this document,
407 and generates the corresponding Docbook elements.</p>
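
<p>A minimal sketch of such a customization layer, not the contents of
the bundled example stylesheet. It assumes that the built-in templates
match elements in the XHTML namespace in the default mode, and it maps
<code>&lt;p class="note"&gt;</code> to a Docbook <code>note</code>.
The <code>h</code> prefix is chosen for this example.</p>

<pre class="example"><![CDATA[
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">
  <xsl:import href="html2db.xsl"/>

  <!-- Templates here override the imported ones for matching nodes -->
  <xsl:template match="h:p[@class='note']">
    <note>
      <para><xsl:apply-templates/></para>
    </note>
  </xsl:template>
</xsl:stylesheet>
]]></pre>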
408
409
410 <h2>FAQ</h2>
411 <h3>Why generate Docbook?</h3>
412 <p>The primary reason to use Docbook as an <em>output</em> format is
413 to take advantage of the Docbook XSL stylesheets. These are a
414 well-designed, well-documented set of XSL stylesheets that provide a
415 variety of publishing features that would be difficult to recreate
416 from scratch for HTML:</p>
417
418 <ul>
419 <li>Automatic Table-of-Contents generation</li>
420 <li>Automatic part, chapter, and section numbering.</li>
421 <li>Creation of single-page, multi-page, PDF, and WinHelp files from the same source document.</li>
422 <li>Navigation headers, footers, and metadata for multi-page HTML
423 documents.</li>
424 <li>Link resolution and link target text insertion across multiple pages and numbered targets.</li>
425 <li>Figure, example, and table numbering, and tables of these.</li>
426 <li>Index and glossary tools.</li>
427 </ul>
428
429 <h3>Why write in XHTML?</h3>
430
431 <p>Given that Docbook is so great, why not write in it?</p>
432
433 <p>Where there are not legacy concerns, Docbook is probably a better
434 choice for structured or technical documentation.</p>
435
436 <p>Where the only legacy concern is the documents themselves, and not
437 the tools and skill sets of documentation contributors, you should
438 consider using an (X)HTML convertor to perform a one-time conversion
439 of your documentation source into Docbook, and then switching
440 development to the result files. You can use this stylesheet to
441 perform this conversion, or evaluate other tools, many of which are
442 probably appropriate for this purpose.</p>
443
444 <p>Often there are other legacy concerns: the availability of cheap
445 (including free) and usable HTML editors and editing modes; and the
446 fact that it's easier to teach people XHTML than Docbook. If either
447 of these is an issue in your organization, you may want to maintain
448 documentation sources in XHTML instead of Docbook.</p>
449
450 <p>For example, at <a href="http://www.laszlosystems.com/">Laszlo</a>,
451 most developers contribute directly to the documentation. Requiring
452 that developers learn Docbook, or that they wait on the doc team to
453 get content into the docs, would discourage this.</p>
454
455 <h3>Why not use an existing convertor?</h3>
456
457 <p>This isn't the first (X)HTML to Docbook convertor. Why not use one
458 of the existing ones?</p>
459
460 <p>Each of the HTML to Docbook convertors that I could find had at least some
461 of the following limitations, some of which stemmed from their
462 intended use as one-time-only convertors for legacy documents:</p>
463
464 <ul>
465 <li>Many only operated on a subset of HTML, and relied upon hand
466 editing of the output to clean up mistakes. This made them impossible
467 to use as part of a processing pipeline, where the source is
468 <em>maintained</em> in XHTML.</li>
469
470 <li>There was no way to customize the output, except by (1) hand
471 editing, or (2) writing a post-processing stylesheet, which didn't
472 have access to the information in the XHTML source document.</li>
473
474 <li>Many of them were difficult or impossible to customize and
475 extend. They were closed-source, or written in Java or Perl (which I
476 find to be difficult languages to use for customizing this kind of
477 thing) and embedded in a larger system.</li>
478
479 <li>They didn't take full advantage of the Docbook tag set and content
480 model to represent document structure. For instance, they didn't
481 generate nested <code>section</code> elements to represent
482 <code>h1</code> <code>h2</code> sequences, or <code>table</code> to
483 represent tables with <code>summary</code> attributes.</li>
484 </ul>
485
486 <h3>I got this error. What does it mean?</h3>
487 <dl>
488 <dt>Q. <code>Fatal Error! The element type "br" must be terminated by the matching end-tag "&lt;/br&gt;".
489 </code></dt>
490 <dd>A. Your document is HTML, not <em>X</em>HTML. You need to fix it, or run it through Tidy first.</dd>
491
492 <dt>Q. My output document is empty except for the <code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;</code> line.</dt>
493 <dd>A. The document is missing a namespace declaration. See the <a href="index.src.html">source of this document</a> for an example.</dd>
494
495 <dt>Q. Some of the headers and document sections are repeated multiple times.</dt>
496 <dd>A. The document has out-of-sequence headers, such as <code>h1</code> followed by <code>h3</code> (instead of <code>h2</code>). This won't work.</dd>
497
498 <dt>Q. <code>Fatal Error! The prefix "db" for element "db:footnote" is not bound.</code></dt>
499 <dd>A. You haven't declared the <code>db</code> namespace prefix. See the <a href="index.src.html">source of this document</a> for an example.</dd>
500
501 </dl>
502
503
504 <h2>Implementation Notes</h2>
505
506 <h3>Bugs</h3>
507 <ul>
508 <li>Improperly sequenced <code>h<var>n</var></code> (for example
509 <code>h1</code> followed by <code>h3</code>, instead of
510 <code>h2</code>) will result in duplicate text.</li>
511 </ul>
512
513
514 <h3>Limitations</h3>
515 <ul>
516 <li>The <code>id</code> attribute is only preserved for certain
517 elements (at least <code>h<var>n</var></code>, images, paragraphs, and
518 tables). It ought to be preserved for all of them.</li>
519 <li>Only the <a href="#tables">very simplest</a> table format is
520 implemented.</li>
521 <li>Always uses compact lists.</li>
522 <li>The string matching for <code>&lt;?html2db
523 class="<var>classname</var>"?&gt;</code> requires an exact match
524 (spaces and all).</li>
525 <li>The <a href="#implicit-blocks">implicit blocks</a> code is easily
526 confused, as documented in that section. This is
527 easy to fix now that I understand the difference between block and
528 inline elements (I didn't when I was implementing this), but I
529 probably won't do so until I run into the problem again.</li>
530
531 </ul>
532
533
534
535
536 <h3>Wishlist</h3>
537 <ul>
538 <li>Allow <code>&lt;?html2db attribute-name="<var>name</var>"
539 value="<var>value</var>"?&gt;</code> at any position, to set arbitrary
540 Docbook attributes on the generated element.</li>
541
542 <li>Use a different technique from the <a href="#docbook-elements">fake
543 namespace prefix</a> to name Docbook elements in the source, one that
544 preserves the XHTML validity of the source file. For example, an
545 option to transform <code>&lt;div class="db:footnote"&gt;</code> into
546 <code>&lt;footnote&gt;</code>, or to use a processing instruction
547 (<code>&lt;div&gt;&lt;?html2db classname="footnote"?&gt;</code>).</li>
548
549 <li>Parse DC metadata from XHTML <code>html/head/meta</code>.</li>
550
551 <li>Add an option to use <code>html/head/title</code> instead of
552 <code>html/body/h1[1]</code> for top title.</li>
553
554 <li>Allow an <code>id</code> on every element.</li>
555
556 <li>Add an option to translate the XHTML <code>class</code> into a
557 Docbook <code>role</code>.</li>
558
559 <li>Preserve more of the whitespace from the source document, especially within lists and tables, in order to make it easier to debug the output document.</li>
560 </ul>
561
562 <h3>Support</h3>
563 <p>This is a work in progress. It serves my needs, but doesn't
564 attempt to be much more general than that. If you run into anything
565 it can't handle, please send a note, or better yet, a patch, to <a
566 href="mailto:steele@osteele.com">steele@osteele.com</a>. I can't
567 promise to address problems (I have a day job too), but knowing what
568 people have run into will help me prioritize my work when I do have
569 time to work on this.</p>
570
571
572
573
574 <h3>Design Notes</h3>
575 <h4 id="docbook-namespace">The Docbook Namespace</h4>
576 <p>&html2db; accepts elements in the "Docbook namespace" in XHTML
577 source. This namespace is <code>urn:docbook</code>.</p>
578
579 <p>This isn't technically correct. Docbook doesn't really have a
580 namespace, and if it did, it wouldn't be this one. <a
581 href="http://www.faqs.org/rfcs/rfc3151.html">RFC 3151</a> suggests
582 <code>urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN</code> as the
583 Docbook namespace.</p>
584
585 <p>There are two problems with the RFC 3151 namespace. First, it's long
586 and hard to remember. Second, it's limited to Docbook v4.1.2,
587 but &html2db; works with other versions of Docbook too, which would
588 presumably have other namespaces. I think it's more useful to
589 <em>under</em>specify the Docbook version in the spec for this tool.
590 Docbook itself underspecifies the version completely, by avoiding a
591 namespace at all, but when mixing Docbook and XHTML elements I find it
592 useful to be <em>more</em> specific than that.</p>
593
594 <h3>History</h3>
595 <p>The original version of &html2db; was written by <a
596 href="http://osteele.com">Oliver Steele</a>, as part of the <a
597 href="http://laszlosystems.com">Laszlo Systems, Inc.</a> documentation
598 effort. We had a set of custom stylesheets that formatted and added
599 linking information to programming-language elements such as
600 <code>classname</code> and <code>tagname</code>, and added a
601 Table of Contents to chapter documentation and numbers to examples.</p>
602
603 <p>As the documentation set grew, the doc team (John Sundman)
604 requested features such as inter-chapter navigation, callouts, and
605 index and glossary elements. I was able to beat all of these back
606 except for navigation, which seemed critical. After a few days trying
607 to implement this, I decided it would be simpler to convert the subset
608 of XHTML that we used into a subset of Docbook, and use the latter to
609 add navigation. (Once this was done, the other features came for
610 free.)</p>
611
612 <p>During my August 2004 "sabbatical", I factored the general html2db
613 code out from the Laszlo-specific code, refactored and otherwise
614 cleaned it up, and wrote this documentation.</p>
615
616 <h3>Credits</h3>
617 <p>&html2db; was written by <a href="http://osteele.com">Oliver Steele</a>, as part of the <a href="http://laszlosystems.com">Laszlo Systems, Inc.</a> documentation effort.</p>
618
619 </body>
620 </html>