Mercurial > geeqie.yaz
comparison doc/wiki2docbook/html2db/index.xml @ 1734:b92fc3c922ac
scripts for converting wiki documentation to docbook
author | nadvornik |
---|---|
date | Sun, 22 Nov 2009 09:12:22 +0000 |
parents | |
children |
comparison
1733:91a65afb5d77 | 1734:b92fc3c922ac |
---|---|
1 <?xml version="1.0" encoding="UTF-8"?> | |
2 <article> | |
3 | |
4 <title>html2db.xsl</title> | |
5 | |
6 | |
7 <articleinfo> | |
8 <author> | |
9 <firstname>Oliver</firstname> | |
10 <surname>Steele</surname> | |
11 </author> | |
12 <revhistory> | |
13 <revision> | |
14 <revnumber>1</revnumber> | |
15 <date>2004-07-30</date> | |
16 </revision> | |
17 <revision> | |
18 <revnumber>1.0.1</revnumber> | |
19 <date>2004-08-01</date> | |
20 <revdescription><para>Editorial changes to the | |
21 readme.</para></revdescription> | |
22 </revision> | |
23 </revhistory> | |
24 <date>2004-07-30</date> | |
25 </articleinfo> | |
26 | |
27 <para/><section><title>Overview</title> | |
28 | |
29 <para><literal>html2db.xsl</literal> converts an XHTML source document into a Docbook output | |
30 document. It provides features for customizing the generation of the | |
31 output, so that the output can be tuned by annotating | |
32 the source, rather than hand-editing the output. This makes it useful | |
33 in a processing pipeline where the source documents are maintained in | |
34 HTML, although it can be used as a one-time conversion tool | |
35 too.</para> | |
36 | |
37 <para>This document is an example of <literal>html2db.xsl</literal> used in conjunction with | |
38 the Docbook XSL stylesheets. The <ulink url="index.src.html">source | |
39 file</ulink> is an XHTML file with some embedded Docbook elements and | |
40 processing instructions. <literal>html2db.xsl</literal> compiles it into a <ulink url="index.xml">Docbook document</ulink>, which can be used to generate | |
41 this output file (which includes a Table of Contents), a <ulink url="docs/index.html">chunked HTML file</ulink>, a <ulink url="html2db.pdf">PDF</ulink>, or other formats.</para> | |
42 | |
43 <para/></section><section><title>Features</title> | |
44 <variablelist><varlistentry><term>XSLT implementation</term><listitem><para>This tool is designed to be embedded within an XSLT processing | |
45 pipeline. <literal>html2db.xsl</literal> can be used in a custom | |
46 stylesheet or integrated into a larger system. See <link linkend="embedding">Overriding</link>.</para></listitem></varlistentry><varlistentry><term>Customizable</term><listitem><para>The output can be customized by means of additional markup in | |
47 the XHTML source. See the section on <link linkend="customization">customization</link>.</para></listitem></varlistentry><varlistentry><term>Creates outline structure</term><listitem><para><literal>h1</literal>, <literal>h2</literal>, etc. are turned into nested | |
48 <literal>section</literal> and <literal>title</literal> elements (as opposed to | |
49 bridge heads).</para></listitem></varlistentry><varlistentry><term>Accepts a wide variety of XHTML</term><listitem><para>In particular, <literal>html2db.xsl</literal> automatically wraps <indexterm significance="preferred"><primary>naked item | |
50 text</primary></indexterm><glossterm>naked item | |
51 text</glossterm> (text that is not enclosed in a <literal>&lt;p&gt;</literal>) | |
52 inside a table cell or list item. Naked text is a common property of | |
53 XHTML documents, but needs to be clothed to create valid | |
54 Docbook.<footnote><para>This feature is limited. See <link linkend="implicit-blocks">Implicit Blocks</link>.</para></footnote></para></listitem></varlistentry></variablelist> | |
55 | |
56 <para/></section><section><title>Requirements</title> | |
57 <itemizedlist spacing="compact"><listitem><para>Java: JRE or JDK 1.3 or greater.</para></listitem><listitem><para>Xalan 2.5.0.</para></listitem><listitem><para>Familiarity with installing and running JAR files.</para></listitem></itemizedlist> | |
58 | |
59 <para><literal>html2db.xsl</literal> might work with earlier versions of Java and Xalan, and | |
60 it might work with other XSLT processors such as Saxon and | |
61 xsltproc.</para> | |
62 | |
63 <para/></section><section><title>License</title> | |
64 <para>This software is released under the Open Source <ulink url="http://www.opensource.org/licenses/artistic-license.php">Artistic License</ulink>.</para> | |
65 | |
66 <para/></section><section><title>Installation</title> | |
67 <itemizedlist spacing="compact"><listitem><para>Install JRE 1.3 or higher.</para></listitem><listitem><para>Install Xalan, if necessary.</para></listitem><listitem><para>Download <literal>html2db-1.zip</literal> from <ulink url="http://osteele.com/sources/html2db.zip">http://osteele.com/sources/html2db-1.zip</ulink>.</para></listitem><listitem><para>Unzip <literal>html2db-1.zip</literal>.</para></listitem></itemizedlist> | |
68 | |
69 <para/></section><section><title>Usage</title> | |
70 <para>Use Xalan to process an XHTML source file into a Docbook file:</para> | |
71 | |
72 <informalexample><programlisting> | |
73 java org.apache.xalan.xslt.Process -XSL html2db.xsl -IN doc.html > doc.xml | |
74 </programlisting></informalexample> | |
75 | |
76 <para>See <ulink url="index.src.html"><literal>index.src.html</literal></ulink> for an | |
77 example of an input file.</para> | |
78 | |
79 <para>If your source files are in HTML, not XHTML, you may find the <ulink url="http://tidy.sourceforge.net/">Tidy</ulink> tool useful. This is a | |
80 tool that converts from HTML to XHTML, and can be added to the front | |
81 of your processing pipeline.</para> | |
82 | |
83 <para>(If you need to process HTML and you don't know or can't figure out | |
84 from context what a processing pipeline is, <literal>html2db.xsl</literal> is probably not | |
85 the right tool for you, and you should look for a local XML or Java | |
86 guru or for a commercially supported product.)</para> | |
87 | |
88 <para/></section><section><title>Specification</title> | |
89 | |
90 <para/><section><title>XHTML Elements</title> | |
91 <para><literal>code/i</literal> stands for "an <literal>i</literal> element | |
92 immediately within a <literal>code</literal> element". This notation is | |
93 from XPath.</para> | |
94 | |
95 <para>XHTML elements must be in the XHTML namespace, | |
96 <literal>http://www.w3.org/1999/xhtml</literal>.</para> | |
97 | |
98 <informaltable><tgroup cols="3"><thead><row><entry>XHTML</entry><entry>Docbook</entry><entry>Notes</entry></row> | |
99 </thead><tbody><row><entry><literal>b</literal>, <literal>i</literal>, <literal>em</literal>, <literal>strong</literal></entry><entry><literal>emphasis</literal></entry><entry>The <literal>role</literal> attribute is the original tag name</entry></row> | |
100 <row><entry><literal>dfn</literal></entry><entry><literal>glossterm</literal>, and also <literal>primary</literal> <literal>indexterm</literal></entry></row> | |
101 <row><entry><literal>code/i</literal>, <literal>tt/i</literal>, <literal>pre/i</literal></entry><entry><literal>replaceable</literal></entry><entry>In practice, <literal>i</literal> within monospace content is usually used to mean replaceable text. If you're using it for emphasis, use <literal>em</literal> instead.</entry></row> | |
102 <row><entry><literal>pre</literal>, <literal>body/code</literal></entry><entry><literal>programlisting</literal></entry></row> | |
103 <row><entry><literal>img</literal></entry><entry><literal>inlinemediaobject/imageobject/imagedata</literal></entry><entry>In an inline context.</entry></row> | |
104 <row><entry><literal>img</literal></entry><entry><literal>[informal]figure/mediaobject/imageobject/imagedata</literal></entry><entry>If it has a <literal>title</literal> attribute or <literal>db:title</literal> it's wrapped in a <literal>figure</literal>. Otherwise it's wrapped in an <literal>informalfigure</literal>.</entry></row> | |
105 <row><entry><literal>table</literal></entry><entry><literal>[informal]table</literal></entry><entry>XHTML <literal>table</literal> becomes Docbook <literal>table</literal> if it has a <literal>summary</literal> attribute; <literal>informaltable</literal> otherwise.</entry></row> | |
106 <row><entry><literal>ul</literal></entry><entry><literal>itemizedlist</literal></entry><entry>But see the processing instruction <link linkend="simplelist">below</link>.</entry></row> | |
107 </tbody></tgroup></informaltable> | |
108 | |
109 | |
110 | |
111 <para/></section><section><title>Links</title> | |
112 <table><title>Link Translation</title><tgroup cols="3"><thead><row><entry>XHTML</entry><entry>Docbook</entry><entry>Notes</entry></row> | |
113 </thead><tbody><row><entry><literal>&lt;a name="<replaceable>name</replaceable>"&gt;</literal></entry><entry><literal>&lt;anchor id="{$anchor-id-prefix}<replaceable>name</replaceable>"&gt;</literal></entry><entry>An anchor within an <literal>h<replaceable>n</replaceable></literal> element is attached to the enclosing <literal>section</literal> as an <literal>id</literal> attribute instead.</entry></row> | |
114 <row><entry><literal>&lt;a href="#<replaceable>name</replaceable>"&gt;</literal></entry><entry><literal>&lt;link linkend="{$anchor-id-prefix}<replaceable>name</replaceable>"&gt;</literal></entry></row> | |
115 <row><entry><literal>&lt;a href="<replaceable>url</replaceable>"&gt;</literal></entry><entry><literal>&lt;ulink url="<replaceable>url</replaceable>"&gt;</literal></entry></row> | |
116 <row><entry><literal>&lt;a href="mailto:<replaceable>address</replaceable>"&gt;</literal></entry><entry><literal>&lt;email&gt;<replaceable>address</replaceable>&lt;/email&gt;</literal></entry></row> | |
117 </tbody></tgroup></table> | |
118 | |
119 <para/></section><section id="tables"><title>Tables</title> | |
120 | |
121 <para>XHTML <literal>table</literal> support is minimal. <literal>html2db.xsl</literal> changes the | |
122 element names and counts the columns (this is necessary to get table | |
123 footnotes to span all the columns), but it does not attempt to deal | |
124 with tables in their full generality.</para> | |
125 | |
126 <para>An XHTML <literal>table</literal> with a <literal>summary</literal> attribute | |
127 generates a <literal>table</literal>, whose <literal>title</literal> is the value | |
128 of that summary. An XHTML <literal>table</literal> without a | |
129 <literal>summary</literal> generates an <literal>informaltable</literal>.</para> | |
130 | |
131 <para>Any <literal>tr</literal>s that contain <literal>th</literal>s are pulled to | |
132 the top of the table, and placed inside a <literal>thead</literal>. Other | |
133 <literal>tr</literal>s are placed inside a <literal>tbody</literal>. This matches | |
134 the common XHTML <literal>table</literal> pattern, where the first row is | |
135 a header row.</para> | |
136 | |
137 <para/></section><section id="implicit-blocks"><title>Implicit Blocks</title> | |
138 <para>XHTML allows <literal>li</literal>, <literal>dd</literal>, and <literal>td</literal> | |
139 elements to contain either inline text (for instance, | |
140 <literal>&lt;li&gt;a list item&lt;/li&gt;</literal>) or block structure | |
141 (<literal>&lt;li&gt;&lt;p&gt;a block&lt;/p&gt;&lt;/li&gt;</literal>). The | |
142 corresponding Docbook elements require block structure, such as | |
143 <literal>para</literal>.</para> | |
144 | |
145 <para><literal>html2db.xsl</literal> provides limited support for wrapping naked text in | |
146 these positions in <literal>para</literal> elements. If a list item or | |
147 table cell item directly contains text, all text up to the position of | |
148 the first element (or all text, if there is no element) is wrapped in | |
149 <literal>para</literal>. This handles the simple case of an item that | |
150 directly contains text, and also the case of an item that contains | |
151 text followed by blocks such as paragraphs.</para> | |
152 | |
153 <para>Note that this algorithm is easily confused. It doesn't | |
154 distinguish between block and inline XHTML elements, so it will only | |
155 wrap the first word in <literal>&lt;li&gt;some &lt;b&gt;bold&lt;/b&gt; | |
156 text&lt;/li&gt;</literal>, leading to badly formatted output. The | |
157 workaround is to wrap troublesome content in explicit | |
158 <literal>&lt;p&gt;</literal> tags.</para> | |
159 | |
160 <para/></section><section id="docbook-elements"><title>Docbook Elements</title> | |
161 | |
162 <para>Elements from the Docbook namespace are passed through as is. | |
163 There are two ways to include a Docbook element in your XHTML | |
164 source:</para> | |
165 | |
166 <variablelist><varlistentry><term>Global prefix</term><listitem><para>A <indexterm significance="preferred"><primary>fake Docbook namespace</primary></indexterm><glossterm>fake Docbook namespace</glossterm><footnote><para>The fake | |
167 Docbook namespace is <literal>urn:docbook</literal>. Docbook doesn't really | |
168 have a namespace, and if it did, it wouldn't be this one. See <link linkend="docbook-namespace">Docbook namespace</link> for a discussion of | |
169 this issue.</para></footnote> | |
170 | |
171 declaration may be added to the document root element. Anywhere in | |
172 the document, the prefix from this namespace declaration may be used | |
173 to include a Docbook element. This is useful if a document contains | |
174 many Docbook elements, such as <literal>footnote</literal> or | |
175 <literal>glossterm</literal>, interspersed with XHTML. (In this case it may | |
176 be more convenient to allow these elements in the XHTML namespace and | |
177 add a customization layer that translates them to Docbook elements, | |
178 however. See <link linkend="customization">Customization</link>.)</para> | |
179 | |
180 <informalexample><programlisting> | |
181 &lt;html xmlns="http://www.w3.org/1999/xhtml" | |
182 xmlns:db="urn:docbook"&gt; | |
183 ... | |
184 &lt;p&gt;Some text&lt;db:footnote&gt;and a footnote&lt;/db:footnote&gt;.&lt;/p&gt; | |
185 </programlisting></informalexample></listitem></varlistentry><varlistentry><term>Local namespace</term><listitem><para>A Docbook element may be introduced along with a prefix-less | |
186 namespace declaration. This is useful for embedding a Docbook | |
187 document fragment (a hierarchy of elements that all use Docbook tags) | |
188 within an XHTML document.</para> | |
189 | |
190 <informalexample><programlisting> | |
191 ... | |
192 &lt;articleinfo xmlns="urn:docbook"&gt; | |
193 &lt;author&gt; | |
194 &lt;firstname&gt;...&lt;/firstname&gt; | |
195 ... | |
196 </programlisting></informalexample></listitem></varlistentry></variablelist> | |
197 | |
198 <para>The source to <ulink url="index.src.html">this document</ulink> | |
199 illustrates both of these techniques.</para> | |
200 | |
201 <note><para>Both these techniques will cause your document to be | |
202 invalid as XHTML. In order to validate an XHTML document that | |
203 contains Docbook elements, you will need to create a custom schema. | |
204 Technically, you then ought to place your document in a different | |
205 namespace, but this will cause <literal>html2db.xsl</literal> not to recognize it!</para></note> | |
206 | |
207 | |
208 <para/></section><section><title>Output Processing Instructions</title> | |
209 | |
210 <para><literal>html2db.xsl</literal> adds a few processing instructions to the output file. | |
211 The Docbook XSL stylesheets ignore these, but if you write a | |
212 customization layer for Docbook XSL, you can use the information in | |
213 these processing instructions to customize the HTML output. This can | |
214 be used, for example, to set the <literal>a</literal> <literal>onclick</literal> | |
215 and <literal>target</literal> attributes in the HTML files that Docbook XSL | |
216 creates to the same values they had in the input document.</para> | |
217 | |
218 <variablelist><varlistentry><term><literal>&lt;?html2db attribute="<replaceable>name</replaceable>" value="<replaceable>value</replaceable>"?&gt;</literal></term><listitem><para>Placed inside a link element to capture the value of the <literal>a</literal> <literal>target</literal> and <literal>onclick</literal> attributes. <replaceable>name</replaceable> is the name of the attribute (<literal>target</literal> or <literal>onclick</literal>), and <replaceable>value</replaceable> is its value, with <literal>"</literal> and <literal>\</literal> replaced by <literal>\"</literal> and <literal>\\</literal>, respectively.</para></listitem></varlistentry><varlistentry><term><literal>&lt;?html2db element="br"?&gt;</literal></term><listitem><para>Represents the location of an XHTML <literal>br</literal> element in the | |
219 source document.</para></listitem></varlistentry></variablelist> | |
220 | |
221 <para>You can also include <literal>&lt;?db2html?&gt;</literal> processing | |
222 instructions in the HTML source document, and they will be copied | |
223 through to the Docbook output file unchanged (as will all other | |
224 processing instructions).</para> | |
225 | |
226 | |
227 <para/></section></section><section id="customization"><title>Customization</title> | |
228 <para/><section><title>XSLT Parameters</title> | |
229 <variablelist><varlistentry><term><literal>&lt;xsl:param name="anchor-id-prefix" select="''"/&gt;</literal></term><listitem><para>Prefixed to every id generated from <literal>&lt;a name=&gt;</literal> | |
230 and <literal>&lt;a href="#"&gt;</literal>. This is useful to avoid | |
231 collisions between multiple documents that are compiled into the | |
232 same book. For instance, if a number of XHTML sources are assembled | |
233 into chapters of a book, you can style each source file with a prefix of | |
234 <literal><replaceable>docid</replaceable>.</literal> where <replaceable>docid</replaceable> is a unique id | |
235 for each source file.</para></listitem></varlistentry><varlistentry><term><literal>&lt;xsl:param name="document-root" select="'article'"/&gt;</literal></term><listitem><para>The default document root. This can be overridden by | |
236 <literal>&lt;?html2db class="<replaceable>name</replaceable>"?&gt;</literal> within the | |
237 document itself, and defaults to <literal>article</literal>.</para></listitem></varlistentry></variablelist> | |
238 | |
239 <para/></section><section id="processing-instructions"><title>Processing instructions</title> | |
240 <para>Use the <literal>&lt;?html2db?&gt;</literal> processing instruction to | |
241 customize the transformation of the XHTML source to Docbook:</para> | |
242 | |
243 <informaltable><tgroup cols="3"><thead><row><entry>Processing instruction</entry><entry>Content</entry><entry>Effect</entry></row> | |
244 </thead><tbody><row><entry><literal>&lt;?html2db class="<replaceable>xxx</replaceable>"?&gt;</literal></entry><entry><literal>body</literal></entry><entry>Sets the output document root to <replaceable>xxx</replaceable>. Useful for | |
245 translating to <literal>preface</literal>, <literal>appendix</literal>, or <literal>chapter</literal>; the default is | |
246 <replaceable>$document-root</replaceable>.</entry></row> | |
247 <row id="simplelist"><entry><literal>&lt;?html2db class="simplelist"?&gt;</literal></entry><entry><literal>ul</literal></entry><entry>Creates a vertical <literal>simplelist</literal>.<footnote><para>Note that the | |
248 current implementation simply checks for the presence of <emphasis role="em">any</emphasis> | |
249 <literal>html2db</literal> processing instruction.</para></footnote></entry></row> | |
250 <row><entry><literal>&lt;?html2db rowsep="1"?&gt;</literal></entry><entry><literal>[informal]table</literal></entry><entry>Sets the <literal>rowsep</literal> attribute on the generated <literal>table</literal>.<footnote><para>Note that the current implementation simply checks for the presence of <emphasis role="em">any</emphasis> <literal>html2db</literal> processing instruction that begins with <literal>rowsep</literal>, and assumes the value is <literal>1</literal>.</para></footnote></entry></row> | |
251 </tbody></tgroup></informaltable> | |
252 | |
253 <para/></section><section id="embedding"><title>Overriding the built-in templates</title> | |
254 <para>For cases where the previous techniques don't allow for enough | |
255 customization, you can override the built-in templates. You will need | |
256 to know XSLT in order to do this, and you will need to write a new | |
257 stylesheet that uses the <literal>xsl:import</literal> element to import | |
258 <literal>html2db.xsl</literal>.</para> | |
259 | |
260 <para>The <ulink url="examples.xsl"><literal>example.xsl</literal></ulink> stylesheet | |
261 is an example customization layer. It recognizes the <literal>&lt;div | |
262 class="abstract"&gt;</literal> and <literal>&lt;p class="note"&gt;</literal> | |
263 classes in the <ulink url="index.src.html">source</ulink> for this document, | |
264 and generates the corresponding Docbook elements.</para> | |
265 | |
266 | |
267 <para/></section></section><section><title>FAQ</title> | |
268 <para/><section><title>Why generate Docbook?</title> | |
269 <para>The primary reason to use Docbook as an <emphasis role="em">output</emphasis> format is | |
270 to take advantage of the Docbook XSL stylesheets. These are a | |
271 well-designed, well-documented set of XSL stylesheets that provide a | |
272 variety of publishing features that would be difficult to recreate | |
273 from scratch for HTML:</para> | |
274 | |
275 <itemizedlist spacing="compact"><listitem><para>Automatic Table-of-Contents generation</para></listitem><listitem><para>Automatic part, chapter, and section numbering.</para></listitem><listitem><para>Creation of single-page, multi-page, PDF, and WinHelp files from the same source document.</para></listitem><listitem><para>Navigation headers, footers, and metadata for multi-page HTML | |
276 documents.</para></listitem><listitem><para>Link resolution and link target text insertion across multiple pages and numbered targets.</para></listitem><listitem><para>Figure, example, and table numbering, and tables of these.</para></listitem><listitem><para>Index and glossary tools.</para></listitem></itemizedlist> | |
277 | |
278 <para/></section><section><title>Why write in XHTML?</title> | |
279 | |
280 <para>Given that Docbook is so great, why not write in it?</para> | |
281 | |
282 <para>Where there are not legacy concerns, Docbook is probably a better | |
283 choice for structured or technical documentation.</para> | |
284 | |
285 <para>Where the only legacy concern is the documents themselves, and not | |
286 the tools and skill sets of documentation contributors, you should | |
287 consider using an (X)HTML convertor to perform a one-time conversion | |
288 of your documentation source into Docbook, and then switching | |
289 development to the result files. You can use this stylesheet to | |
290 perform this conversion, or evaluate other tools, many of which are | |
291 probably appropriate for this purpose.</para> | |
292 | |
293 <para>Often there are other legacy concerns: the availability of cheap | |
294 (including free) and usable HTML editors and editing modes; and the | |
295 fact that it's easier to teach people XHTML than Docbook. If either | |
296 of these is an issue in your organization, you may want to maintain | |
297 documentation sources in XHTML instead of Docbook.</para> | |
298 | |
299 <para>For example, at <ulink url="http://www.laszlosystems.com/">Laszlo</ulink>, | |
300 most developers contribute directly to the documentation. Requiring | |
301 that developers learn Docbook, or that they wait on the doc team to | |
302 get content into the docs, would discourage this.</para> | |
303 | |
304 <para/></section><section><title>Why not use an existing convertor?</title> | |
305 | |
306 <para>This isn't the first (X)HTML to Docbook convertor. Why not use one | |
307 of the existing ones?</para> | |
308 | |
309 <para>Each HTML to Docbook convertor that I could find had at least some | |
310 of the following limitations, some of which stemmed from their | |
311 intended use as one-time-only convertors for legacy documents:</para> | |
312 | |
313 <itemizedlist spacing="compact"><listitem><para>Many only operated on a subset of HTML, and relied upon hand | |
314 editing of the output to clean up mistakes. This made them impossible | |
315 to use as part of a processing pipeline, where the source is | |
316 <emphasis role="em">maintained</emphasis> in XHTML.</para></listitem><listitem><para>There was no way to customize the output, except by (1) hand | |
317 editing, or (2) writing a post-processing stylesheet, which didn't | |
318 have access to the information in the XHTML source document.</para></listitem><listitem><para>Many of them were difficult or impossible to customize and | |
319 extend. They were closed-source, or written in Java or Perl (which I | |
320 find to be difficult languages to use for customizing this kind of | |
321 thing) and embedded in a larger system.</para></listitem><listitem><para>They didn't take full advantage of the Docbook tag set and content | |
322 model to represent document structure. For instance, they didn't | |
323 generate nested <literal>section</literal> elements to represent | |
324 <literal>h1</literal> <literal>h2</literal> sequences, or <literal>table</literal> to | |
325 represent tables with <literal>summary</literal> attributes.</para></listitem></itemizedlist> | |
326 | |
327 <para/></section><section><title>I got this error. What does it mean?</title> | |
328 <variablelist><varlistentry><term>Q. <literal>Fatal Error! The element type "br" must be terminated by the matching end-tag "&lt;/br&gt;". | |
329 </literal></term><listitem><para>A. Your document is HTML, not <emphasis role="em">X</emphasis>HTML. You need to fix it, or run it through Tidy first.</para></listitem></varlistentry><varlistentry><term>Q. My output document is empty except for the <literal><?xml version="1.0" encoding="UTF-8"?></literal> line.</term><listitem><para>A. The document is missing a namespace declaration. See the <ulink url="index.src.html">example</ulink> for an example.</para></listitem></varlistentry><varlistentry><term>Q. Some of the headers and document sections are repeated multiple times.</term><listitem><para>A. The document has out-of-sequence headers, such as <literal>h1</literal> followed by <literal>h3</literal> (instead of <literal>h2</literal>). This won't work.</para></listitem></varlistentry><varlistentry><term>Q. <literal>Fatal Error! The prefix "db" for element "db:footnote" is not bound.</literal></term><listitem><para>A. You haven't declared the <literal>db</literal> namespace prefix. See the <ulink url="index.src.html">example</ulink> for an example.</para></listitem></varlistentry></variablelist> | |
330 | |
331 | |
332 <para/></section></section><section><title>Implementation Notes</title> | |
333 | |
334 <para/><section><title>Bugs</title> | |
335 <itemizedlist spacing="compact"><listitem><para>Improperly sequenced <literal>h<replaceable>n</replaceable></literal> (for example | |
336 <literal>h1</literal> followed by <literal>h3</literal>, instead of | |
337 <literal>h2</literal>) will result in duplicate text.</para></listitem></itemizedlist> | |
338 | |
339 | |
340 <para/></section><section><title>Limitations</title> | |
341 <itemizedlist spacing="compact"><listitem><para>The <literal>id</literal> attribute is only preserved for certain | |
342 elements (at least <literal>h<replaceable>n</replaceable></literal>, images, paragraphs, and | |
343 tables). It ought to be preserved for all of them.</para></listitem><listitem><para>Only the <link linkend="tables">very simplest</link> table format is | |
344 implemented.</para></listitem><listitem><para>Always uses compact lists.</para></listitem><listitem><para>The string matching for <literal>&lt;?html2db | |
345 class="<replaceable>classname</replaceable>"?&gt;</literal> requires an exact match | |
346 (spaces and all).</para></listitem><listitem><para>The <link linkend="implicit-blocks">implicit blocks</link> code is easily | |
347 confused, as documented in that section. This is | |
348 easy to fix now that I understand the difference between block and | |
349 inline elements (I didn't when I was implementing this), but I | |
350 probably won't do so until I run into the problem again.</para></listitem></itemizedlist> | |
351 | |
352 | |
353 | |
354 | |
355 <para/></section><section><title>Wishlist</title> | |
356 <itemizedlist spacing="compact"><listitem><para>Allow <literal>&lt;?html2db attribute-name="<replaceable>name</replaceable>" | |
357 value="<replaceable>value</replaceable>"?&gt;</literal> at any position, to set arbitrary | |
358 Docbook attributes on the generated element.</para></listitem><listitem><para>Use a different technique from the <link linkend="docbook-elements">fake | |
359 namespace prefix</link> to name Docbook elements in the source, one that | |
360 preserves the XHTML validity of the source file. For example, an | |
361 option to transform <literal>&lt;div class="db:footnote"&gt;</literal> into | |
362 <literal>&lt;footnote&gt;</literal>, or to use a processing instruction | |
363 (<literal>&lt;div&gt;&lt;?html2db classname="footnote"?&gt;</literal>).</para></listitem><listitem><para>Parse DC metadata from XHTML <literal>html/head/meta</literal>.</para></listitem><listitem><para>Add an option to use <literal>html/head/title</literal> instead of | |
364 <literal>html/body/h1[1]</literal> for top title.</para></listitem><listitem><para>Allow an <literal>id</literal> on every element.</para></listitem><listitem><para>Add an option to translate the XHTML <literal>class</literal> into a | |
365 Docbook <literal>role</literal>.</para></listitem><listitem><para>Preserve more of the whitespace from the source document especially within lists and tables in order to make it easier to debug the output document.</para></listitem></itemizedlist> | |
366 | |
367 | |
368 <para/></section><section><title>Design Notes</title> | |
369 <para/><section id="docbook-namespace"><title>The Docbook Namespace</title> | |
370 <para><literal>html2db.xsl</literal> accepts elements in the "Docbook namespace" in XHTML | |
371 source. This namespace is <literal>urn:docbook</literal>.</para> | |
372 | |
373 <para>This isn't technically correct. Docbook doesn't really have a | |
374 namespace, and if it did, it wouldn't be this one. <ulink url="http://www.faqs.org/rfcs/rfc3151.html">RFC 3151</ulink> suggests | |
375 <literal>urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN</literal> as the | |
376 Docbook namespace.</para> | |
377 | |
378 <para>There are two problems with the RFC 3151 namespace. First, it's long | |
379 and hard to remember. Second, it's limited to Docbook v4.1.2 | |
380 but <literal>html2db.xsl</literal> works with other versions of Docbook too, which would | |
381 presumably have other namespaces. I think it's more useful to | |
382 <emphasis role="em">under</emphasis>specify the Docbook version in the spec for this tool. | |
383 Docbook itself underspecifies the version completely, by avoiding a | |
384 namespace at all, but when mixing Docbook and XHTML elements I find it | |
385 useful to be <emphasis role="em">more</emphasis> specific than that.</para> | |
386 | |
387 <para/></section></section><section><title>History</title> | |
388 <para>The original version of <literal>html2db.xsl</literal> was written by <ulink url="http://osteele.com">Oliver Steele</ulink>, as part of the <ulink url="http://laszlosystems.com">Laszlo Systems, Inc.</ulink> documentation | |
389 effort. We had a set of custom stylesheets that formatted and added | |
390 linking information to programming-language elements such as | |
391 <literal>classname</literal> and <literal>tagname</literal>, and added a | |
392 Table of Contents to chapter documentation and numbers to examples.</para> | |
393 | |
394 <para>As the documentation set grew, the doc team (John Sundman) | |
395 requested features such as inter-chapter navigation, callouts, and | |
396 index and glossary elements. I was able to beat all of these back | |
397 except for navigation, which seemed critical. After a few days trying | |
398 to implement this, I decided it would be simpler to convert the subset | |
399 of XHTML that we used into a subset of Docbook, and use the latter to | |
400 add navigation. (Once this was done, the other features came for | |
401 free.)</para> | |
402 | |
403 <para>During my August 2004 "sabbatical", I factored the general html2db | |
404 code out from the Laszlo-specific code, refactored and otherwise | |
405 cleaned it up, and wrote this documentation.</para> | |
406 | |
407 <para/></section><section><title>Credits</title> | |
408 <para><literal>html2db.xsl</literal> was written by <ulink url="http://osteele.com">Oliver Steele</ulink>, as part of the <ulink url="http://laszlosystems.com">Laszlo Systems, Inc.</ulink> documentation effort.</para> | |
409 | |
410 <para/></section></section></article> |
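
The wrapping rule described in the Implicit Blocks section of the readme above can be sketched outside of XSLT. The following Python fragment is only an illustration under stated assumptions: `wrap_leading_text`, `render`, and the `out_tag` parameter are hypothetical names, namespaces are ignored, child elements are copied without being translated to Docbook, and none of this is taken from the templates in `html2db.xsl` itself. It wraps the text that precedes the first child element in a `para` and leaves everything after it alone.

```python
import xml.etree.ElementTree as ET


def wrap_leading_text(item, out_tag="listitem"):
    """Return a new element in which the text before the first child
    element of `item` is wrapped in a <para>.

    Hypothetical helper: it mirrors the rule stated in the readme's
    Implicit Blocks section, not the stylesheet's actual templates.
    """
    out = ET.Element(out_tag)
    # item.text is exactly the text that precedes the first child element
    # (or all of the text when the item has no child elements).
    if item.text and item.text.strip():
        para = ET.SubElement(out, "para")
        para.text = item.text
    # Children are copied unchanged; each child's trailing text (its
    # .tail) travels with it and therefore stays outside the <para>.
    for child in item:
        out.append(child)
    return out


def render(xhtml):
    """Parse an item, apply the wrapping rule, and serialize the result."""
    item = ET.fromstring(xhtml)
    return ET.tostring(wrap_leading_text(item), encoding="unicode")


print(render("<li>a list item</li>"))
print(render("<li>some <b>bold</b> text</li>"))
```

The second call reproduces the failure mode the readme warns about: only `some ` ends up inside the `para`, while the `b` element and the trailing text are left outside it. Wrapping the item content in an explicit `p` in the source avoids this, because the item then has no leading text to wrap.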