DocBook XML to HTML

September 17th, 2006

I’ve got some DocBook XML files I want to output as HTML files. The DocBook XSL files let you do this fairly easily and produce usable results. Note: the out-of-the-box results aren’t something you’d show to customers, but, as I say, they’re usable.

Download the XSL stylesheets from SourceForge:

http://sourceforge.net/projects/docbook/

Install an XSLT processor to do the transformation. I’m doing this on Windows XP and I chose to use xsltproc. I tried Eclipse first (for a GUI way of doing things), but it failed with unhelpful errors that got no hits on Google, so I fell back on the old DOS command-line method using xsltproc.

To do this, first download and install xsltproc. See Chapter 3 of Bob Stayton’s DocBook XSL: The Complete Guide, Third Edition:

www.sagehill.net/docbookxsl/InstallingAProcessor.html

As the instructions tell you, you download 4 zip files (for libxml, libxslt, zlib, and iconv) from:

www.zlatkovic.com/libxml.en.html

Unzip these and add them to the Windows Path, as described below.

Note: It’s not enough to put the 4 bin directories of the unzipped folders in the Windows Path. If you try doing this, when you run xsltproc you’ll get a dialog box saying:

=========
xsltproc.exe – Entry Point Not Found
————–
The procedure entry point xmlCtxtUseOptions could not be located in the dynamic link library libxml2.dll

=========

All the DLLs you unzipped need to be in the same directory as the xsltproc executable; otherwise (weirdly, even if you’ve put the directories in your Windows Path) the exe file won’t be able to find them.

You can put all of the DLLs and exe files in Windows’ system32 directory, but a better option is to keep them in your own area, where they’re less likely to remain kicking around long after you’ve stopped using them. Put the unzipped directories in a directory of your own (e.g. D:\myStuff\XML-stuff\XSLT-stuff\xsltproc), copy the DLLs and exes out of the bin directories into the top level of this directory, and then add this directory to your Windows “Path” environment variable.
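
If you want to check that the copy step worked before running xsltproc, a small Python script can list which DLLs are missing from the directory. This is only a sketch: the only DLL name confirmed above is libxml2.dll (from the error dialog); the other names in ASSUMED_DLLS are my assumption about what the four zip files contain, so adjust the list to match what you actually unzipped.

```python
from pathlib import Path

# Assumed DLL names. Only libxml2.dll is named in the error dialog above;
# the rest are guesses and should be checked against your unzipped files.
ASSUMED_DLLS = ("libxml2.dll", "libxslt.dll", "libexslt.dll", "iconv.dll", "zlib1.dll")


def missing_dlls(exe_dir, required=ASSUMED_DLLS):
    """Return the names in `required` that are not files in exe_dir."""
    folder = Path(exe_dir)
    # Windows file names are case-insensitive, so compare in lower case.
    present = {p.name.lower() for p in folder.iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]


if __name__ == "__main__":
    # Example directory from the text above; change it to your own.
    folder = r"D:\myStuff\XML-stuff\XSLT-stuff\xsltproc"
    if Path(folder).is_dir():
        print("Missing DLLs:", missing_dlls(folder) or "none")
    else:
        print("Directory not found:", folder)
```

An empty result means every listed DLL sits next to xsltproc.exe, which is the layout described above.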

For details of how to edit the Path environment variable in Windows, see:
www.itauthor.com/notes/archives/2003/11/html_tidy.html

Once you’ve done this you can run xsltproc on an XML file.

Browse to the directory containing the XML file you want to process (here called inputfile.xml) and run the following command:

xsltproc --output outputfile.html --stringparam use.extensions 0 D:/myStuff/XML-stuff/docbook-xsl/html/docbook.xsl inputfile.xml

This uses the DocBook XSL files to create an HTML version of the XML file, called outputfile.html, in the same directory as the XML file.
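
If you run this often, it can help to assemble the command in a script rather than retyping it. The following Python sketch builds the same argument list as the command above; the function name build_xsltproc_command and its parameters are made up for illustration, and it only uses the xsltproc options that appear in this post (--output, --stringparam and --xinclude).

```python
def build_xsltproc_command(stylesheet, input_file, output_file=None,
                           params=None, xinclude=False):
    """Return the xsltproc argument list for one transformation."""
    cmd = ["xsltproc"]
    if output_file:
        cmd += ["--output", output_file]
    # Each stylesheet parameter becomes: --stringparam name value
    for name, value in (params or {}).items():
        cmd += ["--stringparam", name, str(value)]
    if xinclude:
        cmd.append("--xinclude")
    # The stylesheet comes before the XML file, as in the command above.
    cmd += [stylesheet, input_file]
    return cmd


if __name__ == "__main__":
    cmd = build_xsltproc_command(
        "D:/myStuff/XML-stuff/docbook-xsl/html/docbook.xsl",
        "inputfile.xml",
        output_file="outputfile.html",
        params={"use.extensions": 0},
    )
    print(" ".join(cmd))
    # To run it for real (needs xsltproc on the Path):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string means the arguments can be handed to subprocess.run without worrying about quoting file names that contain spaces.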

If you look at the source of the HTML it’s compacted together and not pretty to look at. To fix this you can use the “tidy” program.

For example, run the following from a DOS console:

tidy -i -f errorLog.txt "original file.html" > newfile.html

This creates newfile.html, with indentation (the -i), from “original file.html” (which is left untouched), and writes error messages to errorLog.txt.

Get “tidy” from:
http://dev.int64.org/tidy.html

And documentation from:
www.w3.org/People/Raggett/tidy/

Chunking

So far, we’ve just transformed a single page. A more useful operation is to divide a large file, like a chapter, into smaller chunks. You do this by using the html\chunk.xsl XSL file, rather than html\docbook.xsl.

You can also pass various parameters to the processor, to affect the way the HTML is output. For example:

--stringparam base.dir chunkedOutput/
--stringparam html.stylesheet styles/myStyles.css
--stringparam admon.graphics 1
--stringparam navig.graphics 1

These parameters tell the processor:

--stringparam base.dir chunkedOutput/

Put the output files into a subdirectory called “chunkedOutput”.
Note 1: You need the trailing slash, otherwise “chunkedOutput” gets added to the output file names and they’re saved in the same directory as the input files.
Note 2: This directory must exist; it isn’t created for you.

--stringparam html.stylesheet styles/myStyles.css
Add a stylesheet link in the head of each HTML file that is output. Note: The link is always added, but it only has an effect if the chunkedOutput/styles/myStyles.css file actually exists.

--stringparam admon.graphics 1
Add graphics for Note, Important, etc.
For these to have an effect, copy the images directory from the DocBook XSL stylesheet directory you downloaded into the chunkedOutput/ directory.

--stringparam navig.graphics 1
This adds the Next, Previous and Home icons to the headers and footers of pages. Again, this only has an effect once you copy the images directory into place within the chunkedOutput/ directory.
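
The housekeeping described in these notes (a base.dir value that ends in a slash, an output directory that already exists, and an images directory copied into it) can be scripted. Here is a rough Python sketch; the function names are invented for illustration, and it assumes the images directory sits at the top level of the DocBook XSL directory you downloaded.

```python
import shutil
from pathlib import Path


def normalise_base_dir(base_dir):
    """Return base_dir with exactly one trailing forward slash."""
    stripped = base_dir.rstrip("/\\")
    if not stripped:
        raise ValueError("base_dir must name a directory")
    return stripped + "/"


def prepare_output_dir(base_dir, docbook_xsl_dir=None):
    """Create the output directory and optionally copy the images into it.

    Returns the value to pass to --stringparam base.dir.
    """
    base = normalise_base_dir(base_dir)
    Path(base).mkdir(parents=True, exist_ok=True)
    if docbook_xsl_dir is not None:
        # Assumption: the graphics live in <docbook-xsl>/images.
        src = Path(docbook_xsl_dir) / "images"
        # dirs_exist_ok needs Python 3.8 or later.
        shutil.copytree(src, Path(base) / "images", dirs_exist_ok=True)
    return base


if __name__ == "__main__":
    print(normalise_base_dir("chunkedOutput"))  # prints chunkedOutput/
```

Calling prepare_output_dir("chunkedOutput", r"D:\myStuff\XML-stuff\docbook-xsl") before running xsltproc would cover both notes on base.dir and the images requirement for admon.graphics and navig.graphics.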

The available parameters are listed here:

http://docbook.sourceforge.net/release/xsl/current/doc/html/

Now you can generate a chunked file. But to be really useful, you want to process the whole book at once. To do this you need to create a book XML file that includes references to the chapter files. Once you have created this, you can run xsltproc against this file. The book.xml file should look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "file:///D:/myWork/documentation/manuals/authoringDocs/mxDocBook/mxDocBook-DTD/mxDocBook.dtd">

<book>
<title>Administrator's Guide</title>
<para>This guide shows you how to use the software.</para>

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"  href="introduction.xml" /> 

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"  href="license.xml" /> 

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"  href="installingonWindows.xml" /> 

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"  href="appendixConfigFileDetail.xml" /> 

</book>
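
If you have a lot of chapter files, you could generate a book file like this one rather than typing it by hand. The Python sketch below is one way to do it, under a couple of assumptions: the function name make_book_xml is made up, and it leaves out the DOCTYPE line and the introductory para because those are specific to my own DTD and book. Add them back if your setup needs them.

```python
from xml.sax.saxutils import escape, quoteattr

# The XInclude namespace used in the book file above.
XI_NS = "http://www.w3.org/2001/XInclude"


def make_book_xml(title, chapter_files):
    """Return the text of a book file that XIncludes each chapter file."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<book>",
        "<title>%s</title>" % escape(title),
    ]
    for href in chapter_files:
        # quoteattr adds the surrounding quotes and escapes the value.
        lines.append('<xi:include xmlns:xi="%s" href=%s />'
                     % (XI_NS, quoteattr(href)))
    lines.append("</book>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    chapters = ["introduction.xml", "license.xml",
                "installingonWindows.xml", "appendixConfigFileDetail.xml"]
    print(make_book_xml("Administrator's Guide", chapters))
```

Write the returned text to book.xml with UTF-8 encoding so that it matches the encoding named in the XML declaration.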

Save this file alongside the XML files you want to process and then, from this directory, run xsltproc. For example:

xsltproc --stringparam base.dir chunkedOutput/ --stringparam html.stylesheet styles/myStyles.css --stringparam admon.graphics 1 --stringparam navig.graphics 1 --xinclude D:\myStuff\XML-stuff\docbook-xsl\html\chunk.xsl book.xml

This produces a mini Web site for your book, with navigation buttons that let you browse from page to page.

