DocBook XML to HTML
September 17th, 2006
I've got some DocBook XML files I want to output as HTML files. The DocBook XSL files let you do this fairly easily and produce usable results. Note: the out-of-the-box results aren't something you'd show to customers but, as I say, they're usable.
Download the XSL stylesheets from SourceForge:
http://sourceforge.net/projects/docbook/
Install an XSLT processor to do the transformation. I'm doing this on Windows XP and chose xsltproc. I first tried Eclipse (for a GUI way of doing things), but it failed with unhelpful errors that turned up no hits on Google, so I went back to the old DOS command-line method using xsltproc.
To do this, first download and install xsltproc. See Chapter 3 of Bob Stayton's DocBook XSL: The Complete Guide, Third Edition:
www.sagehill.net/docbookxsl/InstallingAProcessor.html
As the instructions tell you, you download 4 zip files (for libxml, libxslt, zlib, and iconv) from:
www.zlatkovic.com/libxml.en.html
Unzip these and put them in the Windows Path.
Note: It's not enough to add the bin directories of the 4 unzipped folders to the Windows Path. If you do, running xsltproc brings up a dialog box saying:
=========
xsltproc.exe - Entry Point Not Found
--------------
The procedure entry point xmlCtxtUseOptions could not be located in the dynamic link library libxml2.dll
=========
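All the DLLs you unzipped need to be in the same directory as the xsltproc executable; otherwise (weirdly, even if you've put their directories in your Windows Path) the exe file won't pick up the right ones. The message suggests Windows is finding some other copy of libxml2.dll that lacks the xmlCtxtUseOptions function. Because Windows looks for DLLs in the executable's own directory before it searches the Path, keeping everything together avoids the problem.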
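If you hit the entry-point error, it can help to see how many copies of libxml2.dll are reachable via the Path, and in which order. Here's a minimal Python sketch (my own diagnostic, not part of libxml or xsltproc; find_dll_copies is a hypothetical name). It only looks in the directories listed in PATH, so it won't see a copy sitting next to some other program's exe.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_dll_copies(dll_name: str, search_path: Optional[str] = None) -> List[Path]:
    """Return every copy of dll_name found in the directories of search_path
    (defaults to the PATH environment variable), in Path order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    found = []
    for entry in search_path.split(os.pathsep):
        entry = entry.strip('"')  # Windows Path entries are sometimes quoted
        if not entry:
            continue
        candidate = Path(entry) / dll_name
        if candidate.is_file():
            found.append(candidate)
    return found


if __name__ == "__main__":
    # The first copy listed is the one the Path search would reach first.
    for copy in find_dll_copies("libxml2.dll"):
        print(copy)
```

If more than one copy turns up, the older one is a likely suspect.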
The best thing to do is either to put all of the DLLs and exe files in Windows' system32 directory or, better (because this way they're less likely to be left kicking around in system32 long after you've stopped using them), to keep them within your own area. To do that, put the unzipped directories in a directory of your own (e.g. D:\myStuff\XML-stuff\XSLT-stuff\xsltproc), copy the DLLs and exes out of each bin directory into the top level of that directory, and then add that directory to your Windows "Path" environment variable.
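Copying the files out of the four bin directories by hand is easy to get wrong, so here's a minimal Python sketch that does it. It assumes each unzipped package is a subfolder of the top-level directory with a bin folder inside (which is how the zip files I used were laid out); XSLTPROC_DIR and flatten_bin_dirs are hypothetical names, and the path is just the example from above.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical location: the folder you unzipped the four packages into.
XSLTPROC_DIR = Path(r"D:\myStuff\XML-stuff\XSLT-stuff\xsltproc")


def flatten_bin_dirs(top: Path) -> List[Path]:
    """Copy every .dll and .exe from each <package>/bin directory under `top`
    into `top` itself, so xsltproc.exe and its DLLs sit side by side.
    Returns the copied files. Same-named files from later packages overwrite earlier ones."""
    copied = []
    for bin_dir in sorted(top.glob("*/bin")):
        if not bin_dir.is_dir():
            continue
        for item in sorted(bin_dir.iterdir()):
            if item.is_file() and item.suffix.lower() in (".dll", ".exe"):
                target = top / item.name
                shutil.copy2(item, target)  # copy2 keeps timestamps
                copied.append(target)
    return copied


if __name__ == "__main__":
    for path in flatten_bin_dirs(XSLTPROC_DIR):
        print("copied", path.name)
```

The originals stay in the bin folders, so you can delete the whole directory later without touching anything in system32.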
For details of how to edit the Path environment variable in Windows, see:
www.itauthor.com/notes/archives/2003/11/html_tidy.html
Once you've done this you can run xsltproc on an XML file.
In a DOS console, go to the directory containing the XML file you want to process (here called inputfile.xml) and run the following command:
xsltproc --output outputfile.html --stringparam use.extensions 0 D:/myStuff/XML-stuff/docbook-xsl/html/docbook.xsl inputfile.xml
This uses the DocBook XSL stylesheets to create an HTML version of the XML file, called outputfile.html, in the same directory as the XML file.
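If you convert files often, the command can be wrapped in a script. Here's a minimal Python sketch (my own wrapper, not something the DocBook XSL distribution provides) that builds the same xsltproc argument list, passing each stylesheet parameter as a --stringparam name/value pair, and runs it. DOCBOOK_XSL, xsltproc_command and docbook_to_html are hypothetical names, and the stylesheet path is the example path from the command above.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# Hypothetical path to the unzipped DocBook XSL files; adjust to suit.
DOCBOOK_XSL = "D:/myStuff/XML-stuff/docbook-xsl/html/docbook.xsl"


def xsltproc_command(stylesheet: str, infile: str, outfile: str,
                     params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the xsltproc argument list for one transformation."""
    cmd = ["xsltproc", "--output", outfile]
    for name, value in (params or {}).items():
        cmd += ["--stringparam", name, value]
    cmd += [stylesheet, infile]
    return cmd


def docbook_to_html(infile: str, outfile: str) -> None:
    """Run xsltproc with the DocBook HTML stylesheet; raises if it fails."""
    if shutil.which("xsltproc") is None:
        raise FileNotFoundError("xsltproc is not on the Path")
    cmd = xsltproc_command(DOCBOOK_XSL, infile, outfile, {"use.extensions": "0"})
    subprocess.run(cmd, check=True)  # check=True turns a non-zero exit into an exception
```

For example, docbook_to_html("inputfile.xml", "outputfile.html"), run from the XML file's directory, should do the same as the command above. Passing the arguments as a list rather than one string means paths containing spaces don't need quoting.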
If you look at the HTML source, you'll see it's all run together and not pretty to look at. To fix this, you can use the "tidy" program.
For example, run the following from a DOS console:
tidy -i -f errorLog.txt "original file.html" > newfile.html
This creates newfile.html, with indentation (the -i), from "original file.html" (which is left untouched), and writes error messages to errorLog.txt.
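The same tidy step can be scripted too. This is a minimal Python sketch using only the -i and -f options from the command above, with the shell's > redirection replaced by sending tidy's standard output to a file; tidy_command and tidy_file are hypothetical names. tidy's exit status is 0 when there were no problems, 1 for warnings only and 2 for errors, so the function returns it rather than treating warnings as failures.

```python
import subprocess
from typing import List


def tidy_command(error_log: str) -> List[str]:
    """tidy options: -i indents the output, -f writes errors and warnings to error_log."""
    return ["tidy", "-i", "-f", error_log]


def tidy_file(original: str, new_file: str, error_log: str = "errorLog.txt") -> int:
    """Write an indented copy of `original` to `new_file`; `original` is left untouched.
    Returns tidy's exit status."""
    with open(new_file, "wb") as out:
        result = subprocess.run(tidy_command(error_log) + [original], stdout=out)
    return result.returncode
```

For example, tidy_file("original file.html", "newfile.html") corresponds to the command above; because the file name is passed as its own list item, the space in it needs no quotes.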
Get "tidy" from:
http://dev.int64.org/tidy.html
And documentation from:
www.w3.org/People/Raggett/tidy/
Potentially similar posts
- Perl basics for beginners (on Windows) – August 2010
- Maintaining a Flare project in Google Code – January 2010
- Listening to RealAudio on your MP3 player – May 2009
- Getting tag changes to show up in Windows Media Player – December 2008
- Solving disk space problems with SkyDrive – September 2008