
How to Set Up AsciiDoc, Pygments and FOP on Windows for Beautiful PDF and XHTML Documentation

5 November 2010

Good documentation is a critical component of any project. A common myth is that developers hate writing documentation, but I don’t think this is true. Developers are happy writing the content; it’s the formatting that’s the major headache. Do you use Word or HTML? A content database, or some other knowledge management system? Mix in multiple developers and you end up with inconsistent formatting, which kills the effort.

If you want to produce standards-based content that stays consistent across authors and can be converted into multiple formats, I highly recommend AsciiDoc. Other contenders are txt2tags, Markdown and reStructuredText, but after looking at each of them I kept coming back to AsciiDoc.

From what I understand, O’Reilly uses DocBook for its actual books, and from personal experience I followed a very similar process when writing my book for Wrox/Wiley.

Using AsciiDoc has the following benefits:

  • Write clean documentation using only a text editor
  • Generate images from text with automatic layout
  • Convert into HTML
  • Convert into DocBook
  • Consolidate multiple files into a single document (make a book)
  • Automatic cross-references, both internal and external
  • Convert into PDF
  • Convert into wiki markup (Confluence)
  • Manage documentation through source control

Steve Streeting has an excellent blog post on how to set this up on Windows, which I followed for the most part. I did run into some minor issues, which I’m documenting here for my own sake.

http://www.stevestreeting.com/2010/03/07/building-a-new-technical-documentation-tool-chain/

Installing the tools:
* Again, full credit to Steve; I’m just updating a bit of information based on newer versions and some gotchas I encountered. I did try this on Cygwin and got the basics to work, but not the source code highlighting, so I abandoned Cygwin and stuck to an all-Windows setup.

Syntax Highlighting –
- Pygments – http://pypi.python.org/pypi/Pygments – I’m using version 1.3.1; make sure you get the tar.gz version.

Graphics – Image generation
- Graphviz – http://www.graphviz.org/ – I’m using version 2.27 (developer build) and installed it from the command line (msiexec /a grap***….)

I placed everything under c:\asciiDoc\

I made a few new folders to store documents:

\in\ – where my TXT files go, this is the input folder
\out\ – where the output files end up, PDFs, XHTML will be my primary output.
\xml\ – where the DocBook XML files go. This is an intermediary step in the toolchain.
\fo\ – where the .FO files go. This is an intermediary step in PDF generation.

and at the end of it all my directory looked like this:

Directory of C:\asciiDoc

<DIR>          asciidoc-8.6.2
               197 CatalogManager.properties
<DIR>          docbook-xsl-1.76.1
<DIR>          docbook4.5
<DIR>          fo
<DIR>          fop-1.0-bin
            1,911 fo_steve.xsl
<DIR>          in
<DIR>          out
<DIR>          Pyg
               50 pygmentize.bat
<DIR>          saxon6-5-5
<DIR>          xml
<DIR>          xml-commons-resolver-1.2
<DIR>          xslthl-2.0.2

Once I had unzipped or installed all of the above, there were a few environment variables to set up.

I added the following to the PATH:
c:\python27;C:\asciiDoc\fop-1.0-bin\fop-1.0;C:\Program Files\Graphviz2.27\bin

In order, these are the locations of
python.exe, fop.bat, dot.exe
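
A quick way to confirm the PATH change took effect is to ask Python where it finds each tool. This is a minimal sketch and not part of the original setup: the helper name `locate_tools` is made up for illustration, and it assumes a Python 3 interpreter, because `shutil.which` does not exist in the Python 2.7 install used to run AsciiDoc here.

```python
import shutil


def locate_tools(names):
    """Return a dict mapping each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # python.exe, fop.bat and dot.exe are the three tools the PATH entries should expose.
    for tool, path in locate_tools(["python", "fop", "dot"]).items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

Run it from a freshly opened command prompt, since prompts that were already open will not see the updated PATH.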

and the following to my CLASSPATH:
c:\asciidoc;C:\asciiDoc\xslthl-2.0.2

I followed Steve’s instructions and ended up with a file named CatalogManager.properties under c:\asciidoc


catalogs=c:/asciidoc/docbook-xsl-1.76.1/catalog.xml;c:/asciidoc/docbook4.5/catalog.xml
relative-catalogs=false
static-catalog=yes
catalog-class-name=org.apache.xml.resolver.Resolver
verbosity=1
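
A wrong path in the `catalogs` line is easy to miss, so a small check can help. The sketch below is my own illustration, not something the resolver provides: it parses simple `key=value` lines and reports any listed catalog file that is not on disk. The function names are hypothetical, it assumes Python 3, and it only handles the plain format shown above.

```python
import os


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_catalogs(props):
    """Return the paths listed under 'catalogs' that do not exist on disk."""
    paths = [p for p in props.get("catalogs", "").split(";") if p]
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Sample text copied from the file above; in practice, read the real file instead.
    sample = (
        "catalogs=c:/asciidoc/docbook-xsl-1.76.1/catalog.xml;"
        "c:/asciidoc/docbook4.5/catalog.xml\n"
        "relative-catalogs=false\n"
    )
    for path in missing_catalogs(parse_properties(sample)):
        print("Missing catalog:", path)
```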

XHTML source code highlighting uses Pygments. To make AsciiDoc use Pygments, place the following in the asciidoc.conf file (note the trailing “s”; the filter definition tests for an attribute named pygments):

pygments=

I checked the filter configuration under asciidoc-8.6.2/filters/source, called “source-highlight-filter.conf”. On line 62, the filter for Pygments is defined as follows:

ifdef::pygments[source-style=template="source-highlight-block",presubs=(),postsubs=("callouts",),posattrs=("style","language","src_numbered"),filter="pygmentize -f html -l {language} {src_numbered?-O linenos=table}"]

The bit I’m interested in is how Pygments is invoked: pygmentize -f html -l {language} {src_numbered?-O linenos=table}

I had unzipped the tar.gz download of Pygments (which is mostly Python scripts) to c:\asciidoc\Pyg\

So on Windows, to get Pygments working, I made a new BAT file called “pygmentize.bat”. The name is important: it has to match the command the filter runs, and it receives that command’s parameters.


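@echo off
rem AsciiDoc runs "pygmentize -f html -l <language> ...", so %4 holds the language name.
python Pyg/pygmentize -l %4 -f html

Notice the %4: this is the fourth parameter passed in from the AsciiDoc invocation, i.e. the language name. Anything after it, such as the optional -O linenos=table, is ignored by this batch file.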
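
To see why the language lands in %4, it helps to number the words of the filter command the way a batch file would. The sketch below is only an approximation written for illustration: `batch_params` is a hypothetical helper, and `shlex` splits on whitespace, whereas cmd.exe also treats characters such as `=` and `,` as separators. For the simple command here the two should agree.

```python
import shlex


def batch_params(command_line):
    """Map a command line to batch-style positional parameters (%0, %1, ...)."""
    return {f"%{i}": arg for i, arg in enumerate(shlex.split(command_line))}


if __name__ == "__main__":
    # The filter template with {language} filled in as "python" and no line numbering.
    for name, value in batch_params("pygmentize -f html -l python").items():
        print(name, "=", value)
```

Here %0 is the command name itself, %1 to %3 are `-f`, `html` and `-l`, and %4 is the language.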

Last thing: for source highlighting in PDFs, I just replicated Steve’s XSL stylesheet. Mine ended up located at:
c:\asciidoc\fo_steve.xsl

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
				xmlns:xslthl="http://xslthl.sf.net"
                exclude-result-prefixes="xslthl">
	<!-- Include basic AsciiDoc FOP formatting -->
	<xsl:import href="file:///c:/asciiDoc/asciidoc-8.6.2/docbook-xsl/fo.xsl"/>
	<!-- Include source syntax highlighting -->
	<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/highlighting/common.xsl"/>
	<!-- This contains the default source highlight styling rules -->
	<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/highlight.xsl"/>
	<!-- Without the below, line numbering doesn't work -->
	<xsl:param name="use.extensions" select="'1'"/> 
	<xsl:param name="linenumbering.extension" select="'1'"/> 
	<xsl:param name="linenumbering.everyNth" select="'1'"/> 
	<!-- My style customisations -->	
	<xsl:template match='xslthl:keyword' mode="xslthl">
		<fo:inline font-weight="normal" color="#AA22FF"><xsl:apply-templates mode="xslthl"/></fo:inline>
	</xsl:template>
	<xsl:template match="xslthl:doccomment|xslthl:doctype" mode="xslthl">
		<fo:inline font-weight="normal" color="green">
			<xsl:apply-templates mode="xslthl"/>
		</fo:inline>
	</xsl:template>
	<xsl:template match="xslthl:annotation" mode="xslthl">
		<fo:inline font-weight="normal" color="teal">
			<xsl:apply-templates mode="xslthl"/>
		</fo:inline>
	</xsl:template>
	<xsl:template match="xslthl:string" mode="xslthl">
		<fo:inline font-weight="normal" font-style="italic" color="brown">
			<xsl:apply-templates mode="xslthl"/>
		</fo:inline>
	</xsl:template>
	<xsl:template match="xslthl:directive" mode="xslthl">
		<fo:inline font-weight="normal" font-style="italic" color="blue">
			<xsl:apply-templates mode="xslthl"/>
		</fo:inline>
	</xsl:template>

</xsl:stylesheet>

One tricky little gotcha was around PDF generation through FOP. I noticed text was getting cut off (not wrapping), and during PDF generation I would get warnings like:

Nov 5, 2010 5:33:53 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Line 1 of a paragraph overflows the available area by 22725 millipoints. (See position 17:686)

Turns out the fix for this was to go into the docbook-xsl-1.76.1 folder and search for all files containing “no-wrap”. I switched these all to “wrap” (a brute-force approach, I know), and this resulted in FO XML with the correct wrap-option set.
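
Before changing anything, it is worth listing which files actually contain “no-wrap”. The following is a minimal read-only sketch, assuming Python 3 and the folder layout described in this post; `files_containing` is a name I made up, and it does not modify any file.

```python
import os


def files_containing(root, needle):
    """Yield the paths under root whose text contains needle."""
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    if needle in handle.read():
                        yield path
            except OSError:
                continue  # unreadable file: skip it


if __name__ == "__main__":
    # Folder name taken from the directory listing earlier in this post.
    for path in files_containing("c:/asciidoc/docbook-xsl-1.76.1", "no-wrap"):
        print(path)
```

Keeping a copy of the original folder before editing makes the brute-force change easy to undo.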

Finally I made some Batch files to help me make the documents. Here’s what I came up with:

MakeHTML.bat: This takes the base name of a file in the in\ directory. For example, for a file called c:\asciidoc\in\test.txt you call it as “makeHTML test”. The batch file contains a single command:


python asciidoc-8.6.2/asciidoc.py -a icons -a toc --backend xhtml11 --doctype article --out-file out/%1.xhtml in/%1.txt

MakeDoc.bat: This produces both an XHTML file and a PDF, and is called the same way (e.g. “makeDoc test”).

call makeHtml %1
copy out\*.png fo\*.png

python asciidoc-8.6.2/asciidoc.py --backend docbook  --doctype article  --out-file xml/%1.xml in/%1.txt
java -cp "c:/asciiDoc/saxon6-5-5/saxon.jar;c:/asciiDoc/xslthl-2.0.2/xslthl-2.0.2.jar;c:/asciiDoc/docbook-xsl-1.76.1/extensions/saxon65.jar;c:/asciiDoc/xml-commons-resolver-1.2/resolver.jar;c:/asciiDoc/" -Dxslthl.config="file:///c:/asciiDoc/docbook-xsl-1.76.1/highlighting/xslthl-config.xml" com.icl.saxon.StyleSheet -x org.apache.xml.resolver.tools.ResolvingXMLReader -y org.apache.xml.resolver.tools.ResolvingXMLReader -r org.apache.xml.resolver.tools.CatalogResolver -o fo\%1.fo xml\%1.xml C:\asciiDoc\fo_steve.xsl
fop -fo fo\%1.fo -pdf out\%1.pdf

MakeAll.bat: generates all files within the \in folder (named *.txt)

for /f %%a IN ('dir /b in\*.txt') do call makedoc %%~na 

Just call it like this “MakeAll”
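
For comparison, here is roughly what the loop in MakeAll.bat computes, written as a Python 3 sketch. `document_names` is a hypothetical helper and only lists the base names; it does not call makedoc.

```python
from pathlib import Path


def document_names(in_dir):
    """Return the base names (no extension) of the .txt files in in_dir, sorted."""
    base = Path(in_dir)
    if not base.is_dir():
        return []
    return sorted(p.stem for p in base.glob("*.txt"))


if __name__ == "__main__":
    # Each name printed here is what MakeAll.bat would hand to makedoc as %1.
    for name in document_names("in"):
        print("call makedoc", name)
```

One difference to be aware of: `for /f` splits each line on spaces by default, so the batch version is best used with file names that contain no spaces.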

How it works – AsciiDoc converts the TXT files to XHTML, using Pygments for syntax highlighting and Graphviz for generated diagrams. If that’s all you want, you don’t need FOP. If you don’t plan on generating graphs, you don’t need Graphviz.

AsciiDoc is run a second time to convert the TXT to XML output (in DocBook format). The DocBook output is then transformed by Saxon into .FO output. Finally, FOP converts this into an actual PDF.
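
The three PDF stages can be laid out as data to make the hand-offs explicit. This sketch only builds the argument lists and runs nothing. The flags and paths are copied from MakeDoc.bat; `build_commands` and its `root` parameter are names I introduced for illustration, and the f-strings assume Python 3.6 or later rather than the Python 2.7 used to run AsciiDoc.

```python
def build_commands(name, root="c:/asciiDoc"):
    """Return the three command lines (as argument lists) that turn in/<name>.txt into out/<name>.pdf."""
    classpath = ";".join([
        f"{root}/saxon6-5-5/saxon.jar",
        f"{root}/xslthl-2.0.2/xslthl-2.0.2.jar",
        f"{root}/docbook-xsl-1.76.1/extensions/saxon65.jar",
        f"{root}/xml-commons-resolver-1.2/resolver.jar",
        f"{root}/",
    ])
    # Stage 1: AsciiDoc text to DocBook XML.
    to_docbook = ["python", "asciidoc-8.6.2/asciidoc.py", "--backend", "docbook",
                  "--doctype", "article", "--out-file", f"xml/{name}.xml", f"in/{name}.txt"]
    # Stage 2: DocBook XML to XSL-FO via Saxon and the custom stylesheet.
    to_fo = ["java", "-cp", classpath,
             f"-Dxslthl.config=file:///{root}/docbook-xsl-1.76.1/highlighting/xslthl-config.xml",
             "com.icl.saxon.StyleSheet",
             "-x", "org.apache.xml.resolver.tools.ResolvingXMLReader",
             "-y", "org.apache.xml.resolver.tools.ResolvingXMLReader",
             "-r", "org.apache.xml.resolver.tools.CatalogResolver",
             "-o", f"fo/{name}.fo", f"xml/{name}.xml", f"{root}/fo_steve.xsl"]
    # Stage 3: XSL-FO to PDF via FOP.
    to_pdf = ["fop", "-fo", f"fo/{name}.fo", "-pdf", f"out/{name}.pdf"]
    return [to_docbook, to_fo, to_pdf]


if __name__ == "__main__":
    for command in build_commands("test"):
        print(" ".join(command))
```

Each stage’s output file is the next stage’s input: xml/test.xml feeds Saxon, and fo/test.fo feeds FOP.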

Summary: So we have a full toolchain. Setup is a bear, but once it’s set up things work reasonably well and you can’t argue with the quality of the output. I can write TXT files quickly and easily and get multiple outputs from the same input. I can even combine multiple files into a single output document and have cross-references, such as the table of contents and other links, updated automatically. Overall I think it’s well worth the effort.

9 Comments »

  • Rajesh Pillai said:

    Thanks for wiring things up in one place as I was looking something similar in this line. Will give this a try.

  • Umfaan said:

    Hi – great article thanks. One minor issue though, when setting up asciidoc.conf file for using Pygment, the line added to the file should be:

    pygments=

    and not

    pygment=

    Cheers!

  • Max said:

    Hi Francis,

    Thanks for clear instructions. I managed to configure all and it seemed to work however when processing with fop I was getting error:

    24-Jan-2011 18:39:32 org.apache.fop.events.LoggingEventListener processEvent
    SEVERE: Image not found. URI: images/icons/note.png. (No context info available)

    Clearly that was because docbook-xsl had problem with paths. I was using different version of docbook to yours though. I could also not find relevant config where I would change that path either.

    Finally, I created images\icons folder in asciidoc and copied *.png from docbook-xsl-ns-1.76.1\images and all worked fine.

    Thanks again
    Max

  • Max said:

    In response to own post… we are all constantly learning :), after actually reading documentation I learned from Appendix H that icons folder is configurable by passing iconsdir backend attribute.

    Maybe this will help others…

  • Tomas said:

    Dude, you rock. This is so much easier than word and my documentation is looking snazz-e. Thx for the instructions, would not have even attempted it otherwise.

  • Jon said:

    What tool did you use to convert the asciidoc to the Wiki (confluence) markup?

  • Francis (author) said:

    Sorry, never did find a converter for confluence. We ended up just generating static HTML. Could probably create one easily enough with some XSLT sheets but the tricky part is keeping the links in sync.

  • Juan Carlos Vergara said:

    Thank you Francis it works without any errors

  • Dierk Höppner said:

    Francis,

    great work! But there is a “but”: I want to create EPUBs. If you don’t have xsltproc installed, this will fail when you install only the software you describe.

    There are nice working WIN32 binaries provided by zlatkovic.com (see http://www.zlatkovic.com/libxml.en.html) Install all the provided binaries and dll’s in a path where the programs can find them. Add this path to your PATH environment variable.

    Caution: If you have a full Cygwin installation with all the XSL stuff _and_ cygwin/bin is in your PATH, the a2x toolchain might fail.

    I’m not an xslt expert and don’t know much about docbook, xml, xslt etc., so I was not able to mend this in the cygwin installation to work correctly. I just wanted an install-and-work-solution.

    I ended up with the above mentioned binaries, your recipe and ‘deactivating’ cygwin (renaming it simply). Then this should work:

    python c:/asciidoc/asciidoc-8.6.8/a2x.py --icons -a toc -d book -f epub -L -v %1

    Call it from any path where your asciidoc file is and the output will be generated at the same place. (I do not use your directories ‘in’, ‘out’ and ‘fo’)

    Cheers

    Dierk
