Pretty printing xhtml with nokogiri and xslt
[UPDATE]
Check this gist for a command-line version of the XML indenter from this post.
Today I was looking for a way to pretty print XHTML. Good ol' REXML supports this in a very simple way:
require "rexml/document"
doc = REXML::Document.new("<some>XML</some>")
doc.write($stdout, 4) # indent nested content by 4 spaces
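When `Document#write` gets an indent, REXML hands the work to `REXML::Formatters::Pretty`. Using that formatter directly is a small variation that lets you write into a String instead of `$stdout`, and its `compact` flag keeps short text-only elements on one line. A minimal sketch (the sample document is made up for illustration):

```ruby
require "rexml/document"

doc = REXML::Document.new("<some><nested>XML</nested></some>")

formatter = REXML::Formatters::Pretty.new(4) # 4-space indentation
formatter.compact = true                     # keep short text-only elements on one line
out = +""                                    # any object responding to << can receive the output
formatter.write(doc, out)
puts out
```

With `compact` left at its default of `false`, the text inside `<nested>` would be moved onto its own, further indented line.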
This generates a nicely indented XML document. But REXML was not robust enough for my needs. Luckily, we now have a couple of excellent choices in Ruby for parsing XML, including Hpricot, Nokogiri and the libxml-ruby bindings.
I did not find a way to pretty print XHTML with any of these libraries as easily as you can with REXML, though. But I did find a way of doing it using XSLT. Nokogiri supports applying XSLT to an XML document (the libxml bindings probably do too; Hpricot does not). Here is how:
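require "nokogiri"
xsl  = Nokogiri::XSLT(File.read("pretty_print.xsl")) # compile the stylesheet from its source text
html = Nokogiri(File.read("source.html"))            # parse the input document
File.open("output.html", "w") { |f| f << xsl.apply_to(html) } # apply_to returns the serialized result as a String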
That’s it, simple enough. Got the idea from this dzone snippet.
For the xslt file I used this nice one I found on this site: http://www.printk.net/~bds/indent.html
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="' '"/>
  <xsl:template name="newline">
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:copy />
  </xsl:template>
  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>
  <xsl:template match="text()[normalize-space(.)='']"/>
  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:choose>
      <xsl:when test="count(child::*) > 0">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="*|text()">
            <xsl:with-param name="indent" select="concat($indent, $indent-increment)"/>
          </xsl:apply-templates>
          <xsl:call-template name="newline"/>
          <xsl:value-of select="$indent"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
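The stylesheet boils down to a few recursive rules: whitespace-only text is dropped, other text is normalized and placed on its own line, elements without element children are copied verbatim, and elements with element children are opened, have their children indented one step deeper, and are closed on a fresh line. As a rough illustration only, the same recursion over a REXML tree might look like this. `indent_node` is a hypothetical name, and unlike the XSLT version the sketch skips comments and processing instructions.

```ruby
require "rexml/document"

# Hypothetical Ruby port of the stylesheet's template rules (illustration only).
# Comments and processing instructions are skipped rather than copied.
def indent_node(node, indent, step, out)
  case node
  when REXML::Element
    out << "\n" << indent                    # newline template, then $indent
    if node.elements.empty?
      out << node.to_s                       # no element children: like <xsl:copy-of select="."/>
    else
      out << "<#{node.expanded_name}"
      node.attributes.each_attribute { |attr| out << " " << attr.to_string } # like <xsl:copy-of select="@*"/>
      out << ">"
      node.children.each { |child| indent_node(child, indent + step, step, out) }
      out << "\n" << indent << "</#{node.expanded_name}>"
    end
  when REXML::Text
    text = node.to_s.split.join(" ")         # like normalize-space(.)
    out << "\n" << indent << text unless text.empty? # whitespace-only text is dropped
  end
  out
end

doc = REXML::Document.new("<ul><li>one</li><li>two <b>bold</b></li></ul>")
puts indent_node(doc.root, "", "  ", +"")
```

For this input, the first `li` stays on one line because it has no element children, while the text and the `b` element inside the second `li` each get their own line, which is what the XSLT rules do with mixed content.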

Hey, thanks for the tip. In your example you have Nokogiri::XSLT("pretty_print.xsl"), but you need to pass the contents of the file, not the filename, so it should be Nokogiri::XSLT(File.read("pretty_print.xsl")).
Hey, thanks for the comment. Fixed!
Hey man this is great, saved me a lot of time!
One (two) issues though, maybe you know a fix…
1) If the node is empty, it converts it into a self-closing node. This is a problem for HTML.
2) If the node has text, the closing tag is not indented.
(Hope the formatting worked out.) Notice the label tag has a "\t" before it, but the text and the following closing tag aren't properly indented.
Any quick fixes for this?
Let me try to print the code again, with parentheses in place of angle brackets:
(textarea)(/textarea)
becomes
(textarea/)
and
(label)some text(/label)
becomes
\t(label)some text
(/label)