Pretty printing xhtml with nokogiri and xslt
[UPDATE]
Check this gist for a command line version of xml indenter in this post.
Today I was looking for a way to pretty print xhtml. Good’ol REXML supports this in a very simple way:
Document.new("<some>XML</some>")doc.write($stdout, indent_spaces = 4)
This generates a nicely indented xml document. But REXML was not robust enough for my needs. Luckily, we now have a couple of excellent choices on ruby for parsing xml, including hpricot, nokogiri and libxml-ruby bindings.
I did not find a way to pretty print xhtml as easy as you can do with REXML with any of these libraries, though. But I did find a way of doing it using XSLT. Nokogiri supports applying XSLT to an XML document (probably libxml bindings do too, hpricot does not). Here is how:
xsl = Nokogiri::XSLT(File.read("pretty_print.xsl")) html = Nokogiri(File.read("source.html")) File.open("output.html", "w") { |f| f << xsl.apply_to(html).to_s }
That’s it, simple enough. Got the idea from this dzone snippet.
For the xslt file I used this nice one I found on this site: http://www.printk.net/~bds/indent.html
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" encoding="ISO-8859-1"/> <xsl:param name="indent-increment" select="' '"/> <xsl:template name="newline"> <xsl:text disable-output-escaping="yes"> </xsl:text> </xsl:template> <xsl:template match="comment() | processing-instruction()"> <xsl:param name="indent" select="''"/> <xsl:call-template name="newline"/> <xsl:value-of select="$indent"/> <xsl:copy /> </xsl:template> <xsl:template match="text()"> <xsl:param name="indent" select="''"/> <xsl:call-template name="newline"/> <xsl:value-of select="$indent"/> <xsl:value-of select="normalize-space(.)"/> </xsl:template> <xsl:template match="text()[normalize-space(.)='']"/> <xsl:template match="*"> <xsl:param name="indent" select="''"/> <xsl:call-template name="newline"/> <xsl:value-of select="$indent"/> <xsl:choose> <xsl:when test="count(child::*) > 0"> <xsl:copy> <xsl:copy-of select="@*"/> <xsl:apply-templates select="*|text()"> <xsl:with-param name="indent" select="concat ($indent, $indent-increment)"/> </xsl:apply-templates> <xsl:call-template name="newline"/> <xsl:value-of select="$indent"/> </xsl:copy> </xsl:when> <xsl:otherwise> <xsl:copy-of select="."/> </xsl:otherwise> </xsl:choose> </xsl:template> </xsl:stylesheet>
5 comments