Pretty printing xhtml with nokogiri and xslt
[UPDATE]
Check this gist for a command-line version of the XML indenter from this post.
Today I was looking for a way to pretty print XHTML. Good ol' REXML supports this in a very simple way:
require "rexml/document"
doc = REXML::Document.new("<some>XML</some>")
doc.write($stdout, 4) # second argument: number of spaces per indent level
This generates a nicely indented XML document. But REXML was not robust enough for my needs. Luckily, we now have a couple of excellent choices in Ruby for parsing XML, including hpricot, nokogiri and the libxml-ruby bindings.
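As an aside on the REXML route: when the indented output is needed as a string rather than on $stdout, the formatter can be driven directly. This is a minimal sketch; the sample document and the variable names are made up for illustration, and the compact setting is an optional choice, not something the snippet above uses.

```ruby
require "rexml/document"
require "rexml/formatters/pretty"

# Sample input, invented for this sketch.
doc = REXML::Document.new("<a><b>text</b></a>")

# Indent with 4 spaces per level.
formatter = REXML::Formatters::Pretty.new(4)
# Keep elements that contain only text on a single line.
formatter.compact = true

# The formatter appends to anything that responds to <<, here a String.
pretty = +""
formatter.write(doc, pretty)
puts pretty
```

Without compact, the formatter puts the text of each element on its own indented line.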
I did not find a way to pretty print XHTML as easily as with REXML in any of these libraries, though. But I did find a way of doing it using XSLT. Nokogiri supports applying XSLT to an XML document (the libxml bindings probably do too; hpricot does not). Here is how:
require "nokogiri"
xsl = Nokogiri::XSLT(File.read("pretty_print.xsl"))
html = Nokogiri(File.read("source.html"))
File.open("output.html", "w") { |f| f << xsl.apply_to(html).to_s }
That’s it, simple enough. Got the idea from this dzone snippet.
For the xslt file I used this nice one I found on this site: http://www.printk.net/~bds/indent.html
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="' '"/>
  <xsl:template name="newline">
    <xsl:text disable-output-escaping="yes">
</xsl:text>
  </xsl:template>
  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:copy />
  </xsl:template>
  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>
  <xsl:template match="text()[normalize-space(.)='']"/>
  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:choose>
      <xsl:when test="count(child::*) > 0">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="*|text()">
            <xsl:with-param name="indent" select="concat ($indent, $indent-increment)"/>
          </xsl:apply-templates>
          <xsl:call-template name="newline"/>
          <xsl:value-of select="$indent"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
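To make the stylesheet's recursion easier to follow, here is a rough Ruby sketch of the same idea, written against REXML so that it needs no extra gems. It is not a replacement for the XSLT: indent_node and INDENT_INCREMENT are hypothetical names, the two-space increment is an assumption, and comments and processing instructions are simply skipped.

```ruby
require "rexml/document"

# Hypothetical constant for this sketch; the two-space value is an assumption.
INDENT_INCREMENT = "  "

# Returns the indented form of +node+. Every emitted piece starts with a
# newline followed by the current indent, like the "newline" template calls.
def indent_node(node, indent = "")
  case node
  when REXML::Element
    if node.elements.empty?
      # No child elements: copy the element as is (xsl:copy-of select=".").
      "\n#{indent}#{node}"
    else
      # Rebuild the opening tag with its attributes (xsl:copy + copy-of @*).
      attrs = +""
      node.attributes.each_attribute { |a| attrs << " " << a.to_string }
      # Recurse into the children with a deeper indent.
      inner = node.children.map { |c| indent_node(c, indent + INDENT_INCREMENT) }.join
      name = node.expanded_name
      "\n#{indent}<#{name}#{attrs}>#{inner}\n#{indent}</#{name}>"
    end
  when REXML::Text
    # normalize-space(.): trim and collapse runs of whitespace.
    text = node.to_s.strip.gsub(/\s+/, " ")
    # Whitespace-only text nodes are dropped, as in the empty text() template.
    text.empty? ? "" : "\n#{indent}#{text}"
  else
    # Comments and processing instructions are left out of this sketch.
    ""
  end
end

doc = REXML::Document.new("<ul><li>one</li>  <li>two</li></ul>")
puts indent_node(doc.root).lstrip
```

Elements without child elements are copied untouched, which is why short text-only tags such as the li items stay on one line.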
Hey, thanks for the tip. In your example, you have Nokogiri::XSLT("pretty_print.xsl"); you need to pass the contents of the file, not the filename, so it should be Nokogiri::XSLT(File.read("pretty_print.xsl"))
Hey, thanks for the comment. Fixed!
Hey man this is great, saved me a lot of time!
One (two) issues though, maybe you know a fix…
1) If the node is empty, it converts it into a self closing node. This is a problem for HTML:
so
becomes
2) If the node has text, the closing tag is not indented:
so
Some Text
becomes:
\tSome Text
(Hope the formatting worked out). Notice the first label tag has a "\t" before it, but the text and following closing tag aren’t properly indented.
Any quick fixes for this?
Let me try to print the code again:
becomes
and
becomes
One last time 🙂
(textarea)(/textarea)
becomes
(textarea/)
and
(label)some text(/label)
becomes
\t(label)some text
(/label)