Old Emmanuel Oga's Weblog (new one is at www.emmanueloga.com)

Pretty printing xhtml with nokogiri and xslt

Posted in ruby, xhtml, xml by emmanueloga on septiembre 29, 2009


Check this gist for a command line version of  xml indenter in this post.

Today I was looking for a way to pretty print xhtml. Good’ol REXML supports this in a very simple way:

Document.new("<some>XML</some>")doc.write($stdout, indent_spaces = 4)

This generates a nicely indented xml document. But REXML was not robust enough for my needs. Luckily, we now have a couple of excellent choices on ruby for parsing xml, including hpricot, nokogiri and libxml-ruby bindings.

I did not find a way to pretty print xhtml as easy as you can do with REXML with any of these libraries, though. But I did find a way of doing it using XSLT. Nokogiri supports applying XSLT to an XML document (probably libxml bindings do too, hpricot does not). Here is how:

    xsl = Nokogiri::XSLT(File.read("pretty_print.xsl"))
   html = Nokogiri(File.read("source.html"))
   File.open("output.html", "w") { |f| f << xsl.apply_to(html).to_s }

That’s it, simple enough. Got the idea from this dzone snippet.

For the xslt file I used this nice one I found on this site: http://www.printk.net/~bds/indent.html

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="'   '"/>

  <xsl:template name="newline">
    <xsl:text disable-output-escaping="yes">

  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:copy />

  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>

  <xsl:template match="text()[normalize-space(.)='']"/>

  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
       <xsl:when test="count(child::*) > 0">
         <xsl:copy-of select="@*"/>
         <xsl:apply-templates select="*|text()">
           <xsl:with-param name="indent" select="concat ($indent, $indent-increment)"/>
         <xsl:call-template name="newline"/>
         <xsl:value-of select="$indent"/>
        <xsl:copy-of select="."/>
Tagged with: , , ,