Old Emmanuel Oga's Weblog (new one is at www.emmanueloga.com)

Pretty printing xhtml with nokogiri and xslt

Posted in ruby, xhtml, xml by emmanueloga on septiembre 29, 2009

[UPDATE]

Check this gist for a command line version of  xml indenter in this post.

Today I was looking for a way to pretty print xhtml. Good’ol REXML supports this in a very simple way:

Document.new("<some>XML</some>")doc.write($stdout, indent_spaces = 4)

This generates a nicely indented xml document. But REXML was not robust enough for my needs. Luckily, we now have a couple of excellent choices on ruby for parsing xml, including hpricot, nokogiri and libxml-ruby bindings.

I did not find a way to pretty print xhtml as easy as you can do with REXML with any of these libraries, though. But I did find a way of doing it using XSLT. Nokogiri supports applying XSLT to an XML document (probably libxml bindings do too, hpricot does not). Here is how:

    xsl = Nokogiri::XSLT(File.read("pretty_print.xsl"))
   html = Nokogiri(File.read("source.html"))
   File.open("output.html", "w") { |f| f << xsl.apply_to(html).to_s }

That’s it, simple enough. Got the idea from this dzone snippet.

For the xslt file I used this nice one I found on this site: http://www.printk.net/~bds/indent.html

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="'   '"/>

  <xsl:template name="newline">
    <xsl:text disable-output-escaping="yes">
</xsl:text>
  </xsl:template>

  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:copy />
  </xsl:template>

  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

  <xsl:template match="text()[normalize-space(.)='']"/>

  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
      <xsl:choose>
       <xsl:when test="count(child::*) > 0">
        <xsl:copy>
         <xsl:copy-of select="@*"/>
         <xsl:apply-templates select="*|text()">
           <xsl:with-param name="indent" select="concat ($indent, $indent-increment)"/>
         </xsl:apply-templates>
         <xsl:call-template name="newline"/>
         <xsl:value-of select="$indent"/>
        </xsl:copy>
       </xsl:when>
       <xsl:otherwise>
        <xsl:copy-of select="."/>
       </xsl:otherwise>
     </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Tagged with: , , ,