Old Emmanuel Oga's Weblog (new one is at www.emmanueloga.com)

Pretty printing xhtml with nokogiri and xslt

Posted in ruby, xhtml, xml by emmanueloga on September 29, 2009

[UPDATE]

Check this gist for a command-line version of the XML indenter in this post.

Today I was looking for a way to pretty print XHTML. Good ol' REXML supports this in a very simple way:

    require "rexml/document"
    doc = REXML::Document.new("<some>XML</some>")
    doc.write($stdout, 4)  # second argument: number of spaces per indent level

This generates a nicely indented XML document, but REXML was not robust enough for my needs. Luckily, we now have several excellent choices in Ruby for parsing XML, including hpricot, nokogiri and the libxml-ruby bindings.
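For reference, here is a minimal sketch of the REXML route that collects the indented output in a string instead of writing it to $stdout. The helper name `rexml_pretty` and its default indent are made up for illustration; only `REXML::Document.new` and `Document#write` come from REXML itself.

```ruby
require "rexml/document"

# Hypothetical helper (not part of REXML): returns `xml` re-indented by REXML.
def rexml_pretty(xml, indent = 4)
  doc = REXML::Document.new(xml)
  out = +""               # unfrozen String; REXML appends to it with <<
  doc.write(out, indent)  # a non-negative indent selects REXML's pretty formatter
  out
end

puts rexml_pretty("<some><nested>XML</nested></some>")
```

Keep in mind that the pretty formatter may move text nodes onto their own lines, which matters if whitespace is significant in your document.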

With these libraries I did not find a way to pretty print XHTML as easily as with REXML, though. But I did find a way of doing it with XSLT. Nokogiri supports applying an XSLT stylesheet to an XML document (the libxml bindings probably do too; hpricot does not). Here is how:

    require "nokogiri"

    xsl  = Nokogiri::XSLT(File.read("pretty_print.xsl"))
    html = Nokogiri(File.read("source.html"))
    File.open("output.html", "w") { |f| f << xsl.apply_to(html).to_s }

That’s it, simple enough. Got the idea from this dzone snippet.

For the xslt file I used this nice one I found on this site: http://www.printk.net/~bds/indent.html

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="'   '"/>

  <xsl:template name="newline">
    <xsl:text disable-output-escaping="yes">
</xsl:text>
  </xsl:template>

  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:copy />
  </xsl:template>

  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

  <xsl:template match="text()[normalize-space(.)='']"/>

  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:choose>
      <xsl:when test="count(child::*) > 0">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="*|text()">
            <xsl:with-param name="indent" select="concat($indent, $indent-increment)"/>
          </xsl:apply-templates>
          <xsl:call-template name="newline"/>
          <xsl:value-of select="$indent"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

5 responses


  1. Aaron Blohowiak said, on December 10, 2009 at 4:00 pm

    Hey, thanks for the tip. In your example, you have Nokogiri::XSLT("pretty_print.xsl"); you need to pass the contents of the file, not the filename, so it should be Nokogiri::XSLT(File.read("pretty_print.xsl"))

  2. lance pollard said, on April 12, 2010 at 5:02 am

    Hey man this is great, saved me a lot of time!

    One (two) issues though, maybe you know a fix…

    1) If the node is empty, it converts it into a self closing node. This is a problem for HTML:

    so
    becomes

    2) If the node has text, the closing tag is not indented:

    so Some Text
    becomes:

    \tSome Text

    (Hope the formatting worked out). Notice the first label tag has a "\t" before it, but the text and following closing tag aren't properly indented.

    Any quick fixes for this?

  3. lance pollard said, on April 12, 2010 at 5:04 am

    Let me try to print the code again:

    
    

    becomes

    
    

    and

    some text
    

    becomes

    \tsome text
    
    
  4. lance pollard said, on April 12, 2010 at 5:05 am

    One last time 🙂

    (textarea)(/textarea)
    becomes
    (textarea/)

    and

    (label)some text(/label)
    becomes
    \t(label)some text
    (/label)

