Old Emmanuel Oga's Weblog (new one is at www.emmanueloga.com)

Pretty printing xhtml with nokogiri and xslt

Posted in ruby, xhtml, xml by emmanueloga on September 29, 2009


Check this gist for a command-line version of the XML indenter in this post.

Today I was looking for a way to pretty print XHTML. Good ol' REXML supports this in a very simple way:

require "rexml/document"
doc = REXML::Document.new("<some>XML</some>")
doc.write($stdout, 4)  # second argument is the number of spaces per indent level

This generates a nicely indented XML document. But REXML was not robust enough for my needs. Luckily, we now have a couple of excellent choices in Ruby for parsing XML, including hpricot, nokogiri and the libxml-ruby bindings.
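For reference, here is a slightly fuller sketch of the REXML route that returns the indented markup as a string instead of writing to $stdout. The helper name `rexml_pretty` and the sample input are made up for illustration; `REXML::Formatters::Pretty` is the formatter REXML ships for indented output.

```ruby
require "rexml/document"
require "rexml/formatters/pretty"

# Hypothetical helper (not from the original post): indent an XML string
# with REXML and return the result as a String.
def rexml_pretty(xml, indent = 4)
  doc = REXML::Document.new(xml)
  formatter = REXML::Formatters::Pretty.new(indent)
  formatter.compact = true # keep short text-only elements on a single line
  out = String.new
  formatter.write(doc.root, out)
  out
end

puts rexml_pretty("<a><b>text</b><c/></a>")
```

Setting `compact` is a judgment call: without it the formatter puts element text on its own line, which changes whitespace that may matter in XHTML.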

With these libraries I did not find a way to pretty print XHTML as easily as with REXML, though. But I did find a way of doing it using XSLT. Nokogiri supports applying XSLT to an XML document (the libxml bindings probably do too; hpricot does not). Here is how:

    require "nokogiri"

    xsl  = Nokogiri::XSLT(File.read("pretty_print.xsl"))  # pass the stylesheet's contents, not its path
    html = Nokogiri(File.read("source.html"))
    File.open("output.html", "w") { |f| f << xsl.apply_to(html).to_s }

That’s it, simple enough. Got the idea from this dzone snippet.

For the xslt file I used this nice one I found on this site: http://www.printk.net/~bds/indent.html

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="'   '"/>

  <xsl:template name="newline">
    <xsl:text disable-output-escaping="yes">
</xsl:text>
  </xsl:template>

  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:copy />
  </xsl:template>

  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

  <xsl:template match="text()[normalize-space(.)='']"/>

  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:choose>
      <xsl:when test="count(child::*) > 0">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="*|text()">
            <xsl:with-param name="indent" select="concat ($indent, $indent-increment)"/>
          </xsl:apply-templates>
          <xsl:call-template name="newline"/>
          <xsl:value-of select="$indent"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

5 comments


  1. Aaron Blohowiak said, on December 10, 2009 at 4:00 pm

    Hey, thanks for the tip. In your example, you have Nokogiri::XSLT("pretty_print.xsl"). You need to pass the contents of the file, not the filename, so it should be Nokogiri::XSLT(File.read("pretty_print.xsl"))

  2. lance pollard said, on April 12, 2010 at 5:02 am

    Hey man this is great, saved me a lot of time!

    One (two) issues though, maybe you know a fix…

    1) If the node is empty, it converts it into a self closing node. This is a problem for HTML:


    2) If the node has text, the closing tag is not indented:

    so Some Text

    \tSome Text

    (Hope the formatting worked out). Notice the first label tag has a “\t” before it, but the text and following closing tag aren’t properly indented.

    Any quick fixes for this?

  3. lance pollard said, on April 12, 2010 at 5:04 am

    Let me try to print the code again:





    some text


    \tsome text

  4. lance pollard said, on April 12, 2010 at 5:05 am

    One last time 🙂



    (label)some text(/label)
    \t(label)some text

