Generating CSV from XML

I was helping a friend out recently who wanted to import some XML data he got into a more useful format [ ed. WHAT? err, useful to him, ‘kay?].  It seems like there are a few services out there that will give you data in some kind of home-grown XML format in a record-oriented structure, e.g.

<contacts>
    <contact>
        <id>...</id>
        <name>....</name>
        <email>...</email
    </contact>
    ... <!-- more contact elements -->
</contacts>

When you have data like this, what you’ve got is essentially a degenerate spreadsheet, easily represented as CSV.  But if the service doesn’t provide CSV export, you can get it fairly easily via XSLT.  The idea is, you want to output one row (the header) with the names of the elements in each record, and then output each row thereafter.  What matters, as far as the input, is that it has the structure mentioned above: the document consists of a root element with a number of child elements, each one of which represents a record in the data. Note that the following restriction applies: each record element must contain the same number of child elements in the same order. In order to make it a little more robust, I added some logic to quote non-numeric values, which should provide a reasonable amount of protection from values that contain commas.   For extra fun (and this was my friend’s idea, and I was too lazy to follow through the steps) you could register this XSLT as a filter in OpenOffice.org so you can (nearly) automatically import these files into oocalc. It’s not entirely elegant (the logic for outputting the header row is duplicated with the logic for outputting a normal row), but it gets the job done. So here it is, I place it in the public domain.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text"
    encoding="iso-8859-1"/>
    
    <xsl:template match="/">
        <xsl:variable name="records" select="*/*"/>
        <xsl:call-template name="header-row">
            <xsl:with-param name="header" select="$records[1]"/>
        </xsl:call-template>
        <xsl:for-each select="*/*">
            <xsl:call-template name="output-row"/>
        </xsl:for-each>
    </xsl:template>
    
    <xsl:template name="output-row">
        <xsl:for-each select="child::*">
            <xsl:variable name="numeric" select="not(string(number(.)) = 'NaN')"/>
            <xsl:choose>
                <xsl:when test="$numeric">
                    <xsl:value-of select="normalize-space(.)"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:text>"</xsl:text>
                    <xsl:value-of select="normalize-space(.)"/>
                    <xsl:text>"</xsl:text>
                </xsl:otherwise>
            </xsl:choose>
            
        <xsl:choose>
            <xsl:when test="position() = last()">
                <xsl:text>&#13;&#10;</xsl:text>
            </xsl:when>
            <xsl:otherwise>
            <xsl:text>,</xsl:text>
            </xsl:otherwise>
        </xsl:choose>
        </xsl:for-each>
    </xsl:template>
    
    <xsl:template name="header-row">
        <xsl:param name="header"/>
        <xsl:for-each select="$header/*">
            <xsl:call-template name="quotevalue"/>
        </xsl:for-each>
    </xsl:template>
    
    <xsl:template name="quotevalue">
        <xsl:text>"</xsl:text>
        <xsl:value-of select="normalize-space(name(.))"/>
        <xsl:text>"</xsl:text>
        <xsl:choose>
            <xsl:when test="position() != last()">
                <xsl:text>,</xsl:text>
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>&#13;&#10;</xsl:text>
            </xsl:otherwise>
        </xsl:choose>        
    </xsl:template>    
</xsl:stylesheet>