<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Clowns In My Coffee &#187; nerdination</title>
	<atom:link href="http://clownsinmycoffee.net/category/nerdination/feed/" rel="self" type="application/rss+xml" />
	<link>http://clownsinmycoffee.net</link>
	<description>Inanity of the most cogent sort you can find.</description>
	<lastBuildDate>Tue, 11 May 2010 23:16:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>w3c.recommend(xproc)</title>
		<link>http://clownsinmycoffee.net/2010/05/11/83/</link>
		<comments>http://clownsinmycoffee.net/2010/05/11/83/#comments</comments>
		<pubDate>Tue, 11 May 2010 23:13:33 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[nerdination]]></category>
		<category><![CDATA[nerdination XML]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=83</guid>
		<description><![CDATA[As an unabashed fan of the angle brackety type things, I&#8217;m chuffed to learn, via Norman Walsh,  that XProc is now a W3C recommendation. Congratulations to all the people who put in all the work to get it there.  Take &#8230; <a href="http://clownsinmycoffee.net/2010/05/11/83/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As an unabashed fan of the angle brackety type things, I&#8217;m chuffed to learn, <a title="Norm Walsh on XProc becoming a Recommendation" href="http://norman.walsh.name/2010/05/11/xproc">via Norman Walsh</a>,  that <a href="http://www.w3.org/TR/2010/REC-xproc-20100511/">XProc is now a W3C recommendation</a>.  Congratulations to all the people who put in all the work to get it there.  Take a look at it if you need to run your XML documents through a bunch of steps and produce a bunch of results (and do other things along the way).</p>
<p>I&#8217;ve used XProc in a limited way to run a sort of enhanced XSLT process, and it was slow to get started, but once I wrapped my head around the central concepts, the rest went like butter.  Given that the specification provides for making HTTP requests, I&#8217;d think it could serve as an especially useful component in a RESTful document publishing architecture.  But then, I would say that.</p>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2010/05/11/83/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On Nostalgia</title>
		<link>http://clownsinmycoffee.net/2010/02/06/on-nostalgia/</link>
		<comments>http://clownsinmycoffee.net/2010/02/06/on-nostalgia/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 03:11:22 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[nerdination]]></category>
		<category><![CDATA[vendorprisey]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=78</guid>
		<description><![CDATA[There&#8217;s a whiff of wistfulness out there on the &#8216;tubes for the passing of Sun Microsystems, and I&#8217;ve got to admit I&#8217;ve participated a bit in that; the absorption by Oracle of Sun&#8217;s assets certainly marks some kind of transition &#8230; <a href="http://clownsinmycoffee.net/2010/02/06/on-nostalgia/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a whiff of wistfulness out there on the &#8216;tubes for the passing of Sun Microsystems, and I&#8217;ve got to admit I&#8217;ve participated a bit in that; the absorption by Oracle of Sun&#8217;s assets certainly marks some kind of transition in the industry that helps me pay my bills, call it &#8216;maturity&#8217; or &#8216;loss of innocence&#8217;, or &#8220;oh no, we&#8217;re all doomed!&#8221;</p>
<p>On the other hand, I was clearing out some gunk in my attic this evening, and I came across a pretty hefty printout that details how to write a very simple custom component for Java Server Faces 1.0; it clocks in at around 15 pages or so.  And then, you know, maybe I&#8217;m not so surprised at what happened to Sun.</p>
<p>And then it hits me that what&#8217;s swallowing Sun is Oracle.  And then I&#8217;m surprised again.</p>
<p>File under &#8220;Vendorprisey&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2010/02/06/on-nostalgia/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Third G Drops</title>
		<link>http://clownsinmycoffee.net/2009/09/24/the-third-g-drops/</link>
		<comments>http://clownsinmycoffee.net/2009/09/24/the-third-g-drops/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 00:35:06 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[carrboro]]></category>
		<category><![CDATA[nerdination]]></category>
		<category><![CDATA[cellphones 3g carrboro]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/2009/09/24/the-third-g-drops/</guid>
		<description><![CDATA[I&#8217;ve been thinking of it as effectively a rumour up until now, but today, my Android phone started getting a 3G signal in Chapel Hill and Carrboro (that&#8217;s T-Mobile, in case you didn&#8217;t know). So, now I go from having &#8230; <a href="http://clownsinmycoffee.net/2009/09/24/the-third-g-drops/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking of it as effectively a rumour up until now, but today, my Android phone started getting a 3G signal in Chapel Hill and Carrboro (that&#8217;s T-Mobile, in case you didn&#8217;t know).  So, now I go from having been a double early-adopter sucker (64Kbits/s and the first generation phone) to a reasonably fast-browsin&#8217; (750-880Kbits/s) early-adopter sucker.</p>
<p>It&#8217;s progress, I guess.  I understand that it&#8217;s possible to write applications for these things.   If only I knew how to use a computer.</p>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2009/09/24/the-third-g-drops/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>In Which I Become a Food Blogger</title>
		<link>http://clownsinmycoffee.net/2009/09/24/in-which-i-become-a-food-blogger/</link>
		<comments>http://clownsinmycoffee.net/2009/09/24/in-which-i-become-a-food-blogger/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 00:28:51 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[nerdination]]></category>
		<category><![CDATA[gastronomonomy molecular]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=73</guid>
		<description><![CDATA[Some time ago, a friend pointed me to this magical stuff that turns fats into powders. T&#8217;other day, I finally got my hands on some of this tapioca maltodextrin, as it&#8217;s called; it&#8217;s a starch, and there&#8217;s really not much &#8230; <a href="http://clownsinmycoffee.net/2009/09/24/in-which-i-become-a-food-blogger/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Some time ago, a friend pointed me to this magical stuff that turns fats into powders.  T&#8217;other day, I finally got my hands on some of this <em>tapioca maltodextrin</em>, as it&#8217;s called; it&#8217;s a starch, and there&#8217;s really not much more to turning a really fatty thing into a powder than mixing the two things together.</p>
<p>The starch is close enough to flavourless, but any statements you may have encountered to the effect that you put the powdered (olive oil/peanut butter/hazelnut-and-chocolate-spread) into your mouth and voilà! it&#8217;s the original stuff again! are not really operative.  There&#8217;s a noticeable effect on the texture, and you&#8217;ve got a bunch of starch that wasn&#8217;t there before.</p>
<p>Still, it&#8217;s amazing to work with and it&#8217;s truly a surprise for your taste buds.</p>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2009/09/24/in-which-i-become-a-food-blogger/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Generating CSV from XML</title>
		<link>http://clownsinmycoffee.net/2009/05/14/generating-csv-from-xml/</link>
		<comments>http://clownsinmycoffee.net/2009/05/14/generating-csv-from-xml/#comments</comments>
		<pubDate>Thu, 14 May 2009 14:01:57 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[nerdination]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[records]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=59</guid>
		<description><![CDATA[I was helping a friend out recently who wanted to import some XML data he got into a more useful format [ ed. WHAT? err, useful to him, 'kay?].  It seems like there are a few services out there that &#8230; <a href="http://clownsinmycoffee.net/2009/05/14/generating-csv-from-xml/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I was helping a friend out recently who wanted to import some XML data he got into a more useful format [ ed. <span style="color: red;">WHAT?</span> err, useful <em>to him</em>, 'kay?].  It seems like there are a few services out there that will give you data in some kind of home-grown XML format in a record-oriented structure, e.g.</p>
<pre style="border: 1px solid rgb(0, 0, 0); padding: 1em; background-color: rgb(204, 204, 204);">
&lt;contacts&gt;
    &lt;contact&gt;
        &lt;id&gt;...&lt;/id&gt;
        &lt;name&gt;....&lt;/name&gt;
        &lt;email&gt;...&lt;/email&gt;
    &lt;/contact&gt;
    ... &lt;!-- more contact elements --&gt;
&lt;/contacts&gt;
</pre>
<p>
When you have data like this, what you&#8217;ve got is essentially a degenerate spreadsheet, easily represented as CSV.  But if the service doesn&#8217;t provide CSV export, you can get it fairly easily via XSLT.  The idea is to output one header row with the names of the elements in the first record, and then one row for each record.  What matters about the input is that it has the structure shown above: the document consists of a root element with a number of child elements, each one of which represents a record in the data.  Note that the following restriction applies: each record element must contain the same number of child elements in the same order.  To make it a little more robust, I added some logic to quote non-numeric values, which should provide a reasonable amount of protection from values that contain commas (values that themselves contain double quotes are not escaped, so watch out for those).   For extra fun (and this was my friend&#8217;s idea, and I was too lazy to follow through the steps) you could register this XSLT as a filter in OpenOffice.org so you can (nearly) automatically import these files into oocalc.   It&#8217;s not entirely elegant (the logic for outputting the header row duplicates the logic for outputting a normal row), but it gets the job done.  So here it is; I place it in the public domain.
</p>
<pre style="border: 1px solid rgb(0, 0, 0); padding: 1em; background-color: rgb(204, 204, 204);  overflow: scroll;">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"&gt;
    &lt;xsl:output method="text"
    encoding="iso-8859-1"/&gt;

    &lt;xsl:template match="/"&gt;
        &lt;xsl:variable name="records" select="*/*"/&gt;
        &lt;xsl:call-template name="header-row"&gt;
            &lt;xsl:with-param name="header" select="$records[1]"/&gt;
        &lt;/xsl:call-template&gt;
        &lt;xsl:for-each select="*/*"&gt;
            &lt;xsl:call-template name="output-row"/&gt;
        &lt;/xsl:for-each&gt;
    &lt;/xsl:template&gt;

    &lt;xsl:template name="output-row"&gt;
        &lt;xsl:for-each select="child::*"&gt;
            &lt;xsl:variable name="numeric" select="not(string(number(.)) = 'NaN')"/&gt;
            &lt;xsl:choose&gt;
                &lt;xsl:when test="$numeric"&gt;
                    &lt;xsl:value-of select="normalize-space(.)"/&gt;
                &lt;/xsl:when&gt;
                &lt;xsl:otherwise&gt;
                    &lt;xsl:text&gt;"&lt;/xsl:text&gt;
                    &lt;xsl:value-of select="normalize-space(.)"/&gt;
                    &lt;xsl:text&gt;"&lt;/xsl:text&gt;
                &lt;/xsl:otherwise&gt;
            &lt;/xsl:choose&gt;

        &lt;xsl:choose&gt;
            &lt;xsl:when test="position() = last()"&gt;
                &lt;xsl:text&gt;&amp;#13;&amp;#10;&lt;/xsl:text&gt;
            &lt;/xsl:when&gt;
            &lt;xsl:otherwise&gt;
            &lt;xsl:text&gt;,&lt;/xsl:text&gt;
            &lt;/xsl:otherwise&gt;
        &lt;/xsl:choose&gt;
        &lt;/xsl:for-each&gt;
    &lt;/xsl:template&gt;

    &lt;xsl:template name="header-row"&gt;
        &lt;xsl:param name="header"/&gt;
        &lt;xsl:for-each select="$header/*"&gt;
            &lt;xsl:call-template name="quotevalue"/&gt;
        &lt;/xsl:for-each&gt;
    &lt;/xsl:template&gt;

    &lt;xsl:template name="quotevalue"&gt;
        &lt;xsl:text&gt;"&lt;/xsl:text&gt;
        &lt;xsl:value-of select="normalize-space(name(.))"/&gt;
        &lt;xsl:text&gt;"&lt;/xsl:text&gt;
        &lt;xsl:choose&gt;
            &lt;xsl:when test="position() != last()"&gt;
                &lt;xsl:text&gt;,&lt;/xsl:text&gt;
            &lt;/xsl:when&gt;
            &lt;xsl:otherwise&gt;
                &lt;xsl:text&gt;&amp;#13;&amp;#10;&lt;/xsl:text&gt;
            &lt;/xsl:otherwise&gt;
        &lt;/xsl:choose&gt;
    &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre>
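<p>
As a quick sanity check, here is what the stylesheet should produce for a small input.  The contact names and addresses are made up for illustration; any document with the same root/record/field shape should behave the same way.  Given:
</p>
<pre style="border: 1px solid rgb(0, 0, 0); padding: 1em; background-color: rgb(204, 204, 204);">
&lt;contacts&gt;
    &lt;contact&gt;
        &lt;id&gt;1&lt;/id&gt;
        &lt;name&gt;Ada Lovelace&lt;/name&gt;
        &lt;email&gt;ada@example.org&lt;/email&gt;
    &lt;/contact&gt;
    &lt;contact&gt;
        &lt;id&gt;2&lt;/id&gt;
        &lt;name&gt;Grace Hopper&lt;/name&gt;
        &lt;email&gt;grace@example.org&lt;/email&gt;
    &lt;/contact&gt;
&lt;/contacts&gt;
</pre>
<p>
the output should be the following, with each row terminated by a carriage return and line feed.  The numeric <code>id</code> values come through bare, and everything else is quoted:
</p>
<pre style="border: 1px solid rgb(0, 0, 0); padding: 1em; background-color: rgb(204, 204, 204);">
"id","name","email"
1,"Ada Lovelace","ada@example.org"
2,"Grace Hopper","grace@example.org"
</pre>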
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2009/05/14/generating-csv-from-xml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Few Minutes With Apache Sling</title>
		<link>http://clownsinmycoffee.net/2008/11/26/a-few-minutes-with-apache-sling/</link>
		<comments>http://clownsinmycoffee.net/2008/11/26/a-few-minutes-with-apache-sling/#comments</comments>
		<pubDate>Wed, 26 Nov 2008 13:02:06 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[nerdination]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[OSGi]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[sling]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=54</guid>
		<description><![CDATA[Apache Sling is almost painfully hip, in a way only a dedicated nerd could appreciate (or, ok, believe) &#8212; it provides a RESTful frontend to a Java Content Repository, and the whole thing is based on OSGi. Roughly, it gives &#8230; <a href="http://clownsinmycoffee.net/2008/11/26/a-few-minutes-with-apache-sling/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>
<a href="http://incubator.apache.org/sling">Apache Sling</a> is almost painfully hip, in a way only a dedicated nerd could appreciate (or, ok, <em>believe</em>) &#8212; it provides a RESTful frontend to a Java Content Repository, and the whole thing is based on OSGi.  Roughly, it gives you a content repository with customizable processing and presentation for different types of content, and the only &#8216;driver&#8217; you need is a library that truly understands HTTP.
</p>
<p>
As part of evaluating it for the day job, I put together an s5 presentation with that other reST, and the result is <a href="http://www.unc.edu/home/adamc/sling-overview.html">Apache Sling Overview</a>.  I also dug into the codebase to figure out a bit more about Sling&#8217;s <a href="http://www.unc.edu/home/adamc/post-servlet.html">default POST processing servlet</a>.  I do hope I didn&#8217;t say too many materially false things.</p>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2008/11/26/a-few-minutes-with-apache-sling/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Straight Outta Victoria</title>
		<link>http://clownsinmycoffee.net/2008/10/02/straight-outta-victoria/</link>
		<comments>http://clownsinmycoffee.net/2008/10/02/straight-outta-victoria/#comments</comments>
		<pubDate>Thu, 02 Oct 2008 01:55:51 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[nerdination]]></category>
		<category><![CDATA[whodathunk]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=49</guid>
		<description><![CDATA[UVic&#8217;s Electronic Textual Cultures Lab encodes a song by some music guy in the Text Encoding Initiative XML format. There is, of course, a video. What I want to know is, does this mean XML is cool or hopelessly passé?  &#8230; <a href="http://clownsinmycoffee.net/2008/10/02/straight-outta-victoria/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>UVic&#8217;s <a href="http://etcl.uvic.ca/">Electronic Textual Cultures Lab</a> encodes a <a title="&amp;quot;Subterranean Homesick Blues&amp;quot; in TEI" href="http://etcl.uvic.ca/tei/?p=14">song by some music guy</a> in the <a href="http://www.tei-c.org/index.xml">Text Encoding Initiative</a> XML format. There is, of course, a <a title="Youtube video: TEI + Dylan" href="http://www.youtube.com/watch?v=4sHYDfITjHY">video</a>.</p>
<p>What I want to know is, does this mean XML is cool or hopelessly passé?  Discuss.</p>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2008/10/02/straight-outta-victoria/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Goings On About Town</title>
		<link>http://clownsinmycoffee.net/2008/10/02/goings-on-about-town/</link>
		<comments>http://clownsinmycoffee.net/2008/10/02/goings-on-about-town/#comments</comments>
		<pubDate>Thu, 02 Oct 2008 01:46:55 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[nerdination]]></category>
		<category><![CDATA[mobile]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=47</guid>
		<description><![CDATA[So, there&#8217;s this small office in downtown Chapel Hill that used to have a paper &#8220;Google&#8221; banner in the window.  Today, on a trip past the Cosmic Cantina, I noticed that the window now has a more permanent logo for &#8230; <a href="http://clownsinmycoffee.net/2008/10/02/goings-on-about-town/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>So, there&#8217;s this small office in downtown Chapel Hill that used to have a paper &#8220;Google&#8221; banner in the window.  Today, on a trip past the Cosmic Cantina, I noticed that the window now has a more permanent logo for <a href="http://code.google.com/android/">Android</a>. I&#8217;ll admit that the basic <a title="Open Handset Alliance" href="http://www.openhandsetalliance.com/">idea behind Android</a><a title="Android overview" href="http://www.openhandsetalliance.com/android_overview.html"> </a>&#8211; a generally open cellphone platform mostly developed by the &#8216;net&#8217;s largest advertising distribution network &#8212; is really appealing, given how disappointing the current situation in the US is (e.g. I can take a picture on my current phone, but I can&#8217;t transfer it off the phone without emailing it, which means I&#8217;d have to pay a fee; and, equally important is the fact that the applications installed on the phone &#8230; well, they <em>suck</em>).  If it takes off, it could open up a range of possibilities for &#8220;mobile computing,&#8221; and I seriously hope it pulls the rest of the industry along with it.  We have these pocket communicators and the dominant business model for them is oriented around <em>ringtones</em>, fer gosh sakes.  Hm, on second thought, I don&#8217;t see how a modular, extensible platform&#8217;s going to change that, but maybe it will let me ignore it somewhat, which is good enough.</p>
<p>But why&#8217;s there an &#8220;Android&#8221; office in downtown Chapel Hill?  One <a title="iphone developer university program" href="http://developer.apple.com/iphone/program/university.html">related development</a> suggests itself &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2008/10/02/goings-on-about-town/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fun With Copyright Renewal Records</title>
		<link>http://clownsinmycoffee.net/2008/07/01/fun-with-copyright-renewal-records/</link>
		<comments>http://clownsinmycoffee.net/2008/07/01/fun-with-copyright-renewal-records/#comments</comments>
		<pubDate>Tue, 01 Jul 2008 03:32:08 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[RDF]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[nerdination]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=46</guid>
		<description><![CDATA[Based on an enormous amount of work by contributors to Project Gutenberg and the Distributed Proofreaders, combined with healthy sourcing of the US copyright office&#8217;s records, Google has compiled a list of works originally copyrighted between 1923 and 1963 &#8230; <a href="http://clownsinmycoffee.net/2008/07/01/fun-with-copyright-renewal-records/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Based on an enormous amount of work by contributors to <a href="http://www.gutenberg.org/wiki/Main_Page">Project Gutenberg</a> and the <a href="http://www.pgdp.net/c/">Distributed Proofreaders</a>, combined with healthy sourcing of the US <a href="http://www.copyright.gov/records/">copyright office&#8217;s records</a>, Google has compiled <a href="http://booksearch.blogspot.com/2008/06/us-copyright-renewal-records-available.html">a list of works originally copyrighted between 1923 and 1963</a> which have been renewed at some point, the upshot being that if a given work published in that time span is <em>not</em> on the list, it&#8217;s likely in the public domain.
</p>
<p>
One problem with the list is that the database is a 370+ megabyte XML file, which is hard to load up in an XML-aware editor and even caused <a title="eXist open source XML database" href="http://exist.sourceforge.net">eXist</a> to choke.  So I broke it up into chunks with a shortish Groovy script, for neat ingestion into an XML database.  The heart of the script is a SAX handler that basically churns each record in the XML file into a Groovy object, and a closure (there&#8217;s that word again!) that handles each record as it is constructed.  As written, the script simply breaks the big file into a bunch of files, one for each year (you will of course have to edit the paths).  By supplying a different closure, you could do all sorts of different things with the records, e.g. stuff them into a relational database.
</p>
<p>
In the spirit of the thing, the script is in the public domain &#8212; but I make no representations as to the quality, idiomaticity or overall efficiency of the script; despite being SAX-based, it still manages to chew up quite a bit of memory, so watch out.  Note that you will need <a href="http://commons.apache.org/lang">Apache Commons Lang</a> (say, version 2.4) on the classpath (e.g. in <code>$HOME/.groovy/lib</code>) for this script to work.  Developed with Groovy 1.5.6.
</p>
<p style="color: red">I&#8217;ve tried to stop wordpress from &#8216;prettyfying&#8217; the output, which appears to mangle quotes.  I hope to have that fixed soon &#8230;</p>
<pre style="border: 1px solid #000; padding: 1em; background-color: #ccc;">import org.xml.sax.helpers.DefaultHandler
import org.xml.sax.Attributes
import org.xml.sax.helpers.XMLReaderFactory
import org.xml.sax.InputSource

import org.apache.commons.lang.StringEscapeUtils
import org.xml.sax.Locator

/**
 * Represents an individual &lt;Record&gt; element
 * in the document.
 **/
class Record {
    def file

    def lines

    def recno

    def title

    def copyrightYear

    def copyrights = []

    def renewalYear

    def renewals = [] 

    // where it was published
    def published

    // rare!
    def note

    // source of the copyright info
    def source
    def snippet
    def md5sum

    // contributors, holders, and pseudonyms
    def people = []

    /**
     * Get the XML representing this element.  Note
     * that proper functioning here depends on how the
     * handler builds the elements.
     * @return a string containing this record's XML.
     */
    def xml() {
        def buf = new StringBuffer()
        buf &lt;&lt; """
&lt;Record&gt;
    &lt;Title&gt;${title}&lt;/Title&gt;
    &lt;File&gt;${file}&lt;/File&gt;
    &lt;Lines&gt;${lines}&lt;/Lines&gt;
    &lt;MD5Sum&gt;${md5sum}&lt;/MD5Sum&gt;
"""
        if (snippet) {
            buf &lt;&lt; "\t&lt;Snippet&gt;${snippet}&lt;/Snippet&gt;\n"
        }
        if (note) {
            buf &lt;&lt;"\t&lt;Note&gt;${note}&lt;/Note&gt;\n"
        }
        buf &lt;&lt;
"""
    &lt;Source&gt;${source}&lt;/Source&gt;
    &lt;CopyrightYear&gt;${copyrightYear}&lt;/CopyrightYear&gt;
    &lt;RenewalYear&gt;${renewalYear}&lt;/RenewalYear&gt;
"""
        copyrights.each() {
            buf &lt;&lt; it.xml()
        }
        renewals.each() {
            buf &lt;&lt; it.xml()
        }
        people.each() {
                buf &lt;&lt; it.xml()
        }
        buf &lt;&lt; "&lt;/Record&gt;\n"
        return buf.toString()
    }
}

/**
 * An inelegant class representing the elements that denote
 * people (copyright holders, contributors, aliases, etc.)
 **/
class Person {

    static ELEMENTS = ["Holder" :   [ "Name", "Type" ],
                        "Contrib" : [ "Name", "Role" ],
                        "Pseudonym" : [ "Pseudo", "Real" ],
                        "Neenym" : [ "Nee", "Now" ],
                        "Aka" : [ "Alias", "Real" ] ]

    static ROLES = ELEMENTS.keySet()

    def role

    def name

    def honorific

    def type

    def xml() {
        def firstElement = ELEMENTS[role][0]
        def secondElement = ELEMENTS[role][1]
        def buf = new StringBuffer()

        buf &lt;&lt; """
&lt;${role}&gt;
    &lt;${firstElement}&gt;${name}&lt;/${firstElement}&gt;
    &lt;${secondElement}&gt;$type&lt;/${secondElement}&gt;"""
        if ( honorific ) {
            buf &lt;&lt; "\t&lt;Hon&gt;${honorific}&lt;/Hon&gt;\n"
        }
        buf &lt;&lt; "&lt;/${role}&gt;\n"
        return buf.toString()
    }
}

/**
 * Represents copyright and renewal date elements.
 */
class RecordDate {

    static ELEMENTS = ["Copyright", "Renewal"]

    def role
    def date
    def id
    def xml() {
        return """&lt;${role}&gt;
    &lt;Date&gt;${date}&lt;/Date&gt;
    &lt;Id&gt;${id}&lt;/Id&gt;
&lt;/${role}&gt;"""
    }
}

/**
 * SAX handler that turns each &lt;code&gt;Record&lt;/code&gt; element
 * into a &lt;code&gt;Record&lt;/code&gt; domain object.
 **/
class RecordHandler extends DefaultHandler {

    /**
     * Stack of strings that represents the current
     * element context.
     **/
    Stack context = new Stack()

    /**
     * the current record being built.
     **/
    Record currentRec

    /**
     * the current Person element being built.
     **/
    Person currentPerson

    /**
     * The current date information being collected.
     **/
    RecordDate currentRecDate

    /**
     * A closure which will be called as each record is
     * read in.
     **/
    def recordListener

    /**
     * a buffer to collect the current text, since SAX might
     * not report all contiguous chunks of text at once.
     **/
    StringBuilder currentText = new StringBuilder()

    def locator

    @Override
    public void setDocumentLocator(Locator locator)
    {    println "Got a locator: ${locator}"
        this.locator = locator
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
    {
        context &lt;&lt; localName
        switch( localName ) {
            case "Record":
                currentRec = new Record()
                break
            case Person.ROLES:
                currentPerson = new Person()
                currentPerson.role = localName
                break
            case RecordDate.ELEMENTS:
                currentRecDate = new RecordDate()
                currentRecDate.role = localName
                break
        }
    }

    @Override
    public void characters(char [] ch, int start, int len)
    {
        currentText.append(ch,start,len)
    }

    @Override
    public void endElement(String uri, String localName, String qName)
    {
        String txt = StringEscapeUtils.escapeXml(currentText.toString().trim())
        switch(localName) {
            case Person.ROLES:
                currentRec.people &lt;&lt; currentPerson
                break
            case ["Type", "Role", "Real", "Now"]:
                currentPerson.type = txt
                break
            case ["Name", "Pseudo", "Nee", "Alias"]:
                currentPerson.name = txt
                break
            case "Hon":
                currentPerson.honorific = txt
                break
            case "CopyrightYear":
                currentRec.copyrightYear = Integer.parseInt(txt)
                break
            case "Date":
                currentRecDate.date = txt
                break
            case "Id":
                currentRecDate.id = txt
                break
            case "Copyright":
                currentRec.copyrights &lt;&lt;currentRecDate
                break
            case "RenewalYear":
                currentRec.renewalYear = Integer.parseInt(txt)
                break
            case "Renewal":
                currentRec.renewals &lt;&lt; currentRecDate
                break
            case "Recno":
                currentRec.recno = txt
                break
            case "Source":
                currentRec.source = txt
                break
            case "Lines":
                currentRec.lines = txt
                break
            case "MD5Sum":
                currentRec.md5sum = txt
                break
            case "File":
                currentRec.file = txt
                break
            case "Snippet":
                currentRec.snippet = txt
                break
            case "Title":
                currentRec.title = txt
                break
            case "Published":
                currentRec.published = txt
                break
            case "Record":
                recordListener(currentRec)
                break
            case "Note":
                currentRec.note = txt
                break
            case "CopyrightRenewalRecords":
                break
            default:
                println "Unrecognized element '${localName}' at line ${locator.lineNumber}"
                System.exit(1)
            }
        currentText.length = 0
    }

}

def file = new File("input-dir/google-renewals-20080624/google-renewals-20080624.xml")

/**
 * A listener that will output each record into a different stream depending
 * on the CopyrightYear of the record.
 **/
def listenerBase = { Map streams, Record it -&gt;
    if ( !streams.containsKey(it.copyrightYear) ) {
        def f = new File("/output/dir/copyright-${it.copyrightYear}.xml")
        println "creating ${f.absolutePath}"
        def stream = f.newWriter()
        streams[it.copyrightYear] = stream
        stream.append("&lt;CopyrightRenewalRecords&gt;")
    }
    Writer s = (Writer)streams[it.copyrightYear]
    s.append(it.xml())
    s.flush()
}

def reader = XMLReaderFactory.createXMLReader()
def handler = new RecordHandler()
def outputStreams = [:]
handler.recordListener = listenerBase.curry(outputStreams)
reader.setContentHandler( handler )

try {
    reader.parse( new InputSource( file.newInputStream() ) )
} catch (Exception x) {
    x.printStackTrace()
    println "Error at line ${handler.locator.lineNumber}"
}

// The map values are the Writers created by f.newWriter() in listenerBase.
outputStreams.each() {
    k, Writer v -&gt;
        println "Closing ${k}"
        v.append("&lt;/CopyrightRenewalRecords&gt;")
        v.flush()
        v.close()
}</pre>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2008/07/01/fun-with-copyright-renewal-records/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding Shift-select with jQuery</title>
		<link>http://clownsinmycoffee.net/2008/04/18/add-shift-select-with-jquery/</link>
		<comments>http://clownsinmycoffee.net/2008/04/18/add-shift-select-with-jquery/#comments</comments>
		<pubDate>Fri, 18 Apr 2008 17:48:07 +0000</pubDate>
		<dc:creator>adam</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[nerdination]]></category>
		<category><![CDATA[closure]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[jquery]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=45</guid>
		<description><![CDATA[I&#8217;ve been using jQuery a bit here and there to add some (I hope) usability enhancements and for light AJAJ work. Today I encountered a situation where I thought adding the &#8220;shift-select&#8221; feature on a longish list of checkboxes would &#8230; <a href="http://clownsinmycoffee.net/2008/04/18/add-shift-select-with-jquery/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>
 I&#8217;ve been using <a href="http://jquery.com">jQuery</a> a bit here and there to add some (I hope) usability enhancements and for light AJAJ work.  Today I encountered a situation where I thought adding the &#8220;shift-select&#8221; feature on a longish list of checkboxes would be a good thing. This sort of feature pops up in webmail interfaces, where you tick off one box, scroll down through 750 spam messages, and then, while holding down shift, click the checkbox on the 751st piece of spam in a row to select all of the rows in between.  It turns out that adding this with jQuery is pretty elegant, so here&#8217;s the code.  I don&#8217;t for a moment think this is the best implementation of this idea, but I was struck by how concise the result was, while supporting, via a straightforward use of a <a href="http://en.wikipedia.org/wiki/Closure_(computer_science)">closure</a>, multiple instances on the same page.  For giggles, I added a feature that allows you to de-select a range, although I&#8217;m not convinced it works in an intuitive way.
</p>
<p>To use this, you&#8217;ll need jQuery (tested against 1.2.3) in your page, and a CSS selector that matches the checkboxes you want to enable shift-select on. Then call <code>$(selector).shiftSelect();</code> and you&#8217;re done.
</p>
<pre>
 jQuery.fn.shiftSelect = function() {
    var checkboxes = this;
    var lastSelected;
    jQuery(this).click( function(event) {

        if ( !lastSelected ) {
            lastSelected = this;
            return;
        }

        if ( event.shiftKey ) {
            var selIndex = checkboxes.index(this);
            var lastIndex = checkboxes.index(lastSelected);
            /*
             * if you find the "select/unselect" behavior unseemly,
             * remove this assignment and replace 'checkValue'
             * with 'true' below.
             */
            var checkValue = lastSelected.checked;
            if ( selIndex == lastIndex ) {
                return true;
            }

            var end = Math.max(selIndex,lastIndex);
            var start = Math.min(selIndex,lastIndex);
            // declare the counter locally so it does not leak into the global scope
            for (var i = start; i &lt;= end; i++) {
                checkboxes[i].checked = checkValue;
            }
        }
        lastSelected = this;
    });
    return this; // keep the plugin chainable, per jQuery convention
};
</pre>
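<p>Since the closure is doing the interesting work, here is a rough, DOM-free sketch of the same idea in plain JavaScript. It is not the plugin itself: <code>makeShiftSelector</code>, <code>simulateClick</code>, <code>makeBoxes</code> and <code>states</code> are names made up for illustration, and plain objects with a <code>checked</code> property stand in for the checkboxes. Each call to <code>makeShiftSelector</code> gets its own <code>lastSelected</code>, which is what should let several lists coexist on one page.</p>

```javascript
// Hypothetical stand-in for the plugin's click handler (not part of jQuery).
// `boxes` is any array of objects with a boolean `checked` property.
function makeShiftSelector(boxes) {
    var lastSelected = null; // private to this call, like the plugin's closure variable

    // Call this after the clicked box has already been toggled,
    // which is the state a browser click handler sees.
    return function click(box, shiftKey) {
        if (lastSelected === null) {
            lastSelected = box;
            return;
        }
        if (shiftKey) {
            var a = boxes.indexOf(box);
            var b = boxes.indexOf(lastSelected);
            var value = lastSelected.checked; // the range copies the previous box's state
            boxes.slice(Math.min(a, b), Math.max(a, b) + 1).forEach(function (item) {
                item.checked = value;
            });
        }
        lastSelected = box;
    };
}

// Assumed helper: toggle the box the way a browser would, then run the handler.
function simulateClick(handler, box, shiftKey) {
    box.checked = !box.checked;
    handler(box, shiftKey);
}

// Assumed helper: build n unchecked fake checkboxes.
function makeBoxes(n) {
    var boxes = [];
    while (boxes.length !== n) {
        boxes.push({ checked: false });
    }
    return boxes;
}

// Assumed helper: render the checked flags as a compact string.
function states(boxes) {
    return JSON.stringify(boxes.map(function (b) { return b.checked; }));
}

var inbox = makeBoxes(5);
var spam = makeBoxes(5);
var clickInbox = makeShiftSelector(inbox);
var clickSpam = makeShiftSelector(spam);

simulateClick(clickInbox, inbox[1], false); // tick box 1
simulateClick(clickSpam, spam[0], false);   // a second list keeps its own lastSelected
simulateClick(clickInbox, inbox[3], true);  // shift-click box 3
console.log(states(inbox)); // [false,true,true,true,false]
console.log(states(spam));  // [true,false,false,false,false]

simulateClick(clickInbox, inbox[3], false); // untick box 3
simulateClick(clickInbox, inbox[1], true);  // shift-click box 1 clears the range
console.log(states(inbox)); // [false,false,false,false,false]
```

<p>In the plugin, the browser plays the part of <code>simulateClick</code> and <code>checkboxes.index()</code> does the job of <code>indexOf</code>; the shape of the closure is otherwise the same. The last two clicks show the de-select behavior: the range takes on whatever state the previously clicked box is in.</p>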
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2008/04/18/add-shift-select-with-jquery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
