<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Clowns In My Coffee &#187; conferences</title>
	<atom:link href="http://clownsinmycoffee.net/category/conferences/feed/" rel="self" type="application/rss+xml" />
	<link>http://clownsinmycoffee.net</link>
	<description>Inanity of the most cogent sort you can find.</description>
	<pubDate>Thu, 02 Oct 2008 11:41:56 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
	<language>en</language>
			<item>
		<title>Fun With Copyright Renewal Records</title>
		<link>http://clownsinmycoffee.net/2008/07/01/fun-with-copyright-renewal-records/</link>
		<comments>http://clownsinmycoffee.net/2008/07/01/fun-with-copyright-renewal-records/#comments</comments>
		<pubDate>Tue, 01 Jul 2008 03:32:08 +0000</pubDate>
		<dc:creator>adam</dc:creator>
		
		<category><![CDATA[RDF]]></category>

		<category><![CDATA[Tools]]></category>

		<category><![CDATA[conferences]]></category>

		<category><![CDATA[nerdination]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/?p=46</guid>
		<description><![CDATA[Based on an enormous amount of work by contributors to Project Gutenberg and the Distributed Proofreaders, combined with healthy sourcing of the US copyright office&#8217;s records, Google has compiled a list of works originally copyrighted between 1923 and 1963 which have been renewed at some point, the upshot being that if a given work [...]]]></description>
			<content:encoded><![CDATA[<p>Based on an enormous amount of work by contributors to <a href="http://www.gutenberg.org/wiki/Main_Page">Project Gutenberg</a> and the <a href="http://www.pgdp.net/c/">Distributed Proofreaders</a>, combined with healthy sourcing of the US <a href="http://www.copyright.gov/records/">copyright office&#8217;s records</a>, Google has compiled <a href="http://booksearch.blogspot.com/2008/06/us-copyright-renewal-records-available.html">a list of works originally copyrighted between 1923 and 1963</a> which have been renewed at some point, the upshot being that if a given work published in that time span is <em>not</em> on the list, it&#8217;s likely in the public domain.
</p>
<p>
One problem with the list is that the database is a 370+ megabyte XML file, which is hard to load in an XML-aware editor and even caused <a title="eXist open source XML database" href="http://exist.sourceforge.net">eXist</a> to choke.  So I broke it up into chunks with a shortish Groovy script, for neat ingestion into an XML database.  The heart of the script is a SAX handler that turns each record in the XML file into a Groovy object, and a closure (there&#8217;s that word again!) that handles each record as it is constructed.  As written, the script simply breaks the big file into a bunch of files, one for each copyright year (you will of course have to edit the paths).  By supplying a different closure, you could do all sorts of different things with the records, e.g. stuff them into a relational database.
</p>
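<p>
As a rough illustration of the &#8220;different closure&#8221; idea, here is a minimal, self-contained sketch.  It is not part of the script below: <code>MiniHandler</code>, the inline sample XML and the <code>counts</code> map are stand-ins invented for this example, and only the <code>Record</code> and <code>CopyrightYear</code> element names come from the real data.  The listener tallies records per copyright year instead of writing files.  Because the default <code>SAXParserFactory</code> parser is not namespace-aware, the sketch matches on <code>qName</code> rather than <code>localName</code>.
</p>
<pre style="border: 1px solid #000; padding: 1em; background-color: #ccc;">import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler

// Hypothetical cut-down handler: it keeps only CopyrightYear and hands
// each finished record (a plain Map here) to whatever listener it is given.
class MiniHandler extends DefaultHandler {
    def recordListener
    def current
    StringBuilder text = new StringBuilder()

    void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName == "Record") {
            current = [:]
        }
        // start collecting text afresh for every element
        text.setLength(0)
    }

    void characters(char[] ch, int start, int len) {
        text.append(ch, start, len)
    }

    void endElement(String uri, String localName, String qName) {
        if (qName == "CopyrightYear") {
            current.copyrightYear = Integer.parseInt(text.toString().trim())
        } else if (qName == "Record") {
            recordListener.call(current)
        }
    }
}

// Made-up sample data in the same general shape as the real file.
def sampleXml = '''&lt;CopyrightRenewalRecords&gt;
  &lt;Record&gt;&lt;Title&gt;A&lt;/Title&gt;&lt;CopyrightYear&gt;1923&lt;/CopyrightYear&gt;&lt;/Record&gt;
  &lt;Record&gt;&lt;Title&gt;B&lt;/Title&gt;&lt;CopyrightYear&gt;1923&lt;/CopyrightYear&gt;&lt;/Record&gt;
  &lt;Record&gt;&lt;Title&gt;C&lt;/Title&gt;&lt;CopyrightYear&gt;1950&lt;/CopyrightYear&gt;&lt;/Record&gt;
&lt;/CopyrightRenewalRecords&gt;'''

// The alternative listener: count records per year instead of writing them out.
def counts = [:]
def handler = new MiniHandler(recordListener: { rec -&gt;
    counts[rec.copyrightYear] = (counts[rec.copyrightYear] ?: 0) + 1
})

def parser = SAXParserFactory.newInstance().newSAXParser()
parser.parse(new InputSource(new StringReader(sampleXml)), handler)

assert counts[1923] == 2
assert counts[1950] == 1</pre>
<p>
The handler never needs to know what the listener does with a record, so swapping in a database insert or a filter is a matter of passing a different closure.
</p>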
<p>
In the spirit of the thing, the script is in the public domain &#8212; but I make no representations as to the quality, idiomaticity or overall efficiency of the script; despite being SAX-based, it still manages to chew up quite a bit of memory, so watch out.  Note that you will need <a href="http://commons.apache.org/lang">Apache Commons Lang</a> (say, version 2.4) on the classpath (e.g. in <code>$HOME/.groovy/lib</code>) for this script to work.  Developed with Groovy 1.5.6.
</p>
<p style="color: red">I&#8217;ve tried to stop wordpress from &#8216;prettyfying&#8217; the output, which appears to mangle quotes.  I hope to have that fixed soon &#8230;</p>
<pre style="border: 1px solid #000; padding: 1em; background-color: #ccc;">import org.xml.sax.helpers.DefaultHandler
import org.xml.sax.Attributes
import org.xml.sax.helpers.XMLReaderFactory
import org.xml.sax.InputSource

import org.apache.commons.lang.StringEscapeUtils
import org.xml.sax.Locator

/**
 * Represents an individual &lt;Record&gt; element
 * in the document.
 **/
class Record {
    def file

    def lines

    def recno

    def title

    def copyrightYear

    def copyrights = []

    def renewalYear

    def renewals = [] 

    // where it was published
    def published

    // rare!
    def note

    // source of the copyright info
    def source
    def snippet
    def md5sum

    // contributors, holders, and pseudonyms
    def people = []

    /**
     * Get the XML representing this element.  Note
     * that proper functioning here depends on how the
     * handler builds the elements.
     * @return a string containing this record's XML.
     */
    def xml() {
        def buf = new StringBuffer()
        buf &lt;&lt; """
&lt;Record&gt;
    &lt;Title&gt;${title}&lt;/Title&gt;
    &lt;File&gt;${file}&lt;/File&gt;
    &lt;Lines&gt;${lines}&lt;/Lines&gt;
    &lt;MD5Sum&gt;${md5sum}&lt;/MD5Sum&gt;
"""
        if (snippet) {
            buf &lt;&lt; "\t&lt;Snippet&gt;${snippet}&lt;/Snippet&gt;\n"
        }
        if (note) {
            buf &lt;&lt;"\t&lt;Note&gt;${note}&lt;/Note&gt;\n"
        }
        buf &lt;&lt;
"""
    &lt;Source&gt;${source}&lt;/Source&gt;
    &lt;CopyrightYear&gt;${copyrightYear}&lt;/CopyrightYear&gt;
    &lt;RenewalYear&gt;${renewalYear}&lt;/RenewalYear&gt;
"""
        copyrights.each() {
            buf &lt;&lt; it.xml()
        }
        renewals.each() {
            buf &lt;&lt; it.xml()
        }
        people.each() {
                buf &lt;&lt; it.xml()
        }
        buf &lt;&lt; "&lt;/Record&gt;\n"
        return buf.toString()
    }
}

/**
 * An inelegant class representing the elements that denote
 * people (copyright holders, contributors, aliases, etc.)
 **/
class Person {

    static ELEMENTS = ["Holder" :   [ "Name", "Type" ],
                        "Contrib" : [ "Name", "Role" ],
                        "Pseudonym" : [ "Pseudo", "Real" ],
                        "Neenym" : [ "Nee", "Now" ],
                        "Aka" : [ "Alias", "Real" ] ]

    static ROLES = ELEMENTS.keySet()

    def role

    def name

    def honorific

    def type

    def xml() {
        def firstElement = ELEMENTS[role][0]
        def secondElement = ELEMENTS[role][1]
        def buf = new StringBuffer()

        buf &lt;&lt; """
&lt;${role}&gt;
    &lt;${firstElement}&gt;${name}&lt;/${firstElement}&gt;
    &lt;${secondElement}&gt;$type&lt;/${secondElement}&gt;"""
    if ( honorific ) {
        buf &lt;&lt; "\t&lt;Hon&gt;${honorific}&lt;/Hon&gt;\n"
        }
    buf &lt;&lt; "&lt;/${role}&gt;\n"
    return buf.toString()
    }
}

/**
 * Represents copyright and renewal date elements.
 */
class RecordDate {

	static ELEMENTS = ["Copyright", "Renewal"]

    def role
    def date
    def id
    def xml() {
        // end with a newline so consecutive elements do not run together
        return """&lt;${role}&gt;
    &lt;Date&gt;${date}&lt;/Date&gt;
    &lt;Id&gt;${id}&lt;/Id&gt;
&lt;/${role}&gt;
"""
    }
}

/**
 * SAX handler that turns each &lt;code&gt;Record&lt;/code&gt; element
 * into a &lt;code&gt;Record&lt;/code&gt; domain object.
 **/
class RecordHandler extends DefaultHandler {

    /**
     * Stack of strings that represents the current
     * element context.
     **/
    Stack context = new Stack()

    /**
     * the current record being built.
     **/
    Record currentRec

    /**
     * the current Person element being built.
     **/
    Person currentPerson

    /**
     * The current date information being collected.
     **/
    RecordDate currentRecDate

    /**
     * A closure which will be called as each record is
     * read in.
     **/
    def recordListener

    /**
     * a buffer to collect the current text, since SAX might
     * not report all contiguous chunks of text at once.
     **/
    StringBuilder currentText = new StringBuilder()

    def locator

    @Override
    public void setDocumentLocator(Locator locator)
    {
        println "Got a locator: ${locator}"
        this.locator = locator
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
    {
        context &lt;&lt; localName
        switch( localName ) {
            case "Record":
                currentRec = new Record()
                break
            case Person.ROLES:
                currentPerson = new Person()
                currentPerson.role = localName
                break
            case RecordDate.ELEMENTS:
                currentRecDate = new RecordDate()
                currentRecDate.role = localName
                break
        }
    }

    @Override
    public void characters(char [] ch, int start, int len)
    {
        currentText.append(ch,start,len)
    }

    @Override
    public void endElement(String uri, String localName, String qName)
    {
        String txt = StringEscapeUtils.escapeXml(currentText.toString().trim())
        switch(localName) {
            case Person.ROLES:
                currentRec.people &lt;&lt; currentPerson
                break
            case ["Type", "Role", "Real", "Now"]:
                currentPerson.type = txt
                break
            case ["Name", "Pseudo", "Nee", "Alias"]:
                currentPerson.name = txt
                break
            case "Hon":
                currentPerson.honorific = txt
                break
            case "CopyrightYear":
                currentRec.copyrightYear = Integer.parseInt(txt)
                break
            case "Date":
                currentRecDate.date = txt
                break
            case "Id":
                currentRecDate.id = txt
                break
            case "Copyright":
                currentRec.copyrights &lt;&lt;currentRecDate
                break
            case "RenewalYear":
                currentRec.renewalYear = Integer.parseInt(txt)
                break
            case "Renewal":
                currentRec.renewals &lt;&lt; currentRecDate
                break
            case "Recno":
                currentRec.recno = txt
                break
            case "Source":
                currentRec.source = txt
                break
            case "Lines":
                currentRec.lines = txt
                break
            case "MD5Sum":
                currentRec.md5sum = txt
                break
            case "File":
                currentRec.file = txt
                break
            case "Snippet":
                currentRec.snippet = txt
                break
            case "Title":
                currentRec.title = txt
                break
            case "Published":
                currentRec.published = txt
                break
            case "Record":
                recordListener(currentRec)
                break
            case "Note":
                currentRec.note = txt
                break
            case "CopyrightRenewalRecords":
                break
            default:
                println "Unrecognized element '${localName}' at line ${locator.lineNumber}"
                System.exit(1)
            }
        currentText.length = 0
    }

}

def file = new File("input-dir/google-renewals-20080624/google-renewals-20080624.xml")

/**
 * A listener that will output each record into a different stream depending
 * on the CopyrightYear of the record.
 **/
def listenerBase = { Map streams, Record it -&gt;
    if ( !streams.containsKey(it.copyrightYear) ) {
        def f = new File("/output/dir/copyright-${it.copyrightYear}.xml")
        println "creating ${f.absolutePath}"
        def stream = f.newWriter()
        streams[it.copyrightYear] = stream
        stream.append("&lt;CopyrightRenewalRecords&gt;")
    }
    Writer s = (Writer)streams[it.copyrightYear]
    s.append(it.xml())
    s.flush()
}

def reader = XMLReaderFactory.createXMLReader()
def handler = new RecordHandler()
def outputStreams = [:]
handler.recordListener = listenerBase.curry(outputStreams)
reader.setContentHandler( handler )

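try {
    // withInputStream closes the stream even if parsing fails
    file.withInputStream { input -&gt;
        reader.parse( new InputSource( input ) )
    }
} catch (Exception x) {
    x.printStackTrace()
    // the locator may be unset if the parser failed before supplying one
    println "Error at line ${handler.locator?.lineNumber}"
}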

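// The map holds the Writers created by f.newWriter() in the listener,
// so the closure parameter must be typed as Writer, not BufferedOutputStream.
outputStreams.each() {
    k, Writer v -&gt;
        println "Closing ${k}"
        v.append("&lt;/CopyrightRenewalRecords&gt;")
        v.flush()
        v.close()
}</pre>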
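<p>
A note on <code>listenerBase.curry(outputStreams)</code> in the script: <code>curry</code> returns a new closure with the leading parameter already bound, which is how the two-argument listener becomes the one-argument closure that the handler calls for each record.  A tiny standalone sketch of the same trick (the names <code>tally</code>, <code>seen</code> and <code>tallyOne</code> are made up for illustration):
</p>
<pre style="border: 1px solid #000; padding: 1em; background-color: #ccc;">// Two-parameter closure: a shared map plus the item to count.
def tally = { Map seen, String word -&gt;
    seen[word] = (seen[word] ?: 0) + 1
}

def seen = [:]

// Bind the first parameter; tallyOne now takes just the word.
def tallyOne = tally.curry(seen)

["a", "b", "a"].each { tallyOne(it) }

assert seen == [a: 2, b: 1]</pre>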
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2008/07/01/fun-with-copyright-renewal-records/feed/</wfw:commentRss>
		</item>
		<item>
		<title>It&#8217;s On (Again)</title>
		<link>http://clownsinmycoffee.net/2007/07/05/its-on-again/</link>
		<comments>http://clownsinmycoffee.net/2007/07/05/its-on-again/#comments</comments>
		<pubDate>Thu, 05 Jul 2007 13:18:50 +0000</pubDate>
		<dc:creator>adam</dc:creator>
		
		<category><![CDATA[BarCampRDU]]></category>

		<category><![CDATA[conferences]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/2007/07/05/its-on-again/</guid>
		<description><![CDATA[ BarCampRDU instance the second, that is.  August 4th, 2007, at Red Hat&#8217;s Centennial Campus location.  Fred Stutzman issues the call for signup and for some more organizers.
]]></description>
			<content:encoded><![CDATA[<p> <a href="http://barcamp.org/BarCampRDU">BarCampRDU</a> instance the second, that is.  August 4th, 2007, at Red Hat&#8217;s Centennial Campus location.  <a href="http://chimprawk.blogspot.com/2007/07/sign-up-for-barcamprdu-2007.html">Fred Stutzman issues the call</a> for signup and for some more organizers.</p>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2007/07/05/its-on-again/feed/</wfw:commentRss>
		</item>
		<item>
		<title>BarCampRDU</title>
		<link>http://clownsinmycoffee.net/2006/07/20/barcamprdu/</link>
		<comments>http://clownsinmycoffee.net/2006/07/20/barcamprdu/#comments</comments>
		<pubDate>Thu, 20 Jul 2006 00:11:20 +0000</pubDate>
		<dc:creator>adam</dc:creator>
		
		<category><![CDATA[conferences]]></category>

		<guid isPermaLink="false">http://clownsinmycoffee.net/2006/07/20/barcamprdu/</guid>
		<description><![CDATA[So, BarCampRDU is almost upon us.  I like the idea of a general-purpose nerdfest, but besides the fact that it&#8217;s in the area, there are some good sessions lined up (man, we have a lot of Atom fans in the RDU area), and, almost as important, good food.   Hopefully Scott&#8217;s open source [...]]]></description>
			<content:encoded><![CDATA[<p>So, <a title="Bar Camp RDU home page" href="http://barcamp.org/BarCampRDU">BarCampRDU</a> is almost upon us.  I like the idea of a <a title="Bar Camp main site" href="http://barcamp.org">general-purpose nerdfest</a>, but besides the fact that it&#8217;s in the area, there are some <a title="BarCamp RDU sessions" href="http://barcamp.org/BarCampRDUClaimedSessions">good sessions</a> lined up (man, we have a lot of Atom fans in the RDU area), and, almost as important, good food.   Hopefully Scott&#8217;s open source VOIP session will not be <a title="Teen Girl Squad Episode 11 (flash)" href="http://www.homestarrunner.com/tgs11.html">blinding</a>.</p>
<p>Perhaps I will do some liveblogging of the event.</p>
]]></content:encoded>
			<wfw:commentRss>http://clownsinmycoffee.net/2006/07/20/barcamprdu/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
