Worldwide file ID hackathon

What happens when you get a bunch of developers from all over the world together on the Internet for one day of intensive work? A lot! For one thing, there’s the “Louis Wu’s birthday” effect; this “24-hour hackathon” was more like 48 hours. (In Larry Niven’s Ringworld, Wu makes his birthday party last 48 hours by hopping from time zone to time zone with teleporters.) We didn’t have teleporters, so we made do with Twitter, IRC, and Google Hangouts. People in Australia started, and things wound down on the US west coast or maybe Hawaii.

Several things were happening, but the two most notable from my perspective were the Format Corpus project and the fork of FITS.

I watched the Format Corpus project with interest, though I didn’t participate in it. This is an openly licensed set of small example files in a wide variety of formats, as well as signature information. It could have a lot of uses; I’ll need to incorporate it into JHOVE testing.

People had been talking in advance of the hackathon about the need to improve the efficiency of FITS, a meta-tool developed by Harvard’s OIS (now LTS) to run various validation tools together on files. Internal ingest was and is the main purpose of FITS, but it was released as open source and has been used elsewhere. I’d never worked on FITS proper at OIS (though I wrote parts of OTS-Schemas, which was broken out of FITS), but I’m familiar with the OIS style of coding, so I forked it on GitHub and started looking at it. When Randy Stern at Harvard expressed concerns that the fork would create confusion (though I’d included a clear disclaimer from the beginning that it wasn’t the official version), I renamed it to OpenFITS.

The work is summarized on the hackathon wiki. The results are unclear at this point, but just opening the code up to more eyes could produce long-term benefits. The very first file I tested FITS on turned up a bug in JHOVE, and I wound up doing more work improving JHOVE than FITS. One potentially significant improvement I added was the ability to specify local copies of any XML schema. Without a local copy, JHOVE has to fetch the schema from the Web, which slows processing down when you’re validating a lot of XML files that use the same schema. Taking advantage of this requires local configuration, since every installation could need different schemas. The code is checked in but not yet available in a build.
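
To give a rough idea of what a local schema mapping can look like, here is a minimal sketch using the standard SAX EntityResolver interface. It is not the code that went into JHOVE; the class name LocalSchemaResolver and the method addLocalCopy are made up for illustration, and I'm assuming the parser asks for the schema by the same URI that was registered.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch, not JHOVE's actual implementation: answers
 * requests for known schema URIs from local files instead of the Web.
 */
class LocalSchemaResolver implements EntityResolver {

    // Schema URI -> local file holding a copy of that schema.
    private final Map<String, Path> localCopies = new HashMap<>();

    /** Registers a local copy for a schema URI (illustrative helper). */
    void addLocalCopy(String schemaUri, Path localFile) {
        localCopies.put(schemaUri, localFile);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException {
        Path local = (systemId == null) ? null : localCopies.get(systemId);
        if (local == null || !Files.isReadable(local)) {
            // Unknown or missing: returning null tells the parser to
            // resolve the entity its usual way (typically over the network).
            return null;
        }
        InputSource source = new InputSource(Files.newInputStream(local));
        // Keep the original URI so relative references still resolve.
        source.setSystemId(systemId);
        return source;
    }
}
```

A resolver like this would be handed to the parser with XMLReader.setEntityResolver. Which URIs to register is exactly the per-installation configuration mentioned above.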

It was thrilling to get to work with such an enthusiastic crowd from so many different places and, in a single 48-hour day, to see other people picking up my work and running it. I think there are already two or three third-generation forks of OpenFITS, including a Debian-Ubuntu package.