Tag Archives: Open Planets Foundation

Open Planets Foundation is now Open Preservation Foundation

The Open Planets Foundation is now the Open Preservation Foundation. The new name better reflects what the organization does; the old name grew out of the Planets project and never really made sense.

For now, its website is still at openplanetsfoundation.org.

FITS website

Last spring, I attended a Hackathon at the University of Leeds, which resulted in my getting a SPRUCE Grant for a month’s work enhancing FITS, a tool which at the time was technically open source but which the Harvard Library treated a bit possessively. After I finished, it seemed for a while that nothing was happening with my work, but it was just a matter of being patient enough. Collaboration between Harvard and the Open Planets Foundation has resulted in a more genuinely open FITS, which now has its own website. There’s also a GitHub repository with five contributors, none of whom is me, since my work was done on an earlier repository that was incorporated into this one.

It really makes me happy to see my work reach this kind of fruition, even if I’m so busy on other things now that I don’t have time to participate.

The FITS Blitz

Back in May, after an enjoyable trip to the University of Leeds, I worked for a month on improving the Harvard Library’s FITS tool for combining the results of several file format identification and validation tools. The results were well received and the Harvard Library incorporated some of my work in the main line of FITS. Still, there were a lot of loose ends left and more work to be done.

Things are picking up again with a “FITS Blitz” that’s starting this week. Paul Wheatley writes that “in partnership with Harvard and the Open Planets Foundation (with support from Creative Pragmatics), SPRUCE is supporting a two week project to get the technical infrastructure in place to make FITS genuinely maintainable by the community. ‘FITS Blitz’ will merge the existing code branches and establish a comprehensive testing setup so that further code developments only find their way in when there is confidence that other bits of functionality haven’t been damaged by the changes.”

I’ve moved on to other things, so I won’t be able to participate, but I wish them every success.

Optimizing FITS

January’s mostly over, and I’ve only posted three times to this blog. Files that Last has been keeping me busy. My posting should pick up again before long, once I get a draft out to first readers.

One thing I’ve been looking at, with an eye to the upcoming SPRUCE Hackathon, is what can be done with FITS. I’ve written up the results of some profiling experiments and quick attempts at optimization. FITS puts together a lot of tools for extracting file metadata, but there have been some complaints that it’s not as fast as it might be. The first results were surprising: the easiest way to get a small improvement was to factor out the initialization of namespace URIs for parsing XML. You wouldn’t think that would make any detectable difference, but the initialization of URIs in Xerces is surprisingly slow.
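The general idea of "factoring out" the initialization can be sketched in a few lines of Java. This isn't the actual FITS code: the class name, method names, and namespace list are made up for illustration, and java.net.URI stands in for whatever URI class the XML library really uses. The point is that the fixed URIs get built once, when the class loads, rather than every time a document is processed.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: NamespaceCache is not a FITS class.
public class NamespaceCache {

    // Fixed namespace strings the code cares about (illustrative list).
    private static final String[] NS_STRINGS = {
        "http://www.w3.org/2001/XMLSchema-instance",
        "http://www.w3.org/1999/xlink"
    };

    // Built once at class-load time and shared by every later call.
    private static final List<URI> NAMESPACES = build();

    // Parses every namespace string into a URI object.
    static List<URI> build() {
        List<URI> out = new ArrayList<>();
        for (String s : NS_STRINGS) {
            out.add(URI.create(s));
        }
        return out;
    }

    // Slow pattern: rebuilds all the URIs on every call.
    static boolean isKnownSlow(String ns) {
        return build().contains(URI.create(ns));
    }

    // Factored-out pattern: only the incoming string is parsed.
    static boolean isKnown(String ns) {
        return NAMESPACES.contains(URI.create(ns));
    }

    public static void main(String[] args) {
        System.out.println(isKnown("http://www.w3.org/1999/xlink"));
    }
}
```

Both methods give the same answers; the difference is only in how much work is repeated per call, which matters when a tool is run over thousands of files.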

Another possibility to explore is improving the connection between FITS and JHOVE. Even though JHOVE is intended for use as a callable library, among other things, it’s designed to write to an output file. Some simple changes would let it provide an in-memory response without writing a file, which would be more useful to an application like FITS.
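The kind of change I have in mind is roughly this: instead of a reporting component that opens a named output file, have it write to any java.io.Writer, so a caller like FITS can hand it a StringWriter and get the result back in memory. ReportWriter and its methods below are hypothetical and are not JHOVE's actual output-handler API; this is just a sketch of the design.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical reporter that accepts any Writer instead of a file name.
public class ReportWriter {

    private final PrintWriter out;

    public ReportWriter(Writer target) {
        this.out = new PrintWriter(target);
    }

    // Writes one "name: value" line and flushes it to the target.
    public void writeProperty(String name, String value) {
        out.println(name + ": " + value);
        out.flush();
    }

    // A command-line style caller can still send output to a file...
    static void toFile(Path path) throws IOException {
        try (Writer w = Files.newBufferedWriter(path)) {
            new ReportWriter(w).writeProperty("Format", "PDF");
        }
    }

    // ...while a library caller can keep the whole report in memory.
    static String toMemory() {
        StringWriter buffer = new StringWriter();
        new ReportWriter(buffer).writeProperty("Format", "PDF");
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(toMemory());
    }
}
```

The reporting code itself doesn't need to know or care where its output ends up, which is what makes the in-memory case cheap to support.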

Worldwide file ID hackathon

What happens when you get a bunch of developers from all over the world together on the Internet for one day of intensive work? A lot! For one thing, there’s the “Louis Wu’s birthday” effect; this “24-hour hackathon” was more like 48 hours. (In Larry Niven’s Ringworld, Wu makes his birthday party last 48 hours by hopping from time zone to time zone with teleporters.) We didn’t have teleporters, so we made do with Twitter, IRC, and Google Hangouts. People in Australia started, and things wound down on the US west coast or maybe Hawaii.

Several things were happening, but the two most notable from my perspective were the Format Corpus project and the fork of FITS.

I watched the Format Corpus project with interest, though I didn’t participate in it. This is an openly licensed set of small example files in a wide variety of formats, as well as signature information. It could have a lot of uses; I’ll need to incorporate it into JHOVE testing.
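For anyone who hasn't looked at signature-based identification, the basic idea is to compare the first bytes of a file against known "magic numbers." Here's a minimal sketch in Java. The two signatures are well-known ones (PDF files begin with "%PDF-", PNG files with an eight-byte header starting 0x89 "PNG"), but the class is only an illustration; it doesn't read the Format Corpus's own signature files, and real tools handle offsets, wildcards, and much more.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative magic-number matcher; not part of JHOVE or the Format Corpus.
public class SignatureCheck {

    // Format name -> byte sequence that must appear at offset 0.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("PDF", new byte[] {'%', 'P', 'D', 'F', '-'});
        SIGNATURES.put("PNG", new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
    }

    // Returns the first format whose signature prefixes the data, or "unknown".
    static String identify(byte[] data) {
        for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
            byte[] sig = e.getValue();
            if (data.length >= sig.length
                    && Arrays.equals(Arrays.copyOf(data, sig.length), sig)) {
                return e.getKey();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(identify(sample));
    }
}
```

A corpus of small, openly licensed sample files is exactly what you need to check that a matcher like this (or JHOVE's far more thorough modules) gets the right answer on real files.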

People had been talking in advance of the hackathon about the need to improve the efficiency of FITS, a meta-tool developed by Harvard’s OIS (now LTS) to run various validation tools together on files. Internal ingest was and is the main purpose of FITS, but it was released as open source and has been used in other places. I’d never worked on FITS proper at OIS (though I wrote parts of OTS-Schemas, which was broken out of FITS), but I’m familiar with the OIS style of coding, so I forked it onto GitHub and started looking at it. When Randy Stern at Harvard expressed concerns that the fork would create confusion (though I’d put a clear disclaimer from the beginning that it wasn’t the official version), I renamed it OpenFITS.

The work is summarized on the hackathon wiki. The results are unclear at this point, but just opening the code up to more eyes could produce long-term benefits. The very first file I tested FITS on turned up a bug in JHOVE, and I wound up doing more work improving JHOVE than FITS. One change I added that could bring significant improvements is the ability to specify local copies of any XML schema. Without it, if you’re validating a lot of XML files that use the same schema, JHOVE has to fetch that schema from the Web, which slows processing down. Taking advantage of this requires local configuration, since every installation could need different schemas. The code is checked in but not available in a build yet.
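In standard Java XML processing, one common way to substitute local copies for remote resources is a SAX EntityResolver that maps URLs to local files. The sketch below shows that technique; the class, the addLocalCopy method, and the in-code mapping are hypothetical and aren't how the JHOVE change is actually configured. Also, whether a parser consults the EntityResolver for XML Schema documents (as opposed to DTDs and external entities) depends on the parser and its configuration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: maps remote schema URLs to locally stored copies.
public class LocalSchemaResolver implements EntityResolver {

    private final Map<String, Path> localCopies = new HashMap<>();

    // Registers a local file to use in place of the given URL.
    public void addLocalCopy(String url, Path file) {
        localCopies.put(url, file);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        Path local = localCopies.get(systemId);
        if (local == null) {
            // Returning null tells the parser to use its default resolution,
            // which for an http URL means fetching it over the network.
            return null;
        }
        // Point the parser at the local file instead of the Web copy.
        return new InputSource(local.toUri().toString());
    }

    public static void main(String[] args) {
        LocalSchemaResolver resolver = new LocalSchemaResolver();
        resolver.addLocalCopy("http://example.org/schemas/sample.xsd",
                Paths.get("schemas", "sample.xsd"));
        InputSource src = resolver.resolveEntity(null, "http://example.org/schemas/sample.xsd");
        System.out.println(src.getSystemId());
    }
}
```

A resolver like this would be installed with XMLReader.setEntityResolver before parsing. Because the mapping is per-installation data, it naturally belongs in a configuration file rather than in the code.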

It was thrilling to get to work with such an enthusiastic crowd from so many different places and, in a single 48-hour day, to see other people picking up my work and running it. I think there are already two or three third-generation forks of OpenFITS, including a Debian-Ubuntu package.

Notes on Friday’s Hackathon

The information on just how Friday’s CURATEcamp 24-hour worldwide file ID hackathon will work has been tricky for me to find, so here’s a summary for participants who read this blog:

Twitter: Hashtag #fileidhack
IRC: Server is irc.oftc.net, channel is #openarchives

The information is on the main wiki page for the hackathon, but it’s a little hard to spot with everything else that’s there.

See some of you there!

Online file ID hackathon

CURATEcamp and Open Planets Foundation will hold a 24-hour (possibly more, due to time zones) online hackathon on file identification on Friday, November 16. The announcement says:

24hour+ live hackathon event where multi-time zone teams work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.

Project proposals can be made by anyone.

We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).

FIDO 1.0.0 from Open Planets Foundation

Open Planets Foundation has announced FIDO 1.0.0, “a Python command line tool to identify the file formats of digital objects. A lot of improvements to the code and functionality have been made.”