It would be helpful for me to have at least a partial list of institutions that are using Harvard’s FITS (File Information Tool Set). If you can help me build this list, could you reply here or contact me by other usual channels? Thanks.
Monthly Archives: March 2013
It’s been a long wait, but version 2.1.0 of JHOVE2 is now out! Sheila Morrissey writes:
Version 2.1.0 of JHOVE2 includes 3 new format modules, 1 new identifier module, 1 new displayer module, and several bug fixes and enhancements from the Issues page on the JHOVE2 wiki.
The new format modules included in this release are for the ARC, WARC, and GZIP formats.
The new Identifier module uses the UNIX “file” utility, giving JHOVE2 users the choice of employing either DROID or file for identification of file formats.
The new XSLDisplayer module (which extends XMLDisplayer) can do XSLT transformations on the XML output before displaying it.
This release also reflects a new milestone in the JHOVE2 development community. The new format and identifier modules are the contribution of developers from institutions (Bibliothèque nationale de France and NETARKIVET.DK) beyond the original project participants (California Digital Library, Portico, and Stanford University Libraries).
The release notes are available on the project site.
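For readers who haven't looked at how signature-based identification works, here is a minimal sketch of the idea behind the “file” utility: compare the first few bytes of a file against known “magic numbers.” This is not JHOVE2's identifier module or the real magic database; the class name, the method names, and the tiny signature table are my own illustration, and real tools check far more than a fixed prefix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: prefix ("magic number") matching, nothing more.
class MagicSniffSketch {

    // A deliberately tiny signature table; a real magic database is much larger.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("GZIP", new byte[] {(byte) 0x1f, (byte) 0x8b});
        SIGNATURES.put("JPEG", new byte[] {(byte) 0xff, (byte) 0xd8, (byte) 0xff});
        SIGNATURES.put("PDF", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("WARC", "WARC/".getBytes(StandardCharsets.US_ASCII));
    }

    // Returns the first format whose signature is a prefix of the given bytes.
    static String identify(byte[] head) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(head, entry.getValue())) {
                return entry.getKey();
            }
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Usage: java MagicSniffSketch.java somefile [otherfile ...]
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] head;
            try (InputStream in = Files.newInputStream(Paths.get(arg))) {
                head = in.readNBytes(8); // only the leading bytes are needed
            }
            System.out.println(arg + ": " + identify(head));
        }
    }
}
```

A prefix check like this is cheap but shallow. It says nothing about whether the rest of the file is well-formed, which is the job of the format modules rather than the identifier.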
Congratulations to everyone who helped bring this release out!
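To give a feel for what running an XSLT transformation over XML output before display involves, here is a small sketch using the standard `javax.xml.transform` API that ships with Java. It is not the XSLDisplayer code; the class name, the sample XML, and the stylesheet are all made up for illustration and don't reflect JHOVE2's actual output schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: apply a stylesheet to XML output before showing it.
class XslDisplaySketch {

    // Made-up sample report; not real JHOVE2 XML.
    static final String SAMPLE_XML =
        "<report><file name=\"a.warc\" format=\"WARC\"/>"
      + "<file name=\"b.gz\" format=\"GZIP\"/></report>";

    // Made-up stylesheet that flattens the report to one text line per file.
    static final String SAMPLE_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/report\">"
      + "<xsl:for-each select=\"file\">"
      + "<xsl:value-of select=\"@name\"/><xsl:text>: </xsl:text>"
      + "<xsl:value-of select=\"@format\"/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the XML and returns the transformed text.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(SAMPLE_XML, SAMPLE_XSL));
    }
}
```

The appeal of this approach is that the tool only has to produce one XML format, and anyone who wants HTML, plain text, or a different XML vocabulary can supply a stylesheet instead of writing a new displayer.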
I’ve just gotten back from a “hackathon” at the University of Leeds, where about twenty specialists in digital preservation software got together and coded for two days. It was exciting to be with so many people in the field whom I’d previously known only through the Internet or hadn’t seen in years.
After an initial struggle with the university Wi-Fi, we coalesced into four groups to try to get demo-worthy projects done in the time available. There was a lot of interest in the Tika content analysis tool, with two of the projects being directly related to it. I was glad to learn that JHOVE2 is still alive, after a long period of seeming stagnation, and that a new release will be out soon.
It was evident from the discussions that once JHOVE2 becomes more widely used, there will be a lot of confusion between it and JHOVE, which are two entirely different products despite the similarity of their names. Should JHOVE become “JHOVE Classic”? Should JHOVE2 get a new name? Any thoughts on this?
The bit that I was working on was extending FITS to add Tika to its collection of tools. Spencer McEwen, an ex-colleague from Harvard, nicely headed up the effort; Michael (last name?) from York also participated, and we got occasional help from several people outside our team. The messiest issue we ran into was getting Tika to give us the name of a file’s format (in addition to its MIME type, which is easy); also, we found Tika’s metadata vocabulary rather haphazard. We worked past these problems, though, and were able to get a demo that showed (if you were willing to read through piles of XML output) that Tika was being used along with the other tools and extracting some metadata about JPEG and PDF files.
We worked from Spencer’s fork of Harvard’s GitHub FITS project, which may replace the Google Code repository. This got us into issues of multiple users working on the same project at the same time and resolving code collisions. Git is supposed to have excellent facilities for this sort of thing, but they clearly take some learning. I could “stash” a repository but then couldn’t figure out how to get it back.
It was very energizing just to sit down with people and throw together code without meetings and managers to get in the way, as if I were a college student again. Hopefully some long-lasting results will come of this. I wouldn’t mind doing something like this again, though a trip to England is expensive.
I’ll add links to other posts on the event as I find them:
Someone called Henry Gladney has filed a US patent application which could be used to troll digital archiving operations in an attempt to force them to pay money for what they’ve been doing all along. The patent is more readable than many I’ve seen, and it’s simply a composite of existing standard practices such as schema-based XML, digital authentication, public key authentication, and globally unique identifiers. The application openly states that its PIP (Preservation Information Package) “is also an Archival Information Package as described within the forthcoming ISO OAIS standard.”
I won’t say this is unpatentable; all kinds of absurd software patents have been granted. As far as I’m concerned, software patents are inherently absurd; every piece of software is a new invention, each one builds on techniques used in previously written software, and the pace at which this happens makes a patent’s lifetime of fourteen to twenty years an eternity. If the first person to use any software technique were consistently deemed to own it and others were required to get permission to reuse it, we’d never have ventured outside the caves of assembly language. That’s not the view Congress takes, though.
Patent law does say, however, that you can’t patent something that’s already been done; the relevant term is “prior art.” I can’t see anything in the application that’s new beyond the specific implementation. If it’s only that implementation which is patented, then archivists can and will simply use a different structure and not have to pay patent fees. If the application is granted and is used to get money out of anyone who creates archiving packages, there will be some nasty legal battles ahead, further demonstrating how counterproductive the software patent system is.
Update: There’s discussion on LinkedIn. Registration is required to comment, but not to just read.