Category Archives: News

JHOVE 1.10b3

JHOVE 1.10b3 is now available. This is the release candidate, and there won’t be any further changes beyond the version number designation unless a serious problem shows up.

JHOVE 1.10b2

I’ve put up JHOVE 1.10b2. It has a bit of optimization for the PDF module, though files with huge structure trees are still painfully slow.

JHOVE 1.10b1

I’ve put up a new beta version of JHOVE, 1.10b1, on SourceForge.

The major change since last time is the handling of structure trees in PDF files. This should keep JHOVE from hanging or running out of memory on certain PDF files, as it used to do. Please report any problems soon.

Using DROID with Java 7

It’s been a problem for a while that DROID 6 won’t run under Java 7. Matt Palmer has reported a simple fix for this, requiring only a change in pom.xml. Hopefully a release incorporating this change will appear soon.

Files that Last

Just in case you don’t follow the other channels in which I’ve been talking it up, Files that Last, my new e-book on digital preservation for “everygeek,” is now out. It covers issues of backup, archiving, file formats, and long-term planning. Right now it’s available from Smashwords, Kobo, and the iTunes Store. It hasn’t shown up on Amazon yet, but I expect it will soon.

I’m not exactly impartial on this, but I think you’ll find it a valuable resource for preservation planning on the personal level and for large and small organizations.

Slide show on FITS progress

Last Friday’s CURATEcamp AVpres was a collaboration between several physical sites, using Google Hangout and IRC. I’d been asked to give an online lightning presentation on my work on FITS, but I had a commitment on the 19th, so Andrea Goethals of the Harvard Library said she’d do one.

That, unfortunately, was the day the Tsarnaev brothers went on their spree in Cambridge, and Harvard was closed for the day. Paul Wheatley picked up the job on short notice and gave a presentation; the slide show is online. Paul suggested that people look at the work I’m putting in the GitHub repository once I finish at the end of April, but I wouldn’t mind if people tried it out now, while I’m still devoting my time to the project.

FFident

A simple but useful tool that’s part of FITS’s collection is FFident, written by Marco Schmidt. He apparently no longer maintains it, and its page disappeared from the Web, though a copy survives in the Internet Archive. It seemed like a good idea to make it more readily available, so I’ve put it, under its LGPL license, into a GitHub repository.

FITS uses its own copy of the source code, so this version really isn’t tested in its own right. I added a build.xml file and organized the code the way Eclipse likes it. I don’t have any plans to support it, but it’s there for anyone who wants to play with it.

JHOVE2 2.1.0

It’s been a long wait, but version 2.1.0 of JHOVE2 is now out! Sheila Morrissey writes:

Version 2.1.0 of JHOVE2 includes 3 new format modules, 1 new identifier module, 1 new displayer module, and several bug fixes and enhancements from the Issues page on the JHOVE2 wiki.

The new format modules included in this release are for the ARC, WARC, and GZIP formats.

The new Identifier module uses the UNIX “file” utility, giving JHOVE2 users the choice of employing either DROID or file for identification of file formats.

The new XSLDisplayer module (which extends XMLDisplayer) can do XSLT transformations on the XML output before displaying it.

This release also reflects a new milestone in the JHOVE2 development community. The new format and identifier modules are the contribution of developers from institutions (Bibliothèque Nationale de France and NETARKIVET.DK) beyond the original project participants (California Digital Library, Portico, and Stanford University Libraries).

The release notes are available on the project site.

Congratulations to everyone who helped bring this release out!

Hackathon at Leeds

I’ve just gotten back from a “hackathon” at the University of Leeds, where about twenty specialists in digital preservation software got together and coded for two days. It was exciting to be with so many people in the field whom I’d previously known only through the Internet or hadn’t seen in years.

After an initial struggle with the university Wi-Fi, we coalesced into four groups to try to get demo-worthy projects done in the time available. There was a lot of interest in the Tika content analysis tool, with two of the projects being directly related to it. I was glad to learn that JHOVE2 is still alive, after a long period of seeming stagnation, and that a new release will be out soon.

It was evident from the discussions that once JHOVE2 becomes more widely used, there will be a lot of confusion between it and JHOVE, which are entirely different products despite their similar names. Should JHOVE become “JHOVE Classic”? Should JHOVE2 get a new name? Any thoughts on this?

The part I was working on was extending FITS to add Tika to its collection of tools. Spencer McEwen, an ex-colleague from Harvard, nicely headed up the effort; Michael (last name?) from York also participated, and we got occasional help from several people outside our team. The messiest issue we ran into was getting Tika to give us the name of a file’s format (in addition to its MIME type, which is easy). We also found Tika’s metadata vocabulary rather haphazard. We worked past these problems, though, and were able to get a demo that showed (if you were willing to read through piles of XML output) that Tika was being used along with the other tools and was extracting some metadata from JPEG and PDF files.

We worked from Spencer’s fork of Harvard’s GitHub FITS project, which may replace the Google Code repository. This got us into the issues of multiple users working on the same project at the same time and resolving code collisions. Git is supposed to have excellent facilities for this sort of thing, but they clearly take some learning. I could “stash” my changes but then couldn’t figure out how to get them back.

It was very energizing just to sit down with people and throw together code without meetings and managers to get in the way, as if I were a college student again. Hopefully some long-lasting results will come of this. I wouldn’t mind doing something like this again, though a trip to England is expensive.

I’ll add links to other posts on the event as I find them: