
Have some JHOVE with your turkey

The trouble with being self-employed is you’re still at work on the holidays.

I’ve given up on the Kickstarter project for JHOVE, since the lack of any feedback on what features users want suggests insufficient interest, but I’m continuing with the bug fixes. There’s a new JHOVE alpha release on SourceForge, mostly to fix some silly mistakes. Please report any problems.

Here’s the new portion of the release notes:

GENERAL

  1. Jhove.java and JhoveView.java now get their version information from JhoveBase.java. Previously it was kept redundantly in three places, and sometimes they didn’t all get updated for a new release, as happened in 1.8.
  2. ConfigWriter was in the package edu.harvard.hul.ois.jhove.viewer, which caused a NoClassDefFoundError if non-GUI configurations didn’t include JhoveViewer.jar in the classpath. It’s been moved to edu.harvard.hul.ois.jhove.
  3. Added script packagejhove.sh and made md5.pl part of the CVS repository to make packaging for delivery easier.

TIFF MODULE

  1. TIFF files that had strip or tile offsets but no corresponding byte counts were throwing an exception all the way to the top level. Now they’re correctly being reported as invalid.
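
The kind of check involved can be sketched roughly as follows. This is only an illustration, not JHOVE's actual code: the StripCheck class and the map standing in for a parsed IFD are hypothetical. The tag numbers are the ones defined in TIFF 6.0.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for a parsed TIFF IFD (tag number -> values).
// JHOVE's real TIFF module is structured differently.
final class StripCheck {
    static final int STRIP_OFFSETS = 273;      // TIFF 6.0 StripOffsets
    static final int STRIP_BYTE_COUNTS = 279;  // TIFF 6.0 StripByteCounts
    static final int TILE_OFFSETS = 324;       // TIFF 6.0 TileOffsets
    static final int TILE_BYTE_COUNTS = 325;   // TIFF 6.0 TileByteCounts

    /** Returns null if offsets and byte counts are consistent, else a message for the report. */
    static String validate(Map<Integer, long[]> ifd) {
        String problem = checkPair(ifd, STRIP_OFFSETS, STRIP_BYTE_COUNTS, "strip");
        if (problem != null) {
            return problem;
        }
        return checkPair(ifd, TILE_OFFSETS, TILE_BYTE_COUNTS, "tile");
    }

    private static String checkPair(Map<Integer, long[]> ifd,
                                    int offsetsTag, int countsTag, String kind) {
        long[] offsets = ifd.get(offsetsTag);
        if (offsets == null) {
            return null; // no offsets of this kind, so nothing to check
        }
        long[] counts = ifd.get(countsTag);
        if (counts == null) {
            // Report the file as invalid instead of letting an exception escape.
            return "Invalid TIFF: " + kind + " offsets without " + kind + " byte counts";
        }
        if (counts.length != offsets.length) {
            return "Invalid TIFF: " + kind + " offsets and byte counts differ in length";
        }
        return null;
    }
}
```

The point of returning a message rather than throwing is that the caller can mark the file as invalid and keep going with the rest of its report.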

XML MODULE

  1. Cleaned up reporting of schemas. Added some small classes to replace the use of string arrays for information structures.

Notes on Friday’s Hackathon

The information on just how Friday’s CURATEcamp 24-hour worldwide file ID hackathon will work has been tricky for me to find, so here’s a summary for participants who read this blog:

Twitter: Hashtag #fileidhack
IRC: Server is irc.oftc.net, channel is #openarchives

The information is on the main wiki page for the hackathon, but it’s a little hard to spot with everything else that’s there.

See some of you there!

JHOVE 1.8

I hadn’t heard any bug reports since 1.8 beta, which hopefully means it’s working smoothly for everyone, so I’ve now released JHOVE 1.8. Let me know ASAP if anything’s broken.

Release notes:

GENERAL

1. If JHOVE doesn’t find a configuration file, it creates a default one.

2. Generics widely added to clean up the code.

3. build.xml files fixed to force compilation to Java 1.5.

4. Shell script “jhove” no longer makes you figure out where JAVA_HOME is.

PDF MODULE

1. Several errors in checking for PDF-A compliance were corrected. Aside from fixing some outright bugs, the Contents key for non-text Annotations is no longer checked, as its presence is only recommended and not required.

2. Improved code by Hökan Svenson is now used for finding the trailer.

TIFF MODULE

1. TIFF tag 700 (XMP) now accepts field type 7 (UNDEFINED) as well as 1 (BYTE), on the basis of Adobe’s XMP spec, part 3.

2. If compression scheme 6 is used in a file, an InfoMessage will report that the file uses deprecated compression.
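
As a rough illustration of the two TIFF rules above (a sketch only; the TiffTagRules class and its method names are made up for this example and are not JHOVE's API):

```java
// Hypothetical helper illustrating the two rules; not part of JHOVE.
final class TiffTagRules {
    static final int TYPE_BYTE = 1;            // TIFF field type BYTE
    static final int TYPE_UNDEFINED = 7;       // TIFF field type UNDEFINED
    static final int COMPRESSION_SCHEME_6 = 6; // old-style JPEG in TIFF 6.0

    /** Tag 700 (XMP) is accepted with field type BYTE or UNDEFINED. */
    static boolean xmpFieldTypeAccepted(int fieldType) {
        return fieldType == TYPE_BYTE || fieldType == TYPE_UNDEFINED;
    }

    /** Returns an informational note for compression scheme 6, or null for other schemes. */
    static String compressionNote(int compression) {
        if (compression == COMPRESSION_SCHEME_6) {
            return "File uses deprecated compression scheme 6";
        }
        return null;
    }
}
```

Note that the compression case produces only an informational message; it doesn't make the file invalid.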

WAVE MODULE

1. The Originator Reference property, found in the Broadcast Wave Extension (BEXT) chunk, is now reported.

JHOVE 1.8b2

Oops… The Java 7 compiler on Ubuntu won’t build backwards-compatible classes, so JHOVE 1.8b1 wouldn’t run on earlier versions of Java. JHOVE 1.8b2 should fix the problem.
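
One way to check which Java release a compiled class requires is to read the version field in the class file header: after the 0xCAFEBABE magic number come a two-byte minor version and a two-byte major version, where 49 means Java 5 and 51 means Java 7. Here is a minimal sketch; the ClassVersion class is hypothetical and has nothing to do with the JHOVE build itself.

```java
import java.nio.ByteBuffer;

// Hypothetical utility: inspects the first 8 bytes of a .class file.
final class ClassVersion {
    /** Returns the class file major version (49 = Java 5, 50 = Java 6, 51 = Java 7). */
    static int majorVersion(byte[] header) {
        if (header == null || header.length < 8) {
            throw new IllegalArgumentException("Need at least 8 bytes of class file header");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, like class files
        if (buf.getInt(0) != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file (bad magic number)");
        }
        // Bytes 4-5 hold the minor version, bytes 6-7 the major version.
        return buf.getShort(6) & 0xFFFF;
    }

    /** Maps a major version to the Java release that introduced it (49 -> 5, 51 -> 7). */
    static int requiredJavaRelease(int majorVersion) {
        return majorVersion - 44;
    }
}
```

A class whose major version is 51 won't load on a Java 5 or Java 6 runtime, which is the failure described above.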

“Just solve the problem”

Running concurrently with National Novel Writing Month (aka NaNoWriMo) is “Just Solve the Problem,” an effort to get lots of people to attack the “formats problem” for 30 days.

Here’s “the problem,” slightly expurgated to avoid triggering nannyware:

In the last couple centuries, we’ve created a number of self-encapsulated data sets, or “files”. Be they letters, programs, tapes, stamped foil, piano rolls, you name it. And while many of those data sets are self-evident, a ****-ton are not. They’re obscure. They’re weird. And worst of all, many of them are the vital link to scores of historical information.

First thought: That’s not a statement of a solvable problem. It’s a statement of a situation which gives rise to many different problems. Still, throwing in some of my efforts can lead to professional contacts and maybe even a paying contract, and it’s the kind of thing I’d be doing anyway, so I’ve signed up for the wiki.

Extra points to anyone who can write a novel about the formats problem in 30 days.

Online file ID hackathon

CURATEcamp and Open Planets Foundation will hold a 24-hour (possibly more, due to time zones) online hackathon on file identification on Friday, November 16. The announcement says:

24hour+ live hackathon event where multi-time zone teams work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.

Project proposals can be made by anyone.

We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).

JHOVE 1.8 beta

A beta version of JHOVE 1.8 is now available for testing. Please report any problems. New stuff:

  • If JHOVE doesn’t find a configuration file, it creates a default one.
  • Generics widely added to clean up the code.
  • Several errors in checking for PDF-A compliance were corrected. Aside from
    fixing some outright bugs, the Contents key for non-text Annotations is
    no longer checked, as its presence is only recommended and not required.
  • Improved code by Hökan Svenson is now used for finding the trailer.
  • TIFF tag 700 (XMP) now accepts field type 7 (UNDEFINED) as well as 1
    (BYTE), on the basis of Adobe’s XMP spec, part 3.
  • If compression scheme 6 is used in a file, an InfoMessage will report
    that the file uses deprecated compression.
  • In WAVE files the Originator Reference property, found in the Broadcast Wave Extension
    (BEXT) chunk, is now reported.

PDF/A-3

The latest version of PDF/A, a subset of PDF suitable for long-term archiving, is now available as ISO standard 19005-3:2012. According to the PDF/A Association Newsletter, “there is only one new feature with PDF/A-3, namely that any source format can be embedded in a PDF/A file.”

This strikes me as a really bad idea. The whole point of PDF/A is to restrict content to a known, self-contained set of options. The new version provides a back door that allows literally anything. The intent, according to the article, is to let archivists save documents in their original format as well as their PDF representation. Certainly saving the originals is a good archiving practice, but it should be done in an archival package, not in a PDF format designed for archiving.

Mission creep afflicts projects of all kinds, and this is a case in point.

Preservation in the geek mainstream

Digital preservation issues are gaining notice in the geek mainstream, the large body of people who are computer-savvy but don’t live in the library-archive niche. Today we have an article in The Register, “British library tracks rise and fall of file formats.” It cites the British Library’s Andy Jackson, supporting the view that file formats remain usable for many years, even if they’re no longer the latest thing.

The Register article is short but nicely done. It naturally skips over issues which Andy’s original article deals with, like just how you reliably determine the formats of files. What’s significant is that it shows the long-term usability of files isn’t just a concern of a few specialists.