JHOVE 1.10b1

I’ve put up a new beta version of JHOVE, 1.10b1, on SourceForge.

The major change since the previous version is in the handling of structure trees in PDF files; this should keep JHOVE from hanging or running out of memory on some PDF files, as it sometimes did before. Please report any problems soon.
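
Structure trees are a place where a parser can go astray: each structure element lists its children under /K and also points back to its parent through /P, and a damaged file can contain loops. The sketch below is a generic illustration of one common guard, an iterative walk that follows only child links and remembers what it has already visited. It isn't JHOVE's code and isn't necessarily what 1.10b1 does; the StructElem class is a hypothetical stand-in for parsed PDF objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class StructTreeWalker {

    /** Hypothetical stand-in for a parsed structure element dictionary. */
    public static final class StructElem {
        final String type;                               // like the /S entry
        final List<StructElem> kids = new ArrayList<>(); // like the /K entry
        StructElem parent;                               // like /P; the walker never follows it

        public StructElem(String type) {
            this.type = type;
        }

        public void addKid(StructElem kid) {
            kids.add(kid);
            kid.parent = this;
        }
    }

    /**
     * Walks the tree depth-first without recursion (so a deep tree can't
     * overflow the call stack) and visits each element at most once
     * (so a malformed loop can't make the walk run forever).
     */
    public static List<String> walk(StructElem root) {
        List<String> visitedTypes = new ArrayList<>();
        Set<StructElem> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<StructElem> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            StructElem elem = stack.pop();
            if (!seen.add(elem)) {
                continue; // already visited: skip it instead of looping
            }
            visitedTypes.add(elem.type);
            // Push children in reverse so they come off the stack in document order.
            for (int i = elem.kids.size() - 1; i >= 0; i--) {
                stack.push(elem.kids.get(i));
            }
        }
        return visitedTypes;
    }
}
```

Tracking identity rather than equality matters here, since two distinct elements with the same type are still separate nodes.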

A PDF question

A while back, I posted a question on superuser.com about a PDF issue that’s causing problems in JHOVE. So far it hasn’t gotten any answers, so I’m signal-boosting my own question here. Here’s what I asked:

The JHOVE parser for PDF, which I maintain, will sometimes find a non-dictionary object in a PDF’s Annots array. According to section 8.4.1 of the PDF spec, the Annots array holds “an array of annotation dictionaries.” In the case that I’m looking at right now, there’s a keyword of “Annot” instead of a dictionary. Is this an invalid PDF file, or is there a subtlety in the spec which I’ve overlooked?

Answering on superuser.com is best, so other people can see the answer, but if you prefer to answer here, I'll post or summarize any useful response over there, with attribution.
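
The check behind the question is simple in outline. As a minimal sketch, assume the Annots array has already been parsed into Java objects, with any indirect references resolved (annotations are often stored as indirect objects), dictionaries modeled as Map, and keywords or names as String. A validator could then flag the offending entries like this; the class and method names are hypothetical, not JHOVE's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AnnotsCheck {

    /**
     * Returns the positions of entries in an already-parsed Annots array
     * that are not dictionaries. Assumes (hypothetically) that dictionaries
     * are modeled as Map and that indirect references were resolved first.
     */
    public static List<Integer> nonDictionaryEntries(List<?> annots) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < annots.size(); i++) {
            if (!(annots.get(i) instanceof Map)) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Object> annots = List.of(
                Map.of("Type", "Annot", "Subtype", "Link"),
                "Annot"); // a stray keyword where a dictionary belongs
        System.out.println(nonDictionaryEntries(annots)); // prints [1]
    }
}
```

Whether such an entry should make the file invalid or merely earn a warning is exactly the open question; the sketch only finds it.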

The future of WebM

Yesterday I posted about the WebP still image format, expressing some skepticism about how easily it will catch on. Its companion format for video, WebM, may stand a better chance, though. Images aren't exciting any more; JPEG delivers photographs well enough, PNG does the same for line art, and there isn't a compelling reason to change. Video is still in flux, though, and its high bandwidth requirements mean there's a payoff for any improvement in compression and throughput. The long-running battle among HTML5 stakeholders over video shows that it's far from a settled area. Patents are a big issue; if you implement H.264, you have to pay patent license fees. Alternatives are attractive from both a technological and an economic standpoint.

With Google pushing WebM and owning YouTube, browser developers have a clear reason to support it. YouTube plans to use VP9, the new codec for WebM, once it's complete. I haven't seen details of the plan, but most likely YouTube will make the same video available in multiple formats and query the browser's capabilities to determine whether it can accept VP9. If the advantage is real and users who can get VP9 see fewer pauses in their videos, more browser makers will undoubtedly jump on the bandwagon.

An eye on WebP

Google has been promoting the WebP still image format for some time, and lately Facebook has added support for it. It's hard to displace the well-entrenched JPEG, but it could happen. WebP supports both lossy and lossless compression, and Google claims it compresses significantly better than PNG and JPEG. Google says it's free of patent restrictions, and the container is the familiar RIFF. The VP8 lossy format is documented in an IETF RFC, and a specification for the lossless format is also available.
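
Because WebP uses RIFF, recognizing a file takes only a look at its first 16 bytes: the ASCII tag RIFF, a four-byte little-endian size, the tag WEBP, and then the FourCC of the first chunk, which is "VP8 " (with a trailing space) for simple lossy files, "VP8L" for simple lossless files, and "VP8X" for the extended format. Here is a minimal Java sketch; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class WebpSniffer {

    /**
     * Returns the FourCC of the first chunk ("VP8 ", "VP8L", or "VP8X")
     * if the bytes start like a WebP file, or null otherwise.
     */
    public static String firstChunkType(byte[] data) {
        if (data.length < 16) {
            return null;
        }
        // Bytes 0-3: "RIFF"; 4-7: little-endian size; 8-11: "WEBP".
        if (!ascii(data, 0).equals("RIFF") || !ascii(data, 8).equals("WEBP")) {
            return null;
        }
        String fourCC = ascii(data, 12); // bytes 12-15: tag of the first chunk
        switch (fourCC) {
            case "VP8 ": // simple lossy
            case "VP8L": // simple lossless
            case "VP8X": // extended: metadata, alpha, animation
                return fourCC;
            default:
                return null;
        }
    }

    /** Decodes four bytes at the given offset as ASCII. */
    private static String ascii(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.US_ASCII);
    }
}
```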

The container spec supports XMP and Exif metadata. Canvas width and height can each be as large as 16,777,216 pixels, though their product is limited to 4,294,967,295 (2^32 - 1) pixels. As far as I can tell, it doesn't support tiling, so partial rendering of huge images in the style of JPEG2000 may not be practical.
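
The size limits follow from the layout of the VP8X chunk used by the extended format. After a flags byte and three reserved bytes, it stores the canvas width minus one and the height minus one as 24-bit little-endian integers, so each dimension tops out at 2^24. The flags byte also signals whether Exif (0x08) and XMP (0x04) chunks are present. This sketch assumes the VP8X chunk directly follows the 12-byte RIFF header, as the container spec requires for extended files; the class and field names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Vp8xInfo {
    public final long width;
    public final long height;
    public final boolean hasExif;
    public final boolean hasXmp;

    private Vp8xInfo(long width, long height, boolean hasExif, boolean hasXmp) {
        this.width = width;
        this.height = height;
        this.hasExif = hasExif;
        this.hasXmp = hasXmp;
    }

    /** Reads an unsigned 24-bit little-endian integer at the given offset. */
    private static int u24(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8 | (b[off + 2] & 0xFF) << 16;
    }

    private static String ascii(byte[] b, int off) {
        return new String(b, off, 4, StandardCharsets.US_ASCII);
    }

    /** Parses the VP8X chunk at offset 12 of an extended WebP file. */
    public static Vp8xInfo parse(byte[] file) {
        if (file.length < 30 || !ascii(file, 0).equals("RIFF")
                || !ascii(file, 8).equals("WEBP") || !ascii(file, 12).equals("VP8X")) {
            throw new IllegalArgumentException("not an extended (VP8X) WebP file");
        }
        // Bytes 16-19 hold the chunk size (10); the payload starts at byte 20.
        int flags = file[20] & 0xFF;       // feature flags
        // Bytes 21-23 are reserved.
        long width = 1L + u24(file, 24);   // stored as canvas width minus one
        long height = 1L + u24(file, 27);  // stored as canvas height minus one
        return new Vp8xInfo(width, height, (flags & 0x08) != 0, (flags & 0x04) != 0);
    }

    /** The container spec caps width * height at 2^32 - 1. */
    public boolean withinProductLimit() {
        return width * height <= 0xFFFFFFFFL;
    }
}
```

Using long for the dimensions keeps the product check from overflowing, since two maximal dimensions multiply to 2^48.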

Chrome, Opera, and Android 4.0 (Ice Cream Sandwich) support WebP, but not many other browsers do. Facebook's use of WebP images has drawn complaints from users whose browsers can't read the format. The Firefox development team is starting to warm to it but hasn't committed to anything yet; the Internet Explorer team hasn't even reached that point.

It's still early to make bets, but WebP increasingly bears watching. I've started a page of updates and errata for Files that Last, with some updated information on WebP. (When I wrote the book, I couldn't find the lossless spec.)

Using DROID with Java 7

It’s been a problem for a while that DROID 6 won’t run under Java 7. Matt Palmer has reported a simple fix for this, requiring only a change in pom.xml. Hopefully a release incorporating this change will appear soon.

Files that Last

Just in case you don’t follow the other channels in which I’ve been talking it up, Files that Last, my new e-book on digital preservation for “everygeek,” is now out. It covers issues of backup, archiving, file formats, and long-term planning. Right now it’s available from Smashwords, Kobo, and the iTunes Store. It hasn’t shown up on Amazon yet, but I expect it will soon.

I’m not exactly impartial on this, but I think you’ll find it a valuable resource for preservation planning on the personal level and for large and small organizations.

Slide show on FITS progress

Last Friday’s CURATEcamp AVpres was a collaboration between several physical sites, using Google Hangout and IRC. I’d been asked if I could do a lightning presentation online on my work on FITS, but I had a commitment on the 19th, so Andrea Goethals at the Harvard Library said she’d do one.

That, unfortunately, was the day the Tsarnaev brothers went on their spree in Cambridge, and Harvard was closed for the day. Paul Wheatley picked up the job on short notice and gave the presentation; the slide show is online. Paul suggested that people look at the work I'm putting in the GitHub repository once I finish at the end of April, but I wouldn't mind if people tried it out now, while I'm still devoting my time to the project.

FFident

A simple but useful tool in FITS's collection is FFident, written by Marco Schmidt. He apparently no longer maintains it, and its page disappeared from the Web, though a copy survives on the Internet Archive. It seemed like a good idea to make it more readily available, so I've put it, under its LGPL license, into a GitHub repository.

FITS uses its own copy of the source code, so this version hasn't really been tested in its own right. I added a build.xml file and organized the code the way Eclipse likes it. I don't have any plans to support it, but it's there for anyone who wants to play with it.
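
Tools in FFident's niche typically identify a format by matching characteristic byte signatures ("magic numbers") at the start of a file. As a rough illustration of that general technique, here is a minimal Java sketch with a handful of well-known signatures. It is not FFident's code or its signature table, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MagicSniffer {

    // A few well-known leading-byte signatures, checked in insertion order.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put("PNG", new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'});
        SIGNATURES.put("GIF", new byte[] {'G', 'I', 'F', '8'});
        SIGNATURES.put("JPEG", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        SIGNATURES.put("PDF", new byte[] {'%', 'P', 'D', 'F', '-'});
        SIGNATURES.put("TIFF (little-endian)", new byte[] {'I', 'I', 42, 0});
        SIGNATURES.put("TIFF (big-endian)", new byte[] {'M', 'M', 0, 42});
    }

    /** Returns the name of the first matching signature, or null if none match. */
    public static String identify(byte[] header) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            byte[] sig = entry.getValue();
            if (header.length < sig.length) {
                continue; // too short to contain this signature
            }
            boolean match = true;
            for (int i = 0; i < sig.length; i++) {
                if (header[i] != sig[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

Signature matching is fast and needs only the first few bytes, but it says nothing about whether the rest of the file is well-formed; that's the job of validators like JHOVE.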