Articles about JHOVE, such as Good GIF Hunting, grab my attention for obvious reasons. This article talks about false positive and negative results, and got me to thinking: What constitutes a “positive” result in file format validation? There are two ways to look at it:
- The default assumption is that the file is of a certain format, perhaps based on its extension, MIME type, or other metadata. The software sets out to see if it violates the format’s requirements. In that case, a positive result is that the file doesn’t conform to the requirements.
- The default assumption is that the file is just a collection of bytes. The software matches it against one or more sets of criteria. A positive result is that the file matches one of them.
An Open Preservation Foundation webinar, “Putting JHOVE to the acid test: A PDF test-set for well-formedness validation in JHOVE,” will be held on November 21, 10 AM GMT (that’s 11 AM in Central Europe and a ludicrous 5 AM or earlier in the US).
My venture into the Techno-Liberty blog didn’t work so well. In fact, I’m getting more views on this blog, in spite of not having posted in months, than I got on my best days on the other blog. So … I’m back.
JHOVE is still doing well too, thanks to excellent work by Carl Wilson and others at the Open Preservation Foundation. There will be an online hack day for JHOVE on April 27. The aim is to find ways to improve JHOVE by improving error reporting, collecting example files, and documenting the preservation impact of JHOVE validation issues. (I think that last one means “Why does McGath’s PDF module suck?” :)
The time listed is 8 AM-8 PM. I asked what time zone that is, and was told it means any and all, from New Zealand the long way around to Hawaii.
Last time I said I’d drop in and didn’t really manage to. This time I won’t make promises, but I’ll try to be around in some form. If nothing else, people can ask me questions about JHOVE in the comments.
Posted in News
Tagged JHOVE, software
I’ve just learned that the Open Preservation Foundation is hosting a JHOVE Online Hack Day on October 11. I’m flattered people are still interested in the work I started doing over a decade ago, though getting some paying work would be far more satisfying.
The Open Preservation Foundation has just announced JHOVE 1.14. The numbering is a bit odd. Version 1.12 never made it to release, and they seem to have skipped 1.13 entirely.
This includes three new modules: the PNG module, which I wrote on a weekend whim, and GZIP and WARC modules adapted from JHOVE2. The UTF-8 module now supports Unicode 7.0.
The release isn’t showing up yet on the OPF website, but I expect that will happen momentarily.
It’s nice to see that the code which I started working on over a decade ago is still alive and useful. Congratulations and thanks to Carl Wilson, who’s now its principal maintainer!
I’ve received an email reply from Becky McGuiness at Open Preservation Foundation to my query about JHOVE’s status. She says that VeraPDF has been taking all the development resources, as I suspected, but that work on JHOVE (in particular, fixing the expired installer) will resume soon.
Update: Here’s a response from Carl Wilson at OPF on the status of JHOVE. It says that the next version will jump from 1.12 to 1.14 (triskaidekaphobia?) and will include several new modules, including my PNG module.
I’ll second Carl’s call for institutions to become OPF supporters. As someone on Twitter said recently, open source software is “free, as in kittens.” It costs money to maintain it. Occasionally people support free software for the sheer love of it, but developers do need to earn a living.
Update 2: OPF reports that JHOVE installer has been fixed.
See this post for important updates.
In December, JHOVE 12.0 was very close to a release. Since then, next to nothing has happened. The installer for the beta version expired, and there’s been an update for that. A couple of pull requests have been merged. Otherwise — nothing.
I think what’s happened is that the Open Preservation Foundation’s very limited resources were pulled onto VeraPDF. That’s certainly a worthwhile endeavor, but it irks me that I handed support of JHOVE over to OPF only to see the ball dropped. I did some work on a PNG module a month ago and submitted a pull request; nothing’s happened since then.
I wouldn’t mind picking JHOVE up agin, but I’m going to be blunt about this: I’m done with working on it for free. If institutions that want JHOVE to be maintained really care about it, they should put up some money, whether it’s to OPF, to me, or to someone else. Open source software isn’t something that magically happens because people love to work without pay.
There’s now a JHOVE PNG module on my GitHub site. The relevant new classes are
com.mcgath.jhove.module.PngModule and everything in the package
com.mcgath.jhove.module.png. I could have continued from Lauri’s code as I mentioned in my previous post, but I like a more factored approach, so I continued with my own code, which has a separate class for each chunk type. Take a look at the top-level file FORKNOTES for what I’ve been doing.
It does a pretty decent job of validating files and extracting metadata now, but some chunk types are still ignored, and there are some design decisions on the extracted metadata that I’m not sure about yet. Also, JHOVE modules usually have a lot of metadata about themselves, and that’s not complete yet. If anyone wants to play with it, keeping in mind that it’s not stable code yet, please do and submit issue reports for bugs and suggestions.
A few days ago, I started writing a PNG module for JHOVE, partly to keep my Java skills up, partly to help me understand the PNG format. After a while I noticed there already is code for a PNG module and has been for a long time. I must have added it to SourceForge. According to a note in the code, Gian Uberto Lauri at Engineering Ingengeria Informatica S.p.a. created it in 2006. A good amount of work clearly went into it, but it won’t compile. It’s located in a non-source code directory (
extramodules/it/eng/jhove/module/png/PngModule.java), so I had to copy it to src/java to try it out.