My article on “File Format Analysis Tools for Archivists” is up on LWN.net.
The Open Preservation Foundation has just announced JHOVE 1.14. The numbering is a bit odd. Version 1.12 never made it to release, and they seem to have skipped 1.13 entirely.
This includes three new modules: the PNG module, which I wrote on a weekend whim, and GZIP and WARC modules adapted from JHOVE2. The UTF-8 module now supports Unicode 7.0.
The release isn’t showing up yet on the OPF website, but I expect that will happen momentarily.
It’s nice to see that the code which I started working on over a decade ago is still alive and useful. Congratulations and thanks to Carl Wilson, who’s now its principal maintainer!
I’ve received an email reply from Becky McGuinness at Open Preservation Foundation to my query about JHOVE’s status. She says that VeraPDF has been taking all the development resources, as I suspected, but that work on JHOVE (in particular, fixing the expired installer) will resume soon.
Update: Here’s a response from Carl Wilson at OPF on the status of JHOVE. It says that the next version will jump from 1.12 to 1.14 (triskaidekaphobia?) and will include several new modules, including my PNG module.
I’ll second Carl’s call for institutions to become OPF supporters. As someone on Twitter said recently, open source software is “free, as in kittens.” It costs money to maintain it. Occasionally people support free software for the sheer love of it, but developers do need to earn a living.
Update 2: OPF reports that the JHOVE installer has been fixed.
See this post for important updates.
In December, JHOVE 1.12 was very close to a release. Since then, next to nothing has happened. The installer for the beta version expired, and there’s been an update for that. A couple of pull requests have been merged. Otherwise, nothing.
I think what’s happened is that the Open Preservation Foundation’s very limited resources were pulled onto VeraPDF. That’s certainly a worthwhile endeavor, but it irks me that I handed support of JHOVE over to OPF only to see the ball dropped. I did some work on a PNG module a month ago and submitted a pull request; nothing’s happened since then.
I wouldn’t mind picking JHOVE up again, but I’m going to be blunt about this: I’m done with working on it for free. If institutions that want JHOVE to be maintained really care about it, they should put up some money, whether it’s to OPF, to me, or to someone else. Open source software isn’t something that magically happens because people love to work without pay.
There’s now a JHOVE PNG module on my GitHub site. The relevant new classes are com.mcgath.jhove.module.PngModule and everything in the package com.mcgath.jhove.module.png. I could have continued from Lauri’s code as I mentioned in my previous post, but I like a more factored approach, so I continued with my own code, which has a separate class for each chunk type. Take a look at the top-level file FORKNOTES for what I’ve been doing.
It does a pretty decent job of validating files and extracting metadata now, but some chunk types are still ignored, and there are some design decisions on the extracted metadata that I’m not sure about yet. Also, JHOVE modules usually have a lot of metadata about themselves, and that’s not complete yet. If anyone wants to play with it, keeping in mind that it’s not stable code yet, please do and submit issue reports for bugs and suggestions.
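For readers unfamiliar with the format, here is a rough sketch of what “a separate class for each chunk type” can look like. It is not the code from the repository: the names PngChunkSketch, ChunkHandler, IhdrHandler, and IendHandler are hypothetical, and only two chunk types are handled. It relies on the PNG layout of an 8-byte signature followed by chunks, each made of a 4-byte length, a 4-byte type, the data, and a CRC-32 computed over the type and data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

/** Illustrative sketch only; all names here are hypothetical, not JHOVE's. */
class PngChunkSketch {
    static final byte[] SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    /** One implementation per chunk type. */
    interface ChunkHandler {
        String describe(byte[] data) throws IOException;
    }

    /** IHDR: width, height, bit depth, color type, then three more one-byte fields. */
    static class IhdrHandler implements ChunkHandler {
        public String describe(byte[] data) throws IOException {
            if (data.length != 13) throw new IOException("IHDR must be 13 bytes");
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int width = in.readInt();
            int height = in.readInt();
            int bitDepth = in.readUnsignedByte();
            int colorType = in.readUnsignedByte();
            return "IHDR " + width + "x" + height + " depth=" + bitDepth + " colorType=" + colorType;
        }
    }

    /** IEND: marks the end of the file and carries no data. */
    static class IendHandler implements ChunkHandler {
        public String describe(byte[] data) throws IOException {
            if (data.length != 0) throw new IOException("IEND must be empty");
            return "IEND";
        }
    }

    static final Map<String, ChunkHandler> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("IHDR", new IhdrHandler());
        HANDLERS.put("IEND", new IendHandler());
    }

    /** Walks the chunks, checks each CRC, and dispatches on the chunk type.
     *  Chunk ordering rules are not checked in this sketch. */
    static List<String> readChunks(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] sig = new byte[8];
        in.readFully(sig);
        if (!Arrays.equals(sig, SIGNATURE)) throw new IOException("Not a PNG file");
        List<String> report = new ArrayList<>();
        while (in.available() > 0) {
            int length = in.readInt();
            byte[] type = new byte[4];
            in.readFully(type);
            if (length < 0 || length > in.available()) throw new IOException("Bad chunk length");
            byte[] data = new byte[length];
            in.readFully(data);
            long storedCrc = in.readInt() & 0xFFFFFFFFL;
            CRC32 crc = new CRC32();
            crc.update(type);
            crc.update(data);
            if (crc.getValue() != storedCrc) throw new IOException("Bad CRC");
            String name = new String(type, StandardCharsets.US_ASCII);
            ChunkHandler handler = HANDLERS.get(name);
            report.add(handler != null ? handler.describe(data) : name + " (ignored)");
        }
        return report;
    }

    /** Serializes one chunk: length, type, data, CRC over type and data. */
    static byte[] chunk(String type, byte[] data) throws IOException {
        byte[] typeBytes = type.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(typeBytes);
        crc.update(data);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(data.length);
        out.write(typeBytes);
        out.write(data);
        out.writeInt((int) crc.getValue());
        return buf.toByteArray();
    }

    /** Signature + IHDR + IEND only: enough to exercise the chunk walk,
     *  but not a complete image, since there is no IDAT chunk. */
    static byte[] sampleFile() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(SIGNATURE);
        buf.write(chunk("IHDR", new byte[] {0, 0, 0, 1, 0, 0, 0, 1, 8, 0, 0, 0, 0}));
        buf.write(chunk("IEND", new byte[0]));
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (String line : readChunks(sampleFile())) System.out.println(line);
    }
}
```

Supporting another chunk type in a layout like this means writing one more handler class and registering it in the map; anything unregistered is reported as ignored rather than rejected.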
A few days ago, I started writing a PNG module for JHOVE, partly to keep my Java skills up, partly to help me understand the PNG format. After a while I noticed that code for a PNG module already exists and has for a long time. I must have added it to SourceForge. According to a note in the code, Gian Uberto Lauri at Engineering Ingegneria Informatica S.p.A. created it in 2006. A good amount of work clearly went into it, but it won’t compile. It’s located in a non-source-code directory (extramodules/it/eng/jhove/module/png/PngModule.java), so I had to copy it to src/java to try it out.
The British Library’s Digital Preservation Team has issued a report on WAV Format Preservation Assessment. It cites the broad adoption of WAV and its extension BWF (Broadcast Wave Format) as a positive for preservation purposes and offers only a few cautions. I’m flattered by the recommendation, “Wherever possible and appropriate to the workflow, submitted content should be validated using JHOVE.”
Due to a misunderstanding of mine, there wasn’t a free preview lecture with my course on file format identification tools, even though the promotion video said there was. I’ve rectified that, and the introductory lecture is now available for viewing …
Udemy does strange things with course pricing. They’ve put my course on file format identification tools on sale for $10, through January 11. This is the same price as my introductory coupon code, which expired last night.
The only problem is that when you sign up on Udemy without a coupon code, I only get to keep half the money. If you use a coupon code, I keep almost all of it. So in self-defense, I am declaring a price war against myself. With the coupon code PRICEWAR01, you can get the course for just nine dollars! If you use the code, I keep more money at that price than at Udemy’s $10 price, so please use the code.
The code expires January 11, the same day as Udemy’s sale.
Several people have already signed up for my Udemy course on file, ExifTool, DROID, JHOVE, and Tika. It looks as if most of them have taken advantage of the discount code INTRO1 to get it at just $10 and are planning to take it later on. This makes complete sense, since the code is good just till the end of this year. If you’re taking the course, feel free to start a discussion or ask questions; I’ll answer them to the best of my ability. If you’re a specialist in one of these tools and would like to see how I’m teaching it, I’ll offer you a free pass if your credentials are good.