My article on “File Format Analysis Tools for Archivists” is up on LWN.net.
Tag Archives: softwareLink
The Open Preservation Foundation has just announced JHOVE 1.14. The numbering is a bit odd. Version 1.12 never made it to release, and they seem to have skipped 1.13 entirely.
This includes three new modules: the PNG module, which I wrote on a weekend whim, and GZIP and WARC modules adapted from JHOVE2. The UTF-8 module now supports Unicode 7.0.
The release isn’t showing up yet on the OPF website, but I expect that will happen momentarily.
It’s nice to see that the code which I started working on over a decade ago is still alive and useful. Congratulations and thanks to Carl Wilson, who’s now its principal maintainer!
A lot of software design clearly aims not at providing the best experience to the user, but at providing the most impressive demo. Apple does this all the time, or at least that’s the only explanation I can think of for their design decisions. Getting people to applaud in amazement doesn’t get loyal customers if the product is terrible in everyday use, though.
My current Garmin car GPS device is a good example of this. To enter an address, you enter first the street number, then the street, and finally the locality and state together. This sounds very natural, much better than my old device where you started with the state and worked down to the street number. The trouble is that when you use the new device, you find that auto-completion is useless.
I’ve received an email reply from Becky McGuiness at Open Preservation Foundation to my query about JHOVE’s status. She says that VeraPDF has been taking all the development resources, as I suspected, but that work on JHOVE (in particular, fixing the expired installer) will resume soon.
Update: Here’s a response from Carl Wilson at OPF on the status of JHOVE. It says that the next version will jump from 1.12 to 1.14 (triskaidekaphobia?) and will include several new modules, including my PNG module.
I’ll second Carl’s call for institutions to become OPF supporters. As someone on Twitter said recently, open source software is “free, as in kittens.” It costs money to maintain it. Occasionally people support free software for the sheer love of it, but developers do need to earn a living.
Update 2: OPF reports that JHOVE installer has been fixed.
iTunes is horrible and keeps getting worse. The current version has come down with dyslexia; it can’t even play my files in order. On top of that, it supports a poor range of file formats, knowing nothing about popular open formats like FLAC and Ogg Vorbis. QuickTime Player has a saner user interface but the same format limitations. If you want to play music in those formats, you need to look for other software. I’ve just grabbed Vox for OS X, and it handles those files nicely.
It’s not an iTunes replacement, even if all you want to do is play music that’s stored on your computer. You can import your iTunes library, but you can’t view the contents of your playlists (which it calls “collections”) or select items from them. What it does let you do, though, is play FLAC, AAC, ALAC (Apple Lossless), Ogg, MP3, and APE files.
See this post for important updates.
In December, JHOVE 12.0 was very close to a release. Since then, next to nothing has happened. The installer for the beta version expired, and there’s been an update for that. A couple of pull requests have been merged. Otherwise — nothing.
I think what’s happened is that the Open Preservation Foundation’s very limited resources were pulled onto VeraPDF. That’s certainly a worthwhile endeavor, but it irks me that I handed support of JHOVE over to OPF only to see the ball dropped. I did some work on a PNG module a month ago and submitted a pull request; nothing’s happened since then.
I wouldn’t mind picking JHOVE up agin, but I’m going to be blunt about this: I’m done with working on it for free. If institutions that want JHOVE to be maintained really care about it, they should put up some money, whether it’s to OPF, to me, or to someone else. Open source software isn’t something that magically happens because people love to work without pay.
If you look for Java libraries to support specific file formats, you’ll soon come upon the gloomy graveyard of Java APIs. Sun and Oracle have a history of devising nice packages for reading and writing different kinds of files, only to abandon their maintenance. You can still find pages for them, and it takes a close look to figure out that they aren’t supported any more.
Java Advanced Imaging (JAI) was nice in its time. It still has a page on Oracle’s website, but the latest “what’s new” item is dated 2007. The page brags about customer success stories as if it were still usable code. I’ve tried working with it. It’s out of sync with the current com.sun classes, and I got only limited use out of it. In its time it was a good way to read and write image files.
If you’re working with audio files,
javax.sound looks more encouraging. Its API is listed with Java 8. The class
java.sound.sampled.AudioSystem supports reading and writing of audio files. I can’t find a list of the supported formats.
Java does reliably support some formats. Its handling of text encodings is versatile, and
java.util.zip handles ZIP and GZIP.
Third-party code can come to the rescue. For reading and writing PDF, Apache PDFBox looks like the best bet. You can use Apache Tika with lots of formats, if you just need to extract metadata. Another alternative is to use ImageMagick, but it runs natively rather than under the JVM, so you have to invoke it with
exec calls. im4java and JMagick can save some of the tedium. There are open source Java libraries for reading and writing specific file formats. Some may be good, some not.
If you need to deal with the guts of file formats in Java, you’ll usually have to find some good third-party code or start writing your own.
There’s now a JHOVE PNG module on my GitHub site. The relevant new classes are
com.mcgath.jhove.module.PngModule and everything in the package
com.mcgath.jhove.module.png. I could have continued from Lauri’s code as I mentioned in my previous post, but I like a more factored approach, so I continued with my own code, which has a separate class for each chunk type. Take a look at the top-level file FORKNOTES for what I’ve been doing.
It does a pretty decent job of validating files and extracting metadata now, but some chunk types are still ignored, and there are some design decisions on the extracted metadata that I’m not sure about yet. Also, JHOVE modules usually have a lot of metadata about themselves, and that’s not complete yet. If anyone wants to play with it, keeping in mind that it’s not stable code yet, please do and submit issue reports for bugs and suggestions.
A few days ago, I started writing a PNG module for JHOVE, partly to keep my Java skills up, partly to help me understand the PNG format. After a while I noticed there already is code for a PNG module and has been for a long time. I must have added it to SourceForge. According to a note in the code, Gian Uberto Lauri at Engineering Ingengeria Informatica S.p.a. created it in 2006. A good amount of work clearly went into it, but it won’t compile. It’s located in a non-source code directory (
extramodules/it/eng/jhove/module/png/PngModule.java), so I had to copy it to src/java to try it out.