Tag Archives: software

Designing to the demo is a mistake

A lot of software design clearly aims not at providing the best experience to the user, but at providing the most impressive demo. Apple does this all the time, or at least that’s the only explanation I can think of for their design decisions. Getting people to applaud in amazement doesn’t get loyal customers if the product is terrible in everyday use, though.

My current Garmin car GPS device is a good example of this. To enter an address, you enter first the street number, then the street, and finally the locality and state together. This sounds very natural, much better than my old device where you started with the state and worked down to the street number. The trouble is that when you use the new device, you find that auto-completion is useless.
Continue reading

Update on JHOVE

I Aten't DeadI’ve received an email reply from Becky McGuiness at Open Preservation Foundation to my query about JHOVE’s status. She says that VeraPDF has been taking all the development resources, as I suspected, but that work on JHOVE (in particular, fixing the expired installer) will resume soon.

Update: Here’s a response from Carl Wilson at OPF on the status of JHOVE. It says that the next version will jump from 1.12 to 1.14 (triskaidekaphobia?) and will include several new modules, including my PNG module.

I’ll second Carl’s call for institutions to become OPF supporters. As someone on Twitter said recently, open source software is “free, as in kittens.” It costs money to maintain it. Occasionally people support free software for the sheer love of it, but developers do need to earn a living.

Update 2: OPF reports that JHOVE installer has been fixed.

Want FLAC on your Mac? Try Vox

Vox application windowiTunes is horrible and keeps getting worse. The current version has come down with dyslexia; it can’t even play my files in order. On top of that, it supports a poor range of file formats, knowing nothing about popular open formats like FLAC and Ogg Vorbis. QuickTime Player has a saner user interface but the same format limitations. If you want to play music in those formats, you need to look for other software. I’ve just grabbed Vox for OS X, and it handles those files nicely.

It’s not an iTunes replacement, even if all you want to do is play music that’s stored on your computer. You can import your iTunes library, but you can’t view the contents of your playlists (which it calls “collections”) or select items from them. What it does let you do, though, is play FLAC, AAC, ALAC (Apple Lossless), Ogg, MP3, and APE files.
Continue reading

Is JHOVE dead in the water again?

See this post for important updates.

JHOVE logoIn December, JHOVE 12.0 was very close to a release. Since then, next to nothing has happened. The installer for the beta version expired, and there’s been an update for that. A couple of pull requests have been merged. Otherwise — nothing.

I think what’s happened is that the Open Preservation Foundation’s very limited resources were pulled onto VeraPDF. That’s certainly a worthwhile endeavor, but it irks me that I handed support of JHOVE over to OPF only to see the ball dropped. I did some work on a PNG module a month ago and submitted a pull request; nothing’s happened since then.

I wouldn’t mind picking JHOVE up agin, but I’m going to be blunt about this: I’m done with working on it for free. If institutions that want JHOVE to be maintained really care about it, they should put up some money, whether it’s to OPF, to me, or to someone else. Open source software isn’t something that magically happens because people love to work without pay.

The Java file format API graveyard

If you look for Java libraries to support specific file formats, you’ll soon come upon the gloomy graveyard of Java APIs. Sun and Oracle have a history of devising nice packages for reading and writing different kinds of files, only to abandon their maintenance. You can still find pages for them, and it takes a close look to figure out that they aren’t supported any more.

Java Advanced Imaging (JAI) was nice in its time. It still has a page on Oracle’s website, but the latest “what’s new” item is dated 2007. The page brags about customer success stories as if it were still usable code. I’ve tried working with it. It’s out of sync with the current com.sun classes, and I got only limited use out of it. In its time it was a good way to read and write image files.

Java Media Framework (JMF) runs on a 166 MHz Pentium or 160 MHz PowerPC. The downloaded jars are dated May 1, 2003. It had a nice list of supported formats.

If you’re working with audio files, javax.sound looks more encouraging. Its API is listed with Java 8. The class java.sound.sampled.AudioSystem supports reading and writing of audio files. I can’t find a list of the supported formats.

Java does reliably support some formats. Its handling of text encodings is versatile, and java.util.zip handles ZIP and GZIP.

Third-party code can come to the rescue. For reading and writing PDF, Apache PDFBox looks like the best bet. You can use Apache Tika with lots of formats, if you just need to extract metadata. Another alternative is to use ImageMagick, but it runs natively rather than under the JVM, so you have to invoke it with exec calls. im4java and JMagick can save some of the tedium. There are open source Java libraries for reading and writing specific file formats. Some may be good, some not.

If you need to deal with the guts of file formats in Java, you’ll usually have to find some good third-party code or start writing your own.

JHOVE PNG module, progress report

There’s now a JHOVE PNG module on my GitHub site. The relevant new classes are com.mcgath.jhove.module.PngModule and everything in the package com.mcgath.jhove.module.png. I could have continued from Lauri’s code as I mentioned in my previous post, but I like a more factored approach, so I continued with my own code, which has a separate class for each chunk type. Take a look at the top-level file FORKNOTES for what I’ve been doing.

It does a pretty decent job of validating files and extracting metadata now, but some chunk types are still ignored, and there are some design decisions on the extracted metadata that I’m not sure about yet. Also, JHOVE modules usually have a lot of metadata about themselves, and that’s not complete yet. If anyone wants to play with it, keeping in mind that it’s not stable code yet, please do and submit issue reports for bugs and suggestions.


A few days ago, I started writing a PNG module for JHOVE, partly to keep my Java skills up, partly to help me understand the PNG format. After a while I noticed there already is code for a PNG module and has been for a long time. I must have added it to SourceForge. According to a note in the code, Gian Uberto Lauri at Engineering Ingengeria Informatica S.p.a. created it in 2006. A good amount of work clearly went into it, but it won’t compile. It’s located in a non-source code directory (extramodules/it/eng/jhove/module/png/PngModule.java), so I had to copy it to src/java to try it out.
Continue reading


The UK National Archive quietly released DROID 6.2 this month. I noticed only because of some mentions on Twitter. The file dates indicate the update was released on February 16. Here’s the new portion of the changelog:
Continue reading

File fuzzing

Recently I came across the term “fuzzing” for intentionally damaging files to test the software that reads them. Most of the material I’ve found doesn’t provide a useful introduction; they assume that if you know the term, you already understand something about it. One good article is “Fuzzing — Mutation vs. Generation” on the Infosec website. According to that article, fuzzing denotes the response to file changes rather than the changes themselves, but I’m seeing the term used mostly in the latter sense.
Continue reading

fixit_tiff, a TIFF repair tool

The Sächsische Landesbibliothek – Staats- und Universitätsbibliothek Dresden (Saxon State and University Library Dresden), which somehow gets abbreviated to SLUB, has developed a tool for working with TIFF files in digital preservation. fixit_tiff is a command line utility, written in C, which can do some repairs on defective TIFF files. The focus appears to be on correcting common errors, not on repairing corrupted files. A blog post from July (in German) indicates it can do configurable validation using a simple query language.

It’s available under the same license as Libtiff. Just what is that license? The only thing I can find is a very outdated “Use and Copyright” statement, which is on a page so old it warns about patents on LZW compression. It’s available for free, anyway.