Expanding JHOVE

There are some significant improvements I’d like to make to JHOVE, to bring it up to date and improve its availability. The most important of these is to bring the PDF module up to version 1.7 (ISO 32000). I’ve done two releases since leaving Harvard, and download figures and feedback show there’s still significant interest. I’ve done that much to enhance my reputation, but I need to earn a living, and the PDF upgrade would be two or three weeks of solid work, so it has to be contingent on my getting compensated.

The new features that look most important for JHOVE’s usual purposes include enhancements to Tagged PDF, Unicode file name references, new markup features, and dictionaries that support 3D artwork. I’m guessing there’s also interest in supporting PDF/A-2 and PDF/A-3.

There’s probably no one institution right now willing to pay for the effort, but if it were possible to get a few hundred dollars from each of several institutions, it could work. One thought, of course, is Kickstarter, but I don’t know if institutional money can be funneled that way. Maybe it can and I just don’t know it. Alternatively, I can write application letters to the appropriate places, saying that I’ll do it if the amount pledged exceeds a certain threshold. No doubt it would take months for this to happen, but it seems possible in principle.

The idea could even be generalized to a library consortium for funding useful open source projects in return for support. Yes, I’m obviously thinking of how I can make money and I’m not apologizing for it. But the idea really could be useful. The SQLite consortium is a similar approach, focused on a single product.

Does anyone know of similar funding models that have worked, or alternative approaches that would achieve the result? Does the idea make sense or am I just blowing hot air?

3 responses to “Expanding JHOVE”

  1. The National Digital Stewardship Alliance has a Kickstarter group that might be of interest.

  2. Hi Gary,

    You mentioned support for PDF/A. Some weeks ago I found out that the (open source) PDFBox library now includes a preflight component that checks conformance with PDF/A-1. See the link below:


    I already ran it on the ISARTOR test suite, and it appeared to pick up on all the deviations from PDF/A-1 in that data set. I’m planning to do some more elaborate testing as part of the SCAPE work (late November/early December, if all goes well).

    Perhaps it might be possible to integrate this into JHOVE, which would result in a two-pass validation in case of PDF/A: first use your native JHOVE code to validate conformance to the PDF (1.0 … 1.7) specs; then use the Preflight code to check for any features that aren’t permitted in PDF/A.

    Recalling that you mentioned the difficulties of integrating the PDF/A profile test into the ‘regular’ PDF conformity check, this just might give you the best of both worlds (since PDFBox doesn’t do the ‘regular’ PDF validation that JHOVE does).

    From what I understand, the PDFBox Preflight code only supports PDF/A-1 at the moment, but since the differences between A-1, A-2, and A-3 aren’t huge (those between A-2 and A-3 are actually tiny), extending that code might not be that difficult after all.

    I don’t know how solid the Preflight code is, and to what extent it is compatible with JHOVE (I’ve only used the CLI), but it might be worth a look.
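
    The two-pass idea described above could be wired together roughly as in the sketch below. Everything in it is an assumption made for illustration: `Pass`, `Report`, and `validate` are hypothetical names, not part of JHOVE or PDFBox, and the two stand-in passes in `main` only mimic where the real JHOVE PDF module and the Preflight check would be plugged in. Skipping the profile pass when the base pass fails is also just one possible design choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

public class TwoPassPdfCheck {

    /** Hypothetical abstraction over one validation pass; not a JHOVE or PDFBox type. */
    interface Pass {
        /** Returns a list of error messages; an empty list means the pass succeeded. */
        List<String> check(byte[] pdfBytes);
    }

    /** Combined outcome of the base PDF pass and the PDF/A profile pass. */
    static final class Report {
        final List<String> baseErrors;
        final List<String> profileErrors;
        final boolean profilePassRan;

        Report(List<String> baseErrors, List<String> profileErrors, boolean profilePassRan) {
            this.baseErrors = baseErrors;
            this.profileErrors = profileErrors;
            this.profilePassRan = profilePassRan;
        }

        /** True only if both passes ran and neither reported an error. */
        boolean conformsToProfile() {
            return profilePassRan && baseErrors.isEmpty() && profileErrors.isEmpty();
        }
    }

    /**
     * Pass 1 checks conformance to the base PDF spec. Pass 2 (the profile check)
     * runs only if pass 1 found no errors; that ordering is an assumption of this sketch.
     */
    static Report validate(byte[] pdfBytes, Pass basePass, Pass profilePass) {
        List<String> baseErrors = basePass.check(pdfBytes);
        if (!baseErrors.isEmpty()) {
            return new Report(baseErrors, Collections.<String>emptyList(), false);
        }
        return new Report(baseErrors, profilePass.check(pdfBytes), true);
    }

    public static void main(String[] args) {
        // Stand-in for the base PDF validation: only looks for the "%PDF-" header.
        Pass headerCheck = bytes -> {
            String head = new String(bytes, 0, Math.min(5, bytes.length),
                    StandardCharsets.US_ASCII);
            return head.equals("%PDF-")
                    ? Collections.<String>emptyList()
                    : Collections.singletonList("missing %PDF- header");
        };
        // Stand-in for the PDF/A profile check: accepts everything.
        Pass profileCheck = bytes -> Collections.<String>emptyList();

        byte[] sample = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        Report report = validate(sample, headerCheck, profileCheck);
        System.out.println("Base errors: " + report.baseErrors);
        System.out.println("Profile errors: " + report.profileErrors);
        System.out.println("Conforms to profile: " + report.conformsToProfile());
    }
}
```

    Keeping the two passes behind one small interface means the profile check stays out of the regular PDF conformity code, which is the separation the suggestion is after.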