An Open Preservation Foundation webinar, “Putting JHOVE to the acid test: A PDF test-set for well-formedness validation in JHOVE,” will be held on November 21, 10 AM GMT (that’s 11 AM in Central Europe and a ludicrous 5 AM or earlier in the US).
Tag Archives: Open Preservation Foundation
I’ve just learned that the Open Preservation Foundation is hosting a JHOVE Online Hack Day on October 11. I’m flattered people are still interested in the work I started doing over a decade ago, though getting some paying work would be far more satisfying.
The Open Preservation Foundation has just announced JHOVE 1.14. The numbering is a bit odd. Version 1.12 never made it to release, and they seem to have skipped 1.13 entirely.
This includes three new modules: the PNG module, which I wrote on a weekend whim, and GZIP and WARC modules adapted from JHOVE2. The UTF-8 module now supports Unicode 7.0.
The release isn’t showing up yet on the OPF website, but I expect that will happen momentarily.
It’s nice to see that the code which I started working on over a decade ago is still alive and useful. Congratulations and thanks to Carl Wilson, who’s now its principal maintainer!
I’ve received an email reply from Becky McGuinness at Open Preservation Foundation to my query about JHOVE’s status. She says that VeraPDF has been taking all the development resources, as I suspected, but that work on JHOVE (in particular, fixing the expired installer) will resume soon.
Update: Here’s a response from Carl Wilson at OPF on the status of JHOVE. It says that the next version will jump from 1.12 to 1.14 (triskaidekaphobia?) and will include several new modules, including my PNG module.
I’ll second Carl’s call for institutions to become OPF supporters. As someone on Twitter said recently, open source software is “free, as in kittens.” It costs money to maintain it. Occasionally people support free software for the sheer love of it, but developers do need to earn a living.
Update 2: OPF reports that the JHOVE installer has been fixed.
See this post for important updates.
In December, JHOVE 1.12 was very close to a release. Since then, next to nothing has happened. The installer for the beta version expired, and there’s been an update for that. A couple of pull requests have been merged. Otherwise, nothing.
I think what’s happened is that the Open Preservation Foundation’s very limited resources were pulled onto VeraPDF. That’s certainly a worthwhile endeavor, but it irks me that I handed support of JHOVE over to OPF only to see the ball dropped. I did some work on a PNG module a month ago and submitted a pull request; nothing’s happened since then.
I wouldn’t mind picking JHOVE up again, but I’m going to be blunt about this: I’m done with working on it for free. If institutions that want JHOVE to be maintained really care about it, they should put up some money, whether it’s to OPF, to me, or to someone else. Open source software isn’t something that magically happens because people love to work without pay.
JHOVE 1.12 will be the first release of JHOVE that I had no significant role in, but I’m still glad to see that the beta release is now available. I’ve downloaded it, run the installer (yes, there’s now an installer!), and then launched JHOVE without having to edit any configuration files by hand! That’s a huge advance by itself. Nice work by Carl Wilson and everyone else at the Open Preservation Foundation. It’s now built with Maven, and I’m sure that the building process is much better than the clunky old one.
With Ed Fay’s departure, the Open Preservation Foundation is seeking a new executive director. The location is “negotiable,” but I’m sure major centers of digital preservation activity in Europe will get top consideration.
Yesterday the Open Preservation Foundation held a webinar on JHOVE, presented by Carl Wilson. I was really impressed by the progress he’s made there, and any rumors of JHOVE’s death (including ones I may have contributed to) have been greatly exaggerated.
The big changes include reorganizing the code under Maven and making installation more straightforward. These are both badly needed changes. I never had the opportunity to do them at Harvard, and when I took the code over for a while after leaving there, I focused on fixing bugs rather than fixing the design.
In my comments during the webinar, I pointed out the importance of Stephen Abrams’ contribution, which a lot of people don’t remember. I didn’t create JHOVE; he did. The core application and design principles were already in place when I entered the project. OPF will, I’m sure, give him the credit he deserves.
The Open Preservation Foundation (formerly the Open Planets Foundation) is launching a new model for funding the development of preservation-related software. Quoting from the announcement:
‘Over the last year the OPF has established a solid foundation for ensuring the sustainability of digital preservation technology and knowledge,’ explains Dr. Ross King, Chair of the OPF Board. ‘Our new strategic plan was introduced in November 2014 along with community surveys to establish the current state of the art. We developed our annual plan in consultation with our members and added JHOVE to our growing software portfolio. The new membership and software supporter models are the next steps towards realising our vision and mission.’ …
The software supporter model allows organisations to support individual digital preservation software products and ensure their ongoing sustainability and maintenance. We are launching support for JHOVE based on its broad adoption and need for active stewardship. It is also a component in several leading commercial digital preservation solutions. While it remains fully open source, supporters can steer our stewardship and maintenance activities and receive varying levels of technical support and training.
I have a selfish personal interest in spreading the word. At the moment, I’m between contracts, and I wouldn’t mind getting some funding from OPF to resume development work on JHOVE. I know its code base better than anyone else, I worked on it without pay as a hobby for a year or so after leaving Harvard, and I’d enjoy working on it some more if I could just get some compensation. This is possible, but only if there’s support from outside.
US libraries have been rather insular in their approach to software development. They’ll use free software if it’s available, but they aren’t inclined to help fund it. If they could each set aside some money for this purpose, it would help assure the continued creation and maintenance of the open source software which is important to their mission.
How about it, Harvard?
The VeraPDF Consortium has announced that it has begun the prototyping phase for a new open-source validator of PDF/A. This is a piece of the PREFORMA (PREservation FORMAts) project; other branches will cover TIFF and audio-visual formats. Participants in VeraPDF are the Open Preservation Foundation, the PDF Association, the Digital Preservation Coalition, Dual Lab, and Keep Solutions.
Documents are available, including a functional and technical specification. It aims at being the “definitive” tool for determining if a PDF document conforms to the ISO 19005 requirements. It will separate the PDF parser from the higher-level validation, so a different parser can be plugged in.
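To make the parser/validator split concrete, here is a minimal sketch of the idea in Python. It is not taken from the VeraPDF specification: `ParsedPdf`, `PdfParser`, `StubParser`, and `validate_pdfa` are all hypothetical names, and the two rules checked (no encryption, fonts embedded) are only a tiny illustrative subset of what PDF/A requires.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Set


@dataclass
class ParsedPdf:
    """Hypothetical parser-neutral summary of a PDF (an assumption, not VeraPDF's model)."""
    encrypted: bool
    used_fonts: Set[str] = field(default_factory=set)
    embedded_fonts: Set[str] = field(default_factory=set)


class PdfParser(Protocol):
    """Anything with a parse() method returning ParsedPdf can be plugged in."""
    def parse(self, data: bytes) -> ParsedPdf: ...


class StubParser:
    """Stand-in parser that returns a canned result instead of reading real PDF bytes."""
    def __init__(self, result: ParsedPdf) -> None:
        self._result = result

    def parse(self, data: bytes) -> ParsedPdf:
        return self._result


def validate_pdfa(parser: PdfParser, data: bytes) -> List[str]:
    """Apply a few illustrative PDF/A-style rules to whatever the parser reports."""
    doc = parser.parse(data)
    problems: List[str] = []
    if doc.encrypted:
        problems.append("encryption is not permitted")
    # Report every font that is used but not embedded, in a stable order.
    for name in sorted(doc.used_fonts - doc.embedded_fonts):
        problems.append(f"font not embedded: {name}")
    return problems
```

The validator only ever sees `ParsedPdf`, so swapping parsers means supplying another object with a `parse` method. The catch is that the validator can only check what that intermediate model exposes.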
Validating PDF is tough. In JHOVE, I designed PDF/A validation as an afterthought to the PDF module. PDF/A requirements affect every level of the implementation, so that approach led to problems that never entirely went away. Making PDF/A validation a primary goal should help greatly, but having it sit on top of and independent from the PDF parser may introduce another form of the same problem.
PDF files can include components which are outside the spec, and PDF/A-3 permits their inclusion. This means that really validating PDF/A-3 is an open-ended task. Even in the earlier version of PDF/A, not everything that can be put into a file is covered by the PDF specification per se. The specification addresses this by providing for extensibility; add-ons can address these aspects as desired. In particular, the core validator won’t attempt thorough validation of fonts.
A Metadata Fixer will not just check documents for conformance, but in some cases will perform the necessary fixes to make a file PDF/A compliant.
JHOVE ignores the content streams, focusing only on the structure, so it could report a thoroughly broken file as well-formed and valid. JHOVE2 doesn’t list PDF in its modules. Analyzing the content stream data is a big task. In general, the project looks hugely ambitious, and not every ambitious digital preservation project has reached a successful end. If this one does, it will be a wonderful accomplishment.