With Ed Fay’s departure, the Open Preservation Foundation is seeking a new executive director. The location is “negotiable,” but I’m sure major centers of digital preservation activity in Europe will get top consideration.
Yesterday the Open Preservation Foundation held a webinar on JHOVE, presented by Carl Wilson. I was really impressed by the progress he’s made there, and any rumors of JHOVE’s death (including ones I may have contributed to) have been greatly exaggerated.
The big changes include reorganizing the code under Maven and making installation more straightforward. These are both badly needed changes. I never had the opportunity to do them at Harvard, and when I took the code over for a while after leaving there, I focused on fixing bugs rather than fixing the design.
In my comments during the webinar, I pointed out the importance of Stephen Abrams’ contribution, which a lot of people don’t remember. I didn’t create JHOVE; he did. The core application and design principles were already in place when I entered the project. OPF will, I’m sure, give him the credit he deserves.
The Open Preservation Foundation (formerly the Open Planets Foundation) is launching a new model for funding the development of preservation-related software. Quoting from the announcement:
‘Over the last year the OPF has established a solid foundation for ensuring the sustainability of digital preservation technology and knowledge,’ explains Dr. Ross King, Chair of the OPF Board. ‘Our new strategic plan was introduced in November 2014 along with community surveys to establish the current state of the art. We developed our annual plan in consultation with our members and added JHOVE to our growing software portfolio. The new membership and software supporter models are the next steps towards realising our vision and mission.’ …
The software supporter model allows organisations to support individual digital preservation software products and ensure their ongoing sustainability and maintenance. We are launching support for JHOVE based on its broad adoption and need for active stewardship. It is also a component in several leading commercial digital preservation solutions. While it remains fully open source, supporters can steer our stewardship and maintenance activities and receive varying levels of technical support and training.
I have a selfish personal interest in spreading the word. At the moment, I’m between contracts, and I wouldn’t mind getting some funding from OPF to resume development work on JHOVE. I know its code base better than anyone else. I worked on it without pay as a hobby for a year or so after leaving Harvard, and I’d enjoy working on it some more if I could just get some compensation. This is possible, but only if there’s support from outside.
US libraries have been rather insular in their approach to software development. They’ll use free software if it’s available, but they aren’t inclined to help fund it. If they could each set aside some money for this purpose, it would help assure the continued creation and maintenance of the open source software which is important to their mission.
How about it, Harvard?
The VeraPDF Consortium has announced that it has begun the prototyping phase for a new open-source validator of PDF/A. This is a piece of the PREFORMA (PREservation FORMAts) project; other branches will cover TIFF and audio-visual formats. Participants in VeraPDF are the Open Preservation Foundation, the PDF Association, the Digital Preservation Coalition, Dual Lab, and Keep Solutions.
Documents are available, including a functional and technical specification. It aims at being the “definitive” tool for determining if a PDF document conforms to the ISO 19005 requirements. It will separate the PDF parser from the higher-level validation, so a different parser can be plugged in.
Validating PDF is tough. In JHOVE, I designed PDF/A validation as an afterthought to the PDF module. PDF/A requirements affect every level of the implementation, so that approach led to problems that never entirely went away. Making PDF/A validation a primary goal should help greatly, but having it sit on top of, and independent of, the PDF parser may introduce another form of the same problem.
PDF files can include components which are outside the spec, and PDF/A-3 permits their inclusion. This means that really validating PDF/A-3 is an open-ended task. Even in the earlier version of PDF/A, not everything that can be put into a file is covered by the PDF specification per se. The VeraPDF specification addresses this by providing for extensibility; add-ons can address these aspects as desired. In particular, the core validator won’t attempt thorough validation of fonts.
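To make the idea of a pluggable parser under an extensible validator a bit more concrete, here is a minimal sketch in Python. It is not VeraPDF's design or API; every name in it (`ParsedDocument`, `PdfParser`, `Validator`, `StubParser`, `check_metadata`) is hypothetical, and the single metadata rule is only a placeholder for real PDF/A checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class ParsedDocument:
    """Parser-neutral view of a PDF. The fields are illustrative assumptions."""
    header_version: str
    has_xmp_metadata: bool


class PdfParser(Protocol):
    """Any object with this parse() method can be plugged into the validator."""
    def parse(self, data: bytes) -> ParsedDocument: ...


# A check inspects the parsed document and returns a list of problem messages.
Check = Callable[[ParsedDocument], List[str]]


def check_metadata(doc: ParsedDocument) -> List[str]:
    """Placeholder rule standing in for a real PDF/A requirement."""
    return [] if doc.has_xmp_metadata else ["missing XMP metadata"]


class Validator:
    """Runs checks on whatever the injected parser produces."""

    def __init__(self, parser: PdfParser, checks: List[Check]):
        self.parser = parser
        self.checks = list(checks)

    def add_check(self, check: Check) -> None:
        # Add-ons (for example, a font checker) register extra checks here.
        self.checks.append(check)

    def validate(self, data: bytes) -> List[str]:
        doc = self.parser.parse(data)
        problems: List[str] = []
        for check in self.checks:
            problems.extend(check(doc))
        return problems


class StubParser:
    """Stand-in that only reads the header line; not a real PDF parser."""

    def parse(self, data: bytes) -> ParsedDocument:
        first_line = data.split(b"\n", 1)[0].decode("ascii", "replace")
        version = first_line[5:] if first_line.startswith("%PDF-") else ""
        return ParsedDocument(
            header_version=version,
            has_xmp_metadata=b"<x:xmpmeta" in data,
        )


if __name__ == "__main__":
    validator = Validator(StubParser(), [check_metadata])
    print(validator.validate(b"%PDF-1.4\n"))
```

The sketch also shows where the worry above comes from: the checks can only see what `ParsedDocument` exposes, so any PDF/A rule that depends on low-level detail forces the parser interface to grow.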
A Metadata Fixer will not just check documents for conformance, but in some cases will perform the necessary fixes to make a file PDF/A compliant.
JHOVE ignores the content streams, focusing only on the structure, so it could report a thoroughly broken file as well-formed and valid. JHOVE2 doesn’t list PDF in its modules. Analyzing the content stream data is a big task. In general, the project looks hugely ambitious, and not every ambitious digital preservation project has reached a successful end. If this one does, it will be a wonderful accomplishment.
There’s a brief piece by Becky McGuinness in D-Lib Magazine on the handover of JHOVE to the Open Preservation Foundation. It describes upcoming plans:
During March the OPF will be working with Portico and other members to complete the transfer of JHOVE to its new home. The latest code base will move to the OPF GitHub organisation page. All documentation, source code files, and full change history will be publicly available, alongside other OPF supported software projects, including JHOVE2, Fido, jpylyzer, and the SCAPE project tools.
Once the initial transfer is complete the next step will be to set up a continuous integration (CI) build on Travis, an online CI service that’s integrated with GitHub. This will ensure that all new code submissions are built and tested publicly and automatically, including all external pull requests. This will establish a firm foundation for future changes based on agile software development best practises.
With this foundation in place OPF will test and incorporate JHOVE fixes from the community into the new project. Several OPF members have already developed fixes based on their own automated processes, which they will be releasing to the community. Working as a group these fixes will be examined and tested methodically. At the same time the OPF’s priority will be to produce a Debian package that can be downloaded and installed from its apt repository.
Following the transfer OPF will gather requirements from its members and the wider digital preservation community. The OPF aims to establish and oversee a self-sustaining community around JHOVE that will take these requirements forward, carrying out roadmapping exercises for future development and maintenance. The OPF will also assess the need for specific training and support material for JHOVE such as documentation and online or virtual machine demonstrators.
It’s great to know that JHOVE still has a future a decade after its birth, but what boggles my mind is the next sentence:
The transfer of JHOVE is supported by its creators and developers: Harvard Library, Portico, the California Digital Library, and Gary McGath.
I never expected to see my name in a list like that!
Over a decade ago, the Harvard University Libraries took me on as a contractor to start work on JHOVE. Later I became an employee, and JHOVE formed an important part of my work. When I left Harvard, I asked for continued “custody” of JHOVE so I could keep maintaining it, and got it. Over time it became less of a priority for me; there’s only so much time you can devote to something when no one’s paying you to do it.
After a long period of discussion, the Open Preservation Foundation (formerly the Open Planets Foundation) has taken up support of JHOVE. In addition to picking up the open source software, it has resolved copyright issues in the documentation with Harvard. These really concerned boilerplate that no one intended to enforce, but it was still an issue that had to be cleared.
Stephen Abrams, who was the real father of JHOVE, said, “We’re very pleased to see this transfer of stewardship responsibility for JHOVE to the OPF. It will ensure the continuity of maintenance, enhancement, and availability between the original JHOVE system and its successor JHOVE2, both key infrastructural components in wide use throughout the digital library community.”
JHOVE2 was originally supposed to be the successor to JHOVE, but it didn’t get enough funding to cover all the formats that JHOVE covers, so both are used, and the confusion of names is unfortunate. OPF has both in its portfolio. It doesn’t appear to have forked JHOVE to its GitHub repository yet, but I’m sure that’s coming soon.
My own GitHub repository for JHOVE should now be considered archival. Go forth and prosper, JHOVE.
The Open Planets Foundation is now the Open Preservation Foundation. This name change reflects its function; the old name grew out of the Planets project and never really made sense.
For the present, it’s still found on the Internet as openplanetsfoundation.org.