An Open Preservation Foundation webinar, “Putting JHOVE to the acid test: A PDF test-set for well-formedness validation in JHOVE,” will be held on November 21, 10 AM GMT (that’s 11 AM in Central Europe and a ludicrous 5 AM or earlier in the US).
The email I received says:
In digital preservation we rely on automation and tools for some of our most crucial tasks like format identification and validation. One of the most widespread tools for format validation is JHOVE. As there is no other validation tool which checks the well-formedness and validity of plain PDF files, the quality and infallibility [seriously?] of JHOVE’s PDF module is especially important. Unfortunately, as there are no other tools, checking JHOVE’s PDF skills via tool-benchmarking is not an option.
As of today, there is not a ground-truth data set which can be used to understand and test PDF validation at the structural level. In this webinar, we present a corpus of light-weight files designed to test the validation criteria of JHOVE’s PDF module against well-formedness. Based on the findings of checking this data set with JHOVE, we give an overview of how reliable JHOVE is, what works well and where still are inconsistencies.
… Places will be allocated on a first come, first served basis. The recording and slides will be available to OPF members afterwards.