PDF 2.0

The ISO specification for PDF 2.0 is now out. It’s known as ISO 32000-2. As usual for ISO, it costs an insane 198 Swiss francs, which is roughly the same amount in dollars. In the past, Adobe has made PDF specifications available for free on its own site, but I can’t find this one on adobe.com. Adobe’s PDF reference page still covers only PDF 1.7.

ISO has to pay its bills somehow, but it’s not good if the standard is priced so high that only specialists can afford it. I don’t intend to spend $200 to be able to update JHOVE without pay. With some digging, I’ve found the spec in an incomplete, eyes-only format. All I can view is the table of contents. There are links to all sections, but they don’t work. I’m not sure whether that’s a problem with my browser or intentional. In any case, it’s a big step backward for PDF as an open standard. I hope Adobe will eventually put the spec on its website.

Adobe’s blog describes some of the features of PDF 2.0. They include more sensible and stronger encryption, enhanced tagging, and improved color management.

The PDF 1.7 spec wasn’t a very close match to the standard as Adobe practiced it. If you’ve worked with JHOVE, you know its PDF module has been riddled with bugs. I don’t want to shake off blame for software bugs, but in many cases it just wasn’t clear what data types were permitted. The philosophy of JHOVE has been to follow the written spec strictly, and it doesn’t behave well when files use the “wrong” data types. It may report that the file isn’t well-formed, or it may throw an exception. Reports indicate that PDF 2.0’s specification cleans up many of these issues. This directly matters only to a handful of developers, but it’s easier to create PDF-compliant software when the spec matches the files that Adobe considers valid.

The PDF Association reports:

“PDF is complex, and extremely flexible at a very low level. Most of the work that’s gone into the document is to clarify and correct the existing text. PDF 2.0 resolves many longstanding ambiguities, updates to external references and generally provides a tighter set of rules to enhance and ease interoperability.

“As a result, all those who read it and express their view say the same thing: the text of PDF 2.0 is significantly clearer and more consistent in terms of describing the various features, requirements and considerations in PDF technology. Once PDF 2.0 is published it’s reasonable to predict an immediate benefit in terms of training developers to write PDF software, add support for PDF features, and more.”

A better spec — if you can afford it.

The PDF Association has made example files available. Version 2.0 is mostly, but not entirely, compatible with 1.x. There’s discussion on Stack Overflow. UTF-8 text will be a serious compatibility issue.
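To make the UTF-8 problem concrete, here is a rough Python sketch, not anything taken from JHOVE. As I understand it, PDF 2.0 lets a text string be UTF-8 if it starts with the byte-order mark EF BB BF, in addition to the UTF-16BE (marked by FE FF) and PDFDocEncoding strings that 1.x already allowed. The function name and the `allow_utf8` flag are my own inventions, and I approximate PDFDocEncoding with Latin-1, which is not exact.

```python
import codecs


def decode_text_string(raw: bytes, allow_utf8: bool = True) -> str:
    """Decode the bytes of a PDF text string (hypothetical helper).

    allow_utf8=False imitates a reader written against PDF 1.x only.
    """
    # UTF-16BE with a leading FE FF marker is valid in both 1.x and 2.0.
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be")
    # UTF-8 with a leading EF BB BF marker is the PDF 2.0 addition.
    if allow_utf8 and raw.startswith(codecs.BOM_UTF8):
        return raw[3:].decode("utf-8")
    # Assumption: Latin-1 stands in for PDFDocEncoding here. The two
    # differ for some byte values, so this is only an approximation.
    return raw.decode("latin-1")


title = b"\xef\xbb\xbfcaf\xc3\xa9"  # a UTF-8 text string with its marker
print(decode_text_string(title))                    # 2.0-aware reader
print(decode_text_string(title, allow_utf8=False))  # 1.x-only reader
```

The 1.x-only path doesn’t fail. It quietly produces mojibake, with the marker bytes showing up as stray characters at the front of the string, and that is what makes this kind of incompatibility easy to miss.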

It’s likely that many PDF readers will see the new major revision in the version number and assume they can’t handle it. This includes JHOVE. There will be a lot of software to update.
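The check involved is tiny, which is part of why it’s so easy to hard-code. Below is a minimal Python sketch of reading the version from the `%PDF-M.N` header line and refusing anything above a supported major version. The function names, the return labels, and the choice to search the first 1024 bytes are my assumptions for illustration, not JHOVE’s actual logic, and the sketch ignores any version given elsewhere in the file.

```python
import re

# Header line looks like "%PDF-1.7" or "%PDF-2.0".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the header, or None if none is found."""
    # Assumption: tolerate leading junk by searching the first 1024 bytes.
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def classify(data: bytes, max_major: int = 1) -> str:
    """Label a file by whether its major version is one we claim to handle."""
    version = pdf_header_version(data)
    if version is None:
        return "not-pdf"
    major, _minor = version
    if major > max_major:
        return "unsupported-major"
    return "supported"


print(classify(b"%PDF-1.7\n"))               # supported
print(classify(b"%PDF-2.0\n"))               # unsupported-major
print(classify(b"%PDF-2.0\n", max_major=2))  # supported
```

Raising `max_major` is the easy part. The real work is making sure the rest of the software copes with what a 2.0 file may contain.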

One response to “PDF 2.0”

  1. Paywalling a standards document would seem to defeat the purpose of encouraging people to use consistent standards, which is the supposed purpose of ISO. Organizations like them belong in the dustbin of history along with academic publishers like Elsevier.