PDF 2.0

The ISO specification for PDF 2.0 is now out. It’s known as ISO 32000-2. As usual for ISO, it costs an insane 198 Swiss francs, which is roughly the same amount in dollars. In the past, Adobe has made PDF specifications available for free on its own site, but I can’t find this one on adobe.com. Its PDF reference page still covers only PDF 1.7.

ISO has to pay its bills somehow, but it’s not good if the standard is priced so high that only specialists can afford it. I don’t intend to spend $200 to be able to update JHOVE without pay. With some digging, I’ve found the spec in an incomplete, eyes-only format. All I can view is the table of contents. There are links to all sections, but they don’t work. I’m not sure whether that’s a problem with my browser or intentional. In any case, it’s a big step backward for PDF as an open standard. I hope Adobe will eventually put the spec on its website.

The strange state of “open” format documentation

You can legally download many specs from the ISO site, including the Open Document Format (ODF) specs. ISO lets you print out a copy. However, if you photocopy or scan it, or if you make it available on your organization’s LAN, the Copyright Police will haul you away.

I’ve seen similar restrictions elsewhere. They’re variations on the idea that you can download a document for free, but you can’t share it after you download it. It’s bizarre.

Maybe they’re trying to keep people from going into competition by selling copies of their standards. Since ISO also sells what it publishes, the goal would make sense. In fact, there’s a specific and emphatic prohibition on sales. But why they should care whether copies are printed or photocopied is beyond me.

Usually the answer to questions like these is “lawyers who are disconnected from reality.” If there’s a better answer, I’d love to hear it.

Work on TI/A quietly continues

The work on the TI/A project, to define an archive-friendly version of TIFF analogous to PDF/A, is still going, even though hardly any of it is publicly visible. Marisa Pfister’s leaving the project, along with her position at the University of Basel, was unfortunate, but others are continuing a detailed analysis of TIFF files used at various archives. This will help them to learn what features and tags are used.

The target of March 1, 2016, for a submission to ISO has been crossed out, and nothing has replaced it, but we can still hope it will happen.

PDF 2.0

As most people who read this blog know, the development of PDF didn’t end with the ISO 32000 (aka PDF 1.7) specification. Adobe has published three extensions to the specification. These aren’t called PDF 1.8, but they amount to a post-ISO version.
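For readers who want to see how a file can say "1.7" and still go beyond it, here is a rough sketch. It rests on two things in ISO 32000-1: the `%PDF-1.7` header comment, and the catalog's Extensions dictionary, where Adobe's entries use the `ADBE` prefix with `BaseVersion` and `ExtensionLevel` keys. The function name `describe_pdf_version` is my own invention, and the byte scan is deliberately naive: it assumes the catalog isn't tucked inside a compressed object stream and that the two keys appear in this order. A real tool would parse the object structure instead.

```python
import re

# Header comment at the start of a PDF file, e.g. b"%PDF-1.7".
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")

# An Adobe extension entry, e.g. /ADBE << /BaseVersion /1.7 /ExtensionLevel 3 >>
# Assumption: keys appear in this order and the dictionary is not compressed.
EXT_RE = re.compile(
    rb"/ADBE\s*<<\s*/BaseVersion\s*/(\d\.\d)\s*/ExtensionLevel\s*(\d+)\s*>>"
)


def describe_pdf_version(data: bytes):
    """Return (header_version, adobe_extension_level) from raw PDF bytes.

    Either element is None when the corresponding marker is not found.
    """
    header = HEADER_RE.search(data[:1024])
    version = None
    if header:
        version = f"{header.group(1).decode()}.{header.group(2).decode()}"
    ext = EXT_RE.search(data)
    level = int(ext.group(2)) if ext else None
    return version, level


# Two made-up fragments: both claim PDF 1.7, only one declares an extension.
plain = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
extended = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /Extensions "
    b"<< /ADBE << /BaseVersion /1.7 /ExtensionLevel 3 >> >> >>\nendobj\n"
)
print(describe_pdf_version(plain))
print(describe_pdf_version(extended))
```

The point is only that the header version by itself doesn't tell you whether a file relies on post-ISO features.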

The ISO TC 171/SC 2 technical committee is working on what will be called PDF 2.0. The jump in major revision number reflects the change in how releases are being managed but doesn’t seem to portend huge changes in the format. PDF is no longer just an Adobe product, though the company is still heavily involved in the spec’s continued development. According to the PDF Association, the biggest task right now is removing ambiguities. The specification’s language will shift from describing conforming readers and writers to describing a valid file. This certainly sounds like an improvement. The article mentions that several sections have been completely rewritten and reorganized. What’s interesting is that their chapter numbers have all been incremented by 4 over the PDF 1.7 specification. We can wonder what the four new chapters are.

Leonard Rosenthol gave a presentation on PDF 2.0 in 2013.

As with many complicated projects, PDF 2.0 has fallen behind its original schedule, which expected publication in 2013. The current target for publication is the middle of 2016.


The latest version of PDF/A, a subset of PDF suitable for long-term archiving, is now available as ISO standard 19005-3:2012. According to the PDF/A Association Newsletter, “there is only one new feature with PDF/A-3, namely that any source format can be embedded in a PDF/A file.”

This strikes me as a really bad idea. The whole point of PDF/A is to restrict content to a known, self-contained set of options. The new version provides a back door that allows literally anything. The intent, according to the article, is to let archivists save documents in their original format as well as their PDF representation. Certainly saving the originals is a good archiving practice, but it should be done in an archival package, not in a PDF format designed for archiving.

Mission creep afflicts projects of all kinds, and this is a case in point.

PDF 1.7 and beyond

A paradox from Euan Cochrane: PDF 1.7 may follow the ISO standard, but not all PDF 1.7 files follow the same standard.

PDF/A-2 ratified

This time it’s from the PDF/A Competence Center, so I’m pretty sure it’s real: On November 30, the committee for ISO 19005 met in Ottawa and ratified Part 2 of ISO 19005, aka PDF/A. PDF/A is a restricted profile for PDF which is designed to guarantee long-term usability of conforming files.

The previous version, PDF/A-1, was based on PDF 1.4. This is based on ISO 32000-1, which is equivalent to PDF 1.7. Valid PDF/A-1 files are also valid under PDF/A-2.

ISO 19005-1:2005, or PDF/A-1, is available for purchase from ISO, but as of this writing the new one, which presumably will be ISO 19005-2:2010, isn’t being offered online yet.

I can’t make any promises about when JHOVE will support PDF/A-2, if ever. Any work I do on it is on my own time. Of course, if someone else wants to run with it, the source is there and I can answer questions.
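In the meantime, telling which part of PDF/A a file claims to follow is much easier than validating it. PDF/A files state their part and conformance level in XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below just scans raw bytes for those properties. The function name `claimed_pdfa_level` is made up for illustration, it assumes the metadata stream is stored unfiltered, and it reports only what the file claims. It says nothing about whether the claim is true, and it isn't how JHOVE does things.

```python
import re

# XMP can serialize a property as an attribute (pdfaid:part="2")
# or as an element (<pdfaid:part>2</pdfaid:part>); accept both forms.
PART_RE = re.compile(rb"pdfaid:part(?:=[\"']|>)\s*(\d)")
CONF_RE = re.compile(rb"pdfaid:conformance(?:=[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes):
    """Return a label like 'PDF/A-2b' if the bytes carry a PDF/A claim, else None."""
    part = PART_RE.search(data)
    if part is None:
        return None
    conformance = CONF_RE.search(data)
    suffix = conformance.group(1).decode().lower() if conformance else ""
    return f"PDF/A-{part.group(1).decode()}{suffix}"


# Made-up metadata fragments in both serializations.
as_attributes = b'<rdf:Description pdfaid:part="2" pdfaid:conformance="B"/>'
as_elements = (
    b"<pdfaid:part>1</pdfaid:part>"
    b"<pdfaid:conformance>A</pdfaid:conformance>"
)
print(claimed_pdfa_level(as_attributes))
print(claimed_pdfa_level(as_elements))
```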

ZIP standardization

The ZIP format is widely used, both by itself and as part of other widely used formats such as ODF, yet it’s never been standardized. Caroline Arms of the Library of Congress has informed the JHOVE2 list that there’s a new study group under ISO/IEC JTC1 SC34 WG1, which is looking into the standardization of ZIP. There is a wiki for this study as well as a mailing list archive.

Membership in the group requires going through the appropriate national standards group.
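To make the "ZIP inside other formats" point concrete, here is a small sketch using Python's standard `zipfile` module. An ODF package is a ZIP archive whose first entry is a file named `mimetype`, stored without compression. The helper names `build_minimal_package` and `read_package_mimetype` are hypothetical, and the package built here is a stand-in for illustration, not a valid ODF document.

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_minimal_package() -> bytes:
    """Build a tiny ODF-like ZIP package in memory (not a valid document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # ODF expects "mimetype" as the first entry, stored uncompressed.
        package.writestr("mimetype", ODT_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        # Placeholder content; a real document has much more in here.
        package.writestr("content.xml", "<placeholder/>",
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def read_package_mimetype(data: bytes):
    """Return the declared media type if data is a ZIP with a mimetype entry."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return None
    with zipfile.ZipFile(stream) as package:
        if "mimetype" not in package.namelist():
            return None
        return package.read("mimetype").decode("ascii")


print(read_package_mimetype(build_minimal_package()))
print(read_package_mimetype(b"plain text, not a ZIP archive at all"))
```

Any format built this way inherits whatever ambiguities the ZIP format has, which is one reason a formal standard would matter.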

Catching up

Here are a few of the news items I mentioned recently on the old blog, for your convenience:

  • A workshop on JHOVE2 will be held after the conclusion of iPres 2009 in San Francisco, on October 7, 2009. This will include, for the first time, a presentation of the prototype code.
  • JPEG XR, formerly known as Microsoft HD Photo, is now an international standard, as reported in a JPEG press release.
  • JHOVE 1.4 is now available on SourceForge. The main change is that PDF/A compliance is more accurately identified than before, and is based on the final standard rather than a draft.