The PDF/A controversy

Is PDF/A a good archival format? Many institutions use it, but it has problems which are inherent in PDF. PDF/A-3, which allows files of any format to be embedded, has lost some of the standard's focus. A format which can be a container for any kind of content isn’t great for digital preservation.

An article by Marco Klindt of the Zuse Institute Berlin takes a strong position against its suitability, with the title “PDF/A considered harmful for digital preservation.” Carl Wilson at the Open Preservation Foundation has added his own thoughts with “PDF/A and Long Term Preservation.”

Most of the problems Klindt cites come from the fact that PDF is primarily a way to give a document a consistent visual appearance, not to give it a structure. Many PDF files have an internal organization that’s very different from the obvious reading order. Tagged PDF (which PDF/A subsumes) tries to add structure, but there’s no guarantee of how well it will work for any given file. Tagging that is added to an existing file after the fact isn’t always useful. Klindt says, “Converting ‘normal’ PDFs to PDF/A a-level conformance automatically is not advisable as a lot of information may already be lost during the creation process of the document.”

Converting a file to PDF may lose information. Klindt gives the example of converting a spreadsheet to PDF, where the spreadsheet’s internal representation of numbers may have more precision than what’s displayed. It’s a tradeoff; the spreadsheet format might have greater preservation risks. However, he concedes that “there is no viable alternative to PDF as a universal digital container of everything that can be flattened to printed pages.”
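The spreadsheet case is easy to illustrate. The following is only a minimal sketch, not something taken from Klindt's article: the cell value and the two-decimal display format are made-up assumptions, and a plain Python float stands in for the spreadsheet's internal number. It shows that once only the displayed text survives, as it would on a flattened PDF page, the original value can't be recovered.

```python
# Hypothetical cell: the value and the display format are illustrative
# assumptions, not an example from Klindt's article.
stored_value = 1234.56789012345   # what the spreadsheet keeps internally
display_format = "{:,.2f}"        # what the cell shows on screen and on the page

# Flattening to a page keeps only the rendered text.
rendered = display_format.format(stored_value)

# Reading the number back from the page recovers only what was displayed.
recovered = float(rendered.replace(",", ""))

print("displayed:", rendered)
print("round trip matches original:", recovered == stored_value)
print("difference:", abs(stored_value - recovered))
```

The difference is small here, but it is gone for good, and so are any formulas that produced the number in the first place.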

Klindt’s title implies strong opposition, echoing Dijkstra’s classic “Go To Statement Considered Harmful” and the many papers with similar titles. It seems to call for switching to alternatives to PDF/A as soon as possible. He suggests several alternatives but doesn’t make a strong case for any of them as an adequate replacement. Part of the problem is that if any proposed replacement were as universal as PDF, it might well have the same problems as PDF.

“Harmful” seems like an excessively strong word to me, but archivists should be aware of the issues in PDF/A. Certainly they should stick with PDF/A-2 rather than the anything-goes PDF/A-3.

