The uses and abuses of PDF

PDF is a versatile format, but that doesn’t mean it should be used for everything. It’s a visual presentation format above all else. It lets you define a document with a specific appearance, with capabilities such as form filling and text searching. It’s not very good if you want a document that adapts to different device capabilities. If you need an editable format or a way to deliver structured data, there are much better alternatives.

When the Malaysian government released satellite communications data for the missing Flight 370, it delivered a PDF file. It looks very nice, but anyone who wants to extract and analyze the data has to do a lot of extra work. A spreadsheet or a structured text format such as CSV would have been the right choice.
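To make the difference concrete, here is a minimal sketch of what analysis looks like when data arrives as CSV. The column names and values are invented for illustration and have nothing to do with the actual release; the helper names `load_rows` and `mean_of` are likewise hypothetical.

```python
import csv
import io

# Hypothetical sample data: column names and values are made up for this sketch.
SAMPLE_CSV = """timestamp,station,reading
00:10:58,A,100
00:11:03,A,105
00:11:08,B,98
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_of(rows, column):
    """Average the values of one numeric column."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)


rows = load_rows(SAMPLE_CSV)
print(len(rows), mean_of(rows, "reading"))
```

With a PDF, the same numbers would first have to be recovered from page layout, which usually means a table-extraction tool or retyping before any of this can run.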

PDF can be used for e-books, but it’s not ideal. If you create normal-sized pages, then on a phone they’ll either look tiny or require a lot of scrolling. Formats such as EPUB work better with a range of screen sizes.

Delivering text documents in PDF loses a lot of its value when the document is a scanned image rather than a text-based document. It can’t be searched, and people with visual disabilities can’t use text-to-speech. My condo association delivers its newsletters in scanned-image PDF. When I pointed out these problems at an owners’ meeting, I was told that the owners weren’t sophisticated enough to take advantage of those benefits. Our complex is a big one, and I’d be surprised if at least a few residents don’t use text-to-speech when they can. It’s not particularly hard to generate PDF files; scanning a finished document into a PDF seems like the hard way.

To maximize the usefulness of assistive technologies, you should create PDF/A if possible. It produces a slightly larger file, but it’s organized in a way that makes extraction of content easier and eliminates dependencies you might not have thought of, such as fonts that aren’t embedded in the file.

Redacting PDFs is another tricky issue. If you simply black out an area, that’s the equivalent of gluing a piece of paper over it, and no harder to defeat. For advice on properly redacting documents, who better to turn to than the NSA? They may be a gang of criminals within the government, but they certainly know how to redact. Their guidance is from 2006, though, so some of its advice could be dated.
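A toy sketch shows why the black box doesn’t help. It assumes a simplified, uncompressed page content stream written by hand for this example (real files usually compress their streams, and the sample text is invented). In PDF, `Tj` shows a string, while `re` followed by `f` paints a filled rectangle, so the “redaction” is just one more drawing instruction added after the text.

```python
import re

# Hand-written toy content stream; the text and coordinates are hypothetical.
CONTENT_STREAM = (
    b"BT /F1 12 Tf 72 720 Td (Account 12345) Tj ET\n"  # draw the text
    b"0 0 0 rg 70 715 120 16 re f\n"  # paint a black rectangle on top of it
)


def text_operands(stream):
    """Return the literal strings passed to the Tj (show text) operator."""
    matches = re.findall(rb"\((.*?)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]


print(text_operands(CONTENT_STREAM))
```

The text is still in the stream even though a viewer would show only a black bar. Proper redaction has to remove the underlying content, not cover it.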

There are lots of things you can do with PDF, but use it intelligently and where it’s appropriate.

2 responses to “The uses and abuses of PDF”

  1. I would wholeheartedly agree with the idea that the PDF format should be used where appropriate. It seems from your examples you are speaking mostly of access to the information contained in the PDF for use today. What is your recommendation for long-term preservation of this information? PDF/A seems to be a great low-risk approach to long-term accessibility of the information, but much is lost in the conversion of Word documents, spreadsheets, eBooks, etc. to PDF.

    • Excellent question. If you’re going to use PDF for long-term preservation, then obviously PDF/A is the way to go, but any format conversion runs a risk of data loss. The choice depends on what you’re trying to do, what resources you have available, and how suitable the original format is for preservation.

      Most archives allow only a limited set of formats, to maximize the chance of future readability. If one of these is available and is closer to the original format (for instance, if you have an Excel spreadsheet and the archive admits OpenDocument and PDF but not Excel), it makes sense to archive it as an OpenDocument spreadsheet, and perhaps also as a PDF.

      It’s a complicated question, and a book could be written on it. For instance, this one.