PDF is a versatile format, but that doesn’t mean it should be used for everything. It’s a visual presentation format above all else. It lets you define a document with a specific appearance, with capabilities such as form filling and text searching. It’s not very good if you want a document that adapts to different device capabilities. If you need an editable format or a way to deliver structured data, there are much better alternatives.
When the Malaysian government released satellite communications data from the missing Flight 370, it delivered a PDF file. It looks very nice, but anyone who wants to extract and analyze the data has to do a lot of extra work. A spreadsheet or a structured text document (e.g., CSV) would have been the right choice.
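To make the contrast concrete, here is a minimal sketch of structured delivery using Python’s standard `csv` module. The field names and values are invented placeholders, not the actual satellite data; the point is only that a recipient can load the file back into rows without any manual extraction.

```python
import csv
import io

# Hypothetical records: field names and values are placeholders for
# illustration only, not real satellite data.
records = [
    {"time_utc": "00:00:00", "channel": "R1", "offset_hz": 100},
    {"time_utc": "00:01:00", "channel": "R2", "offset_hz": 105},
]

def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def from_csv(text):
    """Parse CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))

csv_text = to_csv(records)
parsed = from_csv(csv_text)
```

Getting the same rows out of a PDF table usually means guessing at column boundaries from the positions of text on the page.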
PDF can be used for e-books, but it’s not ideal. If you create normal-sized pages, then on a phone they’ll either look tiny or require a lot of scrolling. Formats such as EPUB work better with a range of screen sizes.
Delivering text documents in PDF loses a lot of its value when the document is a scanned image rather than a text-based document. It can’t be searched, and people with visual disabilities can’t use text-to-speech. My condo association delivers its newsletters in scanned-image PDF. When I pointed out these problems at an owners’ meeting, I was told that the owners weren’t sophisticated enough to take advantage of those benefits. Our complex is a big one, and I’d be surprised if at least a few residents don’t use text-to-speech when they can. It’s not particularly hard to generate PDF files; scanning a finished document into a PDF seems like the hard way.
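To maximize the usefulness of assistive technologies, you should create PDF/A if possible, ideally at conformance level A (PDF/A-1a, for example), which requires tagged, structured content. It produces a slightly larger file, but it’s organized in a way that makes extraction of content easier and eliminates dependencies you might not have thought of, such as fonts that aren’t embedded.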
Redacting PDFs is another tricky issue. If you simply black out an area, that’s the equivalent of gluing a piece of paper over it, and no harder to defeat: the text underneath is still in the file. For advice on properly redacting documents, who better to turn to than the NSA? They may be a gang of criminals within the government, but they certainly know how to redact. Their guidance dates from 2006, though, so some of its advice could be dated.
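Here is a toy illustration of why the black box doesn’t work. It assumes a simplified, uncompressed page content stream and a deliberately naive text extractor (real PDFs are usually compressed and use string escapes, so this is a sketch of the idea, not a redaction tool). The sample text and the helper names `extract_text` and `remove_text` are made up for the example; the operators `Tj` (show text), `g` (set gray fill), `re` (rectangle), and `f` (fill) are standard PDF content-stream operators.

```python
import re

# A simplified, uncompressed page content stream with one line of text.
page = b"BT /F1 12 Tf 72 720 Td (Witness: Jane Doe) Tj ET\n"

# A black rectangle painted over the text position.
BLACK_BOX = b"0 g 70 715 150 16 re f\n"

# "Redaction" by overlay: the text operator is still in the stream.
overlaid = page + BLACK_BOX

def extract_text(stream):
    """Toy extractor: collect literal strings shown with the Tj operator."""
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", stream)]

def remove_text(stream, secret):
    """Delete the text-showing operation itself, then paint the box."""
    pattern = rb"\(" + re.escape(secret) + rb"\)\s*Tj"
    return re.sub(pattern, b"", stream) + BLACK_BOX

redacted = remove_text(page, b"Witness: Jane Doe")

print(extract_text(overlaid))   # the "hidden" text is still recoverable
print(extract_text(redacted))   # nothing left to recover
```

Both versions look the same on screen; only the second has actually lost the text.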
There are lots of things you can do with PDF, but use it intelligently and where it’s appropriate.
OOXML: The good and the bad
An article by Markus Feilner presents a very critical view of Microsoft’s Office Open XML as it currently stands. There are three versions of OOXML: ECMA, Transitional, and Strict. All of them use the same file extensions, and there’s no easy way for the casual user to tell which variant a document uses. If a Word document is created on one computer in the Strict format, then edited on another machine with an older version of Word, it may be silently downgraded to Transitional, with resulting loss of metadata or other features.
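For the non-casual user, one rough way to tell Strict from Transitional is to look at the XML namespace of the main document part inside the .docx package. The sketch below assumes that part lives at `word/document.xml`, which is the usual location but is formally determined by the package relationships. The function name `docx_variant` is my own. As far as I know, the original ECMA edition shares the Transitional namespace, so this check can’t separate those two.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespaces used by the two variants.
STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_variant(source, part="word/document.xml"):
    """Return 'strict', 'transitional', or 'unknown' for a .docx file.

    `source` is a file path or a binary file-like object.
    Assumes the main document part is at `part` (the common default).
    """
    with zipfile.ZipFile(source) as package:
        with package.open(part) as stream:
            root = ET.parse(stream).getroot()
    # ElementTree spells qualified tags as "{namespace}localname".
    namespace = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"
```

Calling `docx_variant("report.docx")` on a file before and after a round trip through an older copy of Word would show whether it had been downgraded.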
On the positive side, Microsoft has released the Open XML SDK as open source on GitHub. This is at least a partial answer to Feilner’s complaint that “there are no free and open source solutions that fully support OOXML.”
Incidentally, I continue to hate Microsoft’s use of the deliberately confusing term “Open XML” for OOXML.
Thanks to @willpdp for tweeting the links referenced here.