Tag Archives: PDF

The steep road to supporting the PDF format

A lot of applications claim they can display PDF files, but not all of them fully support the format. They won’t necessarily display all valid files correctly. The PDF Association has an article discussing this problem, with the main focus on the Microsoft Edge browser.

Edge offers only partial support for the JBIG2Decode and JPXDecode filters, which means some objects might not display. It doesn’t support certain types of shadings, so other objects could render incorrectly.
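As a rough way to see whether a particular file might run into this, you can look for those filter names in the raw bytes. This is a minimal sketch, not a real PDF parser: the function name and the set of "risky" filters are my own for illustration, and it will miss filter entries that sit inside compressed object streams or that use `#xx` name escapes.

```python
import re

# Filters the post names as only partially supported by Edge.
RISKY_FILTERS = {"JBIG2Decode", "JPXDecode"}

def find_risky_filters(pdf_bytes: bytes) -> set[str]:
    """Return the risky filter names that appear literally in the raw bytes."""
    # PDF names start with "/"; standard stream filters all end in "Decode".
    found = re.findall(rb"/([A-Za-z0-9]+Decode)\b", pdf_bytes)
    names = {name.decode("ascii") for name in found}
    return names & RISKY_FILTERS

# Hypothetical fragment of an image XObject dictionary.
sample = b"<< /Type /XObject /Subtype /Image /Filter /JPXDecode >>"
print(find_risky_filters(sample))
```

An empty result doesn't prove the file is safe, for the reasons above; a non-empty one is a hint that the file is worth testing in more than one viewer.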

The strength of PDF is supposed to be that it will render the same way everywhere. You can blame Microsoft for not putting enough work into it, or Adobe for making the format too complex. I have enough experience with it to know it’s a seriously difficult format just to analyze, to say nothing of rendering. Is a format that presents such difficulties really the ideal choice for a universal document format that people will count on far into the future?

Update: It gets worse. Take a look at this discussion of what’s in PDF.

The state of PDF 2.0

The next big jump in PDF may finally happen this year. The PDF Association tells us that the spec for PDF 2.0 is “feature-complete” and will be available to the ISO PDF committee and members of the PDF Association in July. When this will turn into a public release still isn’t clear. A year ago the target was “mid-2016”; that seems unlikely now.

The specification will be ISO 32000-2. The current version of PDF, 1.7, is ISO 32000-1. More precisely, Adobe has published several extension levels to PDF 1.7. They’re a way of getting around having a version 1.8, which would be an admission that the ISO standard is outdated. Version 2.0 will get Adobe and ISO back in sync. Hopefully Adobe will publish the PDF spec for free, as it has in the past, so that it won’t be available just to people who pay for the ISO version. Currently an electronic copy of ISO 32000-1 costs 198 Swiss francs, or a bit more than $200.
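To make the extension-level business concrete: a file still announces itself as PDF 1.7 in its header, and the extension level is recorded separately in an Extensions dictionary in the document catalog, typically under Adobe's ADBE prefix. The following is a minimal sketch that pulls both out of the raw bytes. The function name and the sample data are mine, and it assumes the catalog isn't hidden in a compressed object stream; it also ignores a /Version entry in the catalog, which can override the header.

```python
import re

def pdf_version_info(pdf_bytes: bytes) -> dict:
    """Report the header version and any extension level found in the raw bytes."""
    header = re.search(rb"%PDF-(\d\.\d)", pdf_bytes[:1024])
    base = re.search(rb"/BaseVersion\s*/(\d\.\d)", pdf_bytes)
    level = re.search(rb"/ExtensionLevel\s+(\d+)", pdf_bytes)
    return {
        "header": header.group(1).decode("ascii") if header else None,
        "base_version": base.group(1).decode("ascii") if base else None,
        "extension_level": int(level.group(1)) if level else None,
    }

# Hypothetical file fragment declaring PDF 1.7 with extension level 3.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj << /Type /Catalog "
    b"/Extensions << /ADBE << /BaseVersion /1.7 /ExtensionLevel 3 >> >> >> endobj"
)
print(pdf_version_info(sample))
# {'header': '1.7', 'base_version': '1.7', 'extension_level': 3}
```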

PDF/A and forms

The PDF Association reminds us that we can use PDF forms for electronic submissions. It’s a useful feature, and I’ve filled out PDF forms now and then. However, one point seems wrong to me:

PDF/A, the archival subset of PDF technology, provides a means of ensuring the quality and usability of conforming PDF pages (including PDF forms) without any external dependencies. PDF/A offers implementers the confidence of knowing that conforming documents and forms will be readable 10, 20 or 200 years from now.

The problem is that PDF/A doesn’t allow form actions. ISO 19005-1 says, “Interactive form fields shall not perform actions of any type.” You can have a form and you can print it, but without being able to perform the submit-form action, it isn’t useful for digital submissions.
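For what it's worth, a submit-form action is an action dictionary whose /S entry is /SubmitForm, so a crude check for one is easy to write. This is only a sketch under the same caveats as any raw-byte scan (the function name is mine, and it won't see anything inside compressed object streams); a real PDF/A validator does far more than this.

```python
import re

def has_submit_form_action(pdf_bytes: bytes) -> bool:
    """True if the raw bytes contain an action dictionary of type SubmitForm."""
    return re.search(rb"/S\s*/SubmitForm\b", pdf_bytes) is not None

# Hypothetical action dictionaries for illustration.
interactive = b"<< /Type /Action /S /SubmitForm /F (http://example.com/submit) >>"
print_only = b"<< /Type /Action /S /GoTo /D [3 0 R /Fit] >>"
print(has_submit_form_action(interactive), has_submit_form_action(print_only))
# True False
```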

You could have an archival version of the form and a way to convert it to an interactive version, but this seems clumsy. Please let me know if I’ve missed something.

Update: There’s some kind of irony in the fact that the same day that I posted this, I received a print-only PDF form which I’ll now have to take to Staples to fax to the originator.

3D PDF and PDF/E

It must be a surprise to most people, but you can represent three-dimensional objects in PDF, in spite of its strictly two-dimensional imaging model. It turns out there are two ways to do it: the older U3D and the more modern PRC. What makes them possible is PDF’s annotation feature, which allows capabilities to be added to PDF, and the Acrobat 3D API. Full support of these features requires implementation of at least PDF 1.7 Extension Level 1, or to put it in application terms, Acrobat 8.1.

The PDF/E standard for engineering documents, aka ISO 24517, includes U3D but not PRC. A PDF/E-2 standard is currently in development and is expected to include PRC. PDF/E, like the other slashes of PDF, is a subset of the PDF standard (version 1.6), so obviously it’s possible to do 3D work without reference to it. It’s intended for cases where long-term retention or archiving is important. This suggests some affinity with PDF/A, which is specifically aimed at archive-quality documents, and the PDF Association, which is heavily involved in PDF/A, has recently started a PDF/E Competence Center. Oddly, the competence center says that PDF/E-1 “does not address 3D,” though other sources say PDF/E does reference U3D. Perhaps this is a matter of what really constitutes “addressing” 3D as opposed to just acknowledging it.

The PDF search problem

An article from the PDF Association points out the pitfalls in searching PDF documents. Even if a document has actual text in it, rather than being a scanned image, it might not hold the text in the natural character ordering. PDF is a format for rendering a document’s visible appearance, and it isn’t so good at holding semantic content. Chunks of text can be stored out of sequence as long as they render in the right place.
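Here is a toy illustration of the ordering problem. The chunks and coordinates are invented, and the "fix" assumes a single column of horizontal text, sorted top to bottom and then left to right; real extraction tools have to cope with columns, rotated text, and much else.

```python
from typing import NamedTuple

class Chunk(NamedTuple):
    x: float   # horizontal position on the page
    y: float   # vertical position; in PDF, y grows upward
    text: str

# Hypothetical page content, stored out of sequence but placed correctly.
stored = [
    Chunk(72, 680, "is hard."),
    Chunk(72, 700, "Searching"),
    Chunk(140, 700, "PDF text"),
]

def stored_order(chunks: list[Chunk]) -> str:
    """What a naive extractor sees: text in the order it sits in the file."""
    return " ".join(c.text for c in chunks)

def reading_order(chunks: list[Chunk]) -> str:
    """Sort by position: higher lines first, then left to right."""
    return " ".join(c.text for c in sorted(chunks, key=lambda c: (-c.y, c.x)))

print(stored_order(stored))   # is hard. Searching PDF text
print(reading_order(stored))  # Searching PDF text is hard.
```

A search for “PDF text is hard” fails against the stored order even though the phrase is plainly visible on the rendered page.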

A link roundup on file formats

3D printing is an exciting new technology, but the formats to choose from are an alphabet soup.

A call for “PDF 2.0” or an “Analytical File Format.” The description is vague, but it sounds like something analogous to the Semantic Web for documents.

BW64, a new RIFF-based audio format. The article describes it as a “3D” format, but more significantly it’s a metadata-rich interchange format that supports really big files.

And just for bitter laughs: “I need a ‘file’ format.”

PDF/R

The PDF Association and TWAIN Working Group have announced a partnership to develop a specification called PDF/Raster or PDF/R. It’s described as “a component of TWG’s TWAIN Direct™ initiative, a language/protocol that eliminates the need for users to install vendor specific drivers as communication between scanning devices and image capture software applications.”

McCoy on the future of PDF

Bill McCoy’s article, “Takeaways on the Future of Documents: Report from the 2015 PDF Technical Conference,” offers some interesting thoughts on the future of PDF. I can’t find much to disagree with. PDF is in practice a format for reproducing a specific document appearance, and that’s becoming less important as the variety of computing devices increases. He makes a point I hadn’t thought of, that the “de facto interoperable PDF format” is well behind the latest specifications, which may explain why I haven’t seen complaints that JHOVE doesn’t know about ISO 32000 PDF!

PDF forever?

The PDF Association has an article on its site titled “What’s unique about PDF? and why PDF will live forever.” The article claims PDF is “a format of such flexibility and power that it will define the essential ‘electronic document’ concept forever.”

Forever is a long time. No one will think they mean that the last object left as the universe succumbs to entropy will be a disk with a PDF file, but what scale of “forever” gives sense to their claim? In a tweet responding to my skepticism, they offered a clarification:

Continue reading

PDF 2.0

As most people who read this blog know, the development of PDF didn’t end with the ISO 32000 (aka PDF 1.7) specification. Adobe has published three extensions to the specification. These aren’t called PDF 1.8, but they amount to a post-ISO version.

The ISO TC 171/SC 2 technical committee is working on what will be called PDF 2.0. The jump in major revision number reflects the change in how releases are being managed but doesn’t seem to portend huge changes in the format. PDF is no longer just an Adobe product, though the company is still heavily involved in the spec’s continued development. According to the PDF Association, the biggest task right now is removing ambiguities. The specification’s language will shift from describing conforming readers and writers to describing a valid file. This certainly sounds like an improvement. The article mentions that several sections have been completely rewritten and reorganized. What’s interesting is that their chapter numbers have all been incremented by 4 over the PDF 1.7 specification. We can wonder what the four new chapters are.

Leonard Rosenthol gave a presentation on PDF 2.0 in 2013.

As with many complicated projects, PDF 2.0 has fallen behind its original schedule, which expected publication in 2013. The current target for publication is the middle of 2016.