Monthly Archives: July 2016

Work on TI/A quietly continues

The work on the TI/A project, which aims to define an archive-friendly version of TIFF analogous to PDF/A, is still going on, even though hardly any of it is publicly visible. Marisa Pfister's departure from the project, along with her position at the University of Basel, was unfortunate, but others are continuing a detailed analysis of the TIFF files used at various archives. This will help them learn which features and tags are actually in use.
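For a concrete sense of what such a tag survey might look like, here is a minimal sketch in Python using Pillow. The function name and the archive/ directory are placeholders of my own, and the sketch reads only the first image file directory of each TIFF.

```python
# Rough sketch of a TIFF tag survey across an archive, using Pillow.
# The directory layout and function name are hypothetical.
from collections import Counter
from pathlib import Path

from PIL import Image
from PIL.TiffTags import TAGS


def survey_tiff_tags(root):
    """Count which TIFF tags appear across all .tif/.tiff files under root."""
    counts = Counter()
    for path in Path(root).rglob("*.tif*"):
        try:
            with Image.open(path) as img:
                # tag_v2 maps numeric tag IDs to values for the first IFD only
                for tag_id in img.tag_v2:
                    counts[TAGS.get(tag_id, f"Unknown({tag_id})")] += 1
        except OSError:
            # Skip files Pillow can't parse; a real survey would log them
            continue
    return counts


if __name__ == "__main__":
    for name, n in survey_tiff_tags("archive/").most_common():
        print(f"{n:6d}  {name}")
```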

The target date of March 1, 2016, for a submission to ISO has been crossed out, and nothing has replaced it, but we can still hope the submission will happen.

The persistence of old formats

Technologies develop to the point where they're good enough for widespread use. Once a lot of people have adopted a technology, it's hard to move on to a still better one, since they've invested so much in something that already works for them. We see this with cell phone communication, which is pretty good but would undoubtedly be much better if it could be invented all over again today. We see it with the DVD format, which Blu-ray hasn't managed to push aside in spite of huge marketing efforts. And we see it in file formats.

Most of today’s highly popular formats have been around since the nineties. For images, we still have TIFF, JPEG, PNG, and even the primitive GIF format, which goes back to the eighties. In audio, MP3 still dominates, even though there are now much better alternatives.

This is a good thing in many ways. If new, improved formats displaced old ones every five years, we’d be constantly investing in new software, and anyone who didn’t upgrade would be unable to read a lot of new files. Digital preservation would be a big headache, as archivists would need to migrate files repeatedly to avoid obsolescence.

It does mean, though, that we're working with formats whose deficiencies have often grown in importance. JPEG compression isn't nearly as good as what modern techniques can manage. MP3 is encumbered with patents and offers sound quality inferior to other lossy audio formats. HTML has improved through major revisions, but it's still a mess to validate. For that matter, we have formats like "English," which lacks any spec and is a pile of kludges accumulated over centuries. Try finding support for supposed improvements such as Esperanto anywhere.

It’s a situation we just have to live with. The good enough hangs on, and the better has a hard time getting acceptance. Considering how unstable the world of data would be if this weren’t the case, it’s a good thing on the whole.

The steep road to supporting the PDF format

A lot of applications claim they can display PDF files, but not all of them fully support the format. They won’t necessarily display all valid files correctly. The PDF Association has an article discussing this problem, with the main focus on the Microsoft Edge browser.

Edge offers only partial support for the JBIG2Decode and JPXDecode filters, which means some objects might not display. It doesn’t support certain types of shadings, so other objects could render incorrectly.
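One rough way to spot files that might hit these gaps is to look for those filter names in a PDF. The sketch below is only a naive byte scan, so it will miss filter names stored inside compressed object streams; a real check would parse the file properly.

```python
# Naive check for decode filters that some viewers (per the PDF Association
# article, Microsoft Edge) support only partially. A plain byte scan misses
# names inside compressed object streams, so "not found" means only
# "nothing obvious".
import sys

PARTIALLY_SUPPORTED = (b"/JBIG2Decode", b"/JPXDecode")


def risky_filters(path):
    with open(path, "rb") as f:
        data = f.read()
    return [name.decode() for name in PARTIALLY_SUPPORTED if name in data]


if __name__ == "__main__":
    hits = risky_filters(sys.argv[1])
    if hits:
        print("Contains filters with spotty viewer support:", ", ".join(hits))
    else:
        print("No such filters found in a plain scan.")
```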

The strength of PDF is supposed to be that it renders the same way everywhere. You can blame Microsoft for not putting enough work into Edge, or Adobe for making the format too complex. I have enough experience with PDF to know it's a seriously difficult format just to analyze, to say nothing of rendering. Is a format that presents such difficulties really suitable as a universal document format that people will count on far into the future?

Update: It gets worse. Take a look at this discussion of what’s in PDF.