Bit-rot tolerance doesn’t work

My brief post yesterday on the TI/A initiative provoked a lively discussion on Twitter, mostly on whether archival formats should allow compression. The case against compression rests on the claim that archives should be able to deal with files that have a few bit errors in them. This is a badly mistaken idea.

The TI/A initiative

A project to define an archive-safe subset of TIFF has been going on for a long time. Originally it was called the TIFF/A initiative, but Adobe wouldn’t allow the use of the TIFF trademark, so it’s now called the TI/A initiative.

So far it’s been very closed in what it presents to the public. It’s easy enough to sign up and view the discussions; I’ve done that, and I have professional credentials but no inside connections. However, it bothers me that it has gone so long presenting nothing to the public beyond a white paper, with no progress reports.

I’m not going to make anything public that they don’t want made public, but I’ll just say that I have some serious disagreements with the approach they’re taking. When they finally do go public, I’m afraid they won’t get much traction with the archival community. Some transparency would have helped to determine whether I’m wrong or they are.

JHOVE Online Hack Day

I’ve just learned that the Open Preservation Foundation is hosting a JHOVE Online Hack Day on October 11. I’m flattered people are still interested in the work I started doing over a decade ago, though getting some paying work would be far more satisfying.

The little-known potential of SVG

Today on Twitter I came upon an article, “SVG Has More Potential,” by Mike Riethmuller. He points out that SVG is more than just “scalable vector graphics,” and he demonstrates that its images can be responsive.

Figuring out the PDF version is harder than you think

In a GitHub comment, Johan van der Knijff noted how messy it is to determine the version of a PDF file. He looked at a file with the header characters “%PDF-1.8”. DROID says this isn’t a PDF file at all.

By a strict reading of the PDF specification, it isn’t. The version number has to be in the range 1.0 through 1.7. Being this strict seems like a bad idea, since it would mean format recognition software will fail to recognize any future versions of the format. (JHOVE doesn’t care what character comes after the period.)
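To make the difference between the two readings concrete, here is a minimal Python sketch. It looks only at the header comment; the function names are made up for illustration, and it isn’t how DROID or JHOVE actually implement their checks. The lenient parser accepts any single-digit major and minor version, while the strict check applies the 1.0 through 1.7 range.

```python
import re

# Header comment at the very start of the file, e.g. b"%PDF-1.4".
# Assumption for this sketch: one digit on each side of the period.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the header, or None if there is no PDF header."""
    m = HEADER_RE.match(data)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def is_strictly_valid(version) -> bool:
    """Strict reading: only versions 1.0 through 1.7 count as PDF."""
    return version is not None and (1, 0) <= version <= (1, 7)


if __name__ == "__main__":
    sample = b"%PDF-1.8\n%\xe2\xe3\xcf\xd3\n"
    version = pdf_header_version(sample)
    print(version, is_strictly_valid(version))
```

With the “%PDF-1.8” header, the lenient parser still reports a version, but the strict check rejects the file. A tool that wants to survive future versions would report the parsed version and flag it as unknown rather than deny that the file is a PDF.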

Klingon vs. Emoji in Unicode

In 2001, the Unicode Consortium rejected a proposal to include the Klingon encoding. The reasons it gave were:

Lack of evidence of usage in published literature, lack of organized community interest in its standardization, no resolution of potential trademark and copyright issues, question about its status as a cipher rather than a script, and so on.

Fair enough, but don’t most of these objections apply equally to emoji?

A Libtiff mirror

Libtiff is still offline at, but there’s a mirror of the source available on GitHub. I held off on mentioning it in this blog till Bob Friesenhahn confirmed it’s reliable.

Libtiff goes offline

The Libtiff library, which has been a reference implementation of TIFF for many years, has disappeared from the Internet. It was located at, a domain whose owner apparently was willing to host it without having any close connection to the project. The domain fell into someone else’s hands, and the content changed completely, breaking all links to Libtiff material. Malice doesn’t seem to be involved; the original owner of just walked away from the domain or forgot to renew it. Who owns it now is unknown, since it’s registered under a privacy shield.

Originally Libtiff was hosted on, but that fell into the hands of a domain owner with no interest in the project. I don’t know why. It still holds Libtiff code, but it’s many years out of date.

As I’m writing this, people on the Libtiff list are trying to figure out exactly what happened. There’s talk of trying to get back, though that may or may not be possible.

For the moment, there’s no primary source for Libtiff on the Web. I’ll hopefully be able to post more information later.

How big is BigTIFF?

TIFF is a very popular image format, but it can’t handle really huge files. “Really huge” means files bigger than 4 gigabytes, or more precisely, files in which some data offset can’t be represented in 32 bits. That’s not a limitation that comes up often, but some applications, such as medical scans, need enough detail to push the limit.

A dozen years ago, members of the TIFF community at AWare Systems came up with a simple idea: Create a variant of TIFF with 64-bit offsets instead of 32 bits. The result was BigTIFF.
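As a rough illustration, the following Python sketch tells the two variants apart from the first four bytes of a file and checks whether an offset fits in 32 bits. It relies on the header layout as I understand it: a byte-order mark (“II” or “MM”) followed by a 16-bit magic number, 42 for classic TIFF and 43 for BigTIFF. The function names are invented for this example and it’s not a validator.

```python
import struct

# Largest offset a classic TIFF can store in its 32-bit offset fields.
MAX_CLASSIC_OFFSET = 2**32 - 1


def tiff_flavor(header: bytes) -> str:
    """Classify a file as 'classic', 'bigtiff', or 'unknown' from its first 4 bytes."""
    if len(header) < 4:
        return "unknown"
    order = header[:2]
    if order == b"II":      # little-endian
        fmt = "<H"
    elif order == b"MM":    # big-endian
        fmt = ">H"
    else:
        return "unknown"
    (magic,) = struct.unpack(fmt, header[2:4])
    return {42: "classic", 43: "bigtiff"}.get(magic, "unknown")


def fits_in_classic(offset: int) -> bool:
    """True if the offset can be represented in an unsigned 32-bit field."""
    return 0 <= offset <= MAX_CLASSIC_OFFSET


if __name__ == "__main__":
    print(tiff_flavor(b"II*\x00"))           # little-endian, magic 42
    print(tiff_flavor(b"MM\x00\x2b"))        # big-endian, magic 43
    print(fits_in_classic(4 * 1024**3))      # an offset of exactly 4 GiB
```

An offset of exactly 4 GiB is the first one that no longer fits, which is where a writer would have to switch to BigTIFF.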

The strange state of “open” format documentation

You can legally download many specs from the ISO site, including the Open Document Format (ODF) specs. ISO lets you print out a copy. However, if you photocopy or scan it, or if you make it available on your organization’s LAN, the Copyright Police will haul you away.

I’ve seen similar restrictions elsewhere. They’re variations on the idea that you can download a document for free, but you can’t share it after you download it. It’s bizarre.

Maybe they’re trying to keep people from going into competition by selling copies of their standards. Since ISO also sells what it publishes, the goal would make sense. In fact, there’s a specific and emphatic prohibition on sales. But why they should care whether copies are printed or photocopied is beyond me.

Usually the answer to questions like these is “lawyers who are disconnected from reality.” If there’s a better answer, I’d love to hear it.