The inventor of binary encoding

Francis Bacon may not have written Shakespeare’s plays, but he wrote the Novum Organum, a foundational work of scientific methodology. He did something else almost as impressive: He invented the binary encoding of text. In the early 17th century he wrote:

First let all the Letters of the Alphabet, by transposition, be resolved into two Letters onely; for the transposition of two Letters by five placeings will be sufficient for 32. Differences, much more for 24. which is the number of the Alphabet . The example of such an Alphabet is on this wise.

By “transposition” he meant the use of two letters, such as A and B, as units of an encoded message. They could just as well have been 1 and 0, or any other pair. Using five letters gives 2^5, or 32, possible encodings. AAAAA signifies A, AAAAB is B, AAABA is C, and so on. He said there were 24 letters in the alphabet because in his time I and J were considered the same letter, as were U and V. It’s a very short hop from this encoding to Baudot, and just an extension to seven letters (bits) to get ASCII.
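Here is a minimal Python sketch of the scheme as described above, just to make the arithmetic concrete. The function names and the `zero`/`one` parameters are my own, and the 24-letter table assumes the usual reading of Bacon's alphabet, with J folded into I and U folded into V.

```python
# Bacon's 24-letter alphabet: I/J share one code, and so do U/V.
ALPHABET = "ABCDEFGHIKLMNOPQRSTVWXYZ"
MERGED = {"J": "I", "U": "V"}


def bacon_encode(text, zero="A", one="B"):
    """Encode text as groups of five symbols drawn from a two-symbol set."""
    groups = []
    for ch in text.upper():
        ch = MERGED.get(ch, ch)
        if ch not in ALPHABET:
            continue  # skip spaces, digits, and punctuation
        bits = format(ALPHABET.index(ch), "05b")  # five binary digits
        groups.append("".join(one if b == "1" else zero for b in bits))
    return " ".join(groups)


def bacon_decode(code, zero="A", one="B"):
    """Decode space-separated five-symbol groups back to letters.

    Only 24 of the 32 possible groups are assigned; an unassigned
    group raises IndexError in this sketch.
    """
    letters = []
    for group in code.split():
        value = int("".join("1" if c == one else "0" for c in group), 2)
        letters.append(ALPHABET[value])
    return "".join(letters)


print(bacon_encode("ABC"))            # AAAAA AAAAB AAABA
print(bacon_encode("AB", "0", "1"))   # 00000 00001
```

Swapping A and B for 0 and 1 changes nothing but the spelling, which is the point: the code is already five-bit binary.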

How Twitter renders GIF

I’ve long wondered how Twitter renders animated GIF files. I have Firefox set to disable GIF animation, and it works everywhere except on Twitter. Apart from that, the interface indicates something is going on beyond normal GIF display. It doesn’t animate till you hit the “Play” button, and then there’s apparently no way to stop it.

SMS messages and GSM encoding

Today I learned from a science fiction discussion group that SMS messages don’t use UTF-8. In fact, they don’t even use ASCII or an extension of it. It’s a case of old technology which has survived beyond its time.

The usual encoding for SMS text messages is GSM-7. Most cell phones use it, regardless of whether they’re on the GSM network or not. They generally support Unicode as well, but in a strange way.

What are “positives” in format validation?

Articles about JHOVE, such as Good GIF Hunting, grab my attention for obvious reasons. This article talks about false positive and false negative results, and it got me thinking: What constitutes a “positive” result in file format validation? There are two ways to look at it:

  1. The default assumption is that the file is of a certain format, perhaps based on its extension, MIME type, or other metadata. The software sets out to see if it violates the format’s requirements. In that case, a positive result is that the file doesn’t conform to the requirements.
  2. The default assumption is that the file is just a collection of bytes. The software matches it against one or more sets of criteria. A positive result is that the file matches one of them.
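The two views can be sketched in a few lines of Python. This is only an illustration, not how JHOVE or any other real tool works: a check of the leading signature bytes stands in for full validation, and the function names and the small signature table are my own assumptions.

```python
# Leading "magic" bytes for a few formats. Checking these is a stand-in
# for real validation, which examines far more than the first few bytes.
SIGNATURES = {
    "gif": (b"GIF87a", b"GIF89a"),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "pdf": (b"%PDF-",),
}


def violates_assumed_format(data: bytes, assumed: str) -> bool:
    """View 1: assume the format; a "positive" means a violation was found."""
    return not data.startswith(SIGNATURES[assumed])


def identify(data: bytes):
    """View 2: assume nothing; a "positive" means some format matched."""
    for name, magics in SIGNATURES.items():
        if data.startswith(magics):
            return name
    return None


sample = b"%PDF-1.4 ..."
print(violates_assumed_format(sample, "gif"))  # True: positive under view 1
print(identify(sample))                        # pdf: positive under view 2
```

The same file produces a “positive” under both views here, but the word means opposite things: a failure in the first case and a match in the second.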



The Libtiff source code repository is now on Gitlab. The old CVS repository on will be maintained for historical purposes but won’t get any updates.

One reason for choosing Gitlab rather than Github is that there’s already a libtiff repository on Github. The reasons it’s there aren’t clear, but it’s definitely not an official Libtiff repository.

The Libtiff homepage continues to be on

International Digital Preservation Day

Today is International Digital Preservation Day.

In honor of the day, I’m offering Files that Last: Digital Preservation for Everygeek on Smashwords at its lowest price ever. Today only, you can get it for $0.99 with the coupon code AM26N. This is a one-day sale, so get it now if you don’t already have it!

There are new releases of VeraPDF and JHOVE today.

Libtiff 4.0.9 released

Libtiff 4.0.9 has been released. According to the email announcing it:

A great many security improvements have been implemented by Even Rouault.

Much thanks to OSS Fuzz, team OWL337, Roger Leigh, and of course Even Rouault.

Obligatory reminder: Don’t download from libtiff dot org. It’s many years out of date.

JHOVE webinar

An Open Preservation Foundation webinar, “Putting JHOVE to the acid test: A PDF test-set for well-formedness validation in JHOVE,” will be held on November 21, 10 AM GMT (that’s 11 AM in Central Europe and a ludicrous 5 AM or earlier in the US).

Popular Science on format conversion

Popular Science has an article, “How to convert any file to any format.” The title overreaches, but the article actually isn’t too bad. It’s aimed at the ordinary user, not the file format specialist, so it wouldn’t be fair to complain too much that it has more breadth than depth.

It starts by recommending using the application that created the file, and that’s certainly good advice. Even when formats are open standards, an app knows more about how it creates its own files than anyone else does. Its files might have bits of application-specific information.


This XKCD cartoon showed up in my Twitter feed more times in one day than any previous one, for reasons that should be obvious.

XKCD on Digital Resource Lifespan