Category Archives: commentary

The future of TIFF

Is TIFF a legacy format?

The most recent version of the TIFF specification, 6.0, dates from 1992. Adobe updated it with three technical notes, the latest coming out in 2002. Since then there has been nothing.

The format is solid, but the past quarter-century has seen reasons to enhance it. BigTIFF is a variant of the format to accommodate larger files. It isn’t backward-compatible with TIFF, but the changes mostly concern data lengths and are easy to add to a TIFF interpreter. The format sits in a kind of limbo, since Adobe owns the spec but is no longer updating it. There have been new tags which have achieved consensus acceptance but don’t have official status. AWare Systems has a list of known tags but has no reliable way to say which ones are private and which are generally accepted. There’s no way to add a new compression or encryption algorithm, or any other new feature, and give it official status.
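
To show how small the difference in data lengths is, here is a minimal sketch of a header reader that handles both variants. The function name `read_tiff_header` is my own invention for illustration; the header layout (magic number 42 with a 4-byte offset for classic TIFF, magic number 43 with an 8-byte offset for BigTIFF) comes from the published TIFF 6.0 and BigTIFF designs. It reads only the header, not the tags.

```python
import struct


def read_tiff_header(data):
    """Return (variant, offset of first IFD) for a TIFF or BigTIFF header.

    Hypothetical helper for illustration; it does not parse any tags.
    """
    byte_order = data[:2]
    if byte_order == b"II":        # little-endian ("Intel")
        endian = "<"
    elif byte_order == b"MM":      # big-endian ("Motorola")
        endian = ">"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    (magic,) = struct.unpack(endian + "H", data[2:4])
    if magic == 42:
        # Classic TIFF: a 4-byte offset to the first IFD follows.
        (offset,) = struct.unpack(endian + "I", data[4:8])
        return ("TIFF", offset)
    if magic == 43:
        # BigTIFF: offset size (always 8), a zero word, then an 8-byte offset.
        offset_size, reserved = struct.unpack(endian + "HH", data[4:8])
        if offset_size != 8 or reserved != 0:
            raise ValueError("malformed BigTIFF header")
        (offset,) = struct.unpack(endian + "Q", data[8:16])
        return ("BigTIFF", offset)
    raise ValueError("not a TIFF file: unknown magic number")
```

A real interpreter would also have to widen the entry counts and value fields inside each IFD, but the branching follows the same pattern.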

Can a .txt file contain malware?

The Internet Crime Complaint Center reported that some email messages are impersonating it in an attempt to get malware onto target computers. That’s clearly worth knowing about, but this part of the report is odd:

The unknown actors also attached a text document (.txt) to download, complete, and return to the perpetrators. The text file contained malware which was designed to further victimize the recipient.

It really shouldn’t be possible to run malware by opening a .txt file. It should just open in a text editor, with no execution of code. There’s no further explanation.

The inventor of binary encoding

Francis Bacon may not have written Shakespeare’s plays, but he wrote the Novum Organum, a foundational work of scientific methodology. He did something else almost as impressive: He invented the binary encoding of text. In the early 17th century he wrote:

First let all the Letters of the Alphabet, by transposition, be resolved into two Letters onely; for the transposition of two Letters by five placeings will be sufficient for 32. Differences, much more for 24. which is the number of the Alphabet. The example of such an Alphabet is on this wise.

By “transposition” he meant the use of two letters, such as A and B, as units of an encoded message. They could just as well have been 1 and 0, or any other pair. Using five letters gives 2^5 = 32 possible encodings. AAAAA signifies A, AAAAB is B, AAABA is C, and so on. He said there were 24 letters in the alphabet because in his time I and J were considered the same letter, as were U and V. It’s a very short hop from this encoding to Baudot, and just an extension to seven letters (bits) to get ASCII.
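
Here is a rough sketch of the scheme in Python. The function names are mine, and I am assuming the codes simply count upward in binary through the 24-letter alphabet, which matches the three examples above.

```python
# Bacon's 24-letter alphabet: I and J share a code, as do U and V.
ALPHABET = "ABCDEFGHIKLMNOPQRSTUWXYZ"


def bacon_encode(text):
    """Encode text as space-separated groups of five A/B symbols."""
    groups = []
    for ch in text.upper():
        ch = {"J": "I", "V": "U"}.get(ch, ch)   # fold J into I, V into U
        if ch not in ALPHABET:
            continue                            # skip spaces, digits, punctuation
        bits = format(ALPHABET.index(ch), "05b")  # position as 5 binary digits
        groups.append(bits.replace("0", "A").replace("1", "B"))
    return " ".join(groups)


def bacon_decode(code):
    """Decode space-separated five-symbol groups back to letters."""
    letters = []
    for group in code.split():
        bits = group.replace("A", "0").replace("B", "1")
        letters.append(ALPHABET[int(bits, 2)])
    return "".join(letters)
```

Calling `bacon_encode("ABC")` gives `AAAAA AAAAB AAABA`. Decoding is lossy in the way Bacon's alphabet was: a J comes back as I, and a V as U.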

How Twitter renders GIF

I’ve long wondered how Twitter renders animated GIF files. I have Firefox set to disable GIF animation, and it works everywhere except on Twitter. Apart from that, the interface indicates something is going on beyond normal GIF display. It doesn’t animate till you hit the “Play” button, and then there’s apparently no way to stop it.

SMS messages and GSM encoding

Today I learned from a science fiction discussion group that SMS messages don’t use UTF-8. In fact, they don’t even use ASCII or an extension of it. It’s a case of old technology which has survived beyond its time.

The usual encoding for SMS text messages is GSM-7. Most cell phones use it, regardless of whether they’re on the GSM network or not. They generally support Unicode as well, but in a strange way.
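
To give a feel for how different this is from a byte-per-character encoding, here is a minimal sketch of the septet packing that GSM-7 uses, where each character takes seven bits and the bits are packed end to end into octets. The function name `pack_gsm7` is made up for this example. To avoid reproducing the whole GSM-7 character table, I am restricting input to letters, digits, and the space, whose GSM-7 codes happen to match their ASCII values; that restriction is my simplification, not part of the encoding.

```python
import string

# Characters whose GSM-7 code equals their ASCII code (a deliberate subset).
SAFE_CHARS = set(string.ascii_letters + string.digits + " ")


def pack_gsm7(text):
    """Pack 7-bit character codes into octets, least significant bits first."""
    packed = bytearray()
    bit_buffer = 0      # bits waiting to be written out
    bit_count = 0       # how many bits are in the buffer
    for ch in text:
        if ch not in SAFE_CHARS:
            raise ValueError(f"character {ch!r} is outside this sketch's subset")
        bit_buffer |= ord(ch) << bit_count   # append 7 bits above what is pending
        bit_count += 7
        while bit_count >= 8:                # emit every complete octet
            packed.append(bit_buffer & 0xFF)
            bit_buffer >>= 8
            bit_count -= 8
    if bit_count:                            # final partial octet, zero-padded
        packed.append(bit_buffer & 0xFF)
    return bytes(packed)
```

With this packing, the five characters of `hello` fit in five octets, `E8 32 9B FD 06`, and eight characters fit in seven.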

What are “positives” in format validation?

Articles about JHOVE, such as Good GIF Hunting, grab my attention for obvious reasons. This article talks about false positive and negative results, and it got me thinking: What constitutes a “positive” result in file format validation? There are two ways to look at it:

  1. The default assumption is that the file is of a certain format, perhaps based on its extension, MIME type, or other metadata. The software sets out to see if it violates the format’s requirements. In that case, a positive result is that the file doesn’t conform to the requirements.
  2. The default assumption is that the file is just a collection of bytes. The software matches it against one or more sets of criteria. A positive result is that the file matches one of them.
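
The two views can be sketched in a few lines of Python. This is only an illustration with names I made up, and it checks nothing but the opening signature bytes, which is far less than a real validator such as JHOVE does.

```python
# Opening signatures for two formats; a real tool would know many more.
SIGNATURES = {
    "GIF": (b"GIF87a", b"GIF89a"),
    "PNG": (b"\x89PNG\r\n\x1a\n",),
}


def violates_format(data, assumed_format):
    """View 1: assume the format; a positive means a requirement is violated."""
    return not data.startswith(SIGNATURES[assumed_format])


def identify(data):
    """View 2: assume nothing; a positive means some format's criteria match."""
    for name, signatures in SIGNATURES.items():
        if data.startswith(signatures):
            return name
    return None
```

For a file that begins with `GIF89a`, `violates_format(data, "GIF")` is `False` (a negative under the first view), while `identify(data)` returns `"GIF"` (a positive under the second). The same file yields opposite "signs" depending on which assumption the software starts from.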


Popular Science on format conversion

Popular Science has an article, “How to convert any file to any format.” The title overreaches, but the article actually isn’t too bad. It’s aimed at the ordinary user, not the file format specialist, so it wouldn’t be appropriate to complain too much that it has more breadth than depth.

It starts by recommending using the application that created the file, and that’s certainly good advice. Even when formats are open standards, an app knows more about how it creates its own files than anyone else does. Its files might have bits of application-specific information.

The PDF/A controversy

Is PDF/A a good archival format? Many institutions use it, but it has problems which are inherent in PDF. With PDF/A-3, it has lost some of its focus. A format which can be a container for any kind of content isn’t great for digital preservation.

An article by Marco Klindt of the Zuse Institute Berlin takes a strong position against its suitability, with the title “PDF/A considered harmful for digital preservation.” Carl Wilson at the Open Preservation Foundation has added his own thoughts with “PDF/A and Long Term Preservation.”


PDF 2.0

The ISO specification for PDF 2.0 is now out. It’s known as ISO 32000-2. As usual for ISO, it costs an insane 198 Swiss francs, which is roughly the same amount in dollars. In the past, Adobe has made PDF specifications available for free on its own site, but I can’t find this one there. Its PDF reference page still covers only PDF 1.7.

ISO has to pay its bills somehow, but it’s not good if the standard is priced so high that only specialists can afford it. I don’t intend to spend $200 to be able to update JHOVE without pay. With some digging, I’ve found it in an incomplete, eyes-only format. All I can view is the table of contents. There are links to all sections, but they don’t work. I’m not sure whether that’s a problem with my browser or intentional. In any case, it’s a big step backward as an open standard. I hope Adobe will eventually put the spec on its website.

A world of emoji misinformation

July 17 was World Emoji Day. Anyone can declare a World Anything Day, but my local library thought it was important enough to give it part of a sign, along with Cell Phone Courtesy Month.
[Image: Library sign giving inaccurate information about Emoji]

They didn’t think it was important enough to give accurate information, though. It does tell us something about how non-tech people think of emoji. Here’s the content of the sign, with commentary.