Articles about JHOVE, such as Good GIF Hunting, grab my attention for obvious reasons. This one discusses false positive and false negative results, and it got me thinking: What constitutes a “positive” result in file format validation? There are two ways to look at it:
- The default assumption is that the file is of a certain format, perhaps based on its extension, MIME type, or other metadata. The software sets out to see if it violates the format’s requirements. In that case, a positive result is that the file doesn’t conform to the requirements.
- The default assumption is that the file is just a collection of bytes. The software matches it against one or more sets of criteria. A positive result is that the file matches one of them.

SMS messages and GSM encoding
Today I learned from a science fiction discussion group that SMS messages don’t use UTF-8. In fact, they don’t even use ASCII or an extension of it. It’s a case of old technology which has survived beyond its time.
The usual encoding for SMS text messages is GSM-7. Most cell phones use it, regardless of whether they’re on the GSM network or not. They generally support Unicode as well, but in a strange way.