Monthly Archives: January 2016

Notes of a reluctant video teacher

As you’ve doubtless noticed if you follow this blog or my Twitter feed, I’ve made two video courses and put them up on Udemy. You may be wondering why I’m doing this, especially if you know how much I hate being on camera.

Several steps have led to my being here. One is that the more gray hair you have, the more likely clients and employers are to assume the gray matter has leaked out of your brain, even though that’s nonsense. So I have to find other sources of income. I’ve been doing writing, including the book Files that Last, and having some successes there. Many people, though, like video learning, and turning written material into video presentation isn’t a huge step. I liked the arrangements Udemy offered, so I’ve given it a try.

fixit_tiff, a TIFF repair tool

The Sächsische Landesbibliothek – Staats- und Universitätsbibliothek Dresden (Saxon State and University Library Dresden), which somehow gets abbreviated to SLUB, has developed a tool for working with TIFF files in digital preservation. fixit_tiff is a command line utility, written in C, which can do some repairs on defective TIFF files. The focus appears to be on correcting common errors, not on repairing corrupted files. A blog post from July (in German) indicates it can do configurable validation using a simple query language.

It’s available under the same license as Libtiff. Just what is that license? The only thing I can find is a very outdated “Use and Copyright” statement, which is on a page so old it warns about patents on LZW compression. It’s available for free, anyway.

WAV format preservation assessment

The British Library’s Digital Preservation Team has issued a report on WAV Format Preservation Assessment. It cites the broad adoption of WAV and its extension BWF (Broadcast Wave Format) as a positive for preservation purposes and offers only a few cautions. I’m flattered by the recommendation, “Wherever possible and appropriate to the workflow, submitted content should be validated using JHOVE.”
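To give a rough idea of what the most basic level of WAV validation involves, here is a minimal Python sketch. It is not how JHOVE works internally; the function names `check_wav_header` and `chunk_ids` are made up for illustration. It only checks the RIFF/WAVE signature and lists the top-level chunk IDs, which is enough to see whether a `bext` chunk (the Broadcast Wave extension) is present.

```python
import struct

def check_wav_header(data: bytes) -> bool:
    """Return True if the first 12 bytes look like a RIFF/WAVE header."""
    if len(data) < 12:
        return False
    # Layout: 4-byte "RIFF", 4-byte little-endian size, 4-byte "WAVE"
    chunk_id, _size, form = struct.unpack("<4sI4s", data[:12])
    return chunk_id == b"RIFF" and form == b"WAVE"

def chunk_ids(data: bytes) -> list:
    """List the IDs of the top-level chunks that follow the header."""
    ids = []
    pos = 12
    while pos + 8 <= len(data):
        cid, size = struct.unpack("<4sI", data[pos:pos + 8])
        ids.append(cid)
        # Chunk bodies are padded to an even number of bytes
        pos += 8 + size + (size & 1)
    return ids

def is_broadcast_wave(data: bytes) -> bool:
    """Hypothetical helper: a WAV file carrying a 'bext' chunk."""
    return check_wav_header(data) and b"bext" in chunk_ids(data)
```

A real validator checks far more than this, such as the contents of the format chunk and whether chunk sizes are consistent with the file length.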

A personal case study in digital obsolescence

The nineties saw huge changes in personal computing, as operating systems became more complex, Internet connections became common, and the World Wide Web appeared. This meant a lot of instability as formats came and went.

This past weekend I discovered a CD-ROM in my closet with the production files for a small-run songbook, The Pegasus Winners (optimistically called “Volume 1”), that I produced in 1994. The good news is that the CD is still readable. The bad news is that I can’t read most of the files. The not-so-bad news is that I could probably recover them with moderate effort.

MRF for large images

NASA is using a format for online files, called MRF (Meta Raster Format), which is claimed to deliver images ten times as fast as JPEG2000 from cloud services when used with a compression algorithm called LERC. LERC is under patent by Esri, which says the technique is especially suited for geospatial applications and makes the algorithm “freely available to the geospatial and earth sciences community.” An implementation of MRF from NASA is available on GitHub under the Apache license, and an implementation of LERC is on GitHub from Esri.


I’ve now got a Facebook page on Mad File Format Science. I’m really not enthusiastic about Facebook at all, but it seems that it should be part of my Web presence.

Unicode security mechanisms

Unicode is a great thing, but sometimes its thoroughness poses problems. Different character sets often include characters that look exactly like common ASCII characters in most fonts, and these can be used to spoof domain names. Sometimes this is called a homograph attack or script spoofing. For instance, someone might register the domain gοοgle.com, which looks a lot like “google.com,” but actually uses the Greek letter omicron instead of the Roman letter o. (Search this page in your browser for “google” if you don’t believe me.) Such tricks could lure unwary users into a phishing site. A real-life example, which didn’t even require more than ASCII, was a site whose name used a capital I in place of a lower-case L; the two look the same in some fonts. That was way back in 2000.
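As a rough illustration of how mixed-script spoofing can be spotted, here is a minimal Python sketch. It relies on an assumption worth stating loudly: Python's standard library doesn't expose the Unicode Script property, so the code guesses the script from the first word of each character's Unicode name (LATIN, GREEK, CYRILLIC, and so on). The function names are made up for this example, and this is a crude heuristic rather than the procedure Unicode's own security recommendations describe.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Return approximate script names for the letters in a label.

    Assumption: the first word of a character's Unicode name
    (e.g. "LATIN", "GREEK") stands in for its script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def looks_spoofed(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in_label(label)) > 1

# "g" followed by two Greek omicrons (U+03BF), then "gle"
suspicious = "g\u03bf\u03bfgle"
print(looks_spoofed(suspicious))  # mixed Latin and Greek letters
print(looks_spoofed("google"))    # Latin letters only
```

A check like this wouldn't catch the all-ASCII trick of swapping a capital I for a lower-case L, and it would wrongly flag some legitimate mixed-script names, so it's a starting point at best.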

New Udemy course: Personal Digital Preservation

My second Udemy course, Personal Digital Preservation, is now available! The regular price for enrolling is $16, but for readers of this blog (and anyone else you want to tell!) it’s just $10 with the coupon code DATALITH10. That code is good through the end of February.


Secrets of the online Harvard libraries

Here’s a new video on viewing publicly available information in the Harvard Library’s Digital Collections, Harvard Geospatial Library (HGL), and Visual Information Access (VIA).

Coming soon: Course on personal digital preservation

My next video course on Udemy will be (Udemy willing, which I think they will be) “Personal Digital Preservation: Keeping Your Files Safe and Usable.” Unlike my previous course on File Format Identification Tools, this one will be aimed at a broad audience: anyone who has a lot of files and wants to keep them usable for years to come. I’ll be covering three main areas: avoiding file loss, recovering files, and keeping files usable and understandable. The price will be $16, which will include about an hour of lectures as well as reference PDF files, but I’ll post a coupon code here to get it for less.

There’s still work to be done, including the approval process. It will appear as soon as it’s approved, so I can’t tell you an exact date, but I’m targeting January 12.