Yearly Archives: 2015

DOTS: Almost a datalith

A lot of people in digital preservation are convinced a “digital dark age” is nothing to worry about. I’ve consistently disagreed with this. The notion that archivists will replace outdated digital media every decade or two through the centuries is a pipe dream. Records have always gone through periods of neglect, and they will in the future. Periods of unrest will happen; authorities will try to suppress inconvenient history; groups like Daesh will set out to destroy everything that doesn’t match their worldview; natural disasters will disrupt archiving.

I’ve proposed the idea of a “datalith,” a data record made out of rock or equivalent material, optically readable and self-explanatory assuming a common language survives. DOTS, Digital Optical Technology System, is burned on tape rather than engraved in stone, but in every other respect it matches my vision of a datalith. It can store digital images in any format but also allows them to be recorded as a visual representation. The Long Now Foundation explains:
Continue reading

Approaching the year’s end

Several people have already signed up for my Udemy course on file, ExifTool, DROID, JHOVE, and Tika. It looks as if most of them have used the discount code INTRO1 to get it at just $10 and plan to take it later on. This makes complete sense, since the code is valid only until the end of this year. If you’re taking the course, feel free to start a discussion or ask questions; I’ll answer them to the best of my ability. If you’re a specialist in one of these tools and would like to see how I’m teaching it, I’ll offer you a free pass if your credentials are good.


The PRONOM file format signature files were updated on December 17. DROID users should make sure they have the latest files.

Support your favorite (Unicode) character

What’s your favorite character? Luke Skywalker? Georgia Mason? Captain Ahab?

Oh, sorry, we’re not talking about that kind of character. We’re talking about characters like the Hungarian double-acute u (ű), the four-leaf clover emoji (🍀), or the Katakana “ka” (カ). The Unicode Consortium is looking for people to “adopt” their favorite characters with a tax-deductible donation. Each character can have one Gold ($5000) sponsor, five Silver ($1000) sponsors, and any number of Bronze ($100) sponsors. As I read the rules, only recognized Unicode characters are eligible, so you probably can’t support Klingon characters.

New video course: How to Tell a File’s Format

My new video course on Udemy, How to Tell a File’s Format: Five Open Source Tools, is now live! This course introduces file, DROID, ExifTool, JHOVE, and Apache Tika, explaining how to install them and use them for format identification. Since I wrote most of the code for JHOVE, the course has some special tips on how to get the most out of it. For each tool, you get instructions on downloading and installing it and a screen capture demo. I’ll be available to help out with any questions.

The standard price for the course is $28, but you can enroll through December 31 for the introductory rate of not $27, not $26, but just $10 US! Use the coupon code INTRO1 to get this rate.

If you’re a currently active developer on any of the tools I mentioned, get in touch with me before December 31 and I’ll get you a free pass in exchange for your feedback.


Through December 31, you can buy my e-book Files that Last: Digital Preservation for Everygeek for just $3.20 on Smashwords. Use coupon code XY29D to get the discount.

Digitizing motion picture film

In this age of digital video, it’s easy to forget that until recently movies were all made and released on film. The Library of Congress’s digital preservation blog has a discussion of “Digitizing Motion Picture Film”, though this title doesn’t fully represent the ongoing debates. Some people, the article notes, think that film is still best preserved with film. Even now when a terabyte of data costs less than dinner for four at Denny’s, it takes a lot of bits to preserve movies at full resolution.

Course on file format identification tools: Progress report

The video course that I’m developing on “File Format Identification Tools” is almost ready to submit to Udemy. I’m holding off for a little more work at the Open Preservation Foundation on JHOVE, because some user interface details are going to change from the current 1.12 beta. The other tools covered will include file, DROID, ExifTool, and Apache Tika. This course should be useful to both students and professionals who want to learn how to use the tools.

File identification tools, part 10: Siegfried

“Do we really need another PRONOM-based file format identification tool?” That’s what Richard Lehane asked rhetorically last year on the Open Preservation Foundation blog. It was obviously rhetorical, since he’d gone ahead and done just that with a new tool called Siegfried. Siegfried recently turned up in some tweets by Ross Spencer, so it’s worth a mention here.

Iterating a directory in command line Tika

Apache Tika is best used as a library to wrap your own code around. Its GUI application is a toy, and its command line version isn’t all that great either. The command line can be improved with a little scripting, though.
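One way such scripting might look is sketched below in Python. It is only an illustration, not necessarily the approach the full post takes. It assumes the Tika app jar has been saved as tika-app.jar (a placeholder name; real downloads carry a version number), that java is on the path, and that Tika's --detect option is used to print each file's media type. The function names are made up for this example.

```python
import subprocess
import sys
from pathlib import Path


def tika_detect_commands(directory, tika_jar="tika-app.jar"):
    """Yield (path, command) pairs, one for each regular file under directory.

    tika_jar is an assumed location for the Tika app jar; adjust as needed.
    """
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            # --detect asks the Tika app to print only the detected media type.
            yield path, ["java", "-jar", str(tika_jar), "--detect", str(path)]


def detect_directory(directory, tika_jar="tika-app.jar", run=subprocess.run):
    """Run the Tika detector on every file and return {path: media type}."""
    results = {}
    for path, command in tika_detect_commands(directory, tika_jar):
        completed = run(command, capture_output=True, text=True)
        results[str(path)] = completed.stdout.strip()
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tika_dir.py /path/to/directory
    for name, media_type in detect_directory(sys.argv[1]).items():
        print(f"{name}\t{media_type}")
```

This starts a new Java process for every file, so it will be slow on large directories; that overhead is one reason using Tika as a library is the better fit for bulk work.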