File identification tools, part 10: Siegfried

“Do we really need another PRONOM-based file format identification tool?” That’s what Richard Lehane asked last year on the Open Preservation Foundation blog. The question was obviously rhetorical, since he’d gone ahead and written one: a new tool called Siegfried. Siegfried recently turned up in some tweets by Ross Spencer, so it’s worth a mention here.

It’s very simple in its design: you run it on a file or directory, and the only special options are for printing the version and updating the signatures. You can also try it out on the project’s web page by dragging files to the grainy picture of Siegfried (it’s a very retro page). Its output is YAML, which doesn’t stand for “yet another markup language,” as you might think, but recursively for “YAML Ain’t Markup Language.”

The output includes a field called “BASIS,” which tells you how Siegfried made the identification. The explanation is given in terms of file offsets and bytes, so you have to dump the original file to see what was actually matched, but it can be a useful check.
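
If you don’t have a hex dump tool handy, a few lines of Python will do for checking a basis claim. This is only a minimal sketch: dump_range is a hypothetical helper of my own, not part of Siegfried, and the offset and length you pass in are whatever numbers the basis text reports for your file.

```python
def dump_range(path, offset, length):
    """Return a one-line hex and ASCII view of `length` bytes at `offset`.

    Hypothetical helper for eyeballing the bytes an identification
    tool says it matched; it is not part of Siegfried.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    # Two-digit hex for each byte, separated by spaces.
    hex_part = " ".join(f"{b:02x}" for b in data)
    # Printable ASCII as-is, everything else as a dot.
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in data)
    return f"{offset:08x}  {hex_part}  |{ascii_part}|"
```

Calling print(dump_range("example.pdf", 0, 5)) on a PDF (the file name is just a placeholder) should show the bytes of the “%PDF-” header, which you can then compare against what the basis says was matched at offset 0.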

I guess that makes this the latest entry in my long-neglected file format identification tools sequence.

