New video course: How to Tell a File’s Format

My new video course on Udemy, How to Tell a File’s Format: Five Open Source Tools, is now live! This course introduces file, DROID, ExifTool, JHOVE, and Apache Tika, explaining how to install them and use them for format identification. Since I wrote most of the code for JHOVE, the course has some special tips on how to get the most out of it. For each tool, you get instructions on downloading and installing it and a screen capture demo. I’ll be available to help out with any questions.

The standard price for the course is $28, but you can enroll through December 31 for the introductory rate of not $27, not $26, but just $10 US! Use the coupon code INTRO1 to get this rate.

If you’re a currently active developer on any of the tools I mentioned, get in touch with me before December 31 and I’ll get you a free pass in exchange for your feedback.

Through December 31, you can buy my e-book Files that Last: Digital Preservation for Everygeek for just $3.20 on Smashwords. Use coupon code XY29D to get the discount.

Digitizing motion picture film

In this age of digital video, it’s easy to forget that until recently movies were all made and released on film. The Library of Congress’s digital preservation blog has a discussion of “Digitizing Motion Picture Film”, though this title doesn’t fully represent the ongoing debates. Some people, the article notes, think that film is still best preserved with film. Even now when a terabyte of data costs less than dinner for four at Denny’s, it takes a lot of bits to preserve movies at full resolution.

Course on file format identification tools: Progress report

The video course that I’m developing on “File Format Identification Tools” is almost ready to submit to Udemy. I’m holding off for a little more work at the Open Preservation Foundation on JHOVE, because some user interface details are going to change from the current 1.12 beta. The other tools covered will include file, DROID, ExifTool, and Apache Tika. This course should be useful to both students and professionals who want to learn how to use the tools.

File identification tools, part 10: Siegfried

“Do we really need another PRONOM-based file format identification tool?” That’s what Richard Lehane asked rhetorically last year on the Open Preservation Foundation blog. It was obviously rhetorical, since he’d gone ahead and done just that with a new tool called Siegfried. Siegfried recently turned up in some tweets by Ross Spencer, so it’s worth a mention here.

Iterating a directory in command line Tika

Apache Tika is best used as a library to wrap your own code around. Its GUI application is a toy, and its command line version isn’t all that great either. The command line can be improved with a little scripting, though.
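
As a rough sketch of that kind of scripting (not the script from the full post), the Python below walks a directory and calls Tika's command line once per file with its `--detect` option. The jar name `tika-app.jar` and the function names are assumptions for illustration; point `TIKA_JAR` at wherever your copy of the Tika app jar lives.

```python
import subprocess
from pathlib import Path

# Assumed location of the Tika application jar; adjust for your system.
TIKA_JAR = "tika-app.jar"


def detect_command(path, jar=TIKA_JAR):
    """Build the command line that asks Tika to detect one file's type."""
    return ["java", "-jar", str(jar), "--detect", str(path)]


def list_files(directory):
    """Return every regular file under directory, recursively, in sorted order."""
    return sorted(p for p in Path(directory).rglob("*") if p.is_file())


def detect_directory(directory, jar=TIKA_JAR, run=subprocess.run):
    """Run Tika on each file and map file path to the text Tika printed.

    The run parameter exists only so the function can be exercised
    without Java installed; by default it is subprocess.run.
    """
    results = {}
    for path in list_files(directory):
        proc = run(detect_command(path, jar), capture_output=True, text=True)
        results[str(path)] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    import sys

    for name, mime in detect_directory(sys.argv[1]).items():
        print(f"{name}\t{mime}")
```

Saved under a name of your choosing, it takes the directory as its only argument. Starting a new Java process for every file is slow, so this approach suits small directories better than large collections.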

The FLIF format

New image file formats keep turning up, taking advantage of advances in compression technology. One of the latest is FLIF, Free Lossless Image Format. It claims to outcompress PNG, lossless JPEG2000, lossless WebP, and lossless BPG. Though it has only a lossless mode, it claims that “FLIF works well on any kind of image, so the end-user does not need to try different algorithms and parameters.”

The coming of WebP (or not)

The WebP image format has been around for about five years, but till recently it’s been mostly a curiosity. I last blogged about it in 2013, when it didn’t have very wide support. Since then most browsers have adopted it, and now Google+ is making more use of it (no surprise, since Google is the format’s principal backer). It promises smarter lossy compression than JPEG and smaller file sizes for the same image quality.

Video: Introduction to JHOVE

A new video on my YouTube channel offers a seven-minute introduction to JHOVE. This is a teaser for my upcoming video course on file format identification tools, as well as a public test of the techniques I’ve been developing. It’s a screen capture video, and I cover the GUI version, even if it’s not as widely used, because it lets me focus on the concepts, and because it’s silly to teach a command line application in a video.

A link roundup on file formats

3D printing is an exciting new technology, but the formats to choose from are an alphabet soup.

A call for “PDF 2.0” or an “Analytical File Format.” The description is vague, but it sounds like something analogous to the Semantic Web for documents.

BW64, a new RIFF-based audio format. The article describes it as a “3D” format, but more significantly it’s a metadata-rich interchange format that supports really big files.

And just for bitter laughs: “I need a ‘file’ format.”