The last installment in this series looked at
file, a simple command line tool available with Linux and Unix systems for determining file types. This one looks at DROID (Digital Record Object IDentification), a Java-based tool from the UK National Archives, focused on identifying and verifying files for the digital repositories of libraries and archives. It’s available as open source software under the New BSD License. Java 7 or 8 is needed for the current release (6.1.5). It relies on PRONOM, the National Archive’s registry of file format information.
file, DROID depends on files that describe distinctive data values for each format. It’s designed to process large batches of files and compiles reports in a much more useful way than
file‘s output. Reports can include total file counts and sizes by various criteria.
To install DROID, you have to download and expand the ZIP file for the latest version. On Windows, you run droid.bat; on sensible operating systems, run droid.sh. You may first have to make it executable:
chmod +x droid.sh
Running droid.sh with no arguments launches the GUI application. If there are any command line arguments, it runs as a command line tool. You can type
to see all the options.
The first time you run it as a GUI application, it may ask if you want to download some signature file updates from PRONOM. Let it do that.
It’s also possible to use DROID as a Java library in another application. FITS, for example, does this. There isn’t much documentation to help you, but if you’re really determined to try, look at the FITS source code for an example.
DROID will report file types by extension if it can’t find a matching signature. This isn’t a very reliable way to identify a file, and you should examine any files matched only by extension to see what they really are and whether they’re broken. It may report more than one matching signature; this is very common with files that match more than one version of a format.
It isn’t possible to cover DROID in any depth in a blog post. The document Droid: How to use it and how to interpret your results is a useful guide to the software. It’s dated 2011, so some things may have changed.