A widely available file identification tool is simply called file. It comes with nearly all Linux and Unix systems, including Macintosh computers running OS X. Detailed “man page” documentation is available. It requires using the command line shell, but its basic usage is simple: type file followed by the name of the file you want to identify.
file starts by checking for some special cases, such as directories, empty files, and “special files” that aren’t really files but ways of referring to devices. Next, it checks for “magic numbers,” identifiers near the beginning of the file that are (hopefully) unique to the format. If it doesn’t find a “magic” match, it checks whether the file looks like a text file, trying a variety of character encodings including the ancient and obscure EBCDIC. Finally, if it looks like a text file, file will attempt to determine if it’s in a known computer language (such as Java) or natural language (such as English). The identification of file types is generally good, but the language identification is very erratic.
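The first two stages described above can be imitated in a few lines of shell. This is only a toy sketch to show the idea: sniff_type is a hypothetical helper, not part of file, and it knows just three signatures, whereas the real tool consults large magic files. It assumes the common head and od utilities are available.

```shell
#!/bin/sh
# Hypothetical helper: a toy imitation of the early stages file goes through.
sniff_type() {
    sniff_path=$1
    if [ ! -e "$sniff_path" ]; then
        echo "no such file"
        return 1
    fi
    # Special cases first: directories and empty files.
    if [ -d "$sniff_path" ]; then
        echo "directory"
        return 0
    fi
    if [ ! -s "$sniff_path" ]; then
        echo "empty"
        return 0
    fi
    # Read the first four bytes as lowercase hex, e.g. 25504446 for "%PDF".
    sniff_magic=$(head -c 4 "$sniff_path" | od -An -tx1 | tr -d ' \n')
    case $sniff_magic in
        25504446) echo "PDF document" ;;   # "%PDF"
        89504e47) echo "PNG image" ;;      # 0x89 "PNG"
        504b0304) echo "Zip archive" ;;    # "PK" 0x03 0x04
        *)        echo "unknown" ;;        # file would go on to its text tests here
    esac
}
```

Running sniff_type on a file that begins with the characters %PDF prints PDF document. The sketch also shows why file can be fooled: any file that starts with those four bytes gets the same answer, whatever follows them.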
The identification of magic numbers uses a set of magic files, and these vary among installations, so running the same version of file on different computers may produce different results. You can specify a custom set of magic files with the -m flag. If you want a file’s MIME type, you can specify --mime. For example:
file --mime xyz.pdf
will tell you the MIME type and character encoding of xyz.pdf. If it really is a PDF file, the output will be something like
xyz.pdf: application/pdf; charset=binary
If instead you enter
file --mime-type xyz.pdf
the output leaves out the encoding and gives only the type:
xyz.pdf: application/pdf
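If you call file from a script, you will usually want to split a line of its MIME output into the type and the encoding. The following is a minimal sketch in plain shell; parse_mime_line is a hypothetical helper name, and it assumes the “name: type; charset=encoding” layout shown above and a file name that doesn’t itself contain a colon followed by a space.

```shell
#!/bin/sh
# Hypothetical helper: split one line of file's MIME output into its parts.
# Prints "type charset", using "-" when the line carries no charset.
parse_mime_line() {
    mime_line=$1
    mime_rest=${mime_line#*: }                    # drop the name and ": "
    mime_rest=${mime_rest#"${mime_rest%%[! ]*}"}  # drop any padding spaces
    mime_type=${mime_rest%%;*}                    # text before the first ";"
    case $mime_rest in
        *charset=*) mime_charset=${mime_rest##*charset=} ;;
        *)          mime_charset=- ;;             # e.g. --mime-type output
    esac
    printf '%s %s\n' "$mime_type" "$mime_charset"
}

parse_mime_line 'xyz.pdf: application/pdf; charset=binary'
# prints: application/pdf binary
```

In a real script the argument would come from the command itself, for instance parse_mime_line "$(file --mime xyz.pdf)".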
If some tests aren’t working reliably on your files, you can use the -e option to suppress them. If you don’t trust the magic files, you can enter
file -e soft xyz.pdf
But then you’ll get an uninformative result, typically
xyz.pdf: data
The -k option tells file not to stop with the first match but to apply additional tests. I haven’t found any cases where this is useful, but it might help to identify some weird files. It can slow down processing if you’re running it on a large number of files.
As with many other shell commands, you can type file --help to see all the options.
file can easily be fooled and won’t tell you if a file is defective, but it’s a very convenient quick way to query the type of a file.
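Windows has a command line tool with a similar-sounding name, FTYPE, but it does a different job: it displays or changes the command associated with a registered file type and does not examine a file’s contents. Its syntax is also completely different.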