Although the name of the NLNZ (National Library of New Zealand) Metadata Extraction Tool suggests extracting metadata more than identifying files, FITS uses it as part of its set of format identification tools. It employs a set of adapters to access the following file formats: BMP, GIF, JPEG, TIFF, MS Word, Word Perfect, Open Office, MS Works, MS Excel, MS PowerPoint, PDF, WAV, MP3, BWF, FLAC, HTML, XML, and ARC. It also has a generic adapter to report basic file system information about other files. It’s available as open source on SourceForge under the Apache License. Output is in XML, with a choice of schemas. Like many other identification tools, it’s written in Java and can run on any desktop system that supports Java applications. It has command line versions for Unix and Windows, as well as a GUI version. The most recent update was in June 2014. A brief Developer’s Guide and an installation guide are available.
Like JHOVE, the NLNZ tool has its own code for processing various file formats, some of which are complicated, and like JHOVE, it’s met with varying degrees of success. The source code of the Word adapter says that it “adapts all Microsoft Word files from version 2.0 to XP/2003.” The PDF adapter says it handles versions 1.1 through 1.5 (the latest version, the one standardized by ISO, is 1.7).
Each of the NLNZ tool’s adapters checks whether a file passes some basic tests for its format; if it doesn’t, other adapters are tried. So it certainly qualifies as an identification tool within the range of formats it handles.
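To make the "try each adapter in turn" idea concrete, here’s a minimal sketch in Python. It isn’t the NLNZ tool’s code (which is Java), and the function names, the adapter table, and the specific tests are all my own assumptions for illustration. The only facts it leans on are the well-known signatures at the start of GIF, PDF, and WAV files.

```python
# Hypothetical sketch of adapter-style identification; not the NLNZ tool's API.
# Each "adapter" here is just a name plus a basic test on the file's first bytes.

def looks_like_gif(head: bytes) -> bool:
    # GIF files begin with one of two version signatures.
    return head[:6] in (b"GIF87a", b"GIF89a")

def looks_like_pdf(head: bytes) -> bool:
    # PDF files begin with "%PDF-" followed by a version number.
    return head.startswith(b"%PDF-")

def looks_like_wav(head: bytes) -> bool:
    # WAV is a RIFF container whose form type (bytes 8-11) is "WAVE".
    return head[:4] == b"RIFF" and head[8:12] == b"WAVE"

# Assumed ordering; a real tool would have many more entries.
ADAPTERS = [
    ("GIF", looks_like_gif),
    ("PDF", looks_like_pdf),
    ("WAV", looks_like_wav),
]

def identify(head: bytes) -> str:
    """Return the name of the first adapter that accepts the bytes."""
    for name, accepts in ADAPTERS:
        if accepts(head):
            return name
    # Nothing matched, so fall back to a catch-all, like the generic adapter.
    return "generic"

def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and identify it."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

Passing tests this shallow only shows that a file starts out like the format. It says nothing about whether the rest of the file is well-formed, which is where a tool’s own format-processing code earns its keep or falls short.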
The source code is available only within the ZIP files for each version; this makes it difficult to tell how actively specific parts have been maintained. A spot check, though, suggests that many of the adapters haven’t been kept up to date.
I wasn’t able to run it. The launch script, metadata.sh, seems to make assumptions about the METAHOME directory that are inconsistent with the file structure, and I gave up after some diddling with it. If I get more information, I’ll update this post.
Often software projects in the library world come out of an initial burst of funding, after which it’s hard to maintain the programming staff time to do all the necessary updates. I think the NLNZ Metadata Extraction Tool may be a case in point.
Next: JHOVE2. To read this series from the beginning, start here.
The coming of WebP (or not)
The WebP image format has been around for about five years, but till recently it’s been mostly a curiosity. I last blogged about it in 2013, when it didn’t have very wide support. Since then most browsers have adopted it, and now Google+ is making more use of it (no surprise, since Google is the format’s principal backer). It promises smarter lossy compression than JPEG and smaller file sizes for the same image quality.