January’s mostly over, and I’ve only posted three times to this blog. Files that Last has been keeping me busy. My posting should pick up again before long, once I get a draft out to first readers.
One thing I’ve been looking at, with an eye to the upcoming SPRUCE Hackathon, is what can be done with FITS. FITS puts together a lot of tools for extracting file metadata, but there have been some complaints that it’s not as fast as it might be. I’ve written up the results of some profiling experiments and quick attempts at optimization. The first results were surprising: the easiest way to get a small improvement was to factor out the initialization of namespace URIs for parsing XML, so that it happens once instead of on every use. You wouldn’t think that would make any detectable difference, but the initialization of URIs in Xerces is surprisingly slow.
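To give a rough idea of what “factoring out” means here, this is a minimal sketch in plain Java, not the actual FITS or Xerces code. The class name, the method names, and the choice of `java.net.URI` with the XML Schema instance namespace are all assumptions made for illustration. The point is only the pattern: build the object once in a static field and reuse it, rather than rebuilding it on every call.

```java
import java.net.URI;

// Illustrative only: NamespaceConstants and its methods are hypothetical,
// not part of FITS, JDOM, or Xerces.
class NamespaceConstants {
    // Namespace string chosen for the example (an assumption).
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // Hoisted: parsed and constructed once, when the class is loaded.
    private static final URI XSI_URI = URI.create(XSI_NS);

    // Before: every call parses the string and allocates a new URI.
    static URI namespacePerCall() {
        return URI.create(XSI_NS);
    }

    // After: every call returns the same pre-built instance.
    static URI namespaceHoisted() {
        return XSI_URI;
    }

    public static void main(String[] args) {
        // Both versions yield equal values; only the hoisted one reuses the object.
        System.out.println(namespacePerCall().equals(namespaceHoisted()));
        System.out.println(namespaceHoisted() == namespaceHoisted());
    }
}
```

Whether this pays off in a given program depends on how costly the construction really is and how often it runs, which is what profiling is for.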
Another possibility to explore is improving the connection between FITS and JHOVE. Even though JHOVE is intended for use as a callable library, among other things, it’s designed to write to an output file. Some simple changes would let it provide an in-memory response without writing a file, which would be more useful to an application like FITS.
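The general shape of such a change can be sketched as follows. This is not JHOVE’s API; `InMemoryReport`, `writeReport`, and `reportAsString` are hypothetical names, and the report format is made up. The idea is simply that if the output routine writes to any `Writer`, a caller can hand it a `StringWriter` and get the result back in memory instead of going through a file.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Illustrative only: a stand-in for a tool's output code, not JHOVE itself.
class InMemoryReport {
    // Writes a tiny made-up report to whatever Writer the caller supplies.
    // No XML escaping is done; this is a sketch, not production output code.
    static void writeReport(String fileName, String format, Writer out) throws IOException {
        out.write("<report>");
        out.write("<file>" + fileName + "</file>");
        out.write("<format>" + format + "</format>");
        out.write("</report>");
    }

    // Convenience for library callers: collect the report in memory.
    static String reportAsString(String fileName, String format) {
        StringWriter buffer = new StringWriter();
        try {
            writeReport(fileName, format, buffer);
        } catch (IOException e) {
            // StringWriter does not actually throw, but the signature requires handling.
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(reportAsString("sample.tif", "TIFF"));
    }
}
```

A command-line front end could still pass a `FileWriter` to the same routine, so file output would keep working alongside the in-memory path.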

So far there isn’t, as far as I know, a book to promote and explain digital preservation to people who understand computers but aren’t part of the library and archiving world. That’s where I’m aiming this book. If you look at the Library of Congress’s
“Digital forensics”
Now and then I see talk about “digital forensics.” It’s never clear what it’s supposed to mean. “Forensic” means “belonging to, used in, or suitable to courts of judicature or to public discussion and debate.” In popular usage, it’s generally applied to criminal investigations, especially in the phrase “forensic medicine.”
Some activities, where digital methods help to resolve contentious issues, could reasonably be called digital forensics. For instance, textual analysis might shed light on an author’s identity. Digital techniques can even solve crimes. Too often, though, the term is stretched beyond meaningfulness, to the point that routine curation practices are called “forensics.”
No doubt it feels glamorous to think of oneself as the CSI of libraries, but let’s not get carried away with buzzwords.