Apple hides attachments in malformed multipart mail

Recently I got a PDF of a filk songbook which I had contributed to. More precisely, the email said I was getting it, but there was no sign of an attachment. I wrote back to the editor who’d sent it, and she insisted it was there. Digging it out of the message revealed to me a whole new way of messing up email formats.

A quick look at the message source showed that there really was an attachment, with a Content-Type of “application/pdf”, which took up well over 90% of the message. The question was why Thunderbird didn’t show it to me.
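One way to check what a message really contains, regardless of how a mail client chooses to display it, is to walk its MIME tree. Below is a minimal sketch using Python's standard `email` package. It is not necessarily how the author inspected this message, and `list_parts` and `extract_payloads` are hypothetical helper names. Keep in mind that Python's parser is lenient with malformed structure (it records problems in each part's `defects` list), so its view of a broken message may not match Thunderbird's or Apple Mail's.

```python
from email import policy
from email.parser import BytesParser
from typing import List, Optional, Tuple


def list_parts(raw: bytes) -> List[Tuple[str, Optional[str]]]:
    """Return (content type, filename) for every MIME part, in walk order.

    walk() yields the top-level message first, then every nested part,
    so a part shows up here even if a mail client declines to display it.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    return [(part.get_content_type(), part.get_filename()) for part in msg.walk()]


def extract_payloads(raw: bytes, content_type: str) -> List[bytes]:
    """Decode (e.g. from base64) the body of every leaf part of the given type."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    return [
        part.get_payload(decode=True)
        for part in msg.walk()
        if not part.is_multipart() and part.get_content_type() == content_type
    ]
```

For example, `extract_payloads(raw, "application/pdf")` would return the decoded bytes of each PDF part, which could then be written out to a file.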

JHOVE PNG module, progress report

There’s now a JHOVE PNG module on my GitHub site. The relevant new classes are com.mcgath.jhove.module.PngModule and everything in the package com.mcgath.jhove.module.png. I could have continued from Lauri’s code as I mentioned in my previous post, but I like a more factored approach, so I continued with my own code, which has a separate class for each chunk type. Take a look at the top-level file FORKNOTES for what I’ve been doing.

It does a pretty decent job of validating files and extracting metadata now, but some chunk types are still ignored, and there are some design decisions on the extracted metadata that I’m not sure about yet. Also, JHOVE modules usually have a lot of metadata about themselves, and that’s not complete yet. If anyone wants to play with it, keeping in mind that it’s not stable code yet, please do and submit issue reports for bugs and suggestions.
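The “separate class for each chunk type” design can be illustrated with a minimal Python sketch. It assumes nothing about the actual JHOVE code, which is Java: `ChunkHandler`, `IhdrHandler`, `TextHandler`, `HANDLERS`, and `parse_png` are hypothetical names. The sketch relies only on the PNG file layout, which is an 8-byte signature followed by chunks, each made of a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over the type and data. It checks framing, CRCs, and that IHDR comes first and IEND last. It does not decompress IDAT or enforce the other chunk-ordering rules.

```python
import struct
import zlib
from typing import Dict, List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class ChunkHandler:
    """Default handler (hypothetical): accepts any chunk, extracts nothing."""

    def process(self, data: bytes, metadata: dict) -> None:
        pass


class IhdrHandler(ChunkHandler):
    """IHDR: width, height, bit depth, color type, compression, filter, interlace."""

    def process(self, data: bytes, metadata: dict) -> None:
        if len(data) != 13:
            raise ValueError("IHDR must be 13 bytes long")
        (metadata["width"], metadata["height"], metadata["bit_depth"],
         metadata["color_type"], _, _, metadata["interlace"]) = struct.unpack(">IIBBBBB", data)


class TextHandler(ChunkHandler):
    """tEXt: Latin-1 keyword, a NUL separator, then Latin-1 text."""

    def process(self, data: bytes, metadata: dict) -> None:
        keyword, _, text = data.partition(b"\x00")
        metadata.setdefault("text", {})[keyword.decode("latin-1")] = text.decode("latin-1")


# One handler object per supported chunk type; anything else uses the default.
HANDLERS: Dict[bytes, ChunkHandler] = {b"IHDR": IhdrHandler(), b"tEXt": TextHandler()}


def parse_png(blob: bytes) -> Tuple[List[str], dict]:
    """Validate chunk framing and CRCs; return the chunk types seen and the metadata."""
    if not blob.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos, types, metadata = len(PNG_SIGNATURE), [], {}
    while pos < len(blob):
        if pos + 8 > len(blob):
            raise ValueError("truncated chunk header")
        length, ctype = struct.unpack(">I4s", blob[pos:pos + 8])
        if not ctype.isalpha():  # chunk types are four ASCII letters
            raise ValueError(f"invalid chunk type {ctype!r}")
        end = pos + 8 + length + 4
        if end > len(blob):
            raise ValueError(f"truncated {ctype!r} chunk")
        data = blob[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", blob[end - 4:end])
        if zlib.crc32(ctype + data) != crc:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        HANDLERS.get(ctype, ChunkHandler()).process(data, metadata)
        types.append(ctype.decode("ascii"))
        pos = end
        if ctype == b"IEND":
            break
    if not types or types[0] != "IHDR":
        raise ValueError("first chunk must be IHDR")
    if types[-1] != "IEND":
        raise ValueError("missing IEND chunk")
    return types, metadata
```

With this layout, supporting another chunk type means adding one handler class and one dictionary entry. Unrecognized types fall through to the default handler, which matches the state described above, where some chunk types are still ignored.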

Notes of a reluctant video teacher

As you’ve doubtless noticed if you follow this blog or my Twitter feed, I’ve made two video courses and put them up on Udemy. You may be wondering why I’m doing this, especially if you know how much I hate being on camera.

Several steps have led to my being here. One is that the more gray hair you have, the more likely clients and employers are to assume the gray matter has leaked out of your brain, even though that’s nonsense. So I have to find other sources of income. I’ve been writing, including the book Files that Last, and have had some successes there. Many people, though, like learning from video, and turning written material into a video presentation isn’t a huge step. I liked the arrangements Udemy offered, so I’ve given it a try.

Course on file format identification tools: Progress report

The video course which I’m developing on “File Format Identification Tools” is almost ready to submit to Udemy. I’m holding off for a little more work at the Open Preservation Foundation on JHOVE, because some user interface details are going to change from the current beta (1.12 beta). The other tools covered will include file, DROID, ExifTool, and Apache Tika. This course should be useful to both students and professionals who want to learn how to use the tools.

Professional update

Just to keep everyone up to date on what I’m doing professionally:

Currently I’m back in consulting mode, offering software development and consulting services. Those of you who’ve been following this blog regularly know I’ve been working with libraries for a long time and am familiar with their technology. I’ve updated my business home page at and moved it to new hosting, which will allow me to put demos and other materials of interest on the site.

The key to success is, of course, networking, so if you happen to hear of a situation where my skills could be put to good use, please let me know.

Format registry browser on GitHub

I’ve put the format-reg-browser project up on GitHub, in case anyone wants to play with the code. This is the first time I’ve committed code to any kind of Git site, but it looks as if the code’s really there. Let me know if there are any problems.


Personal note: I’m now on LinkedIn and looking for contract software development work starting in September.

New blog: Files That Last

Today I’m launching a new tech blog, called “Files That Last.” As you might guess, its subject is digital preservation. Why do we need another preservation blog? Perhaps “we” don’t, if “we” means mostly people closely connected with libraries and archives, but the topic is ripe for more attention from the general computer-tech community, as everyone increasingly relies on computer files for long-term memory. Its focus will be practical guidance. Since it’s a solo operation, I’ll be able to say things the Library of Congress really shouldn’t.

I’ll be running that blog on a more regular schedule than this one, with weekly posts. Please drop by, and if you like what you see please spread the word.

State of the blog

It’s been a while since I’ve posted here. One reason is that I’ve been working on a book proposal and have gotten a favorable preliminary response from a publisher. Hopefully I’ll have good news to announce here soon.

In the meantime, I’m using Google+ and enjoying it. If you want to find me, I’m here. I wouldn’t mind connecting up with other people in the digital preservation world.