Tag Archives: preservation

iPRES 2012

iPRES 2012 now has real information on its website.

JHOVE 1.7, finally!

After well over a year, a new version of JHOVE is finally available. Really, not very much has changed since 1.6 as far as the software itself goes. However, I’m leaving Harvard at the end of August and asked for and got custody of JHOVE, so this version marks its transition from a Harvard-supported project (which, in practice, it hasn’t been for a long time) to a separate open-source project. The JHOVE web pages are now hosted on SourceForge, and all support and discussion will go through SourceForge. The jhove-support and jhove-users mailing lists hosted by Harvard will shut down in the near future.

This doesn’t mean JHOVE is dead. I may actually have more opportunities to work on it than before, now that I’m going into independent consulting. I need to stay visible to the library and preservation world, and this is one way to do it.

Meanwhile, I’m looking for contract opportunities. Please take a look at my new business site or my LinkedIn profile.

JHOVE web pages moved

The web pages for JHOVE are now on SourceForge. They’ll remain on the Harvard site for some period of time but won’t be further updated.

There’s at least a chance this means there will be a release of JHOVE soon. Yes, I know, I’ve been promising that for a long time.

Contributors to JHOVE2

The JHOVE2 project has issued a governance document (PDF) for contributors. Stephen Abrams writes that “we believe it important to enlist the efforts of the wider user community in future efforts. Working collectively, we can most effectively take advantage of opportunities to enhance and extend the utility of JHOVE2, especially in times of significant constraints on local institutional resources.”

PDF/A post on FTL

Today on Files That Last I have a post on “PDF/A for the long haul.” It’s directed at the end user or administrator, not at the formats geek or preservation specialist, but might be useful to link to when you’re explaining what PDF/A is good for.

iPRES proceedings

The iPRES proceedings for 2011 are now available.

iPRES 2012 will be in Toronto, making it the most convenient one for Americans in years. It will run from September 30 to October 5 (which is when I was planning to be in Germany … just can’t win).

The email jungle

In researching tomorrow’s post on email preservation on Files That Last, I came to appreciate more thoroughly how messy email formats are. RFC 4155, which defines “the ‘default’ mbox database format” (their quotes around “default”) and the application/mbox MIME type, tells us that “The mbox database format is not documented in an authoritative specification, but instead exists as a well-known output format that is anecdotally documented, or which is only authoritatively documented for a specific platform or tool.”

Some versions may have eight-bit character data with the character encoding not explicitly specified, and possibly varying from one file creator to another. The format of email addresses isn’t specified. A short page on qmail.org, referenced from RFC 4155, discusses some of the variants, including mboxo, mboxrd, mboxcl, and mboxcl2. The differences may appear minor, but they’re sufficient that a parser that assumes one of the variants can fail when it encounters the others.
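To make the difference concrete, here is a rough Python sketch of how two of the variants escape body lines that could be mistaken for the “From ” separator that starts each message. It reflects my reading of the commonly described conventions, not any authoritative specification (there isn’t one), and the function names are made up for illustration.

```python
import re

# A body line that would look like a separator, possibly already ">"-quoted.
_QUOTED_FROM = re.compile(r"^>*From ")

def quote_mboxo(body_lines):
    """mboxo-style: prefix '>' only to lines that start with 'From '."""
    return [">" + line if line.startswith("From ") else line
            for line in body_lines]

def quote_mboxrd(body_lines):
    """mboxrd-style: prefix '>' to lines that start with 'From ' or '>From ',
    '>>From ', and so on, so the quoting can be undone exactly."""
    return [">" + line if _QUOTED_FROM.match(line) else line
            for line in body_lines]

def unquote_mboxrd(body_lines):
    """Reverse quote_mboxrd by stripping one leading '>' from quoted lines."""
    return [line[1:] if re.match(r"^>+From ", line) else line
            for line in body_lines]

body = ["From here on it gets messy", ">From a quoted reply", "plain text"]
as_mboxo = quote_mboxo(body)
as_mboxrd = quote_mboxrd(body)
```

With this input the mboxo output contains two lines starting with “>From ”, and a reader can no longer tell which one was quoted by the writer and which was in the original message. The mboxrd output survives a round trip through unquote_mboxrd unchanged, which is exactly why a parser has to know which variant it is reading.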

Then there’s the encoding issue. Most of the world has settled on MIME by now, but older archives (and perhaps some recent ones) may contain messages encoded with uuencode, BinHex, or AppleSingle. The last two are found mostly with mail that was sent from Macintosh clients, but uuencode was once widely used, and it was poorly standardized.
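If you want a quick look at whether an old archive contains any of these, a heuristic scan is a reasonable first step. The sketch below is only that: it looks for the “begin” line that conventionally opens a uuencoded block and for the banner that BinHex 4.0 files usually carry. The function name is hypothetical, real-world data will produce both false hits and misses, and AppleSingle is binary and isn’t covered here.

```python
import re

# Conventional opening line of a uuencoded block: "begin <octal mode> <name>".
UU_BEGIN = re.compile(r"^begin [0-7]{3,4} \S")

# Banner that typically precedes BinHex 4.0 data.
BINHEX_BANNER = "(This file must be converted with BinHex"

def legacy_encodings(body):
    """Return the set of legacy encodings that a message body appears to use."""
    found = set()
    for line in body.splitlines():
        if UU_BEGIN.match(line):
            found.add("uuencode")
        elif line.startswith(BINHEX_BANNER):
            found.add("binhex")
    return found

sample = "See attached.\n\nbegin 644 photo.gif\nM1TE&.#EA\n`\nend\n"
hits = legacy_encodings(sample)
```

Messages that turn up in such a scan are the ones worth checking by hand before you trust a MIME-only tool with the archive.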

An alternative email archiving format is the CERP XML schema. This looks at a glance as if it provides better structuring than MBOX, but it isn’t as widely supported.

Update: The FTL post is now available at “You HAD mail.”

New blog: Files That Last

Today I’m launching a new tech blog, called “Files That Last.” As you might guess, its subject is digital preservation. Why do we need another preservation blog? Perhaps “we” don’t, if “we” means mostly people closely connected with libraries and archives, but it’s a topic that’s ripe for more attention from the general computer-tech community, as everyone relies increasingly on computer files for long-term memory. Its focus will be practical guidance. Since it’s a solo operation, I’ll be able to say things the Library of Congress really shouldn’t.

I’ll be running that blog on a more regular schedule than this one, with weekly posts. Please drop by, and if you like what you see please spread the word.