I recently got an email reminding me that my Google+ account will go away on April 2. My first reaction was a yawn. Google has made the service steadily less attractive over the years. I just checked my feed for the first time in months, and it consists entirely of posts by people I don’t follow, on topics I don’t care about. Posts from this blog and my writing blog get links automatically posted to Google+, but otherwise I haven’t posted in a long time.
One of my posts got two comments from people I know, so it’s not totally dead, but it’s close. Google made the service as unattractive as they could. Posts by strangers keep showing up. Comments appear and disappear as you’re trying to read them. But there was a time when Google+ was somewhat useful. You might have material there which you want to save. Fortunately, Google provides a way to do this.
JHOVE is still alive and active! The Open Preservation Foundation is holding a workshop on “Getting Started with JHOVE” on January 25, 2019 in The Hague, Netherlands. The announcement says, “This workshop is aimed at beginners, or anyone who is new to JHOVE.”
OPF members get priority for registration.
Should there be songs about digital preservation? This is just a special case of the question, “Should there be songs about X?” For nearly all X, the answer is “Yes, and there probably are!” (Even — perhaps especially — if there shouldn’t be, there are.)
Someone in the Australasian preservation community asked if AusPreserves needed a theme song. The first responses were existing popular songs, but then people started getting more creative. This led to the Digital Preservation Song Challenge!
One response was the Beyoncé parody, “All the Corrupt Files” (“Put a checksum on it”). I think it’s the first song ever to mention JHOVE!
Naturally, I already have my own song on digital preservation, called Files that Last. I wrote it to promote my book of the same title, but it stands (or falls) by itself.
If it’s worth doing, it’s worth singing about, and that certainly applies to digital preservation!
For years I wrote most of the code for JHOVE. With each format, I wrote tests for whether a file is “well-formed” and “valid.” With most formats, I never knew exactly what these terms meant. They come from XML, where they have clear meanings. A well-formed XML file has correct syntax. Angle brackets and quote marks match. Closing tags match opening tags. A valid file is well-formed and follows its schema. A file can be well-formed but not valid, but it can’t be valid without being well-formed.
With most other formats, there’s no definition of these terms. JHOVE applies them anyway. (I wrote the code, but I didn’t design JHOVE’s architecture. Not my fault.) I approached them by treating “well-formed” as meaning syntactically correct, and “valid” as meaning semantically correct. Drawing the line wasn’t always easy. If a required date field is missing, is the file not well-formed or just not valid? What if the date is supposed to be in ISO 8601 format but isn’t? How much does it matter?
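To make the distinction concrete, here is a minimal Python sketch of the two-level check. It is not JHOVE’s code (JHOVE is written in Java), and the record format is made up for illustration: the function name `check` and the rule that the root element must contain a `<date>` child holding an ISO 8601 calendar date are assumptions. It also takes one side of the open question above by treating a missing or badly formatted date as “well-formed but not valid.”

```python
import xml.etree.ElementTree as ET
from datetime import date


def check(xml_text):
    """Return (well_formed, valid) for a hypothetical record format.

    Assumed rule, for illustration only: the root element must have a
    <date> child whose text is an ISO 8601 calendar date (YYYY-MM-DD).
    """
    # Well-formedness: does the text parse as XML at all?
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return (False, False)  # cannot be valid without being well-formed

    # Validity: does the parsed content obey the (assumed) format rule?
    node = root.find("date")
    if node is None or node.text is None:
        return (True, False)  # required field missing
    try:
        date.fromisoformat(node.text.strip())
    except ValueError:
        return (True, False)  # field present but not ISO 8601
    return (True, True)


print(check("<record><date>2019-01-25</date></record>"))
print(check("<record><date>January 25, 2019</date></record>"))
print(check("<record><date>2019-01-25</record>"))
```

The first call passes both tests, the second parses but fails the date rule, and the third fails to parse because the closing tag is missing. A stricter design could just as reasonably report the missing date as a well-formedness error, which is exactly why the line is hard to draw.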
It’s been too long since I’ve had a special discount on FTL. For all of June, you can get Files that Last: Digital Preservation for Everygeek on Smashwords for just $4.00. That’s half off the regular price! The coupon code is KC49Z.
FTL is aimed at anyone with a moderate level of technical knowledge who’s concerned with keeping files from becoming useless over the years. It covers formats, metadata, media, file systems, and more.
The book is 100% DRM-free on Smashwords. I’ve done my best to keep it that way when it’s sold through other platforms but can’t always guarantee it.
What’s the format of a Google Docs file? The question may not even be meaningful. According to Jenny Mitcham at the University of York, there is no such thing as a Google Docs file. What you see when you open a document is an assembly of information from a database. You can export it in various file formats, but the exported file isn’t identical to the Google document.
This makes Google documents risky from a preservation standpoint. You can’t save an exact local backup of a document, only an export. If you lose your Google account, or if censorship in your country cuts you off from it, you lose all your documents.
When you offer expert advice on something, such as digital preservation, you have to admit your own errors. I very nearly lost my 2016 tax return. When I tried to open it in TurboTax, the application just did nothing. I hadn’t exported it to a generally usable format. The TurboTax file format is proprietary and opaque.
Today is International Digital Preservation Day.
In honor of the day, I’m offering Files that Last: Digital Preservation for Everygeek on Smashwords at its lowest price ever. Today only, you can get it for $0.99 with the coupon code AM26N. This is a one-day sale, so get it now if you don’t already have it!
There are new releases of VeraPDF and JHOVE today.
This XKCD cartoon showed up in my Twitter feed more times in one day than any previous one, for reasons that should be obvious.
Is PDF/A a good archival format? Many institutions use it, but it has problems which are inherent in PDF. With PDF/A-3, it has lost some of its focus. A format which can be a container for any kind of content isn’t great for digital preservation.
An article by Marco Klindt of the Zuse Institute Berlin takes a strong position against its suitability, with the title “PDF/A considered harmful for digital preservation.” Carl Wilson at the Open Preservation Foundation has added his own thoughts with “PDF/A and Long Term Preservation.”