What are “positives” in format validation?

Articles about JHOVE, such as Good GIF Hunting, grab my attention for obvious reasons. This one talks about false positive and false negative results, and it got me thinking: What constitutes a “positive” result in file format validation? There are two ways to look at it:

  1. The default assumption is that the file is of a certain format, perhaps based on its extension, MIME type, or other metadata. The software sets out to see if it violates the format’s requirements. In that case, a positive result is that the file doesn’t conform to the requirements.
  2. The default assumption is that the file is just a collection of bytes. The software matches it against one or more sets of criteria. A positive result is that the file matches one of them.
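
To make the two viewpoints concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how JHOVE or any real tool works: the function names and the signature table are hypothetical, and the checks look only at the first few bytes of the data.

```python
from typing import List, Optional

# Toy illustration only; real validators check far more than this.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")

# Hypothetical signature table for the "collection of bytes" approach.
SIGNATURES = {
    "GIF": GIF_SIGNATURES,
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "TIFF": (b"II*\x00", b"MM\x00*"),
}


def find_gif_violations(data: bytes) -> List[str]:
    """Approach 1: assume the file is a GIF and look for violations.

    A non-empty list is the "positive" result here.
    """
    violations = []
    if not data.startswith(GIF_SIGNATURES):
        violations.append("missing GIF87a/GIF89a signature")
    # 6 signature bytes plus a 7-byte logical screen descriptor
    if len(data) < 13:
        violations.append("too short for header and logical screen descriptor")
    return violations


def identify(data: bytes) -> Optional[str]:
    """Approach 2: assume nothing and match against known signatures.

    A format name (a match) is the "positive" result here.
    """
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


sample = b"GIF89a" + bytes(7)  # signature plus 7 zeroed descriptor bytes
broken = b"GIF"                # truncated data

print(find_gif_violations(sample), identify(sample))
print(find_gif_violations(broken), identify(broken))
```

Note that the same truncated input is a positive under the first approach (violations were found) and a negative under the second (nothing matched), which is why the term needs to be pinned down before talking about false positives.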


Aside

The Libtiff source code repository is now on GitLab. The old CVS repository on maptools.org will be maintained for historical purposes but won’t get any updates.

One reason for choosing GitLab rather than GitHub is that there’s already a libtiff repository on GitHub. The reasons it’s there aren’t clear, but it’s definitely not an official Libtiff repository.

The Libtiff homepage continues to be on maptools.org.

International Digital Preservation Day

Today is International Digital Preservation Day.

In honor of the day, I’m offering Files that Last: Digital Preservation for Everygeek on Smashwords at its lowest price ever. Today only, you can get it for $0.99 with the coupon code AM26N. This is a one-day sale, so get it now if you don’t already have it!

There are new releases of VeraPDF and JHOVE today.

Libtiff 4.0.9 released

Libtiff 4.0.9 has been released. According to the email announcing it:

A great many security improvements have been implemented by Even Rouault.

Much thanks to OSS Fuzz, team OWL337, Roger Leigh, and of course Even Rouault.

Obligatory reminder: Don’t download from libtiff dot org. It’s many years out of date.

JHOVE webinar

An Open Preservation Foundation webinar, “Putting JHOVE to the acid test: A PDF test-set for well-formedness validation in JHOVE,” will be held on November 21, 10 AM GMT (that’s 11 AM in Central Europe and a ludicrous 5 AM or earlier in the US).

Popular Science on format conversion

Popular Science has an article, “How to convert any file to any format.” The title overreaches, but the article actually isn’t too bad. It’s aimed at the ordinary user, not the file format specialist, so it wouldn’t be fair to complain too much that it has more breadth than depth.

It starts by recommending using the application that created the file, and that’s certainly good advice. Even when formats are open standards, an app knows more about how it creates its own files than anyone else does. Its files might have bits of application-specific information.

Aside

This XKCD cartoon showed up in my Twitter feed more times in one day than any previous one, for reasons that should be obvious.

XKCD on Digital Resource Lifespan

The PDF/A controversy

Is PDF/A a good archival format? Many institutions use it, but it has problems which are inherent in PDF. With PDF/A-3, it has lost some of its focus. A format which can be a container for any kind of content isn’t great for digital preservation.

An article by Marco Klindt of the Zuse Institute Berlin takes a strong position against its suitability, with the title “PDF/A considered harmful for digital preservation.” Carl Wilson at the Open Preservation Foundation has added his own thoughts with “PDF/A and Long Term Preservation.”


Apple’s HEIF and HEVC

At this year’s WWDC, Apple introduced a new format for still images and video. The container is called High Efficiency Image Format (HEIF), and it uses a codec called High Efficiency Video Coding (HEVC). HEIF files can store still images, video, or both at once. Apple doesn’t have proper documentation on its site, as far as I can see, but a slideshow on HEIF and one on HEVC provide a lot of information. Kelly Thompson provides a technical overview.


PDF 2.0

The ISO specification for PDF 2.0 is now out. It’s known as ISO 32000-2. As usual for ISO, it costs an insane 198 Swiss francs, which is roughly the same amount in dollars. In the past, Adobe has made PDF specifications available for free on its own site, but I can’t find the 2.0 specification on adobe.com. Adobe’s PDF reference page still covers only PDF 1.7.

ISO has to pay its bills somehow, but it’s not good if the standard is priced so high that only specialists can afford it. I don’t intend to spend $200 to be able to update JHOVE without pay. With some digging, I’ve found the spec in an incomplete, eyes-only format. All I can view is the table of contents. There are links to all sections, but they don’t work. I’m not sure whether they’re broken in my browser or disabled on purpose. In any case, it’s a big step backward as an open standard. I hope Adobe will eventually put the spec on its website.