Category Archives: commentary

The (information) machine stops

The “Digital Dark Age” discussion has started up again on Twitter, and again I find myself in the minority position. It really is possible to have Twitter discussions on complex topics and say something intelligent, but it isn’t easy. More than 140 characters at a time are needed, and it’s been a while since I last wrote about the subject at length, so let’s get back to it. The last post that I wrote on this was “Dataliths vs. the Digital Dark Age”, and I hope you’ll read that before continuing here, since I don’t want to just repeat myself.

Maybe the question needs to be turned around. Let’s ask not what could trigger a Digital Dark Age, but what conditions are necessary and sufficient for the really long-term preservation of information. What will minimize the risk of widespread loss of today’s history, literature, and daily news?

Photoshop’s PSD file format

Photoshop’s native format, PSD, doesn’t get a lot of discussion. It’s Photoshop’s default format, and people use it for projects if only for that reason, so we really should know something about it. A lively place to start is “Fun Photoshop File Format Facts” on the Postlight blog. For serious investigation, look at Adobe’s specification. There’s also a short article on archiveteam.org, with some information about the format’s history.

JHOVE and PNG

A few days ago, I started writing a PNG module for JHOVE, partly to keep my Java skills up, partly to help me understand the PNG format. After a while I noticed that code for a PNG module already exists and has for a long time. I must have added it to SourceForge. According to a note in the code, Gian Uberto Lauri at Engineering Ingegneria Informatica S.p.A. created it in 2006. A good amount of work clearly went into it, but it won’t compile. It’s located in a non-source code directory (extramodules/it/eng/jhove/module/png/PngModule.java), so I had to copy it to src/java to try it out.

3D PDF and PDF/E

It must be a surprise to most people, but you can represent three-dimensional objects in PDF, in spite of its strictly 2-dimensional imaging model. It turns out there are two ways to do it, with the older U3D and the more modern PRC. What makes them possible is PDF’s annotation feature, which allows capabilities to be added to PDF, and the Acrobat 3D API. Full support of these features requires implementation of at least PDF 1.7 Extension Level 1, or to put it in application terms, Acrobat 8.1.

The PDF/E standard for engineering documents, aka ISO 24517, includes U3D but not PRC. A PDF/E-2 standard is currently in development and is expected to include PRC. PDF/E, like the other slashes of PDF, is a subset of the PDF standard (version 1.6), so obviously it’s possible to do 3D work without reference to it. It’s intended for cases where long-term retention or archiving is important. This suggests some affinity with PDF/A, which is specifically aimed at archive-quality documents, and the PDF Association, which is heavily involved in PDF/A, has recently started a PDF/E Competence Center. Oddly, the competence center says that PDF/E-1 “does not address 3D,” though other sources say PDF/E does reference U3D. Perhaps this is a matter of what really constitutes “addressing” 3D as opposed to just acknowledging it.

Billion-year storage?

What would you say about data storage with a lifetime of billions of years? I’d say that extraordinary claims require extraordinary support. The University of Southampton’s Optoelectronics Research Centre says it’s developed digital storage that will last for 13.8 billion years at 190°C, or at least that’s how it came out in the report. Peter Kazansky says “we have created the first document which will likely survive the human race.” (And the death of the Sun?)

Religious authoritarianism vs. emoji

This post may be illegal in Indonesia. It includes the code point sequence U+1F468 U+200D U+2764 U+FE0F U+200D U+1F48B U+200D U+1F468, which renders as the emoji 👨‍❤️‍💋‍👨 or “man kissing man.” According to a Time article, the Indonesian Ministry of Communication and Informatics is “asking” Facebook to block the use of “gay” emoji. Failure to comply could mean the Negative Content Management Panel (George Orwell would have been impressed!) will block Facebook in Indonesia.
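To make the sequence above concrete, here is a minimal Python sketch that assembles the same eight code points into one string and lists what each one is. The function names are my own, chosen for illustration; the character names come from Python's standard `unicodedata` module. Whether the result displays as a single glyph depends on the font and renderer.

```python
import unicodedata

# Code points copied from the sequence quoted in the post.
CODE_POINTS = [0x1F468, 0x200D, 0x2764, 0xFE0F, 0x200D, 0x1F48B, 0x200D, 0x1F468]


def build_sequence(code_points):
    """Join a list of integer code points into a single string."""
    return "".join(chr(cp) for cp in code_points)


def describe(text):
    """Return (U+XXXX, Unicode name) pairs for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


emoji = build_sequence(CODE_POINTS)
for code, name in describe(emoji):
    print(code, name)
```

The three U+200D entries are zero-width joiners, which ask the renderer to fuse the neighboring emoji into one image, and U+FE0F requests emoji-style presentation of the heart. A filter targeting this emoji would have to match the whole sequence, since each visible component is an ordinary emoji in its own right.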

Emoji have generated several controversies already, but this is the first I’ve heard of a government censoring code points. It’s couched in terms of “sensitivity,” “respect,” and protecting children.

The PDF search problem

An article from the PDF Association points out the pitfalls in searching PDF documents. Even if a document has actual text in it, rather than being a scanned image, it might not hold the text in the natural character ordering. PDF is a format for rendering a document’s visible appearance, and it isn’t so good at holding semantic content. Chunks of text can be stored out of sequence as long as they render in the right place.
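A toy illustration of the problem, not a real PDF parser: the chunks, coordinates, and function names below are hypothetical, and the sort is a naive heuristic that assumes a single column of text. It shows how a phrase can be missing from the stored order yet present once the chunks are arranged by position.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    x: float   # horizontal position on the page
    y: float   # vertical position, measured upward from the bottom of the page
    text: str


# Hypothetical chunks, listed in the order they are stored in the file.
STORED = [
    Chunk(200, 700, "brown fox"),
    Chunk(72, 680, "over the lazy dog"),
    Chunk(72, 700, "The quick"),
    Chunk(270, 700, "jumps"),
]


def stored_order_text(chunks):
    """Concatenate chunks exactly as they appear in the file."""
    return " ".join(c.text for c in chunks)


def reading_order_text(chunks):
    """Concatenate chunks top to bottom, then left to right."""
    ordered = sorted(chunks, key=lambda c: (-c.y, c.x))
    return " ".join(c.text for c in ordered)


print("quick brown" in stored_order_text(STORED))   # False
print("quick brown" in reading_order_text(STORED))  # True
```

Real pages with multiple columns, rotated text, or tables defeat a simple sort like this one, which is part of why searching PDFs is hard.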

“High-res audio”

We hear a lot about “high-res audio” these days. Sound digitized at 192,000 samples per second must be a lot better than the usual 44,100, right? Well, maybe not.

We can hear sounds only in a certain frequency range. The popular rule of thumb is 20 to 20,000 Hertz, though there’s a lot of variation among people. Not a lot of people can hear anything higher than 20,000.
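The arithmetic behind that doubt is short. By the sampling theorem, a given sample rate can represent frequencies up to half that rate. The sketch below compares that ceiling with the 20,000 Hz rule of thumb for the CD rate of 44,100 samples per second and for 192,000; the function and constant names are mine.

```python
HEARING_LIMIT_HZ = 20_000  # rule-of-thumb upper bound of human hearing


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a sample rate can represent: half the rate."""
    return sample_rate_hz / 2


for rate in (44_100, 192_000):
    limit = nyquist_limit_hz(rate)
    headroom = limit - HEARING_LIMIT_HZ
    print(f"{rate} samples/s: up to {limit:.0f} Hz, "
          f"{headroom:.0f} Hz beyond the hearing limit")
```

The CD rate already reaches 22,050 Hz, a little past what most people can hear. The extra range that 192,000 samples per second buys lies almost entirely above it.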

Which way to personal digital preservation?

Today I came across a video from the Library of Congress on “Why digital preservation is important for you.” Anyone following its advice will certainly have a better chance of keeping their files alive and organized for a long time. The only question is: Who’s going to follow that advice?

File fuzzing

Recently I came across the term “fuzzing” for intentionally damaging files to test the software that reads them. Most of the material I’ve found doesn’t provide a useful introduction; it assumes that if you know the term, you already understand something about it. One good article is “Fuzzing — Mutation vs. Generation” on the Infosec website. According to that article, fuzzing denotes the response to file changes rather than the changes themselves, but I’m seeing the term used mostly in the latter sense.
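Here is a minimal sketch of the mutation style of fuzzing: take a valid sample, flip a few random bits, and see how the reader copes. Everything in it is an assumption for illustration. `parse_header` is a made-up toy reader for an invented format with a `DEMO` magic number, and `mutate` and `fuzz` are my own names, not part of any fuzzing tool.

```python
import random


def mutate(data: bytes, n_flips: int = 4, seed=None) -> bytes:
    """Return a copy of data with up to n_flips randomly chosen bits flipped."""
    if not data:
        return data
    rng = random.Random(seed)
    damaged = bytearray(data)
    for _ in range(n_flips):
        pos = rng.randrange(len(damaged))
        damaged[pos] ^= 1 << rng.randrange(8)  # flip one bit in that byte
    return bytes(damaged)


def parse_header(data: bytes) -> str:
    """Hypothetical toy reader: a 4-byte magic number followed by ASCII text."""
    if data[:4] != b"DEMO":
        raise ValueError("bad magic number")
    return data[4:].decode("ascii")  # raises a ValueError subclass on non-ASCII


def fuzz(parser, sample: bytes, rounds: int = 100) -> dict:
    """Feed mutated copies of sample to parser and tally the outcomes."""
    outcomes = {"accepted": 0, "rejected": 0}
    for seed in range(rounds):
        try:
            parser(mutate(sample, seed=seed))
            outcomes["accepted"] += 1
        except ValueError:
            outcomes["rejected"] += 1  # a clean, expected rejection
    return outcomes


print(fuzz(parse_header, b"DEMOhello, world"))
```

Only `ValueError` is caught on purpose. Any other exception escaping the parser would stop the run, and that is the kind of unhandled failure fuzzing is meant to expose. Seeding each round makes a failing input reproducible.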