Category Archives: commentary

When is an algorithm not an algorithm?

The only time the news media use the term “algorithm,” it seems, is for computational methods that aren’t.

Merriam-Webster defines it as “a procedure for solving a mathematical problem (as of finding the greatest common divisor) in a finite number of steps that frequently involves repetition of an operation.” Let’s forget about repetition; almost every computational procedure uses loops. The key word is “mathematical.”

An algorithm produces results that can be mathematically verified. An algorithm for calculating pi will produce the known value to the needed level of precision, or it’s wrong. A search algorithm is an algorithm when its results correspond to precise matching criteria.
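To make that standard concrete, here is a minimal Python sketch of Euclid's method for the greatest common divisor, the example in the dictionary definition. The function names are mine, chosen for illustration. The point is that the answer can be checked against the mathematical definition without trusting the procedure that produced it.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: replace (a, b) with (b, a mod b) until b is 0."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def is_gcd(a: int, b: int, g: int) -> bool:
    """Check g against the definition by brute force, without calling gcd()."""
    if g <= 0 or a % g or b % g:
        return False  # g must be a positive common divisor
    # No larger number may divide both a and b.
    return not any(a % d == 0 and b % d == 0
                   for d in range(g + 1, min(abs(a), abs(b)) + 1))


result = gcd(1071, 462)
print(result, is_gcd(1071, 462, result))  # 21 True
```

A wrong answer such as 7, which divides both numbers but isn't the greatest such divisor, fails the check.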

The decline and fall of Adobe Flash

It’s been a year since I last posted about Adobe Flash’s impending demise. Like everything else on the Internet, it won’t ever vanish completely, but its decline is accelerating.

Olympic file format capriciousness

This blog doesn’t generally deal with cronyist bullying operations like the International Olympic Committee (IOC). But when the IOC gets silly about the file formats it tells people they can’t use, that’s a subject worth mentioning here.

The IOC has decreed that “the use of Olympic Material transformed into graphic animated formats such as animated GIFs (i.e. GIFV), GFY, WebM, or short video formats such as Vines and others, is expressly prohibited.”

Newspeak, emoji style

In Orwell’s 1984, the Newspeak language followed the principle that if you can abolish certain words, you can abolish the thoughts that go with them.

It was intended that when Newspeak had been adopted once and for all and Oldspeak forgotten, a heretical thought — that is, a thought diverging from the principles of Ingsoc — should be literally unthinkable, at least so far as thought is dependent on words. … This was done partly by the invention of new words, but chiefly by eliminating undesirable words and by stripping such words as remained of unorthodox meanings, and so far as possible of all secondary meanings whatever.

Apple is doing something like this with Unicode code point U+1F52B (🔫), which the code chart defines as PISTOL, with the explanatory text “handgun, revolver.” Nothing in the chart suggests it’s supposed to represent a water gun or any other kind of toy. However, Apple has elected to represent this character as a water pistol in iOS 10.
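The formal name travels with the code point, not with any vendor's picture of it. A quick Python sketch, using only the standard `unicodedata` module (the variable name is mine), shows what software sees regardless of how the glyph is drawn:

```python
import unicodedata

PISTOL = "\U0001F52B"  # the character under discussion

print(f"U+{ord(PISTOL):04X}")        # U+1F52B
print(unicodedata.name(PISTOL))      # PISTOL
print(PISTOL.encode("utf-8").hex())  # f09f94ab
```

The bytes sent from one device to another are the same either way; only the rendering on each end differs.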

The persistence of old formats

Technologies develop to a point where they’re good enough for widespread use. Once a lot of people have adopted one, it’s hard to move on to something still better, since people have invested so much in a technology that works for them. We see this with cell phone communication, which is pretty good but would undoubtedly be much better if it could be invented all over again today. We see it with the DVD format, which Blu-ray hasn’t managed to push aside in spite of huge marketing efforts. And we see it in file formats.

Most of today’s highly popular formats have been around since the nineties. For images, we still have TIFF, JPEG, PNG, and even the primitive GIF format, which goes back to the eighties. In audio, MP3 still dominates, even though there are now much better alternatives.

This is a good thing in many ways. If new, improved formats displaced old ones every five years, we’d be constantly investing in new software, and anyone who didn’t upgrade would be unable to read a lot of new files. Digital preservation would be a big headache, as archivists would need to migrate files repeatedly to avoid obsolescence.

It does mean, though, that we’re working with formats whose deficiencies have often grown in importance. JPEG compression isn’t nearly as good as what modern techniques can manage. MP3 is encumbered with patents and offers sound quality that’s inferior to other lossy audio formats. HTML has improved through major revisions, but it’s still a mess to validate. For that matter, we have formats like “English,” which lacks any spec and is a pile of kludges that have accumulated over centuries. Try finding support for supposed improvements such as Esperanto anywhere.

It’s a situation we just have to live with. The good enough hangs on, and the better has a hard time getting acceptance. Considering how unstable the world of data would be if this weren’t the case, it’s a good thing on the whole.

Don’t hide those file extensions!

Lately I’ve ghostwritten several pieces on Internet security and how to protect yourself against malicious files. One point comes up over and over: Don’t hide file extensions! If you get a file called Evilware.pdf.exe, then Microsoft thinks you should see it as Evilware.pdf. The default setting on Windows conceals file extensions from you; you have to change a setting to view files by their actual names.

What’s this supposed to accomplish, besides making you think executable files are just documents? I keep seeing vague statements that this somehow “simplifies” things for users. If they see a file called “Document.pdf,” Microsoft’s marketing department thinks people will say, “What’s that .pdf at the end of the name? This is too bewildering and technical for me! I give up on this computer!”

They also seem to think that when people run a .exe file, not knowing it is one because the extension is hidden, and it turns out to be ransomware that encrypts all the files on the computer, that’s a reasonable price to pay for making file names look simpler. It’s always marketing departments that are to blame for this kind of stupidity; I’m sure the engineers know better.
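Here is a rough Python sketch of the trick and of a check a cautious user or mail filter might apply. The extension lists and function names are my own illustrative assumptions, not anything Windows actually uses, and a real filter would need a much longer list.

```python
from pathlib import PurePath

# Illustrative lists only; these are assumptions, not a complete inventory.
EXECUTABLE_SUFFIXES = {".exe", ".scr", ".com", ".bat", ".cmd", ".vbs"}
DOCUMENT_SUFFIXES = {".pdf", ".doc", ".docx", ".txt", ".jpg", ".png"}


def displayed_name(filename: str) -> str:
    """The name as it appears when the final extension is hidden."""
    return PurePath(filename).stem


def looks_disguised(filename: str) -> bool:
    """True if an executable extension follows a document-style one."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return (len(suffixes) >= 2
            and suffixes[-1] in EXECUTABLE_SUFFIXES
            and suffixes[-2] in DOCUMENT_SUFFIXES)


for name in ["Evilware.pdf.exe", "Document.pdf", "backup.tar.gz"]:
    print(name, "->", displayed_name(name), looks_disguised(name))
```

With the extension hidden, Evilware.pdf.exe shows up as Evilware.pdf, which is exactly the name an honest PDF would show if its own extension were visible.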

PDF/A and forms

The PDF Association reminds us that we can use PDF forms for electronic submissions. It’s a useful feature, and I’ve filled out PDF forms now and then. However, one point seems wrong to me:

PDF/A, the archival subset of PDF technology, provides a means of ensuring the quality and usability of conforming PDF pages (including PDF forms) without any external dependencies. PDF/A offers implementers the confidence of knowing that conforming documents and forms will be readable 10, 20 or 200 years from now.

The problem is that PDF/A doesn’t allow form actions. ISO 19005-1 says, “Interactive form fields shall not perform actions of any type.” You can have a form and you can print it, but without being able to perform the submit-form action, it isn’t useful for digital submissions.

You could have an archival version of the form and a way to convert it to an interactive version, but this seems clumsy. Please let me know if I’ve missed something.

Update: There’s some kind of irony in the fact that the same day that I posted this, I received a print-only PDF form which I’ll now have to take to Staples to fax to the originator.

XKCD on digital preservation

Today’s XKCD comic comments on digital preservation in Randall Munroe’s usual style.
[Image: XKCD cartoon on Digital Data]

Are uncompressed files better for preservation?

How big a concern is physical degradation of files, aka “bit rot,” to digital preservation? Should archives eschew data compression in order to minimize the effect of lost bits? In most of my experience, no one’s raised that as a major concern, but some contributors to the TI/A initiative consider it important enough to affect their recommendations.
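The worry is easy to demonstrate in miniature. The following Python sketch flips one bit at a time, first in a plain byte string and then in its zlib-compressed form, and counts how often the original content still comes back. It assumes zlib as the compressor and uses helper names of my own; other formats and codecs will behave differently, so treat it as an illustration rather than a measurement.

```python
import zlib


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the bit at position index inverted."""
    damaged = bytearray(data)
    damaged[index // 8] ^= 1 << (index % 8)
    return bytes(damaged)


def bytes_changed(original: bytes, damaged: bytes) -> int:
    """Count the byte positions where the two strings differ."""
    return sum(x != y for x, y in zip(original, damaged))


def recovered_after_flip(compressed: bytes, original: bytes, index: int) -> bool:
    """True if the damaged stream still decompresses to the original."""
    try:
        return zlib.decompress(flip_bit(compressed, index)) == original
    except zlib.error:
        return False  # the decoder rejected the damaged stream


original = ("Digital preservation depends on bits staying put. "
            "Archives weigh storage cost against silent damage.").encode("ascii")
compressed = zlib.compress(original)

# Uncompressed: every possible single-bit flip damages exactly one byte.
raw_damage = {bytes_changed(original, flip_bit(original, i))
              for i in range(len(original) * 8)}

# Compressed: count the flips after which the content is still recoverable.
total_flips = len(compressed) * 8
survivors = sum(recovered_after_flip(compressed, original, i)
                for i in range(total_flips))

print("bytes damaged per flip, uncompressed:", raw_damage)
print("compressed flips still recoverable:", survivors, "of", total_flips)
```

In the uncompressed case the damage stays local. In the compressed case a single flipped bit can leave a standard decoder unable to return the content at all, which is the trade-off those contributors are weighing against the storage savings.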

Tim Berners-Lee on “trackable” ebooks

Ebooks of the future, says Tim Berners-Lee, should be permanent, seamless, linked, and trackable. That’s three good ideas and one very bad one.

Speaking at BookExpo America, he offered these as the four attributes of the ebooks of the future. They’ll achieve permanence through encoding in HTML5, which is what EPUB basically is. Any ebook that’s available only in a proprietary format with DRM is doomed to extinction. Pinning hopes on Amazon’s eternal existence and support of its present formats is foolish. Seamlessness, the ability to transition through different platforms and content types, follows from using HTML5. This is reasonable and not very controversial.