Category Archives: commentary

PDF/A-4

It looks as if I’ll have a little input into the upcoming PDF/A-4 standardization process; earlier this month I got an email from the 3D PDF Consortium inviting me to participate, and I responded affirmatively. While waiting for whatever happens next, I should figure out what PDF/A-4 is all about.

ISO has a placeholder for it, where it’s also called “PDF/A-NEXT.” There’s some substantive information on PDFlib. What’s interesting right at the start is that it will build on PDF/A-2, not PDF/A-3. A lot of people in the library and archiving communities thought A-3 jumped the shark when it allowed attachments of any kind without limitation. It’s impossible to establish a document’s archival suitability if it has opaque content.

Path traversal bugs in archive formats

Malware has shown up which takes advantage of a path traversal bug in the WinRAR archiving utility. The bug, which reportedly existed for 19 years, is fixed in the latest version. The problem stems from an old, buggy DLL which WinRAR used. It allowed the expansion of an archive with a file that would be extracted to an absolute path rather than the destination folder. In this case, the path was the system startup folder. The next time the computer was rebooted, it would run the malware file.
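
The check that defends against this kind of bug can be sketched in a few lines. This is only an illustration in Python, not WinRAR's actual fix; `is_safe_member` is a hypothetical helper name, and it assumes the extractor can see each entry's stored path before writing anything.

```python
import os


def is_safe_member(dest_dir: str, member_name: str) -> bool:
    """Return True if extracting member_name would stay inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    # os.path.join discards dest_root when member_name is absolute,
    # which is exactly the case an extractor has to catch.
    target = os.path.realpath(os.path.join(dest_root, member_name))
    try:
        return os.path.commonpath([dest_root, target]) == dest_root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

An extractor would call this for every entry and refuse (or rename) any entry for which it returns False, whether the stored path is absolute or climbs out with `..` components.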

What part of “No Flash” doesn’t Microsoft understand?

If you disable Flash on Microsoft Edge, Microsoft ignores your setting — but only for Facebook’s domains. It sounds too conspiratorial to be true, but a number of generally reliable websites confirm it.

Bleeping Computer: “Microsoft’s Edge web browser comes with a hidden whitelist file designed to allow Facebook to circumvent the built-in click-to-play security policy to autorun Flash content without having to ask for user consent.”

ZDNet: “Microsoft’s Edge browser contains a secret whitelist that lets Facebook run Adobe Flash code behind users’ backs. The whitelist allows Facebook Flash content to bypass Edge security features such as the click-to-play policy that normally prevents websites from running Flash code without user approval beforehand.”

The police body camera data problem

The Washington Post reports that some police departments are dropping body camera programs because of the expense. I’ll admit that my first gut reaction on seeing the story was that it’s just an excuse. In some cases it probably is. But it’s a fact that while the cameras are cheap, storing and managing large amounts of video data isn’t. The question needs objective examination.

Canvas fingerprinting in Web pages

The array of sneaky tricks to get past Internet users’ veil of privacy is astonishing. At least it would be, if we weren’t all past the capacity for astonishment. One which has been around for years is Canvas fingerprinting. It lets servers narrow your profile down to a small number of clients. Combined with other measures, it can uniquely identify you.

How Canvas works

Canvas wasn’t designed to spy on you. It’s a way to draw graphics very efficiently in a browser. It supports animation and interaction. In order to get fast performance, it allows hardware acceleration and doesn’t mandate the exact set of pixels to be drawn. A script on the page can then read those pixels back using getImageData() or toDataURL() in the Canvas API and send them to the server.
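
To see why slightly different pixels are enough to tell clients apart, here is a rough sketch in Python. The pixel bytes are simulated, since in a browser they would come from getImageData() or toDataURL(). Condensing them with SHA-256 is an assumption made for illustration; `fingerprint` is a hypothetical helper, not part of any API.

```python
import hashlib


def fingerprint(pixels: bytes) -> str:
    """Condense raw pixel bytes into a short, comparable identifier."""
    return hashlib.sha256(pixels).hexdigest()[:16]


# Simulated RGBA output of the same drawing commands on two machines.
# The only difference is the alpha value of one anti-aliased edge pixel.
machine_a = bytes([255, 0, 0, 255, 255, 0, 0, 128, 0, 0, 0, 0])
machine_b = bytes([255, 0, 0, 255, 255, 0, 0, 131, 0, 0, 0, 0])

print(fingerprint(machine_a) == fingerprint(machine_b))  # False: one byte differs
print(fingerprint(machine_a) == fingerprint(machine_a))  # True: stable per machine
```

A one-byte rendering difference gives a different identifier, while the same machine keeps producing the same one, which is what makes the value useful for tracking.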

FUIF: Yet another image format?

A tweet led me to a pair of articles about a new file format called FUIF. That stands for “Free Universal Image Format.” Jon Sneyers describes it in a series of articles which so far include a Part 1 and Part 2.

It’s “responsive by design”; a single image file can be truncated at various offsets to produce different resolutions. Sneyers says FUIF meets JPEG’s criteria for a new format that provides “efficient coding of images with text and graphics” and “very low file size image coding.”

The great GIF pronunciation debate

Of all the issues in file formats, the pronunciation of “GIF” is surely close to the bottom in importance. When an issue is that minor, you can be sure everyone has strong opinions on it and will defend them on the barricades. It’s like the way political movements work: the closer together they are in their beliefs, the more ferociously they’ll vilify each other over little differences.

Personally, I always pronounce it in my mind with a hard “G,” as in “give” rather than “giraffe.” I’m glad to see some support for this view in “A Linguist’s Guide to Pronouncing ‘GIF’.” One of its arguments matches the main reason in my mind: the “G” stands for “graphics,” which is pronounced with a hard “G.”

Case closed. Now can we agree that “PNG” is pronounced “Pee-Enn-Gee,” and not “Ping”?

Why does one PDF display and another one download?

Sometimes when you click on a link to a PDF, it comes up in the browser. Other times, the browser downloads the file. Everyone must wonder why, but few have wondered enough to find out. Here’s a quick explanation.

It has nothing to do with the PDF version, the content of the file, or the link. It’s the HTTP headers that make the difference. Specifically, a header called “Content-Disposition” is the determining factor. If it’s absent, the file will open in the browser. If it’s present, the value it specifies determines how you get the file.
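
As a small illustration, here is a sketch of a server that sends the same PDF bytes with two different Content-Disposition values, using Python's standard http.server module. The paths `/view` and `/download`, the filename `report.pdf`, and the placeholder `PDF_BYTES` are all made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder bytes; a real server would read an actual PDF file.
PDF_BYTES = b"%PDF-1.4\n%%EOF\n"


def pdf_headers(download: bool, filename: str = "report.pdf") -> dict:
    """Build the response headers; only Content-Disposition changes."""
    disposition = "attachment" if download else "inline"
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'{disposition}; filename="{filename}"',
        "Content-Length": str(len(PDF_BYTES)),
    }


class PdfHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /download asks the browser to save the file;
        # any other path asks the browser to display it.
        headers = pdf_headers(download=self.path.startswith("/download"))
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(PDF_BYTES)


def serve(port: int = 8000) -> None:
    """Run the demo server until interrupted."""
    HTTPServer(("127.0.0.1", port), PdfHandler).serve_forever()
```

After calling `serve()`, requesting `/view` and `/download` returns identical content with different headers. What the browser finally does still depends on the browser and its settings.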

The digital preservation song challenge!

Should there be songs about digital preservation? This is just a special case of the question, “Should there be songs about X?” For nearly all X, the answer is “Yes, and there probably are!” (Even — perhaps especially — if there shouldn’t be, there are.)

Someone in the Australasian preservation community asked if AusPreserves needed a theme song. The first responses were existing popular songs, but then people started getting more creative. This led to the Digital Preservation Song Challenge!

One response was the Beyoncé parody, “All the Corrupt Files” (“Put a checksum on it”). I think it’s the first song ever to mention JHOVE!

Naturally, I already have my own song on digital preservation, called Files that Last. I wrote it to promote my book of the same title, but it stands (or falls) by itself.

If it’s worth doing, it’s worth singing about, and that certainly applies to digital preservation!

Fact-checking the GIF format

The PolitiFact article on the White House’s video “evidence” against reporter Jim Acosta looked plausible enough to me, until I got to the explanation of GIF files. It got significant points wrong, following common misunderstandings.

The regular readers of this blog mostly know what GIF really is, but this article may be a useful reference if you need to explain it to anyone. The PolitiFact article says: