PDF or HTML for public documents?

Should official online documents be PDF files? Many institutions say they obviously should, but the format has some clear disadvantages. An article on the UK’s Government Digital Service site argues that HTML, not PDF, is the right format for UK government documents. Its arguments, to the extent that they’re valid, apply to lots of other documents.

It makes a plausible case against PDF. The trouble is that the case against HTML is even stronger in some ways.

USD and USDZ format for 3D models

Pixar’s USD format allows representation of dynamic 3D scenes. It lets designers create large numbers of objects that fit together into a scene. People on a team can work independently of each other, each designing certain parts. The project is on GitHub.

USD’s design removes the need to work with one monolithic file (as Pixar did for Toy Story), but sometimes a monolithic file is useful. At WWDC 2018, Apple and Pixar announced a new wrapper for USD, called USDZ. It’s a Zip archive with some special rules. iOS 12 will support it.
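Since a USDZ file is a Zip archive, ordinary Zip tools can look inside it. Here is a minimal sketch using Python's standard `zipfile` module. It assumes that one of the special rules is that entries are stored without compression. That is my reading of the USDZ specification, not something the announcement spells out, and the function name `compressed_entries` is just an illustrative choice.

```python
import zipfile


def compressed_entries(usdz_path):
    """Return the names of archive entries that are NOT stored uncompressed.

    Assumption: a conforming USDZ package stores every entry without
    compression, so an empty list is what we would hope to see.
    """
    with zipfile.ZipFile(usdz_path) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.compress_type != zipfile.ZIP_STORED
        ]
```

This checks only one rule. It says nothing about alignment or about which file types are allowed inside the package.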

The “Zip Slip” vulnerability

Sometimes my reaction to a story is “Wait, are they saying someone was that dumb? … No one could be that dumb! … Oh, gods, they were that dumb!” Naked Security’s account of the Zip Slip vulnerability is just such a story.

The article starts with a fair warning that the vulnerability is “so simple you’ll need to put a cushion on your desk before you read any further (in case of involuntary headdesk injury).” It explains that because of the coding mistake called “Zip Slip,” “attackers can create Zip archives that use path traversal to overwrite important files on affected systems, either destroying them or replacing them with malicious alternatives.” This is where I started to suspect.

The vulnerability isn’t in the Zip format as such, but in bad coding found in some of the zillion ad hoc pieces of software written to unpack Zip files. Have you figured it out yet? I’ll put the cut here to give you a chance to think…
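Spoiler in code form: the quoted description points at entry names containing path traversal such as `../`. A careful unpacker can resolve each entry's destination and refuse anything that lands outside the target directory. The following is a minimal sketch of that check, not the code from any of the affected libraries, and `safe_extract` is a hypothetical name. Python's own `zipfile` module already sanitizes such names when extracting, so the explicit check here is for illustration.

```python
import os
import zipfile


def safe_extract(archive_path, dest_dir):
    """Extract archive_path into dest_dir, refusing entries that escape it."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        # Check every entry before writing anything to disk.
        for info in zf.infolist():
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            # If the resolved target is not under dest_root, the entry
            # name used "../" or an absolute path to climb out.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {info.filename!r}")
        zf.extractall(dest_root)
```

An archive with an entry named `../evil.txt` makes this raise `ValueError` before any file is written.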

Files that Last — 50% off!

It’s been too long since I’ve had a special discount on FTL. For all of June, you can get Files that Last: Digital Preservation for Everygeek on Smashwords for just $4.00. That’s half off the regular price! The coupon code is KC49Z.

FTL is aimed at anyone with a moderate level of technical knowledge who’s concerned with keeping files from becoming useless over the years. It covers formats, metadata, media, file systems, and more.

The book is 100% DRM-free on Smashwords. I’ve done my best to keep it that way when it’s sold through other platforms but can’t always guarantee it.

Flash in the Library of Congress’s online archives

Everybody recognizes that Adobe Flash is on the way out. It takes effort to convert existing websites, though, and some sites aren’t maintained, so it won’t disappear from the Web in the next few decades.

When the sites are minor or abandoned, it doesn’t matter so much, but even the Library of Congress has the issue. Its National Jukebox currently requires a browser with Flash enabled to be useful. Turning on Flash for reliable sites such as the Library of Congress should be safe, at least as long as those sites don’t include third-party ads from dubious sources. Not everyone has that option, though. If you’re using iOS, you’re stuck.

I came across the National Jukebox while doing research for my book project Yesterday’s Songs Transformed, and it’s frustrating that I can’t currently use it without taking steps which I’d rather avoid. The good news is that this is a temporary situation and work is already underway to eliminate the Flash dependency. David Sager of the National Jukebox Team replied to my email inquiry:

“Mad” file formats

There really are some file formats that have “mad” in their name or extension. Some people may find this blog when trying to learn about them, so I suppose I should provide some information about them.

The Joy Reid case and the fragility of archives

The exposure of old, embarrassing posts by MSNBC columnist Joy Ann Reid has provoked a lot of heated discussion. It’s also revealed the difficulty of retaining reliable information about old material on the Web.

When these old posts came to public attention through Twitter, she asserted that there had been one or more unauthorized break-ins altering her articles to add offensive content.

“In December I learned that an unknown, external party accessed and manipulated material from my now-defunct blog, The Reid Report, to include offensive and hateful references that are fabricated and run counter to my personal beliefs and ideology.

“I began working with a cyber-security expert who first identified the unauthorized activity, and we notified federal law enforcement officials of the breach. The manipulated material seems to be part of an effort to taint my character with false information by distorting a blog that ended a decade ago.

“Now that the site has been compromised I can state unequivocally that it does not represent the original entries.”

The “altered” material, however, also was found on the Internet Archive’s Wayback Machine with the same content. If Reid’s statement is true, the alterations must have taken place shortly after the posts’ publication and yet not been noticed, or else the Internet Archive must also have been compromised.


PDF in three dimensions

There are two ways to put 3D models into a PDF file. Neither of them is an extension of the two-dimensional PDF model. Rather, they’re technologies which were developed independently, which can be wrapped into a PDF, and which software such as Adobe Acrobat can work with.

PDF has become a container format as much as a representational format. It can hold anything, and some of the things it holds have more or less official status, but there are no common architectural principles. The two formats used with PDF are U3D and PRC. Both are actually independent file formats which a PDF can embed.
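A rough way to tell which of the two formats a given PDF embeds is to scan its raw bytes for the subtype name on the 3D stream. The sketch below assumes that such streams are labeled `/Subtype /U3D` or `/Subtype /PRC` in their dictionaries, which is my understanding of the PDF specification. It is a crude byte search, not a real PDF parser, and `find_3d_formats` is a hypothetical helper name.

```python
import re

# Assumption: 3D stream dictionaries name their format as /Subtype /U3D
# or /Subtype /PRC. Names written with #-escapes would be missed.
_3D_SUBTYPE = re.compile(rb"/Subtype\s*/(U3D|PRC)\b")


def find_3d_formats(pdf_path):
    """Return the set of 3D format names ('U3D', 'PRC') seen in the file."""
    with open(pdf_path, "rb") as f:
        data = f.read()
    return {m.group(1).decode("ascii") for m in _3D_SUBTYPE.finditer(data)}
```

An empty set means only that the search found nothing, not that the file is guaranteed to contain no 3D content.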

Apple hides attachments in malformed multipart mail

Recently I got a PDF of a filk songbook which I had contributed to. More precisely, the email said I was getting it, but there was no sign of an attachment. I wrote back to the editor who’d sent it, and she insisted it was there. Digging it out of the message revealed to me a whole new way of messing up email formats.

A quick look at the message source showed that there really was an attachment with Content-Type of “application/pdf” which took up well over 90% of the message. The question was why Thunderbird didn’t show it to me.
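That kind of look at the message source can also be done programmatically. Here is a minimal sketch using Python's standard `email` package to list every MIME part in a raw message along with its Content-Type and filename. The function name `list_parts` is my own, and the sketch assumes the message is well-formed enough for the parser to find the parts, which may not hold for the malformed mail in question.

```python
from email import message_from_bytes, policy


def list_parts(raw_message):
    """Return (content type, filename) for every part of a raw message.

    raw_message is the full message source as bytes. The filename is
    None for parts that do not declare one.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)
    return [
        (part.get_content_type(), part.get_filename())
        for part in msg.walk()
    ]
```

If a part with type `application/pdf` shows up in this list but not in the mail client, the problem lies in how the parts are nested or labeled, not in a missing attachment.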