“Mad” file formats

There really are some file formats that have “mad” in their name or extension. Some people may find this blog when trying to learn about them, so I suppose I should provide some information about them.

The Joy Reid case and the fragility of archives

The exposure of old, embarrassing posts by MSNBC columnist Joy Ann Reid has provoked a lot of heated discussion. It’s also revealed the difficulty of retaining reliable information about old material on the Web.

When these old posts came to public attention through Twitter, she asserted that there had been one or more unauthorized break-ins altering her articles to add offensive content. Her statement said:

In December I learned that an unknown, external party accessed and manipulated material from my now-defunct blog, The Reid Report, to include offensive and hateful references that are fabricated and run counter to my personal beliefs and ideology.

I began working with a cyber-security expert who first identified the unauthorized activity, and we notified federal law enforcement officials of the breach. The manipulated material seems to be part of an effort to taint my character with false information by distorting a blog that ended a decade ago.

Now that the site has been compromised I can state unequivocally that it does not represent the original entries.

The “altered” material, however, was also found on the Internet Archive’s Wayback Machine with the same content. If Reid’s statement is true, the alterations must have taken place shortly after the posts were published and yet gone unnoticed, or else the Internet Archive must also have been compromised.


PDF in three dimensions

There are two ways to put 3D models into a PDF file. Neither of them is an extension of the two-dimensional PDF model. Rather, they’re technologies which were developed independently, which can be wrapped into a PDF, and which software such as Adobe Acrobat can work with.

PDF has become a container format as much as a representational format. It can hold anything, and some of the things it holds have more or less official status, but there are no common architectural principles. The two 3D formats used with PDF are U3D and PRC. Both are independent file formats which a PDF can embed.

Apple hides attachments in malformed multipart mail

Recently I got a PDF of a filk songbook which I had contributed to. More precisely, the email said I was getting it, but there was no sign of an attachment. I wrote back to the editor who’d sent it, and she insisted it was there. Digging it out of the message revealed to me a whole new way of messing up email formats.

A quick look at the message source showed that there really was an attachment with Content-Type of “application/pdf” which took up well over 90% of the message. The question was why Thunderbird didn’t show it to me.
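That kind of quick look can also be done with a few lines of Python instead of reading the raw source. The following is only a sketch using the standard library's `email` package; the function names `list_parts` and `demo_message`, the sample subject, and the file name `songbook.pdf` are invented for illustration and are not taken from the actual message. It lists every leaf MIME part with its content type, file name, and decoded size, which is enough to see whether an “application/pdf” part is present and how much of the message it accounts for.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def list_parts(raw: bytes):
    """Return (content_type, filename, decoded_size) for each leaf MIME part."""
    msg = message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        # Multipart containers only hold other parts; skip them.
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(), part.get_filename(), len(payload)))
    return parts


def demo_message() -> bytes:
    """Build a small, well-formed sample message (hypothetical content)."""
    msg = EmailMessage()
    msg["Subject"] = "Songbook"
    msg.set_content("The songbook is attached.")
    msg.add_attachment(
        b"%PDF-1.4 placeholder bytes",
        maintype="application",
        subtype="pdf",
        filename="songbook.pdf",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    for content_type, filename, size in list_parts(demo_message()):
        print(content_type, filename, size)
```

The sample message here is well-formed. A parser follows whatever structure the message declares, so on a malformed message the listing may differ from what a mail client chooses to display.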

PDF/L?

Here’s a question for the gallery: Have any of you heard of PDF/L, and do you know what it is?

Google Docs: Not a File Format

What’s the format of a Google Docs file? The question may not even be meaningful. According to Jenny Mitcham at the University of York, there is no such thing as a Google Docs file. What you see when you open a document is an assembly of information from a database. You can export it in various file formats, but the exported file isn’t identical to the Google document.

This makes Google documents risky from a preservation standpoint. You can’t save a local backup that is identical to the document, only an export. If you lose your Google account, or if censorship in your country cuts you off from it, you lose all your documents.

Preserving and losing tax records

When you offer expert advice on something, such as digital preservation, you have to admit your own errors. I very nearly lost my 2016 tax return. When I tried to open it in TurboTax, the application just did nothing. I hadn’t exported it to a generally usable format. The TurboTax file format is proprietary and opaque.

File corruption and political corruption

When people who don’t understand file formats manipulate files in order to cover their tracks, they generally fail miserably. Slate magazine gives an entertaining case in point from the Trump scandals. The article says:

There are two types of people in this world: those who know how to convert PDFs into Word documents and those who are indicted for money laundering. Former Trump campaign chairman Paul Manafort is the second kind of person.

The PDF Association chimes in with additional technical details.

The future of TIFF

Is TIFF a legacy format?

The most recent version of the TIFF specification, 6.0, dates from 1992. Adobe updated it with three technical notes, the latest coming out in 2002. Since then there has been nothing.

The format is solid, but the past quarter-century has seen reasons to enhance it. BigTIFF is a variant of the format to accommodate larger files. It isn’t backward-compatible with TIFF, but the changes mostly concern data lengths and are easy to add to a TIFF interpreter. TIFF itself sits in a kind of limbo, since Adobe owns the spec but is no longer updating it. There have been new tags which have achieved consensus acceptance but don’t have official status. AWare Systems has a list of known tags but has no reliable way to say which ones are private and which are generally accepted. There’s no way to add a new compression or encryption algorithm, or any other new feature, and give it official status.
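To give a sense of how small the difference between TIFF and BigTIFF is at the start of the file, here is a minimal Python sketch that tells the two apart from the header alone. The function name `tiff_variant` is made up for this example. It relies on the documented header layouts: classic TIFF uses the magic number 42 followed by a 4-byte offset to the first image file directory, while BigTIFF uses 43, a 2-byte offset size (8), a 2-byte reserved field (0), and an 8-byte offset.

```python
import struct


def tiff_variant(header: bytes):
    """Return (variant, first_ifd_offset), or None if the header isn't TIFF.

    `header` should hold at least the first 16 bytes of the file.
    """
    if len(header) < 8:
        return None
    # The first two bytes give the byte order for everything that follows.
    if header[:2] == b"II":
        endian = "<"  # little-endian
    elif header[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        return None
    (magic,) = struct.unpack(endian + "H", header[2:4])
    if magic == 42:
        # Classic TIFF: 4-byte offset to the first IFD.
        (offset,) = struct.unpack(endian + "I", header[4:8])
        return ("TIFF", offset)
    if magic == 43 and len(header) >= 16:
        # BigTIFF: offset size, reserved field, then an 8-byte offset.
        offset_size, reserved = struct.unpack(endian + "HH", header[4:8])
        if offset_size != 8 or reserved != 0:
            return None
        (offset,) = struct.unpack(endian + "Q", header[8:16])
        return ("BigTIFF", offset)
    return None


if __name__ == "__main__":
    print(tiff_variant(b"II*\x00\x08\x00\x00\x00"))
```

A real interpreter would go on to read the directory entries, where BigTIFF likewise widens the counts and offsets to 8 bytes.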