Tag Archives: PDF

Technical issues with the Hunter Biden email

The PDF Association has published an analysis of the file that the New York Post uploaded to Scribd, which purports to show a message from Vadim Pozharskyi to Hunter Biden and Devon Archer. Discussions of what it signifies politically and whether Twitter was justified in blocking the link are for another place. The issue for this blog is what the file says about the authenticity of the email. The answer is: Nothing at all.


“Shadow attack” allows alteration of signed PDF files

The more complex a format is, the less chance there is that its security features will work in all cases. A German team has discovered a “shadow attack” vulnerability that lets sneaky people alter digitally signed PDF documents. The attack is easiest if the document’s creator designed it to be altered after signing. The victim sees one set of content and signs it; the dishonest creator gets the document back, changes its appearance, and passes it on.

PDF/A-4

It looks as if I’ll have a little input into the upcoming PDF/A-4 standardization process; earlier this month I got an email from the 3D PDF Consortium inviting me to participate, and I responded affirmatively. While waiting for whatever happens next, I should figure out what PDF/A-4 is all about.

ISO has a placeholder for it, where it’s also called “PDF/A-NEXT.” There’s some substantive information on PDFlib. What’s interesting right at the start is that it will build on PDF/A-2, not PDF/A-3. A lot of people in the library and archiving communities thought A-3 jumped the shark when it allowed attachments of any kind, without limitation. It’s impossible to establish a document’s archival suitability if it has opaque content.

Why does one PDF display and another one download?

Sometimes when you click on a link to a PDF, it comes up in the browser. Other times, the browser downloads the file. Everyone must wonder why, but few have wondered enough to find out. Here’s a quick explanation.

It has nothing to do with the PDF version, the content of the file, or the link. It’s the HTTP headers that make the difference. Specifically, a header called “Content-Disposition” is the determining factor. If it’s absent, the file will usually open in the browser, provided the browser can display PDFs. If it’s present, the value it specifies determines how you get the file.
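As a rough illustration of the header at work, here is a minimal Python sketch of how a server might pick between the two standard disposition values, “inline” and “attachment.” The function name and the file name are hypothetical, made up for this example; real servers and frameworks each have their own way of setting headers, and this sketch doesn’t escape unusual characters in the file name.

```python
def pdf_response_headers(filename: str, download: bool) -> dict[str, str]:
    """Build HTTP response headers for serving a PDF (illustrative helper).

    download=True asks the browser to save the file ("attachment");
    download=False asks it to display the file in place ("inline").
    """
    disposition = "attachment" if download else "inline"
    return {
        "Content-Type": "application/pdf",
        # Simplification: the filename is not escaped or encoded here.
        "Content-Disposition": f'{disposition}; filename="{filename}"',
    }


# Hypothetical usage: the same file, offered two different ways.
print(pdf_response_headers("report.pdf", download=True)["Content-Disposition"])
print(pdf_response_headers("report.pdf", download=False)["Content-Disposition"])
```

The file itself is identical in both cases; only the header value changes what the browser does with it.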

When sloppy redaction fails, resort to censorship

The Broward County School Board used an idiot’s form of “redaction” on a PDF before sending it to the media. The Sun Sentinel removed the blackout layer from the file and found newsworthy information on shooter Nikolas Cruz. They published it. Judge Elizabeth Scherer flew into a rage. She decreed, “From now on if I have to specifically write word for word exactly what you are and are not permitted to print… then I’ll do that.”

That’s called prior restraint, or censorship.
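To see why a blackout layer isn’t redaction, consider this simplified Python sketch. The content stream below is a made-up, uncompressed example, not taken from the actual document: it draws some text and then paints a filled black rectangle over it. Real PDFs usually compress their streams and encode text in more varied ways, so treat this as an illustration of the principle rather than a working extraction tool.

```python
import re

# Hypothetical, simplified page content stream: the text is drawn first,
# then a filled black rectangle is painted on top of it.
content_stream = b"""
BT /F1 12 Tf 72 700 Td (Sensitive detail) Tj ET
0 0 0 rg
70 695 120 16 re f
"""


def text_operands(stream: bytes) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator."""
    matches = re.findall(rb"\(([^()]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]


# The rectangle hides the text on screen, but the text is still in the file.
print(text_operands(content_stream))
```

The black box only changes what a viewer paints. Anyone who reads the underlying data, or simply deletes the rectangle, gets the text back.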

PDF or HTML for public documents?

Should official online documents be PDF files? Many institutions say they obviously should, but the format has some clear disadvantages. An article on the UK’s Government Digital Service site argues that HTML, not PDF, is the right format for UK government documents. Its arguments, to the extent that they’re valid, apply to lots of other documents.

It makes a plausible case against PDF. The trouble is that the case against HTML is even stronger in some ways.

PDF in three dimensions

There are two ways to put 3D models into a PDF file. Neither of them is an extension of the two-dimensional PDF model. Rather, they’re technologies which were developed independently, which can be wrapped into a PDF, and which software such as Adobe Acrobat can work with.

PDF has become a container format as much as a representational format. It can hold anything, and some of the things it holds have more or less official status, but there are no common architectural principles. The two 3D formats used with PDF are U3D and PRC. Both are independent file formats which a PDF can embed.

PDF/L?

Here’s a question for the gallery: Have any of you heard of PDF/L, and do you know what it is?

File corruption and political corruption

When people who don’t understand file formats manipulate files in order to cover their tracks, they generally fail miserably. Slate magazine gives an entertaining case in point from the Trump scandals. The article says:

There are two types of people in this world: those who know how to convert PDFs into Word documents and those who are indicted for money laundering. Former Trump campaign chairman Paul Manafort is the second kind of person.

The PDF Association chimes in with additional technical details.

JHOVE webinar

An Open Preservation Foundation webinar, “Putting JHOVE to the acid test: A PDF test-set for well-formedness validation in JHOVE,” will be held on November 21, 10 AM GMT (that’s 11 AM in Central Europe and a ludicrous 5 AM or earlier in the US).