Tag Archives: standards

The strange state of “open” format documentation

You can legally download many specs from the ISO site, including the Open Document Format (ODF) specs. ISO lets you print out a copy. However, if you photocopy or scan it, or if you make it available on your organization’s LAN, the Copyright Police will haul you away.

I’ve seen similar restrictions elsewhere. They’re variations on the idea that you can download a document for free, but you can’t share it after you download it. It’s bizarre.

Maybe they’re trying to keep people from going into competition by selling copies of their standards. Since ISO also sells what it publishes, the goal would make sense. In fact, there’s a specific and emphatic prohibition on sales. But why they should care whether copies are printed or photocopied is beyond me.

Usually the answer to questions like these is “lawyers who are disconnected from reality.” If there’s a better answer, I’d love to hear it.

Newspeak, emoji style

In Orwell’s 1984, the Newspeak language followed the principle that if you can abolish certain words, you can abolish the thoughts that go with them.

It was intended that when Newspeak had been adopted once and for all and Oldspeak forgotten, a heretical thought — that is, a thought diverging from the principles of Ingsoc — should be literally unthinkable, at least so far as thought is dependent on words. … This was done partly by the invention of new words, but chiefly by eliminating undesirable words and by stripping such words as remained of unorthodox meanings, and so far as possible of all secondary meanings whatever.

Apple is doing something like this with Unicode codepoint U+1F52B (🔫), which the code chart defines as PISTOL, with the explanatory text of “handgun, revolver.” There’s nothing that suggests it’s supposed to represent a water gun or any other kind of toy. However, Apple has elected to represent this character as a water pistol in iOS 10.
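To see what the Unicode character database itself says about this codepoint, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for illustration. It reports only the formal character name; how the glyph is drawn is up to each font vendor, which is exactly the latitude Apple is using.

```python
import unicodedata

# U+1F52B, the codepoint discussed above
PISTOL = "\U0001F52B"

def describe(ch: str) -> str:
    """Return the codepoint and formal Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe(PISTOL))  # prints "U+1F52B PISTOL"
```

Unicode character names are stable once assigned, so this lookup gives the same answer no matter how a given platform chooses to render the emoji.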

Work on TI/A quietly continues

The work on the TI/A project, to define an archive-friendly version of TIFF analogous to PDF/A, is still going, even though hardly any of it is publicly visible. Marisa Pfister’s leaving the project, along with her position at the University of Basel, was unfortunate, but others are continuing a detailed analysis of TIFF files used at various archives. This will help them to learn what features and tags are used.

The target of March 1, 2016, for a submission to ISO has been crossed out, and nothing has replaced it, but we can still hope it will happen.

The persistence of old formats

Technologies develop to a point where they’re good enough for widespread use. Once a lot of people have adopted them, it’s hard to move on from there to a still better one, since people have invested so much in a technology that works for them. We see this with cell phone communication, which is pretty good but would undoubtedly be much better if it could be invented all over again today. We see it with the DVD format, which Blu-ray hasn’t managed to push aside in spite of huge marketing efforts. And we see it in file formats.

Most of today’s highly popular formats have been around since the nineties. For images, we still have TIFF, JPEG, PNG, and even the primitive GIF format, which goes back to the eighties. In audio, MP3 still dominates, even though there are now much better alternatives.

This is a good thing in many ways. If new, improved formats displaced old ones every five years, we’d be constantly investing in new software, and anyone who didn’t upgrade would be unable to read a lot of new files. Digital preservation would be a big headache, as archivists would need to migrate files repeatedly to avoid obsolescence.

It does mean, though, that we’re working with formats whose deficiencies have often grown in importance. JPEG compression isn’t nearly as good as what modern techniques can manage. MP3 is encumbered with patents and offers sound quality that’s inferior to other lossy audio formats. HTML has improved through major revisions, but it’s still a mess to validate. For that matter, we have formats like “English,” which lacks any spec and is a pile of kludges that have accumulated over centuries. Try finding support for supposed improvements such as Esperanto anywhere.

It’s a situation we just have to live with. The good enough hangs on, and the better has a hard time getting acceptance. Considering how unstable the world of data would be if this weren’t the case, it’s a good thing on the whole.

The state of PDF 2.0

The next big jump in PDF may finally happen this year. The PDF Association tells us that the spec for PDF 2.0 is “feature-complete” and will be available to the ISO PDF committee and members of the PDF Association in July. When this will turn into a public release still isn’t clear. A year ago the target was “mid-2016”; that seems unlikely now.

The specification will be ISO 32000-2. The current version of PDF, 1.7, is ISO 32000-1. More precisely, Adobe has published several extension levels to PDF 1.7. They’re a way of getting around having a version 1.8, which would be an admission that the ISO standard is outdated. Version 2.0 will get Adobe and ISO back in sync. Hopefully Adobe will publish the PDF spec for free, as it has in the past, so that it won’t be available just to people who pay for the ISO version. Currently an electronic copy of ISO 32000-1 costs 198 Swiss francs, or a bit more than $200.

PDF/A and forms

The PDF Association reminds us that we can use PDF forms for electronic submissions. It’s a useful feature, and I’ve filled out PDF forms now and then. However, one point seems wrong to me:

PDF/A, the archival subset of PDF technology, provides a means of ensuring the quality and usability of conforming PDF pages (including PDF forms) without any external dependencies. PDF/A offers implementers the confidence of knowing that conforming documents and forms will be readable 10, 20 or 200 years from now.

The problem is that PDF/A doesn’t allow form actions. ISO 19005-1 says, “Interactive form fields shall not perform actions of any type.” You can have a form and you can print it, but without being able to perform the submit-form action, it isn’t useful for digital submissions.

You could have an archival version of the form and a way to convert it to an interactive version, but this seems clumsy. Please let me know if I’ve missed something.

Update: There’s some kind of irony in the fact that the same day that I posted this, I received a print-only PDF form which I’ll now have to take to Staples to fax to the originator.

Tim Berners-Lee on “trackable” ebooks

Ebooks of the future, says Tim Berners-Lee, should be permanent, seamless, linked, and trackable. That’s three good ideas and one very bad one.

Speaking at BookExpo America, he offered these as the four attributes of the ebooks of the future. They’ll achieve permanence through encoding in HTML5, which is what EPUB basically is. Any ebook that’s available only in a proprietary format with DRM is doomed to extinction. Pinning hopes on Amazon’s eternal existence and support of its present formats is foolish. Seamlessness, the ability to transition through different platforms and content types, follows from using HTML5. This is reasonable and not very controversial.

More what you’d call guidelines than actual rules

Do pirate sites have rules? Apparently so, according to Beta News. It tells us that sites like Pirate Bay have “fairly strict rules dictating capturing, formatting and naming releases” and “astoundingly lengthy standards documents covering standard and high definition releases of TV shows.” These rules “mandate” a switch from MP4 to the open Matroska (MKV) format as of April 10, so they’re stricter than the Pirates of the Caribbean.

I have no love for pirate sites. They play up their reputation for making stuff from big, evil, litigious companies available, but they’ll grab anything they can get their hands on, including music by small, independent artists who are having a hard enough time making a living. A couple of sites have even grabbed my filk recordings, which have no market beyond a couple of hundred people. But I’m amused that pirates have their own strict rules, and a move anywhere toward open formats can’t be a bad thing.

HTML5 and DRM

If anything causes more controversy than DRM (digital rights management), it’s joining DRM with an open standard. The World Wide Web Consortium’s Encrypted Media Extensions Working Draft is generating controversy in plenty.

Cory Doctorow has declared: “The World Wide Web Consortium’s decision to make DRM part of HTML5 doesn’t just endanger security researchers, it also endangers the next version of all the video products and services we rely on today: from cable TV to iTunes to Netflix.”

3D PDF and PDF/E

It must be a surprise to most people, but you can represent three-dimensional objects in PDF, in spite of its strictly two-dimensional imaging model. It turns out there are two ways to do it: the older U3D format and the more modern PRC. What makes them possible is PDF’s annotation feature, which allows capabilities to be added to PDF, and the Acrobat 3D API. Full support of these features requires implementation of at least PDF 1.7 Extension Level 1, or to put it in application terms, Acrobat 8.1.

The PDF/E standard for engineering documents, aka ISO 24517, includes U3D but not PRC. A PDF/E-2 standard is currently in development and is expected to include PRC. PDF/E, like the other slashes of PDF, is a subset of the PDF standard (version 1.6), so obviously it’s possible to do 3D work without reference to it. It’s intended for cases where long-term retention or archiving is important. This suggests some affinity with PDF/A, which is specifically aimed at archive-quality documents, and the PDF Association, which is heavily involved in PDF/A, has recently started a PDF/E Competence Center. Oddly, the competence center says that PDF/E-1 “does not address 3D,” though other sources say PDF/E does reference U3D. Perhaps this is a matter of what really constitutes “addressing” 3D as opposed to just acknowledging it.