Category Archives: commentary

CSS3: Threat or menace?

Lately I’ve been looking at CSS3 animations as a possible solution to a problem I’ve been dealing with. But after thinking about it, I’m getting more concerned: CSS animations? CSS is supposed to be about the layout of a page, not the creation of special effects. I’ve seen pages describing supposedly wonderful effects that can be created with CSS3. Fine, but what if you don’t want them?

JavaScript and Flash produce many annoying effects, introduced by designers who effectively are yelling “Hey, look how clever we are!” at you while you’re trying to concentrate on reading. You can turn off JavaScript and Flash and still get readable content, at least with many sites. But turn off CSS and most modern web pages will turn into a messy jumble. CSS3 looks like a narcissistic web designer’s dream: a way to bombard you with special effects that you just can’t escape from.

If you aren’t worried yet, consider this post on how to do Flash-like ads using only CSS3.

Addendum: The CSS3 working draft was recently updated.

Reinventing the stone tablet

It’s a basic premise of the digital preservation community that preservation will require ongoing effort over the years. Let an archive lie neglected for twenty or thirty years, and you might as well throw it away. No one will know how to plug in that piece of hardware. If they do, it’ll have stopped working. If it still works, its files will be in some long-forgotten format.

The trouble is, this is an untenable requirement over the long run. Institutions disappear. Wars happen. Governments are replaced. Budgets get cut. Projects get dropped. Organizational interests change. The contents of an archive may be deemed heretical or politically inconvenient. The expectation that over a period of centuries, institutions will actively preserve any given archive is a shaky one.

Information from past centuries has survived not by active maintenance, but by luck and durability. Much of the oldest information we have was carved into stone walls and tablets. It lay forgotten for centuries, till someone went digging for it. There were issues with the data format, to be sure; people worked for decades to figure out hieroglyphics and cuneiform, and no one’s cracked Linear A yet. But at least we have the data.

Preservation of digital data over a comparable time span requires storage with similar longevity. This is a very difficult problem. If it’s hard to figure out writing from three thousand years ago, how will people three thousand years from now make any sense of a 21st century storage device? But we have advantages. Global communication means that information doesn’t stay hidden in one corner of the world, where it can be wiped out. Today’s major languages aren’t likely to be totally forgotten. As long as enough information is passed down through each generation to allow deciphering of our stone tablets, people in future centuries will be able to extract their information.

What we don’t have is the tablets. Our best digital media are intended to last for decades, not centuries. Archivists should be looking into technologies that can really last, that will be standardized so that the knowledge of how to read them stands a good chance of surviving.

PDF and accessibility

PDF is both better and worse than its reputation for accessibility. That is, it’s worse than most people realize when it’s used with text-to-speech readers, but potentially much better than many visually impaired people suppose from their own experience. The reason for this paradox is that PDF was designed to present appearance rather than content, but modern versions have features which largely make up for this.

The worst case, of course, is the scanned document. Not only does this mean you’re stuck with OCR for machine reading, but it isn’t searchable. It’s a cheap solution when working from hardcopy originals, but should be avoided if possible.

Normal PDF has a number of problems. There’s no necessary relationship between the order of elements in a file and the expected reading order. If an article is in multiple columns, the text ordering in the document might go back and forth between columns. If an article is “continued on page 46,” it can be hard to find the continuation.

Character encoding is based on the font, so there’s no general way to tell what character a numeric value represents. The same character may have different encodings within the same document. This means that reader software doesn’t know what to do with non-ASCII characters (and even ASCII isn’t guaranteed).
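To make the encoding problem concrete, here is a toy sketch in Python. The two code tables are hypothetical and invented for illustration; real PDF fonts define their own encodings, and nothing here reads an actual PDF. It only shows why the same numeric codes can come out as different text depending on the font in effect.

```python
# Hypothetical per-font code tables (assumptions for illustration only).
FONT_A = {0x41: "A", 0x8E: "é"}
FONT_B = {0x41: "A", 0xE9: "é", 0x8E: "Ž"}


def decode(codes, font_table):
    """Map each numeric code to a character using one font's table.

    Codes the table does not know become U+FFFD, the Unicode
    replacement character.
    """
    return "".join(font_table.get(code, "\ufffd") for code in codes)


codes = [0x41, 0x8E]
text_a = decode(codes, FONT_A)  # the codes interpreted through font A
text_b = decode(codes, FONT_B)  # the same codes interpreted through font B
```

A screen reader that only sees the numeric codes, without knowing which table applies, has no reliable way to choose between the two results.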

Adobe provided a fix to this problem with a new feature in PDF 1.4, known as Tagged PDF. All except seriously outdated PDF software supports at least 1.4. This doesn’t mean using it is easy, though. Some software, such as Adobe’s InDesign, supports creation of Tagged PDF files, but you have to remember to turn on the feature, and you may need to edit automatically created tags to reflect your document structure accurately. For some things, it can be a pain. I tried fixing up a songbook in InDesign with PDF tags, and realized I’d need to do a lot of work to get it right.

Tagging defines contiguous groups of text and ordering, offering a fix for the problem of multiple columns, sidebars, and footnotes. It allows language identification, so if you have a paragraph of German in the middle of an English text, the reader can switch languages if it supports them. Character codes in Tagged PDF are required to have an unambiguous mapping to Unicode.
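The effect of tagging on reading order can be sketched with a small Python model. The Chunk class and its fields are assumptions made up for this illustration and do not correspond to real PDF structures; the point is only that file order and logical order can differ, and that a tag can carry a language along with the text.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    file_pos: int     # order in which the chunk happens to sit in the file
    reading_pos: int  # logical order, as tagging would supply it
    lang: str = "en"  # language identifier attached by the tag


# A two-column page whose chunks are stored row by row, not column by column.
chunks = [
    Chunk("Column 1, top.", 0, 0),
    Chunk("Column 2, top.", 1, 2),
    Chunk("Column 1, bottom.", 2, 1),
    Chunk("Spalte 2, unten.", 3, 3, lang="de"),
]


def untagged_order(chunks):
    """What a reader gets if it simply follows the file order."""
    return [c.text for c in sorted(chunks, key=lambda c: c.file_pos)]


def tagged_order(chunks):
    """What a reader gets if it follows the tagged order, with languages."""
    return [(c.lang, c.text) for c in sorted(chunks, key=lambda c: c.reading_pos)]
```

In this model the untagged reading jumps between columns, while the tagged reading finishes column 1 before starting column 2 and tells the reader when to switch to German.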

These features of Tagged PDF are obviously valuable to preservation as well as to visual access. PDF/A incorporates Tagged PDF; its stricter conformance level, PDF/A-1a, requires it.

It shouldn’t be assumed that because a document is in PDF, all problems with visual access are solved. But solutions are possible, with some extra effort.


HTML5 and video

There’s an entry on the W3C blog about the state of HTML5 video. The most significant point is that “we still don’t have a baseline video codec for HTML5.” Without that, it’s silly to talk about HTML5 as an alternative to Flash or any other kind of video presentation. Microsoft is pushing H.264, and IE9 will support only H.264 under HTML5. Mozilla is going with Ogg Theora. Both codecs have patent issues, limiting the opportunities for third parties to fill in the gap. Both have enthusiastic advocates.

The Browser Wars are back.

Flash “vs.” HTML: the shadowboxing continues

The shadowboxing between Flash and HTML 5 is getting pretty serious. A lot of people are using “HTML 5 video” as a shorthand for “non-Flash video technologies which HTML 5 facilitates,” and Adobe is clearly worried.

An article by Justin Nichols regards HTML 5 and Flash as competitors, and that article is showing a solid five-star rating on feeds.adobe.com, though it isn’t written by an Adobe employee, so it probably expresses a view that’s popular at Adobe. It refers to Flash as a “platform,” and that may be the key point; there’s an unstated suggestion that it can’t just live inside standardized HTML elements. But if it can’t, we’re in for still more rounds of browser incompatibility. Just as “the end of history” when the Soviet empire collapsed was a delusion, the “end of the browser wars” is most likely another.

A New York Times article on the lack of Flash on the iPad is entertaining for its disclaimer at the bottom. The body of the article says:

But concerns over the lack of Flash in the iPad and iPhone may be short-lived. Many online video sites have been experimenting with a new Web language that can support video, called HTML5. Unlike Flash, which is a downloaded piece of software that can interact with a computer’s operating system, HTML5 works directly in a Web browser. And although this new video format does not work in all browsers, it will allow iPhone and iPad users to enjoy more Web-based video content.

Then in a correction it notes that that was wrong:

An article on Monday about the absence of the multimedia software Flash in Apple’s new iPad tablet computer referred incorrectly to the Web language HTML5. While HTML5 can support video, it is not itself a video format. The article also misstated the ownership of HTML5 patents. HTML5, like other versions of Hypertext Markup Language, is open source; it is not owned by a group of companies, including Apple.

Can I hope they learned their error by reading this blog? Probably not. Even the disclaimer isn’t completely right; HTML 5 is a specification, not a program, so it’s meaningless to call it “open source.” Some implementations of it are open source, and others aren’t.

Standardization of the means of embedding video is a good thing. If that has Adobe worried it will face competition, that’s a good thing too.

Flash “vs.” HTML? Not so.

CNET has a rather confused article titled “HTML vs. Flash: Can a turf war be avoided?” This is like asking whether a turf war can be avoided between mixing bowls and batter.

The article says: “Bruce Lawson, Web standards evangelist for browser maker Opera Software, believes HTML and the other technologies inevitably will replace Flash and already collectively are ‘very close’ to reproducing today’s Flash abilities.” Further on: “Perhaps the most visible HTML5 aspect is built-in support for audio and video.”

This is complete nonsense. HTML 5 does not include “built-in support” for video. All that it does is provide a standardized means for browsers to support it. The video and audio tags provide a standardized means of expressing video and audio content, but don’t define any means of interpreting the content. That’s left up to the browser, just as it is with HTML 4 with its lack of standardized media tags. The browser can support MPEG 4, Flash, Ogg, all of them, none of them, or something else entirely.
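The division of labor can be modeled in a few lines of Python. This is a simplified sketch, not any browser’s actual code: the function name, file names, and MIME type sets are assumptions chosen for illustration. The page lists the candidate files, and the browser’s own codec support decides which one, if any, gets played.

```python
def pick_source(sources, supported_types):
    """Return the URL of the first source whose MIME type is supported.

    `sources` is a list of (url, mime_type) pairs in the order the page
    lists them. Returns None when the browser can play none of them.
    """
    for url, mime_type in sources:
        if mime_type in supported_types:
            return url
    return None


# What the page offers (hypothetical file names).
sources = [("clip.mp4", "video/mp4"), ("clip.ogv", "video/ogg")]

# What three hypothetical browsers can decode.
h264_only = {"video/mp4"}
theora_only = {"video/ogg"}
neither = set()
```

The same markup yields a different file, or nothing at all, depending on the browser, which is why the tags alone do not amount to built-in video support.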

Perhaps author Stephen Shankland is thinking of a different issue. There are some Web pages whose content is made up entirely of Flash. If you bring them up on a browser where Flash support is lacking or disabled, you generally get a blank page, not even a clue about what’s wrong. This could be considered Flash vs. HTML competition, but it’s an area where Flash has no excuse for being there and deserves to be beaten. The appropriate use of Flash, to present animation and video, is actually better supported by HTML 5 than by earlier versions, and the idea that the technologies compete is meaningless.

Libtiff and search engines

The current version of libtiff, a widely used C library for processing TIFF images, is found on remotesensing.org. The domain libtiff.org used to belong to the people who maintain libtiff but doesn’t any more. The holder of the domain claims to be “Lib Tiff” in Ottawa. It’s not a fraud or malware site, but it has an outdated version of libtiff. I don’t know what the domain holder’s game is, and I’m not sure anyone does. I can’t even see how it’s making money; maybe it has popups which my browser is suppressing?

Anyway, I got curious about how various search engines would do when I searched for “libtiff.” Here’s the rundown:

  • Google puts libtiff.org in first and second place and remotesensing.org in third, and it has numerous subsidiary links in its listing of libtiff.org.
  • Ask.com does the same, minus the subsidiary links. The fourth-place item is the Wikipedia entry, which correctly lists remotesensing.org (Google puts it fifth).
  • Yahoo puts remotesensing.org in first place, and the bogus site doesn’t show up at all on the first page of results.
  • Clusty.com puts remotesensing.org in first and the cheap imitation in second.
  • Dogpile puts the mutt first and the purebred down in eleventh place.
  • Alltheweb.com puts remotesensing.org first and doesn’t show the imitator in the first page of results.

It’s not clear exactly what this proves, except that the big names don’t always do the best.

What ever happened to .SIT?

With the increasing use of ZIP compression on the Macintosh, the Stuffit or .SIT format has fallen into relative obscurity. But not only is it still around, its publishers claim it’s “the ultimate in compression.” Five to ten years ago, lots of computer products were promoted as “the ultimate.” But when the next revision is the new “ultimate,” and so is the one after that, the claim starts to look ridiculous, and most advertisers have dropped it.

Stuffit’s compression is, according to most studies, about as good as competing technologies. It has no claim on being “the ultimate.” Its ad in the MacConnection catalogue says that “Stuffit Deluxe(R) 2009 can compress files up to 98% of their original size.” This is a nicely ambiguous claim; does that mean that the compressed file is reduced by 98%, or that it’s 98% of its original size? The latter isn’t hard to achieve at all, and hardly worth bragging about. But it’s extremely rare that Stuffit, or any other compression, can reduce a file to 2% of its original size. Perhaps a file of all 1’s would get 98% reduction, but that’s seldom useful.
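The two readings of the claim are easy to check with a quick Python sketch. It uses zlib (deflate) from the standard library as a stand-in, not Stuffit’s own algorithm, so the exact numbers are only indicative. It reports the compressed size as a percentage of the original for a file of all 1’s and for random bytes.

```python
import os
import zlib


def percent_of_original(data: bytes) -> float:
    """Compressed size as a percentage of the original size (zlib, level 9)."""
    compressed = zlib.compress(data, 9)
    return 100.0 * len(compressed) / len(data)


uniform = b"1" * 100_000            # a "file of all 1's"
random_bytes = os.urandom(100_000)  # essentially incompressible data

uniform_pct = percent_of_original(uniform)      # a tiny fraction of the original
random_pct = percent_of_original(random_bytes)  # about the same size as the original
```

The uniform file shrinks to well under 2% of its original size, while the random file stays at roughly 100%, so both readings of “up to 98%” can be met by picking the right input.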

Stuffit once had the advantage of recognizing the two-fork file format of the Macintosh Classic OS. But now that virtually everyone has gone to OS X, which doesn’t use dual file forks, it’s just one more compression format.

The most annoying HTML tag

This past weekend, at a singing gathering, someone was trying to remember the words to “Flow Gently, Sweet Afton.” Trying to help, I did a search on my MacBook and found lots of matches. When I clicked on the first likely-looking one, it started playing the song, to my great embarrassment. This was in spite of the fact that I use NoScript to disable JavaScript, Java, and Flash on unfamiliar sites. (Here’s the offending page; it seems harmless in other respects, but I’ve added rel="nofollow" to the link anyway, so as not to give it any aid with search engines.)

The page uses a non-standard (in HTML 4 and earlier) but widely supported tag called embed. With the parameter autostart=true, this tag will immediately start up a plugin to play its content, which could be a sound or video file or anything else, depending on what plugins are installed with your browser. The only way to prevent this with NoScript is to disable plugins across the board.
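For anyone who wants to check a page for this before opening it in a browser, here is a minimal Python sketch using the standard library’s html.parser. The class and function names and the sample markup (including the file names) are assumptions made up for illustration; the offending page’s actual markup isn’t reproduced here.

```python
from html.parser import HTMLParser


class AutostartFinder(HTMLParser):
    """Collects the src of every embed tag that asks to start automatically."""

    def __init__(self):
        super().__init__()
        self.autostart_sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "embed":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("autostart", "").lower() in ("true", "1"):
            self.autostart_sources.append(attr.get("src", ""))


def find_autostart_embeds(html):
    finder = AutostartFinder()
    finder.feed(html)
    finder.close()
    return finder.autostart_sources


# Hypothetical sample markup.
page = (
    '<p>Lyrics</p>'
    '<EMBED SRC="afton.mid" AUTOSTART="true" HIDDEN="true">'
    '<embed src="quiet.mid" autostart="false">'
)
```

Calling find_autostart_embeds(page) lists only the files that would start playing on load. Treating "1" as equivalent to "true" is a guess at what plugins accept, not something taken from a specification.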

In HTML 5, the embed tag gains official status but there’s a standard way to disable the functionality:

When the sandboxed plugins browsing context flag is set on the browsing context for which the embed element’s document is the active document, then the user agent must render the embed element in a manner that conveys that the plugin was disabled. The user agent may offer the user the option to override the sandbox and instantiate the plugin anyway; if the user invokes such an option, the user agent must act as if the sandboxed plugins browsing context flag was not set for the purposes of this element.

A sandbox can be set for a frame, window, or tab. For a frame, it can be specified in the HTML, letting a page incorporate not fully trusted HTML from another site. The window or tab sandbox settings are evidently intended to be controlled by user preferences.

There’s no longer an autostart parameter. I think this means that the behavior is whatever the plugin creator wants; it could start up immediately or could provide a user interface with start, stop, and pause controls.

If future browsers let users control the plugin sandbox through preferences, that will mean one less way that web page authors can get around the user’s desire not to be annoyed.