So is anyone reading this?

Since I moved over from Blogspot, I’ve received 11 comments, or rather WordPress has marked 11 comments as spam and deleted them, usually before I could look at them. Other than a couple of pieces of email, I’ve been getting no feedback. The readership graph shows blips when I post something new, but I don’t know whether that’s actual people or not.

So I’d really like to hear from anyone who’s still reading this. Did I lose everyone with the host switch? Is occasional news on file formats and JHOVE just too boring to read? Are legitimate comments disappearing down the maw of the antispambot? Or is everything I say so self-evidently true and complete that nothing more needs to be said? I rather doubt the last.

Microsoft to open up Outlook format

A report on CNET says that Microsoft will be publicly documenting the formats of .pst files used by Outlook. Microsoft’s Paul Lorimer is quoted as saying the format specification will be available “under our Open Specification Promise, which will allow anyone to implement the .pst file format on any platform and in any tool, without concerns about patents, and without the need to contact Microsoft in any way.” No timetable is given.

What ever happened to .SIT?

With the increasing use of ZIP compression on the Macintosh, the Stuffit or .SIT format has fallen into relative obscurity. But not only is it still around, its publishers claim it’s “the ultimate in compression.” Five to ten years ago, lots of computer products were promoted as “the ultimate.” But when the next revision is the new “ultimate,” and so is the one after that, the claim starts to look ridiculous, and most advertisers have dropped it.

Stuffit’s compression is, according to most studies, about as good as competing technologies. It has no claim on being “the ultimate.” Its ad in the MacConnection catalogue says that “Stuffit Deluxe(R) 2009 can compress files up to 98% of their original size.” This is a nicely ambiguous claim; does that mean that the compressed file is reduced by 98%, or that it’s 98% of its original size? The latter isn’t hard to achieve at all, and hardly worth bragging about. But it’s extremely rare that Stuffit, or any other compression, can reduce a file to 2% of its original size. Perhaps a file of all 1’s would get 98% reduction, but that’s seldom useful.
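To make the two readings of "98%" concrete, here is a minimal Python sketch. It uses zlib (DEFLATE) from the standard library as a stand-in, since that is an assumption of convenience and not Stuffit's own algorithm; the helper name size_ratio is hypothetical. It compresses a file consisting of one repeated byte, the degenerate case mentioned above, and reports both "compressed size as a share of the original" and "reduction in size".

```python
import zlib


def size_ratio(original: bytes) -> float:
    """Return compressed size as a fraction of the original size.

    Uses zlib (DEFLATE) as a stand-in compressor; this is NOT Stuffit.
    """
    compressed = zlib.compress(original, 9)
    return len(compressed) / len(original)


# One million identical bytes: the "file of all 1's" case.
uniform = b"\x01" * 1_000_000

ratio = size_ratio(uniform)   # reading 1: "N% of their original size"
reduction = 1.0 - ratio       # reading 2: "reduced by N%"

print(f"compressed size is {ratio:.2%} of the original")
print(f"size reduced by {reduction:.2%}")
```

Under the first reading, a 98% figure means the ratio is 0.98 and almost nothing was saved; under the second, the ratio is 0.02. Only highly repetitive input like this comes anywhere near the second reading.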

Stuffit once had the advantage of recognizing the two-fork file format of the Macintosh Classic OS. But now that virtually everyone has gone to OS X, which doesn’t use dual file forks, it’s just one more compression format.

Unicode 5.2.0

Unicode 5.2.0 is now out. It adds 6,648 new characters but still doesn’t officially include Klingon.

JHOVE2 at iPres

Unfortunately, I wasn’t in California for the post-iPres workshop on JHOVE2, but there is some information online. The JHOVE2 project presentations page includes a short and a long version of the slides. An early version of the code has been made available for testing, and progress continues.

P2 registry

I’ve just come across yet another file format registry: the P2 Registry at the University of Southampton in the UK. It’s identified as a beta and was pretty slow when I tried it, but it has some interesting features, including risk assessments of formats. PRONOM and other data sources are used. There is a short PDF article on the aims of P2, which tells us that “the key feature of the registry is the ability to import arbitrary ontologies that can be used both to infer new facts from existing information as well as to align (in the case where two concepts are similar or the same in nature) information already in the registry.”

Its web user interface is minimal at the moment, but it’s worth keeping an eye on this.

The most annoying HTML tag

This past weekend, at a singing gathering, someone was trying to remember the words to “Flow Gently, Sweet Afton.” Trying to help, I did a search on my MacBook and found lots of matches. When I clicked on the first likely-looking one, it started playing the song, to my great embarrassment. This was in spite of the fact that I use NoScript to disable JavaScript, Java, and Flash on unfamiliar sites. (Here’s the offending page; it seems harmless in other respects, but I’ve added rel="nofollow" to the link anyway, so as not to give it any aid with search engines.)

The page uses a non-standard (in HTML 4 and earlier) but widely supported tag called embed. With the parameter autostart=true, this tag will immediately start up a plugin to play the embedded content, which could be a sound or video file or anything else, depending on what plugins are installed with your browser. The only way to prevent this with NoScript is to disable plugins across the board.
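As a rough illustration of what such a page contains, the following Python sketch scans a piece of HTML for embed tags that set autostart=true. It relies only on the standard library's html.parser module; the class name AutostartEmbedFinder, the function find_autostart_embeds, and the sample markup are all hypothetical and are not taken from the page in question.

```python
from html.parser import HTMLParser


class AutostartEmbedFinder(HTMLParser):
    """Collect the src of every embed tag whose autostart attribute is 'true'."""

    def __init__(self):
        super().__init__()
        self.autostarting = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "embed":
            return
        attributes = dict(attrs)
        # A valueless attribute is reported as None, hence the "or ''".
        if (attributes.get("autostart") or "").lower() == "true":
            self.autostarting.append(attributes.get("src"))


def find_autostart_embeds(html: str) -> list:
    """Return the src values of all autostarting embed tags in the markup."""
    finder = AutostartEmbedFinder()
    finder.feed(html)
    finder.close()
    return finder.autostarting


# Hypothetical page fragment, made up for this example.
sample = '<p>Lyrics</p><EMBED SRC="afton.mid" AUTOSTART=true HIDDEN=true>'
print(find_autostart_embeds(sample))
```

A check like this only finds the tag; it does nothing to stop the browser from starting the plugin.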

In HTML 5, the embed tag gains official status but there’s a standard way to disable the functionality:

When the sandboxed plugins browsing context flag is set on the browsing context for which the embed element’s document is the active document, then the user agent must render the embed element in a manner that conveys that the plugin was disabled. The user agent may offer the user the option to override the sandbox and instantiate the plugin anyway; if the user invokes such an option, the user agent must act as if the sandboxed plugins browsing context flag was not set for the purposes of this element.

A sandbox can be set for a frame, window, or tab. For a frame, it can be specified in the HTML, letting a page incorporate not fully trusted HTML from another site. The window or tab sandbox settings are evidently intended to be controlled by user preferences.

There’s no longer an autostart parameter. I think this means that the behavior is whatever the plugin creator wants; it could start up immediately or could provide a user interface with start, stop, and pause controls.

If future browsers let users control the plugin sandbox through preferences, that will mean one less way that web page authors can get around the user’s desire not to be annoyed.

Planets digital preservation conference

The Planets project will host a three-day training event on digital preservation in Bern, Switzerland, on November 17-19, 2009. According to the announcement: “Day 1 will consider the case for preserving digital objects, the technical issues involved, and the Planets framework, tools and services. On days 2 and 3 delegates will gain hands-on experience of working with Planets and a scenario (sample collection) to develop a preservation plan and preserve digital objects.”

Day 1 is recommended for “Heads of IT, Curation and Preservation, CEOs and preservation/curation/IT staff.” Days 2-3 are recommended for “digital preservation staff (e.g. librarians, archivists, digital librarians and archivists, repository managers, software developers, policy managers etc.).”

Attendance is limited.

HTML 5 updated

There’s a new working draft of HTML 5 available from W3C. It still has the same warning as in April: “Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways.”

But lots of sections have been marked “Last call for comments,” so perhaps it really is closing in on a stable version. Or perhaps not. The most widely debated issue is video codecs, and I get the impression there’s been little progress on them. The situation is, in principle, similar to the <IMG> tag, where browsers explicitly aren’t required to support any particular image format; but it would be a poor (or text-only) browser that didn’t support JPEG and GIF, at least. With video there isn’t even that much agreement. Granted, the situation is just as bad now, but HTML 4 doesn’t even address the issue, so it isn’t held back by format disputes.

I’m looking at the HTML 5 wars from a rather uninformed distance, so don’t expect expert analysis here, just impatience with how slowly things are going. According to the WHATWG Wiki, it may reach Candidate Recommendation stage in 2012. The fact that the HTML working group now has three co-chairs just strikes me as a bad sign.

Twiddling settings

I’ve made a few changes to the settings. As of now, anyone can comment, but all comments have to be approved by me. (WordPress doesn’t allow automatic approval of comments by registered users together with moderation of other comments, as far as I can tell.) Also, the feed now gives full articles instead of summaries.