iPRES 2012

iPRES 2012 now has real information on its website.

JHOVE 1.7, finally!

After well over a year, a new version of JHOVE is finally available. Really, not very much has changed since 1.6 as far as the software itself goes. However, I’m leaving Harvard at the end of August and asked for and got custody of JHOVE, so this version marks its transition from a Harvard-supported project (which, in practice, it hasn’t been for a long time) to a separate open-source project. The JHOVE web pages are now hosted on SourceForge, and all support and discussion will go through SourceForge. The jhove-support and jhove-users mailing lists hosted by Harvard will shut down in the near future.

This doesn’t mean JHOVE is dead. I may actually have more opportunities to work on it than before, now that I’m going into independent consulting. I need to stay visible to the library and preservation world, and this is one way to do it.

Meanwhile, I’m looking for contract opportunities. Please take a look at my new business site or my LinkedIn profile.

JHOVE web pages moved

The web pages for JHOVE are now on SourceForge. They’ll remain on the Harvard site for some period of time but won’t be further updated.

There’s at least a chance this means there will be a release of JHOVE soon. Yes, I know, I’ve been promising that for a long time.

LinkedIn

Personal note: I’m now on LinkedIn and looking for contract software development work starting in September.

The two faces of HTML5

The question “What is HTML5?” has gotten more complicated. While W3C continues work on a full specification of HTML5, the Web Hypertext Application Technology Working Group (WHATWG) is pursuing a “living standard” that is updated frequently. Both groups are reassuring us that this doesn’t constitute a rift, but it will certainly make things tricky when resolving the fine points of the standard(s). Ian Hickson has gone into some detail on the W3C site about the relationship between the WHATWG HTML living standard and the W3C HTML5 specification.

Significantly, the WHATWG “HTML Living Standard” site has no version number.

Considering that HTML5 is already widely implemented even though it won’t be finalized till the year after next, it’s unlikely this will add any further confusion. By the time it becomes a W3C Recommendation, many implementers will doubtless have moved beyond it to new features.

The horrible state of Java image processing

A while back I posted on the painfully poor choices available for creating thumbnails of JPEG2000 files. Since then I’ve come to realize that support for image file processing in Java is even worse than I’d thought. Now I’m trying to make thumbnails from TIFF files. At first I went with JAI, even though it hasn’t been supported for five years and relies on implementation-dependent classes. I’d done this before successfully, but now I’m trying to do it in an EJB under JBoss. This runs into a NoClassDefFoundError when it tries to load com.sun.image.codec.jpeg.JPEGCodec. A web search suggests there’s some obscure trick necessary to access com.sun.image, but I couldn’t figure it out. It occurred to me that for what I’m doing, javax.imageio should be sufficient. It can read an image file, standard Java classes can scale the BufferedImage it produces, and then it can write the scaled image to a file.
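For illustration, the read, scale, and write steps might look roughly like the sketch below. This is not the code from the actual EJB. The class name Thumbnailer, the methods scale and makeThumbnail, the maxSide parameter, and the choice of JPEG output are all assumptions made for the example. ImageIO.read returns null when no registered reader can decode the input, so the sketch checks for that.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper class; names are illustrative only.
public class Thumbnailer {

    /** Shrinks src so its longer side is at most maxSide pixels, keeping the aspect ratio. */
    public static BufferedImage scale(BufferedImage src, int maxSide) {
        int longer = Math.max(src.getWidth(), src.getHeight());
        // Never scale up: cap the ratio at 1.0.
        double ratio = Math.min(1.0, (double) maxSide / longer);
        int w = Math.max(1, (int) Math.round(src.getWidth() * ratio));
        int h = Math.max(1, (int) Math.round(src.getHeight() * ratio));

        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            // Draw the source into the smaller canvas; Graphics2D does the resampling.
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    /** Reads an image file, scales it, and writes the result as a JPEG (assumed output format). */
    public static void makeThumbnail(File in, File out, int maxSide) throws IOException {
        BufferedImage src = ImageIO.read(in);
        if (src == null) {
            // No installed ImageIO plug-in recognized the file format.
            throw new IOException("No ImageIO reader available for " + in);
        }
        if (!ImageIO.write(scale(src, maxSide), "jpg", out)) {
            throw new IOException("No JPEG writer available");
        }
    }
}
```

A call such as Thumbnailer.makeThumbnail(new File("in.png"), new File("thumb.jpg"), 128) would be the typical use; the file names here are placeholders.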

Only one trouble: javax.imageio knows nothing about TIFF. A search on imageio and TIFF leads to suggestions to use JAI.

Really, what kind of language is that poor in dealing with common image formats?

State of HTML5 video

Long Tail Video has an interesting page on the state of HTML5 video. Their view is filtered through their own product, but it’s still a nice job of covering current trends.

A history of character encodings

Here’s a nice little history of character encodings, from ASCII through UTF-8.

It doesn’t really “date back to the earliest days of computers”; before ASCII there was a jumble of incompatible character encodings, some using as few as 5 bits. Even afterward, a bizarre IBM encoding called EBCDIC hung on for many years. But the path from ASCII to its descendants is fascinating enough by itself.

Thanks to Andy Jackson for the link.