I’ve made some changes to the format registry browser since yesterday. They include a help page, the ability to use the “/” (slash) character in searches (very helpful when searching MIME types), and links from search results to the registry entries (these aren’t working right for PRONOM yet).
I attempted to make the search fields persist through a session, but that isn’t working, even though it works on the local emulation. Hopefully I’ll figure that out.
Google App Engine is a pain to work with, even though it’s free and has a number of simplifying features. It’s good for getting a quick demo up, though.
To get started, you need to get an Eclipse plugin from Google. Then you need to create a Google web application project, which needs to be in just the structure they want. It needs to have a top-level directory called “war,” and that needs to have a file called WEB-INF/appengine-web.xml. If you’re starting a project from scratch, that’s not too heavy a requirement; other web application servers will just ignore that special file. But since I was working from an existing project, the differences were just enough that I had to create a separate Eclipse project for the Google version. Still, not too bad. The project is there and running. I don’t even need to run Ant; the plugin magically finds my classes. It also provides an emulation environment and simple uploading.
This morning I was working on a few enhancements on the main line when the Google version spontaneously rebuilt itself. The console reported:
DataNucleus Enhancer (version 3.1.0.m2) : Enhancement of classes
DataNucleus Enhancer completed with success for 0 classes. Timings : input=401 ms, enhance=0 ms, total=401 ms. Consult the log for full details
DataNucleus Enhancer completed and no classes were enhanced. Consult the log for full details
Now there were errors in Java files which are used only in the GUI version and reference AWT and Swing classes. An example: “java.awt.Dimension is not supported by Google App Engine's Java runtime environment.” Fortunately, my code is clean enough that I could fix the problem by deleting a few classes and turning one into a stub. Still, such a pain. There certainly are web applications that use AWT for offscreen drawing, and they just won’t work with Google. There have been complaints about this.
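For illustration, here is a minimal sketch of what turning a GUI-dependent class into a stub can look like. All of the names (Size, PreviewSizer, StubPreviewSizer) are hypothetical and not taken from the actual project; the assumption is that shared code only needs a width and a height, so a plain value class can stand in for java.awt.Dimension and the Swing-backed implementation can be left out of the web build.

```java
// Hypothetical names throughout; a sketch, not the project's real code.

/** Plain-Java stand-in for java.awt.Dimension, with no AWT dependency. */
final class Size {
    private final int width;
    private final int height;

    Size(int width, int height) {
        if (width < 0 || height < 0) {
            throw new IllegalArgumentException("width and height must be non-negative");
        }
        this.width = width;
        this.height = height;
    }

    int getWidth() { return width; }

    int getHeight() { return height; }
}

/** The only part of the GUI class that shared code is assumed to call. */
interface PreviewSizer {
    Size preferredSize();
}

/** Stub used in the web build in place of a Swing-based implementation. */
final class StubPreviewSizer implements PreviewSizer {
    @Override
    public Size preferredSize() {
        // There is no on-screen component in the web build, so report an empty size.
        return new Size(0, 0);
    }
}

class StubDemo {
    public static void main(String[] args) {
        PreviewSizer sizer = new StubPreviewSizer();
        Size s = sizer.preferredSize();
        System.out.println(s.getWidth() + "x" + s.getHeight());
    }
}
```

The point of the interface is that the shared code never names an AWT or Swing type, so the same source tree could in principle compile for both the desktop and the App Engine builds, with only the implementation class differing.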
The environment is good enough for its purpose, but I wouldn’t try to do serious work with it.
The disappearing format blues
Old formats sometimes fade into obscurity and can no longer be supported, even if they come from a big company like Microsoft. Chris Rusbridge has noted that Microsoft’s Open Specifications page only goes as far back as Office 97, and that PowerPoint 4.0 files can’t be opened with today’s Microsoft Office. Tony Hey at Microsoft has replied. (Hey is vice president of Microsoft Research Connections.) The response was encouraging, particularly in suggesting that Microsoft might “participate in a ‘crowd source’ project working with archivists to create a public spec of these old file formats.”
There’s usually some kind of software around that can read old formats. In this case, though, a search doesn’t turn up a lot; there’s something called PowerPressed, which will wrap old PowerPoint files in a .exe application. It looks as if it should run on current Windows systems, but all I know is what that page says.
The situation shows the risk of using a format that isn’t publicly documented. Today this is less of a problem. I think it’s been shown that publishing format specs doesn’t lead to cannibalization of sales by competing software; the company that created the spec is in a position to produce the best implementation. The description of PDF is fully public, and Adobe still dominates the market for PostScript software. Publishing the spec has just made the pie bigger. There’s still quite a lot of software that uses unpublished proprietary specs, though, and it’s risky to count on the long-term usability of the files it produces.