Tag Archives: Google

Google Docs: Not a File Format

What’s the format of a Google Docs file? The question may not even be meaningful. According to Jenny Mitcham at the University of York, there is no such thing as a Google Docs file. What you see when you open a document is an assembly of information from a database. You can export it in various file formats, but the exported file isn’t identical to the Google document.

This makes Google documents risky from a preservation standpoint. You can’t save a local backup of the document itself, only an export in some other format. If you lose your Google account, or if censorship in your country cuts you off from it, you lose all your documents.

The future of WebM

Yesterday I posted about the WebP still image format, expressing some skepticism about how easily it will catch on. Its companion format for video, WebM, may stand a better chance, though. Images aren’t exciting any more; JPEG delivers photographs well enough, PNG does the same for line art, and there isn’t a compelling reason to change. Video is still in flux, though, and the high bandwidth requirements mean there’s a payoff for any improvements in compression and throughput. The long-running battle among HTML5 stakeholders over video shows that it’s far from being a settled area. Patents are a big issue; if you implement H.264, you have to pay patent licensing fees. Alternatives are attractive from both a technological and an economic standpoint.

With Google pushing WebM and owning YouTube, there’s a clear reason for browser developers to support it. YouTube plans to use the new WebM codec, VP9, once it’s complete. I haven’t seen details of the plan, but most likely YouTube will make the same video available in multiple encodings and query the browser’s capabilities to determine whether it can accept VP9. If the advantage is real and users who can get it see fewer pauses in their videos, more browser makers will undoubtedly join the bandwagon.
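The capability check described above can be pictured as a simple preference-ordered lookup. This is only a sketch of the general idea, not YouTube’s actual mechanism, which I haven’t seen; the function name, the codec labels, and the file names are all made up for illustration, and how the server or page learns what the browser supports is left out.

```python
def pick_encoding(available, supported):
    """Return the first (codec, url) pair the browser can play, or None.

    `available` is a list of (codec, url) pairs in order of preference;
    `supported` is the set of codec names the browser reports it can decode.
    Both are hypothetical inputs for this sketch.
    """
    for codec, url in available:
        if codec in supported:
            return codec, url
    return None


# Hypothetical catalogue for one video, most efficient codec first.
encodings = [
    ("vp9", "video-vp9.webm"),
    ("vp8", "video-vp8.webm"),
    ("h264", "video-h264.mp4"),
]

# A browser without VP9 support falls back to the next encoding it can play.
choice = pick_encoding(encodings, {"vp8", "h264"})
```

The point of ordering the list by preference is that browsers with VP9 get the smaller stream automatically, while everyone else keeps working.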

An eye on WebP

Google has been promoting the WebP still image format for some time, and lately Facebook has added its support. It’s hard to displace the well-entrenched JPEG, but it could happen. WebP supports both lossy and lossless compression, and Google claims it offers a significant advantage in compression over PNG and JPEG. Google says it’s free of patent restrictions; the container is the familiar RIFF. The VP8 lossy format is available as an IETF RFC; a specification for the lossless format is also available.

The container spec supports XMP and Exif metadata. Canvas width and height can be as much as 16,777,216 pixels, though their product is limited to 4,294,967,296 pixels. As far as I can tell it doesn’t support tiling, though, so partial rendering of huge images in the style of JPEG2000 may not be practical.
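To make the canvas limits concrete, here is a minimal sketch that reads the canvas size from the header of an extended-format (VP8X) WebP file and checks it against the figures quoted above. It assumes the file starts with the RIFF header followed immediately by the VP8X chunk, with width and height each stored as a 24-bit little-endian “minus one” value; the function names are my own, and the two limits are simply the numbers from the paragraph above rather than something the code verifies against the spec.

```python
MAX_DIMENSION = 16_777_216      # per-side limit quoted above (2**24)
MAX_PIXELS = 4_294_967_296      # width * height limit as quoted above


def read_vp8x_canvas(data: bytes):
    """Return (width, height) from the first 30 bytes of a VP8X WebP file."""
    if len(data) < 30:
        raise ValueError("too short to hold a VP8X header")
    if data[0:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a RIFF/WEBP file")
    if data[12:16] != b"VP8X":
        raise ValueError("first chunk is not VP8X (simple-format file)")
    # Bytes 16-19 hold the chunk size, byte 20 the flags, 21-23 are reserved.
    # Bytes 24-26 and 27-29 hold width-1 and height-1, little-endian.
    width = int.from_bytes(data[24:27], "little") + 1
    height = int.from_bytes(data[27:30], "little") + 1
    return width, height


def canvas_within_limits(width: int, height: int) -> bool:
    """Check a canvas size against the per-side and total-pixel limits."""
    return (
        1 <= width <= MAX_DIMENSION
        and 1 <= height <= MAX_DIMENSION
        and width * height <= MAX_PIXELS
    )
```

Note that the two limits interact: a canvas can be the maximum width or the maximum height, but nowhere near both at once.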

Chrome, Opera, and Android (as of Ice Cream Sandwich) offer WebP support, but not many other browsers do. Facebook’s offerings of WebP images have resulted in complaints from users whose browsers can’t read the format. The Firefox development team is starting to warm to it but hasn’t committed to anything yet. Internet Explorer hasn’t even reached that point.

It’s still early to make bets, but WebP increasingly bears watching. I’ve initiated a page for updates and errata for Files that Last with some updated information on WebP. (When I wrote the book, I couldn’t find the lossless spec.)

Registry browser update

I’ve made some changes to the format registry browser since yesterday. Changes include a help page, the ability to use the “/” (slash) character in searches (very helpful when searching MIME types), and links to the registry entries from search results (these aren’t working right for PRONOM yet).

I attempted to make the search fields persist through a session, but that isn’t working, even though it works on the local emulation. Hopefully I’ll figure that out.

Google App Engine is a pain to work with, even though it’s free and has a number of simplifying features. It’s good for getting a quick demo up, though.

To get started, you need to get an Eclipse plugin from Google. Then you need to create a Google web application project, which needs to be in just the structure they want. It needs to have a top-level directory called “war,” and that needs to have a file called WEB-INF/appengine-web.xml. If you’re starting a project from scratch, that’s not too heavy a requirement; other web application servers will just ignore that special file. But since I was working from an existing project, the differences were just enough that I had to create a separate Eclipse project for the Google version. Still, not too bad. The project is there and running. I don’t even need to run Ant; the plugin magically finds my classes. It also provides an emulation environment and simple uploading.

This morning I was working on a few enhancements on the main line when the Google version spontaneously rebuilt itself. The console reported:


DataNucleus Enhancer (version 3.1.0.m2) : Enhancement of classes
DataNucleus Enhancer completed with success for 0 classes. Timings : input=401 ms, enhance=0 ms, total=401 ms. Consult the log for full details
DataNucleus Enhancer completed and no classes were enhanced. Consult the log for full details

Now there were errors in Java files that are used only in the GUI version and reference AWT and Swing classes. An example: “java.awt.Dimension is not supported by Google App Engine's Java runtime environment.” Fortunately, my code is clean enough that I could fix the problem by deleting a few classes and turning one into a stub. Still, such a pain. There certainly are web applications that use AWT for offscreen drawing, and they just won’t work with Google App Engine. There have been complaints about this.

The environment is good enough for its purpose, but I wouldn’t try to do serious work with it.