
The horrible state of Java image processing

A while back I posted on the painfully poor choices for creating thumbnails of JPEG2000 files. Since then I’ve found that support for image file processing in Java is even worse than I’d thought. Now I’m trying to make thumbnails from TIFF files. At first I went with JAI, even though it hasn’t been supported for five years and relies on implementation-dependent classes. I’d done this successfully before, but now I’m trying to do it in an EJB under JBoss, and it fails with a NoClassDefFoundError when trying to load com.sun.image.codec.jpeg.JPEGCodec. A web search suggests there’s some obscure trick needed to access the com.sun.image classes, but I couldn’t figure it out. It occurred to me that for what I’m doing, javax.imageio should be sufficient: it can read an image file, standard Java classes can scale the BufferedImage it produces, and then it can write the scaled image to a file.
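Here’s roughly what that javax.imageio approach looks like. It’s a minimal sketch, not tested under JBoss; the class and method names are my own, and it assumes the input is in a format some installed ImageIO reader understands (ImageIO.read returns null otherwise).

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class ImageIOThumbnailer {

    /** Scales src so its longer side is at most maxDim pixels, keeping the aspect ratio. */
    static BufferedImage scaleToFit(BufferedImage src, int maxDim) {
        // Never enlarge: a factor above 1.0 is clamped to 1.0.
        double factor = Math.min(1.0,
                (double) maxDim / Math.max(src.getWidth(), src.getHeight()));
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        // TYPE_INT_RGB drops any alpha channel, which is fine for a JPEG thumbnail.
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    /**
     * Reads in, scales it, and writes a JPEG to out.
     * Returns false if no reader understood the input or no JPEG writer was found.
     */
    static boolean makeThumbnail(File in, File out, int maxDim) throws IOException {
        BufferedImage src = ImageIO.read(in); // null when no registered reader handles the format
        if (src == null) {
            return false;
        }
        return ImageIO.write(scaleToFit(src, maxDim), "jpeg", out);
    }
}
```

A call like makeThumbnail(new File("page.png"), new File("page-thumb.jpg"), 200) would write a JPEG whose longer side is 200 pixels. For a TIFF input, though, it simply returns false on a runtime with no TIFF reader.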

Only one trouble: javax.imageio knows nothing about TIFF. A search on imageio and TIFF leads to suggestions to use JAI.

Really, what kind of language is that poor at handling common image formats?
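Rather than relying on search results, you can ask the running JRE which formats its ImageIO readers handle. A small sketch follows (the class name is hypothetical). The answer depends on the runtime: later JDKs (9 and up) bundle a TIFF reader that the Java releases current when I wrote this didn’t have, so canRead("tiff") can come out differently from one installation to the next.

```java
import java.util.Locale;
import java.util.TreeSet;
import javax.imageio.ImageIO;

class ImageIOFormats {

    /** True if some registered ImageIO reader accepts this format name (e.g. "png", "tiff"). */
    static boolean canRead(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    /** All reader format names known to this JRE, lower-cased and de-duplicated. */
    static TreeSet<String> readerFormats() {
        TreeSet<String> names = new TreeSet<>();
        for (String n : ImageIO.getReaderFormatNames()) {
            names.add(n.toLowerCase(Locale.ROOT));
        }
        return names;
    }
}
```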

JPEG2000 thumbnails

I’ve been trying to find software for batch generation of thumbnails for JPEG2000 images. So far this is what I’ve looked at:

Kakadu is commercial software that looked hopeful at first, but the licensing is confusing. The description of the “Non-commercial, Named User Licence” says it “can only be purchased by individuals, Academic Institutions, not-for-profit organizations and libraries which do not gain financially by using this software,” but the license itself doesn’t say anything about licensing to institutions, only individuals. Our attempts to get a clarification have gotten no response. If they ignore us when we want to buy something, that doesn’t bode well for support.

OpenJPEG has its supporters, but its command-line interface can’t create JPEG, GIF, or PNG output, and it can’t create images of a specified size. There are C functions which may or may not be directly callable, but the documentation for them is really scanty.

ImageMagick didn’t seem appealing at first because of its command-line orientation, but it may be the best option. JMagick provides a JNI connection. The documentation indicates it can generate images of a specified size and format, which is what we need.
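If JMagick turns out to be awkward, another route is to run ImageMagick’s command-line tool from Java. This is a sketch under assumptions: that ImageMagick is installed and on the PATH, and that its build can read JPEG2000 (that depends on which delegate libraries it was compiled with). It doesn’t use JMagick’s own API. The -thumbnail option with a WxH geometry fits the image within that box while keeping its aspect ratio, and the output format is taken from the output file’s extension. The class name is my own.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ImageMagickThumbnailer {

    /** Builds a command line such as: convert in.jp2 -thumbnail 200x200 out.png */
    static List<String> buildCommand(String input, String output, int maxDim) {
        List<String> cmd = new ArrayList<>();
        cmd.add("convert");            // ImageMagick 7 also provides this as "magick"
        cmd.add(input);
        cmd.add("-thumbnail");
        cmd.add(maxDim + "x" + maxDim); // fit inside maxDim x maxDim, aspect ratio preserved
        cmd.add(output);               // extension (.png, .jpg, .gif) selects the output format
        return cmd;
    }

    /** Runs the command and returns ImageMagick's exit status (0 on success). */
    static int run(String input, String output, int maxDim)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(input, output, maxDim))
                .inheritIO()           // pass ImageMagick's messages through to our console
                .start();
        return p.waitFor();
    }
}
```

Passing the arguments as a list rather than one string avoids shell quoting problems with file names that contain spaces.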

If anybody reading this has other suggestions, let me know.

The Lib-Ray project

Just last weekend I got my first Blu-Ray disc and found that it came with a warning that if I didn’t have the latest software updates on my player, it might not play. (It did play, since the disc is far older than my player.) This annoyed me enough that I’m glad to hear of an open-source, non-DRM alternative to Blu-Ray in the works. Lib-Ray is a project to create a high-definition video standard with “no DRM,” “no region codes,” “no secrets,” and “no limits.” There’s a Kickstarter page looking for funding for the project.

According to the current specification, Lib-Ray uses the Matroska (MKV) container format.

Creating a mass market for Lib-Ray player boxes sounds like a long shot, but it’s easy enough to imagine open-source software being developed and distributed that would let any modern computer play the discs. This could be a boon to anyone who wants to distribute high-quality video discs without DRM.

Some articles on Lib-Ray:

Contributors to JHOVE2

The JHOVE2 project has issued a governance document (PDF) for contributors. Stephen Abrams writes that “we believe it important to enlist the efforts of the wider user community in future efforts. Working collectively, we can most effectively take advantage of opportunities to enhance and extend the utility of JHOVE2, especially in times of significant constraints on local institutional resources.”

FIDO 1.0.0 from Open Planets Foundation

Open Planets Foundation has announced FIDO 1.0.0, “a Python command line tool to identify the file formats of digital objects. A lot of improvements to the code and functionality have been made.”

DROID and JRE 7

According to a post on the DROID mailing list, DROID is not currently compatible with JRE 7. An issue with the Spring framework appears to be the cause. The next release of DROID should support Java 7.

Apache ODF toolkit

The Apache Software Foundation has made its first release of the ODF Toolkit. This version is called 0.5-incubating, so I imagine it still has rough edges. Officially, “incubating” means that “the project has yet to be fully endorsed by the ASF.”

This could be useful for software that validates or extracts metadata from Open Document Format files. It includes ODFDOM 0.8.7, which has been around for about a year. Anyone want to write a module for JHOVE or JHOVE2?
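For a sense of what such a module would need to do: an ODF document is a ZIP package whose mimetype entry names the document type and whose meta.xml holds Dublin Core and ODF metadata elements. The sketch below reads those with only the standard library, not the ODF Toolkit, whose API I haven’t tried. The class name and the choice of fields are mine, and it needs Java 9 or later for readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class OdfMetaReader {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0";

    /** Returns the package's mimetype plus dc:title and meta:generator, when present. */
    static Map<String, String> readMetadata(InputStream odfPackage) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(odfPackage)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("mimetype")) {
                    result.put("mimetype",
                            new String(zip.readAllBytes(), StandardCharsets.US_ASCII).trim());
                } else if (entry.getName().equals("meta.xml")) {
                    // Buffer the entry first: an XML parser may close the stream it is
                    // given, which would also close the ZipInputStream.
                    Document doc = parse(zip.readAllBytes());
                    putFirst(doc, DC_NS, "title", "title", result);
                    putFirst(doc, META_NS, "generator", "generator", result);
                }
            }
        }
        return result;
    }

    private static Document parse(byte[] xml) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("could not parse meta.xml", e);
        }
    }

    /** Stores the trimmed text of the first matching element, if any, under key. */
    private static void putFirst(Document doc, String ns, String localName,
                                 String key, Map<String, String> out) {
        NodeList nodes = doc.getElementsByTagNameNS(ns, localName);
        if (nodes.getLength() > 0) {
            out.put(key, nodes.item(0).getTextContent().trim());
        }
    }
}
```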

Concerns with Apple’s iBooks Author

With its iBooks textbooks for the iPad, Apple stakes out a position against openness in e-book publishing.

The books aren’t in standard EPUB format. The only tool that can create this format is Apple’s iBooks Author, and the only application that can view it is iBooks. An article on Ars Technica reports that it uses “ePub 2 along with certain HTML5 and JavaScript-based extensions that Apple uses to enable multimedia and interactive features. Those interactive features will only work with Apple’s iBooks app, not with other e-reader software or hardware, because only Apple supports those extensions.”

A post on Glazblog (the author says he’s “Co-chairman of the W3C CSS Working Group”; it would be nice if he gave his name) gives technical details. It uses XML namespaces that aren’t publicly documented, a nonstandard MIME type, and a private CSS extension.
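The MIME type issue is easy to see for yourself. In a standard EPUB container, the first ZIP entry is a file named mimetype whose contents are exactly application/epub+zip. Here’s a rough check (the class name is hypothetical). It looks only at that declaration, which is far from a full conformance test.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class EpubMimetypeCheck {

    static final String EPUB_MIMETYPE = "application/epub+zip";

    /**
     * Returns the contents of the "mimetype" entry if it is the first entry in the
     * ZIP container, or null if the container does not start with one.
     */
    static String declaredMimetype(InputStream container) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(container)) {
            ZipEntry first = zip.getNextEntry();
            if (first == null || !first.getName().equals("mimetype")) {
                return null;
            }
            return new String(zip.readAllBytes(), StandardCharsets.US_ASCII);
        }
    }

    /** True only when the container declares exactly the standard EPUB media type. */
    static boolean looksLikeStandardEpub(InputStream container) throws IOException {
        return EPUB_MIMETYPE.equals(declaredMimetype(container));
    }
}
```

A book that declares some other media type in that slot fails the check even if much of its content is ordinary EPUB.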

This means you can’t view the books on anything but iOS. If Apple ever drops support for the format, it becomes obsolete, and since its extensions are undocumented, nobody else can realistically support it.

On top of this, the EULA for iBooks Author restricts sale of books created with it to the Apple Store. You can give away your books by any channel you like, but if you sell them, you must use the Apple Store. This means that if Apple doesn’t accept your book for publication, you can’t sell it in that format. (Except maybe in France, as Glazblog amusingly notes.) This is like having a compiler that lets you create software which you may sell only through Petitmol, or a video application that forbids you from selling your movies through anyone but FooTube. I can’t think of a precedent for this.

Authors normally want to be able to take a book to a different publisher if their previous one loses interest. With books created with iBooks Author, you can’t do that, for both technical and legal reasons. The format isn’t under DRM, though, and the exclusivity applies to the format, not the content. As far as I can tell, you should be able to extract most of the content and republish it in a different format.

Apple’s restrictions make iBooks textbooks unsuitable for assignment to classes, unless the school is willing to give every student an iPad. Those who use other devices would be left out in the cold.

Apart from the restrictions, does Apple’s new format offer anything exciting? My own reaction, from briefly looking at a few sample books on a co-worker’s iPad, is that the interactive graphics are attention-getting, but the most important form of “interactivity” with a textbook is trying things out on your own — playing with the equations, writing sentences in the language, whatever. The best accessory for that is still a pencil and paper.

Undocumented “open” formats

Recently I learned that I can’t upgrade to a current version of Finale Allegro, a music entry program, except by getting the very expensive full version or taking a step downward to PrintMusic. Since I don’t want to lose all my files when some “upgrade” makes Allegro stop working, I’ve been looking for alternatives. MuseScore has its attractions; it’s open source, powerful, and generally well regarded. But I ran across this discussion on the MuseScore forum, which has me just a bit worried. According to “Thomas,” whose user ID is 1 and so probably speaks with authority, “As the MuseScore format is still being shaped on a daily basis, we haven’t put any effort yet to create a schema.”

This doesn’t encourage me to use MuseScore. Even though it’s an “open” application, its format isn’t open in any meaningful sense. You can download the code and reverse-engineer the format, of course, but it’s going to change in the next version. While I’m sure the developers will try not to break files created with earlier versions, there’s no guarantee they’ll succeed, and they’re likely to be especially careless about compatibility with files that are more than a few versions old.

You can export files to MusicXML, which is standardized, but in trying this out I came upon a disturbing bug. If I edit the file and save the changes, they’re saved not to the .xml file but to a .mscz file, MuseScore’s native format. If there’s already an older file with that name, it gets overwritten without warning.
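Silently overwriting an existing file is something a save routine can guard against cheaply. I don’t know how MuseScore’s save code is written; this is just a generic Java sketch (names hypothetical) of refusing to overwrite unless the caller has confirmed it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class NoClobberSave {

    /**
     * Writes data to target only if no file exists there yet. With CREATE_NEW, the
     * existence check and the file creation happen as one atomic step, so another
     * writer can't slip in between them. Returns false instead of overwriting.
     */
    static boolean saveIfAbsent(Path target, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            out.write(data);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // the caller should ask the user before replacing the old file
        }
    }
}
```

A caller that gets false back can ask whether to replace the file, and only then write it with Files.write’s default options, which truncate and overwrite.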

The dichotomy between “open” and “proprietary” formats is the wrong one. There are many formats which are trademarked by a business and whose documentation is copyrighted, but if the documentation is public and the format is not encumbered by patents, anyone can use it. Formats which are created by open-source code but are undocumented and subject to change are effectively closed formats.

This post grew, in part, from my thoughts on avoiding data loss due to format obsolescence, which is the topic of this week’s post on Files That Last.