Tag Archives: java

The Java file format API graveyard

If you look for Java libraries to support specific file formats, you’ll soon come upon the gloomy graveyard of Java APIs. Sun and Oracle have a history of devising nice packages for reading and writing different kinds of files, only to abandon their maintenance. You can still find pages for them, and it takes a close look to figure out that they aren’t supported any more.

Java Advanced Imaging (JAI) was nice in its time. It still has a page on Oracle’s website, but the latest “what’s new” item is dated 2007. The page brags about customer success stories as if it were still usable code. I’ve tried working with it. It’s out of sync with the current com.sun classes, and I got only limited use out of it. In its time it was a good way to read and write image files.

Java Media Framework (JMF) runs on a 166 MHz Pentium or 160 MHz PowerPC. The downloaded jars are dated May 1, 2003. It had a nice list of supported formats.

If you’re working with audio files, javax.sound looks more encouraging. Its API is listed with Java 8. The class java.sound.sampled.AudioSystem supports reading and writing of audio files. I can’t find a list of the supported formats.

Java does reliably support some formats. Its handling of text encodings is versatile, and java.util.zip handles ZIP and GZIP.

Third-party code can come to the rescue. For reading and writing PDF, Apache PDFBox looks like the best bet. You can use Apache Tika with lots of formats, if you just need to extract metadata. Another alternative is to use ImageMagick, but it runs natively rather than under the JVM, so you have to invoke it with exec calls. im4java and JMagick can save some of the tedium. There are open source Java libraries for reading and writing specific file formats. Some may be good, some not.

If you need to deal with the guts of file formats in Java, you’ll usually have to find some good third-party code or start writing your own.


A simple but useful tool that’s part of FITS’s collection is FFident, written by Marco Schmidt. He apparently is no longer maintaining it, and its page disappeared from the Web but was retained on the Internet Archive. It seemed like a good idea to make it more readily available, so I’ve put it, using its LGPL license, into a Github repository.

FITS uses its own copy of the source code, so this really isn’t tested at all in its own right, but it’s there for people to play with. I added a build.xml file and organized the code the way Eclipse likes it. I don’t have any plans to support it, but if anyone wants to play with it, it’s there.

Format registry browser online

In an effort to promote interest in my format registry browser, I’ve built a Java web application out of it and put it up on Google App Engine at regbrowser.appspot.com. It lets you search PRONOM, UDFR, and the DBPedia structured summaries of format articles, by name, MIME Type, creator, and extension. It uses SPARQL Linked Data queries to obtain data.

It’s still in a rough form; the point is to show what it can do and hopefully get some interest in putting money into further development. Obvious improvements, which I may do shortly, would include checkboxes for which repositories to search and retention of text fields when returning to the search page.

UDFR times out a lot. If you get a timeout error, trying again has a good chance of working.

Format registry browser on Github

I’ve put the format-reg-browser project up on Github, in case anyone wants to play with the code. This is the first time I’ve committed code to any kind of Git site, but it looks as if the code’s really there. Let me know if there are any problems.

The horrible state of Java image processing

A while back I posted on the painfully poor choices in creating thumbnails of JPEG2000 files. Since then I’ve come to realize that support for image file processing in Java is even worse than I’d realized. Now I’m trying to make thumbnails from TIFF files. At first I went with JAI, even though it hasn’t been supported for five years and relies on implementation-dependent classes. I’d done this before successfully, but now I’m trying to do it in an EJB under JBoss. This runs into a NoClassDefFoundError trying to get com.sun.image.codec.jpeg.JPEGCodec. A web search suggests there’s some obscure trick necessary to access com.sun.image, but I couldn’t figure it out. It occurred to me that for what I’m doing, javax.imageio should be sufficient to do the job. It can read an image file, standard Java classes can scale the BufferedImage it produces, and then it can write the scaled image to a file.

Only one trouble: javax.imageio knows nothing about TIFF. A search on imageio and TIFF leads to suggestions to use JAI.

Really, what kind of language is that poor in dealing with common image formats?