Tag Archives: TIFF

fixit_tiff, a TIFF repair tool

The Sächsische Landesbibliothek – Staats- und Universitätsbibliothek Dresden (Saxon State and University Library Dresden), which somehow gets abbreviated to SLUB, has developed a tool for working with TIFF files in digital preservation. fixit_tiff is a command line utility, written in C, which can do some repairs on defective TIFF files. The focus appears to be on correcting common errors, not on repairing corrupted files. A blog post from July (in German) indicates it can do configurable validation using a simple query language.

It’s available under the same license as Libtiff. Just what is that license? The only thing I can find is a very outdated “Use and Copyright” statement, which is on a page so old it warns about patents on LZW compression. It’s available for free, anyway.

PDF/R

The PDF Association and TWAIN Working Group have announced a partnership to develop a specification called PDF/Raster or PDF/R. It’s described as “a component of TWG’s TWAIN Direct™ initiative, a language/protocol that eliminates the need for users to install vendor specific drivers as communication between scanning devices and image capture software applications.”

Video: A short history of graphic file formats

My video for today briefly covers graphic file formats from the forties to the present. I made some interesting discoveries along the way, especially Laposky’s CRT “Oscillons.”

TI(FF)/A

As I mentioned in an earlier post, Adobe objected to the use of the name TIFF in the TIFF/A Initiative and proposed TIFF profile. Since Adobe holds the trademark, their objection has legal force. Accordingly, TIFF/A has become TI/A (Tagged Image for Archival), and the Initiative is now using the domain ti-a.org. The old domain redirects to the new one.

This is bound to cause some confusion, but it looks as if there wasn’t any choice.

TIFF/A by any other name

TIFF/A is in search of a new name.

Today’s online kickoff discussion for the TIFF/A Initiative was productive in a lot of ways, but the big news for the broader public is that it will have to change its name. Adobe owns the TIFF trademark, and it doesn’t want “TIFF/A” used for the proposed new standard for archival TIFF.

TIFF/A kickoff

The TIFF/A Initiative has announced its kickoff online conference for September 15 at 3 PM CEST. TIFF/A (see my earlier post) is a proposal for a set of rules, not yet defined, for archival-quality TIFF files. It’s still possible to sign up for participation. According to the email, the conference will cover:

TIFF/A

TIFF has been around for a long time. Its latest official specification, TIFF 6.0, dates from 1992. The format hasn’t held still for 23 years, though. Adobe has issued several “technical notes” describing important changes and clarifications. Software developers, by general consensus, have ignored the requirement that value offsets have to be on a word boundary, since it’s a pointless restriction with modern computers. Private tags are allowed, and lots of different sources have defined new tags. Some of them have achieved wide acceptance, such as the TIFFTAG_ICCPROFILE tag (34675), which fills the need to associate ICC color profiles with images. Many applications use the Exif tag set to specify metadata, but this isn’t part of the “standard” either.

In other words, TIFF today is the sum of a lot of unwritten rules.

It’s generally not too hard to deal with the chaos and produce files that all well-known modern applications can handle. On the other hand, it’s easy to produce a perfectly legal TIFF file that only your own custom application will handle as you intended. People putting files into archives need some confidence in their viability. Assumptions which are popular today might shift over a decade or two. Variations in metadata conventions might cause problems.

TIFF/EP vs. Exif

I just discovered today that there are two different TIFF tags called “FocalPlaneResolutionUnit.” Tag 41488 goes by this name and is part of the Exif tag set. Accepted values for it are:

  • 1 = No absolute unit of measurement
  • 2 = Inch
  • 3 = Centimeter

Tag 37392 is a TIFF/EP (Electronic Photography) tag, also used in other raw formats, including DNG. (Only a working draft of TIFF/EP is available online, not the final version.) The accepted values for this tag are:

  • 1 = Inch
  • 2 = Metre
  • 3 = Centimetre
  • 4 = Millimetre
  • 5 = Micrometre

Recently someone sent me a TIFF file, as part of a JHOVE issue, that had tag 41488 with a value of 4. JHOVE correctly reported that the FocalPlaneResolutionUnit tag had an invalid value. That can be confusing, since 4 is a legitimate value (millimetres) for the TIFF/EP tag of the same name.
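Because the two tags share a name but not a value set, a checker has to look the value up in the table for the tag number it actually found, not by the tag’s name. Here is a minimal Java sketch of that lookup, using the tag numbers and values listed above; the class and method names are invented for illustration and don’t come from JHOVE.

```java
import java.util.Map;

/** Hypothetical helper: interprets FocalPlaneResolutionUnit by tag number. */
final class FocalPlaneUnits {
    static final int EXIF_TAG = 41488;    // Exif FocalPlaneResolutionUnit
    static final int TIFF_EP_TAG = 37392; // TIFF/EP FocalPlaneResolutionUnit

    // Exif value set: 1 = no absolute unit, 2 = inch, 3 = centimeter
    static final Map<Integer, String> EXIF_UNITS = Map.of(
            1, "No absolute unit of measurement",
            2, "Inch",
            3, "Centimeter");

    // TIFF/EP value set: same name, different numbering and more units
    static final Map<Integer, String> TIFF_EP_UNITS = Map.of(
            1, "Inch",
            2, "Metre",
            3, "Centimetre",
            4, "Millimetre",
            5, "Micrometre");

    /** Returns the unit name, or null if the value isn't defined for this tag. */
    static String unitName(int tag, int value) {
        Map<Integer, String> units;
        if (tag == EXIF_TAG) {
            units = EXIF_UNITS;
        } else if (tag == TIFF_EP_TAG) {
            units = TIFF_EP_UNITS;
        } else {
            throw new IllegalArgumentException("Not a FocalPlaneResolutionUnit tag: " + tag);
        }
        return units.get(value);
    }

    public static void main(String[] args) {
        // The case from the JHOVE report: value 4 in the Exif tag.
        String exif = unitName(EXIF_TAG, 4);
        String ep = unitName(TIFF_EP_TAG, 4);
        System.out.println(exif == null ? "invalid" : exif); // invalid
        System.out.println(ep == null ? "invalid" : ep);     // Millimetre
    }
}
```

Software that reads both tag sets needs to keep the two tables apart; falling back from one to the other silently reinterprets the number.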

There are other tags in TIFF/EP that are equivalent, or nearly equivalent, to Exif tags. Sometimes their values are specified identically, sometimes not. The Exif SubjectLocation tag is numbered 41492 and always has two shorts for its value, giving X and Y coordinates. The TIFF/EP counterpart is tag 37396, which can also have three shorts (specifying a circle) or four (specifying a rectangle).
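For SubjectLocation the difference comes down to a different rule on the value count. A checker might encode it like this; the class and method names are hypothetical, and the sketch assumes the count has already been read from the IFD entry and the field type confirmed as SHORT.

```java
/** Hypothetical count check for the two SubjectLocation tags. */
final class SubjectLocationCheck {
    static final int EXIF_SUBJECT_LOCATION = 41492;    // exactly 2 SHORTs
    static final int TIFF_EP_SUBJECT_LOCATION = 37396; // 2, 3 or 4 SHORTs

    /** True if the number of SHORT values is allowed for the given tag. */
    static boolean countIsValid(int tag, int count) {
        if (tag == EXIF_SUBJECT_LOCATION) {
            return count == 2;
        }
        if (tag == TIFF_EP_SUBJECT_LOCATION) {
            return count >= 2 && count <= 4;
        }
        throw new IllegalArgumentException("Not a SubjectLocation tag: " + tag);
    }

    /** What a given count describes under the TIFF/EP reading. */
    static String shape(int count) {
        switch (count) {
            case 2: return "point (X, Y)";
            case 3: return "circle";
            case 4: return "rectangle";
            default: return "undefined";
        }
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 4; n++) {
            System.out.printf("count %d (%s): Exif ok=%b, TIFF/EP ok=%b%n",
                    n, shape(n),
                    countIsValid(EXIF_SUBJECT_LOCATION, n),
                    countIsValid(TIFF_EP_SUBJECT_LOCATION, n));
        }
    }
}
```

Accepting three or four values in tag 41492 would be exactly the kind of “EP extension for Exif” that some software tolerates and other software doesn’t.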

I don’t know how this came about, but it’s something to watch out for in software that deals with both Exif and TIFF/EP tags. Some software may accept the EP extensions for Exif tags, but there’s no guarantee this will work.

Format conformity

By design JHOVE measures strict conformity to file format specifications. I’ve never been convinced this is the best way to measure a file’s viability or even correctness, but it’s what JHOVE does, and I’d just create confusion if I changed it now.

In general, the published specification is the best measure of a file’s correctness, but there are clearly exceptions, and correctness isn’t the same as viability for preservation. Let’s look at the rather extreme case of TIFF.

The current official specification of TIFF is Revision 6.0, dated June 3, 1992. The format hasn’t changed a byte in over 20 years — except that it has.

The specification says about value offsets in IFDs: “The Value is expected to begin on a word boundary; the corresponding Value Offset will thus be an even number.” This is a dead letter today. Much TIFF generation software freely writes values on any byte boundary, and just about all currently used readers accept them. JHOVE initially didn’t accept files with odd byte alignment as well-formed, but after numerous complaints it added a configuration option to allow them.
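The rule only applies to some entries. Each IFD entry is 12 bytes (tag, type, count, and a 4-byte Value Offset), and a value is stored elsewhere, with an offset to check, only when it is too big to fit in those 4 bytes. The following is a minimal Java sketch of such a check, not JHOVE’s code. It reads only the first IFD, knows the byte sizes of the twelve TIFF 6.0 field types and skips any other type, and assumes a well-formed header, a file that fits in memory, and offsets under 2 GB. The `allowOddOffsets` flag is a hypothetical stand-in for the kind of leniency option described above, not JHOVE’s actual configuration parameter.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class OffsetAlignmentCheck {
    // Byte sizes of TIFF 6.0 field types 1..12 (index 0 unused): BYTE, ASCII,
    // SHORT, LONG, RATIONAL, SBYTE, UNDEFINED, SSHORT, SLONG, SRATIONAL, FLOAT, DOUBLE
    private static final int[] TYPE_SIZES = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

    /** Tags in the first IFD whose out-of-line values start on an odd offset. */
    static List<Integer> oddValueOffsets(byte[] file, boolean allowOddOffsets) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        // Header: "II" = little-endian, "MM" = big-endian, then the magic number 42.
        if (file[0] == 'I' && file[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (file[0] == 'M' && file[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Not a TIFF byte-order mark");
        }
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Bad TIFF magic number");
        }
        int ifdOffset = buf.getInt(4);
        int entryCount = Short.toUnsignedInt(buf.getShort(ifdOffset));

        List<Integer> offenders = new ArrayList<>();
        for (int i = 0; i < entryCount; i++) {
            int entry = ifdOffset + 2 + 12 * i; // entries follow the 2-byte count
            int tag = Short.toUnsignedInt(buf.getShort(entry));
            int type = Short.toUnsignedInt(buf.getShort(entry + 2));
            long count = Integer.toUnsignedLong(buf.getInt(entry + 4));
            if (type < 1 || type >= TYPE_SIZES.length) {
                continue; // unknown type: can't tell its size, so skip it
            }
            if (count * TYPE_SIZES[type] <= 4) {
                continue; // value is stored in the entry itself, no offset
            }
            long valueOffset = Integer.toUnsignedLong(buf.getInt(entry + 8));
            if (valueOffset % 2 != 0 && !allowOddOffsets) {
                offenders.add(tag);
            }
        }
        return offenders;
    }

    /** Tiny little-endian TIFF: one ImageDescription (tag 270, 6 ASCII bytes) at valueOffset >= 26. */
    static byte[] sampleTiff(int valueOffset) {
        ByteBuffer b = ByteBuffer.allocate(valueOffset + 6).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8); // header, IFD at 8
        b.putShort((short) 1);                                            // one entry
        b.putShort((short) 270).putShort((short) 2).putInt(6).putInt(valueOffset);
        b.putInt(0);                                                      // no next IFD
        b.position(valueOffset);
        b.put("hello\0".getBytes(StandardCharsets.US_ASCII));
        return b.array();
    }

    public static void main(String[] args) {
        System.out.println(oddValueOffsets(sampleTiff(26), false)); // []
        System.out.println(oddValueOffsets(sampleTiff(27), false)); // [270]
        System.out.println(oddValueOffsets(sampleTiff(27), true));  // []
    }
}
```

The `sampleTiff` helper exists only so the check can be tried without a real file; a strict validator would report tag 270 in the odd-offset case, while a lenient one would accept it, as nearly all current readers do.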

Over the years a body of apocrypha has grown around TIFF. Some comes from Adobe, some not. The titles of the ones from Adobe don’t clearly mark them as revisions to TIFF, but they are. The “Adobe PageMaker® 6.0 TIFF Technical Notes,” September 14, 1995, define the important concept of SubIFD, among other changes. The “Adobe Photoshop® TIFF Technical Notes,” March 22, 2002, define new tags and forms of compression. The “Adobe Photoshop® TIFF Technical Note 3,” April 8, 2005, adds new floating point types. The last one isn’t available, as far as I can tell, on Adobe’s own website, but it’s canonical.

Then there’s material that started out without official Adobe approval. The JPEG compression defined in the 2002 tech notes officially adopts a 1995 draft note that had already gained wide use.

What’s the best measure of a TIFF file? That it corresponds strictly to TIFF 6.0? To 6.0 plus a scattered set of tech notes? Or that it’s processed correctly by LibTiff, a freely available and very widely used C library? To answer the question, we have to specify: Best for what? If we’re talking about the best chance of preservation, what scenarios are we envisioning?

One scenario amounts to a desert-island situation in which you have a specification, some files that you need to render, and a computer. You don’t have any software to go by. In this case, conformity to the spec is what you need, but it’s a rather unlikely scenario. If all existing TIFF readers disappear, things have probably gone so far that no one will be motivated to write a new one.

It’s more likely that people a few decades in the future will scramble to find software or entire old computers that will read obsolete formats. This doesn’t necessarily mean today’s software, but what we can read today can be a pretty good guide to what will be readable in the future. Insisting on conformity to the spec may be erring to the safe side, but if it excludes a large body of valuable files, it’s not a good choice.

Rather than insisting solely on conformity to a published standard, we need to judge preservation-worthy files by balancing two risks: accepting files that will cause reading problems down the road, and rejecting files that won’t. Multiple factors come into consideration, of which the spec is just one.

The horrible state of Java image processing

A while back I posted on the painfully poor choices for creating thumbnails of JPEG2000 files. Since then I’ve come to see that support for image file processing in Java is even worse than I’d thought. Now I’m trying to make thumbnails from TIFF files. At first I went with JAI, even though it hasn’t been supported for five years and relies on implementation-dependent classes. I’d done this successfully before, but now I’m trying to do it in an EJB under JBoss, and it runs into a NoClassDefFoundError trying to get com.sun.image.codec.jpeg.JPEGCodec. A web search suggests there’s some obscure trick needed to access com.sun.image, but I couldn’t figure it out. It occurred to me that for what I’m doing, javax.imageio should be enough. It can read an image file, standard Java classes can scale the BufferedImage it produces, and then it can write the scaled image to a file.
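The read, scale, write approach needs nothing beyond javax.imageio and java.awt. Here’s a minimal sketch of it; the class name, the 160-pixel limit and the choice of JPEG output are arbitrary. One detail worth knowing: `ImageIO.read` returns null, rather than throwing, when no registered reader recognizes the input, which is what happens with a TIFF file when no TIFF plugin is installed.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class Thumbnailer {
    /**
     * Scales so the longer side is at most maxSide pixels, keeping the aspect
     * ratio (never enlarging). Always draws into an RGB image, dropping any
     * alpha channel, since the JPEG writer doesn't handle alpha well.
     */
    static BufferedImage scale(BufferedImage src, int maxSide) {
        int w = src.getWidth();
        int h = src.getHeight();
        double factor = Math.min(1.0, (double) maxSide / Math.max(w, h));
        int tw = Math.max(1, (int) Math.round(w * factor));
        int th = Math.max(1, (int) Math.round(h * factor));
        BufferedImage dst = new BufferedImage(tw, th, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, tw, th, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    /** Writes a JPEG thumbnail of in to out; false if no reader or writer was found. */
    static boolean makeThumbnail(File in, File out, int maxSide) throws IOException {
        BufferedImage src = ImageIO.read(in); // null if no plugin understands the format
        if (src == null) {
            return false;
        }
        return ImageIO.write(scale(src, maxSide), "jpeg", out);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Thumbnailer <input image> <output.jpg>");
            return;
        }
        boolean ok = makeThumbnail(new File(args[0]), new File(args[1]), 160);
        System.out.println(ok ? "thumbnail written" : "no ImageIO reader for " + args[0]);
    }
}
```

Bilinear interpolation in a single `drawImage` call is the simplest option; for large reductions, scaling in several halving steps usually gives smoother results.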

Only one trouble: javax.imageio knows nothing about TIFF. A search on imageio and TIFF leads to suggestions to use JAI.
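Whether a given Java runtime can read TIFF through javax.imageio depends on which plugins are registered, so it’s safer to ask at run time than to assume. This small sketch (class name hypothetical) uses `ImageIO.getReaderFormatNames` and `ImageIO.getImageReadersByFormatName`. For what it’s worth, JDK 9 and later bundle a TIFF plugin (package javax.imageio.plugins.tiff), so on those runtimes the check should report true; on the Java versions of the time it reports false unless an additional plugin is installed.

```java
import java.util.Arrays;
import javax.imageio.ImageIO;

final class ReaderCheck {
    /** True if some registered ImageIO reader claims the given format name. */
    static boolean hasReader(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // Format names reported by the plugins installed in this runtime.
        String[] names = ImageIO.getReaderFormatNames();
        Arrays.sort(names);
        System.out.println("Readable formats: " + Arrays.toString(names));
        System.out.println("TIFF reader available: " + hasReader("tiff"));
    }
}
```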

Really, what kind of language is that poor in dealing with common image formats?