TIFF/EP vs. Exif

I just discovered today that there are two different TIFF tags called “FocalPlaneResolutionUnit.” Tag 41488 goes by this name and is part of the Exif tag set. Accepted values for it are:

  • 1 = No absolute unit of measurement
  • 2 = Inch
  • 3 = Centimeter

Tag 37392 is a TIFF/EP (Electronic Photography) tag (working draft, final version not available online), also used in other raw formats, including DNG. Its accepted values are:

  • 1 = Inch
  • 2 = Metre
  • 3 = Centimetre
  • 4 = Millimetre
  • 5 = Micrometre

Recently I was sent a TIFF file, as a JHOVE issue, that had a tag 41488 with a value of 4. JHOVE correctly, but perhaps confusingly, reported that the FocalPlaneResolutionUnit tag had an invalid value.
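Here is a minimal Python sketch of a checker that looks up a FocalPlaneResolutionUnit value in the table for the tag it actually came from. The function name and unit strings are my own; only the tag numbers and value assignments come from the lists above. Under the TIFF/EP table, 4 means millimetres. Under the Exif table, it’s simply invalid, which is what JHOVE reported.

```python
# Value tables for the two tags that share the name FocalPlaneResolutionUnit.
EXIF_FPRU = {1: None, 2: "inch", 3: "centimeter"}  # tag 41488 (Exif); 1 = no absolute unit
TIFFEP_FPRU = {1: "inch", 2: "meter", 3: "centimeter",
               4: "millimeter", 5: "micrometer"}   # tag 37392 (TIFF/EP)

UNIT_TABLES = {41488: EXIF_FPRU, 37392: TIFFEP_FPRU}


def focal_plane_unit(tag, value):
    """Return the unit for a FocalPlaneResolutionUnit value (None = no absolute unit).

    Raises KeyError for a tag number other than 41488 or 37392, and ValueError
    if the value is not defined for that particular tag.
    """
    table = UNIT_TABLES[tag]
    if value not in table:
        raise ValueError(f"tag {tag}: invalid FocalPlaneResolutionUnit value {value}")
    return table[value]
```

So focal_plane_unit(37392, 4) gives "millimeter", while focal_plane_unit(41488, 4) raises ValueError.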

There are other tags in TIFF/EP that are equivalent, or nearly so, to Exif tags. Sometimes their values are specified identically, sometimes not. The Exif SubjectLocation tag is numbered 41492 and always has two shorts for its value, giving an X and Y value. The TIFF/EP counterpart is tag 37396, which can also have three shorts (specifying a circle) or four (specifying a rectangle).
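Software that handles both tags has to check the value count against the right tag. This sketch (hypothetical function name; the shape labels come from the description above) accepts exactly two values for the Exif tag and two to four for the TIFF/EP one. It deliberately doesn’t interpret the individual numbers beyond the count.

```python
SHAPES_BY_COUNT = {2: "point", 3: "circle", 4: "rectangle"}


def subject_location_shape(tag, values):
    """Classify a SubjectLocation value by how many SHORTs it contains."""
    if tag == 41492:      # Exif SubjectLocation: always exactly two SHORTs (X, Y)
        allowed = {2}
    elif tag == 37396:    # TIFF/EP counterpart: two, three, or four SHORTs
        allowed = {2, 3, 4}
    else:
        raise ValueError(f"not a SubjectLocation tag: {tag}")
    if len(values) not in allowed:
        raise ValueError(f"tag {tag}: {len(values)} values not allowed")
    return SHAPES_BY_COUNT[len(values)]
```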

I don’t know how this came about, but it’s something to watch out for in software that deals with both Exif and TIFF/EP tags. Some software may accept the EP extensions for Exif tags, but there’s no guarantee this will work.

When the Internet Archive gets a National Security Letter

This post is off topic for this blog, so ignore it if you like. A number of people connected with archives and preservation activities read this, though, and I think it’s important for people to know that the Internet Archive was subjected to a National Security Letter and successfully fought it, thus becoming one of the very few recipients of these Orwellian orders to be allowed to talk about it. Please read the article.

The FBI has issued tens of thousands of National Security Letters. If you’re the target of one, you can’t tell anyone, not even your own family. The Patriot Act originally prohibited people from even talking to a lawyer about them, but that ban was struck down. I have never been issued a National Security Letter, so I can tell you I haven’t. If I had been, I couldn’t say anything, and if you asked me, I’d have to say, “I can’t answer that.”

Google Reader is gone (yawn)

As of today, Google Reader is gone. (Correction: It goes away at the end of today. You still have time to export your feed list.) When its termination was announced, some writers declared it meant the end of RSS feeds. From what I’m seeing today, the attempts at panic have died away, replaced by a realization that RSS and Atom are well-understood feed formats and that lots of alternatives exist. Tristan Louis writes for Forbes:

While the death of the most popular RSS reader on the internet could have been seen as something that would represent a grave danger for RSS as a standard, for openness as a concept, and for heavy news consumption, the inverse has been true, as it only solidified RSS’ position in the world as the format for news delivery. Reader was a good product but one can hardly call it a great product and its demise will help rectify some imbalances it created in the market.

Hopefully dedicated users backed up their feed collections as an OPML file. If not, all they have to do is start collecting feed URLs again.
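OPML is plain XML, so getting the feed list back out of an export is easy. Here’s a minimal Python sketch, assuming the usual convention that each subscription is an <outline> element carrying its feed URL in an xmlUrl attribute. Folder outlines without that attribute are skipped. The function name is my own.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return the xmlUrl of every <outline> element, in document order.

    iter() walks the whole tree, so feeds nested inside folders are included.
    """
    root = ET.fromstring(opml_text)
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]
```

To read an export file directly, pass it the result of open(path, "rb").read(); ElementTree accepts bytes as well as strings.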

There’s no need to use a website at all to manage your feeds. On my iPod, I use Free RSS Reader, a simple, straightforward reader, though unfortunately it’s no longer being updated. On my main computer I use Sage, a Firefox extension.

A few columnists got a temporary boost in readership and a long-term loss in credibility by proclaiming the demise of RSS. The rest of us are still fine.

No, Andy, Amazon won’t last forever

Last week I attended a talk by Andy Ihnatko at the Nashua Public Library. He talked about a lot of interesting things and gave us a close-up of Google Glass in action, but there was one point I had to take issue with. He said it was unreasonable to complain about Amazon’s DRM, because you can play Amazon media on just about any device. During the question period I asked him: If you buy a DRM product from Amazon today, how long do you think they’ll support it? He answered that “Amazon will be around forever.”

This is an astonishing thing to say, especially for someone so intelligent. If he thinks Amazon will never go out of business and will support its DRM through all the coming centuries, probably a lot of other people think that. If you look at DEC, Data General, Wang, Commodore, and Control Data, though, it’s hard to believe in corporate immortality. Even when companies don’t disappear or become assimilated, they usually stop supporting old products after a while.

Maybe Andy’s definition of “forever” is 10 or 20 years. A lot of people don’t think any books or recordings are worth keeping even that long. Personally, I have quite a few books from the 19th century, and it would be a sadder and poorer world if those weren’t available any more.

DRM isn’t forever. In the future, if there are materials that aren’t distributed except in DRM form, they could disappear completely, making the world sadder and poorer.

Before leaving, I handed Andy a card promoting Files that Last. I hope he reads it and learns something from it. Oh, yes, and that he reviews it and boosts my sales. :)

JHOVE 1.10

JHOVE 1.10 is now available for downloading. It’s the same as 1.10b3 except for the version numbering. The Javadoc has been brought up to date.

I haven’t included the MD5 files, since SourceForge provides MD5s. If you still want them, let me know.
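If you want to check a download against SourceForge’s MD5, Python’s hashlib is enough. A minimal sketch; the function names are mine, and the published checksum is whatever the download page shows.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute a file's MD5 as a lowercase hex string, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_md5(path, published):
    """Compare against a published checksum, ignoring case and surrounding whitespace."""
    return md5_of_file(path) == published.strip().lower()
```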

JHOVE 1.10b3

JHOVE 1.10b3 is now available. This is the release candidate, and there won’t be any further changes beyond the version number designation unless a serious problem shows up.

Audio and video in HTML5

I’ve been studying up on streaming audio and video and related issues, so lately I’ve been playing with the <audio> and <video> tags in HTML5. It’s possible to put them to good use, but there are more issues than their proponents will readily admit.

A good piece of news is that both tags do exactly the same thing except for their appearance. You can play video with the audio tag and vice versa, and they implement the same DOM model. (Of course, you won’t see anything interesting if you use <audio> for video.)

The main limitation is that these tags support only progressive streaming, which differs from “true” streaming in some important ways. Progressive streaming means downloading a file and starting to play it almost immediately, rather than after it’s finished downloading. Its disadvantages are that the bit rate can’t be adjusted while playing, you can’t keep the file from being grabbed in its entirety with a simple HTTP call, and the download continues to completion even if the user pauses the player. These aren’t always significant problems, but they mean that the new HTML5 tags aren’t the full replacement for Flash which they’re sometimes claimed to be.

There’s enough interest in true streaming that various parties have developed protocols to do it over HTTP. These include HTTP Live Streaming from Apple, HTTP Dynamic Streaming from Adobe, Smooth Streaming from Microsoft, and Dynamic Adaptive Streaming over HTTP (DASH) from MPEG (which its proponents insist isn’t a protocol). There are more details on streaming on my website.
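What these protocols add over progressive download is the ability to switch bit rates mid-stream. In HTTP Live Streaming, for example, a master playlist lists several versions of the same content, each on an #EXT-X-STREAM-INF line with a BANDWIDTH attribute, followed by the URI of that version’s own playlist. The Python sketch below is a simplification that ignores every attribute except BANDWIDTH. The selection rule (highest bit rate that fits the measured throughput) is my assumption, not any particular player’s algorithm.

```python
import re

# BANDWIDTH preceded by ':' or ',' so that AVERAGE-BANDWIDTH is not matched.
_BANDWIDTH = re.compile(r"[:,]BANDWIDTH=(\d+)")


def parse_variants(master_playlist):
    """Return (bandwidth_bps, uri) pairs from a simplified HLS master playlist."""
    lines = [ln.strip() for ln in master_playlist.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:") and i + 1 < len(lines):
            m = _BANDWIDTH.search(line)
            if m:
                # The line after the tag is the URI of that variant's playlist.
                variants.append((int(m.group(1)), lines[i + 1]))
    return variants


def choose_variant(variants, throughput_bps):
    """Pick the highest-bandwidth variant that fits, else fall back to the lowest."""
    fitting = [v for v in variants if v[0] <= throughput_bps]
    return max(fitting)[1] if fitting else min(variants)[1]
```

A real client repeats the choice as its throughput estimate changes, which is exactly what the HTML5 tags’ progressive download can’t do.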

The other problem with the HTML5 tags is that there’s no one encoding that all major browsers support. This problem is well known on the video side, but I was surprised to discover it’s even true for audio. The current version of Firefox doesn’t natively support MP3 in the audio tag, and the QuickTime plugin isn’t used in this case (or at least I can’t get it to work). The reason for this is software patents. There’s a good discussion of the state of MP3 with Firefox on Stack Overflow.

You can specify several <source> elements within an audio or video element, and the browser will try each one in turn till it finds one it can play. Two formats or at most three will cover all major browsers. For audio, including both an MP3 and an Ogg Vorbis version should cover all the bases; for video, MP4/H.264 and Ogg Theora should do it, though you might want to add WebM.
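Generating the markup is mechanical. This Python sketch (hypothetical helper; the file names are placeholders) writes one <source> child per format in the order given, so put the preferred format first, since the browser plays the first one it can handle.

```python
from html import escape


def media_element(tag, sources, controls=True):
    """Build an <audio> or <video> element with one <source> per (src, mime_type) pair."""
    if tag not in ("audio", "video"):
        raise ValueError("tag must be 'audio' or 'video'")
    attrs = " controls" if controls else ""
    children = "".join(
        f'\n  <source src="{escape(src)}" type="{escape(mime)}">'
        for src, mime in sources
    )
    return f"<{tag}{attrs}>{children}\n</{tag}>"
```

For example, media_element("audio", [("song.mp3", "audio/mpeg"), ("song.ogg", "audio/ogg")]) gives an MP3 source with an Ogg Vorbis fallback.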

Specifying the type attribute as the MIME type of the file (e.g., <source src="song.mp3" type="audio/mpeg">) helps the page load faster, since the browser can tell without examining the file whether it can, in principle, play it. Make sure, however, to use only the canonical MIME types. From experimentation with various browsers, these include:

  • audio/mp4
  • audio/mpeg
  • audio/ogg
  • video/mp4
  • video/ogg
  • video/webm

If you specify application/mp3 rather than audio/mpeg for an MP3 source, the browser may decide it can’t play it even though it really can.
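A build script can catch non-canonical types before they reach a page. This sketch sticks to the six types listed above. The extension-to-type table is my own assumption, so adjust it to your files. Parameters such as codecs="vorbis" are stripped before checking.

```python
import os

# Extension-to-type mapping (an assumption), using only the canonical types above.
CANONICAL_TYPES = {
    ".m4a": "audio/mp4",
    ".mp3": "audio/mpeg",
    ".oga": "audio/ogg",
    ".ogg": "audio/ogg",
    ".mp4": "video/mp4",
    ".ogv": "video/ogg",
    ".webm": "video/webm",
}


def source_type(filename):
    """Return the canonical MIME type for a media file, based on its extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in CANONICAL_TYPES:
        raise ValueError(f"no canonical media type known for {filename!r}")
    return CANONICAL_TYPES[ext]


def is_canonical(declared):
    """True if a declared type (ignoring any ;parameters) is one of the canonical types."""
    base = declared.split(";")[0].strip().lower()
    return base in set(CANONICAL_TYPES.values())
```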

Another issue is the AV API for HTML5. There’s a pretty decent DOM API to go with the audio and video tags, allowing you to override the player controls and dynamically change content. Some implementations (e.g., Mozilla’s version) have added private extensions. Some people want more power, so there are third-party plugins and JavaScript libraries such as MediaElement.js that extend the API.

It’s a minefield, except that the mishaps come from the absence of an earth-shattering kaboom. Still, using the HTML5 tags is much simpler than Flash or HTTP streaming.

JHOVE and XHTML

I’m surprised I only got a complaint about this recently. Using JHOVE to validate XHTML files is often painfully slow. In fact, using anything to validate them without caching or redirection of DTDs would be painfully slow. The DOCTYPE declaration brings in the standard XHTML DTD, and it in turn brings in lots of other DTDs. These all have URLs on w3.org. As you can imagine, this is a lot of traffic converging in one place, and the response is often very slow.

JHOVE has a remedy, but it turns out not to work in this case. In the configuration file, you can declare local copies of schemas and DTDs to be loaded by the SAX entity resolver. An entry looks something like this:

 <module>
   <class>edu.harvard.hul.ois.jhove.module.XmlModule</class>
   <param>schema=http://www.w3.org/TR/REC-smil/SMIL10.dtd;/Users/gmcgath/schemas/SMIL10.dtd</param>
 </module>
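For anyone who wants to see the mechanism, here’s a rough Python analogue built on the standard library’s SAX EntityResolver. It is not JHOVE’s Java code. It just parses parameters in the same schema=URL;localpath form as the example above and hands the local path back to the parser when a matching URL is requested. parse_schema_params and LocalCopyResolver are hypothetical names.

```python
from xml.sax.handler import EntityResolver


def parse_schema_params(params):
    """Turn 'schema=URL;localpath' strings into a {URL: localpath} mapping.

    Strings that aren't schema= parameters, or that lack a local path, are ignored.
    """
    mapping = {}
    for p in params:
        if p.startswith("schema="):
            url, _, path = p[len("schema="):].partition(";")
            if path:
                mapping[url] = path
    return mapping


class LocalCopyResolver(EntityResolver):
    """Redirect known DTD/schema URLs to local copies; leave all others alone."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping

    def resolveEntity(self, publicId, systemId):
        # Returning a system ID tells the parser where to read the entity from.
        return self.mapping.get(systemId, systemId)
```

It would be installed with parser.setEntityResolver(LocalCopyResolver(mapping)) on a parser from xml.sax.make_parser(); whether external DTDs are read at all depends on the parser’s external-entity feature settings. Redirecting a DTD this way doesn’t by itself take care of relative references inside it.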

Unfortunately, there are some problems in JHOVE 1.9. The HTML module processes XHTML files by passing them to the XML module, but in this case the XML module doesn’t get the parameters that the config file declared for it. As a result, JHOVE’s processing of XHTML files currently makes no use of the configuration file’s instructions to the entity resolver. In JHOVE 1.10, I’ll fix this by having the HTML module pass its own parameters to the XML module.

There’s another complication. The XHTML DTD invokes other DTDs, and JHOVE has to get every one of those in turn. Some of them have relative URLs to other DTDs; these break when they’re redirected to local files. Even making local copies of all the files doesn’t work, as JHOVE doesn’t handle the relative URLs correctly within the file system, and making them work would require changing some existing assumptions. The best fix for the user is to get JHOVE 1.10 when it’s ready (version 1.10b2 doesn’t have the XHTML fix yet) and edit all those files so that all the URLs are absolute.
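The editing step can be scripted. This is a minimal Python sketch, assuming each DTD is rewritten using the URL it was originally fetched from as the base. It finds the system identifier in SYSTEM "..." and PUBLIC "..." "..." declarations and resolves it with urllib.parse.urljoin, which leaves already-absolute URLs unchanged. It handles only double-quoted literals, and absolutize_system_ids is a hypothetical name.

```python
import re
from urllib.parse import urljoin

# Group 1: the keyword (plus public ID, if any); group 2: the system literal.
_SYSTEM_LITERAL = re.compile(r'(SYSTEM\s+|PUBLIC\s+"[^"]*"\s+)"([^"]+)"')


def absolutize_system_ids(dtd_text, base_url):
    """Rewrite relative system identifiers in a DTD as absolute URLs."""
    def repl(m):
        return f'{m.group(1)}"{urljoin(base_url, m.group(2))}"'
    return _SYSTEM_LITERAL.sub(repl, dtd_text)
```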

This is a big chunk of work, and I haven’t tested the approach fully. Any ideas on how this might be better handled would be appreciated.

JHOVE 1.10b2

I’ve put up JHOVE 1.10b2. It has a bit of optimization for the PDF module, though files with huge structure trees are still painfully slow.

Streaming protocols

Last week I was doing some consulting work on Wowza Media Server for the Harvard Library, and I noticed there are some issues about streaming protocols which often aren’t well understood. To help clarify them in my own mind, and hopefully provide a useful resource for others, I’ve put a page on Basics of Streaming Protocols on my business website.

If you notice anything that’s wrong or confusing, please let me know.