
Making sense of MPEG audio formats

Lately I’ve been trying to clarify in my mind exactly how certain common MPEG-related audio formats are defined. I think I’ve got this right, but if anyone can offer corrections, they’d be appreciated.

MP3

This is the common name for MPEG-1 and MPEG-2 Audio Layer 3, which defines an encoding method. The difference between the MPEG-1 and MPEG-2 variants is mainly in the sampling rates: MPEG-1 covers 32, 44.1, and 48 kHz, and MPEG-2 adds 16, 22.05, and 24 kHz. It’s audio-only. An “MP3 file” is normally a raw MP3 stream without a container.
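To make the MPEG-1 versus MPEG-2 distinction concrete, here is a minimal Python sketch that reads the version, layer, and sampling rate from a 4-byte frame header. The function name and the table layout are my own choices for illustration. It assumes the input starts exactly at a frame header (no ID3 tag in front) and deliberately ignores the unofficial "MPEG-2.5" extension.

```python
# Sampling rates (Hz) selected by the 2-bit sampling-rate index,
# keyed by the 2-bit version field of an MPEG audio frame header.
# The unofficial "MPEG-2.5" extension (version bits 00) is left out.
SAMPLE_RATES = {
    0b11: ("MPEG-1", (44100, 48000, 32000)),
    0b10: ("MPEG-2", (22050, 24000, 16000)),
}

# Layer field values; 00 is reserved.
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}


def describe_frame_header(header: bytes):
    """Return (version, layer, sample_rate) for a frame header, or None.

    Hypothetical helper: it only inspects the first four bytes and does
    not validate the rest of the frame.
    """
    if len(header) < 4:
        return None
    word = int.from_bytes(header[:4], "big")
    if (word >> 21) & 0x7FF != 0x7FF:      # 11 sync bits must all be set
        return None
    version_bits = (word >> 19) & 0b11
    layer_bits = (word >> 17) & 0b11
    rate_index = (word >> 10) & 0b11
    if version_bits not in SAMPLE_RATES or layer_bits not in LAYERS:
        return None
    if rate_index == 0b11:                 # reserved sampling-rate index
        return None
    version, rates = SAMPLE_RATES[version_bits]
    return version, LAYERS[layer_bits], rates[rate_index]


print(describe_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
print(describe_frame_header(bytes([0xFF, 0xF3, 0x90, 0x00])))
```

The two sample headers differ only in the version bits, and that alone switches the sampling rate from the MPEG-1 table to the MPEG-2 one.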

MP4

MP4 means MPEG-4 Part 14. This is a container format which can hold audio, video, or both. It doesn’t specify the encoding method. In principle you could have an MP3 stream in an MP4 file. The preferred extension for MP4 containers is .mp4, but many others are used to denote specific encodings within MP4 containers.
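The container idea is easy to see in code. An MP4 file is a sequence of "boxes," each starting with a 4-byte big-endian size and a 4-byte type. The following Python sketch lists the top-level boxes. The function name is hypothetical, and the sample bytes are a hand-built toy, not a playable file. Nothing in this outer structure says how the audio or video inside is encoded.

```python
import struct


def list_top_level_boxes(data: bytes):
    """Return [(box_type, size), ...] for the top-level boxes in `data`.

    Illustrative only: it walks the outer box structure and never looks
    inside a box, so it says nothing about the codec in use.
    """
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = len(data) - offset
        if size < header_len:
            break  # malformed box; stop rather than loop forever
        boxes.append((box_type.decode("latin-1"), size))
        offset += size
    return boxes


# Toy input: a 16-byte 'ftyp' box followed by a 12-byte 'mdat' box.
sample = struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0)
sample += struct.pack(">I4s", 12, b"mdat") + b"\x00" * 4
print(list_top_level_boxes(sample))
```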

AAC

This is short for Advanced Audio Coding, defined in MPEG-2 Part 7. MPEG-4 Part 3 defines some extensions of it. That’s the encoding; several different containers may be called “AAC files” if they hold an AAC stream. A raw AAC stream file is possible but not common. MP4 is the most common container, so “MP4 audio” and “AAC” are often treated as if they were synonyms. HE-AAC, also known as aacPlus, is an MPEG-4 audio profile. HE-AAC decoders can decode AAC, but not vice versa.

Concerns with Apple’s iBooks Author

Apple’s iBooks textbooks for iPad stake out a position against openness in e-book publishing.

The format of the books is not a standard EPub format. The only tool that can create this format is Apple’s iBooks Author, and the only application that can view it is iBooks. An article on Ars Technica reports that it uses “ePub 2 along with certain HTML5 and JavaScript-based extensions that Apple uses to enable multimedia and interactive features. Those interactive features will only work with Apple’s iBooks app, not with other e-reader software or hardware, because only Apple supports those extensions.”

A post on Glazblog (the author says he’s “Co-chairman of the W3C CSS Working Group”; it would be nice if he gave his name) gives technical details. It uses XML namespaces that aren’t publicly documented, a nonstandard MIME type, and a private CSS extension.

This means you can’t view the books on anything but iOS. If Apple ever drops support for the format, it’s obsolete and impossible to support.

On top of this, the EULA for iBooks Author restricts sale of books created with it to the Apple Store. You can give away your books by any channel you like, but if you sell them, you must use the Apple Store. This means that if Apple doesn’t accept your book for publication, you can’t sell it in that format. (Except maybe in France, as Glazblog amusingly notes.) This is like having a compiler that lets you create software which you may sell only through Petitmol, or a video application that forbids you from selling your movies through anyone but FooTube. I can’t think of a precedent for this.

Authors normally would like to be able to take a book to a different publisher if their previous one loses interest. With books created with iBooks Author, you can’t do that, for both technical and legal reasons. The format isn’t under DRM, though, and the exclusivity applies to the format, not the content. As far as I can tell, you should be able to extract most of the content and republish it in a different format.

Apple’s restrictions make iBooks textbooks unsuitable for assignment to classes, unless the school is willing to give every student an iPad. Those who use other devices would be left out in the cold.

Apart from the restrictions, does Apple’s new format offer anything exciting? My own reaction, from briefly looking at a few sample books on a co-worker’s iPad, is that the interactive graphics are attention-getting, but the most important form of “interactivity” with a textbook is trying things out on your own — playing with the equations, writing sentences in the language, whatever. The best accessory for that is still a pencil and paper.

Correcting Harvard Library rumors

In spite of rumors that have shown up in the #hlth feed on Twitter, no one at the Harvard Library was laid off yesterday, let alone “everybody.” We were told, however, that there will be cutbacks.

We were told that we should all fill out “employee profiles” online to aid in determining what future career we’d have, if any, at Harvard. An official pronouncement quoted in Library Journal has denied that we will all have to “reapply” for our positions, but many of us find the distinction subtle even if it’s technically true.

Take a look at this post for a good summary.

Further update: Here’s a transcript of yesterday’s presentation at Harvard. There is one significant discrepancy between the transcript and what I and others recall: Helen Shenton did not say at the 9 AM meeting that the deadline for employee profiles was February 29. The deadline was initially earlier — mid-February, I think — and was changed to February 29 by the end of the meeting, following numerous expressions of concern from the audience. (She may have said February 29 at the later meetings.)

Undocumented “open” formats

Recently I learned that I can’t upgrade to a current version of Finale Allegro, a music entry program, except by getting the very expensive full version or taking a step downward to PrintMusic. Since I don’t want to lose all my files when some “upgrade” makes Allegro stop working, I’ve been looking for alternatives. MuseScore has its attractions; it’s open source, powerful, and generally well regarded. But I ran across this discussion on the MuseScore forum, which has me just a bit worried. According to “Thomas,” whose user ID is 1 and so probably speaks with authority, “As the MuseScore format is still being shaped on a daily basis, we haven’t put any effort yet to create a schema.”

This doesn’t encourage me to use MuseScore. Even though it’s an “open” application, its format isn’t open in any meaningful sense. You can download the code and reverse-engineer it, of course, but it’s going to change in the next version. While I’m sure the developers will try not to break files created with earlier versions, there’s no guarantee they’ll succeed, and they’re likely to be especially careless about compatibility with files that are more than a few versions old.

You can export files to MusicXML, which is standardized, but in trying this out I came upon a disturbing bug. If I edit the file and save the changes, they’re saved not to the .xml file but to a .mscz file, MuseScore’s native format. If there’s already an older file with that name, it gets overwritten without warning.

The dichotomy between “open” and “proprietary” formats is the wrong one. Many formats are trademarked by a business, with copyrighted documentation, but if the documentation is public and the format is not encumbered by patents, anyone can use it. Formats which are created by open-source code but are undocumented and subject to change are effectively closed formats.

This post grew, in part, from my thoughts on avoiding data loss due to format obsolescence, which is the topic of this week’s post on Files That Last.

The HTML5 “sarcasm” tag

In the November 5 Editor’s Draft of HTML5: A vocabulary and associated APIs for HTML and XHTML, there is a curious reference to the “sarcasm” tag.

8.2.5.4.7 The “in body” insertion mode

When the user agent is to apply the rules for the “in body” insertion mode, the user agent must handle the token as follows:

An end tag whose tag name is “sarcasm”

Take a deep breath, then act as described in the “any other end tag” entry below.

This is the only reference to the tag, so I guess only the closing </sarcasm> tag is allowed, not the opening <sarcasm> tag.

Perhaps this was a test to see if anyone’s actually reading?

The email jungle

In researching tomorrow’s post on email preservation on Files That Last, I came to appreciate more thoroughly how messy email formats are. RFC 4155, which defines “the ‘default’ mbox database format” (their quotes around “default”) and application/mbox MIME type, tells us that “The mbox database format is not documented in an authoritative specification, but instead exists as a well-known output format that is anecdotally documented, or which is only authoritatively documented for a specific platform or tool.”

Some versions may have eight-bit character data with the character encoding not explicitly specified, and possibly varying from one file creator to another. The format of email addresses isn’t specified. A short page on qmail.org, referenced from RFC 4155, discusses some of the variants, including mboxo, mboxrd, mboxcl, and mboxcl2. The differences may appear minor, but they’re sufficient that a parser that assumes one of the variants can fail when it encounters the others.
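A small Python sketch shows why the variants matter. As I understand the usual descriptions, mboxo puts a ">" in front of body lines that start with "From ", while mboxrd also quotes lines that are already quoted, which makes the quoting reversible. The function names here are my own, and the sketch works on a list of body lines rather than a real mailbox file. The Content-Length variants work differently and aren't covered.

```python
import re

# Matches a body line that mboxrd must quote: zero or more '>' then 'From '.
_RD_QUOTE = re.compile(r"^>*From ")
# Matches a line that was quoted by mboxrd: one or more '>' then 'From '.
_RD_UNQUOTE = re.compile(r"^>+From ")


def quote_mboxo(body_lines):
    """mboxo-style quoting: prefix '>' only to lines starting with 'From '."""
    return [">" + line if line.startswith("From ") else line
            for line in body_lines]


def quote_mboxrd(body_lines):
    """mboxrd-style quoting: also re-quote lines that already look quoted."""
    return [">" + line if _RD_QUOTE.match(line) else line
            for line in body_lines]


def unquote_mboxrd(body_lines):
    """Undo mboxrd quoting by stripping exactly one leading '>'."""
    return [line[1:] if _RD_UNQUOTE.match(line) else line
            for line in body_lines]


body = ["From here on it gets messy.", ">From a quoted reply", "plain text"]

# mboxo output makes the first two lines look alike, so a reader can no
# longer tell which '>' was in the original message.
print(quote_mboxo(body))

# mboxrd output round-trips back to the original body.
print(quote_mboxrd(body))
print(unquote_mboxrd(quote_mboxrd(body)) == body)
```

A reader that applies the mboxrd unquoting rule to an mboxo file will strip a ">" that the author actually typed, which is exactly the kind of small difference that corrupts archives without anyone noticing.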

Then there’s the encoding issue. Most of the world has settled on MIME by now, but older archives (and perhaps some recent ones) may contain messages encoded with uuencode, BinHex, or AppleSingle. The last two are found mostly with mail that was sent from Macintosh clients, but uuencode was once widely used, and poorly standardized.

An alternative email archiving format is the CERP XML schema. This looks at a glance as if it provides better structuring than MBOX, but it isn’t as widely supported.

Update: The FTL post is now available at “You HAD mail.”

Format registries

Two posts in one day!

That Library Journal article led to a number of interesting links (you’ll notice I’ve added Karen Coyle’s blog to my blogroll), and eventually I came upon this article by Chris Rusbridge on file format registries.

GDFR was originally supposed to be a distributed registry without a central site, but that idea collapsed under the weight of its complexity before GDFR itself ran out of steam. UDFR is trying to do something similar. I wonder if the best solution to the problem of format coverage would be a moderated wiki that didn’t require new, complicated software underpinnings.

HTML5 as a “programming language”

A JavaWorld article rhetorically asks, “Will HTML5 kill the mobile app?” Windows 8 will purportedly have a new type of application, written in HTML5 and JavaScript. I have to wonder whether the people who are proposing HTML5, CSS3, and JavaScript as a programming environment have the least idea of what programming is about.

The idea is so bizarre that it’s hard to know where to start a refutation. How would you refute a claim that silly putty is going to be the new way to build skyscrapers? HTML, in any version, just isn’t a programming language. JavaScript can be used for some programming tasks — in principle, it can implement any computation that you could write in another language — but doing anything but the simplest programming tasks in it is agonizing.

There are innocent people who’ve copied a script to produce a Web page effect, and there are less innocent people who find it convenient to delude them with the notion that that’s what programming is. The web page for HTML5 for Dummies declares: “HTML is the predominant programming language used to create Web pages.” If you can believe that, you’re part of the target audience of the title.