The MobileRead Wiki is a great place to jump into if you’re looking for information on ebook formats. It isn’t uniformly up to date; for instance, it says there is “a new version of [EPUB] called ePub 3 but it is not in wide use.” But it covers lots of formats and has some excellent analysis, especially an in-depth look at the Amazon MOBI format.
Archiveteam.org points at that wiki page as its reference on the format, so it seems as close to an “official description” as anyone has offered.
Amazon is the one company that uses MOBI and its bastard children, while everyone else is using EPUB, but obviously Amazon can’t be ignored. It distributes Kindle software so widely that you can read MOBI files on any device.
You can create your own files, at least in older versions of MOBI. Calibre, for instance, lets you create MOBI. What you can’t do is add DRM that will work. (I’ll skip the rant on DRM in ebooks this time.) It’s impressive that anyone has figured out the format. It has its origins in the old PalmOS, designed at a time when saving memory was the most important consideration.
The wiki gives us observations like “The MOBI header is of variable length and is not documented” and “The EXTH header is also undocumented, so some of this is guesswork.” The “guesswork” is an impressive piece of reverse engineering. The wiki page devotes most of its space to the headers. The content is HTML with some modifications (e.g., extra attributes for the IMG tag).
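To give a feel for what that reverse engineering describes, here is a minimal Python sketch that sniffs a MOBI file. It relies on the community-documented layout, not on anything Amazon has published, so treat the offsets as assumptions: a 78-byte Palm Database header with the type and creator codes "BOOKMOBI" at offset 60, a record count at offset 76, 8-byte record entries after that, and a record 0 that holds a 16-byte PalmDOC header followed by the MOBI header and, possibly, an EXTH header. The function name `sniff_mobi` and the returned dictionary keys are made up for this illustration.

```python
import struct

PDB_HEADER_SIZE = 78   # fixed-size Palm Database header (assumed layout)
PALMDOC_HEADER_SIZE = 16  # PalmDOC header at the start of record 0 (assumed)


def sniff_mobi(data: bytes) -> dict:
    """Pull a few fields out of a MOBI-style Palm Database.

    Raises ValueError if the bytes do not match the assumed layout.
    """
    if len(data) < PDB_HEADER_SIZE + 8:
        raise ValueError("too short to be a Palm Database with one record")

    # The database name is a NUL-padded 32-byte field at the very start.
    name = data[0:32].split(b"\x00", 1)[0].decode("latin-1")

    # Type and creator codes sit side by side; MOBI files use BOOK + MOBI.
    if data[60:68] != b"BOOKMOBI":
        raise ValueError("type/creator is not BOOKMOBI")

    (num_records,) = struct.unpack(">H", data[76:78])
    if num_records == 0:
        raise ValueError("no records")

    # First record-info entry: the leading 4 bytes are the record's file offset.
    (rec0_offset,) = struct.unpack(">I", data[78:82])

    # The MOBI header follows the PalmDOC header inside record 0.
    mobi_start = rec0_offset + PALMDOC_HEADER_SIZE
    if data[mobi_start:mobi_start + 4] != b"MOBI":
        raise ValueError("MOBI identifier not found in record 0")

    # Variable header length, counted from the start of the "MOBI" identifier.
    (header_len,) = struct.unpack(">I", data[mobi_start + 4:mobi_start + 8])

    # If an EXTH header exists, it is assumed to follow the MOBI header directly.
    exth_start = mobi_start + header_len
    has_exth = data[exth_start:exth_start + 4] == b"EXTH"

    return {
        "name": name,
        "num_records": num_records,
        "mobi_header_length": header_len,
        "has_exth": has_exth,
    }
```

Calling `sniff_mobi(open("book.mobi", "rb").read())` on a real file (the filename is a placeholder) should return the database name and the length of the MOBI header. The sketch reads the header length from the file instead of hard-coding it, because the header is of variable length.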
It gets more confusing. Amazon’s successor to MOBI is a maze of twisty little formats. There’s KF8, AZW, AZW3, AZW6, and more. It’s not fully clear whether these are all different formats or just different names. Jonny Greenwood has made an impressive attempt to sort them out. Amazon appears to use the AZW extension for more than one format. It’s impossible for anyone except Amazon (or someone with access to Amazon’s technology) to create a usable AZW file with DRM, since Kindle readers wouldn’t recognize it as an authenticated file. The Sustainability of Digital Formats site of the Library of Congress doesn’t even mention these formats.
The MOBI family is the White Whale of formats. You could spend years obsessing over it, and in the end you’ll probably regret what you find.