Monthly Archives: June 2016

Unicode 9.0

The Unicode Consortium has announced the release of Unicode 9.0. It adds character sets for some little-known languages, including Osage, Nepal Bhasa, Fulani, the Bravanese dialect of Swahili, the Warsh orthography for Arabic, and Tangut. It updates the collation specification and security recommendations.

Most Unicode implementations will require just font upgrades, but full support of some of the more unusual scripts will require attention to the migration notes.

“Asymmetric case mapping” sounds interesting. I believe this means that the conversion between upper case and lower case isn’t one-to-one and reversible. The notes give the example of “the asymmetric case mapping of Greek final sigma to capital sigma.” Lowercase sigma has two forms; it’s σ except at the end of a word, where it’s ς. Both turn into Σ in uppercase.
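A quick way to see the asymmetry is to round-trip the two sigmas in Python, whose `str.upper()` and `str.lower()` follow the Unicode case mappings. This is only a sketch of the idea; the helper name `round_trip` is my own and doesn't come from the Unicode notes.

```python
# Both lowercase sigmas map to the same capital, so the trip back loses information.
MEDIAL_SIGMA = "\u03c3"   # σ, used everywhere except at the end of a word
FINAL_SIGMA = "\u03c2"    # ς, used at the end of a word
CAPITAL_SIGMA = "\u03a3"  # Σ


def round_trip(text: str) -> str:
    """Uppercase a string, then lowercase the result."""
    return text.upper().lower()


if __name__ == "__main__":
    for ch in (MEDIAL_SIGMA, FINAL_SIGMA):
        print(ch, "->", ch.upper(), "->", round_trip(ch))
```

The final sigma doesn't survive the trip: a lone Σ carries no word context to tell the lowercasing step that it sits at the end of a word, so it comes back as σ.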

What really has people excited about Unicode 9, if a Startpage search is any indication, isn’t any of these things, but that about 1% of the new characters are emoji and that Apple and Microsoft lobbied against one candidate emoji. I wonder if the Unicode Consortium regrets having gotten involved in that mess in the first place. There are no possible criteria except whims for what the set should include. There’s no limit on how many could be added. OK, having a universal set of encodings promotes information interchange, but the tail is wagging the 🐕.

By the way, what’s the plural of “emoji”? I use “emoji” as both singular and plural, but I’m seeing “emojis” with increasing frequency. It just looks wrong to me. Does anyone say “kanjis” or “romajis” for the other Japanese character sets? I had to argue with the editor to keep the title of my article “The War on Emoji” that way.

Don’t hide those file extensions!

Lately I’ve ghostwritten several pieces on Internet security and how to protect yourself against malicious files. One point comes up over and over: Don’t hide file extensions! If you get a file called Evilware.pdf.exe, then Microsoft thinks you should see it as Evilware.pdf. The default setting on Windows conceals file extensions from you; you have to change a setting to view files by their actual names.

What’s this supposed to accomplish, besides making you think executable files are just documents? I keep seeing vague statements that this somehow “simplifies” things for users. If they see a file called “Document.pdf,” Microsoft’s marketing department thinks people will say, “What’s that .pdf at the end of the name? This is too bewildering and technical for me! I give up on this computer!”

They also seem to think that when people run a .exe file, not knowing it is one because the extension is hidden, and it turns out to be ransomware that encrypts all the files on the computer, that’s a reasonable price to pay for making file names look simpler. It’s always marketing departments that are to blame for this kind of stupidity; I’m sure the engineers know better.
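If you'd rather have a script flag names like Evilware.pdf.exe than trust your eyes, something along these lines would do as a first pass. It's a minimal sketch: the function name `looks_disguised` and both extension lists are my own assumptions, and the lists are nowhere near complete.

```python
from pathlib import PureWindowsPath

# Assumed, incomplete lists for illustration only.
EXECUTABLE_SUFFIXES = {".exe", ".scr", ".com", ".bat", ".cmd"}
DOCUMENT_SUFFIXES = {".pdf", ".doc", ".docx", ".txt", ".jpg"}


def looks_disguised(filename: str) -> bool:
    """Return True if an executable extension follows a document-style one."""
    suffixes = [s.lower() for s in PureWindowsPath(filename).suffixes]
    if len(suffixes) < 2:
        return False
    # The real extension is the last one; the one before it is the decoy.
    return suffixes[-1] in EXECUTABLE_SUFFIXES and suffixes[-2] in DOCUMENT_SUFFIXES


if __name__ == "__main__":
    for name in ("Evilware.pdf.exe", "Document.pdf", "setup.exe"):
        print(name, looks_disguised(name))
```

This only looks at the name, of course. It says nothing about what's actually inside the file.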

APFS, Apple’s replacement for HFS+

Apple is introducing a new file system to replace the twentieth-century HFS+. The new one is called APFS, which simply stands for “Apple File System.” When Apple released HFS+, disk sizes were measured in megabytes, not terabytes.

New features include 64-bit inode numbers, nanosecond timestamp granularity, and native support for encryption. Ars Technica offers a discussion of the system, which is still in an experimental state.
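Two of those features are visible from ordinary code. Here's a small Python sketch, assuming nothing APFS-specific: it just asks the operating system for a file's inode number and its modification time in nanoseconds. The helper name `describe_file` is mine.

```python
import os
import tempfile


def describe_file(path: str) -> dict:
    """Return the inode number and modification time in nanoseconds, as the OS reports them."""
    info = os.stat(path)
    return {"inode": info.st_ino, "mtime_ns": info.st_mtime_ns}


if __name__ == "__main__":
    # A throwaway file, so the example doesn't depend on any particular path.
    with tempfile.NamedTemporaryFile() as tmp:
        print(describe_file(tmp.name))
```

What you see depends on the file system underneath. On one with coarser timestamps, the low-order digits of the nanosecond value will simply be zeros.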

The state of PDF 2.0

The next big jump in PDF may finally happen this year. The PDF Association tells us that the spec for PDF 2.0 is “feature-complete” and will be available to the ISO PDF committee and members of the PDF Association in July. When this will turn into a public release still isn’t clear. A year ago the target was “mid-2016”; that seems unlikely now.

The specification will be ISO 32000-2. The current version of PDF, 1.7, is ISO 32000-1. More precisely, Adobe has published several extension levels to PDF 1.7. They’re a way of getting around having a version 1.8, which would be an admission that the ISO standard is outdated. Version 2.0 will get Adobe and ISO back in sync. Hopefully Adobe will publish the PDF spec for free, as it has in the past, so that it won’t be available just to people who pay for the ISO version. Currently an electronic copy of ISO 32000-1 costs 198 Swiss francs, or a bit more than $200.
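For the curious, the version a PDF file claims for itself is written in a header comment near the start of the file, such as `%PDF-1.7`. The sketch below pulls it out. The function name `header_version` is my own, and it deliberately ignores the document catalog, where a later version or Adobe's extension levels can be recorded, so treat its answer as a first guess rather than the last word.

```python
import re
from typing import Optional

# Matches the header comment, e.g. b"%PDF-1.7".
HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def header_version(data: bytes) -> Optional[str]:
    """Return the version named in the PDF header comment, or None if absent."""
    # Only the start of the file is searched; 1024 bytes is an arbitrary cutoff.
    match = HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    print(header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))
```

To check a real file, read its first kilobyte in binary mode and pass the bytes in.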

Recreating Clarke’s “The Sentinel” in real life

Lunar Mission One, a private nonprofit organization, is trying to recreate Arthur C. Clarke’s “The Sentinel” (the inspiration for the movie 2001) in real life. They hope to send a digital archive to the moon in 2024 and bury it there. As long as whatever is stored there can withstand intense cold, it should last a very long time.

The plan calls for two archives. One would contain items privately provided by people paying to have their data stored on the moon; the other would be a history of humanity. CEO David Iron (no relation to Tony Stark) raises the question of how living beings of the future will find it and says, “We need a permanent sign that will last for a billion years. … We need to invert the normal logic of searching for extra-terrestrial intelligence by transmitting; they can come to us.”


File format analysis tools for archivists

My article on “File Format Analysis Tools for Archivists” is up on