The Unicode Consortium has announced the release of Unicode 9.0. It adds characters for some little-known languages, including Osage, Nepal Bhasa, Fulani, the Bravanese dialect of Swahili, the Warsh orthography for Arabic, and Tangut. It also updates the collation specification and security recommendations.
Most Unicode implementations will require just font upgrades, but full support of some of the more unusual scripts will require attention to the migration notes.
“Asymmetric case mapping” sounds interesting. I believe this means that the conversion between upper case and lower case isn’t one-to-one and reversible. The notes give the example of “the asymmetric case mapping of Greek final sigma to capital sigma.” Lowercase sigma has two forms; it’s σ except at the end of a word, where it’s ς. Both turn into Σ in uppercase.
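Here is a quick way to see that asymmetry for yourself. It's a minimal sketch, assuming a Python 3 interpreter whose built-in `str.upper()` and `str.lower()` follow the Unicode case mappings; the helper name `round_trips` is just something I made up for the illustration.

```python
# Minimal sketch: the two lowercase sigmas share one uppercase form,
# so upper-casing and then lower-casing cannot always restore the original.
MEDIAL = "\u03c3"   # σ GREEK SMALL LETTER SIGMA
FINAL = "\u03c2"    # ς GREEK SMALL LETTER FINAL SIGMA
CAPITAL = "\u03a3"  # Σ GREEK CAPITAL LETTER SIGMA


def round_trips(ch: str) -> bool:
    """Return True if upper() followed by lower() gives back ch (hypothetical helper)."""
    return ch.upper().lower() == ch


print(MEDIAL.upper() == CAPITAL)  # True: σ uppercases to Σ
print(FINAL.upper() == CAPITAL)   # True: ς also uppercases to Σ
print(round_trips(MEDIAL))        # True: Σ on its own lowercases to σ
print(round_trips(FINAL))         # False: the final form is lost on the way back
```

Once ς has become Σ, nothing in the character itself says which lowercase form it came from, so going back to lowercase has to rely on the letter's position in the word rather than on a simple one-to-one table.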
What really has people excited about Unicode 9, if a Startpage search is any indication, isn’t any of these things, but that about 1% of the new characters are emoji and that Apple and Microsoft lobbied against one candidate emoji. I wonder if the Unicode Consortium regrets having gotten involved in that mess in the first place. There are no possible criteria except whims for what the set should include. There’s no limit on how many could be added. OK, having a universal set of encodings promotes information interchange, but the tail is wagging the 🐕.
By the way, what’s the plural of “emoji”? I use “emoji” as both singular and plural, but I’m seeing “emojis” with increasing frequency. It just looks wrong to me. Does anyone say “kanjis” or “romajis” for the other Japanese character sets? I had to argue with the editor to keep the title of my article “The War on Emoji” that way.