The Unicode Consortium has announced the release of Unicode 9.0. It adds character sets for some little-known languages, including Osage, Nepal Bhasa, Fulani, the Bravanese dialect of Swahili, the Warsh orthography for Arabic, and Tangut. It updates the collation specification and security recommendations.
Most Unicode implementations will require just font upgrades, but full support of some of the more unusual scripts will require attention to the migration notes.
“Asymmetric case mapping” sounds interesting. I believe this means that the conversion between upper case and lower case isn’t one-to-one and reversible. The notes give the example of “the asymmetric case mapping of Greek final sigma to capital sigma.” Lowercase sigma has two forms; it’s σ except at the end of a word, where it’s ς. Both turn into Σ in uppercase.
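For the curious, here is a minimal sketch of that asymmetry using nothing but Python's built-in `str.upper()` and `str.lower()`. The constant names, the `round_trips` helper, and the sample word (written without its accent to keep things simple) are my own choices for illustration; nothing here comes from the Unicode release notes.

```python
# Greek sigma: two lowercase forms, one uppercase form.
MEDIAL_SIGMA = "\u03c3"   # σ, used everywhere except at the end of a word
FINAL_SIGMA = "\u03c2"    # ς, used at the end of a word
CAPITAL_SIGMA = "\u03a3"  # Σ


def round_trips(text: str) -> bool:
    """Return True if upper-casing and then lower-casing restores the text."""
    return text.upper().lower() == text


# Both lowercase forms map to the same capital letter.
print(MEDIAL_SIGMA.upper() == CAPITAL_SIGMA)
print(FINAL_SIGMA.upper() == CAPITAL_SIGMA)

# Going back, an isolated capital sigma becomes the medial form,
# so a lone final sigma does not survive the round trip.
print(round_trips(FINAL_SIGMA))

# Inside a word, Python's lower() looks at context and picks the
# final form for a sigma that ends the word.
print("ΚΟΣΜΟΣ".lower())
```

The last line is the interesting one: lowercasing has to look at the neighboring letters to pick the right sigma, which is why the mapping can't be a simple one-to-one table.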
What really has people excited about Unicode 9, if a Startpage search is any indication, isn’t any of these things, but that about 1% of the new characters are emoji and that Apple and Microsoft lobbied against one candidate emoji. I wonder if the Unicode Consortium regrets having gotten involved in that mess in the first place. There are no possible criteria except whims for what the set should include. There’s no limit on how many could be added. OK, having a universal set of encodings promotes information interchange, but the tail is wagging the 🐕.
By the way, what’s the plural of “emoji”? I use “emoji” as both singular and plural, but I’m seeing “emojis” with increasing frequency. It just looks wrong to me. Does anyone say “kanjis” or “romajis” for the other Japanese character sets? I had to argue with the editor to keep the title of my article “The War on Emoji” that way.
Emoji interoperability (or its lack)
Unicode characters ought to have a specific denotation, even if their exact appearance depends on the font. A letter, a punctuation mark, or a Chinese ideograph should have the same meaning to everyone who reads it. There are problems, of course. There’s no systematic difference in appearance between A, the first letter of the Roman alphabet, and Α, Alpha, the first letter of the Greek alphabet. (However, when I had my computer read this article aloud to me for proofreading, it pronounced the latter as “Greek capital letter alpha”! Nice! It also pronounced the names of the emoji in this article, except the new ones in Unicode 11.0.) In some fonts, you can’t even tell the lower case letter l from the number 1 without looking carefully. This problem allows homograph attacks and “typosquatting.”
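Software can tell the look-alikes apart even when the eye can't. Below is a rough sketch using Python's standard `unicodedata` module. The helper names `describe`, `scripts_used`, and `looks_mixed` are hypothetical, and treating the first word of a character's Unicode name as its script is a crude assumption of mine, not the real Unicode Script property. A serious check would follow the Unicode security recommendations instead.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


def scripts_used(text: str) -> set[str]:
    """Guess the scripts in text from the first word of each letter's name.

    This is a heuristic for illustration only, not the Script property.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


def looks_mixed(text: str) -> bool:
    """Flag text whose letters appear to come from more than one script."""
    return len(scripts_used(text)) > 1


print(describe("A"))          # Latin capital A
print(describe("\u0391"))     # Greek capital Alpha, which looks the same
print(looks_mixed("APPLE"))         # all Latin letters
print(looks_mixed("\u0391PPLE"))    # Greek Alpha followed by Latin letters
```

A check like this would flag plenty of legitimate mixed-script text too, so it is only a starting point for spotting a suspicious name.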
But the worst problem is with the Unicode Consortium’s great headache, emoji. These picture characters have just brief verbal descriptions in the Unicode standard, and font designers for different companies produce renderings that have vastly different connotations. Motherboard offers a sampling of the varied renderings. Here’s the “grimacing face” from Apple, Google, Samsung, and LG respectively.
Posted in commentary
Tagged emoji, standards, Unicode