Unicode characters ought to have a specific denotation, even if their exact appearance depends on the font. A letter, a punctuation mark, or a Chinese ideograph should have the same meaning to everyone who reads it. There are problems, of course. There’s no systematic difference in appearance between A, the first letter of the Roman alphabet, and Α, Alpha, the first letter of the Greek alphabet. (However, when I had my computer read this article aloud to me for proofreading, it pronounced the latter as “Greek capital letter alpha”! Nice! It also pronounced the names of the emoji in this article, except the new ones in Unicode 11.0.) In some fonts, you can’t even tell the lower case letter l from the number 1 without looking carefully. This problem allows homograph attacks and “typosquatting.”
But the worst problem is with the Unicode Consortium’s great headache, emoji. These picture characters have just brief verbal descriptions in the Unicode standard, and font designers for different companies produce renderings that have vastly different connotations. Motherboard offers a sampling of the varied renderings. Here’s the “grimacing face” from Apple, Google, Samsung, and LG respectively.
Today I learned from a science fiction discussion group that SMS messages don’t use UTF-8. In fact, they don’t even use ASCII or an extension of it. It’s a case of old technology which has survived beyond its time.
The usual encoding for SMS text messages is GSM-7. Most cell phones use it, regardless of whether they’re on the GSM network or not. They generally support Unicode as well, but in a strange way.
July 17 was World Emoji Day. Anyone can declare a World Anything Day, but my local library thought it was important enough to give it part of a sign, along with Cell Phone Courtesy Month.
They didn’t think it was important enough to give accurate information, though. It does tell us something about how non-tech people think of emoji. Here’s the content of the sign, with commentary.
In Orwell’s 1984, the Newspeak language followed the principle that if you can abolish certain words, you can abolish the thoughts that go with them.
It was intended that when Newspeak had been adopted once and for all and Oldspeak forgotten, a heretical thought — that is, a thought diverging from the principles of Ingsoc — should be literally unthinkable, at least so far as thought is dependent on words. … This was done partly by the invention of new words, but chiefly by eliminating undesirable words and by stripping such words as remained of unorthodox meanings, and so far as possible of all secondary meanings whatever.
Apple is doing something like this with Unicode codepoint U+1F52B (🔫), which the code chart defines as PISTOL, with the explanatory text of “handgun, revolver.” There’s nothing that suggests it’s supposed to represent a water gun or any other kind of toy. However, Apple has elected to represent this character as a water pistol in iOS 10.
This post may be illegal in Indonesia. It includes the code point sequence U+1F468 U+200D U+2764️ U+FE0F U+200D U+1F48B U+200D U+1F468, which renders as the emoji 👨❤️💋👨 or “man kissing man.” According to a Time article, the Indonesian Ministry of Communication and Informatics is “asking” Facebook to block the use of “gay” emoji. Failure to comply could mean the Negative Content Management Panel (George Orwell would have been impressed!) will block Facebook in Indonesia.
Emoji have generated several controversies already, but this is the first I’ve heard of a government censoring code points. It’s couched in terms of “sensitivity,” “respect,” and protecting children.
Encoding all the characters of all the world’s languages is an endless task. Unicode 8.0 improves the treatment of Cherokee, Tai Lue, Devangari, and more. For a lot of people, the most interesting part will be the implementation of “diverse” emoji in a variety of colors. A Unicode Consortium report explains:
People all over the world want to have emoji that reflect more human diversity, especially for skin tone. The Unicode emoji characters for people and body parts are meant to be generic, yet following the precedents set by the original Japanese carrier images, they are often shown with a light skin tone instead of a more generic (nonhuman) appearance, such as a yellow/orange color or a silhouette.
Five symbol modifier characters that provide for a range of skin tones for human emoji are planned for Unicode Version 8.0 (scheduled for mid-2015). These characters are based on the six tones of the Fitzpatrick scale, a recognized standard for dermatology (there are many examples of this scale online, such as FitzpatrickSkinType.pdf). The exact shades may vary between implementations.
… When a human emoji is not immediately followed by a emoji modifier character, it should use a generic, non-realistic skin tone.