Unicode security mechanisms

Unicode is a great thing, but sometimes its thoroughness poses problems. Different scripts often include characters that look exactly like common ASCII characters in most fonts, and these can be used to spoof domain names. This is sometimes called a homograph attack or script spoofing. For instance, someone might register the domain gοοgle.com, which looks a lot like “google.com,” but actually uses the Greek letter omicron instead of the Latin letter o. (Search this page in your browser for “google” if you don’t believe me.) Such tricks could lure unwary users to a phishing site. A real-life example, which didn’t even require more than ASCII, was a site called paypaI.com: that’s a capital I instead of a lower-case L, and they look the same in some fonts. That was way back in 2000.
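To make the omicron trick concrete, here is a minimal Python sketch that flags domain labels mixing letters from more than one script. The standard library does not expose the Unicode Script property, so the hypothetical helper rough_script() approximates it from the first word of each character's Unicode name. That is a simplifying assumption for illustration, not the method the Unicode reports describe, and the function names are my own.

```python
import unicodedata

def rough_script(ch):
    """Approximate a letter's script from the first word of its Unicode name.

    Assumption: names like "GREEK SMALL LETTER OMICRON" start with the script.
    Non-letters (digits, hyphens, dots) are treated as script-neutral.
    """
    if not ch.isalpha():
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def label_scripts(label):
    """Return the set of (approximate) scripts used by the letters in a label."""
    return {rough_script(ch) for ch in label} - {"COMMON"}

def looks_mixed(domain):
    """True if any dot-separated label mixes letters from more than one script."""
    return any(len(label_scripts(label)) > 1 for label in domain.split("."))

print(looks_mixed("google.com"))            # all Latin letters
print(looks_mixed("g\u03bf\u03bfgle.com"))  # two Greek omicrons among Latin letters
print(looks_mixed("paypaI.com"))            # pure ASCII, so this check cannot help
```

A check like this would flag the omicron domain but not the paypaI one, which uses nothing outside ASCII. It would also flag some legitimate mixtures, such as Japanese text combining kana and kanji, so a real implementation needs a more careful notion of which scripts may appear together.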

The report “Unicode Security Considerations” discusses the problem, and a new draft technical standard, “Unicode Security Mechanisms,” suggests ways to address it. It’s a complicated matter, since so many different forms of spoofing are available, and so many combinations of characters might be legitimate in some cases. The report mentions issues such as using CSS to invoke confusing fonts, throwing in right-to-left scripts like Hebrew, and spoofing syntax with characters that look like the ASCII slash or question mark. Browsers, email programs, and users all have a role to play.
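Two of those issues, right-to-left characters and syntax look-alikes, can be roughly illustrated with Python's standard unicodedata module. This is only a sketch under stated assumptions: the function name suspicious_chars() and the choice of which ASCII characters count as URL syntax are mine, and NFKC normalization catches only look-alikes that have a compatibility mapping to ASCII (such as the fullwidth forms), not every confusable character.

```python
import unicodedata

# Bidirectional classes that indicate right-to-left letters or RTL controls.
RTL_CLASSES = {"R", "AL", "RLE", "RLO", "RLI"}
# ASCII characters with structural meaning in a URL (an illustrative choice).
URL_SYNTAX = set("/?#.:@")

def suspicious_chars(text):
    """List non-ASCII characters that are right-to-left or that normalize
    (under NFKC) to an ASCII character with meaning in URL syntax."""
    findings = []
    for ch in text:
        if ch.isascii():
            continue
        if unicodedata.bidirectional(ch) in RTL_CLASSES:
            findings.append((ch, "right-to-left"))
        elif unicodedata.normalize("NFKC", ch) in URL_SYNTAX:
            findings.append((ch, "syntax look-alike"))
    return findings

# U+FF0F FULLWIDTH SOLIDUS normalizes to the ASCII slash.
print(suspicious_chars("example.com\uff0flogin"))
# U+05D0 HEBREW LETTER ALEF has bidirectional class "R".
print(suspicious_chars("abc\u05d0"))
```

Finding such a character does not prove an attack, since Hebrew or Arabic domain names are perfectly legitimate. It only marks a spot where a browser or mail program might want to look more closely.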

The possibilities for spoofing lead to a contest between efforts to block spoofing and efforts to bypass security checks. Be careful, and beware of links in suspicious email even if the domain names look right.
