Tag Archives: HTML

Canvas fingerprinting, the technical stuff

The ability of websites to bypass privacy settings with “canvas fingerprinting” has caused quite a bit of concern, and it’s become a hot topic on the Code4lib mailing list. Let’s take a quick look at it from a technical standpoint. It is genuinely disturbing, but it’s not the unstoppable form of scrutiny some people are hyping it as.

The best article to learn about it from is “Pixel Perfect: Fingerprinting Canvas in HTML5,” by Keaton Mowery and Hovav Shacham at UCSD. It describes the basic technique and some implementation details.

Canvas fingerprinting is based on the <canvas> HTML element. It’s been around for a decade but was standardized for HTML5. In itself, <canvas> does nothing but define a blank drawing area with a specified width and height. It isn’t even like the <div> element, which you can put interesting stuff inside; if all you use is unscripted HTML, all you get is some blank space. To draw anything on it, you have to use JavaScript. There are two APIs available for this: the 2D DOM Canvas API and the 3D WebGL API. The DOM API is part of the HTML5 specification; WebGL relies on hardware acceleration and is less widely supported.

Either API lets you draw objects, not just pixels, to a browser. These include geometric shapes, color gradients, and text. The details of drawing are left to the client, so they will be drawn slightly differently depending on the browser, operating system, and hardware. This wouldn’t be too exciting, except that the API can read the pixels back. The getImageData method of the 2D context returns an ImageData object, which is a pixel map, and the canvas’s toDataURL method serializes the image (as a PNG by default). Either result can be sent back to the server from which the page originated. For a given set of drawing commands and a given hardware and software configuration, the pixels are consistent.

Drawing text is one way to produce a canvas fingerprint. Modern browsers use a programmatic description of a font rather than a bitmap, so that characters will scale nicely. The fine details of how edges are smoothed and pixels interpolated will vary, perhaps not enough for any user to notice, but enough so that reading back the pixels will show a difference.
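The draw-then-read-back idea can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not the method from the Mowery and Shacham paper: the function names fnv1a and canvasFingerprint, the drawing commands, and the choice of a 32-bit FNV-1a hash are all made up for the example. A real tracker would more likely send the full image or a stronger digest.

```javascript
// 32-bit FNV-1a hash over an array of bytes, returned as 8 hex digits.
// Assumption: a stand-in for whatever digest a real script would use.
function fnv1a(bytes) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash.toString(16).padStart(8, "0");
}

// Draw a fixed scene on the given canvas, read the pixels back, and hash them.
// The scene (one rectangle, one line of text) is arbitrary.
function canvasFingerprint(canvas) {
  const ctx = canvas.getContext("2d");
  ctx.textBaseline = "top";
  ctx.font = "16px Arial";
  ctx.fillStyle = "#f60";
  ctx.fillRect(0, 0, 100, 30);
  ctx.fillStyle = "#069";
  ctx.fillText("How quickly daft jumping zebras vex", 2, 2);
  const image = ctx.getImageData(0, 0, canvas.width, canvas.height);
  return fnv1a(image.data); // image.data holds the RGBA bytes
}

// Browser-only usage; skipped when no DOM is present.
if (typeof document !== "undefined") {
  const canvas = document.createElement("canvas");
  canvas.width = 300;
  canvas.height = 40;
  console.log(canvasFingerprint(canvas));
}
```

The same drawing commands yield the same hash on one configuration, and any difference in rendered pixels yields a different one. The function takes the canvas as a parameter so the hashing logic can be exercised outside a browser with a stand-in object.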

However, the technique isn’t as frightening as the worst hype suggests. First, it doesn’t uniquely identify a computer. Two machines that have the same model and come from the same shipment, if their preinstalled software hasn’t been modified, should have the same fingerprint. It has to be used together with other identifying markers to narrow down to one machine. There are several ways for software to stop it, including blocking JavaScript from offending domains and disabling part or all of the Canvas API. What gets people upset is that neither blocking cookies nor using a proxy will stop it.

Was including getImageData in the spec a mistake? This can be argued both ways. Its obvious use is to draw a complex canvas once and then rubber-stamp it if you want it to appear multiple times; this can be faster than repeatedly drawing from scratch. It’s unlikely, though, that the designers of the spec thought about its privacy implications.
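The rubber-stamp use might look something like the following sketch. The helper name rubberStamp and its parameters are hypothetical; getImageData and putImageData are the standard 2D context methods.

```javascript
// Read a rectangular region of a 2D context once, then paste it at each
// target position. `targets` is an array of [x, y] pairs.
// Returns the number of copies pasted.
function rubberStamp(ctx, sx, sy, width, height, targets) {
  const stamp = ctx.getImageData(sx, sy, width, height); // read pixels once
  for (const [dx, dy] of targets) {
    ctx.putImageData(stamp, dx, dy); // paste without redrawing from scratch
  }
  return targets.length;
}
```

In a page, after drawing a complex figure in the top-left 50×50 area, a call such as rubberStamp(ctx, 0, 0, 50, 50, [[60, 0], [120, 0]]) would repeat it twice along the top row.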

HTML5 schedule

The HTML Working Group Chairs and the Protocols and Formats WG Chair have proposed a plan for making HTML5 a Recommendation by the end of 2014. Features would be postponed to subsequent releases as necessary.

Accomplishing this, of course, requires that the proposal be accepted by the end of 2014.

The two faces of HTML5

The question “What is HTML5?” has gotten more complicated. While W3C continues work on a full specification of HTML5, the Web Hypertext Application Technology Working Group (WHATWG) is pursuing a “living standard” approach that is frequently updated. Both groups are reassuring us that this doesn’t constitute a rift, but certainly it will make things tricky when resolving the fine points of the standard(s). Ian Hickson has gone into some detail on the W3C site about the relationship between the WHATWG HTML living standard and the W3C HTML5 specification.

Significantly, the WHATWG “HTML Living Standard” site has no version number.

Considering that HTML5 is already widely implemented even though it won’t be finalized till the year after next, it’s unlikely this will add any further confusion. By the time it becomes a W3C Recommendation, many implementers will doubtless have moved beyond it to new features.

State of HTML5 video

Long Tail Video has an interesting page on the state of HTML5 video. Their view is filtered through their own product, but it’s still a nice job of covering current trends.

HTML5 Encrypted Media Extensions

The Encrypted Media Extensions draft from W3C is drawing controversy. DRM on the Web is traditionally implemented in the service provider, where the content delivery service has full control. But what’s streamed can be captured, and there is software readily available to do it, even if it may violate the DMCA.

An article on Ars Technica reports that Ian Hickson of Google criticized the proposal as both unethical and technically inadequate. Mark Watson, one of the authors of the draft, suggested that strong copy protection can be obtained by building it into hardware, which would mean that only some computers could receive the protected content. Hickson’s email is posted here; unfortunately, it doesn’t expand on what he thinks the problems are.

The draft is intended to accommodate “a wide range of media containers and codecs”; the question is which one or ones will be widely used in practice, and how they’ll be made available, particularly in connection with open-source browsers.

This is a potential area for browser fragmentation.

The HTML5 “sarcasm” tag

In the November 5 Editor’s Draft of HTML5: A vocabulary and associated APIs for HTML and XHTML, there is a curious reference to the “sarcasm” tag.

8.2.5.4.7 The “in body” insertion mode

When the user agent is to apply the rules for the “in body” insertion mode, the user agent must handle the token as follows:

An end tag whose tag name is “sarcasm”

Take a deep breath, then act as described in the “any other end tag” entry below.

This is the only reference to the tag, so I guess only the closing </sarcasm> tag is allowed, not the opening <sarcasm> tag.

Perhaps this was a test to see if anyone’s actually reading?

Adobe getting out of Flash for mobile

Steve Jobs gets a posthumous victory as Adobe will not be developing Flash for mobile devices past version 11. Adobe states that:

HTML5 is now universally supported on major mobile devices, in some cases exclusively. This makes HTML5 the best solution for creating and deploying content in the browser across mobile platforms. We are excited about this, and will continue our work with key players in the HTML community, including Google, Apple, Microsoft and RIM, to drive HTML5 innovation they can use to advance their mobile browsers.

Our future work with Flash on mobile devices will be focused on enabling Flash developers to package native apps with Adobe AIR for all the major app stores. We will no longer continue to develop Flash Player in the browser to work with new mobile device configurations (chipset, browser, OS version, etc.) following the upcoming release of Flash Player 11.1 for Android and BlackBerry PlayBook. We will of course continue to provide critical bug fixes and security updates for existing device configurations.

W3Conf

W3C has announced an upcoming conference on “HTML5 and the Open Web Platform”. The total information currently available is:

W3C, the web standards organization, is holding its first conference.
 
If you are a developer or designer wanting to hear the latest news on HTML5 and the open web platform, and your place in it, save the date. This event will be held in Seattle and live streaming to the world on November 15-16.
 
More details soon…

This is very short notice for a conference, but the topic is interesting.

HTML5 as a “programming language”

A JavaWorld article rhetorically asks, “Will HTML5 kill the mobile app?” Windows 8 will purportedly have a new type of application, written in HTML5 and JavaScript. I have to wonder whether the people who are proposing HTML5, CSS3, and JavaScript as a programming environment have the least idea of what programming is about.

The idea is so bizarre that it’s hard to know where to start a refutation. How would you refute a claim that silly putty is going to be the new way to build skyscrapers? HTML, in any version, just isn’t a programming language. JavaScript can be used for some programming tasks — in principle, it can implement any computation that you could write in another language — but doing anything but the simplest programming tasks in it is agonizing.

There are innocent people who’ve copied a script to produce a Web page effect, and there are less innocent people who find it convenient to delude them with the notion that that’s what programming is. The web page for HTML5 for Dummies declares: “HTML is the predominant programming language used to create Web pages.” If you can believe that, you’re part of the target audience of the title.

HTML5, just three years away

According to the latest version of the HTML Working Group Charter, HTML5 will become a W3C Recommendation in 2014.

Smart money is on the AES audio metadata schema being made public first, but I wouldn’t be too sure.