Tag Archives: HTML

More on SVG risks

SVG is a risky format in more ways than I’d realized. I’d previously mentioned the risk of cross-site scripting with embedded JavaScript, but I’ve found it gets worse.

The article “Crouching Tiger – Hidden Payload: Security Risks of Scalable Vector Graphics” covers the hazards in detail. There are two problems: (1) HTML5 requires SVG support in multiple contexts, and (2) SVG can have embedded JavaScript and CSS.

SVG is XML, and embedding it in HTML means switching between two different parsing modes. The authors, Mario Heiderich, Tilman Frosch, Meiko Jensen, and Thorsten Holz at Ruhr-University Bochum, state that “SVG files must be considered fully functional, one-file web applications potentially containing HTML, JavaScript, Flash, and other interactive code structures.” I still haven’t digested all the content, but it describes lots of ways SVG could be exploited.

Websites that allow third-party posting should disallow or filter SVG content. WordPress disallows SVG uploads by default.
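
To make the filtering idea concrete, here is a minimal sketch of a check that flags the most obvious active content in an uploaded SVG. It's only a detector, not a sanitizer, and the list of risky element names is my own assumption rather than anything taken from the article. Python's standard XML parser also isn't hardened against maliciously constructed XML, so a real filter would want a hardened parser and an allowlist instead.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed, incomplete list of elements that can carry active content.
RISKY_TAGS = {"script", "foreignObject", "iframe", "embed", "object"}


def local_name(qualified: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag or attribute name."""
    return qualified.rsplit("}", 1)[-1]


def find_active_content(svg_text: str) -> List[str]:
    """Return a description of each suspicious element or attribute found."""
    findings = []
    root = ET.fromstring(svg_text)
    for element in root.iter():
        tag = local_name(element.tag)
        if tag in RISKY_TAGS:
            findings.append(f"element <{tag}>")
        for name, value in element.attrib.items():
            attr = local_name(name)
            if attr.lower().startswith("on"):
                # Event handlers such as onload or onclick run script.
                findings.append(f"event handler {attr} on <{tag}>")
            elif attr == "href" and value.strip().lower().startswith("javascript:"):
                findings.append(f"javascript: URL on <{tag}>")
    return findings
```

An empty result doesn't prove a file is safe; it only means none of these particular patterns turned up.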

SVG is a designed-in danger in HTML5.

HTML 5.1 and 5.2

HTML 5.1 is now a W3C proposed recommendation, and the comment period has closed. If no major issues have turned up, it may become a recommendation soon, superseding HTML 5.0.

Browsers already support a large part of what it includes, so a discussion of its “new” features will cover ones that people already thought were a part of HTML5. The implementations of HTML are usually ahead of the official documents, with heavy reliance on working drafts in spite of all the disclaimers. Things like the picture element are already familiar, even though they aren’t in the 5.0 specification.

Security risk in “target=_blank”

I’ve often used “target=_blank” in my posts so that people can click on a link without leaving the original page. So do many people. This turns out to be a seriously risky practice, though. When you open a window with an anchor tag specifying “target=_blank”, the target window gets a reference to the original window (window.opener) and can set its location! This means that the target window can navigate the original window somewhere else, possibly redirecting it to a phishing page.

We could also call this a security hole in the HTML DOM, or perhaps in the whole idea of allowing JavaScript in Web pages. I use NoScript with Firefox so that unfamiliar pages won’t run JavaScript, preventing them from exploiting this hole. I can’t expect everybody reading this blog to do that, though. To protect against exploits, I’d need to add “rel=noopener” for some browsers and “rel=noreferrer” for others. That would require custom JavaScript, which wordpress.com won’t let me add, and would be a lot of work just to modify link behavior. Starting with this post, I’m not using “target=_blank” in my links. The sites I’ve linked to in the past are reputable, as far as I know, so the risk from existing links should be minimal. At least I hope so; supposedly trustworthy websites allow advertisers to include unvetted JavaScript, allowing malware attacks.
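
For anyone who does control their own markup, an offline audit is one way to find the links that need attention. The following is a rough sketch, assuming you have the page's HTML as a string; the class and function names are made up for illustration. It lists the href of every link that uses target=_blank without a rel value containing noopener or noreferrer.

```python
from html.parser import HTMLParser
from typing import List


class BlankTargetAuditor(HTMLParser):
    """Collects links that open a new window without rel protection."""

    def __init__(self):
        super().__init__()
        self.unprotected: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "area"):
            return
        # HTMLParser lowercases names; valueless attributes come through as None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("target", "").lower() != "_blank":
            return
        rel_tokens = attributes.get("rel", "").lower().split()
        if "noopener" not in rel_tokens and "noreferrer" not in rel_tokens:
            self.unprotected.append(attributes.get("href", ""))


def find_unprotected_links(html: str) -> List[str]:
    """Return the href of each target=_blank link lacking noopener/noreferrer."""
    auditor = BlankTargetAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.unprotected
```

This only reports the problem links; rewriting them is left to whatever tool you use to edit the pages.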

Taming websites in your own browser

Keep Calm and Don't Blink (with Tardis)

HTML lets Web designers annoy you with tags like embed, marquee, and blink, or with light green text against a blue-sky background. You can just curse or use a different site, but there’s a way to fight back: custom CSS in your browser. It can not only disable whole tags, but modify or get rid of unwanted elements in a site by setting rules for their classes.

You need to know CSS pretty well to venture into this; I’m assuming you’re comfortable with it. If you are, the tricky part is just to find out where it goes. For Firefox under OS X, under the “Help” menu, choose “Troubleshooting information.” In the window that comes up, look under “Application Basics” for “Profile Folder.” There’s a “Show in Finder” button next to it. Click on this, and you’ll see the directory which holds your profile.

The misuses of HTML frames

HTML framesets have some good uses, such as including third-party content. They also have misuses, such as disguising third-party involvement.

Recently I needed to set up domain forwarding for a subdomain registered with GoDaddy. (The choice of registrar wasn’t my fault.) A couple of options were available, including one that claimed to guarantee that the subdomain would persist in the address bar through navigation. That sounded like a good thing, so I picked it.

At first it seemed to work fine; but when I tried to use the URL of an image on the site, there were weird errors. I soon found out what was going on: GoDaddy was wrapping every page referenced by the subdomain in a frameset! This looks like a duck and clicks like a duck, but it isn’t one, and anything that tries to treat HTML as a JPEG file isn’t going to work very well.
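
If you suspect this is happening to you, looking at the first bytes of what the image URL actually returns will settle it. Here is a small sketch of that check, assuming you've already fetched the response body as bytes; the function name and the idea of scanning only the first 2 KB are my own choices.

```python
def classify_response(body: bytes) -> str:
    """Guess whether a response body is a JPEG, a frameset wrapper, or other HTML."""
    # JPEG files begin with the bytes FF D8 FF.
    if body.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    head = body[:2048].lower()
    if b"<frameset" in head or b"<frame " in head:
        return "frameset"
    if b"<html" in head or b"<!doctype html" in head:
        return "html"
    return "unknown"
```

A result of "frameset" for a URL that ends in .jpg is the giveaway that something is wrapping your content.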

Stack Overflow has several reports of people being bitten by this.

Frame wrapping is a good-enough solution for some cases, but when you aren’t told it’s happening, that’s a seriously wrong way to do it. It’s also a security concern, since your domain points at an IP address that you don’t control, and only indirectly at your own site.

This is a blog on file formats, not on irresponsible domain registrars, so the moral here is to realize that framesets aren’t a completely transparent way to provide third-party content. It’s fine to use them, but only if you’re aware that the frameset host and the frame provider are active partners.

Best viewed with a big-name browser

A few websites refuse to present content if you use a browser other than one of the four or so big-name ones.

An "unsupported browser" message from Apple's support website

The example shown is what I got when I accessed Apple’s support site with iCab, a relatively obscure browser which I often use. Many of Google’s pages also refuse to deliver content to iCab.

There is a real problem that browsers’ JavaScript implementations aren’t fully consistent with one another, and it’s necessary to test with each browser to be confident that a page will work properly. However, if a page sticks with the basics of JavaScript and isn’t trying to do animations, video, or other cutting-edge effects, then any reasonably up-to-date implementation of JavaScript should be able to handle it. It’s reasonable to display a warning if the browser is an untested one, but there’s no reason to block it.

Browsers can impersonate other browsers by setting the User-Agent header, and small-name browsers usually provide that option for getting around these problems. After a couple of tries with iCab, I was able to get through by impersonating Safari. Doing this also has an advantage for privacy; identifying yourself with a little-used browser can greatly contribute to unique identification when you may want anonymity. From the standpoint of good website practices, though, a site shouldn’t be locking browsers out unless there’s an unusual need. Web pages should follow standards so that they’re as widely readable as possible. This is especially important with a “contact support” page.
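
The impersonation itself is nothing more than sending a different header. As a sketch of what that looks like outside a browser, here is how a request with a Safari-style User-Agent can be built with Python's standard library. The User-Agent string is only an illustrative example of the Safari format, and the function name is hypothetical.

```python
from urllib.request import Request

# Illustrative Safari-style string; real values vary by version and platform.
SAFARI_LIKE_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) "
    "AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7"
)


def build_request(url: str, user_agent: str = SAFARI_LIKE_UA) -> Request:
    """Build (but do not send) a request that identifies itself as user_agent."""
    return Request(url, headers={"User-Agent": user_agent})
```

Passing the result to urllib.request.urlopen would send it; a site that decides what to serve by sniffing this header has no other way to tell the difference.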

Apple and Google both are browser vendors. Might we look at this as a way to make entry by new browsers more difficult?

The animated GIF is the new blink tag

In the early days of HTML, the most hated tag was the <blink> tag, which made the text inside it blink. There were hardly any sensible uses for it, and a lot of browsers now disable it. I just tested it in this post, and WordPress actually deleted the tag from my draft when I tried to save it. (I approve!)

Today, though, the <blink> tag isn’t annoying enough. Now we have the animated GIF. It’s been around since the eighties, but for some reason it’s become much more prevalent recently. It’s the equivalent of waving a picture in your face while you’re trying to read something.

I can halfway understand it when it’s done in ads. Advertisers want to get your attention away from the page you’re reading and click on the link to theirs. What I don’t understand is why people use it in their own pages and user icons. It must be a desire to yell “Look how clever I am!!!” over and over again as the animation cycles.

Fortunately, some browsers provide an option to disable it. Firefox used to let you stop it with the ESC key, but last year removed this feature.

If you think that your web page is boring and adding some animated GIFs is just what’s needed to bring back the excitement — Don’t. Just don’t.

Update: I just discovered that a page that was driving me crazy because even disabling animated GIFs wouldn’t stop it was actually using the <marquee> tag. I believe that tag is banned by the Geneva Convention.

Canvas fingerprinting, the technical stuff

The ability of websites to bypass privacy settings with “canvas fingerprinting” has caused quite a bit of concern, and it’s become a hot topic on the Code4lib mailing list. Let’s take a quick look at it from a technical standpoint. It is genuinely disturbing, but it’s not the unstoppable form of scrutiny some people are hyping it as.

The best article to learn about it from is “Pixel Perfect: Fingerprinting Canvas in HTML5,” by Keaton Mowery and Hovav Shacham at UCSD. It describes the basic technique and some implementation details.

Canvas fingerprinting is based on the <canvas> HTML element. It’s been around for a decade but was standardized for HTML5. In itself, <canvas> does nothing but define a blank drawing area with a specified width and height. It isn’t even like the <div> element, which you can put interesting stuff inside; if all you use is unscripted HTML, all you get is some blank space. To draw anything on it, you have to use JavaScript. There are two APIs available for this: the 2D DOM Canvas API and the 3D WebGL API. The DOM API is part of the HTML5 specification; WebGL relies on hardware acceleration and is less widely supported.

Either API lets you draw objects, not just pixels, to a browser. These include geometric shapes, color gradients, and text. The details of drawing are left to the client, so they will be drawn slightly differently depending on the browser, operating system, and hardware. This wouldn’t be too exciting, except that the API can read the pixels back. The getImageData method of the 2D context returns an ImageData object, which is a pixel map. This can be serialized (e.g., as a PNG image) and sent back to the server from which the page originated. For a given set of drawing commands and hardware and software configuration, the pixels are consistent.

Drawing text is one way to produce a canvas fingerprint. Modern browsers use a programmatic description of a font rather than a bitmap, so that characters will scale nicely. The fine details of how edges are smoothed and pixels interpolated will vary, perhaps not enough for any user to notice, but enough so that reading back the pixels will show a difference.
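
The drawing and read-back have to happen in JavaScript in the browser, but the last step, turning the pixels into something comparable, is easy to sketch anywhere. The following assumes the pixel data arrives as raw RGBA bytes, four per pixel, which is the layout of an ImageData object. The choice of SHA-256 and the function name are my own assumptions; the paper doesn't prescribe this particular method.

```python
import hashlib


def canvas_fingerprint(rgba_pixels: bytes, width: int, height: int) -> str:
    """Reduce a canvas pixel buffer to a short, comparable hex digest."""
    if len(rgba_pixels) != width * height * 4:
        raise ValueError("expected 4 bytes (RGBA) per pixel")
    digest = hashlib.sha256()
    # Include the dimensions so differently shaped canvases never collide.
    digest.update(f"{width}x{height}:".encode("ascii"))
    digest.update(rgba_pixels)
    return digest.hexdigest()
```

Two machines that render identical pixels get the same digest, while a difference of a single antialiased pixel produces a completely different one. That's why subtle rendering variations are enough to tell configurations apart.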

However, the technique isn’t as frightening as the worst hype suggests. First, it doesn’t uniquely identify a computer. Two machines that have the same model and come from the same shipment, if their preinstalled software hasn’t been modified, should have the same fingerprint. It has to be used together with other identifying markers to narrow down to one machine. There are several ways for software to stop it, including blocking JavaScript from offending domains and disabling part or all of the Canvas API. What gets people upset is that neither blocking cookies nor using a proxy will stop it.

Was including getImageData in the spec a mistake? This can be argued both ways. Its obvious use is to draw a complex canvas once and then rubber-stamp it if you want it to appear multiple times; this can be faster than repeatedly drawing from scratch. It’s unlikely, though, that the designers of the spec thought about its privacy implications.

HTML5 schedule

The HTML Working Group Chairs and the Protocols and Formats WG Chair have proposed a plan for making HTML5 a Recommendation by the end of 2014. Features would be postponed to subsequent releases as necessary.

Accomplishing this, of course, requires that the proposal be accepted by the end of 2014.

The two faces of HTML5

The question “What is HTML5?” has gotten more complicated. While W3C continues work on a full specification of HTML5, the Web Hypertext Application Technology Working Group (WHATWG) is pursuing a “living standard” approach that is frequently updated. Both groups are reassuring us that this doesn’t constitute a rift, but certainly it will make things tricky when resolving the fine points of the standard(s). Ian Hickson has gone into some detail on the W3C site about the relationship between the WHATWG HTML living standard and the W3C HTML5 specification.

Significantly, the WHATWG “HTML Living Standard” site has no version number.

Considering that HTML5 is already widely implemented even though it won’t be finalized till the year after next, it’s unlikely this will add any further confusion. By the time it becomes a W3C Recommendation, many implementers will doubtless have moved beyond it to new features.