Tag Archives: HTML

Web archiving and languages

Web archiving is difficult. Few sites consist entirely of static, self-contained content. Most use JavaScript, often from external sites. Responsive pages are designed to look different in different environments. An archive needs to make a snapshot that reflects its appearance at a given point in time, but what exactly does that mean? Should an archive pick an appearance for one reasonable set of parameters, or should it try to keep the page’s dynamic nature? Will the fact that it’s an archive rather than an interactive browser affect what the server gives it?
Continue reading

HTML mail is a terrible idea — but at least please do it right

Originally email consisted just of text messages. They were straightforward to read. It was very hard to send malware in a convincing way, since the recipient would have to extract any malicious attachment and run it by hand. There was a hoax in 1994 warning of the alleged “Goodtimes virus”, which caused a lot of merriment among the computer-literate. The only “virus” was the hoax email itself, which the less computer-literate forwarded to all their friends.

Then came HTML mail, a huge advance in email insecurity. Now malicious URLs could hide behind links or even be opened automatically. It could include JavaScript to exploit client weaknesses and trick recipients. Today, almost everyone recognizes these advantages, and malware and phishing by email are multi-billion-dollar businesses.

Doing it right, or not doing it at all

Even so, there are good and bad ways to create HTML mail. Continue reading

Canvas fingerprinting in Web pages

The array of sneaky tricks to get past Internet users’ veil of privacy is astonishing. At least it would be, if we weren’t all past the capacity for astonishment. One which has been around for years is Canvas fingerprinting. It lets servers narrow your profile down to a small number of clients. Combined with other measures, it can uniquely identify you.

How Canvas works

Canvas wasn’t designed to spy on you. It’s a way to draw graphics very efficiently in a browser. It supports animation and interaction. In order to get fast performance, it allows hardware acceleration and doesn’t mandate the exact set of pixels to be drawn. The server can then get those pixels back using getImageData() or toDataURL() in the Canvas API.
Continue reading

PDF or HTML for public documents?

Should official online documents be PDF files? Many institutions say they obviously should, but the format has some clear disadvantages. An article on the UK’s Government Digital Service site argues that HTML, not PDF, is the right format for UK government documents. Its arguments, to the extent that they’re valid, apply to lots of other documents.

It makes a plausible case against PDF. The trouble is that the case against HTML is even stronger in some ways.
Continue reading

The end of Flash — for real this time

We’ve been hearing reports of Adobe Flash’s death for years. But it’s not over till Adobe says it is, and now Adobe has declared a termination date for Flash support.

Adobe is planning to end-of-life Flash. Specifically, we will stop updating and distributing the Flash Player at the end of 2020 and encourage content creators to migrate any existing Flash content to these new open formats.

Continue reading

More on SVG risks

SVG is a risky format in more ways than I’d realized. I’d previously mentioned the risk of cross-site scripting with embedded JavaScript, but I’ve found it gets worse.

The article “Crouching Tiger – Hidden Payload: Security Risks of Scalable Vector Graphics” covers the hazards in detail. There are two problems: (1) HTML5 requires SVG support in multiple contexts, and (2) SVG can have embedded JavaScript and CSS.

SVG is XML, and embedding it in HTML means switching between two different parsing modes. The author, Thorsten Holz at Ruhr-University Bochum, states that “SVG files must be considered fully functional, one-file web applications potentially containing HTML, JavaScript, Flash, and other interactive code structures.” I still haven’t digested all the content, but it describes lots of ways SVG could be exploited.

Websites that allow third-party posting should disallow or filter SVG content. WordPress disallows SVG uploads by default.

SVG is a designed-in danger in HTML5.

HTML 5.1 and 5.2

HTML 5.1 is now a W3C proposed recommendation, and the comment period has closed. If no major issues have turned up, it may become a recommendation soon, susperseding HTML 5.0.

Browsers already support a large part of what it includes, so a discussion of its “new” features will cover ones that people already thought were a part of HTML5. The implementations of HTML are usually ahead of the official documents, with heavy reliance on working drafts in spite of all the disclaimers. Things like the picture element are already familiar, even though they aren’t in the 5.0 specification.
Continue reading

Security risk in “target=_blank”

I’ve often used “target=_blank” in my posts so that people can click on a link without leaving the original page. So do many people. This turns out to be a seriously risky practice, though. When you open a window with an anchor tag specifying “target=_blank”, you give the target window control of the original window’s location object! This means that the target window can modify the content of the original window, possibly redirecting it to a phishing page.

We could also call this a security hole in the HTML DOM, or perhaps in the whole idea of allowing JavaScript in Web pages. I use NoScript with Firefox so that unfamiliar pages won’t run JavaScript, preventing them from exploiting this hole. I can’t expect everybody reading this blog to do that, though. To protect against exploits, I’d need to add “rel=noopener” for some browsers and “rel=”noreferrer” for others. That would require custom JavaScript, which wordpress.com won’t let me do, and would be a lot of work just to modify link behavior. Starting with this post, I’m not using “target=_blank” in my links. The sites I’ve linked to in the past are reputable, as far as I know, so the risk from existing links should be minimal. At least I hope so; supposedly trustworthy websites allow advertisers to include unvetted JavaScript, allowing malware attacks.

Taming websites in your own browser

Keep Calm and Don't Blink (with Tardis)HTML lets Web designers annoy you with tags like embed, marquee, and blink, or with light green text against a blue-sky background. You can just curse or use a different site, but there’s a way to fight back: custom CSS in your browser. It can not only disable whole tags, but modify or get rid of unwanted elements in a site by setting rules for their classes.

You need to know CSS pretty well to venture into this; I’m assuming you’re comfortable with it. If you are, the tricky part is just to find out where it goes. For Firefox under OS X, under the “Help” menu, choose “Troubleshooting information.” In the window that comes up, look under “Application Basics” for “Profile Folder.” There’s a “Show in Finder” button next to it. Click on this, and you’ll see the directory which holds your profile.
Continue reading

The misuses of HTML frames

HTML framesets have some good uses, such as including third-party content. They also have misuses, such as disguising third-party involvement.

Recently I needed to set up domain forwarding for a subdomain registered with Godaddy. (The choice of registrar wasn’t my fault.) A couple of options were available, including one that claimed to guarantee that the subdomain would persist through navigation in the address bar. That sounded like a good thing, so I picked it.

At first it seemed to work fine; but when I tried to use the URL of an image on the site, there were weird errors. I soon found out what was going on: Godaddy was wrapping every page referenced by the subdomain in a frameset! This looks like a duck and clicks like a duck, but it isn’t one, and anything that tries to treat HTML as a JPEG file isn’t going to work very well.

Stack Overflow has several reports of people being bitten by this:

Frame wrapping is a good-enough solution for some cases, but when you aren’t told it’s happening, that’s a seriously wrong way to do it. It’s also a security concern, since your domain points at an IP address that you don’t control, and only indirectly at your own site.

This is a blog on file formats, not on irresponsible domain registrars, so the moral here is to realize that framesets aren’t a completely transparent way to provide third-party content. It’s fine to use them, but only if you’re aware that the frameset host and the frame provider are active partners.