HTML5 security

Yesterday, February 24, Ming Chow gave a talk to the ABCD security group at Harvard on HTML5 security. As far as I can tell he hasn’t made any of the content publicly available online, but here are some high points:

  • HTML5 has a lot of new features, giving it a bigger “attack surface.”
  • There’s no effective security for local and session storage, so writing sensitive information there is a bad idea.
  • The database feature raises all the standard concerns about injection of malicious SQL code into fields.
  • Application caches can be written by any website. It may be possible to spoof pages this way.
  • There is now a JavaScript object, XDomainRequest, which allows communication between different sites. The receiver of the request must specify Access-Control-Allow-Origin to indicate whose requests are allowed. Using a wildcard here allows anyone at all to send data to a page, which may be dangerous. Implementers of a receiver should always verify the sender’s identity.
  • With the audio, video, and canvas tags, the codecs can be vulnerable. Opera has been hit with a heap buffer overflow exploit in HTML5.
  • The noscript tag is no longer supported. Users who try to make themselves safer by disabling JavaScript are more screwed than ever.
  • The problems are new, but the approach to safety is the same: common sense, input validation, being careful with unsecured connections, etc.
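
The Access-Control-Allow-Origin point above can be illustrated with a minimal sketch in Python. The allowlist contents and the function name cors_headers are hypothetical, and a real receiver would wire this into whatever web framework it uses. Non-browser clients can forge the Origin header, so this only narrows which sites get the header back; it doesn't by itself verify a sender's identity.

```python
from typing import Dict, Optional

# Hypothetical allowlist; these origins are placeholders, not real services.
ALLOWED_ORIGINS = frozenset({
    "https://app.example.org",
    "https://admin.example.org",
})


def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    """Return CORS response headers for one request's Origin value.

    A known origin is echoed back explicitly instead of answering "*".
    Unknown or missing origins get no CORS headers at all.
    """
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            # Tell caches that the response depends on the Origin header.
            "Vary": "Origin",
        }
    return {}
```

Calling cors_headers("https://app.example.org") returns the two headers, while an origin that isn't on the list gets an empty dictionary.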

JHOVE CVS repository back up

Because of security issues at SourceForge, all CVS repositories, including the one for JHOVE, were down for a week or so. They’re back up now. SourceForge provides details here.

CVS is getting to be ancient technology, so I may migrate the repository to Subversion at some point.

The HTML5 logo again

In an earlier post, I questioned how W3C’s new HTML5 logo could help provide a “consistent, standardized visual vocabulary” when it stood for nothing in particular. Others have taken even stronger positions than mine, and W3C has backtracked. The HTML5 logo now stands for HTML5, not for HTML5, CSS3, H.264, and every other “cool” technology showing up on the web these days.

It’s still, as I noted, not a mark of conformance or certification, so its use on a website proves nothing, but at least now what it’s claiming to say is clearer.

SourceForge security incident and doppelgänger characters

This morning I got an email from SourceForge saying that all passwords had been reset because of a password sniffing incident. Naturally, I’m suspicious of all email of this kind, but I do have a SourceForge account. So rather than follow any of the links in the mail, I tried to log in normally and found that passwords were in fact reset. I followed the procedure for resetting by email and my account’s working again.

I’m sure some of you reading this also have SourceForge accounts, so this bit of reassurance may be helpful, especially if your phishing filters (philters?) kept you from seeing the notice in the first place. It’s likely some fakers will set up scams to take advantage of this issue, so always go to the SourceForge website by typing in the URL or using a bookmark, rather than by following a link from email. It’s easy to mistake a near-lookalike URL on a quick glance.

Worse yet (yes, this post has something to do with formats), there are now exact lookalike URLs, thanks to the unfortunate policy of allowing Unicode in URLs. There are numerous cases where characters in non-English character sets normally look just like letters of the Roman alphabet. Someone could, in principle, register sourceforgе.net, which looks just like sourceforge.net. Do a local text search for “sourceforge” in your browser, though, and you’ll notice the first “sourceforgе.net” (and this one) are skipped over. One of its letters isn’t the ASCII letter “e” but the Russian letter “e,” which usually looks the same or very nearly.
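
A quick way to expose this kind of lookalike is to list every non-ASCII character in a hostname along with its Unicode name. The following is a minimal Python sketch; the function name non_ascii_chars is made up for illustration, and the spoofed string is built with an explicit escape so the substituted letter is unambiguous.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(hostname: str) -> List[Tuple[int, str, str]]:
    """List (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(hostname)
        if ord(ch) > 127
    ]


genuine = "sourceforge.net"
# Illustrative spoof: U+0435 (a Cyrillic letter) replaces one ASCII "e".
spoofed = "sourceforg\u0435.net"

for host in (genuine, spoofed):
    print(host, non_ascii_chars(host))
```

The genuine name yields an empty list, while the spoofed one reports the Cyrillic character and its position. This only flags non-ASCII characters; it makes no attempt to decide whether a particular internationalized name is legitimate.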

If your browser doesn’t have a Cyrillic font, you may be seeing a placeholder glyph instead. Or if it views the page in Latin-1 instead of UTF-8, you may see a capital eth (Ð, which resembles a D) followed by a micro sign (µ, which looks like a Greek lower-case mu).
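
The Latin-1 case can be reproduced directly. This small Python sketch encodes the Cyrillic letter as UTF-8 and then deliberately decodes the two resulting bytes as Latin-1, which is what a browser using the wrong encoding would do; the variable names are only for illustration.

```python
import unicodedata

cyrillic_ie = "\u0435"                    # the lookalike letter
utf8_bytes = cyrillic_ie.encode("utf-8")  # two bytes in UTF-8
misread = utf8_bytes.decode("latin-1")    # each byte read as one character

# Name each character the reader would actually see.
names = [unicodedata.name(ch) for ch in misread]
print(utf8_bytes, misread, names)
```

The two names that come back are LATIN CAPITAL LETTER ETH and MICRO SIGN, the D-like and mu-like characters.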

With any email that offers to correct a password issue, exercise extreme caution, even though some are legitimate.

HTML5 logo

W3C has a new logo for HTML5. The blog post says:

As you’re aware, the term HTML5 has taken on a life of its own; there has been significant confusion and debate both within the developer community and in the public at large as to what exactly HTML5 is when the term is used outside of simply referring to the spec itself. This variability in perception is what inspired the project – a group of developers and HTML5 evangelists came to us and posed the question, ‘How can we better communicate all of the technologies and potential that HTML5 represents?’ …and the resounding answer was, the standard needs a standard. That is, HTML5 needs a consistent, standardized visual vocabulary to serve as a framework for conversations, presentations, and explanations moving forward.

How it will do this when the logo stands for nothing in particular — it isn’t a mark of conformance, certification, or anything else, and anyone can use it under a CC license — isn’t clear.

PDF/A-2 ratified

This time it’s from the PDF/A Competence Center, so I’m pretty sure it’s real: On November 30, the committee for ISO 19005 met in Ottawa and ratified Part 2 of ISO 19005, aka PDF/A. PDF/A is a restricted profile for PDF which is designed to guarantee long-term usability of conforming files.

The previous version, PDF/A-1, was based on PDF 1.4. This is based on ISO 32000-1, which is equivalent to PDF 1.7. Valid PDF/A-1 files are also valid under PDF/A-2.

ISO 19005:2005, or PDF/A-1, is available for purchase from ISO, but as of this writing the new one, which presumably will be ISO 19005:2010, isn’t being offered online yet.

I can’t make any promises about when JHOVE will support PDF/A-2, if ever. Any work I do on it is on my own time. Of course, if someone else wants to run with it, the source is there and I can answer questions.

JHOVE2 goes to beta

The JHOVE2 team has announced a beta release:

This beta code release supports all the major technical objectives of the project, including a more sophisticated, modular architecture; signature-based file identification; policy-based assessment of objects; recursive characterization of objects comprising aggregate files and files arbitrarily nested in containers; and extensive configuration and reporting options. The release also continues to fill out the roster of supported formats, with modules for ICC color profiles, SGML, Shapefile, TIFF, UTF-8, WAVE, and XML.

The source code page provides the source as a Mercurial repository, or as a single download. The gzip download expands into a file called main-14e8a6102f63 and it isn’t at all obvious what to do with it. Making it executable with chmod and running it doesn’t work. I’ve asked what this is supposed to be; I’ll update this post when I get a response.

Update: That’s a tarball. Adding the .tar extension and using tar -xvf works nicely.
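
When a download arrives without a telling extension, it can be probed before guessing. Here is a minimal sketch using Python's standard tarfile module; the function name list_if_tarball is hypothetical, and it only lists the archive's contents rather than extracting them.

```python
import tarfile
from typing import List


def list_if_tarball(path: str) -> List[str]:
    """Return member names if path is a tar archive, else an empty list."""
    # is_tarfile inspects the file's contents, so the file name and
    # extension don't matter.
    if not tarfile.is_tarfile(path):
        return []
    with tarfile.open(path) as archive:
        return archive.getnames()
```

Something like list_if_tarball("main-14e8a6102f63") would show what's inside before deciding whether to rename the file and unpack it with tar.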

ZIP standardization

The ZIP format is widely used, both by itself and as part of other widely used formats such as ODF, yet it’s never been standardized. Caroline Arms of the Library of Congress has informed the JHOVE2 list that there’s a new study group under ISO/IEC JTC1 SC34 WG1, which is looking into the standardization of ZIP. There is a Wiki for this study as well as a mailing list archive.

Membership in the group requires going through the appropriate national standards group.