New look and URL for LoC formats site

The Library of Congress has reorganized its site on file format sustainability and given it a new URL. (The old one redirects there.) A blog entry discusses the change. Relationships among formats are a big part of the site. It’s significant, for instance, that the MP3 encoding and the de facto MP3 file format get separate entries.

My reactions are mixed. When you click “Format Descriptions” on the main page, you get a page titled “Format Description Categories.” The nesting description at the top says you’re in “Format Descriptions as XML.” Eight categories are listed, and two formats plus “All xxx format descriptions” are listed under each category. There’s no obvious reason why those two formats get special prominence, or what the page has to do with XML.
Continue reading

JHOVE online hack day

My venture into the Techno-Liberty blog didn’t work so well. In fact, I’m getting more views on this blog, in spite of not having posted in months, than I got on my best days on the other blog. So … I’m back.

JHOVE is still doing well too, thanks to excellent work by Carl Wilson and others at the Open Preservation Foundation. There will be an online hack day for JHOVE on April 27. The aim is to find ways to improve JHOVE by improving error reporting, collecting example files, and documenting the preservation impact of JHOVE validation issues. (I think that last one means “Why does McGath’s PDF module suck?” :)

The time listed is 8 AM-8 PM. I asked what time zone that is, and was told it means any and all, from New Zealand the long way around to Hawaii.

Last time I said I’d drop in and didn’t really manage to. This time I won’t make promises, but I’ll try to be around in some form. If nothing else, people can ask me questions about JHOVE in the comments.

Shifting focus

You may have noticed this blog has been less active for a while. It’s several years since I’ve been actively involved in digital preservation, apart from a PNG module for JHOVE. File formats are still a special love of mine, but I’m moving on to a new blog, reflecting more urgent concerns. This blog is called Techno-Liberty. It’s about the tools for staying free through open communication, privacy, and new technologies.

This blog will stay around as long as WordPress doesn’t purge it, but new posts may be rare. I want to put as much effort as I can into making Techno-Liberty an interesting blog with a steady stream of substantial content. I hope many of you will find it worth following.

Malware in a PNG file (for real)

Not every report of malware in an image file is spurious. A report of malware smuggled through a PNG file looks plausible to me. It claims that for two years, criminals got malware undetected onto respectable sites with this technique. Ironically, the ads included ones for a so-called “Browser Defence.”

Unlike Check Point’s “Imagegate,” this report doesn’t claim the image alone can do anything, and it describes the technique in considerable detail. Check Point said it would give specifics about “Imagegate,” like what format is affected, “only after the remediation of the vulnerability in the major affected websites.” It’s still waiting, apparently.

The PNG exploit is impressively sneaky. A script which doesn’t trigger alarms checks the host browser’s defenses. If it finds a vulnerable target, it loads a PNG file whose alpha channel encodes the malware script, then decodes the script and runs it. The actual malware takes advantage of — wouldn’t you know it? — Flash vulnerabilities. The user doesn’t have to do anything except view the page to be victimized.

This doesn’t mean any PNG file is dangerous in itself. An external script has to extract the JavaScript from the alpha channel and run it. So this counts as an exploit of a file format, but not as a vulnerability in it. Malicious code can be embedded in any format that has room for some noise in its data.

Attacks like this are why ad blockers have become so popular.

More on SVG risks

SVG is a risky format in more ways than I’d realized. I’d previously mentioned the risk of cross-site scripting with embedded JavaScript, but I’ve found it gets worse.

The article “Crouching Tiger – Hidden Payload: Security Risks of Scalable Vector Graphics” covers the hazards in detail. There are two problems: (1) HTML5 requires SVG support in multiple contexts, and (2) SVG can have embedded JavaScript and CSS.

SVG is XML, and embedding it in HTML means switching between two different parsing modes. The author, Thorsten Holz at Ruhr-University Bochum, states that “SVG files must be considered fully functional, one-file web applications potentially containing HTML, JavaScript, Flash, and other interactive code structures.” I still haven’t digested all the content, but it describes lots of ways SVG could be exploited.

Websites that allow third-party posting should disallow or filter SVG content. WordPress disallows SVG uploads by default.

SVG is a designed-in danger in HTML5.

The “Imagegate” rumor mill goes wild

Search for “imagegate,” and you’ll find lots of articles claiming there’s a malware risk in JPEG files. Look more closely, and you’ll notice they don’t provide any support for the claim. They all take an article from Check Point as their source, but there are two little problems with that: (1) The article doesn’t blame JPEG files, and (2) as I noted in my last post, it’s inept reporting.

Looking very closely at the video which accompanies the Check Point article, I see that it shows a file called “payload.jpg” being uploaded. This must be what all these sites are going by. You have to look really close to see the blurry file name coming up, and I give these sites credit for examining it more closely than I did.

If this was Check Point’s way of tipping people off that the weakness is in JPEG, it’s a strange way to do it. Did they think that ordinary users would catch it, but malware authors would be tricked by the article’s reference to non-image formats like JS and HTA as “images”?

None of the articles I’ve seen question why Check Point tipped them off in this way or note the inconsistencies in the article and the video. None of them ask why Check Point refuses to give any information until “the major affected websites” fix the problem, when a format vulnerability impacts any software that reads the files.

Doesn’t anyone know how to do journalism any more?

Update: Facebook says that “Imagegate” is bull.

Aside

Lately this blog hasn’t been showing up on Google. It’s unfortunately necessary to convince Google I’m real, so I’ve added a confirmation meta tag and linked to this blog from a Google Page. As an extra advantage, you’ll be able to read my posts from Google, if you’re so inclined.

This is also a test post to see if that’s working. I’ll post about something more interesting soon.

Sloppy reporting of image file hazards

Reporting carries responsibility. When you tell the public about a risk, you need to tell them what the risk is, not just scare them. An article from Check Point Software Technologies, titled “ImageGate,” shows how bad even tech sites can get at clickbait reporting. According to Wikipedia, Check Point is a business with thousands of employees, not a hole-in-the-wall IT company that hires ghostwriters to write filler.

The article claims:

the attackers have built a new capability to embed malicious code into an image file and successfully upload it to the social media website. The attackers exploit a misconfiguration on the social media infrastructure to deliberately force their victims to download the image file. This results in infection of the users’ device as soon as the end-user clicks on the downloaded file.

Continue reading

JavaScript risk in SVG images

Malicious SVG images sent over Facebook Messenger are being used to deliver Locky ransomware.

An SVG file can contain a <script> tag, which contains executable JavaScript as CDATA. If it’s an image on a Web page, the JavaScript can run in the browser. This is a potential XSS weakness, if users can submit images to a site.
Continue reading

HTML 5.1 and 5.2

HTML 5.1 is now a W3C proposed recommendation, and the comment period has closed. If no major issues have turned up, it may become a recommendation soon, susperseding HTML 5.0.

Browsers already support a large part of what it includes, so a discussion of its “new” features will cover ones that people already thought were a part of HTML5. The implementations of HTML are usually ahead of the official documents, with heavy reliance on working drafts in spite of all the disclaimers. Things like the picture element are already familiar, even though they aren’t in the 5.0 specification.
Continue reading