A Vice.com article has brought fresh publicity to an old trick. The so-called “Zip bomb” is a Zip file with a fantastically high compression ratio. Researcher David Fifield created a 46-megabyte file that expands into 45 petabytes. That’s a compression ratio of about a billion. Fifield’s own article provides a lot more technical information.
The article says such files are “so deeply compressed that they’re effectively malware.” That strikes me as a bit of an exaggeration. “Nuisanceware” seems more accurate, if there’s such a word. However, they could be used in a denial of service attack. They could crash a server or browser, and the work removing the expanded files could cause some downtime. A Zip bomb might be a setup for another attack, tying up system resources and distracting administrators.
Continue reading
Identifying files by programming language
Most of today’s programming languages look vaguely similar. They’re derived from the C syntax, with similar ways of expressing assignments, arithmetic, conditionals, nested expressions, and groups of statements. If the files have their original extension and it’s accurate, format identification software should be able to classify them correctly.
The software should do some basic checks to make sure it wasn’t handed a binary file with a false extension, which could be dangerous. A code file should be a text file. regardless of the language. (This isn’t strictly true, but non-text languages like Piet and Velato are just obscure for the sake of obscurity.) The UK National Archive recognizes XML and JSON (which is a subset of JavaScript) but doesn’t talk about programming languages as file formats. Exiftool identifies lots of formats but makes no attempt to discern programming languages.
Continue reading →
1 Comment
Posted in commentary
Tagged archiving, file identification, software