
Identifying files by programming language

Most of today’s programming languages look vaguely similar. Many derive their syntax from C, with similar ways of expressing assignments, arithmetic, conditionals, nested expressions, and groups of statements. If a file still has its original extension and that extension is accurate, format identification software should be able to classify it correctly.
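When the extension can be trusted, classification can be as simple as a table lookup. The sketch below is a minimal illustration under that assumption. The table and the `guess_language` name are hypothetical, and the table is a deliberately incomplete sample. Real tools also have to cope with ambiguous extensions such as `.h`, which is used for C, C++, and Objective-C headers alike.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, deliberately incomplete mapping from extensions to languages.
EXTENSION_TO_LANGUAGE = {
    ".c": "C",
    ".h": "C",  # ambiguous in practice: also used for C++ and Objective-C headers
    ".cpp": "C++",
    ".java": "Java",
    ".js": "JavaScript",
    ".py": "Python",
    ".rs": "Rust",
    ".go": "Go",
}

def guess_language(filename: str) -> Optional[str]:
    """Guess a language from the extension alone; return None if it's unknown."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "main.RS" -> ".rs"
    return EXTENSION_TO_LANGUAGE.get(suffix)
```

`guess_language("main.RS")` returns `"Rust"`, while an unknown extension returns `None`, which is a cue to fall back on content-based checks.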

The software should do some basic checks to make sure it wasn’t handed a binary file with a false extension, which could be dangerous. A code file should be a text file, regardless of the language. (This isn’t strictly true, but non-text languages like Piet and Velato are just obscure for the sake of obscurity.) The UK National Archives recognizes XML and JSON (which is a subset of JavaScript) but doesn’t address programming languages as file formats. ExifTool identifies lots of formats but makes no attempt to distinguish programming languages.
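One cheap sanity check is to confirm that the bytes really are text before trusting a code-file extension. This is a heuristic sketch, not any particular tool's method. It assumes source files are UTF-8, so a file in a legacy encoding such as Latin-1 would be rejected. It treats NUL bytes and any control characters other than tab, line feed, carriage return, and form feed as signs of binary data. The name `looks_like_text` is hypothetical.

```python
import unicodedata

# Control characters that legitimately appear in source code.
ALLOWED_CONTROLS = {"\t", "\n", "\r", "\f"}

def looks_like_text(data: bytes) -> bool:
    """Heuristic: True if data decodes as UTF-8 and contains no unexpected control characters."""
    if b"\x00" in data:  # fast path: NUL bytes almost always mean binary data
        return False
    try:
        text = data.decode("utf-8-sig")  # "-sig" tolerates a leading byte-order mark
    except UnicodeDecodeError:
        return False
    # Unicode category "Cc" covers the C0 and C1 control characters.
    return all(
        unicodedata.category(ch) != "Cc" or ch in ALLOWED_CONTROLS
        for ch in text
    )
```

For example, the first bytes of a PNG file (`b"\x89PNG\r\n\x1a\n"`) fail immediately, because `0x89` can't begin a UTF-8 sequence.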


JHOVE 1.22 is now available from the Open Preservation Foundation (OPF).

Path traversal bugs in archive formats

Malware has appeared that exploits a path traversal bug in the WinRAR archiving utility. The bug, which reportedly existed for 19 years, is fixed in the latest version. The problem stems from an old, buggy DLL that WinRAR used. It allowed an archive to contain a file that would be extracted to an absolute path rather than to the destination folder. In this case, that path was the system startup folder, so the next time the computer was rebooted, it would run the malware file.
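The general defense against this class of bug is to check every archive entry's path before writing anything. Here is a minimal sketch of that check. It is not WinRAR's actual fix, and the function name `safe_extract_path` is hypothetical. It resolves each entry's path relative to the destination folder and refuses anything that lands outside it, which catches both absolute paths and `..` components.

```python
import os

def safe_extract_path(dest_dir: str, entry_name: str) -> str:
    """Return where an archive entry should be written, or raise ValueError
    if extracting it would place a file outside dest_dir."""
    dest = os.path.realpath(dest_dir)
    # os.path.join discards `dest` entirely when entry_name is absolute,
    # which is exactly the case this check has to catch.
    target = os.path.realpath(os.path.join(dest, entry_name))
    # A safe target must sit inside dest, so their common prefix must be dest itself.
    if os.path.commonpath([dest, target]) != dest:
        raise ValueError(f"unsafe archive entry: {entry_name!r}")
    return target
```

Because `os.path.realpath` follows symbolic links that already exist under the destination, those are covered too. Links created by the archive itself during extraction would need a separate check.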

A screen capture tip using Grab on the Mac

MacOS provides a few different ways to do screen captures. My personal favorite is Grab, which is found in the Applications/Utilities folder. It lets me capture a selection, a window, or the whole screen without having to remember any magic key combinations. I keep it in the Dock for quick access.

Grab has one deficiency, though. It can save screenshots only as TIFF files. If Apple had to pick just one format, that’s hardly the most useful one. But there’s an easy workaround.

After you’ve got your screenshot, press Command-C or choose “Copy” from the Edit menu. Open the Preview application. Press Command-N or select “New from Clipboard” from the File menu. You now have the screenshot in Preview.

In Preview, press Command-S or choose “Save…” from the File menu. You’ll get a dialog to save the file, with a choice of formats: JPEG, JPEG2000, OpenEXR, PDF, PNG, or TIFF. Pick whichever one you like. If you’re going to put the image into a Web page, PNG is usually the best choice. Preview will remember your choice for next time. Then save the file.

If you prefer, you can do the equivalent in Photoshop, GIMP, or any other image-processing application, but Preview has the advantage of launching quickly and keeping the process simple.
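If you convert screenshots often, the same step can be scripted with `sips`, the command-line image tool that ships with macOS. This is a sketch assuming `sips` is on the PATH, so it runs only on a Mac. The helper names are hypothetical. The command `sips -s format png input --out output` writes a converted copy and leaves the original TIFF untouched.

```python
import subprocess
from pathlib import Path
from typing import List

def sips_command(src: str, fmt: str = "png") -> List[str]:
    """Build a sips command that writes a copy of src in the given format,
    next to the original and with the matching extension."""
    dst = Path(src).with_suffix("." + fmt)
    return ["sips", "-s", "format", fmt, src, "--out", str(dst)]

def convert_screenshot(src: str, fmt: str = "png") -> None:
    """Run the conversion (macOS only); raises CalledProcessError if sips fails."""
    subprocess.run(sips_command(src, fmt), check=True)
```

For example, `convert_screenshot("screenshot.tiff")` would write `screenshot.png` in the same folder.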

That’s it. You can now use Grab to save screenshots to a Web-friendly format.

What are “positives” in format validation?

Articles about JHOVE, such as “Good GIF Hunting,” grab my attention for obvious reasons. This article talks about false positive and negative results, and it got me thinking: What constitutes a “positive” result in file format validation? There are two ways to look at it:

  1. The default assumption is that the file is of a certain format, perhaps based on its extension, MIME type, or other metadata. The software sets out to see if it violates the format’s requirements. In that case, a positive result is that the file doesn’t conform to the requirements.
  2. The default assumption is that the file is just a collection of bytes. The software matches it against one or more sets of criteria. A positive result is that the file matches one of them.
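The two views differ in what a positive result *means*, not necessarily in the checks performed. Here is a minimal sketch of both. The checkers are hypothetical toys that look only at the file signature; real validators such as JHOVE's modules examine far more than the first few bytes.

```python
from typing import Callable, Dict, List

# A checker takes a file's raw bytes and reports whether they meet one format's criteria.
Checker = Callable[[bytes], bool]

def is_gif(data: bytes) -> bool:
    # Toy check: GIF files begin with "GIF87a" or "GIF89a".
    return data[:6] in (b"GIF87a", b"GIF89a")

def is_png(data: bytes) -> bool:
    # Toy check: the eight-byte PNG signature.
    return data.startswith(b"\x89PNG\r\n\x1a\n")

def validation_positive(data: bytes, claimed: Checker) -> bool:
    """View 1: start from the claimed format; a positive means the file FAILS its criteria."""
    return not claimed(data)

def identification_positives(data: bytes, checkers: Dict[str, Checker]) -> List[str]:
    """View 2: start from plain bytes; each format whose criteria match is a positive."""
    return [name for name, check in checkers.items() if check(data)]

sample = b"GIF89a" + b"\x00" * 10
print(validation_positive(sample, is_png))  # True: claimed PNG, fails the PNG check
print(identification_positives(sample, {"GIF": is_gif, "PNG": is_png}))  # ['GIF']
```

The same bytes yield a positive in both views, but for opposite reasons: in view 1 it flags a failure, in view 2 a match.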


Libtiff 4.0.9 released

Libtiff 4.0.9 has been released. According to the email announcing it:

A great many security improvements have been implemented by Even Rouault.

Much thanks to OSS Fuzz, team OWL337, Roger Leigh, and of course Even Rouault.

Obligatory reminder: Don’t download from libtiff dot org. It’s many years out of date.

JHOVE webinar

An Open Preservation Foundation webinar, “Putting JHOVE to the acid test: A PDF test-set for well-formedness validation in JHOVE,” will be held on November 21, 10 AM GMT (that’s 11 AM in Central Europe and a ludicrous 5 AM or earlier in the US).
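The time-zone arithmetic is easy to check with Python's `zoneinfo` module (Python 3.9+, using the system's time zone database). The post doesn't give the year, so 2017 is assumed here purely for illustration. Any late-November date gives the same result, because by then both Europe and the US have left daylight saving time.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Webinar start: November 21, 10:00 GMT (year assumed for illustration only).
start = datetime(2017, 11, 21, 10, 0, tzinfo=timezone.utc)

def local_times(moment, zones):
    """Map each IANA zone name to the wall-clock time of `moment` in that zone."""
    return {zone: moment.astimezone(ZoneInfo(zone)).strftime("%H:%M") for zone in zones}

for zone, clock in local_times(start, ["Europe/Berlin", "America/New_York",
                                       "America/Chicago", "America/Los_Angeles"]).items():
    print(f"{zone}: {clock}")
# Europe/Berlin: 11:00
# America/New_York: 05:00
# America/Chicago: 04:00
# America/Los_Angeles: 02:00
```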

Popular Science on format conversion

Popular Science has an article, “How to convert any file to any format.” The title overreaches, but the article actually isn’t too bad. It’s aimed at the ordinary user, not the file format specialist, so it wouldn’t be appropriate to complain too much that it has more breadth than depth.

It starts by recommending using the application that created the file, and that’s certainly good advice. Even when formats are open standards, an app knows more about how it creates its own files than anyone else does. Its files might have bits of application-specific information.

JHOVE online hack day

My venture into the Techno-Liberty blog didn’t work so well. In fact, I’m getting more views on this blog, in spite of not having posted in months, than I got on my best days on the other blog. So … I’m back.

JHOVE is still doing well too, thanks to excellent work by Carl Wilson and others at the Open Preservation Foundation. There will be an online hack day for JHOVE on April 27. The aim is to find ways to improve JHOVE by improving error reporting, collecting example files, and documenting the preservation impact of JHOVE validation issues. (I think that last one means “Why does McGath’s PDF module suck?” :)

The time listed is 8 AM-8 PM. I asked what time zone that is, and was told it means any and all, from New Zealand the long way around to Hawaii.
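“8 AM to 8 PM everywhere” covers much more than 12 hours of real time. Here is a sketch of the arithmetic. It assumes 2018 for the year, since the post says only “April 27,” and uses Auckland and Honolulu as the two ends of the window. By late April, New Zealand is back on standard time (UTC+12), and Hawaii stays at UTC-10 all year.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system time zone database

# Year assumed (2018) for illustration only; the post gives just "April 27".
first_start = datetime(2018, 4, 27, 8, 0, tzinfo=ZoneInfo("Pacific/Auckland"))  # 8 AM, New Zealand
last_end = datetime(2018, 4, 27, 20, 0, tzinfo=ZoneInfo("Pacific/Honolulu"))    # 8 PM, Hawaii

# Subtracting aware datetimes in different zones compares the actual instants (via UTC).
span = last_end - first_start
print(span.total_seconds() / 3600, "hours")  # 34.0 hours
```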

Last time I said I’d drop in and didn’t really manage to. This time I won’t make promises, but I’ll try to be around in some form. If nothing else, people can ask me questions about JHOVE in the comments.

A Libtiff mirror

Libtiff is still offline at remotesensing.org, but there’s a mirror of the source available on GitHub. I held off on mentioning it in this blog till Bob Friesenhahn confirmed it’s reliable.