JHOVE 1.11 is now available at
Thanks to Maurice de Rooij for helping to debug the Windows batch files.
In spite of my new job, I’m finding some time to work on JHOVE. Version 1.11a1 is now available for testing. Please give it a try and let me know of any problems.
As of today, Google Reader is gone. (Correction: It goes away at the end of today. You still have time to export your feed list.) When its termination was announced, some writers declared it meant the end of RSS feeds. From what I’m seeing today, the attempts at panic have died away, replaced by a realization that RSS and Atom are well-understood feed formats and that lots of alternatives exist. Tristan Louis writes for Forbes:
While the death of the most popular RSS reader on the internet could have been seen as something that would represent a grave danger for RSS as a standard, for openness as a concept, and for heavy news consumption, the inverse has been true, as it only solidified RSS’ position in the world as the format for news delivery. Reader was a good product but one can hardly call it a great product and its demise will help rectify some imbalances it created in the market.
Hopefully dedicated users backed up their feed collections as an OPML file. If not, all they have to do is start collecting feed URLs again.
There’s no need to use a website at all to manage your feeds. On my iPod, I use Free RSS Reader, a simple, straightforward reader, though unfortunately it’s no longer being updated. On my main computer I use Sage, a Firefox extension.
A few columnists got a temporary boost in readership and a long-term loss in credibility by proclaiming the demise of RSS. The rest of us are still fine.
JHOVE 1.10 is now available for downloading. It’s the same as 1.10b3 except for the version numbering. The Javadoc has been brought up to date.
I haven’t included the MD5 files, since SourceForge provides MD5 checksums. If you still want them, let me know.
JHOVE 1.10b3 is now available. This is the release candidate, and there won’t be any further changes beyond the version number designation unless a serious problem shows up.
I’m surprised I only got a complaint about this recently. Using JHOVE to validate XHTML files is often painfully slow. In fact, using anything to validate them without caching or redirection of DTDs would be painfully slow. The DOCTYPE declaration brings in the standard XHTML DTD, and it in turn brings in lots of other DTDs. These all have URLs on w3.org. As you can imagine, this is a lot of traffic converging in one place, and the response is often very slow.
JHOVE has a remedy, but it turns out not to work in this case. In the configuration file, you can declare local copies of schemas and DTDs to be loaded by the SAX entity resolver. The declaration looks something like this:
<module>
  <class>edu.harvard.hul.ois.jhove.module.XmlModule</class>
  <param>schema=http://www.w3.org/TR/REC-smil/SMIL10.dtd;/Users/gmcgath/schemas/SMIL10.dtd</param>
</module>
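For readers who haven't worked with SAX entity resolvers, here is a minimal sketch of the general technique: a resolver that maps a remote system ID to a local file, so the parser never goes to w3.org for it. This is not JHOVE's actual code. The class name LocalDtdResolver and the addParam method are hypothetical, and the parsing of the "schema=URL;path" string is only an assumption based on the parameter format shown above.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical illustration, not part of JHOVE.
final class LocalDtdResolver implements EntityResolver {
    // Maps a remote system ID (URL) to the path of a local copy.
    private final Map<String, String> localCopies = new HashMap<>();

    // Accepts a string in the assumed form "schema=<url>;<local path>".
    void addParam(String param) {
        String prefix = "schema=";
        if (!param.startsWith(prefix)) {
            throw new IllegalArgumentException("Expected schema=<url>;<path>");
        }
        String body = param.substring(prefix.length());
        int sep = body.indexOf(';');
        if (sep < 0) {
            throw new IllegalArgumentException("Missing ';' separator");
        }
        localCopies.put(body.substring(0, sep), body.substring(sep + 1));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String localPath = localCopies.get(systemId);
        if (localPath == null) {
            // Returning null tells the parser to use its default behavior,
            // which normally means fetching the URL over the network.
            return null;
        }
        // Hand the parser a file: URI so it reads the local copy instead.
        InputSource source = new InputSource(new File(localPath).toURI().toString());
        source.setPublicId(publicId);
        return source;
    }
}
```

A resolver like this would be installed with XMLReader.setEntityResolver before parsing. Any system ID without a mapping still falls through to the network, which is why every DTD in the chain needs its own entry.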
Unfortunately, there are some problems in JHOVE 1.9. The HTML module processes XHTML files by passing them to the XML module. In this case, the module doesn’t get the parameters that the config file declared for it. In JHOVE 1.10, I’ll fix this by having the HTML module pass its own parameters to the XML module. At present, JHOVE’s processing of XHTML files makes no use of the configuration file’s instructions to the entity resolver.
There’s another complication. The XHTML DTD invokes other DTDs, and JHOVE has to get every one of those in turn. Some of them have relative URLs to other DTDs; these break when they’re redirected to local files. Even making local copies of all the files doesn’t work, as JHOVE doesn’t handle the relative URLs correctly within the file system, and making them work would require changing some existing assumptions. The best fix for the user is to get JHOVE 1.10 when it’s ready (version 1.10b2 doesn’t have the XHTML fix yet) and then edit all those files so that all the URLs are absolute.
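To make the relative URL issue concrete, here is a small sketch of how a relative system ID resolves against two different base locations. It shows only standard URI resolution, not JHOVE's internal handling. The class name RelativeDtdDemo is hypothetical, and the file names (xhtml1-strict.dtd and xhtml-lat1.ent) and the local directory are used purely as an illustration.

```java
import java.net.URI;

// Hypothetical illustration of relative system ID resolution.
final class RelativeDtdDemo {
    // Resolves a relative reference against the location of the referring DTD.
    static URI resolve(String base, String relative) {
        return URI.create(base).resolve(relative);
    }

    public static void main(String[] args) {
        String relative = "xhtml-lat1.ent";

        // Referring DTD fetched from its original location.
        System.out.println(
            resolve("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd", relative));

        // Same DTD redirected to a local copy: the relative reference
        // now points into the local directory instead.
        System.out.println(
            resolve("file:/Users/gmcgath/schemas/xhtml1-strict.dtd", relative));
    }
}
```

Once the referring DTD is redirected, the relative reference follows it to the new base. Rewriting the references as absolute URLs removes that dependency, since an absolute system ID can be matched against the configured mappings regardless of where the referring file was loaded from.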
This is a big chunk of work, and I haven’t tested the approach fully. Any ideas on how this might be better handled would be appreciated.
I’ve put up JHOVE 1.10b2. It has a bit of optimization for the PDF module, though files with huge structure trees are still painfully slow.
I’ve put up a new beta version of JHOVE, 1.10b1, on SourceForge.
The major change since last time is the handling of structure trees in PDF files; this should keep JHOVE from hanging or running out of memory on some PDF files as it used to. Please report any problems soon.
It’s been a problem for a while that DROID 6 won’t run under Java 7. Matt Palmer has reported a simple fix for this, requiring only a change in pom.xml. Hopefully a release incorporating this change will appear soon.
Tools come and go, effort must be ongoing
In a comment on a JHOVE bug, I said offhandedly that it’s approaching the end of its life. This caused a certain amount of concern in Twitter discussions. Andy said that software tools are one of the best ways to “preserve specific, reproducible knowledge about processes.” I don’t think dropping support of a rather dated tool is a big concern, though, as long as the code doesn’t vanish.
A software application is good for a certain number of years before it needs to be either left as legacy code or completely rewritten. Throwing out code and starting over takes a lot of effort, but it can result in much better code. I started on JHOVE in 2003 as a contractor to the Harvard University Libraries. After a few years it became clear that some of the design decisions weren’t ideal. Its all-or-nothing approach and its tendency to give up after the first error have long been obvious problems. The PDF module is a kludge built on a crock, and that’s without even talking about its profiles. The TIFF module, on the other hand, has a fair amount of elegance.
JHOVE2 was supposed to be the successor to JHOVE. Its creators learned from JHOVE and produced a better design. What they didn’t have was enough time and money to cover all the formats that JHOVE covered. I’ve continued to work on JHOVE because I know it inside and out. Someone else could pick up the work, but it might make more sense for a newcomer to the code to join the JHOVE2 effort instead. However, Maurice noted on Twitter that there hasn’t been much activity lately on JHOVE2 issues.
Both JHOVE and JHOVE2 were funded under grants. When the grant money ended, progress slowed down. The one-time grant model is the wrong way to fund preservation software. It’s an ongoing effort; new formats arise and old ones change, and there are always bugs to fix. What I’d like to see happen is for major libraries in the US to create an ongoing consortium for preservation work, similar to the Planets project in Europe. Or better yet, a consortium bringing together libraries all over the world. It wouldn’t take a lot from any individual institution. Its job would be to maintain information, preservation tools, test suites, and so on, on an ongoing basis. Instead of rushing to create a tool and then leaving it to freelancers like (formerly) me to maintain, it would support maintenance of tools for as long as it made sense and creation of new ones when it’s appropriate.
My voice isn’t enough to call anything like this into existence, but I can hope.
Posted in commentary
Tagged JHOVE, preservation, software