I’ve put up JHOVE 1.9 on the SourceForge site today. I think it’s the
least buggy version ever. Please let me know if I’m wrong.
Release notes:
GENERAL
- Jhove.java and JhoveView.java now get their version information from
JhoveBase.java. Before it was redundantly kept in three places, and
sometimes they didn’t all get updated for a new release. Like in 1.8.
- ConfigWriter was in the package edu.harvard.hul.ois.jhove.viewer, which
caused a NoClassDefFoundError if non-GUI configurations didn’t include
JhoveViewer.jar in the classpath. It’s been moved to
edu.harvard.hul.ois.jhove.
- Added script packagejhove.sh and made md5.pl part of the CVS repository
to make packaging for delivery easier.
- jhove.bat now simply uses the Java command rather than requiring
the user to set up the Java path.
- JhoveView.jar and jhove (the top level shell script) are now forced
by ant to be executable so there are no mistakes.
- Warning message given on invalid buffer size string, and minimum
buffer size is 1024.
- Configuration file code for adding handlers and giving init strings
to modules was an awful mess that never could have worked. Major repairs done.
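To illustrate the buffer size item above, here is a minimal sketch of that kind of check. It is not the actual JHOVE code: the class name, the method name, and the default value are hypothetical, and falling back to a default on a bad string and raising small values to the minimum are both assumptions about the behavior.

```java
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not JHOVE's real implementation.
final class BufferSizeSketch {
    private static final Logger LOG = Logger.getLogger("BufferSizeSketch");

    // Minimum stated in the release notes.
    static final int MIN_BUFFER_SIZE = 1024;
    // Assumed default, chosen only for this example.
    static final int DEFAULT_BUFFER_SIZE = 131072;

    /** Parses a buffer size string, warning on bad input and enforcing the minimum. */
    static int parseBufferSize(String s) {
        int size;
        try {
            size = Integer.parseInt(s.trim());
        } catch (NumberFormatException | NullPointerException e) {
            // Assumption: an invalid string produces a warning and the default is used.
            LOG.warning("Invalid buffer size string: " + s);
            return DEFAULT_BUFFER_SIZE;
        }
        // Assumption: values below the minimum are raised to it.
        return Math.max(size, MIN_BUFFER_SIZE);
    }
}
```

With this sketch, "4096" is accepted as given, "10" becomes 1024, and "abc" logs a warning and yields the assumed default.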
AIFF MODULE
- If an AIFF file was found to be little-endian, the module instance
would stay in little-endian mode for all subsequent files. This
has been fixed.
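The general shape of this kind of bug, and of the fix, can be sketched as follows. This is a hypothetical module, not the real AIFF module: the names are made up, and the littleEndian flag stands in for whatever the module detects in the file. The point is only that a module instance reused across files has to reset its per-file state at the start of each parse.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical module for illustration only; not JHOVE's AIFF module.
final class EndianResetSketch {
    // Per-file state that survives between files because the instance is reused.
    private ByteOrder order = ByteOrder.BIG_ENDIAN;

    /** Resets per-file state. Skipping this call reproduces the sticky-mode bug. */
    private void initParse() {
        order = ByteOrder.BIG_ENDIAN;
    }

    /** Reads the first 16-bit value of a file using the byte order detected for it. */
    short readFirstSample(byte[] data, boolean littleEndian) {
        initParse();
        if (littleEndian) {
            order = ByteOrder.LITTLE_ENDIAN;
        }
        return ByteBuffer.wrap(data).order(order).getShort();
    }
}
```

Without the call to initParse(), a big-endian file read after a little-endian one would be decoded with the wrong byte order.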
TIFF MODULE
- TIFF files that had strip or tile offsets but no corresponding byte
counts were throwing an exception all the way to the top level. Now
they’re correctly being reported as invalid.
XML MODULE
- Cleaned up reporting of schemas. Added some small classes to replace
the use of string arrays for information structures. Made URI comparison
for local schema parameter case-independent. Resolved conflict between
“s” and “schema” parameters.
WAVE MODULE
- Some uncaught exceptions caused the module to throw all the way
back to JhoveBase and not report any result for certain defective
files. These now report the file as not well-formed.
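A rough sketch of the pattern, assuming a much simpler check than the real module performs: exceptions raised while reading a defective file are caught inside the module and turned into a "not well-formed" result instead of escaping to the caller. The class, the enum, and the header-only check are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical checker for illustration only; not JHOVE's WAVE module.
final class WellFormedSketch {
    enum Status { WELL_FORMED, NOT_WELL_FORMED }

    /** Checks only the 12-byte RIFF/WAVE header; a real module does far more. */
    static Status check(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] id = new byte[4];
            in.readFully(id); // throws EOFException if the file is truncated
            if (!"RIFF".equals(new String(id, StandardCharsets.US_ASCII))) {
                return Status.NOT_WELL_FORMED;
            }
            in.readInt(); // chunk size; its value and byte order are ignored in this sketch
            in.readFully(id);
            return "WAVE".equals(new String(id, StandardCharsets.US_ASCII))
                    ? Status.WELL_FORMED
                    : Status.NOT_WELL_FORMED;
        } catch (IOException | RuntimeException e) {
            // Report the defect as a result rather than letting the exception propagate.
            return Status.NOT_WELL_FORMED;
        }
    }
}
```

A truncated file makes readFully throw, and the catch block converts that into a result the caller can report.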
So far there isn’t, as far as I know, a book to promote and explain digital preservation to people who understand computers but aren’t part of the library and archiving world. That’s where I’m aiming this book. If you look at the Library of Congress’s
When is a PDF not a PDF?
Yesterday I was doing some experiments with Adobe Illustrator. According to some web sites, the CS5 version saves its files as PDF, though with the extension .AI. When you save a file, though, the options dialog has a checkbox labeled “Create PDF Compatible File.” I unchecked it and saved the file, then opened it in JHOVE. JHOVE says it’s perfectly good PDF; indeed, PDF/A. Then I tried opening it in Preview, and this is what it looked like:
If you don’t actually look at the file but trust the mere fact that it’s a PDF, you might put it into a repository and find out later on that it’s worthless as a PDF. What’s happening is that PDF can embed any kind of content, and this one embeds its native PGF data. Any PDF reader can open the file, but only an application that understands PGF can use its actual content. Anyone putting PDF into a repository should be aware of this risk.
It’s outside the scope of JHOVE to check whether embedded content is acceptable to PDF/A, so the claim that it’s correct PDF/A is probably spurious. It is, however, definitely legal PDF.
This type of situation helps to show why PDF/A-3 is a bad idea.