Tag Archives: software

Reaching out from L-Space

(This article is based on a presentation I made at Dartmouth’s Baker Library on February 7. I’m working from the outline rather than a transcript and have made some changes for the written medium. It’s split into two parts because of its length.)

Terry Pratchett wrote in Guards! Guards!:

It seemed quite logical to the Librarian that, since there were aisles where the shelves were on the outside then there should be other aisles in the space between the books themselves, created out of quantum ripples by the sheer weight of words. There were certainly some odd sounds coming from the other side of some shelving, and the Librarian knew that if he gently pulled out a book or two he would be peeking into different libraries under different skies.

All libraries everywhere are connected in L-space. All libraries. Everywhere.

Right now we’re in the L-space connection between developers and librarians, and the one between librarians and developers on the one hand and students and faculty on the other. L-Space can be a trap, though. If we stay inside it so much that we only talk to each other, we’re missing the whole point of the library’s existence. Pratchett’s Librarian falls a bit short on communication skills, since he’s an orangutan; then again, so do a lot of programmers. Maybe that’s why they call us code monkeys.

The issue of talking tech to non-techies isn’t just for programmers. Librarians are immersed in tech jargon these days: OPACs, MARC records, the OAIS model, etc. Communication levels aren’t just a binary issue. There’s a saying: “There are 10 kinds of people: those who understand binary and those who don’t.” It’s easy to split the world into “us” and “everyone else.” We all have our own sets of assumptions, which we may not realize are there. “Everyone knows” certain things, and those who don’t must be “hopelessly ignorant.” Everyone but the ignorant knows the difference between an application and a file format, Java and JavaScript, what happens in the browser and what happens in the server. It’s easy for any in-group to think of the rest of the world as just outsiders, and for programmers to think of everyone else as computer-illiterate.

However, all people have their own specialties and knowledge. Faculty clearly have their specialties. Students are more comfortable with some kinds of tech, like mobile devices, than many of us are. A good friend of mine is a grocery clerk, and she can teach me things about product codes and scanners. It’s a deadly error to assume that people are too dumb to grasp the benefits of something. This assumption can be harder to work past than actual user ignorance.

For example: I live in a condominium, which is very well-managed on the whole. At one owners’ meeting, though, I pointed out a problem with the PDF newsletters that were being sent by email. They’re sent as scanned images, not as text PDFs, which means they aren’t searchable and people with vision problems can’t take advantage of technologies such as text-to-speech. One of the board members told me I was entirely right, but the owners just weren’t capable of understanding such issues, so it wasn’t worth doing anything. He said this in front of the owners!

People are generally better at solving practical problems than at abstract reasoning. We evolved to survive, not to fit any specific paradigm of knowledge. People understand what they need to understand.

Successful communication happens when the message received equals the message sent. It requires that the parties have a common language, and it can happen only when they share an area of understanding.

Developers need to understand their audience. “Non-programmer” doesn’t mean “non-computer-literate.” Communication needs to be in terms which relate to the audience’s purpose. This comes in two levels for library developers: Talking to library people in library terms, and talking to library users in the terms in which they use the library. We need the help of library people when doing the second.

We’re dealing with a knowledgeable audience: students and faculty. They understand the Internet on a user level. They know how to look for books, even if they do it mostly on Amazon. Students in particular understand mobile devices. Talking below their level is as bad as going over their heads. We need to know what their world is, and we need to address its needs. We need to make the library fit the users’ world.

We have to offer something that’s worth trying out and make it easy to understand. It has to offer something they don’t already have. There’s a saying: “The Internet is the world’s largest library, with all the books on the floor.” The users should get the sense not just that the books are on shelves, but that they control the shelving, that they can organize information the way they need it.

On the whole and on average, users think less analytically than programmers. They don’t see all the consequences of a proposed fix. For instance: Users may complain about having to log back into a system too frequently. The obvious fix for them is to increase session length and time out less often, but they may not think of the loss of security that results, especially on public computers.

Users like DWIM systems — ones that “do what I mean.” These have to guess what the user means. When they guess right, it’s great, but it’s really annoying when they guess wrong. If you’ve ever had a search engine rewrite your search, you know what I mean. Try searching for “droid file tool,” looking for results about the UK National Archives’ file-identification tool called DROID. On Google, you’ll get a bunch of results for “Android.” That’s not the Droid you’re looking for.

Developers need to explain the consequences of a design choice, that getting X implies also getting Y. Figuring out what will really meet the users’ needs, as opposed to what they initially say they want, can be a challenge.

Again, two paths through L-space are needed here. Librarians need to talk the users’ language, and programmers need to talk the librarians’ and the users’ language. Librarians need to assist us in talking the users’ language.

(Continued in part 2)

Future paths for JHOVE

With the next SPRUCE Hackathon coming up, I’m thinking of possible ways to improve JHOVE that I might present there. The home page says, “This hackathon will therefore focus on unifying our community’s approach to characterisation by coordinating existing toolsets and improving their capabilities.” So aside from the general goal of improving JHOVE, coordination is a key point.

I’d posted earlier on some possible enhancements. These are all still possibilities. The focus on coordination brings up other things that could be done. In general, the API hasn’t been given as much thought as the command-line interface, and it could be improved without a huge amount of effort. Here are a few thoughts:

  • The API currently requires creating an output stream, such as an XML or text file. It should be possible to call JHOVE and get back an in-memory object. The RepInfo object already serves this purpose; it’s mostly a matter of writing a new method that returns it instead of writing a stream.
  • The caller has the choice of running one module or all the modules in the configuration file and can’t change their order. It might improve efficiency if the caller could specify a list indicating the modules to try and the order in which they should be applied. For instance, a caller might use DROID to get the signature and use this information to pick the module that JHOVE should run first.
  • There’s currently no provision for selecting which output items to generate, except for a few ad hoc options. Would a way to do this, eliminating items that are unwanted, be helpful?
  • Would any additional output handlers, such as JSON, be useful?
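To make the second point concrete, here is a minimal Java sketch of what caller-controlled module ordering might look like. Every name in it (ModuleOrderSketch, FormatModule, characterize) is hypothetical, not part of JHOVE’s real API, and the signature checks are deliberately crude. The caller passes a list of preferred module names, which could come from a DROID identification; those modules are tried first, and the rest follow in configuration order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names are part of JHOVE's actual API.
class ModuleOrderSketch {

    /** Stand-in for a format module: a name plus a crude signature test. */
    record FormatModule(String name, Predicate<byte[]> accepts) {}

    /** Stand-in for an in-memory result; "attempts" counts modules tried. */
    record Result(String moduleName, int attempts) {}

    // Insertion order plays the role of the configuration file's order.
    private final Map<String, FormatModule> registry = new LinkedHashMap<>();

    void register(FormatModule m) {
        registry.put(m.name(), m);
    }

    /** Tries the caller's preferred modules first, then the rest in configuration order. */
    Optional<Result> characterize(byte[] content, List<String> preferred) {
        List<FormatModule> order = new ArrayList<>();
        for (String name : preferred) {
            FormatModule m = registry.get(name);
            if (m != null && !order.contains(m)) {
                order.add(m); // unknown names are ignored
            }
        }
        for (FormatModule m : registry.values()) {
            if (!order.contains(m)) {
                order.add(m);
            }
        }
        int attempts = 0;
        for (FormatModule m : order) {
            attempts++;
            if (m.accepts().test(content)) {
                return Optional.of(new Result(m.name(), attempts));
            }
        }
        return Optional.empty();
    }

    static boolean startsWith(byte[] data, String magic) {
        byte[] prefix = magic.getBytes(StandardCharsets.US_ASCII);
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    static ModuleOrderSketch demo() {
        ModuleOrderSketch s = new ModuleOrderSketch();
        s.register(new FormatModule("AIFF", d -> startsWith(d, "FORM")));
        s.register(new FormatModule("PDF", d -> startsWith(d, "%PDF-")));
        s.register(new FormatModule("BYTESTREAM", d -> true)); // accepts anything
        return s;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        ModuleOrderSketch s = demo();
        System.out.println(s.characterize(pdf, List.of()));      // configuration order
        System.out.println(s.characterize(pdf, List.of("PDF"))); // caller's hint first
    }
}
```

With a PDF file, the configuration order tries AIFF before reaching PDF, while the hint lets PDF go first. Names in the preference list that don’t match any module are simply skipped, so a wrong hint falls back to the normal order.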

I’d welcome any thoughts on which of these, or what other changes, would help JHOVE to coordinate with other applications.

JHOVE statistics

Here are a few statistics on JHOVE, taken from SourceForge. The period I checked is from January 1, 2012, through January 29, 2013.

Total downloads, all files: 3,081
Downloads for Windows: 2,160
Linux: 350
Macintosh: 294

Top 5 countries:
United States: 831
Germany: 316
Spain: 235
France: 184
Canada: 129

Releases of JHOVE since I left Harvard: 2

Total income from JHOVE since I left Harvard: $12.70 (from sales of JHOVE Tips for Developers)

Optimizing FITS

January’s mostly over, and I’ve only posted three times to this blog. Files that Last has been keeping me busy. My posting should pick up again before long, once I get a draft out to first readers.

One thing I’ve been looking at, with an eye to the upcoming SPRUCE Hackathon, is what can be done with FITS. I’ve written up the results of some profiling experiments and quick attempts at optimization. FITS puts together a lot of tools for extracting file metadata, but there have been some complaints that it’s not as fast as it might be. The first results were surprising: the easiest way to get a small improvement was to factor out the initialization of namespace URIs for parsing XML. You wouldn’t think that would make any detectable difference, but the initialization of URIs in Xerces is surprisingly slow.
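The change amounts to hoisting a repeated, expensive initialization out of the per-call path. The sketch below shows the general pattern under stated assumptions: java.net.URI stands in for the Xerces URI class, and the namespace list and method names are illustrative, not taken from FITS.

```java
import java.net.URI;
import java.util.List;

// Illustrative only: java.net.URI stands in for the Xerces URI class,
// and the namespace list is an example, not FITS's actual code.
class NamespaceInitSketch {

    // Before: the same namespace strings are parsed on every call.
    static boolean isKnownNamespaceSlow(String ns) {
        List<URI> known = List.of(
                URI.create("http://www.w3.org/XML/1998/namespace"),
                URI.create("http://www.w3.org/2001/XMLSchema-instance"),
                URI.create("http://www.w3.org/1999/xlink"));
        return known.contains(URI.create(ns));
    }

    // After: the URIs are parsed once, when the class is initialized.
    private static final List<URI> KNOWN_NAMESPACES = List.of(
            URI.create("http://www.w3.org/XML/1998/namespace"),
            URI.create("http://www.w3.org/2001/XMLSchema-instance"),
            URI.create("http://www.w3.org/1999/xlink"));

    static boolean isKnownNamespace(String ns) {
        return KNOWN_NAMESPACES.contains(URI.create(ns));
    }

    public static void main(String[] args) {
        String ns = "http://www.w3.org/1999/xlink";
        // Same answer either way; only the amount of repeated work differs.
        System.out.println(isKnownNamespaceSlow(ns) + " " + isKnownNamespace(ns));
    }
}
```

The behavior doesn’t change; the only saving is the repeated parsing, which is why the gain is small but essentially free.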

Another possibility to explore is improving the connection between FITS and JHOVE. Even though JHOVE is intended for use as a callable library, among other things, it’s designed to write to an output file. Some simple changes would let it provide an in-memory response without writing a file, which would be more useful to an application like FITS.
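One way to structure this, sketched here with hypothetical names rather than JHOVE’s actual RepInfo and output handler classes, is to separate computing the result from writing it. The validation step returns an object; writing to a file becomes one optional consumer of that object, and an embedding application like FITS can skip it entirely. The PDF header check is a toy standing in for a real module.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ValidationReport and these methods are illustrative
// names, not JHOVE's real RepInfo or JhoveBase API.
class InMemoryResultSketch {

    /** In-memory result, playing the role RepInfo plays in JHOVE. */
    record ValidationReport(String format, boolean wellFormed, List<String> messages) {}

    /** Core step: compute the result and return it; no output is written here. */
    static ValidationReport validate(byte[] content) {
        List<String> messages = new ArrayList<>();
        boolean pdf = content.length >= 5
                && new String(content, 0, 5, StandardCharsets.US_ASCII).equals("%PDF-");
        if (!pdf) {
            messages.add("No PDF header found");
        }
        return new ValidationReport(pdf ? "PDF" : "unknown", pdf, List.copyOf(messages));
    }

    /** Optional step: an output handler serializes the report to any Writer. */
    static void writeText(ValidationReport r, Writer out) throws IOException {
        out.write("Format: " + r.format() + "\n");
        out.write("Status: " + (r.wellFormed() ? "Well-Formed" : "Not well-formed") + "\n");
        for (String m : r.messages()) {
            out.write("Message: " + m + "\n");
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ValidationReport r = validate("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        System.out.println(r.wellFormed());   // an embedding application uses the object directly
        StringWriter sw = new StringWriter(); // a command-line tool would pass a FileWriter instead
        writeText(r, sw);
        System.out.print(sw);
    }
}
```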

New E-booklet: JHOVE Tips for Developers

My new E-booklet, JHOVE Tips for Developers, is now for sale on Smashwords.com. This was in part a trial run for publishing Files that Last, but anyone who integrates JHOVE with other software will find it useful. The chapters are:

  1. JHOVE Basics: A readable guide to installing, configuring, and running JHOVE, with information about each of the modules.
  2. The JHOVE API: Necessary information for integrating the JHOVE JAR into an application.
  3. Custom output: How to create a new output handler, for producing output in a different format or for better integration with an embedding application.
  4. Modules: Some supplemental information to the online tutorial on writing a module.

It’s a “name your own price” book. If you work with JHOVE and will have a use for the booklet, or if you just want to support JHOVE development, I hope you’ll buy it and pay a price you consider reasonable.

JHOVE Tips for Developers: Call for proofreaders

As a practice run for publishing Files that Last on Smashwords, I’ve put together a small but hopefully useful e-booklet, JHOVE Tips for Developers, which I’m planning to put up there on a “choose your own price” basis. This will help me work out the process of creating the book on a small scale, and maybe it will buy me a Whopper and fries.

For a book of this sort I obviously can’t afford paid proofreading, but I’m hoping one or two people might look it over before I submit the book. You can get the draft as a PDF here.

I’d offer you a free copy in return, but you can get that anyway. What I can do is offer people who give useful feedback credit in the book, as well as my personal thanks.

JHOVE 1.9

I’ve put up JHOVE 1.9 on the SourceForge site today. I think it’s the
least buggy version ever. Please let me know if I’m wrong.

Release notes:

GENERAL

  1. Jhove.java and JhoveView.java now get their version information from
    JhoveBase.java. Previously it was kept redundantly in three places, and
    sometimes they didn’t all get updated for a new release, as happened in 1.8.
  2. ConfigWriter was in the package edu.harvard.hul.ois.jhove.viewer, which
    caused a NoClassDefFoundError if non-GUI configurations didn’t include
    JhoveViewer.jar in the classpath. It’s been moved to
    edu.harvard.hul.ois.jhove.
  3. Added script packagejhove.sh and made md5.pl part of the CVS repository
    to make packaging for delivery easier.
  4. jhove.bat now simply uses the Java command rather than requiring
    the user to set up the Java path.
  5. JhoveView.jar and jhove (the top-level shell script) are now forced
    by ant to be executable so there are no mistakes.
  6. A warning message is now given for an invalid buffer size string, and
    the minimum buffer size is 1024.
  7. Configuration file code for adding handlers and giving init strings
    to modules was an awful mess that never could have worked. Major repairs done.
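Item 6 can be illustrated with a small parsing routine. This is a sketch, not JHOVE’s code: the method name and the default value are made up, and it assumes that a value below the minimum is raised to 1024 with a warning rather than rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of release-note item 6, not JHOVE's code. The method
// name is made up, and raising small values to the minimum (instead of
// rejecting them) is an assumption.
class BufferSizeSketch {

    static final int MIN_BUFFER_SIZE = 1024;

    /** Parses a buffer size setting, warning about bad values instead of failing. */
    static int parseBufferSize(String value, int defaultSize, Consumer<String> warn) {
        if (value == null) {
            return defaultSize; // not configured
        }
        int size;
        try {
            size = Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            warn.accept("Invalid buffer size \"" + value + "\"; using " + defaultSize);
            return defaultSize;
        }
        if (size < MIN_BUFFER_SIZE) {
            warn.accept("Buffer size " + size + " is below minimum; using " + MIN_BUFFER_SIZE);
            return MIN_BUFFER_SIZE;
        }
        return size;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        int defaultSize = 131072; // arbitrary default for this example
        System.out.println(parseBufferSize("65536", defaultSize, warnings::add));
        System.out.println(parseBufferSize("lots", defaultSize, warnings::add));
        System.out.println(parseBufferSize("100", defaultSize, warnings::add));
        warnings.forEach(System.out::println);
    }
}
```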

AIFF MODULE

  1. If an AIFF file was found to be little-endian, the module instance
    would stay in little-endian mode for all subsequent files. This
    has been fixed.
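The AIFF bug is a familiar problem with reusable module instances: per-file state stored in an instance field leaks into the next file unless it is reset. Here is a minimal sketch of the fix, with hypothetical names; the boolean flag stands in for however the module actually decides that a file is little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not JHOVE's AIFF module code. One module instance is
// reused for many files, so per-file state must be reset for each file.
class PerFileStateSketch {

    // Per-file state kept in an instance field, as in the bug described above.
    private ByteOrder order = ByteOrder.BIG_ENDIAN; // AIFF's usual byte order

    /**
     * Reads the first 4 bytes of a file as an int. The littleEndian flag
     * stands in for however the module detects a little-endian file.
     */
    int readFirstInt(byte[] data, boolean littleEndian) {
        order = ByteOrder.BIG_ENDIAN; // the fix: reset before each file
        if (littleEndian) {
            order = ByteOrder.LITTLE_ENDIAN; // applies to this file only
        }
        return ByteBuffer.wrap(data, 0, 4).order(order).getInt();
    }

    public static void main(String[] args) {
        PerFileStateSketch module = new PerFileStateSketch();
        byte[] data = {0, 0, 0, 1};
        System.out.println(module.readFirstInt(data, true));  // 16777216 (little-endian file)
        System.out.println(module.readFirstInt(data, false)); // 1 (next file, big-endian again)
    }
}
```

Without the reset line, the second call would also return 16777216, which is exactly the symptom in the release note.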

TIFF MODULE

  1. TIFF files that had strip or tile offsets but no corresponding byte
    counts were throwing an exception all the way to the top level. Now
    they’re correctly being reported as invalid.
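The TIFF fix follows a general rule for validators: a structural inconsistency in the file should become a validation message, not an exception. Below is a hypothetical sketch of such a check, with plain arrays standing in for the parsed StripOffsets/TileOffsets and StripByteCounts/TileByteCounts values. The length comparison is an extra illustrative check, not something the release note mentions.

```java
// Hypothetical sketch, not JHOVE's TIFF module: the class, record, and
// messages are illustrative. Arrays stand in for the parsed tag values.
class StripCheckSketch {

    record Check(boolean valid, String message) {}

    /** Checks that strip (or tile) offsets come with matching byte counts. */
    static Check checkOffsets(long[] offsets, long[] byteCounts) {
        if (offsets == null) {
            return new Check(true, null); // nothing to pair up here
        }
        if (byteCounts == null) {
            // In the bug described above, this case ended in an exception.
            return new Check(false, "Offsets present but byte counts missing");
        }
        if (byteCounts.length != offsets.length) {
            return new Check(false, "Offsets and byte counts differ in length");
        }
        return new Check(true, null);
    }

    public static void main(String[] args) {
        System.out.println(checkOffsets(new long[] {8, 1032}, null));
        System.out.println(checkOffsets(new long[] {8, 1032}, new long[] {1024, 1024}));
    }
}
```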

XML MODULE

  1. Cleaned up reporting of schemas. Added some small classes to replace
    the use of string arrays for information structures. Made URI comparison
    for the local schema parameter case-independent. Resolved conflict between
    “s” and “schema” parameters.
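Case-independent matching of namespace URIs is easy to get with a map that uses a case-insensitive comparator. The sketch below is illustrative only; the class and method names and the example URI and path are hypothetical, not the XML module’s actual code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names, URI, and path are illustrative, not JHOVE's
// XML module. Namespace URIs are matched without regard to case.
class LocalSchemaSketch {

    private final Map<String, String> localSchemas =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void addLocalSchema(String namespaceUri, String localPath) {
        localSchemas.put(namespaceUri, localPath);
    }

    /** Returns the local copy for a namespace URI, or null if none is configured. */
    String lookup(String namespaceUri) {
        return localSchemas.get(namespaceUri);
    }

    public static void main(String[] args) {
        LocalSchemaSketch s = new LocalSchemaSketch();
        s.addLocalSchema("http://www.example.org/MySchema", "/schemas/myschema.xsd");
        System.out.println(s.lookup("http://www.example.org/myschema"));
    }
}
```

Strictly speaking, RFC 3986 treats only the scheme and host of a URI as case-insensitive, so matching the whole string this way trades strictness for convenience.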

WAVE MODULE

  1. Some uncaught exceptions caused the module to throw all the way
    back to JhoveBase and not report any result for certain defective
    files. These now report the file as not well-formed.
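The WAVE fix applies the same idea at the module boundary: any unchecked exception raised while parsing one file is caught and turned into a “not well-formed” result for that file. Here is a hypothetical sketch; the toy parser only reads a RIFF header, and the Outcome record stands in for RepInfo.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not JHOVE's WAVE module: the toy parser below only
// reads a RIFF header, and the Outcome record stands in for RepInfo.
class CatchAtModuleSketch {

    enum Status { WELL_FORMED, NOT_WELL_FORMED }

    record Outcome(Status status, String message) {}

    /** Module entry point: every file gets a result, even a badly damaged one. */
    static Outcome parse(byte[] data) {
        try {
            parseHeader(data);
            return new Outcome(Status.WELL_FORMED, null);
        } catch (RuntimeException e) {
            // Report the defect for this file instead of letting the
            // exception propagate to the caller.
            return new Outcome(Status.NOT_WELL_FORMED, e.toString());
        }
    }

    /** Toy parser: reads the chunk ID and the little-endian RIFF chunk size. */
    static void parseHeader(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        byte[] id = new byte[4];
        buf.get(id); // BufferUnderflowException if the file is too short
        if (!new String(id, StandardCharsets.US_ASCII).equals("RIFF")) {
            throw new IllegalStateException("Missing RIFF header");
        }
        long size = Integer.toUnsignedLong(buf.getInt());
        if (size > buf.remaining()) {
            throw new IllegalStateException("Chunk size exceeds file length");
        }
    }

    public static void main(String[] args) {
        byte[] ok = {'R', 'I', 'F', 'F', 4, 0, 0, 0, 'W', 'A', 'V', 'E'};
        System.out.println(parse(ok).status());
        System.out.println(parse(new byte[] {'R', 'I'}).status());
    }
}
```

Catching RuntimeException this broadly guarantees a result for every file, at the cost of possibly hiding a programming error; keeping the exception text in the message makes such cases easier to track down.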

And … JHOVE 1.9b3

Lately I’ve been writing a user guide for JHOVE as part of an upcoming
book. This means going through all the features to see how they really
work, and this has turned up a number of bugs. Among the latest fixes
are: (1) If the AIFF module encounters a little-endian file, it
treats all subsequent files as little-endian whether they are or not.
(2) Certain errors in WAVE files throw an exception from the module
instead of reporting that the file isn’t well-formed. (3) The XML
module’s “s” and “schema” parameters conflicted, with “schema” being
treated as both, and there was a problem with schema URIs with
upper-case characters.

Version 1.9b3 should fix all of these. Hopefully I won’t find anything
else that needs fixing soon, so we can finally have a 1.9 release. But
if there are any problems with this beta, please let me know!

JHOVE 1.9b2

JHOVE 1.9b2 is up, fixing issues with the configuration file. The code for editing the configuration file from the GUI was just completely broken, but I think it’s fixed now. I can’t imagine anyone was ever trying to add init strings to modules (none of the standard ones use one anyway) or add handlers using the GUI, or someone would already have noticed. But I couldn’t stand having it not fixed, so the new build is there.