Category Archives: News

UDFR news

The Library of Congress’s digital preservation blog has an update on UDFR (Unified Digital Formats Registry). Holding a meeting of stakeholders may not be much of a “milestone,” but it shows the effort is still alive.

JPEG 2000 summit

It’s a bit late to get there if you didn’t already know about it, but the Library of Congress is hosting a JPEG 2000 summit in Washington today and tomorrow. Hopefully some interesting materials will be made public.

Workshop on preservation and JHOVE2

A workshop on digital preservation and JHOVE2 will be held at FAO (Food and Agriculture Organization of the United Nations) in Rome, Italy on May 23-27. Presenters will include Stephen Abrams and Perry Willett from California Digital Library, Tom Cramer from Stanford, and Sheila Morrissey from Portico. Days 1 and 2 (on preservation) are free; there is a $300 fee for the JHOVE2 tutorial.

JHOVE2 2.0.0

JHOVE2 2.0.0 has been released. Supported formats are ICC Color Profile, SGML, Shapefile, TIFF, UTF-8, WAVE, and XML. The first three of these aren’t supported by the old JHOVE. There’s also a Zip module which validates the files contained in a Zip archive, but not the Zip file itself. JHOVE2 can be downloaded in Zip or Gzip form, or from the Mercurial repository.

Congratulations to everyone who worked on this project!
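
To picture what "validating the files inside a Zip, but not the Zip file itself" means, here is a minimal Python sketch. It is not JHOVE2 code and says nothing about how the JHOVE2 Zip module is implemented; the function name `check_members` and the choice of XML well-formedness as the per-file check are assumptions made purely for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_members(zip_bytes):
    """Check each member of a Zip archive for XML well-formedness.

    Only the contained files are examined. The container is opened in
    order to read them, but no conformance check is made on the Zip
    structure itself.
    """
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            try:
                ET.fromstring(archive.read(name))
                results[name] = True
            except ET.ParseError:
                results[name] = False
    return results


# Build a small archive in memory: one well-formed member, one broken one.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("good.xml", "<a><b/></a>")
    archive.writestr("bad.xml", "<a><b></a>")

results = check_members(buffer.getvalue())
print(results)
```

The report has one verdict per member and none for the archive as a whole, which is the distinction the release note draws.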

Preservation Week

April 24-30 is Preservation Week.

EXI is a W3C Recommendation

Efficient XML Interchange, or EXI, the controversial binary representation of XML, is now a W3C Recommendation. Unlike approaches which apply standard compression schemes to XML (e.g., OpenOffice’s XML plus ZIP), EXI represents the structure of an XML document in a binary form. For some, this adds unnecessary obscurity to a format based on (somewhat) human-readable text. Others consider it a necessary step to reduce the bloat and slow processing of text XML.

The press release says: “EXI is a very compact representation of XML information, making it ideal for use in smart phones, devices with memory or bandwidth constraints, in performance sensitive applications such as sensor networks, in consumer electronics such as cameras, in automobiles, in real-time trading systems, and in many other scenarios.”

There are some things that can be done in XML but not in EXI. The W3C document says: “EXI is designed to be compatible with the XML Information Set. While this approach is both legitimate and practical for designing a succinct format interoperable with XML family of specifications and technologies, it entails that some lexical constructs of XML not recognized by the XML Information Set are not represented by EXI, either. Examples of such unrepresented lexical constructs of XML include white space outside the document element, white space within tags, the kind of quotation marks (single or double) used to quote attribute values, and the boundaries of CDATA marked sections.” Whether this is important will doubtless continue to be the subject of heated debate.
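
The kind of information the W3C passage describes as lost can be seen with any ordinary XML parser. The sketch below is not EXI and does not use an EXI library; it only assumes that Python's standard `xml.etree.ElementTree` gives a rough stand-in for an Infoset-level view. The two sample documents and the helper `infoset_view` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in lexical details named in the quote:
# quote style, white space inside a tag, a CDATA section boundary, and
# white space after the document element.
DOC_A = """<note lang="en"><body><![CDATA[a < b]]></body></note>"""
DOC_B = """<note   lang='en' ><body>a &lt; b</body></note>\n\n"""


def infoset_view(xml_text):
    """Reduce a document to nested (tag, attributes, text, children) tuples."""
    def walk(elem):
        return (elem.tag, dict(elem.attrib), elem.text, [walk(c) for c in elem])
    return walk(ET.fromstring(xml_text))


same = infoset_view(DOC_A) == infoset_view(DOC_B)
print(same)
```

Once parsed, the two documents are indistinguishable at this level, so a format defined over that view has nothing left from which to reproduce the original quoting or CDATA boundaries.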

JHOVE2 tutorial at IS&T Archiving

Forwarded from Stephen Abrams:

The JHOVE2 project team will be presenting a one-day tutorial on the use of JHOVE2 at the IS&T Archiving conference on May 16.

http://www.imaging.org/ist/conferences/archiving/index.cfm

Description

JHOVE2 is an open source framework and application for next generation format-aware characterization of digital objects. Characterization is the process of deriving representation information about a formatted digital object that is indicative of its significant nature and useful for purposes of classification, analysis, and use in digital curation, preservation, and repository contexts. JHOVE2 builds on the success of the original JHOVE characterization tool by addressing known limitations and offering significant new functions, including: object-focused, rather than file-focused, characterization; signature-based file level identification using DROID; aggregate-level identification based on configurable file system naming conventions; rules-based assessment to support determinations of object acceptability in addition to validation conformity; and extensive user configuration options.

The 2011 release of JHOVE2 represents the availability of a significant new tool for digital preservation; this course will provide a broad overview of JHOVE2, as well as detailed information on its functionality, architecture, use in local workflows, and open source community.

Course Objectives:

This short course will give attendees both a broad conceptual overview and detailed information on JHOVE2, and equip them to use the open source tool in their local environments. Specifically, the course will:

  • Define the role of file characterization, including identification, feature extraction, validation, and assessment, in digital curation and preservation workflows.
  • Review the functionality of the JHOVE2 application, including the significant enhancements relative to JHOVE, and new capabilities based on object- and aggregate-level characterization.
  • Detail the architecture, componentry, design patterns and Java APIs of the JHOVE2 framework, as well as the configuration options for plug-in modules, characterization strategies and results formatting.
  • Demonstrate the use of JHOVE2’s new rule-based assessment capabilities and how to integrate them into local workflows to determine object acceptability.
  • Cover the community framework for the project, and how individual institutions can contribute both new format modules and resources to help extend and sustain the open source project.

Intended Audience:

This course is designed for technologists and practitioners (developers, managers, analysts and administrators) engaged in digital curation, preservation, and repository activities, and whose work is dependent on an understanding of the format and pertinent characteristics of digital assets.
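
The four characterization steps named in the course objectives (identification, feature extraction, validation, and assessment) can be pictured with a toy Python sketch. It does not use JHOVE2 or reflect its Java API. Every name in it (`Characterization`, `identify`, `characterize`, the rule functions) is hypothetical. The signature table is deliberately tiny, and validation is implemented only for UTF-8 text.

```python
from dataclasses import dataclass, field


@dataclass
class Characterization:
    format_id: str
    is_valid: bool
    features: dict = field(default_factory=dict)
    acceptable: bool = False


def identify(data: bytes) -> str:
    """Identification: match a few well-known leading byte signatures."""
    if data.startswith(b"II*\x00") or data.startswith(b"MM\x00*"):
        return "TIFF"
    if data.startswith(b"RIFF") and data[8:12] == b"WAVE":
        return "WAVE"
    return "UTF-8 (presumed)"


def characterize(data: bytes, rules) -> Characterization:
    format_id = identify(data)
    features = {"size_bytes": len(data)}   # feature extraction
    is_valid = False                       # stays False where no validator exists
    if format_id == "UTF-8 (presumed)":
        try:
            text = data.decode("utf-8")    # validation: strict decoding
            is_valid = True
            features["characters"] = len(text)
            features["lines"] = len(text.splitlines())
        except UnicodeDecodeError:
            is_valid = False
    result = Characterization(format_id, is_valid, features)
    # Assessment: local policy rules decide acceptability, separately
    # from whether the object is valid against its format.
    result.acceptable = all(rule(result) for rule in rules)
    return result


rules = [
    lambda r: r.is_valid,
    lambda r: r.features["size_bytes"] <= 1_000_000,
]

good = characterize("héllo\nworld\n".encode("utf-8"), rules)
bad = characterize(b"\xff\xfe\x00bad", rules)
print(good)
print(bad)
```

Keeping the rules separate from validation mirrors the distinction the description draws between validation conformity and object acceptability: a file can be valid and still fail a local policy, such as the size limit here.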