RLG DigiNews, ISSN 1093-5371

Feature Articles

Digital Object Library Products

William Lund
Harold B. Lee Library
Brigham Young University
bill_lund@byu.edu

Introduction

Building digital collections has become a widespread activity among academic libraries. The Harold B. Lee Library at Brigham Young University is in the midst of incorporating digital content and services into the resources available to our patrons within the larger context of the library. Some large academic libraries are building their own digital library systems or partnering with a vendor to create one, whereas the majority of libraries do not have the resources to build and support their own systems. This article will review the Lee Library's criteria and requirements for a digital library management system, the vendor products currently available, and the products we selected to accommodate our growing digital collections.

Just as the Lee Library is more than its physical collections, a digital library should be more than collections of objects in digital format. To avoid confusing the larger issue of what a digital library should be, we have chosen to call a collection of objects stored in an encoded digital form a "digital objects library" or "digital objects archive." A digital objects library will typically store images, text documents, sound, video, and animations, and provide descriptive metadata for each object, along with search capabilities and access management facilities.

The Lee Library's History with Digital Object Libraries

In 1998, the Lee Library contracted with Sirsi Corporation of Huntsville, Alabama, to provide Unicorn, an automated library system. At that time, Sirsi also provided Hyperion, a digital object library, or digital media archive, as Sirsi prefers to call it. After completing the bulk of the migration from our former automated library system, we began experimenting with Hyperion, discovering its strengths and weaknesses. Although Hyperion shows promise, particularly for organizing and displaying written documents, we chose earlier this year to investigate other vendors' products, having learned through experience what features were needed for our envisioned digital collections. In the end, we determined that none of the existing products fulfilled all the needs of our proposed collections. As will be discussed here, we have chosen to retain Hyperion, primarily for our document collections, and, during the fall of this year, to build collections with two other products that exhibit characteristics we feel are needed for the collections to be of greatest value to our patrons. A review of our requirements analysis (Table 2) and our list of available digital library products (Table 1) may be of use to other libraries and archives just embarking on the search for a digital object library product.

Digital Object Library Products

From our experience with Hyperion, we developed a list of features we desired in a digital library product, recognizing that no one product was likely to incorporate every feature. List in hand, we began our review by attempting to identify as many digital object library products on the market as possible. We quickly discovered that there was a gray area between digital object libraries and digital asset management tools. Whereas digital library products tended to focus on wide distribution of content to a varied population, asset management tools tended to aim at a small group of users with a consistent set of requirements. Most of the asset management products also required a client program that, in our opinion, made them less useful for a large and diverse population. For these reasons, we decided to eliminate asset management tools from consideration. However, they are listed in Table 1 along with the digital object library products, including those provided by automated library system (ALS) vendors. Some of these vendors do not provide a separate or stand-alone digital object library product. In some cases, their systems allow digital objects hosted on a separate Web server to be linked using an OPAC tagged field (the MARC 856, sub-field u) or to be integrated with a viewer. Since this feature is common to most automated library systems, we did not include these systems in our list of digital object libraries to be reviewed. Those products that are not in full release to all customers as of this writing (e.g., the DigiTooLibrary product from Ex Libris and iLibrary from Epixtech) were also excluded from our evaluation. The products that remained in our review are listed in bold against a gray background.
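
As a rough illustration of the MARC 856 linking mentioned above, the following Python sketch pulls the URL out of sub-field u of an 856 field written in a plain-text form. The sample record lines, the "$" sub-field delimiter, the example.edu address, and the function name extract_856_urls are all assumptions made for illustration; each ALS stores and exposes MARC records in its own way.

```python
def extract_856_urls(record_lines):
    """Return the sub-field u values of every 856 field in a text-form MARC record."""
    urls = []
    for line in record_lines:
        tag, _, rest = line.partition(" ")
        if tag != "856":
            continue
        # Sub-fields are assumed to start with "$" followed by a one-letter code.
        for chunk in rest.split("$")[1:]:
            if not chunk:
                continue
            code, value = chunk[0], chunk[1:].strip()
            if code == "u":
                urls.append(value)
    return urls


# Hypothetical record: a title field and one 856 field pointing at an external server.
sample_record = [
    "245 10 $a Pioneer diary (hypothetical title)",
    "856 40 $u http://digital.example.edu/objects/1234 $z View the digitized diary",
]
print(extract_856_urls(sample_record))
```

The sketch would break on a URL that itself contains "$"; a production system would read the binary MARC structure rather than a display form.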

The following list of products should not be considered an exhaustive list of all possible asset management or digital object library products. It represents an attempt at identifying products that may have been useful in BYU's environment. An alternate, somewhat longer list of products can be found in the Integrated Library System Reports site under "Vendor Info" in the navigation bar.

Table 1. Digital Object Library and Asset Management Products

| Company | Product Name | Product Type | Examples | Comments |
|---|---|---|---|---|
| Artesia | Teams | Asset Management | | |
| Blue Angel | MetaStar | Asset Management | www.pictureaustralia.org | |
| Canto | Cumulus | Asset Management | | |
| CARL | Photo Imaging Module | Automated Library System; Digital Object Library | www.aclin.org. Click on "Other"; then select "Denver Public Library Western History Photos" and search on the keyword "Indian." | |
| CONTENTdm | CONTENTdm | Digital Object Library | contentdm.com | |
| Cuadra Associates | Image Management | Automated Library System; Digital Object Library | | |
| DRA | | Automated Library System | | In the process of being purchased by Sirsi. |
| eMotion | Media Partner | Asset Management | | |
| Endeavor | ENCompass | Automated Library System; Digital Object Library | www.ksu.edu (KSU's Digital Resources) | Although Endeavor lists 8 customers for ENCompass, none of the sites appears to be in full use of the product at this time. See www.endinfosys.com. |
| Epixtech | iLibrary | Automated Library System; Digital Object Library | | This product is not shipping at this time. |
| Ex Libris | DigiTooLibrary | Automated Library System; Digital Object Library | | This product is not shipping at this time. |
| Fotoware | Fotoware | Asset Management | | |
| GEAC | | Automated Library System | | |
| IBM | DB2 Library | Digital Object Library | miless.uni-essen | |
| Innovative Interfaces | Image Linking | Automated Library System | | |
| Luna | Insight | Digital Object Library | www.davidrumsey.com | The Java client for either Microsoft Windows or MacOS can be downloaded from www.davidrumsey.com. |
| New Zealand Digital Library Project | Greenstone (see SourceForge for product summary) | Digital Object Library | www.nzdl.org | |
| North Plains | Telescope | Asset Management | | |
| Sirsi | Hyperion | Automated Library System; Digital Object Library | Go to the Lee Library's home page and select the link to the "Digital Library." Click on "Browse" to view the collections. | |
| SRZ | Agora | Digital Object Library | | |
| University of Michigan | DLXS | Digital Object Library | moa.umdl.umich.edu | |
| VTLS | Visual MIS | Automated Library System; Digital Object Library | eagle.vsla.edu | |

Requirements Analysis and Product Evaluation

The following evaluation is entirely subjective from the perspective of the Lee Library. All libraries have encountered the problem in which a product appears on paper to do what is needed but in reality provides the functionality in an unanticipated and less useful fashion. Our evaluation of a product is based on whether it does what we needed. In addition to specifically reflecting BYU's requirements and priorities, this evaluation information will become dated as products evolve. In selecting a product, libraries should evaluate available products using an explicit list of agreed-upon requirements and the most current information about the products.

We evaluated the following products by visiting vendor Web sites, viewing collections managed by the product, speaking to current customers, and performing some on-site evaluations. We divided our list of requirements (1) into three sections: collection building and management; patron functions; and system administration. Collection building and management deals primarily with library staff functions, including adding objects and metadata to the archive. Patron functions are those that directly affect the patron's use and perception of the product and collections. System administration deals with the hosting of the product on the system hardware and operating system. Along with each set of features are comments on some of the high points and low points of the products we reviewed.

Table 2: Feature Requirements and Product Analysis

Collection Building and Management
A. Authority Control BYU Assessment

1. Does the system provide authority control for selected metadata?

2. Which authority control databases are supported?

3. Does the product create a local thesaurus and have the ability to add additional thesauruses as needed?

4. How is the authority control database maintained?

CONTENTdm is one of the few products that provides some type of authority control on the metadata. Selected metadata fields can be placed under the control of a list created by the collections administrator. The product does not directly integrate any standard authority control databases.
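
The controlled-list approach described above can be pictured with a small Python sketch. It is not CONTENTdm's implementation; the field name, the vocabulary terms, and the function validate_record are hypothetical and only show the general idea of checking selected metadata fields against an administrator-maintained list.

```python
# Hypothetical controlled vocabulary maintained by a collection administrator.
CONTROLLED_LISTS = {
    "subject": {"Overland journeys", "Pioneers", "Diaries"},
}


def validate_record(record, controlled_lists=CONTROLLED_LISTS):
    """Return (field, value) pairs whose values are missing from the controlled list."""
    problems = []
    for field, allowed in controlled_lists.items():
        for value in record.get(field, []):
            if value not in allowed:
                problems.append((field, value))
    return problems


# "title" is not under control, so only the "subject" values are checked.
record = {"title": ["Diary of a journey west"], "subject": ["Pioneers", "Wagons"]}
print(validate_record(record))
```

Fields without a controlled list are accepted as entered, which mirrors the idea that only selected fields are placed under control.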

The ALS products listed earlier, all of which support using the MARC 856 field to point to an external data source, indirectly provide authority control through their own OPAC. For example, although Hyperion does not directly support authority control in its internal metadata, it is possible to link from an OPAC record in Unicorn, Sirsi's ALS product that is under authority control, to a digital object in Hyperion.

B. Foreign Languages BYU Assessment

1. Does the product support the full ASCII character set, including non-English characters in metadata and textual files?

2. Does the product support the full ALA character set?

3. Does the product support the full Unicode character set?

4. For records containing diacritics or non-Roman characters, how does the product display these to the browser?

5. How does the product know whether the patron's browser is set up to handle diacritics or non-Roman characters?

6. What does the product do if the patron's browser cannot handle diacritics or non-Roman characters?

The star performer in support of foreign languages is Greenstone, which supports Unicode and directly provides for Chinese and Arabic, in addition to the languages using the Roman character set. Other vendors who support Unicode are VTLS and Endeavor. Ex Libris's DigiTooLibrary product is slated to support Unicode and the display of Latin, Greek, Cyrillic, Hebrew, and Arabic characters, when released.

At the other end of the spectrum is Sirsi's Hyperion, which supports only the English alphabet, although there are plans to market Hyperion in Europe, requiring the support of the full Roman character set with diacritics.

CONTENTdm does not appear to make any effort to accept or display diacritics; however, it will accept HTML-encoded diacritics when entered manually (for example, "&auml;" for a lower-case "a" umlaut in German).
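
To make the HTML encoding of diacritics concrete, here is a minimal Python sketch using only the standard library. It shows how a named entity such as "&auml;" decodes to the accented character, and how text could be converted to ASCII-safe character references before manual entry. The helper name to_ascii_html and the sample word are assumptions for illustration; nothing here reflects how CONTENTdm itself processes input.

```python
import html


def to_ascii_html(text):
    """Replace each non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Decoding: the named entity becomes a lower-case "a" with an umlaut.
decoded = html.unescape("M&auml;rchen")
print(decoded)

# Encoding: the umlaut becomes a numeric reference that any ASCII-only field can hold.
encoded = to_ascii_html(decoded)
print(encoded)
```

Numeric references are used for encoding because they cover every Unicode character, whereas named entities exist only for a limited set.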

The Insight Java client supports Unicode; however, there is limited font support for displaying non-Roman languages.

Internet Explorer versions 5.1 and later, which support the UTF-8 (2) encoding, do a good job of displaying Unicode-based content.

C. Metadata BYU Assessment

1. Does the system include metadata for each object?

2. Does the metadata follow existing standards? Which standards does it follow (MARC, Dublin Core, specific built-in XML DTDs such as EAD or TEI, or administrator defined DTDs)?

3. Does the product allow the creation of separate formats or templates for each project?

4. Is there a seamless interface between the product and an OPAC? This can take the form of direct access via a URL to objects in the archive. The object URL would then be included in the 856 field of the OPAC.

Only two products appear to use hierarchical metadata: DLXS and ENCompass. ENCompass uses XML as its standard.

Some of the products appear to have a fixed metadata standard that cannot easily be modified by the system or collection administrators. For instance, when DigiTooLibrary is released it may only support MARC and Dublin Core. DLXS also appears to have a set metadata standard. All other products allow a definition of metadata, at least by collection.
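
For readers unfamiliar with Dublin Core, one of the standards named in the questions above, the following Python sketch builds a small record with the standard library's XML tools. The element names (title, creator, type, format) and the namespace URI come from the Dublin Core element set; the wrapper element name "record", the sample values, and the function build_dc_record are assumptions for illustration and do not represent any reviewed product's export format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a flat record element with one dc:* child per (name, value) pair."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical metadata for a single digitized document.
dc_record = build_dc_record([
    ("title", "Diary of a journey west (hypothetical)"),
    ("creator", "Unknown"),
    ("type", "Text"),
    ("format", "application/pdf"),
])
print(ET.tostring(dc_record, encoding="unicode"))
```

A flat record like this contrasts with the hierarchical metadata mentioned above, where elements can nest to describe the parts of a complex object.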

Linking between an OPAC, via the MARC 856 field, and a digital object library may be accomplished directly when the library uses an external Web server for storage.

D. Object Presentation BYU Assessment

1. Does the product permit flexible presentation of the objects using templates assigned to collections or individual objects?

2. Does the product allow objects to be organized into groups or collections for presentation?

3. Does the product allow multiple digital objects to be presented as a single object? For example, can all of the pages of a journal be presented as a single object in the proper order?

From our perspective, the best digital library presentation is from Insight by Luna. Insight provides a very flexible presentation to the patron, allowing him or her to gather objects, save them, export them to an HTML format, and create slide shows. Insight is used by many of the leading museums.

CONTENTdm, although lacking in many of the highly usable patron presentation features of Insight, does provide a highly flexible template-driven presentation of the collection and search results page. This allows the collection administrators to customize the overall look and feel of the collection's presentation.

Regarding creating groups of objects for later presentation, Insight and CONTENTdm both provide mechanisms to support this. Insight stores this information on the server, meaning that both the browser and the Java clients have access to the grouped objects. CONTENTdm requires the use of a Java client to create groups on the local workstation.

Where an object consists of many parts, such as a document, they need to be kept together and in order. Hyperion by Sirsi and CONTENTdm both accommodate this.

E. Object Types BYU Assessment

1. Does the product accept and deliver text files as: PDF as a file of images, PDF as a text file, PDF as Acrobat+ textual/image file, flat text, XML encoded text with assigned DTD, and HTML?

2. Does the product accept and deliver images as GIF, JPEG, TIFF, and PNG?

3. Does the product accept and deliver streaming data such as MPEG3, WAV, QuickTime, QuickTime VR, and Real?

4. For object types not supported in the product, or for objects that require a specialized service (e.g., streaming), does the product support creating a link to an external server?

5. For objects external to the product, does the product verify the existence of the object on the external server?

Those products, such as CONTENTdm, that use an external Web server to store and deliver the objects typically will support any type of file that the Web server can support. With the use of redirections, it would be possible to support virtually any type of customized server, such as a streaming video server.

Products that store content internally may only support specific formats. For example, Hyperion by Sirsi supports PDF, Microsoft Word, ASCII, HTML, SGML, OCR, WAV, MIDI, Real Audio, RAM, MPEG, AVI, MOV, and QuickTime formats. Insight by Luna supports TIFF, PCD, JPEG, BMP, PICT, RASTER, TARGA, PCX, and MrSID. Notable by their absence in Insight are text formats such as PDF.

F. Rights Management BYU Assessment

1. Does the product provide access to content based on a user logon?

2. Is a guest logon possible for access to content without rights management issues?

3. Is rights management at the object or collection level? Can individual objects within a collection have distribution rights different from the rest of the collection?

4. Is the rights management system granular to the individual level? Are rights assigned to individuals based on their membership in one or more assigned groups?

Hyperion by Sirsi provides strong rights management features. Each collection, sub-collection, and object may be assigned a level of access defined by whether the user has no affiliation, has general affiliation, or is a staff member of the library's institution. With an optional add-on package, objects may be assigned to users or groups of users. Insight by Luna and IBM DB2 Library also have strong rights management features.

Those products that use external file storage typically have rights management features no stronger than what is provided by the Web server. Typically, this means that directories of objects (not individual objects) can be filtered on the requesting IP address or the server can require a user name and password that is administered separately from the digital object library product. CONTENTdm is an example of this.
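
The directory-level, address-based filtering described above can be sketched in a few lines of Python. This is an illustration of the concept only, not the configuration syntax of any particular Web server; the directory names, the address ranges (taken from blocks reserved for documentation), and the function is_allowed are hypothetical.

```python
import ipaddress

# Hypothetical rules: directory prefix -> networks allowed to read it.
DIRECTORY_RULES = {
    "/objects/restricted/": [ipaddress.ip_network("192.0.2.0/24")],
    "/objects/public/": [ipaddress.ip_network("0.0.0.0/0")],
}


def is_allowed(path, client_ip, rules=DIRECTORY_RULES):
    """Allow a request only if the client address is in a network listed for the directory."""
    address = ipaddress.ip_address(client_ip)
    for prefix, networks in rules.items():
        if path.startswith(prefix):
            return any(address in network for network in networks)
    return False  # no rule for this directory: deny by default


print(is_allowed("/objects/restricted/diary01.pdf", "192.0.2.17"))
print(is_allowed("/objects/restricted/diary01.pdf", "198.51.100.5"))
```

Note that the rule applies to every object under a directory, which is exactly the limitation noted above: individual objects cannot carry their own rights.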

G. Staff Client Security BYU Assessment

1. Can the staff client be configured to the needs of the specific worker, locking out access to collections or functions that are inappropriate or not needed?

2. What type of client do people working on collections use, a browser or a software client?

3. If the staff client is not a browser, can it be easily distributed and configured without central coordination?

4. How does a staff user identify him- or herself to the system?

5. Can the product be configured to permit and exclude certain functions based on a staff logon?

6. Can the product be configured to permit access to collections based on a staff logon?

Hyperion, Insight, and CONTENTdm provide a staff client run in Windows. Each requires a logon to access the system. Hyperion allows the logon to be configured in a limited fashion to control features (adding versus deleting content), but not collections. CONTENTdm also uses a Web browser interface and configures the logon to access collections, but does not restrict features. The CONTENTdm logon needs to be consistent with the expected Web server logon where the objects are stored. The Insight logon allows access to all collections and all features.

Patron Functions
H. Access to Objects BYU Assessment

1. Does the product support the export of a single object and metadata by the patron to a local PC (Windows, Macintosh, Linux, etc.)?

2. Does the product support the export of multiple objects and metadata by the patron to a local PC (Windows, Macintosh, Linux, etc.)?

3. In what standard and to what format is the metadata exported?

Most content displayed via a browser may be copied by the user. The only exception is that some streaming files cannot be directly copied out of a browser window.

Insight by Luna has the strongest export features for the user. With the Java client, users may select to export groups of objects into an HTML format. The objects, at the pixel density shown on the screen, along with limited metadata in an HTML format, can be copied to the user's workstation. Insight uses MrSID (3) to deliver images at multiple resolution levels.

I. Accessibility BYU Assessment

1. What patron access methods are supported: browsers, client programs, required plug-ins?

2. What browsers and versions are supported?

3. Are smaller market browsers supported, such as Lynx, OmniWeb, WebTV, and Opera?

4. Which patron hardware and operating system platforms are supported: Windows (in all varieties), Macintosh, Linux, other Unix?

5. For all supported patron platforms (hardware, software, browser, and plug-ins), is the presentation and feature set consistent?

There should be as few roadblocks as possible in the way of access to the digital archive. In general, this means that any product that requires a software client, other than a browser, is not ideal. Although Insight by Luna does provide a Web interface (requiring the MrSID plug-in), the Java client for Windows and the MacOS has more features.

The presentation needs to be as consistent as possible across platforms. Since browsers are themselves incompatible between versions and products, this is almost a hopeless task. Insight by Luna has done an excellent job of making its Web interface as close to its Java client interface as possible.

Although the smaller market browsers tend to work, they are not generally tested or directly supported by digital object library vendors.

J. Browsing BYU Assessment

1. Does the product support browsing the collection?

2. Does the product allow the patron to select objects of interest and view them collectively as thumbnails? As a list?

3. Does the product allow the patron to save the selected list for later retrieval?

4. Does the product allow the list to be organized according to the patron's criteria: adding, deleting, and reordering?

5. Does the product allow the patron to export the list to a local PC (Windows, Macintosh, Linux, etc.) as persistent URLs into the collection?

Insight starts the patron in a browse mode, showing the entire collection as thumbnails. Searches result in items being removed from view. Hyperion also supports browsing through its use of a hierarchy; however, at this writing thumbnail views of the objects are not visible. (Sirsi plans to add support for thumbnails in a near-future release.) Other products, such as CONTENTdm and Greenstone, only provide views of the objects as the result of a search.

Insight's list management features for objects in the collection are the strongest in the market, allowing creation, reordering, exporting, and printing of objects along with their metadata.

K. Context-free Object Delivery BYU Assessment

1. Does the product allow an external Web page to retrieve an object via a persistent URL outside of any frames or page provided by the product?

2. How does the product handle rights management in this case? Is it possible to provide and maintain a persistent "logon" for rights management from an external Web page?

The purpose behind context-free delivery of objects is to serve the contents of the object library directly into a Web page authored by a patron. Ideally, the object library should provide a single consistent location for its objects that can be relied upon into the future. Some object libraries will only deliver objects into their own frame environment.

All of the products that use external storage support direct URL access. Hyperion by Sirsi has merged direct URL access with authentication so that objects under rights management control can be delivered into Web pages directly.

L. Integration with External Indexes BYU Assessment

1. Does the product integrate with external search engines such that an appropriately configured external search engine can submit searches to the product and the product can return results to the originator? What are the interfaces (Z39.50, OAI, other XML)?

2. Does the product provide external search capabilities to other database systems such that search requests originating on the product can be exported to an external search engine? What are the interfaces (Z39.50, OAI (4), other XML)?

It is unlikely that a single product will do everything desired. Consequently, products that allow an external index to access the contents are desirable. There are at least two ways in which this could happen, although there may be others: 1) The product permits external searches using some type of protocol such as the Open Archives Initiative (an XML-based standard) or Z39.50; or 2) The product permits the external delivery of content via a URL.
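
As a sketch of the first approach, an external harvester using the Open Archives Initiative protocol sends an ordinary HTTP request whose query string names a verb and its arguments. The Python below only builds such a request URL. The base URL is a placeholder, the function name build_oai_request is hypothetical, and the verb and argument names (ListRecords, metadataPrefix, oai_dc) follow the OAI protocol as commonly documented, so they should be checked against the version of the specification a given product implements.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, **arguments):
    """Return a harvesting request URL: base URL plus verb and arguments as a query string."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# The base URL is a placeholder, not a real repository.
oai_url = build_oai_request(
    "http://digital.example.edu/oai", "ListRecords", metadataPrefix="oai_dc"
)
print(oai_url)
```

The response to such a request is an XML document of metadata records, which the external index can then load and search alongside its other sources.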

ALS products generally directly support Z39.50 searches. Greenstone has announced that they are enhancing their product to include Z39.50 searching.

M. Searching BYU Assessment

1. Does the system allow searching of the object metadata?

2. Is the patron led in the search to use specific metadata fields based on the collections being searched? In other words, can the patron search only on metadata fields that are appropriate to the desired collections?

3. Does the system allow searching of the object content?

4. What textual content is searchable (Flat ASCII text, PDF, on-the-fly OCR of textual documents as images, Acrobat+)?

5. If the product provides for authority control, is the patron assisted with authority information on the metadata fields being searched? Is this optional or mandatory?

6. Does the search return thumbnails of image objects when available?

7. Does the product allow searching across collections?

8. Can the product limit a search to a single collection?

Both Insight and CONTENTdm provide a tool to help the patron identify valid search entries for selected fields. For instance, if searching on authors, the patron can select from a list those authors that appear in the archive, rather than searching on entries that do not appear in the archive.

Typically, object content searching is limited to text only. Hyperion has an optional facility for searching text and Adobe Acrobat files.

CONTENTdm and Insight provide excellent thumbnail views of each object.

System Administration
N. Data Loading and Exporting BYU Assessment

1. Does the product permit the programmatic or batch loading of patron data? In what format, and using what standard, is the patron data imported?

2. Does the product permit the import of single objects and metadata via the staff client?

3. Does the product permit the bulk import of objects and metadata?

4. Does the product permit the export of a single object and metadata for administrative users?

5. Does the product permit the bulk export of objects and metadata?

6. In what format, and using what standard, is the metadata imported and exported?

7. When loading patron data and objects, are the distribution and viewing rights included in the load? In what format and using what standard?

CONTENTdm allows the bulk loading of objects but does not provide a means to bulk-load the associated metadata. For very large collections, this could be a problem. Since the objects are kept external to CONTENTdm on a Web server, downloading the objects is a function of the Web server itself. Downloading the associated metadata is supported with ASCII and XML downloads. All user identification and object rights are managed by the Web server.

Hyperion provides for bulk loading of patron data, objects, and associated metadata. Exporting objects and metadata is not directly supported by the product.

LunaPro, the Insight client, provides for both individual and bulk loading of objects and metadata.

Greenstone provides a unique feature in being able to export collections to a CD for distribution. This would be extremely useful where the digital library needs to be distributed to many users in locations where Internet access may not be possible or practical.

O. External Authentication BYU Assessment

1. Does the product support use of an external authentication service, such as LDAP?

2. Are authentication requests to the external server sent over a secure channel?

Most products do not take a broad view when building authentication into their product. Greenstone and CONTENTdm, for example, provide no internal patron identification. CONTENTdm uses an external Web server, which itself can provide for patron rights management.

DigiTooLibrary by Ex Libris and Insight by Luna both expect to include functionality to query an external LDAP server for authentication in a future release.

P. Licensing BYU Assessment

1. What is the cost of the product? How is the product sized?

2. If a clustered server solution is supported, how is it licensed? The worst case is that each instance of the server software is licensed individually instead of as a cluster.

3. Is the staff client licensed separately from the server? If so, what is the cost?

4. Is the patron client licensed separately from the server? If so, what is the cost?

5. Is a site license available for the staff client?

6. What is the yearly maintenance cost?

There is a wide range of prices for digital object library products. The least expensive products are Greenstone, CONTENTdm, and DLXS.

Greenstone is freely available for downloading under the GNU General Public License.

CONTENTdm provides a free 60-day trial. The product is sold by the number of items supported in the collection. An 8,000-item license costs $5,000, while a 64,000-item license costs $15,000.
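
A quick calculation shows what the two license tiers quoted above imply per item, assuming each license is used to its full item limit. The function name cost_per_item is ours, and the figures are only those given in the text.

```python
def cost_per_item(license_price, item_limit):
    """Price per item if the license is used to its full item limit."""
    return license_price / item_limit


# License tiers quoted in the text: (item limit, price in US dollars).
tiers = [(8_000, 5_000), (64_000, 15_000)]
for item_limit, price in tiers:
    print(f"{item_limit:,} items: ${cost_per_item(price, item_limit):.3f} per item")
```

At full capacity the smaller tier works out to 62.5 cents per item and the larger tier to about 23 cents per item, so the per-item price falls by more than half as the collection grows.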

DLXS is a cooperative effort. The search engine is licensed for $15,000, and support costs $5,000 per year. There is a discount for multiple servers.

Other vendors provide pricing only with a formal quote; however, the high end of this market can run from $200,000 to over a million dollars. Clearly, understanding what features are critical to your success and paying only for those features is important.

Some products, such as Hyperion by Sirsi, separately license the staff client. This seems counterproductive in that the vendor should be eager to see as much content incorporated into the product as possible. Limiting the number of staff clients serves only to make the process of building collections difficult.

Q. Object Ownership BYU Assessment

1. Does the product take ownership of the digital objects such that accidental deletion of an object within the system file space or database is minimized?

There are in general two models for managing content: those in which the product takes complete ownership of the digital object, and those that point to a separate Web server where the object resides. There are advantages to both approaches.

Hyperion follows the first model, taking direct ownership of digital objects. When an object is handed off to Hyperion through the collection building client software or through the bulk loading program, the object is renamed and copied into a directory structure on the Unix server created and managed by Hyperion. Although it would still be possible for someone with sufficient Unix file system rights to delete or modify the files, in practice the files are not accessible to most users except through Hyperion, preventing accidental, or perhaps intentional, deletion or modification of the files.
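
The first model can be pictured with a short Python sketch in which a file handed to the system is copied into a managed directory tree under a system-assigned name. This is not Hyperion's actual naming or directory scheme, which the article does not describe; the use of random identifiers, the two-character subdirectories, and the function name ingest are assumptions made for illustration.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def ingest(source_path, store_root):
    """Copy a file into the managed store under a system-assigned name; return the new path."""
    source = Path(source_path)
    object_id = uuid.uuid4().hex
    # Spread objects across subdirectories keyed on the first two characters of the id.
    target_dir = Path(store_root) / object_id[:2]
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (object_id + source.suffix)
    shutil.copy2(source, target)
    return target


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "thesis.pdf"
    original.write_bytes(b"%PDF-1.3 placeholder")
    stored = ingest(original, Path(workdir) / "store")
    print(stored.name != original.name, stored.read_bytes() == original.read_bytes())
```

Because users never see the stored name or location, the only practical route to the object is through the product, which is the protection against accidental deletion described above.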

CONTENTdm takes the other approach. When an object is added to the library, the object remains on a Web server, potentially separate from the server running CONTENTdm. This has the advantage of offloading all of the rights management issues to the Web server. It also permits the library to host all objects that can be served by a Web server; no specialized programming in the library product is necessary to include new object types.

R. Scalability BYU Assessment

1. How does the product scale?

2. Is clustering supported?

3. Does the product run on systems that support high-capacity storage? Does the product integrate with a hierarchical storage management system?

Most of the products will run on some version of Windows. Experience with Windows as a server varies greatly. Some organizations seem to have no difficulty, while others are not as successful with scaling Windows server applications.

None of the products reviewed directly supports a clustered solution; however, some operating systems, such as Windows NT, Sun Solaris, HP-UX, and Linux, can provide a great deal of upward growth through multi-processing and clustering that is transparent to the application.

S. Stability and Reliability BYU Assessment

1. What operating systems and hardware does the product run on?

2. Which is considered the primary OS and hardware?

3. What is the delay between product availability on the primary platform and on the secondary platforms?

4. If the product supports clustering, is there automatic fail-over or does the user need to logon again or reconnect?

5. Does the product support separating file system, application, database, and reporting functions on separate servers? Are multiple licenses required for this configuration?

6. Are separate licenses required for a clustered solution?

7. What database does the product use?

8. What Web server does the product use?

Following is a list of the operating systems which are supported:

CONTENTdm - Windows NT 4.0, Windows 2000, Linux, AIX and Solaris

Endeavor, ENCompass - NT and Unix (Solaris)

IBM, DB2 Library - AIX, Mac OS, Windows 95, Windows 98, and Windows NT

Luna, Insight - Windows NT 4.0, Sun Solaris, other Unix (with Java support)

New Zealand Digital Library Project, Greenstone - Windows NT, Linux (the documentation also mentions Mac OS X, although there is no specific distribution for it)

Sirsi, Hyperion - HP-UX, Sun Solaris, and AIX

University of Michigan, DLXS - Solaris

VTLS, Visual MIS - Windows

BYU's Decision

After this review, we decided that none of the products met all of the requirements for our planned collections. We have selected two products to use in addition to Hyperion (5). From our perspective, Hyperion's strength continues to be in the display and organization of textual documents. Its ability to search flat text as well as Adobe Acrobat files is important to our collections of theses and other scholarly publications. Insight by Luna provides, in our opinion, an excellent presentation of visual content, particularly for our collections from the BYU Museum of Art. We are currently working with Luna to build those collections, which should be available by January 2002. Lastly, CONTENTdm, through its use of templates and visual presentation, was selected for the Overland Trails project, a Library of Congress grant to BYU, the University of Utah, and other Utah institutions to digitize diaries and other printed materials of the pioneers to Utah, Oregon, and California. Although the exhibit will not be completed until January 2002, portions may be seen at zoram.byu.edu and www.lib.utah.edu.

Acknowledgements

I would like to thank Randy Olsen, Amy Stucki, and Scott Eldredge of the Lee Library for their assistance in building BYU's list of digital object library requirements and in evaluating products. I would also like to thank the Digital Library Initiative committee for their contributions.

Footnotes

(1) Two alternative lists of requirements may be found at the California Digital Library and California Digital Library Technical Requirements for Database Vendors and at Kansas State University.

(2) See http://www.cl.cam.ac.uk/~mgk25/unicode.html for more information on UTF-8 encoding of Unicode characters. The Greenstone demonstration site at www.nzdl.org has a collection in Arabic and another in Chinese using the UTF-8 encoding.

(3) MrSID is a product of Lizard Tech www.lizardtech.com and provides a way to display digital images at multiple levels of resolution.

(4) Information regarding the Open Archives Initiative can be found at www.openarchives.org.

(5) BYU's implementation of Hyperion can be seen by going to the library's home page at www.lib.byu.edu and selecting the link to the "Digital Library." From within Hyperion, select "Browse" to view the current collections.

 

Benchmarking Conversion Costs: A Report from the Making of America IV Project

Maria Bonn
Digital Library Services
University of Michigan Library
mbonn@umich.edu

From February 1999 to February 2001, the University of Michigan University Library engaged in a large-scale digitization project entitled "The Making of America IV: The American Voice, 1850-1877," commonly known as MoA4. MoA4 extends and tests the methods used in the original Making of America project (referred to as MoA1), a collaborative endeavor between the University of Michigan and Cornell University funded by the Mellon Foundation (1). MoA4 increased the content of the Making of America by almost 500%. The project, also funded by The Andrew W. Mellon Foundation, was undertaken to answer the question: what are the costs and methods of using digital technologies for preserving and deploying monographic materials? The Mellon Foundation welcomed the creation of this digital content, but its primary interest was the accompanying documentation of costs and methods. This article is one venue for sharing some of that information; costs and methods are fully documented in Assessing the Costs of Conversion.

The University of Michigan MoA is a digital library of books and journals focusing on 19th Century America. MoA aims to preserve and make accessible through digital technology a significant body of print materials in United States history and seeks to develop protocols for the selection, conversion, storage, retrieval, and use of digitized materials on a large, distributed scale. Primary responsibility for the production of the MoA system lies with Digital Library Production Service (DLPS), a unit within the Digital Library Services division.

At the conclusion of MoA1 in 1996, the University of Michigan collection contained approximately 1,600 books and 50,000 journal articles, a total of over 630,000 pages (2). MoA4, begun in 1998, added almost 8,000 volumes to the MoA collection—over 2,500,000 pages of monographic content. MoA4 converted the vast majority of the 1850-1876 U.S. imprint, English language materials stored in the University of Michigan Buhr remote shelving facility. These volumes had been removed to remote storage either because of low use or deteriorating physical condition. The body of material converted through both MoA1 and MoA4 is so substantial that it has transformed both the perception of the size of collections on the Internet and the ways in which libraries are currently thinking about conversion activities.

The items in the MoA4 project underwent a simple and automatic conversion process. Each volume was collated page-by-page to ensure that the volume was complete. Missing pages and significant deterioration in the condition (other than expected high levels of embrittlement) were noted. Each of the volumes was disbound, a processing/ID sheet attached, and sent to an outside scanning vendor. At the scanning facility, pages were scanned as preservation-quality (600 dpi 1-bit) image files and burned to CD. Due to the age and fragility of the materials, the pages were, for the most part, hand placed on the scanner. The vendor was able to sheet-feed some of the materials that were in better condition, at a slightly reduced cost. Upon receipt of the CDs, the University of Michigan Preservation Department undertook a quality control process for the images, using a sample of approximately 5%. Images were inspected to ensure completeness, and to assess the legibility and alignment of the images. Page image files were then processed to generate OCR and simple SGML to enable search and navigation. These sit "behind" the image files that are still the primary means of access to the content.
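The quality control sampling step can be illustrated with a short Python sketch. It assumes a simple random sample of about 5% of the page images; the article does not say how the sample was actually drawn, and the function name `draw_qc_sample` and the file naming scheme are hypothetical.

```python
import random


def draw_qc_sample(image_names, fraction=0.05, seed=None):
    """Return a random subset of images to inspect, at least one if any exist."""
    image_names = list(image_names)
    if not image_names:
        return []
    sample_size = max(1, round(len(image_names) * fraction))
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return sorted(rng.sample(image_names, sample_size))


# Hypothetical volume of 400 page images named 00000001.tif ... 00000400.tif.
pages = [f"{n:08d}.tif" for n in range(1, 401)]
to_inspect = draw_qc_sample(pages, seed=2001)
```

Each sampled image would then be checked by a person for completeness, legibility, and alignment, as described above.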

In some ways MoA4 represents a significant turn from digital projects to digital production, a turn that the digital library community in general has begun to take. Rather than exploratory learning projects undertaken as additional and often grant-funded activities, digital conversion activities and system building are gradually being mainstreamed into library activities and budgets and are becoming part of the routine work of large libraries. Although MoA4 was largely grant-funded and did entail both significant ramp-up and dramatic tailing-off periods, in its duration and scale, it illustrates some of the possibilities of fully developed digital production activities.

Despite the fact that more and more institutions are turning to large-scale, routinized conversion activities, little work has been done to track and benchmark the costs of these efforts. Such benchmarking is important in helping institutions to plan effectively, both in terms of capacity and budgeting. Benchmarking also helps funding agencies judge whether the proposals they receive are realistic. Although the numbers reported here may not be universally applicable, they represent one attempt to document and assess the costs of conversion and to provide data to help the digital library community refine its understanding about those costs.

Methods for measurement

The cost analysis uses the page as the unit for representing the costs. While the number of pages varies widely between volumes, the total number of pages in the project remains constant. The conversion cost of each page carries within it a considerable number of factors: human labor from the project team, the costs of hardware and software, the costs incurred from outside contractors, and so forth.

Because so little benchmarking work has been done in this area, one of the first tasks of the project was to design a methodology for assessing the costs(3). As a first step, we studied the workflow and broke the conversion process down into distinct component activities (see Table A). These steps can obviously be broken down further, but this analysis attempts to aggregate data at a meaningful level of detail. Staff then conducted time studies for each step, measuring average productivity per hour. These rates are used as one benchmark of productivity and cost. At the conclusion of the project, we also calculated the real costs for the duration of the project, based on staff, equipment, and invoices (see below for some explanation of the disparity between these two cost reports). Finally, real costs and output (pages converted) were tracked for each month. This enabled us to compare months, to analyze variation between months, and to understand conditions that promoted maximum productivity and minimum cost.

Table A: Component activities in the digitization process that were studied, along with the necessary staff for each step.

Activity | Staff involved
Retrieval of volumes from storage | Preservation prep staff
Charging out of volumes | Preservation prep staff
Identification, collation and repair | Preservation prep staff
Disbinding | Preservation prep staff
Removal of covers | Preservation prep staff
Packing and shipping | Preservation prep staff
Scanning and CD burning | Outside vendor
Creation of page level metadata | Preservation staff
Quality Control | Preservation staff
OCR and SGML generation | OCR operator

Some notes on interpreting costs

As will become obvious, the sum total of the costs of the component steps as determined by time studies (see Table B Column 4 and Table C) is less than the actual cost of the digitization process (see Table B Column 1). This will be particularly evident in the costs of the steps of the preparation process (Table C). The costs of the individual components reflect the amount of time and labor involved in those components when the preparation staff was working consistently and efficiently on that part of the process. The staff was constantly multi-tasking, however, and moving from one stage of preparation to another; when the work room got full, books had to be collated and disbound, shipments had to go out on monthly deadlines, work had to be adjusted accordingly, and so on. This meant that efficiencies and economy of scale that might take place from concentrating on one piece of the process until it was completed were difficult to obtain. Not only was that efficiency impossible, it was probably also undesirable as the variety of the prep tasks was important in breaking up often very repetitive, sometimes tedious work, and in keeping the staff "fresh."

Of course staff members also go on vacation, get sick, need time to recover from particularly big pushes to meet deadlines, and otherwise behave in perfectly human ways that interfere with maximum productivity. This helps us to understand why the lowest human costs achieved in the most productive month did not hold, even over a 3-month average in the most productive and cost-effective part of the project. Anne Kenney (4) has suggested that in order to take into account the variability in human productivity we should use a weighted rate that assumes 75% of the time spent on a project is "production time." Using that formula, we can account for the difference between the most productive month (in which, in fact, the prep staff were working at a heroic pace) and the three-month average.
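The weighted-rate adjustment can be worked through with the prep figures reported in Table B. The sketch below rests on one assumption: that the 75% weighting is applied by dividing the best-case per-page cost by 0.75. The article does not spell out the formula, and the function name is hypothetical.

```python
PRODUCTION_SHARE = 0.75  # suggested share of project time that is "production time"


def weighted_cost(best_case_cost, production_share=PRODUCTION_SHARE):
    """Scale a best-case per-page cost up to allow for non-production time."""
    return best_case_cost / production_share


best_month_prep = 0.03  # prep cost per page in the most productive month (Table B, column 2)
expected_prep = weighted_cost(best_month_prep)
print(f"${expected_prep:.2f}")  # compare with the three-month average for prep in Table B
```

Under that assumption, the best-month prep cost of $0.03 per page scales to the three-month average of $0.04 per page.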

Finally, these costs do not include the costs of online implementation. The MoA4 project took place in the context of an established digital library system for storing the material and for making it accessible. This system includes servers, considerable disk space, a search engine, staff to deploy the materials and to develop the software for search and retrieval. All of these elements are in place as part of the daily work of DLPS, and MoA4 was just one of many DLPS projects.

In order to understand both the actual costs involved in the MoA4 project and what costs might be possible to achieve in a similar endeavor, the table below represents costs in four different ways. The range of these costs should indicate what it is possible to achieve as well as illustrate some of the investment needed in a period of ramp-up and training.

Table B: Side-by-side comparison of four costs

Component | 1. total project | 2. most productive month | 3. three month average | 4. measured by component activity
prep (see table below) | $0.06 | $0.03 | $0.04 | $0.02
shipping | $0.01 | $0.01 | $0.01 | $0.01
QC and page level metadata creation per-page costs | $0.01 | $0.01 | $0.01 | $0.006
OCR and SGML generation | $0.04 | $0.02 | $0.02 | $0.04
scanning | $0.13 | $0.13 | $0.13 | $0.13
process management | $0.01 | $0.01 | $0.01 | $0.01
total (5) | $0.27 | $0.21 | $0.22 | $0.21

Column 1: The actual per-page costs as measured over the life of the project and as based on staff salaries, costs of equipment and software and invoices.

Column 2: The per-page costs in a synthetic "most productive" month. This is a synthetic month since the highest production levels for preparation and OCR (the two most variable activities) did not coincide; the OCR had to wait to kick into high gear until September when the scans came back from the most productive preparation month in July. This "month" is therefore constructed out of July and September on the hypothesis that with sufficient production levels sustained, these costs would be possible to consistently achieve. On the other hand, the difference between this most productive month and the three-month average is entirely in the prep component, the component based almost exclusively in human labor. This would suggest that the productivity attained in that month might be difficult to sustain over any protracted period of time (arguing for using a weighted rate as posited above).

Column 3: A three-month average cost calculated over June, July and August of 2000, one year into the production phase of the project. These costs reflect the production levels and efficiencies that are possible to achieve and sustain with a complete and fully trained staff, with routines firmly in place.

Column 4: The sum of the component costs as determined by time studies. As discussed above, the sum total of the component steps is less than the actual cost of the digitization process. It is interesting to note, however, that despite the time and money lost in moving from one activity to another, as well as the normal distractions of the work day and human lives, the sum of these costs is indeed the same as that achieved during the project's synthetic best month. This would indicate that given a sufficient flow of materials and trained staff, these inefficiencies are negligible to overall costs and can be minimized.
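As a check on the four columns of Table B, the component costs can be summed and compared with the reported totals. The short Python sketch below uses only the figures printed in the table, held as tenths of a cent so that the arithmetic is exact; the variable names are illustrative.

```python
# Per-page costs from Table B, in tenths of a cent (columns 1 to 4).
COMPONENTS = {
    "prep": (60, 30, 40, 20),
    "shipping": (10, 10, 10, 10),
    "QC and page level metadata": (10, 10, 10, 6),
    "OCR and SGML generation": (40, 20, 20, 40),
    "scanning": (130, 130, 130, 130),
    "process management": (10, 10, 10, 10),
}
REPORTED_TOTALS = (270, 210, 220, 210)

# Sum each column across all components, then compare with the printed total.
column_sums = tuple(sum(column) for column in zip(*COMPONENTS.values()))
differences = tuple(r - s for r, s in zip(REPORTED_TOTALS, column_sums))

for number, (total, diff) in enumerate(zip(column_sums, differences), start=1):
    print(
        f"column {number}: components sum to ${total / 1000:.3f}, "
        f"reported total differs by ${diff / 1000:+.3f}"
    )
```

Columns 2 and 3 sum exactly to their reported totals. Columns 1 and 4 differ by a cent or less, which is consistent with the rounding noted in footnote 5.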

The following table details the costs of the activities involved in the preparation of the materials as determined by time studies. Staff members were asked to measure their average hourly productivity for each step in the preparation process and total and per page costs were then calculated using the average hourly salary for full time prep staff (with benefits).

Table C: Cost of component prep activities as measured by time studies (6)

Activity | Volumes per hour | Hours | Total cost | Cost per page
Retrieval | 40 | 188.6 | $2,439.95 | $0.0009
Charging out | 40 | 188.6 | $2,439.95 | $0.0009
Collation | 3 | 2514.67 | $32,532.63 | $0.01
Removing covers | 30 | 251.47 | $3,253.26 | $0.0017
Packing | 21 | 359.24 | $4,647.52 | $0.0017

Total prep cost per page (rounded to the nearest cent): $0.02
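The arithmetic behind Table C can be reproduced with a short Python sketch. Three inputs are inferred rather than stated in the article and should be read as assumptions: the volume count (40 volumes per hour over 188.6 hours gives 7,544), the hourly rate (total cost divided by hours, about $12.94), and the page count (the article says only "over 2,500,000 pages").

```python
VOLUMES = 7544                 # inferred: 40 volumes/hour x 188.6 hours
HOURLY_RATE = 2439.95 / 188.6  # inferred from Table C: total cost / hours
PAGES = 2_500_000              # approximate page count; an assumption

RATES_PER_HOUR = {
    "retrieval": 40,
    "charging out": 40,
    "collation": 3,
    "removing covers": 30,
    "packing": 21,
}


def activity_cost(volumes_per_hour):
    """Return (hours, total cost, cost per page) for one prep activity."""
    hours = VOLUMES / volumes_per_hour
    total = hours * HOURLY_RATE
    return hours, total, total / PAGES


costs = {name: activity_cost(rate) for name, rate in RATES_PER_HOUR.items()}
for name, (hours, total, per_page) in costs.items():
    print(f"{name}: {hours:.2f} hours, ${total:,.2f} total, ${per_page:.4f} per page")
```

The hours and total costs computed this way agree with the table to within rounding. The per-page figures depend on the exact page count, so they will differ slightly from the published values.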

Conclusions

Even though the costs reported above represent a careful inventory of the activities involved in the production of MoA4, there are, inevitably, some costs not directly accounted for. For example, unpacking and organizing the books upon their return from the vendor has occupied Preservation staff periodically long after the end of the project. Astute readers and those who have been involved with similar projects will no doubt discover other costs not accounted for in our study; it is our hope that they will share that information and thus augment our reported findings.

MoA4 has taught us a number of lessons potentially of value to the whole library community that are now shaping policy and practice at the University of Michigan library. By understanding the resources involved in ramping up to large conversion projects and the amount of material necessary to achieve the most efficient use of capacity, we can make informed decisions about how best to undertake such projects. The library community may, for example, see considerable value in moving away from distributed exploratory projects to more centralized/cooperative production facilities. At the local level, MoA has taught us that digital reformatting benefits the library and its users by providing both cost-effective preservation and functionally effective access. Because of the MoA4 process, more than 8,000 titles that were in danger of disintegration have undergone preservation-quality reformatting. Moreover, the review process culled more than 200 volumes of artifactual value that have now been removed from storage to special collections and that, when necessary, are being considered for other preservation treatment options. As well as physically "rescuing" this content, the project has brought these texts back into intellectual circulation. Many of the volumes had not circulated for decades. Now they are part of an online collection that has been searched an average of 120,000 times and in which an average of 835,000 pages have been viewed each month in 2001.

At the University of Michigan Library, we have become convinced that production level conversion activity is possible to attain and that such sustained activity represents a considerable savings over sporadic conversion projects. Such conviction has encouraged the University of Michigan library to move toward digital reformatting as the default preservation reformatting activity, a transition that is now underway.

Footnotes

(1) The Making of America 2 and Making of America 3 are non-affiliated projects carried out at other institutions. They are at various stages in their development.

(2) The Cornell Making of America collection exists as a separate body of materials, currently totaling 907,750 pages (967 monographs and 955 serial volumes).

(3) The notable exception to this is the work reported by Anne R. Kenney in "Projects to Programs: Mainstreaming Digital Imaging Initiatives" in Moving Theory into Practice: Digital Imaging for Libraries and Archives (Mountain View, CA: Research Libraries Group, 2000, Anne R. Kenney and Oya Rieger, editors and principal authors. 153-175). Her work on weighted rates is particularly valuable in explaining some of the variations in costs reported here.

(4) Anne R. Kenney, ibid.

(5) Columns may not sum properly due to rounding errors.

(6) Costs per-page values in this table have been subject to rounding.

 

 

Highlighted Web Site

Ellisisland.org is the Internet home for the American Family Immigration History Center, which oversees the Ellis Island Archives. This collection includes ship manifests documenting the arrival of passengers and crew in New York between 1892 and 1924. Over 3.5 million of the handwritten manifests have been scanned from microfilm as TIFF files, and are made available on the Web through a tif2gif conversion program. In addition, a searchable database gives access to machine-readable versions of the manifests, documenting approximately 22 million individuals. These records were transcribed by hand, by members of the Church of Jesus Christ of Latter-day Saints over a period of roughly seven years, beginning in 1993. As a tool designed to support genealogical research, the system is set up for users to search by passengers' names only, and not by ship or date of entry. When the site was officially launched in April 2001 it was overwhelmed by visitors, despite being equipped to handle as many as 100,000 visitors at once. Plans were quickly made to greatly increase server capacity. Ellisisland.org represents a major accomplishment in the digital conversion of historical manuscripts, and has proven itself to be a valued resource among genealogists and historians.

FAQ

For years now I've been hearing references to digital paper, electric paper, electronic paper, electronic ink and similar sounding names. What are these products and how do they differ?

In recent years, these terms have been used somewhat interchangeably in media reporting on information technology, as well as by several different companies, to refer to a variety of electronic and paper-based goods and services. At various points, different companies have either used, or applied to use, these terms as trademarks for their products. For instance, in 1994, No Hands Software registered the term "DigitalPaper" (all one word) for the format used by its Common Ground electronic document exchange and distribution program. No Hands lost the marketplace battle for document exchange software to Adobe's Portable Document Format (PDF), and the registration for "DigitalPaper" was finally cancelled in July 2001. In 1991, Adobe Systems itself filed but then abandoned an application to trademark "Electronic Paper."

Thus, we cannot accurately say that any of these terms refers to a specific, unique product, even though it may be closely associated with a particular product. That situation may change as pending applications to use these terms as trademarks are processed. So the best way to answer the question may be to examine several products currently being developed and marketed that are referred to as "digital paper," "electronic paper," etc.

The products discussed below represent two major philosophical camps. The first believes in the inevitability of a paperless future and seeks to replace paper with electronic surrogates. The second camp includes those resigned to (or even embracing) a future where paper continues to play a significant role in human communication.

Technologies designed to replace paper with other materials

Frequently referred to as "electronic paper," the oldest of the proposed paper substitutes was developed at Xerox's Palo Alto Research Center (PARC) in 1975. Dubbed "Gyricon" (Greek for "rotating image"), it consists of a flexible transparent membrane embedded with tiny spheres, each with a light-colored hemisphere and a dark-colored hemisphere. Small electrical charges cause the spheres to rotate either dark or light side up, allowing the creation of a high-contrast image. Unfortunately, in the 1970s, the electronics required to make the tiny spheres spell out a coherent message were bulky and external to the Gyricon medium, and the first prototypes had none of paper's flexibility and ease of handling.

Xerox put Gyricon on the shelf in 1977 and did not return to it until the mid-1990s when improvements in manufacturing processes allowed mass production of a more reliable product. However, the solution to the problem of supplying power and intelligence to the Gyricon medium without sacrificing its portability and flexibility continued to elude Xerox. Finally, in December 2000, Xerox created a subsidiary (Gyricon Media) to market a Gyricon-based product trademarked as SmartPaper™. SmartPaper's first major application is as a replacement for printed store display signs. These can be placed in a rigid frame containing the necessary electronics to allow the sign's message (product name, price, etc.) to be updated at any time through an in-store network. Though being marketed as a replacement for a paper product, SmartPaper is still well short of meeting the initial concept of Gyricon as a replacement for paper in books, magazines, and newspapers.

A competing product, usually called "electronic ink," also emerged from the lab in the 1990s. Instead of bi-colored rotating spheres, electronic ink uses liquid-filled capsules containing tiny particles, some white and some black. As with Gyricon, differences in the particles' electrical charge allow their position to be manipulated through application of a small electric field. Electronic ink is being marketed by E Ink Corporation.

Though E Ink's first marketable product is also an in-store display sign, the company has made progress toward a truly flexible, programmable display. Through a collaboration with Lucent Technologies, E Ink is attempting to develop a flexible, rewritable medium that would not require rigid external control circuitry. The key technology is plastic transistors, which allow electronic circuitry to be literally printed on a thin film of plastic. E Ink has trademarked the name RadioPaper™ for a medium they envision as someday "allowing users to fully realize the dream of anytime, anywhere information."

In the near future, however, both these products may see more practical application as replacements for monochrome LCD screens than as substitutes for paper in everyday use. Once perfected, displays made from SmartPaper or RadioPaper will overcome many of the drawbacks of LCD screens in portable and handheld devices. These technologies potentially offer much lower power consumption, higher contrast, wider viewing angle, higher resolution and lower cost than traditional LCD displays.

However, each manufacturer still envisions the day when its product will become paper's true and rightful heir. They see a new generation of e-books that much more closely resemble today's hardcovers and paperbacks than the current crop of bulky PDA-like book surrogates. Such e-books would have multiple, flexible pages and a binding, just like traditional books. The binding would contain the battery, memory and a communications port (perhaps wireless) to allow the "pages" to be filled with whatever content the reader desired. They would retain most of the physical qualities of the traditional book so valued by readers, while offering the "holy grail" of an infinitely rewritable blank slate. Though crude prototypes of such e-books have been demonstrated, most estimates are that marketable versions are still many years away.

Technologies designed to give ordinary paper a digital twist

Paper has endured as a preferred information storage medium for centuries, and not everyone is convinced that's going to change anytime soon. In fact, even Xerox is hedging its bets. Another technology from Xerox's PARC is DataGlyphs, often referred to as "digital paper." (1) DataGlyphs are a form of encoded data designed to be printed on ordinary paper for later scanning and decoding by a computer.

Each DataGlyph consists of a series of tiny hash marks, slanted either 45 degrees right or left, like a slash or backslash. Each hash mark stands for one bit (binary digit). Xerox claims that a one-inch square DataGlyph can encode up to 1000 bytes of data, but the density achieved depends on many factors, such as the printing resolution, the level of error correction, and the type of data compression. DataGlyphs are designed to be unobtrusive, since they appear to be simply a shaded area on the page and can be effectively hidden by incorporating them into a graphic symbol or company logo.
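The one-bit-per-mark idea can be illustrated with a toy encoder in Python. This is a sketch of the principle only, not Xerox's actual DataGlyph format: it omits error correction, compression, and two-dimensional layout, and the choice of which slant stands for 1 is an arbitrary assumption.

```python
def encode_glyphs(data: bytes) -> str:
    """Render each bit as a slanted mark: '/' for 1 and '\\' for 0 (an arbitrary choice)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join("/" if bit == "1" else "\\" for bit in bits)


def decode_glyphs(marks: str) -> bytes:
    """Invert encode_glyphs; the mark count must be a multiple of eight."""
    if len(marks) % 8 != 0:
        raise ValueError("expected eight marks per byte")
    bits = "".join("1" if mark == "/" else "0" for mark in marks)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"RLG"
pattern = encode_glyphs(message)
print(pattern)
```

At one bit per mark, 1000 bytes corresponds to 8000 marks before any error-correction overhead is added or compression applied.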

Like bar codes, DataGlyphs are designed to enhance a piece of paper's ability to move through an automated workflow and reduce the need for redundant data entry. They can also be used to improve security and reduce counterfeiting of negotiable paper documents such as checks and securities. DataGlyphs are designed to resist corruption and are able to withstand (up to a point) stapling, coffee stains, and stray ink marks without loss of data integrity.

Xerox's investment in DataGlyph technology suggests it doesn't expect the paperless office to come about anytime soon. It sees DataGlyphs as a mechanism to create hybrid paper/electronic documents that retain paper's comfort, portability, and readability while enhancing its data carrying capacity and integrating it more tightly into a computerized workplace. Xerox has demonstrated the use of DataGlyph elements to substitute for halftone dots used to create the patterns of complex graphic images, thus creating a printed analog image composed of embedded digital data. Such embedded data could contain various forms of metadata designed to facilitate identification, authentication, copyright control, or even copy resistance.

Another product being called "digital paper" anticipates restoring paper to its once central role as the primary information capture medium. The Swedish company Anoto (the name is derived from the Latin word for "I scribble") is developing ink-on-paper technology that turns an ordinary-looking piece of paper into a generalized data input screen, and a somewhat squat-looking pen into a data storage and transmission device that establishes an intimate connection between paper documents and nearby wireless communications networks.

When you write on a piece of Anoto paper with an Anoto pen, the memory in the pen's barrel stores the entire document you create and is able to produce a graphic file exactly reproducing every mark on the page. That's because the special grid of dots on Anoto paper allows the pen to know its absolute position on the page at all times. Each sheet of Anoto paper has special locations for indicating what you want to do with what you've written, such as transmitting it by email or fax. The Anoto Pen knows that when a mark is made in those locations, it should send the data in its memory to a wireless data port on the network for further processing.

Most of the data is transmitted as a single graphic image. However, certain areas of each page are reserved for input that will be processed with OCR (Optical Character Recognition) so that email addresses, subject lines, phone numbers, etc. can be properly interpreted as text. Though Anoto's primarily graphical view of writing limits its usefulness for lengthy text documents, it also frees it from the constraints of ASCII and removes barriers for hybrid text and graphic documents as well as for non-Roman alphabets and mathematical or scientific symbols.

How does Anoto know what to do with your document once you send it from the pen? The technology behind it is a bit daunting. Each piece of Anoto paper is unique. In effect, each sheet represents a tiny fraction of a huge virtual surface about 1.8 million square miles in area. Also, each Anoto pen has a unique serial number and can be separately addressed. Anoto will sell portions of the virtual surface to companies that make and use paper products. Everything from sticky notes and legal pads, to order and reservation forms, will be printed on paper containing Anoto's proprietary grid of dots.

Anoto is also setting up an array of specialized name servers that can be queried to determine who owns the portion of the virtual surface that the data just transmitted was written on. Once that's known, the transaction can be correctly completed.
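The lookup described here, from a position on the virtual surface to the owner of that portion, can be sketched in Python. Everything in the sketch is hypothetical: Anoto's actual addressing scheme and name server protocol are not described in this article, so the rectangular regions, coordinates, class names, and owner names are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A rectangular slice of the virtual surface licensed to one owner (hypothetical model)."""
    owner: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def contains(self, x: int, y: int) -> bool:
        return self.x_min <= x < self.x_max and self.y_min <= y < self.y_max


class SurfaceDirectory:
    """Stand-in for the name servers: maps a pen position to the owner of that region."""

    def __init__(self, regions: list[Region]) -> None:
        self.regions = regions

    def owner_at(self, x: int, y: int) -> Optional[str]:
        for region in self.regions:
            if region.contains(x, y):
                return region.owner
        return None  # position not licensed to anyone


directory = SurfaceDirectory([
    Region("Example Airline reservation form", 0, 0, 1000, 1000),
    Region("Example Stationery sticky notes", 1000, 0, 2000, 1000),
])
routed_to = directory.owner_at(1500, 250)
```

Because every sheet occupies a distinct part of the surface, the position alone is enough to identify which owner should receive the data.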

Although Anoto is conceived primarily as a facilitator for e-commerce transactions, it could give paper an overall boost as a communications medium, even within highly computerized environments. By overcoming paper's most serious drawback from an information sharing point of view, it might just revive the lost art of scribbling.

Further reading:

Silberman, Steve, "The Hot New Medium: Paper," Wired, April 2001.
Anoto Company FAQs
Hecht, David L., "Printed Embedded Data Graphical User Interfaces," IEEE Computer, March 2001.
Xerox DataGlyphs Overview
Mann, Charles C., "Electronic Paper Turns the Page," Technology Review, March 2001.

Footnote

(1) Compounding the confusion even further, Xerox PARC has developed DigiPaper, an image-based document representation that uses token-based image processing to obtain very high compression for bitonal images.

—RE

Calendar of Events

Long Term Archiving of Digital Documents in Physics
November 5-6, 2001
Lyon, France
The charge of the International Union of Pure and Applied Physics (IUPAP) Working Group on Communication in Physics is to address the subject of the long-term availability of electronic publications. This conference will convene a group of stakeholders to address these issues.

School for Scanning: Creating, Managing, and Preserving Digital Assets
December 3-5, 2001
Delray Beach, FL
Sponsored by the Northeast Document Conservation Center, this workshop provides current, essential information for managers of paper-based materials who are planning to create, manage, and preserve digital collections.

Second International Workshop on New Developments in Digital Libraries
Call for Papers: Due November 15, 2001
To be held: April 2-3, 2002, Ciudad Real, Spain
This workshop will serve as a forum to gather researchers, practitioners, and students. Topics of interest include: economic and management issues, metadata issues, digital library prototypes, systems interoperability, and digital library development.

DLM-FORUM 2002
Call for Papers: Due November 15, 2001
To be held: May 7-8, 2002, Barcelona, Spain
The DLM-Forum 2002 theme is access and preservation of electronic information. The objective of the forum is to examine best practices and practical solutions and to achieve concrete results in this area.

Announcements

Dublin Core Metadata Element Set Approved
The National Information Standards Organization (NISO) and the Dublin Core Metadata Initiative (DCMI) have announced the approval by the American National Standards Institute of the Z39.85-2001 metadata element set. DCMI began in 1995 and brought together librarians, digital library researchers, content providers, and text-markup experts to improve discovery standards for information resources.

SEPIA (Safeguarding European Photographic Images for Access) Web site Updated
New features have been added to the Web site that include current research, training, calendar of events, information on scanning equipment, handling procedures, and preservation aspects of digitization.

The National Library of the Netherlands Begins Long Term Preservation Study
This Web site describes the Library's study of long term digital preservation. The goal is to investigate in detail the issues and their impact on current digitization efforts.

Building and Sustaining Digital Collections: Models for Libraries and Museums
As libraries and museums take advantage of the Internet to find new opportunities to serve their traditional users and to attract new users, they face challenges in managing their collections. For the past several years the Council on Library and Information Resources (CLIR) and the National Initiative for a Networked Cultural Heritage (NINCH) have been addressing these issues, and as part of that effort, they have released a report on a meeting held with business and legal experts, technologists, and funders.

The Committee on Digital Preservation of the Conference of Directors of National Libraries (CDNL) Creates Draft Resolution on the Preservation of the Digital Heritage
The Dutch government has agreed to carry this draft resolution and has formulated a final version of the text as an amendment to the draft program and budget for 2002-2003 (31 C/5) of the United Nations Educational, Scientific and Cultural Organization (UNESCO).

Institute of Museum and Library Services Announces 2001 Grant Recipients
The Institute of Museum and Library Services has announced funding for a number of projects that will focus on digitization. They include a Brandeis University Libraries project to digitize the lithographs of Daumier; a Cornell University Library project to preserve and digitize a collection of ephemera, published materials, and artifacts from U.S. national political campaigns; and a Washington Research Library Consortium plan to develop a collaborative digital production center.

Saganet Web site Now Available
The National and University Library of Iceland and Cornell University, in association with the Árni Magnússon Institute in Iceland, have completed a cooperative large-scale digitization project covering 380,000 manuscript pages and 145,000 printed pages. The project focuses on Icelandic sagas. An economic analysis and a usability study were also conducted.

Moving Theory into Practice: Digital Imaging for Libraries and Archives
This publication has been selected as the recipient of the 2001 Society of American Archivists' Waldo Gifford Leland award for outstanding publication.

Preservation Management of Digital Materials: A Handbook
Previously available in draft form, the handbook by Maggie Jones and Neil Beagrie has been published. For ordering information contact: preservation@jisc.ac.uk.

Launch of e-TERM Project Web site
A collaborative project has been initiated to develop a European program for training in electronic records management to meet the needs of administrators, information professionals, archivists, and records managers.

ARCHIVE-COMM-L Medical Image Archive Listserv
Medical images are collected and stored in networked archives. The technology and applications of image archives are rapidly evolving and demand interdisciplinary expertise to identify opportunities for further development, exploit the potential of data sets collected with advanced imaging systems, and solve current problems in screening, diagnosis, and therapy. The listserv discusses the advancement of medical image archives for clinical research, medical informatics, related technologies, and applications.

Selection and Presentation of Commercially Available Electronic Resources: Issues and Practices
Published by the Digital Library Federation and the Council on Library and Information Resources, this report examines strategies for managing the costs of commercial online materials.

California Digital Library Standards Documents
The California Digital Library (CDL) has recently adopted several guidelines for the use of standards in its collections and services. Notable among them are guidelines for the creation and encoding of digital objects, and best practices for the encoding of finding aids.

Virtual Libraries in the New Millennium Conference Now Online
Held in Atlanta, GA, in May 2001, this conference included case studies of virtual library projects, with a focus on future directions; an update on standards and best practices; discussion of selection and access issues; and an overview of the networking resources needed to support the future growth of the virtual library.

Open Language Archives Community (OLAC) Announces a Cross-Archive Searching Service
OLAC is currently harvesting over 9,000 metadata records from ten participating archives. They include the Linguistic Data Consortium, European Language Resources Association, Deutsche Forschungszentrum für Künstliche Intelligenz, Alaska Native Language Center, Perseus Project, American Philosophical Society American Indian Manuscript Collections, Typological Research Center, Langues et Civilisations à Tradition Orale, Comparative Bantu Online Dictionary, and the American Indian Studies Research Institute.

 

RLG News

RLG to become work coordinator for METS
METS is a generalized metadata framework, developed to encode the structural metadata for objects within a digital library and related descriptive and administrative metadata. METS provides for the responsible management and transfer of digital library objects by bundling and storing appropriate metadata along with the digital objects. Those managing digital files will want to learn more about METS, which can help to structure data for presentation and/or archiving.
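To make the bundling idea concrete, the following Python sketch assembles a skeletal METS-style document with the standard library's ElementTree module. It is an illustration under stated assumptions rather than a validated METS record: the element and attribute names follow the METS schema as published by the Library of Congress, but the build_mets function, the identifiers, the "master" file group, the empty administrative section, and any URLs passed in are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def mets(tag):
    """Qualify a tag name with the METS namespace."""
    return "{%s}%s" % (METS_NS, tag)


def build_mets(object_id, title, page_urls):
    """Bundle descriptive, administrative, file, and structural sections."""
    root = ET.Element(mets("mets"), {"OBJID": object_id})

    # Descriptive metadata: a single Dublin Core title wrapped inline.
    dmd = ET.SubElement(root, mets("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, mets("mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, mets("xmlData"))
    ET.SubElement(xml_data, "{%s}title" % DC_NS).text = title

    # Administrative metadata: left empty here as a placeholder (assumption).
    ET.SubElement(root, mets("amdSec"), {"ID": "AMD1"})

    # File inventory and structural map, filled in page by page below.
    file_sec = ET.SubElement(root, mets("fileSec"))
    group = ET.SubElement(file_sec, mets("fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(root, mets("structMap"), {"TYPE": "physical"})
    document = ET.SubElement(
        struct_map, mets("div"), {"TYPE": "document", "DMDID": "DMD1"}
    )

    for order, url in enumerate(page_urls, start=1):
        file_id = "FILE%d" % order
        file_el = ET.SubElement(group, mets("file"), {"ID": file_id})
        ET.SubElement(
            file_el,
            mets("FLocat"),
            {"LOCTYPE": "URL", "{%s}href" % XLINK_NS: url},
        )
        # Each page division points back to its file by ID.
        page = ET.SubElement(
            document, mets("div"), {"TYPE": "page", "ORDER": str(order)}
        )
        ET.SubElement(page, mets("fptr"), {"FILEID": file_id})

    return root
```

Calling ET.tostring(build_mets(...), encoding="unicode") on the result yields a single XML document in which the page order (structural metadata), the title (descriptive metadata), and the file locations travel together.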

Over the last several years, the Digital Library Federation (DLF) has undertaken the initial coordination of this important work. METS has its roots in the DLF-sponsored, NEH-funded Making of America 2 (MOA2) project. At the close of the MOA2 project, the DLF sponsored several meetings to continue the development of MOA2 into a widely applicable encoding format, and METS was born. In the coming months, as work continues to develop this emerging standard, RLG will step up as the new work coordinator. The Network Development and MARC Standards Office of the Library of Congress will continue to maintain the standard and documentation through its official METS web site.

For more information about METS, please see Merrilee Proffitt's article, "Touring the Information Landscape: RLG Backs METS," in the October 2001 issue of RLG Focus.


OCLC/RLG Working Group on Preservation Metadata Releases New Recommendation
The OCLC/RLG Working Group on Preservation Metadata has just released a report with the group's recommendation on Content Information, one of at least two reports the group will produce containing recommendations on preservation metadata and the OAIS Information Model. To quote the report's introduction, "The Open Archival Information System (OAIS) Reference Model defines Content Information as 'the set of information that is the original target of preservation. It is an Information Object comprised of its Content Data Object and its Representation Information.' In a digital archive, the Content Data Object is the bit sequence or set of bit sequences toward which the preservation action is primarily directed."

The working group will continue to follow up on the earlier white paper, Preservation Metadata for Digital Objects: A Review of the State of the Art, as it works to develop a comprehensive preservation metadata framework applicable to a broad range of digital preservation activity. For more information about the group's activities, please visit the new working group website: http://www.oclc.org/research/pmwg/ .

 

Publishing Information

RLG DigiNews (ISSN 1093-5371) is a newsletter conceived by the members of the Research Libraries Group's PRESERV community. Funded in part by the Council on Library and Information Resources (CLIR) from 1998-2000, it is available internationally via the RLG PRESERV Web site (http://www.rlg.org/preserv/). It will be published six times in 2001. Materials contained in RLG DigiNews are subject to copyright and other proprietary rights. Permission is hereby given for the material in RLG DigiNews to be used for research purposes or private study. RLG asks that you observe the following conditions: Please cite the individual author and RLG DigiNews (please cite URL of the article) when using the material; please contact Jennifer Hartzell, RLG Corporate Communications, when citing RLG DigiNews.

Any use other than for research or private study of these materials requires prior written authorization from RLG, Inc. and/or the author of the article.

RLG DigiNews is produced for the Research Libraries Group, Inc. (RLG) by the staff of the Department of Preservation and Conservation, Cornell University Library. Editor, Anne R. Kenney; Production Editor, Barbara Berger Eden; Associate Editor, Robin Dale (RLG); Technical Researchers, Richard Entlich and Peter Botticelli; Technical Assistant, Carla DeMello.

All links in this issue were confirmed accurate as of October 10, 2001.

Please send your comments and questions to preservation@cornell.edu.
