Table of Contents
Feature Article 1
"We don't know the first thing about digitization:" Assessing
the Need for Digitization Training in Illinois, by Trevor Jones and Beth
Sandore
Feature Article 2
Integrating a Free Digital Resource: The Status of Making of America
in Academic Library Collections, by Kizer Walker
Highlighted Web Site
METS: Metadata Encoding and Transmission Standard
FAQ
Web Log Analysis, by Lee LaFleur
Calendar of Events
Announcements
RLG News

Editors' Note:
This issue marks the beginning of our sixth year of publishing RLG
DigiNews, and we will be celebrating our anniversary in several ways.
First, you will note the new look and feel of the publication. RLG DigiNews
has had several facelifts over the years, but we are quite taken with
this new design and hope you agree. We're also adding new features. Beginning
with this issue, you will be able to print individual articles and FAQs.
Just click on the printer icon
or accompanying text at the top of each article/FAQ.
We also are pleased to announce that Nancy Y. McGovern has become the
co-editor of RLG DigiNews. Nancy is the Digital Preservation Officer
at Cornell University Library and currently leads its digital imaging
and preservation research unit. Prior to coming to Cornell in August 2001,
Nancy managed and coordinated digital implementation projects at the U.S.
National Archives and Records Administration, and later served as a consultant
to a number of research projects, most recently the Digital Preservation
Testbed Project of the Dutch Government. She is working on her PhD in
digital preservation at University College London.
In
April we will publish a special anniversary issue of RLG DigiNews,
and would like to get your feedback on the journal, as well as suggestions
for the future. We always welcome comments,
but for this issue we have devised a special online
survey. Please help us make RLG DigiNews responsive to your
needs by taking ten minutes to complete this survey. Many thanks!

"We don't know the first thing about digitization:"
Assessing the Need for Digitization Training in Illinois
Trevor Jones
Illinois Digitization Institute,
University Library,
University of Illinois at Urbana-Champaign
trevorj@staff.uiuc.edu
Beth Sandore
University Library,
University of Illinois at Urbana-Champaign
sandore@uiuc.edu
Ask
non-specialists what it takes to complete a digital imaging project, and
responses will range from a desire to "slap it on a scanner and go"
to uncomprehending glassy-eyed stares. The reality lies somewhere between
these two extremes, but it is apparent that many cultural heritage professionals
are confused by the digitization process. Most are interested in digitizing
some part of their collections, but often possess only a vague idea of
how to begin. Although great advances have been made in the development
of standards and best practices for digitization, these principles have
yet to filter down to the majority of non-specialists. In Illinois, as
in many states, there is such pressure to "get materials on the Web"
that digitization projects are often hastily planned and poorly executed.
In January 2001, the Illinois
Digitization Institute was created to develop digitization training
materials for cultural heritage organizations throughout the state. The
Institute is part of the Digital Imaging and Media Technology Initiative
at the University Library at the University of Illinois at Urbana-Champaign.
Funded by a Library Services and Technology Act grant administered by
the Illinois State Library, the Institute's first priority was to determine
the extent and type of digitization training needed in the state. One
of the primary goals of the Institute was to develop training to provide
cultural heritage professionals with the means to mainstream digitization
into their institutions' activities. We were interested in developing
a model that differed from the nationally acclaimed workshops offered
by the Cornell University Library and the Northeast Document Conservation
Center (NEDCC), by providing both training opportunities and continuing
advice. The success of national and regional digitization workshops offered
by groups such as Cornell and the NEDCC made it clear that cultural heritage
institutions have a strong need for digitization training. Although there
is anecdotal information about training activities and needs throughout
the country, we found no examples of systematic assessment of digitization
training programs. Since we were working with a geographically defined
population, we felt it would be useful to gather information about Illinois
institutions' prior training, their perceived needs, and their digitization
activities to date.
In order to meet this objective, the Institute sent out surveys to 459
libraries, museums, and archives throughout Illinois. We surveyed institutions
of all sizes, ranging from large academic libraries to all-volunteer historical
museums. The survey was designed to determine:
- The extent to which digitization training is needed
- The types of training formats that are most desired
- The types of digital projects currently under way
- The extent to which current digital projects follow best practices
- The amount and type of digitization equipment at cultural heritage institutions in the state.
We sent the survey to a stratified random sample of public, academic, and school libraries,
as well as "special" cultural heritage institutions, including
museums, historical societies, and archives. The overall response rate for
the survey was 32%, and the results were tabulated by the Survey Research
Laboratory at the University of Illinois at Chicago. Forty-seven percent
of responses were from public libraries, 30% from schools, 5.6% from academic
libraries, 4.2% from library systems, and 11.8% from museums and archives.
Not surprisingly, the responses indicated a substantial need for digitization
training in Illinois. Although the survey was limited to one state, it is
probable that similar surveys conducted elsewhere would produce comparable
results.
One of the survey's most surprising findings was the percentage of institutions
that already own some type of digitization equipment. Eighty-two percent
of all respondents reported owning a flatbed scanner, digital camera, or
some other digitization tool (See figure 1).
Figure 1: Types of Digital Equipment Owned by Survey Respondents
However, relatively few institutions had the knowledge required to effectively
digitize cultural heritage collections. Only 15% of respondents reported
that they or other staff members at their institution had attended digitization
workshops or in-depth training sessions like those hosted by the Northeast
Document Conservation Center or Cornell University Library (See figure
2).
Figure 2: Digitization Training Attended by Survey Respondents
Despite the prevalence of digitization equipment in the state, comparatively
few institutions had begun to digitize their collections at the time of
the survey. Although 82% of respondents owned digitization tools, only
35% had conducted digital projects, and overall the results of these projects
had been discouraging. Only fifty-one percent of reported digital projects
were available on the Web, while 28% of completed digital projects had
not yet begun to provide any public access. More than two-thirds of the
digital projects reported on in the survey did not utilize any type of
metadata, and only eight percent had made use of the common Dublin Core
metadata element set (See figure 3).

Figure 3: Types of Metadata used by Survey Respondents
If the trends identified in this survey continue, the vast majority of
digital projects in Illinois will fail to meet even basic standards for
Internet access to digital materials. Unless more training is provided,
cultural heritage institutions will continue to underutilize their equipment
and produce substandard digital content. This failure could have long-term
consequences for the state's cultural heritage institutions. The lack
of robust metadata in the majority of the state's digital projects will
make it difficult if not impossible to share data, and will certainly
result in increased labor and material costs in the future.
We found that cultural heritage institutions are somewhat receptive to
learning more about the theory and practice of digitization. Over half
of the respondents expressed interest in one-day workshops on digitization
basics (56%), followed by Web-based tutorials (19%) (See figure 4).
Figure 4: Types of Digitization Training Favored by Survey Respondents
However, only 17% were interested in workshops longer than one day, and
although many expressed a desire to learn about "Digital Capture"
and "Materials Selection," the majority was indifferent to the
drier subjects of "Metadata" and "Project Planning."
These answers suggest that many respondents are primarily seeking an introduction
to digitization, and lack knowledge of the importance of project planning
in the digitization process. A cross-tabulation of the survey results
indicates that cultural heritage professionals are not fully aware of
the prerequisites for successfully completing a digitization project.
While only 17% of all respondents expressed interest in multi-day
training sessions, those who had already received formal digitization
training were more than twice as likely (50%) to express an interest in
multi-day workshops. This disparity suggests that individuals who have
received at least some digitization training understand the complexities
of the process and realize that it takes more than a day to learn how
to successfully implement a digitization program.
The Illinois Digitization Institute is using the results of the survey
to design training materials for cultural heritage organizations in Illinois.
Because the survey indicated that novices would most likely attend one-day
training sessions, we began offering a series of free one-day workshops
covering the basics of digitization. Limited to 15 participants, these
sessions focus on project planning and choosing equipment, and also provide
hands-on opportunities to work with a flatbed scanner and digital camera.
The aim of these introductory sessions is not to convince cultural heritage
organizations to embark on digitization projects, but rather to help them
make informed choices about digitization and its role in their institutions.
If participants decide to proceed with a digital project, they are encouraged
to do additional readings or enroll in the Institute's series of interactive
Web-based course modules. Using WebCT and WebBoard, these two-week modules
make use of discussion boards and collaborative assignments to help participants
plan and develop their own digitization program.
The Institute has also developed online training for recipients of digitization
grants funded by the Illinois State Library. Recipients of LSTA Educate
and Automate digitization grants are now required to complete a digitization
course before receiving their grant funds. For this training, the Institute
has adopted a slightly different approach. Because participation in this
training is mandatory, training begins with a two-week online course module
covering the basics of digitization. Students are asked to do readings,
answer questions online, participate in WebBoard discussions, and prepare
a formal evaluation of another institution's digitization project. The
follow-up to this online training is a two-day hands-on intensive workshop
held at the University of Illinois at Urbana-Champaign. Because project
planning and evaluation have been covered in the online course, more workshop
time is available to address the practical aspects of scanning, image
manipulation, and problems specific to the grantees' own digital projects.
Although there was some grumbling from the grant recipients about the
time commitment required for the training, evaluations have been almost
uniformly positive. Participants in all of the Institute's training leave
with a detailed digitization bibliography, links to a technical insert
providing an overview of the digitization process, and access to an Image
Quality Calculator program that assists in determining optimal resolution
for scanning text documents. As we continue to assess the efficacy of
these training methods, the Institute is hopeful that these efforts will
eliminate some of the confusion surrounding digitization, and thus "raise
the bar" for digital projects throughout Illinois.
Acknowledgements: The Illinois Digitization Institute has been developed
pursuant to a Library Services and Technology Act grant administered by
the Illinois State Library. The authors would like to thank Anne Craig,
Joe Natale, Connie Frankenfeld, and Alyce Scott from the Illinois State
Library for their assistance.
Please help us make the next five years of RLG DigiNews even better by spending
5-10 minutes filling out our online survey.

Integrating a Free Digital Resource:
The Status of Making of America in Academic Library Collections
Kizer Walker
Cornell University
kw33@cornell.edu
The Making of America (MOA) projects at Cornell
University and the University
of Michigan provide searchable, full-text digital access to a growing
body of primary materials documenting American social history in the second
half of the 19th century. Between the separate sites maintained by the
two collaborating institutions, nearly 9,000 monograph volumes and approximately
150,000 journal articles are currently available through MOA, free of
charge, to users around the world. In June and July 2001, as part of an
on-going evaluation of this resource, Cornell University Library surveyed
academic libraries that link to MOA to gather an impression of how and
why these institutions are integrating MOA into their collections. The
survey assessed the impact of the availability of the digital resource
on collection development and management decisions regarding print versions
of titles duplicated in the MOA collections. The report that follows presents
the survey results and situates them in relation to actual use of the
MOA collections as tracked over three weeks in Web logs. The author also
interviewed principals of the projects at Cornell and Michigan to determine
their reactions to the survey findings and future plans for MOA; the interview
follows the survey report.
MOA Institutional Use Survey
The MOA survey adapted and expanded on surveys conducted by JSTOR in 1999 and 2000,
the results of which suggest that libraries in increasing numbers have
been willing to let go of print journal backruns and rely on JSTOR to
archive and provide access to these materials in digital form. Would libraries'
handling of the openly accessible MOA collections show similar tendencies?
Librarians involved in administration, collection
development, reference, and acquisitions at approximately 250 institutions
were invited to respond to a Web-based survey on their institutions' use
of MOA; a single response was requested for each institution. Along with
a series of multiple-choice questions, survey participants had the option
of submitting open-ended comments on the MOA collections. We compiled
the list of invited participants from academic library Web pages containing
links to the URLs for the Cornell and Michigan MOA sites as identified
via commercial search engines (1).
Institutions of all sizes received the survey mailing: 58 of the 112 Association
of Research Libraries (ARL) member institutions were among the survey
recipients, including 22 of the libraries ranked among the top 25 in the
1999-2000 ARL Membership
Index. Approximately 10 percent of the survey mailings were sent to
institutions outside North America.
Librarians from 93 institutions answered the survey. Two responses came
from Canadian institutions and eight from libraries outside North America.
We received 28 responses from ARL member libraries, including 13 that
ranked among ARL's top 25. As figure 1 illustrates, around half of the
U.S. respondents represented Ph.D.-granting institutions, approximately
one-third Master's colleges and universities, and the rest undergraduate
institutions.

Figure 1: MOA Survey U.S. Respondents by Carnegie Classification Category
Who uses MOA?
The MOA survey focuses on the integration of MOA into academic library
collections, but academic research is one among many uses of MOA. Sample
Web logs of the MOA site administered by the Cornell Library provide a
revealing (albeit lower-than-normal use) snapshot of the collection.
Recorded over three one-week periods in December 2001 and January 2002,
the logs indicate that over 90% of 97,378 distinct visits to the Cornell
MOA site originated from users of machines registered to the commercial
(e.g., ".com") and network (e.g., ".net") domains.
Presumably, private individual users of commercial Internet service providers
account for a considerable number of these visits. Visits originating
from U.S. academic institutions, that is, from education domains (e.g.,
".edu"), whether from library computers or not, accounted
for approximately 7% of the total. Visits to the Cornell MOA site for
the period under review are broken down by domain type in figure 2. Logs
show 126,119 visits referred to MOA from other sites. Academic sites (including
Cornell Library pages, but not pages within the MOA site itself) comprised
approximately 33% of all such referrals.

Figure 2: Top-Level Domain Types by Visits to MOA
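A breakdown of visits by domain type, like the one summarized in figure 2, can be sketched in a few lines of Python. This is only an illustration and not the WebTrends procedure used for the report: the function names and the sample hostnames are hypothetical, and the sketch assumes each visit has already been resolved to a hostname.

```python
from collections import Counter

# Assumed mapping from top-level domain to the categories named in the text.
CATEGORIES = {"com": "commercial", "net": "network", "edu": "education"}

def domain_type(hostname: str) -> str:
    """Classify a resolved hostname by its top-level domain."""
    tld = hostname.rsplit(".", 1)[-1].lower()
    return CATEGORIES.get(tld, "other")

def share_by_domain_type(hostnames):
    """Return the percentage of visits per domain type, rounded to 0.1."""
    counts = Counter(domain_type(h) for h in hostnames)
    total = sum(counts.values())
    return {kind: round(100 * n / total, 1) for kind, n in counts.items()}

# Hypothetical visits, one hostname per visit (not taken from the MOA logs).
visits = [
    "dialup1.example.com",
    "cache.example.net",
    "lib.example.edu",
    "host.example.com",
]
shares = share_by_domain_type(visits)
print(shares)
```

Unresolved numeric addresses fall into the "other" category in this sketch; a real analysis would need to decide how to treat them.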
Though outside the scope of the present report, detailed study of non-academic
use of the MOA collections is needed. Meanwhile, Web usage statistics
provide necessary context for our findings regarding the status of MOA
in academic library collections.
The reports, generated from usage logs with WebTrends Web analytic software,
rank 200 organizations for each log period according to the number of
visits to MOA from machines registered to that organization. Of the 89
U.S. academic institutions among these recurrent visitors, Ph.D.-granting
universities represented a sizable majority at nearly 80%. Users at these
universities accounted for around 90% of academic visits in the three
weeks under consideration. The libraries at 61 of the 89 institutions
from which the visits originated are ARL members. Users from institutions
that responded to the MOA survey comprised 22% of academic visits. Figure
3 breaks down visits to the Cornell MOA site for the logged period according
to the Carnegie typology.
Figure 3: MOA Visits from U.S. Academic Institutions by Carnegie Classification Category
Why do academic libraries provide access to MOA?
By and large, libraries seem to regard MOA as a valuable enhancement to
their print holdings, but not as a suitable replacement for print collections.
Eighty-five percent of all librarians responding to the MOA survey reported that they
provide access in order "to add titles not held in the library's
print collection"; adding new titles was a motivation for 82% of
responding ARL institutions and 69% of the top-ranked ARL members. Sixty-nine percent
of all the libraries surveyed and 86% of ARL libraries reportedly link
to MOA in order to "provide text searchable alternative versions
to supplement titles already held in the library's print collection."
A number of librarians commented that the ability to access the collections
remotely is valuable for student and faculty users, particularly where
libraries are supporting a distributed learning curriculum. Only 4% of
respondents said that "replac[ing] titles held in the library's print
collection" was a motivation for providing MOA access.
Integration of MOA into library collections
We have taken the presence of MOA titles in OPAC records as one measure
of the degree to which libraries conceive of the resource as an integral
piece of their collections. Fourteen respondents reported that their libraries'
OPACs provide links to individual titles in the MOA collection at present.
Half of these were ARL member institutions (25% of the ARL members surveyed).
Nine of the 14 libraries are at U.S. Ph.D.-granting institutions, one
at a Master's university, two at undergraduate institutions, and two at
universities outside North America. In the majority of libraries surveyed,
access to the MOA collections is from a comprehensive electronic resources
page, a subject-based list of digital resources, or course-specific lists
maintained by the library.
We asked survey participants a series of questions about the implications
of access to the MOA titles for the management of their libraries' print
holdings. This part of the survey closely followed JSTOR's bound volume
survey, but the responses diverged markedly from those submitted by JSTOR
subscribers, as figure 4 illustrates below. Asked whether, "given
the availability of the titles in the MOA collection," their libraries
had moved bound volumes to remote storage, 6% of respondents answered "yes,"
and another 6% answered that bound volumes had not been moved, but that
there were plans to move them in the future. Seventy-eight percent reported that no items
had been moved to remote storage and that their libraries had made no
such plans. Although two respondents said their libraries had "entered
into a group remote storage project with other institutions to consolidate
. . . print collections," only one of these reported that MOA access
had been a factor in the decision. Four percent noted future plans for
a group storage arrangement, but 81% did not foresee any such coordination
with other institutions. Only a single respondent reported that "bound
volumes of titles included in the MOA collection" had been "discarded
outright." A further 3% related that their libraries planned to discard
some of these volumes in the future, but 85% responded that no bound volumes
had been discarded in light of MOA accessibility and that there were no
plans to do so.
Management of print titles offered in the digital collections | MOA 2001 Institutional Use Survey (93 total responses) | JSTOR 2000 Bound Volume Survey (138 total responses) | JSTOR 1999 Bound Volume Survey (214 total responses)
Moved bound volumes to remote storage? | 6% (6 institutions) | 25% (34 institutions) | 20% (42 institutions)
Made plans to move bound volumes? | 6% (6 institutions) | 20% (27 institutions) | 24% (52 institutions)
Discarded bound volumes? | 1% (1 institution) | 22% (31 institutions) | 13% (28 institutions)
Made plans to discard bound volumes? | 3% (3 institutions) | 22% (30 institutions) | 25% (54 institutions)
Entered into a group remote storage project with other institutions to consolidate print collections? | 2% (2 institutions) | 3% (4 institutions) | 2% (4 institutions)
Made plans to enter into group storage project? | 4% (4 institutions) | 7% (10 institutions) | 7% (16 institutions)
Figure 4: MOA and JSTOR Results Compared
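The percentages in figure 4 follow from the institution counts and the total responses for each survey. As a minimal sketch, the remote storage row can be recomputed as follows; the helper name `percent` and the dictionary layout are illustrative choices, while the counts and totals are those reported in the figure.

```python
def percent(count: int, total: int) -> int:
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)

# Total responses per survey, as reported in figure 4.
responses = {"MOA 2001": 93, "JSTOR 2000": 138, "JSTOR 1999": 214}

# Institutions that reported moving bound volumes to remote storage.
moved_to_storage = {"MOA 2001": 6, "JSTOR 2000": 34, "JSTOR 1999": 42}

rates = {
    survey: percent(moved_to_storage[survey], responses[survey])
    for survey in responses
}
print(rates)
```

Because the three surveys drew different numbers of responses, comparing percentages rather than raw counts is what makes the MOA and JSTOR rows comparable.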
Another series of questions offered examples of more restrained possible
actions affecting libraries' print holdings. Queried about "other
cost or shelf-space saving solutions" developed "as a result
of access to the MOA collection," participants were reluctant to
allow MOA to influence their management of print materials. Five percent
of all respondents answered that their libraries had "removed duplicate
items" and 7% had plans to do so. Nine percent responded that their
libraries had "stopped replacing lost or damaged print issues"
of journals represented in MOA, and 11% more reported plans to stop. Fourteen
percent said their institutions were planning to or had already discontinued
purchasing microfilm backruns. Five percent of respondents said that their
libraries have installed compact shelving (presumably reflecting a decrease
in the priority afforded to accessibility of MOA titles in print), and
5% indicated plans to do so. The rate of positive responses to this series
of questions was similar or lower for ARL institutions; however, these
institutions gave a significantly higher number of unequivocally negative responses
("no, and no plans").

In their comments, a number of librarians indicated that their institutions
have, in fact, withdrawn or remotely stored print materials that could
be replaced with electronic versions, but that the MOA collections have
not been factored into such decisions. That MOA's impact on collection
management policies has not approached that of JSTOR cannot be attributed
to doubts about the usefulness of the MOA collections. Indeed, respondents
praised MOA as a "tremendous resource," an "excellent and
useful collection" that is "invaluable for small libraries,"
and a "fantastic service to the historical profession." Instead,
librarians' relative tentativeness likely has to do with perceptions of
the stability of the resource. Comments of some of the respondents suggest
that MOA may not be widely viewed as a permanent digital repository. Ruth
Dickstein, Subject Specialist for History and Women's Studies at the University
of Arizona Library, wrote: "we are removing JSTOR titles, and could
consider doing the same with the MOA titles, [but] just have never assumed
that the MOA titles had the stability of always being available."
Michael Stoller, Director of Collections & Research Services at the
New York University Libraries, expressed similar reservations:
We have treated Making of America as a supplement to our own holdings
and not as a replacement for any locally-held resources. In future we
might view MOA as a form of ready access to materials locally held offsite.
But we are not presently inclined to view it as a 100% reliable digital
archive, whose paper equivalents can or should be withdrawn from our collections.
In an August 2000 interview with RLG DigiNews, JSTOR's president, Kevin
Guthrie, emphasized that his organization's decision-making and communication
with stakeholders have at all times centered on JSTOR's core mission of
establishing a trusted digital archive. JSTOR has vigorously cultivated
relationships with libraries and sought to make its preservation policies
clear to librarians. Although MOA has been an important testing ground
for digital preservation techniques at both Cornell and Michigan, to date
neither university has forcefully articulated its policies and practices
regarding digital preservation of the MOA holdings. More active communication
with librarians could help clarify MOA's long-term strategies. A few survey
respondents proposed that MOA supply the usage statistics that commercial
vendors typically make available to libraries; this and other services
to libraries would enhance MOA's visibility.
As digital preservation and archiving projects multiply
and evolve, standards and oversight mechanisms are emerging that will
facilitate communication of preservation strategies (2).
Such communication should be central to MOA's outreach to academic libraries
as well as to other communities.
Interview with Anne Kenney, Wendy Lougee, and John Wilkin
A number of respondents to our institutional use survey indicated
that MOA would weigh more heavily in collection management decision-making
if there were clearer assurance that these materials would be accessible
for the long term. How would you characterize the commitment of the
Cornell and University of Michigan libraries to maintaining this resource?
Anne Kenney (MOA 1 Project Director, Cornell): Cornell University
Library (CUL) is committed to maintaining and strengthening its digital
holdings including the MOA collection. To date, this commitment has
been de facto, but will be made explicit in the library's new
Master Plan, which will be adopted by early spring. Over the past
5 years, CUL has actively developed its digital preservation capabilities
to ensure the long-term accessibility of its digital content. Through
an IMLS-funded initiative, the library developed a digital
preservation strategy for its image-based collections and last
year assumed long-term responsibility for the arXiv.org
e-Print archive. The blueprint for creating a Central Depository
for digital content will be completed within the next two months,
with development planned for the summer and fall. Just recently Nancy
McGovern has been appointed CUL's first Digital Preservation Officer,
charged with developing digital preservation policies and coordinating
various digital archiving efforts library-wide. Cornell Library has
also participated heavily in research and development efforts focusing
on digital preservation, through such projects as the Mellon E-Journal
Archiving Project (Project
Harvest), the Digital Libraries Initiative Phase 2 project (Project
Prism), and the Risk
Management Study on the effects of format migration. Because the
online MOA collection serves our clientele so well, we will be moving
the bound volumes comprising the collection to off-site storage over
the next several years.
Wendy Lougee (MOA 1 Project Director, University of Michigan):
We have developed archiving procedures and policies (currently in
draft and undergoing internal review) and are committed to sustaining
our locally created digital collections. Current mechanisms include
methods to ensure the longevity and long-term access to the digital
master. Creation and conversion practices use standards-based methods
and storage on media with long-term viability. Access systems, where
possible, use the digital master as an access copy and rely on redundancy
(storage and multiple locations) and frequent backups.
We have recently adopted a policy to move our preservation reformatting
to digital methods as a default method. Consequently, in the future
we will be reviewing additional brittle and endangered volumes for
digital conversion and utilizing similar methods.
Since the original MOA project, we have made cataloging records (via
ftp) available for all items included in the MOA collection to facilitate
access at other institutions.
What has been the approach to date to publicizing the MOA project,
particularly with regard to establishing MOA's status as a stable,
reliable resource? Do you envision new features or services that might
increase MOA's value to its users or broaden its readership to new
communities? What can MOA learn from other digital library projects,
such as JSTOR, in this regard?
Anne Kenney: This survey revealed some very interesting trends: institutional
faith in JSTOR has steadily increased for good reasons but also because
JSTOR is overt about its commitment to its customers. I believe that
CUL could follow the lead of JSTOR and others in offering the same
commitment to current and future customers, not just within the
CUL community but beyond to the growing secondary clientele. In meeting
the needs of the former, we can also serve the latter with a manageable
overhead. We are particularly taken with the National Library of Australia's
"Safekeeping
Project" that is building a distributed and permanent collection
of digital resources in digital preservation through negotiations
with resource owners or their designees to provide long-term access
to their material. Those resources for which safekeeping strategies
have been put in place are marked
on the PADI Web site.
I also believe that in the next couple of years we will see various
strategies evolve for developing the business case for underwriting
the costs of digital archiving. CUL has already expended a great deal
of time and money in the care and feeding of this resource and will
continue to do so. The financial arrangements for doing so will inevitably
change, however. The extent to which we integrate our holdings into
the collections of other libraries will be closely monitored. The
future will lie in greater inter-institutional dependencies for maintaining
digital assets that are valued by all yet managed in a distributed
manner.
Wendy Lougee: Our publicity has conveyed information about stability
and methods for long term access, as well as use and functionality.
While we have not, thus far, advocated collection management decisions
as a result of MOA, we anticipate that the planned digital registry
(under development through the Digital Library Federation) would be
an appropriate venue to communicate this information.
Finally,
can you describe how you perceive MOA's relationship with other digital
library projects, and how you would like to see such relationships
develop in the future? Is MOA involved in any plans to integrate access
to separate digital collections, or other collaborative projects that
would reduce redundancy among digital resources? What steps were taken
in the development of MOA to provide for future interoperability with
other databases?
Anne Kenney: Wendy has already mentioned the DLF's registry initiative,
which is being designed in part to reduce duplication of effort in
digitization. Through various projects and initiatives (notably
the Open Archives Initiative and collaborative efforts with other
research institutions, in particular the Library of Congress and
Michigan), Cornell is
actively pursuing a program to integrate access across institutional
boundaries. We are also intrigued by the suggestion of the survey
respondents who asked whether we could provide them with statistical
data covering their institution's use of MOA materials.
John
Wilkin (Head, Digital Library Production Service, University of
Michigan): Formally, we are exploring integration of digital collections
through a National Science Foundation grant with Cornell, Goettingen,
and Michigan to extend the Dienst protocol to support full text access.
Michigan's OAI metadata harvesting project (supported by Mellon) will
bring together freely available digital collections. We have made
MOA cataloging records available via ftp to other institutions for
inclusion in local catalogs. Our local systems development using our
Digital
Library Extension Service (DLXS) incorporates support for cross-repository
searching.
Footnotes
(1) Google and Altavista were searched
for links to MOA, using the Michigan MOA URL and the two URLs for the
Cornell MOA site in our search strings. For example, a search for
"link:moa.umdl.umich.edu/ -host:umich.edu" at Altavista yields links
to the Michigan site, excluding links at the University of Michigan host.
Links from academic library sites were selected "by hand" from the
results. We compiled an email address list of individual librarians from
information available at the library Web sites.
(2) See for instance the draft
report of the RLG/OCLC Working Group, Attributes of a Trusted Digital
Repository: Meeting the Needs of Research Resources (Mountain View,
CA: RLG, 2001).

Highlighted Web Site
METS:
Metadata Encoding and Transmission Standard
The METS project, sponsored by the Digital Library Federation, is
developing an XML document format for descriptive, structural and
administrative metadata for digital works. This site includes a new
version of the METS schema, released in December 2001. The site also
offers sample documents and other information on applying the schema,
technical documentation, and a useful introductory tutorial. This
site is the prime source for information on an important project that
promises to have a significant impact on the development of digital
libraries in the near future.
FAQ
Lee LaFleur
Cornell University
ljl26@cornell.edu
My institution
is interested in monitoring the use of our online resources. Is Web log
analysis an effective means of doing this?
Recently a
number of organizations, including the Digital
Library Federation (DLF) and the Association
of Research Libraries (ARL), have urged libraries to take responsibility
for documenting the use of the digital resources they manage. The ARL
New Measures Initiative has issued its Emetrics
Phase II report, offering guidelines for usage statistics that libraries
should collect in order to document changes in the use of Web-based resources.
The report recommends that libraries begin tracking the number of downloads,
page views, queries and search sessions by users. Statistics such as these
can be obtained by examining the data produced by the Web servers on which
these resources are stored. Each transaction on the Web consists of a
request issued through the client's browser and a corresponding response
from the Web server. These transactions are automatically recorded by
the server in files known as Web logs.
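To make the structure of a Web log entry concrete, the following is a minimal Python sketch that splits one log line into named fields. It assumes the server writes the widely used Common or Combined Log Format; the sample line, host address, and paths are invented for illustration, and other servers or configurations may record different fields.

```python
import re
from datetime import datetime

# Assumed layout (Common/Combined Log Format):
# host ident user [time] "request" status bytes "referer" "user-agent"
# The last two quoted fields are optional (present only in the Combined format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were sent in the response body.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    parts = record["request"].split()
    record["method"] = parts[0] if len(parts) > 0 else None
    record["path"] = parts[1] if len(parts) > 1 else None
    return record

# Hypothetical sample entry, not taken from a real server.
sample = (
    '192.0.2.10 - - [14/Feb/2002:10:15:32 -0500] '
    '"GET /collection/index.html HTTP/1.0" 200 5120 '
    '"http://www.example.org/links.html" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"'
)
record = parse_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
```

Each parsed record then carries the requesting host, the time of the request, the file requested, the outcome, and (where logged) the referring page and browser, which are the raw materials for the statistics discussed below.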
The data stored in Web log files consists of long strings of text and
numerical data, so reading them can be very difficult and unintuitive.
You will probably want to use a log file analysis program to interpret
the data. These programs are available as shareware or through various
commercial vendors. Some types of log analysis software run on the administrator's
desktop, in which case the log files must be transferred from the server
to the desktop before analysis is carried out. Other analysis programs
run on the server itself and can gather data from the log files directly,
either in "real time" or at scheduled intervals. Depending on
the size of the log file and the capacity of the hardware, the analysis
process can be labor intensive and time consuming. In general, the more
detailed the analysis desired, the more complicated and expensive the
software tends to be.
The software works by comparing different data sets from the logs and
making inferential calculations based on a number of factors. For example,
the length of a "visit" is generally determined by calculating
the difference between the date/time stamp of a user's arrival and departure
requests. "Visits" are determined by counting requests from
a single IP address over a period of time. After a preset period of inactivity
(e.g., 30 minutes) on a Web page or site, a visit is considered terminated.
Any activity that occurs after this time period by the same user would
then be counted as a new visit.
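The visit calculation described above can be sketched in a few lines of Python. This is only an illustration of the general heuristic, not the method of any particular product: it assumes each request has already been reduced to an (IP address, timestamp) pair, treats one IP address as one user, and uses the 30-minute inactivity period mentioned above as its default.

```python
from datetime import datetime, timedelta

def group_visits(requests, timeout=timedelta(minutes=30)):
    """Group (ip, timestamp) pairs into visits.

    Returns a list of (ip, first_request_time, last_request_time) tuples.
    A request more than `timeout` after the previous request from the
    same IP address starts a new visit.
    """
    visits = []
    current = {}  # ip -> index of that address's most recent visit
    for ip, stamp in sorted(requests, key=lambda item: item[1]):
        index = current.get(ip)
        if index is not None and stamp - visits[index][2] <= timeout:
            # Still within the inactivity window: extend the open visit.
            visits[index] = (ip, visits[index][1], stamp)
        else:
            current[ip] = len(visits)
            visits.append((ip, stamp, stamp))
    return visits

# Invented sample data: one address returns after a 40-minute gap.
day = datetime(2002, 2, 14)
requests = [
    ("192.0.2.10", day.replace(hour=10, minute=0)),
    ("198.51.100.7", day.replace(hour=10, minute=5)),
    ("192.0.2.10", day.replace(hour=10, minute=10)),
    ("192.0.2.10", day.replace(hour=10, minute=50)),
]
visits = group_visits(requests)
for ip, start, end in visits:
    print(ip, "visit length:", end - start)
```

Note that a visit consisting of a single request has a computed length of zero, since the log records when a page was requested but not how long it was read. This is one reason visit lengths reported by analysis software should be read as estimates.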
Some analysis software packages offer details on the geographic location
of users, even down to the city level. More expensive packages can analyze
log file types from a wide variety of servers (Apache, Microsoft IIS,
Netscape, etc.). Higher-end software packages may provide hundreds of different
types of reports. Some offer customizable reports, generated on the fly
from databases in which the analyzed data is stored. Such reports may
list the top pages visited or the number of unique visitors, and some
provide reports that may be re-sorted for viewing along any number of
different variables. Most commercial analysis programs also offer some
type of graphing function through which the report data may be represented
visually. Select packages also allow the user to download data from the
report into PDF, Word documents or Excel spreadsheets.
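Two of the reports mentioned above, top pages visited and number of unique visitors, reduce to simple counting once the log has been parsed. The Python sketch below assumes each request is available as a (host, path) pair; the addresses and paths are made up, and "unique visitors" is approximated here by distinct host addresses, which is an assumption rather than a definition shared by all packages.

```python
from collections import Counter

def summarize(records, top_n=2):
    """Return (top pages with view counts, number of distinct hosts)."""
    page_views = Counter(path for _host, path in records)
    unique_hosts = len({host for host, _path in records})
    return page_views.most_common(top_n), unique_hosts

# Invented sample requests as (host, path) pairs.
records = [
    ("192.0.2.10", "/index.html"),
    ("192.0.2.10", "/faq.html"),
    ("198.51.100.7", "/index.html"),
    ("203.0.113.5", "/index.html"),
    ("198.51.100.7", "/faq.html"),
    ("203.0.113.5", "/events.html"),
]
top_pages, unique_hosts = summarize(records)
print(top_pages)      # [('/index.html', 3), ('/faq.html', 2)]
print(unique_hosts)   # 3
```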

Figure 1. Screenshot from WebTrends
Live showing the top twenty countries from which visitors came when
they visited the Cornell Department of Preservation and Conservation Web
site.

Figure 2. Screenshot from NetTracker Log Analysis software package showing
the top ten visitors to the Louis Agassiz Fuertes Web site.
"Live tracking" is another means of analyzing Web traffic that
is often used to obtain detailed information about online users. Live
trackers are typically third party services that monitor Web traffic by
requiring Web site administrators to place special JavaScript code into
each of their Web pages. Thus, live tracking doesn't rely on Web logs
at all. Instead usage data is derived through the JavaScript each time
a page on your site is loaded. Like Web logging, this method also requires
an analysis process, but in this case the work is usually done by the
live tracking service and you receive a finished report. Live tracking
results are usually available in real time, allowing up-to-the-minute
reporting. Compared to Web log analysis, live tracking can provide an
equally in-depth analysis of Web traffic activity, while also offering
a more detailed profiling of users' system requirements. The JavaScript
employed in live tracking can identify a variety of user display properties,
including monitor resolution, pixel dimensions and bit depth, screen widths
and available color palettes, as well as information on whether or not
cookies (a technology for passing personalized data between Web clients
and servers), Java, and JavaScript are enabled. When combined with data
on users' Internet connection speeds, this information can help guide
decisions about the presentation of digital information, including images.

Figure 3.
This report lists the most common screen resolutions used by visitors
to Cornell's Preservation site. Screen resolutions are given in terms
of pixel dimensions.
Web traffic analysis can be a valuable asset to librarians who want to
understand current and potential users of their collections. User statistics
can help libraries gain continued funding and administrative support for
new and existing digital projects. By analyzing the contents of Web log
files, we can learn a great deal about online visitors and about which
resources are being used and which ones are not. Such data can assist
librarians and archivists in answering questions of whether users are
visiting expected pages, which sections they are spending the most time
on, and which types of content they appear to be most interested in. Web
server administrators can rely on user data to assess file structure and
server load over the network. Libraries can determine users' locations
(by IP address), and which referring Web pages (links), search engines,
and keywords (queries) are transporting them to the digital library. Log
files can also provide some indication of whether users are navigating
a Web site or resource properly based on click stream data that allows
us to see the paths (internal references) through which users are traveling,
as well as information on which pages they visit first, which pages they
exit from, how long they stay, and what files they choose to access. Log
file data allows libraries to assess the number of files that have been
downloaded and those for which the download was aborted. The logs also
identify any errors that may occur during an online transaction. Additionally,
log files contain technical information regarding the user's operating
system and Web browser, which is of interest in designing resources for
the system requirements of particular audiences.
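As one illustration of how referring pages and search keywords can be recovered, the Python sketch below pulls the referring site and any search terms out of a referrer URL. The list of query parameter names is an assumption chosen for the example; search engines differ in how they encode queries, so a real analysis would need a list matched to the engines of interest.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed parameter names under which a search engine may pass the query.
QUERY_PARAMS = ("q", "query", "p")

def search_terms(referer):
    """Return (referring site, search terms or None) for a referrer URL."""
    parts = urlsplit(referer)
    params = parse_qs(parts.query)
    for name in QUERY_PARAMS:
        if name in params:
            return parts.netloc, params[name][0]
    return parts.netloc, None

# Hypothetical referrer values as they might appear in a log.
print(search_terms("http://www.google.com/search?q=digital+preservation&hl=en"))
print(search_terms("http://www.example.edu/library/links.html"))
```

The first call returns the search engine's host together with the phrase "digital preservation"; the second returns only the referring host, since an ordinary link carries no query.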
There are many good reasons for libraries to use Web traffic analysis
software. However, there are a number of important factors to keep in
mind. Usage data does not provide rich qualitative information, such as
a user's overall satisfaction with resources, and it certainly won't explain
why people are searching for particular information. In this regard Web
traffic analysis is not a substitute for more qualitative studies (focus
groups, surveys, etc.) that the library should also be conducting.
Web traffic and log analysis is essentially an inferential process that
relies on heuristic specifications set up by the companies that design
the software. Although the reports provide a helpful view of user interaction
with library resources, much of the information may be inconclusive. Different
software packages use different methods for deriving their reports, and
the lack of documentation for many analysis programs makes some of their
specifications suspect. For instance, the prevalent use of robots or spiders
over the Internet may affect the accuracy of user statistics. Robots are
commonly used to comb the Web for data, and when doing so they make frequent
visits to each page on a Web site. Analysis programs attempt to control
for robots, but many of these visits still slip through the cracks, thereby
inflating the number of "actual" reported visitors.
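A rough idea of how a program might control for robots is sketched below in Python. It uses two common heuristics, both assumptions for this example rather than the rules of any named product: a host that requests /robots.txt is treated as a robot, as is any request whose browser string contains a typical crawler keyword. Robots that do neither will pass as ordinary visitors, which is exactly the inflation described above.

```python
# Assumed keywords that often appear in crawler browser strings.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")

def split_robots(records):
    """Split (host, path, agent) records into (human, robot) lists."""
    robot_hosts = {
        host
        for host, path, agent in records
        if path == "/robots.txt"
        or any(hint in agent.lower() for hint in ROBOT_HINTS)
    }
    human = [rec for rec in records if rec[0] not in robot_hosts]
    robots = [rec for rec in records if rec[0] in robot_hosts]
    return human, robots

# Invented sample requests: (host, path, browser string).
records = [
    ("192.0.2.10", "/index.html", "Mozilla/4.0 (compatible; MSIE 5.5)"),
    ("198.51.100.7", "/robots.txt", "Mozilla/4.0"),
    ("198.51.100.7", "/index.html", "Mozilla/4.0"),
    ("203.0.113.5", "/faq.html", "Googlebot/2.1"),
]
human, robots = split_robots(records)
print(len(human), "requests kept;", len(robots), "attributed to robots")
```

Flagging by host means that every request from an address that once fetched /robots.txt is discarded, a deliberate choice that can also remove real users who share that address.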
Additional Sources of Information
There
are many Web log analysis software packages and services available,
and the market is in much flux. The following sites may prove helpful
in evaluating the various products currently in use.
AWStats Official Web Site
Dan Grossman, "Analyzing Your Web Site Traffic," iBoost Journal
Software QA/Test Resource Center, "Web Site Test Tools and Site Management Tools"
Makiko Itoh, "Web Site Statistics: How, Why and What to Count"
"Web Site Analysis," from PC Magazine (June 27, 2000)
Elaine Nowick, "Using Server Logfiles to Improve Website Design," Library Philosophy and Practice, Vol. 4, No. 1 (Fall 2001)

Calendar of Events
Digital Resources for the Humanities: DRH 2002
Call for Papers: Due March 1, 2002
To be held September 8-11, 2002, Edinburgh, Scotland
This annual conference is a forum for all those involved in the digitization
of cultural heritage materials.
Second
International Workshop on New Developments in Digital Libraries (NDDL2002)
April 2-3, 2002
Ciudad Real, Spain
This workshop will serve as a forum for researchers and practitioners
to discuss new developments in digital libraries. Topics include: Metadata
Issues, Digital Library Prototypes, Systems Interoperability, and New
Roles of Librarians in Digital Libraries.
Museums and the
Web 2002
April 17-20, 2002
Boston, MA
In its sixth year, the program addresses Web-related issues for museums,
archives, libraries, and other cultural institutions.
CLIR
Hosts International Workshop on Digital Preservation
April 24-25, 2002
Washington, D.C.
The Council on Library and Information Resources will hold a workshop
entitled The State of Digital Preservation: An International Perspective.
The focus will be on international developments in digital preservation
and identifying the emerging challenges. Registration information may
be found here.
The
European Library-Milestone Conference
April 29 - 30, 2002
Frankfurt am Main, Germany
Nine European national libraries are working together with the Conference
of European National Librarians and are developing a portal for the project
The European Library (TEL). This conference will address topics such as:
National Libraries and Publishers; Business of Digital Libraries; and
Describing and Handling Digital Publications related to the portal.

Announcements
Digital-Copyright
Listserv
To
meet the developing application of copyright laws in the online environment,
The Center for Intellectual Property has initiated a new listserv. It
will provide a forum for the analysis of topics such as copyright law
and policy, technologies, and federal information law and policies that
impact higher education, particularly digital distance education.
The
International Federation of Library Associations and Institutions (IFLA)
and the International Publishers' Association (IPA) Establish a Joint
Steering Group
One of the goals of this alliance is to develop a joint statement on the
archiving and preserving of digital information and to make long-term
archiving and preservation a key agenda item internationally.
OSSNLibraries
Portal
This
portal is a prototype of open source software (OSS) in libraries. It is
a combination directory of OSS projects and information resources designed
for and useful in library settings.
National Information Standards Organization (NISO)/Book Industry Study Group
(BISG) Meeting Report on Digital Archiving
This report looks at three ongoing projects that examine cost-effective
business models for archiving, exploring rights issues, and identifying
needed standards.
Cedars
Project Evaluation for 1998-2001
The
evaluation of the first three years of the Cedars Project is now available.
Archival
Preservation of Smithsonian Web Resources: Strategies, Principles and
Best Practices
The Smithsonian
Institution Archives commissioned this study to assess the requirements
for the archival preservation of Smithsonian Institution Web sites, and
to develop a strategy, guidelines, and best practices that would facilitate
access to usable and trustworthy Web sites.
Colorado
Digitization Project Best Practice Document for Digital Audio
Feedback is
being requested on this document. The draft document provides guidelines
for the technical issues, and a set of best practices for converting analog
cassette tape recordings of oral histories into digital format.

RLG
News
RLG
Creates New Discussion List Related to Digital Preservation and Digital
Repositories
RLG has created oais-implementers@lists2.rlg.org, a new discussion list
which is intended for individuals and institutions who are actively working
with the Open Archival Information System (OAIS) Reference Model as a part of an overall
effort to model, build, and manage their own digital archive or repository.
Currently an International Organization for Standardization (ISO) draft
standard, the OAIS provides a common reference model, a common terminology,
and a common conceptual framework with which to work, enabling discussion
among the many types of organizations and institutions grappling with
digital preservation and digital repository creation and management.
It is expected that oais-implementers list members will come from a variety
of disciplines including (though not restricted to) libraries, archives,
space data centers, corporations, universities, and others. The list and
its supporting
web pages were created to enable communication and provide information
about OAIS reference model implementations, applications, and related
standards development. The list also provides a forum for discussion and
the opportunity for the exchange of information, ideas, and experience
among people engaged in similar activities. The supporting web pages will
alert researchers to OAIS activities occurring in similar disciplinary
or geographical areas, as well as provide links to further OAIS-related
standards development. List members are encouraged to contribute project
and contact information to be included in these resources.
To subscribe to the new list:
Send the following message to listmanager@lists2.rlg.org
Subscribe oais-implementers <FirstName LastName>

Publishing
Information
RLG DigiNews
(ISSN 1093-5371) is a newsletter conceived by the members of the Research
Libraries Group's PRESERV community. Funded in part by the Council on
Library and Information Resources (CLIR) 1998-2000, it is available internationally
via the RLG PRESERV
Web site (http://www.rlg.org/preserv/). It will be published six times
in 2002. Materials contained in RLG DigiNews are subject to copyright
and other proprietary rights. Permission is hereby given for the material
in RLG DigiNews to be used for research purposes or private study.
RLG asks that you observe the following conditions: Please cite the individual
author and RLG DigiNews (please cite URL of the article) when using
the material; please contact Jennifer
Hartzell, RLG Corporate Communications, when citing RLG DigiNews.
Any use other than for research or private study of these materials requires
prior written authorization from RLG, Inc. and/or the author of the article.
RLG DigiNews is produced for the Research Libraries Group, Inc. (RLG)
by the staff of the Department of Preservation and Conservation, Cornell
University Library. Co-Editors, Anne R. Kenney and Nancy Y. McGovern;
Production Editor, Barbara Berger Eden; Associate Editor, Robin Dale (RLG);
Technical Researchers, Richard Entlich and Peter Botticelli; Technical
Coordinator, Carla DeMello.
All links in this issue were confirmed accurate as of February 14, 2002.
Please send your comments and questions to preservation@cornell.edu.
