RE: The Internet Archive robot

Denis McKeon (dmckeon@swcp.com)
Sat, 14 Sep 1996 11:38:04 -0600


Again, folks, there are good FAQs on copyright at:

ftp://rtfm.mit.edu/pub/usenet/news.answers/law/copyright/faq
http://www.cis.ohio-state.edu/hypertext/faq/usenet/Copyright-FAQ/top.html
http://www.4patent.com/copy1.htm

and there are some new topics about robots at the bottom of this message.

In <c=US%a=_%p=InterWorld%l=INTERWORLD2-960913131951Z-177@www1.interworld.com>,
David Levine <David@InterWorld.com> wrote:
>...
[ quotes extracted for effect ]
>By "use" I meant utilize the information,
>not copy (i.e. perform research).
>...
>but I still can't see the technical difference.

Me neither, but I think it is because you are mixing concepts.
Copyright is on a representation of ideas, not on the ideas.

Einstein could have copyrighted "e=mc^2", but someone else could have
published "energy = mass * lightspeed squared" without infringing copyright.

Ownership of copyright on intellectual property and
ownership of a copy of intellectual property
are distinctly different concepts, just as a license to use a piece of
software and a copy of the piece of software are different things.

Copyright doesn't care if the owner or user of a book is reading it,
taking notes from it, loaning it out to people, selling it to a used
bookstore, using it to prop up a table, or eating it.

Copyright does care if anyone who is not the copyright owner makes a
copy of something that is copyrighted. "copyright" == "right to copy"

>I would assume that the same
>thing would be said of the Internet "library" that the
>Internet Archive hopes to achieve - you could go to
>their "library" to do the same kind of research you
>would in a real library. However, much of what they
>have available "in the stacks" (so to speak) would be
>information that users might have to pay for if they
>had retrieved it directly from the net (the same as
>purchasing a book with information). In a real library,
>when you borrow a book, you get the same
>information value from the book, but you have not
>purchased it. If one were to do this on the Internet,
>many of the content providers would balk, and I
>certainly can understand why, but I still can't see the
>technical difference.

Again, copyright doesn't care about possession or ownership of
copies, payment, information flow, or representation.
You're looking for a difference in areas where copyright does not apply.

>I guess, then, that the problem would be if one wanted
>to create something akin to a library on the Internet,
>how would it be done? In many cases, there is no
>equivalent to purchasing a book. Of course, one
>could argue that the Internet itself -is- the library, and
>there is no need for such an institution
>
>David Levine
>david@interworld.com

Let's imagine a world of hard-copy publishing where -

the availability of any book can vary from minute to minute,

books can be printed on demand,

every printed copy of a book might be different,

new editions might come out anywhere from every few minutes to never,

printed copies are recycled after you read them,
or whenever you turn out the light in your reading room, and

old editions are hardly ever available.

It seems a poor fit to apply the idea of "library" to such a world,
if a library is a "collection of copies of current information."

OTOH, if someone wants to know "how did attitudes about unsolicited
commercial e-mail (UCE) change on the Internet between June 1 1996
and October 1 1996?" and wants to see when that acronym was mentioned,
and which web pages or Usenet posters began using it when, and where -
it would be hard to use the Web, as it is now, to research Web data.

An archive robot that checked every page on the Web once a week,
and made a copy of any changed page could, with enough time and
other resources, answer that question and others like it.
It might be more like a library of newspapers than of books.
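The visit-and-copy-if-changed policy described above can be sketched in a few lines. This is a hypothetical illustration in modern Python (anachronistic for 1996, offered only to make the mechanics concrete); the fingerprinting scheme and the `store` structure are assumptions, not any real archive's design.

```python
# Sketch of the archiving policy described above: visit each known URL
# periodically, and keep a dated copy only when the page has changed
# since the previous visit. All names here are illustrative assumptions.
import hashlib

def page_fingerprint(body: bytes) -> str:
    """Digest of a page's content, used to detect changes between visits."""
    return hashlib.sha1(body).hexdigest()

def archive_if_changed(url: str, body: bytes, last_seen: dict, store: list) -> bool:
    """Store a copy of `body` only if it differs from the fingerprint
    recorded on the previous visit; return True if a copy was archived."""
    fp = page_fingerprint(body)
    if last_seen.get(url) != fp:
        store.append((url, fp, body))  # a real archive would also stamp the date
        last_seen[url] = fp
        return True  # changed: a new copy was archived
    return False     # unchanged: nothing stored
```

Run weekly over every known URL, this accumulates exactly the edition history a researcher would need to answer a "when did usage change?" question.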

Whether such a search would be cost-effective for any purpose,
and whether visiting Web pages that frequently would be worth the
load on servers, are open questions. (Alta-Vista visits every two
months, I believe.)

Whether making copies of Web pages is okay is a different question.

Certainly, the way HTTP works is to "copy" a page (more or less) and to
present it for viewing, but this would seem to be a copy implicitly
licensed by the Web page creator - how else could someone view a page if
the server doesn't send them a copy of the page?
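To make the point concrete that viewing a page *is* receiving a copy, here is a minimal HTTP/1.0 exchange, sketched in modern Python (an anachronism for 1996; the host name and helper names are assumptions for illustration). The server writes the page's bytes onto the connection, and the client then holds its own copy.

```python
# Minimal HTTP/1.0 client sketch: the response the server sends IS a copy
# of the page, delivered into the viewer's memory.
import socket

def build_request(host: str, path: str = "/") -> bytes:
    """The literal bytes a browser sends to ask for a page."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode()

def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Return the server's response: headers plus a copy of the page's bytes."""
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)  # the viewer now holds a local copy of the page
```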

Temporarily caching that copy seems a reasonable thing in a hypertext
environment, but making a persistent copy in a disk cache, or using a
tool like Web-Whacker to copy many of the pages on a site, seems to
stretch that implicit license to its limits, or beyond them.

Does "whacking" a site reach beyond fair use? Is there a difference
between whacking a few hundred pages overnight so you can speed-browse
them the next day, erasing the copies afterward, and whacking the top
100 pages you like to use in demos onto your laptop so you can
demonstrate a new Web browser without using a network connection?

Some popular Web sites are "mirrored" by mutual agreement, paralleling the
pattern of popular FTP sites. Does the Internet Archive amount to a bootleg
mirror of many sites? Or is it an extension of caching and whacking?

More to the point, is it bound by copyright? And, more on topic for a
robots list, is there a way for a robot which is trying to respect
copyright to see if it has permission to copy a Web page?

Do we need something like:

<META NAME="license" CONTENT="{all,none,copy,index,...}">

or will the Web use something unrelated to the paper copyright model?
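A copyright-respecting robot could read such a tag before archiving. The sketch below (modern Python, purely illustrative) assumes the "license" META name and its CONTENT vocabulary as proposed above; neither is an existing standard, and the `may_copy` policy is one hypothetical interpretation.

```python
# Sketch: parse the proposed <META NAME="license" ...> tag and decide
# whether a robot has permission to copy the page. The tag's name and
# value vocabulary are the proposal above, not an established standard.
from html.parser import HTMLParser

class LicenseMetaParser(HTMLParser):
    """Collects the grants listed in a license META tag, if one is present."""
    def __init__(self):
        super().__init__()
        self.grants = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "license":
            content = a.get("content", "").strip("{}")
            self.grants.update(g.strip() for g in content.split(",") if g.strip())

def may_copy(html: str) -> bool:
    """True only if the page explicitly grants 'all' or 'copy';
    absent or empty tags are treated conservatively as 'no'."""
    p = LicenseMetaParser()
    p.feed(html)
    return bool({"all", "copy"} & p.grants)
```

Defaulting to "no permission" when the tag is absent is itself a policy choice; one could as easily argue that serving a page at all grants the implicit license discussed earlier.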

Perhaps we will see a music model, like BMI and ASCAP use for
copyrighted music played by radio stations (and even in elevators).

-- 
Denis McKeon 
dmckeon@swcp.com