robots.txt (A *little* off the subject)

Thaddeus O. Cooper (tcooper@mitre.org)
Fri, 22 Nov 1996 09:42:09 -0500


I have been thinking about the Robot Exclusion Standard that this group
has been proposing. There are a number of common threads that have been
running through this list:

1. Exactly what is a robot? (varying opinions)
2. What is the standard trying to accomplish? (varying opinions)
3. What features should it have? (varying opinions)
4. Who should have to use robots.txt? (varying opinions)
5. How do we make people use robots.txt?

It's the last one that made me start thinking about what we are trying
to accomplish here. The only way I have ever found to get people to do
or use anything is to give them a perceived benefit from its use. We all
use the Web and Web browsers because they are easy to use and we can get
information quickly. I think that one of the reasons the robots.txt
issue is becoming sticky is that the only people it seems to benefit are
the people who administer Web servers. If you want the page
watchers/agents/robot/etc. community to embrace the use of this file,
then there must be a perceived benefit to them (since they are the
people who are going to use it); otherwise they will just ignore it.

It turns out that there are some extremely useful things we could do for
page watchers/agents/robot/etc. writers that would reduce traffic at
sites and make those writers want to use the *concept* of this file. Two
things that almost all of these tools do are:

1. Capture a list of URLs at a given site.
2. Capture the titles of the URLs at a given site.

Both of these activities are time-consuming and resource-intensive. Now
imagine if there were a simple, well-defined standard that allowed us to
go and get this information. The file could include other useful
information, such as the major category of the site (search engine, ISP,
Moved-To, etc.). These are things that would encourage page
watchers/agents/robot/etc. writers to want to use this file. In
addition, there could be information about how often the file is
updated, the last time it was updated, and whether or not you would like
to see your site added to the ever-growing indices that are available on
the Internet.
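
To make the idea a little more concrete, here is a rough sketch of what
such a file and a reader for it might look like. Nothing here is part of
any proposal on this list: the field names (Category, Update-Frequency,
Last-Updated, Index, Page), the line layout, and the parse_site_summary
function are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SiteSummary:
    """Hypothetical per-site summary; every field name is an assumption."""
    category: str = ""
    update_frequency: str = ""
    last_updated: str = ""
    allow_indexing: bool = False
    pages: dict = field(default_factory=dict)  # maps URL -> page title


def parse_site_summary(text: str) -> SiteSummary:
    """Parse 'Field: value' lines; a 'Page:' value is 'URL title'."""
    summary = SiteSummary()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that have no 'Field:' prefix
        key, value = key.strip().lower(), value.strip()
        if key == "category":
            summary.category = value
        elif key == "update-frequency":
            summary.update_frequency = value
        elif key == "last-updated":
            summary.last_updated = value
        elif key == "index":
            summary.allow_indexing = value.lower() == "yes"
        elif key == "page":
            # The URL runs up to the first space; the rest is the title.
            url, _, title = value.partition(" ")
            summary.pages[url] = title.strip()
    return summary


# A made-up example file, using the assumed layout described above.
EXAMPLE = """\
# hypothetical site summary file
Category: ISP
Update-Frequency: weekly
Last-Updated: 1996-11-22
Index: yes
Page: /index.html Welcome to Example
Page: /products/list.html Product List
"""

summary = parse_site_summary(EXAMPLE)
```

With something like this, a robot could learn the URLs and titles at a
site from one request instead of fetching every page, which is the
benefit to robot writers that I am talking about.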

Anyhow, this is just my two cents about this whole thing.

--Thaddeus O. Cooper (speaking only for myself)
Senior Staff
The MITRE Corporation
(tcooper@mitre.org)
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html