Re: robots.txt (A *little* off the subject)

Erik Selberg (selberg@cs.washington.edu)
22 Nov 1996 15:24:10 -0800


"Thaddeus O. Cooper" <tcooper@mitre.org> writes:

> 5. How do we make people use robots.txt?
>
> It's the last one that made me start thinking about what is trying to be
> accomplished here. The only way, at least that I have ever found, to get
> people to do or use anything is to have a perceived benefit from its
> use. We all use the Web and Web Browsers because they are easy to use,

My own take is that 5 is two-fold:

5a. How do we make sysadmins AND USERS use / admin robots.txt?
5b. How do we enforce that robots / whatever follow robots.txt?

To solve 5a, we need to make a system that's easy for most sysadmins
to administer, and hopefully make it easy for individual users to
use. I personally think robots.txt is useless for a lot of systems,
because the sysadmin doesn't control the content. Take our department
for example. I may have some stuff I don't want robots to come look
at, but it's a hassle to try and get that into a global robots.txt
file. And ad hoc fixes along the lines of "sysadmin runs a find script
which incorporates things" strike me as somewhat simplistic.
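
For concreteness, the kind of merge script meant here might look roughly
like the sketch below. It assumes each user lists Disallow lines relative
to their own /~user/ space; that convention, the function names, and the
single "User-agent: *" record are all hypothetical, not part of the
robots.txt standard. Inline comments and per-robot records are ignored.

```python
def prefix_rules(user, lines):
    """Rewrite one user's Disallow paths so they sit under /~user/."""
    out = []
    for line in lines:
        field, sep, value = line.partition(":")
        path = value.strip()
        # Keep only non-empty Disallow lines; skip comments and anything else.
        if sep and field.strip().lower() == "disallow" and path:
            out.append(f"Disallow: /~{user}/{path.lstrip('/')}")
    return out


def build_global_robots(site_rules, per_user):
    """Combine the sysadmin's rules with every user's rules into one file."""
    lines = ["User-agent: *"] + list(site_rules)
    for user in sorted(per_user):
        lines.extend(prefix_rules(user, per_user[user]))
    return "\n".join(lines) + "\n"
```

Even in this form the sysadmin still has to run it on a schedule and
decide whose rules win, which is the hassle described above.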

Next is getting folks to follow it. Providing what robots want is one
way to get them off your back. However, there may be other robots
which want something you can't easily provide. For example, a page
watcher may want notification. But do you notify 5 million people if
say your ESPN sports page changes? Or your stock quote changes? Then
there are others whom you may not want to deal with --- Rob mentioned
an e-mail gatherer to create junk mailing lists. How do you get those
off your back?

I suspect a real solution (DANGER! Erik's going back to his crypto + kernel
hacking days!) would be to have a client request a capability from the
server. The server would then send a particular capability back (say
"this capability good for downloading N pages until midnight" or
something). That way, ANYONE who wants to talk with that server must
support that standard, and the sysadmin could adjust how capabilities
are granted. Assuming lots of servers use this kind of security, you
could then force robots to use this as well, since otherwise they
couldn't do anything.
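
A minimal sketch of how such a capability could work, assuming the server
signs a small token with a secret key and counts pages served against it.
The token layout, the class and method names, and the choice of HMAC are
all illustrative assumptions, not an existing standard.

```python
import hashlib
import hmac
import time


class CapabilityServer:
    """Issues and checks 'N pages until time T' capabilities (hypothetical)."""

    def __init__(self, secret_key):
        self.secret_key = secret_key  # known only to the server
        self.pages_used = {}          # token -> pages served so far

    def _sign(self, payload):
        return hmac.new(self.secret_key, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, client_id, max_pages, expires_at):
        """Return a token good for max_pages downloads until expires_at."""
        payload = f"{client_id}|{max_pages}|{int(expires_at)}"
        return f"{payload}|{self._sign(payload)}"

    def authorize(self, token, now=None):
        """Return True and count one page if the token is still valid."""
        now = time.time() if now is None else now
        try:
            payload, signature = token.rsplit("|", 1)
            _client_id, max_pages, expires_at = payload.rsplit("|", 2)
            max_pages, expires_at = int(max_pages), int(expires_at)
        except ValueError:
            return False  # malformed token
        expected = self._sign(payload)
        if not hmac.compare_digest(signature.encode(), expected.encode()):
            return False  # forged or altered token
        if now >= expires_at:
            return False  # past the deadline
        used = self.pages_used.get(token, 0)
        if used >= max_pages:
            return False  # page quota exhausted
        self.pages_used[token] = used + 1
        return True
```

The sysadmin's policy lives entirely in how issue() is called: a known
indexer might get thousands of pages, an unknown client a handful, and an
e-mail gatherer nothing at all.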

Hmm.... maybe I'll suggest this as a potential quals project to
someone over here.

-Erik

-- 
				Erik Selberg
"I get by with a little help	selberg@cs.washington.edu
 from my friends."		http://www.cs.washington.edu/homes/selberg
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html