Re: agents ignoring robots.txt

Erik Selberg (selberg@cs.washington.edu)
17 Oct 1996 13:21:41 -0700


> I ask this because it isn't that inconceivable to see a plug-in (or a
> separate program) being written that does what Cyber411 does (maybe without
> the ads). At what point does an agent NEED to follow the robots.txt
> convention? Since I think current versions of Lynx allow the following:
>
> lynx -traverse http://www.cyber411.com/

Point 1: while tools like MetaCrawler and Cyber411 make it easier to
pull down a lot of pages, they aren't abusive in their own right.

Point 2: if someone uses them abusively (firing off lots of repeated
queries, etc.), that person could be just as obnoxious querying the
sites directly.
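
To make point 2 concrete, here's a minimal sketch (in modern Python,
written long after this thread) of the kind of spacing a well-behaved
client could put between its queries. The URLs and the delay value are
placeholders for illustration, not anything Cyber411 actually uses.

    import time
    import urllib.request

    # Hypothetical list of back-end query URLs a meta-searcher might hit;
    # these hosts are placeholders, not real search engines.
    QUERY_URLS = [
        "http://search.example.com/find?q=robots",
        "http://index.example.org/query?q=robots",
    ]

    POLITE_DELAY = 2.0  # seconds between requests, to avoid hammering any one site

    def fetch_politely(urls, delay=POLITE_DELAY):
        results = []
        for url in urls:
            # Fetch one result page, then pause before the next request
            # so the aggregate load on each server stays modest.
            with urllib.request.urlopen(url) as resp:
                results.append(resp.read())
            time.sleep(delay)
        return results

Spaced out like this, the tool generates no more load than a patient
human with a browser would; it's the user who removes the spacing who
becomes the problem.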

Because the Web is largely anonymous, there's only so much control an
admin can have over who uses the system; User-Agents can be spoofed
quite easily. So we need some way of ensuring that robots can be
guided appropriately. When people abuse things, e-mail and discussion
should be the answer. Otherwise we head down the path of trying to
authenticate every user and so on... rat hole!
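
For what "guided appropriately" looks like in practice, here is a
minimal sketch (again in modern Python) of a robot consulting a site's
/robots.txt before fetching anything. The user-agent name and URLs are
made up for illustration.

    from urllib.robotparser import RobotFileParser

    USER_AGENT = "ExampleBot"  # hypothetical robot name
    START_URL = "http://www.example.com/some/page.html"  # placeholder target

    # Fetch and parse the site's /robots.txt before crawling anything.
    rp = RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()

    if rp.can_fetch(USER_AGENT, START_URL):
        print("allowed to fetch", START_URL)
    else:
        print("robots.txt asks us to skip", START_URL)

The convention is purely voluntary, which is exactly the point above:
it works because robot authors cooperate, not because the server can
enforce it.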

-Erik

-- 
				Erik Selberg
"I get by with a little help	selberg@cs.washington.edu
 from my friends."		http://www.cs.washington.edu/homes/selberg