Re: nastygram from xxx.lanl.gov

Roy T. Fielding (fielding@liege.ICS.UCI.EDU)
Tue, 09 Jul 1996 18:13:05 -0700


>> He accused my "robot" of violating the "robot guidelines". He didn't
>> enumerate which ones I violated. I'm guessing he may have been upset
>> that the test was ignoring his robots.txt, but since the test wasn't
>> traversing the
>
> Reading robots.txt is a good idea, but by the sound of your "robot",
> probably not very practical. Checking the validity of document URLs is
> hardly the function of a true robot.

Brilliant deduction, aside from the fact that MOMspider does exactly that:
it was one of the first robots to implement the robot guidelines, and the
first to distribute code so that others could do the same. /robots.txt
must be read because it can steer the robot (any robot) away from URLs
that have problematic side effects, as many old CGI scripts do.
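For what it's worth, the check is simple to implement. Here is a minimal
sketch using modern Python's urllib.robotparser (MOMspider itself is
Perl; the host, path, and agent name below are made up for illustration):

    from urllib import robotparser

    # Fetch and parse the site's /robots.txt before requesting anything
    # else from it.  The host and user-agent are illustrative only.
    rp = robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")
    rp.read()  # retrieves and parses the file

    url = "http://example.com/cgi-bin/old-script"
    if rp.can_fetch("MyLinkChecker", url):
        print("allowed -- safe to test this URL")
    else:
        print("disallowed -- the site asked robots to stay away; skip it")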

The information can be cached (for a reasonable period of time) to reduce
load, but it cannot be safely ignored. Running a program that ignores
/robots.txt is equivalent to running an unsafe program, and you are
responsible for ANY detrimental effects of that program, including wasted
bandwidth and the time and personnel costs of the webmasters who track
you down.
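A sketch of that caching, under the same assumptions as above (the
24-hour lifetime and the agent name are arbitrary choices, not anything
the guidelines mandate):

    import time
    from urllib import robotparser

    CACHE_TTL = 24 * 60 * 60   # re-read each /robots.txt once a day
    _cache = {}                # host -> (parser, time last fetched)

    def rules_for(host):
        """Return a parser for the host's /robots.txt, re-fetching it
        only when the cached copy is older than CACHE_TTL seconds."""
        entry = _cache.get(host)
        if entry is None or time.time() - entry[1] > CACHE_TTL:
            rp = robotparser.RobotFileParser()
            rp.set_url("http://%s/robots.txt" % host)
            rp.read()
            _cache[host] = (rp, time.time())
        return _cache[host][0]

    def allowed(host, path, agent="MyLinkChecker"):
        return rules_for(host).can_fetch(agent, "http://%s%s" % (host, path))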

Think about that before running your program on someone else's site
without permission.

...Roy T. Fielding
Department of Information & Computer Science (fielding@ics.uci.edu)
University of California, Irvine, CA 92697-3425 fax:+1(714)824-4056
http://www.ics.uci.edu/~fielding/