Re: nastygram from xxx.lanl.gov

Istvan (simon@mcs.mcs.csuhayward.edu)
Tue, 9 Jul 1996 08:56:53 +0800


Aaron Nabil, Mon, 08 Jul 1996 20:09:25 -0700 says:

>
>I just got a nastygram from the web admin at xxx.lanl.gov accusing
>my "robot" of "attacking" him. This "attack" consisted of HEAD's on
>459 URL's, with a mean pause of about 2 minutes.

It seems to me that your robot did not act improperly, and I strongly
feel that, if your description of the behavior of your robot
is accurate, it is the Web administrator's behavior that is objectionable.

Roughly 500 accesses at 2-minute intervals cannot be construed as
excessive. Such an access pattern could easily have been generated by
a human being, and therefore, in my opinion, cannot be objected to.

The only thing that I can see that could be objectionable about such
a pattern of accesses is if the pattern were to be repeated too
frequently.

>The total data set
>was (all sites) 653k URLs, and yes I probably should have filtered the test
>set to limit the number of accesses to any one site. Mea culpa.
>

You DID limit the number of accesses to any one site by putting in
a 2 minute delay between accesses. This limits it to at most
720 accesses in a day. You may want to limit it further, either with
some overall maximum per site or simply by running the robot for
a limited time.
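
To make the arithmetic and the per-site maximum concrete, here is a
minimal sketch in Python. The function names and the example URLs are
my own inventions for illustration, not anything from Aaron's robot,
and it assumes the URL list is simply filtered before the run starts.

```python
from urllib.parse import urlsplit

SECONDS_PER_DAY = 24 * 60 * 60


def max_accesses_per_day(delay_seconds):
    """Upper bound on requests to one site, given a fixed pause between them."""
    return SECONDS_PER_DAY // delay_seconds


def cap_per_site(urls, max_per_site):
    """Keep at most max_per_site URLs for each host, preserving input order.

    Hypothetical helper: the host is taken from the URL's network location.
    """
    counts = {}
    kept = []
    for url in urls:
        host = urlsplit(url).netloc.lower()
        if counts.get(host, 0) < max_per_site:
            counts[host] = counts.get(host, 0) + 1
            kept.append(url)
    return kept


if __name__ == "__main__":
    # A 2 minute (120 second) pause allows at most this many accesses a day.
    print(max_accesses_per_day(120))
    # Made-up URLs: with a cap of 2, the third a.example entry is dropped.
    sample = [
        "http://a.example/1",
        "http://a.example/2",
        "http://a.example/3",
        "http://b.example/1",
    ]
    print(cap_per_site(sample, 2))
```

Filtering up front like this is only one option; a running counter
inside the robot's fetch loop would do the same job.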

If this was a test set, then I might not have run it on 653K URLs,
but on a much smaller set. (But you may have had good reason to
use the large set that you used.)

>He accused my "robot" of violating the "robot guidelines". He didn't
>enumerate which I violated. I'm guessing he may have been upset that the
>test was ignoring his robot.txt, ...

I think that a Web site is a public place by definition,
and should expect to be visited by both humans and robots.
Furthermore, I think that it is way out of line for a Web
administrator to presume that just because a robots.txt file
exists on a site, all robots MUST follow it.

One can reasonably assume that if a robots.txt file is present,
it is a strong suggestion by the Web administrator for the desired
behavior from robots. Undoubtedly it is wise, and almost always
good practice, to respect such a request. But no Web administrator
should expect that it will be followed in all cases.
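
For a robot author who does choose to honor the file, the check itself
is small. The following is a sketch only, using Python's standard
urllib.robotparser module; the function name, the "testbot" user agent,
and the sample rules are assumptions made up for illustration, and the
robots.txt text is passed in directly rather than fetched from a site.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper: the caller supplies the file's contents as a string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up rules: every robot is asked to stay out of /private/.
    rules = "User-agent: *\nDisallow: /private/\n"
    for path in ("/private/report.html", "/index.html"):
        url = "http://www.example.com" + path
        print(url, allowed_by_robots(rules, "testbot", url))
```

Whether the robot then skips a disallowed URL is, as I say above, the
author's decision; the code only reports what the administrator asked for.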

I think that the Robots Exclusion Protocol is
a very valuable contribution that MOST robots should follow, just as
most robots should follow the other sensible guidelines that have been
published for robot writers. But I for one will adamantly oppose
the interpretation that any of this MUST be followed by all robots under
any circumstances. I believe that such an interpretation was never
intended for it, and in my opinion would be an intolerable intrusion
and restriction on other people's freedoms and behavior.

>Comments? Should such accesses as mine also test robots.txt?

YOU should decide (perhaps after due consideration of the variety of
opinions expressed by this group and others that you may wish to consult).

The Constitution guarantees you the right to create programs
that will do useful things for you and/or the Web community
at large. If your program violates no laws, and in YOUR opinion
does nothing improper or unethical, go ahead, create it and run it,
no matter what anybody else thinks.

--Steve Simon