Re: nastygram from xxx.lanl.gov

Paul Francis (francis@cactus.slab.ntt.jp)
Tue, 9 Jul 96 17:22:03 JST


>
> Well, would you then want that program to obey the exclusion standard? You
> wouldn't want it zapping servers with thousands of requests in a few seconds,
> but isn't one HEAD every couple minutes slow enough? Does such an application
> really necessitate any 'exclusion' at all?
>

I'm not sure it is a good example, because, off the
top of my head, I don't think I'd try to find out if
everything in the cache was up-to-date at a given time.

But, anyway, let's assume the following:

1. You are doing a legitimate experiment.
2. It issues no more than one HEAD every
couple minutes, and no more than 400
HEADs for any given server (a pacing
sketch follows below).
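
For concreteness, a minimal sketch of that pacing in
Python. The constants just restate the two assumptions
above; head_url, heads_sent, and probe are made-up
names for illustration, not anybody's actual tool:

    import time
    import urllib.request
    from urllib.parse import urlparse
    from collections import defaultdict

    MIN_INTERVAL_SECS = 120        # "one HEAD every couple minutes"
    MAX_HEADS_PER_SERVER = 400     # "no more than 400 HEADs" per server

    heads_sent = defaultdict(int)  # running count, keyed by server

    def head_url(url, timeout=10):
        """Issue one HEAD request unless the per-server cap is hit."""
        server = urlparse(url).netloc
        if heads_sent[server] >= MAX_HEADS_PER_SERVER:
            return None  # cap reached; leave this server alone
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            heads_sent[server] += 1
            return resp.status

    def probe(urls):
        """Walk the URL list, pausing between requests."""
        for url in urls:
            status = head_url(url)
            if status is not None:
                print(url, status)
            time.sleep(MIN_INTERVAL_SECS)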

Strictly speaking, I think you are ethically obligated,
within the limits of practicality (and detectability),
to get permission to do even this to somebody's site.
I would not consider somebody who sent a nastygram
to be out of line (though I might personally consider
him to be rather uptight and wouldn't go out of my
way to have a beer with him).

Practically speaking, to save myself time and trouble,
I would probably consider a robots.txt file that
allowed access to be tacit permission to run the
experiment *even though the administrator of that
site might in fact not want me to run the experiment
on their site* (maybe they put the robots.txt there
because they want to be indexed, but don't want to
be pinged otherwise). (By no means do I always do
what I, strictly speaking, consider myself ethically
obligated to do. To compensate for that, I use guilt.)

By the same token, I would probably consider a robots.txt
file that disallowed access to be tacit denial of
permission to run the experiment (even though the
administrator of that system might be perfectly happy
to assist me in my experiment, but just happens to
hate indexing robots).

I guess this is a much-too-long-winded way of saying
that I personally think that people who build
automatic URL-pinging boxes of pretty much any kind
should take a broad view of the definition of robot
and honor the robots.txt file (a minimal check is
sketched below).

I'll shut up now.

PF