RE: nastygram from xxx.lanl.gov

Istvan (simon@mcs.mcs.csuhayward.edu)
Wed, 10 Jul 1996 10:40:06 +0800


In reply to

>>
>>Furthermore, I think that it
>>is way out of line for a Web administrator to presume that
>>just because a robots.txt file exists on a site all robots MUST
>>follow it.
>

that I wrote, Frank Wales said:

>Except that such an approach countenances abuse, and
>it-doesn't-apply-to-me-ism.

True. But this is implicitly the case anyway.

If the robot's author KNOWS about the Robot Exclusion
Protocol (REP), compliance is by its very nature voluntary:
it is the programmer who must include (or not include) the code
that honors the protocol. So there is no substitute for
good judgement in this case.

On the other hand, if the author is not aware of REP, then clearly
no such code will be in the robot. In my opinion it is
unwise, even absurd, to expect that every programmer in the world
who writes a robot will be aware of REP.
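To make concrete what the compliance code mentioned above amounts to,
here is a minimal sketch in modern Python of a robot consulting
robots.txt before fetching a page. The site URL and user-agent name
are purely illustrative, and the standard urllib.robotparser module
stands in for whatever parsing code a robot writer might (or might not)
choose to supply:

  import urllib.robotparser

  # A well-behaved robot voluntarily consults robots.txt before fetching.
  # The site and user-agent below are made up for illustration only.
  rp = urllib.robotparser.RobotFileParser()
  rp.set_url("http://example.org/robots.txt")
  rp.read()

  page = "http://example.org/some/page.html"
  if rp.can_fetch("ExampleBot/1.0", page):
      print("robots.txt permits fetching", page)
  else:
      print("robots.txt asks robots to skip", page, "-- a polite robot complies")

Nothing in the Web's machinery forces a robot to run such a check;
it is entirely up to the programmer to include it.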

>Better, to me, to have
>a robots.txt standard that allows for all reasonable cases,
>with well-defined exceptions.

Your faith that the small number of people responsible for REP,
no matter how wise, can foresee every reasonable case and
exception is touching, but perhaps not entirely justified.

>Then the gulf between the
>expectations of the administrators and the aims of the
>robot-writers isn't filled with bad feeling and abuse.

It was not the administrators who proposed REP -- it was a small
group of intelligent and far-seeing robot writers who realized
that it would be a good thing, one that would make everybody's life easier.
It is unfortunate that some administrators then took this to
imply that putting a robots.txt file on their site, by that act
alone, entitles them to expect that ALL robots will follow it.
As I argued above, I think this expectation is wrong and absurd.

> Just because
>some robot writers are ignorant or bloody-minded doesn't mean
>that we should accept that's how things should be.
>

You need not accept it. You could try to educate non-complying robot writers
by writing them a NICE message, in which

a) you call their attention to REP;

b) you explain to them PATIENTLY why you think that every robot
should follow it; and

c) you make your case much more persuasive by also explaining
what unfortunate consequences, if any, their non-complying robot
caused on your site.

I believe that for most reasonable people this approach will work
much better than the nastygram of threats and abuse
that the person administering xxx.lanl.gov sent.

>> [ ...] But no Web administrator
>>should expect that it (the REP) will be followed in all cases.
>

>So it's sometimes good practice to ignore the request, then?

Yes, I believe that it may be. First of all, there is the legitimate
issue of defining just what a robot is -- a sub-thread already started
by this discussion. I do not wish to address that issue right now,
except to say that I don't think it has an easy answer.
(As an illustration of the difficulty, I'd point out that by some of the
definitions already proposed in that sub-thread, EVERY request
to a Web site would qualify as that of a robot, and therefore by those
definitions Netscape Navigator ought to follow REP.)

>
>>But I for one will adamantly oppose
>>the interpretation that any of this MUST be followed by all robots under
>>any circumstances. I believe that such an interpretation was never
>>intended for it, and in my opinion would be an intolerable intrusion
>>and restriction on other people's freedoms and behavior.
>[...]
>>The Constitution guarantees you the right to create programs
>>that will do useful things for you and/or the Web community
>>at large. If your program violates no laws, and in YOUR opinion
>>does nothing improper or unethical, go ahead, create it and run it,
>>no matter what anybody else thinks.

>
>Now, I have to say that I think these two paragraphs
>are unmitigated guff of the highest order. I don't believe that
>Ben Franklin et al were countenancing automated disregard
>for the published opinions of one's peers when they wrote
>the Bill of Rights, nor do I think this is an issue that the
>ACLU would spend more than ten seconds thinking about.
>

I believe that you are wrong on both counts. While you are obviously
right that Ben Franklin et al. were not thinking specifically of
the current issue, what they were thinking about applies to it
nonetheless.

There is already ample jurisprudence in the United States that the
First Amendment DOES apply to creating programs. So the administrator
of xxx.lanl.gov, an American government site, CANNOT revoke that
right, or restrict it to the creation of only certain kinds of
programs, simply by creating a robots.txt file on his site.

Furthermore, the REP does not have the force of law.
Therefore, ignoring it is not an actionable offense, nor does it give
anyone any particular right to harass robot writers who ignore the REP
by retaliating with mail bombs etc., as the administrator of xxx.lanl.gov
apparently thinks he is entitled to do.

(BTW, I am no lawyer, but deliberate harassment IS against
the law in the United States, so HE could easily lose his job and face
other very unpleasant legal consequences, should he follow through on
those threats.)

>Whether or not I agree that someone can put up a web site
>that only accepts human-initiated requests, some people seem
>to be doing it, and seem to believe
>that the robots.txt file gives
>them the ability to exclude automated processing of their site.
>If you want to disabuse them of that notion, then do it through
>education or persuasion, not by continuing to automatically
>process their site.
>

I agree with you, and that is what I am doing by writing these
messages. Furthermore, I have never automatically processed their
site, nor, to my knowledge, has anyone else who has participated in this
discussion so far.

--Steve Simon
Professor of Mathematics and Computer Science
California State University, Hayward