RE: nastygram from xxx.lanl.gov

Frank Wales (frank@limitless.co.uk)
Thu, 11 Jul 1996 20:44:08 +0100


[One more message from me to put a few things into
context, then I'm bailing out of these sub-threads publicly
before the subject drift becomes too severe. I'm happy to
continue to debate things off the list.]

Steve Simon:

>Furthermore, I think that it
>is way out of line for a Web administrator to presume that
>just because a robots.txt file exists on a site all robots MUST
>follow it.

Me:

>Except that such an approach countenances abuse, and
>it-doesn't-apply-to-me-ism.

Steve:
>True. But this is implicitly the case anyway.

>[...] It is my opinion that it is unwise and even absurd to
>assume that every programmer in the world that will write
>a robot can be expected to be aware of REP.

Of course, but as time goes by, and REP evolves into something
that opinion-leading developers and administrators agree on
(and the admins do have a positive role to play), site administrators
might expect that the number of ignorant robots will decline,
especially as most robots will probably be the product of kits and
standard libraries. Such kits would need to incorporate the REP
rules, though, and I believe this is more likely if they make it
into a recognised RFC or some such accepted document.
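To illustrate what "incorporating the REP rules" into a kit might look like, here is a minimal sketch that centralises the robots.txt check in one routine, using Python's urllib.robotparser module. The agent name, host, and paths are invented, and the rules are parsed from a string rather than fetched over the network, to keep the example self-contained:

```python
# Sketch of a crawling kit that bakes in REP support: parse the site's
# published robots.txt rules once, then consult them before every fetch.
from urllib.robotparser import RobotFileParser

# Hypothetical rules a site might publish at /robots.txt.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def may_fetch(agent, url):
    """Return True if the site's published rules permit this request."""
    return parser.can_fetch(agent, url)

print(may_fetch("ExampleBot", "http://example.org/abs/9607001"))  # -> True
print(may_fetch("ExampleBot", "http://example.org/private/x"))    # -> False
```

A kit built this way never has to trust individual robot authors to remember the check; every request funnels through it.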

Me:
>Better, to me, to have
>a robots.txt standard that allows for all reasonable cases,
>with well-defined exceptions.

Steve:
>Your faith in the power of foreseeing every reasonable case and
>exception by a small number of people, no matter how wise,
>who are responsible for REP is touching, but perhaps not entirely
>justified.

I think that it is: the internet is built on a set of evolving standards
that deal with the majority of cases well enough to give a stable,
workable system that is still flexible enough to adapt quickly to
change. I see no reason why creating a taxonomy of robots
should be any different, nor a system of protocols for permitting
them to access the resources of the net in an organised and
generally-acceptable manner. If you have reasons for your
apparent disdain, I'd be interested in hearing them (probably
off the list).

Me:
>Then the gulf between the
>expectations of the administrators and the aims of the
>robot-writers isn't filled with bad feeling and abuse.

Steve:

>It is not the administrators that proposed REP -- it is a small
>group of intelligent and far-seeing robot writers that realized
>that it would be a good thing that would make everybody's life easier.
>It is unfortunate that some administrators then took this to
>imply that if they put a robots.txt file on their site, that act
>alone allows them to expect that ALL robots will follow it.
>As I argued above, I think this expectation is wrong and absurd.

Phrased like that, of course it's silly. But there's the obvious
corollary to your statement, given your earlier comment that we
can never reasonably expect every robot to be written by a
knowledgeable programmer, that administrators will never be
able to assume that robots.txt files ought to be obeyed. If so,
will they ever be entitled to be annoyed by automated
processing? I would expect that, in a year or so, they
should be.

Another corollary would be that putting up a robots.txt
file is *all* they need to do; certainly, at the moment, that
assumption would be naive. Although I think it is strictly irrelevant
to the discussion about whether or when the REP should
be followed, I do find it interesting to note the volume of
postings on whether the xxx.lanl.gov site is reasonably
constructed for use as a public web site, given the kinds of
requests that are apparently prevalent on the web today.

(Although I should also state for the record here that I think
automated or malicious attacks on remote robot-serving
sites are the wrong response, as Steve has indicated.)

Steve:
>>[...] But no Web administrator
>>should expect that it (the REP) will be followed in all cases.

Me:
>So it's sometimes good practice to ignore the request, then?

Steve:
>Yes, I believe that it may be.

I think this definitely needs enlarging upon. Can you
suggest where I can find some examples?

Steve:
>First of all there is the legitimate issue to define just what is a
>robot -- a sub-thread already started by this discussion.

As I hinted at above, I think a taxonomy of automated net-
crawling thingies is essential to provide a basis for further
work. Much of the friction seems to be based on differing
interpretations of what a 'robot' is, some almost stupidly
childish or petulant.
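As a starting point, the existing robots.txt syntax can already express a crude taxonomy, since it lets an administrator state different rules for different classes of robot by User-agent group. The agent names below are invented, purely for illustration:

```
# Hypothetical robots.txt distinguishing classes of robot.

# Default: all robots stay out of executable areas.
User-agent: *
Disallow: /cgi-bin/

# A whole-site mirroring robot: not welcome at all.
User-agent: BulkMirrorBot
Disallow: /

# A search-engine indexer: welcome everywhere but private areas.
User-agent: IndexBot
Disallow: /private/
```

An agreed vocabulary of robot classes would make such per-class rules meaningful beyond a single site's guesswork about agent names.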

Steve:
>>The Constitution guarantees you the right to create programs
>>that will do useful things for you and/or the Web community
>>at large. If your program violates no laws, and in YOUR opinion
>>does nothing improper or unethical, go ahead, create it and run it,
>>no matter what anybody else thinks.

Me:
>I don't believe that
>Ben Franklin et al were countenancing automated disregard
>for the published opinions of one's peers when they wrote
>the Bill of Rights, nor do I think this is an issue that the
>ACLU would spend more than ten seconds thinking about.

Steve:
>I believe that you are wrong on both counts. While you are obviously
>right that Ben Franklin et al were not specifically thinking of
>the current issue, what they were thinking about applies to it
>nonetheless.

Let me be clear about what I take to be the current issue:
writing software that knowingly overrides or ignores well-defined
requests to limit that software's behaviour, where such requests are
made by the people responsible for the computing resources
consumed by the software's non-compliance. Hence my
phrase: "automated disregard for the published opinions of one's
peers", in which I intended a site's robots.txt file to represent
the administrators' published opinions about how programs should
behave when encountering their site, and the 'automated disregard'
to refer to the execution of the software that knowingly ignores it.

Steve:
>There is already ample jurisprudence in the United States that the
>First Amendment DOES apply to creating programs. So the administrator
>of xxx.lanl.gov, an American Governmental site, CANNOT revoke that
>right or restrict it to the creation of only certain kinds of
>programs, by creating a robots.txt file on his site.

You seem to be arguing that a robots.txt file should inhibit the
creation of software, which is certainly not what I am debating,
nor something I believe, even if it were possible. Of course the
Bill of Rights protects the creation of software, but it
definitely does not extend to protecting all uses of such software,
for example where such uses tangibly affect the freedoms or
property of others.

Steve:
>Furthermore, the REP does not have the force of law.
>Therefore, ignoring it is not a suable offense, nor does it give
>anyone any particular rights to harass offenders that ignore the REP
>by retaliating with mail bombs etc., as the administrator of xxx.lanl.gov,
>apparently thinks he is entitled to do.

They may conceivably be able to argue in court that the REP,
plus their robots.txt file, constituted certain warnings that the
real costs involved would be reclaimed from sites that hammered
their systems, but that could only be part of a case argued
mostly between peers such as other administrators and
knowledgeable net-types, and would probably be severely
undermined by their apparent vigilantism. Of course, throwing
stones is easier than convincing a jury, at least in the short term.
With luck, and a less fuzzy, better understood REP, that will change
for the better.

--
Frank Wales [frank@limitless.co.uk]