Re: Suggestion to help robots and sites coexist a little better

Nick Arnett (narnett@verity.com)
Wed, 17 Jul 1996 09:25:38 -0700


>Nick Arnett wrote:
>>
>>This is not a practical approach. First, it's a serious uphill battle to
>>get widespread acceptance of changes on servers. It can be done, of
>>course, but it takes time and political action...
>
>I can push hard to get it into Apache. It also lets CGI scripts protect
>themselves. CGI is probably the most vulnerable type of URL.

With all due respect to Apache (and plenty is due), that's not quite the
same as Netscape or another major commercial server company agreeing to
make the change. This is not to say that you shouldn't; Apache should
continue to be a leader.

>>Second, this doesn't solve one of the really big problems, which is that
>>many people who publish information on the Web don't have any kind of
>>administrative access to the server. They can't modify robots.txt, they
>>can't set any server parameters... nor should they be able to.
>
>Servers can implement robot protection with more than robots.txt; e.g.
>a server could allow robots.txt (or something equivalent) in various
>directories (allowing more people write access).
>
>The fact that it won't be useful to everyone shouldn't be used as an
>excuse to reject the idea.

True, but the fact that a second solution would be required *is* a valid
reason. Getting the market to adopt any change is difficult; the simpler
the change, the better its chances of being adopted, and the sooner that
will happen.
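For reference, the robots.txt convention under discussion is simple enough
that checking it fits in a few lines. A minimal sketch using Python's
standard urllib.robotparser (the rules and robot name here are made up for
illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an example site.
rules = """\
User-agent: *
Disallow: /cgi-bin/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A well-behaved robot consults these rules before each fetch.
print(rp.can_fetch("MyBot/1.0", "http://example.com/cgi-bin/search"))  # False
print(rp.can_fetch("MyBot/1.0", "http://example.com/index.html"))      # True
```

Note that nothing in the protocol *enforces* this; the check happens
entirely on the robot's side, which is exactly why adoption matters.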

>This is useful, but it doesn't *protect* people from robots. By the
>time my CGI has output HTML containing "don't index me" messages, it's
>too late; I've been hit with an unwanted CGI request.
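[A sketch of the timing problem described above, in Python with a
hypothetical handler: the "don't index me" directive (e.g. the robots
META tag) is part of the *response*, so the work of serving the request
is already done before any robot can read it.]

```python
def cgi_response():
    # Hypothetical CGI handler: by the time this function builds its
    # output, the server has already spent the effort of handling the
    # request.  The robots META tag below can ask a robot not to index
    # the page, but it cannot prevent the hit itself.
    body = ('<html><head>'
            '<meta name="robots" content="noindex,nofollow">'
            '</head><body>Search results...</body></html>')
    return 'Content-Type: text/html\r\n\r\n' + body

print(cgi_response())
```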

Nothing protects people from robots except ethical robot design. Anybody
can spoof a Web server by pretending to be a browser.
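[To make the spoofing point concrete, a minimal sketch in Python, with a
made-up URL and User-Agent string: a robot that sends a browser-like
User-Agent header is indistinguishable from a real browser at the HTTP
level, since the server sees only what the client chooses to send.]

```python
from urllib.request import Request

# Hypothetical request from a robot masquerading as a browser.
req = Request(
    "http://example.com/cgi-bin/search?q=test",
    headers={"User-Agent": "Mozilla/3.0 (compatible)"},
)
# urlopen(req) would now reach the CGI script exactly as a browser would;
# the server has no reliable way to tell the difference.
```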

Nick