Re: Suggestion to help robots and sites coexist a little better

Rob Hartill (robh@imdb.com)
Wed, 17 Jul 1996 21:16:28 -0600 (MDT)


>>I can push hard to get it into Apache. It also lets CGI scripts protect
>>themselves. CGI is probably the most vulnerable type of URL.
>
>With all due respect to Apache (and plenty is due), that's not quite the
>same as Netscape or another major commercial server company agreeing to
>make the change.

You're right, it's not the same... it's better :-)

One third of all servers run Apache now. CERN and NCSA are on the way out
(according to the stats), and their users are moving to Apache. 50% of
servers could be Apache by the end of the year.

>>The fact that it won't be useful to everyone shouldn't be used as an
>>excuse to reject the idea.
>
>True, but the fact that a second solution would be required *is* a valid
>reason. Getting the market to adopt any change is difficult; the simpler
>it is, the more chance that it'll be adopted and sooner.

Your suggestion (like this one) addresses part of the problem. Neither
addresses the whole problem on its own. The explicit robot identification
is a simple system to implement, and it can be used by CGI as soon as
one robot starts sending it.

>>This is useful, but it doesn't *protect* people from robots. By the
>>time my CGI has output HTML containing "don't index me" messages, it's
>>too late.. I've been hit with an unwanted CGI request.
>
>Nothing protects people from robots except ethical robot design. Anybody
>can spoof a Web server by pretending to be a browser.

The idea is not to make a spoof-proof system; it's to find ways for
reasonable robot owners and web admins to help each other. If the system
is simple enough for a clueless robot owner to implement, then it stands
a good chance of providing a valuable service to everyone.

The hardest part of robot identification is picking a name for the
string to embed into the USER_AGENT.
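
To show how little a CGI script would need, here is a rough Python sketch
of the check. It assumes the robot puts an agreed marker somewhere in its
User-Agent header, which CGI hands to the script as HTTP_USER_AGENT. The
marker "robot-id" and the function names are made up for illustration;
no string has been agreed on yet.

```python
import os
import sys

# Hypothetical marker: the actual identifier string is still undecided.
ROBOT_TOKEN = "robot-id"


def is_declared_robot(environ, token=ROBOT_TOKEN):
    """Return True if the User-Agent carries the robot marker."""
    # CGI exposes the User-Agent request header as HTTP_USER_AGENT.
    agent = environ.get("HTTP_USER_AGENT", "")
    return token.lower() in agent.lower()


def respond(environ):
    """Build the CGI response, turning away self-identified robots early."""
    if is_declared_robot(environ):
        # Refuse before doing any expensive work.
        return ("Status: 403 Forbidden\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "Robots are asked not to fetch this script.\r\n")
    return ("Content-Type: text/html\r\n\r\n"
            "<p>expensive dynamic page goes here</p>\r\n")


if __name__ == "__main__":
    sys.stdout.write(respond(os.environ))
```

This only helps with robots that choose to identify themselves, which is
the point: the script can refuse them before generating any output.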

Will some of the robot owners out there please make a public commitment
to this idea and perhaps suggest the string identifier to be used?

rob

-- 
Rob Hartill (robh@imdb.com)
The Internet Movie Database (IMDb)  http://www.imdb.com/
           ...more movie info than you can poke a stick at.