Re: Suggestion to help robots and sites coexist a little better

Dirk.vanGulik (Dirk.vanGulik@jrc.it)
Mon, 15 Jul 1996 10:30:19 +0200


> (sent to robots@webcrawler.com and the apache developers list)
>
> Here's my suggestion:
>
> 1) robots send a new HTTP header to say "I'm a robot", e.g.
> Robot: Indexers R us
>
> 2) servers are extended (e.g. an apache module) to look for this
> header and based on local configuration (*) issues "403 Forbidden"
> responses to robots that stray out of their allowed URL-space
>
> 3) (*) on the site being visited, the server would read robots.txt and
> perhaps other configuration files (such as .htaccess) to determine
> which URLs/directories are off-limits.
>
> Using this system, a robot that correctly identifies itself as such will
> not be able to accidentally stray into forbidden regions of the server
> (well, they won't have much luck if they do, and won't cause damage).
>
> Adding an apache module to the distribution would make more web admins
> aware of robots.txt and the issues relating to it. Being the leader, Apache
> can implement this and the rest of the pack will follow.
>
> rob
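
For what it is worth, the server-side check in steps 2 and 3 could be
sketched roughly as below. This is only an illustration in Python, not an
apache module: the function name robot_status and the plain dict of request
headers are made up for the example, and it reads only robots.txt (no
.htaccess). The "Robot" header is the one proposed above.

```python
from urllib.robotparser import RobotFileParser


def robot_status(headers, path, robots_txt):
    """Return the HTTP status a server might send for this request.

    headers    -- dict of request headers (hypothetical, case-sensitive here)
    path       -- requested URL path, e.g. "/private/data.html"
    robots_txt -- contents of the site's robots.txt as a string
    """
    robot_name = headers.get("Robot")
    if robot_name is None:
        # Not a self-declared robot: serve the request normally.
        return 200

    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    # 403 Forbidden when the robot strays outside its allowed URL-space.
    return 200 if rules.can_fetch(robot_name, path) else 403
```

With a robots.txt of "User-agent: *" followed by "Disallow: /private/", a
request carrying "Robot: Indexers R us" for /private/data.html would get 403,
while the same robot asking for /index.html would get 200.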

You might want to make this more attractive to robot developers by, for
example, adding lines to the response header like

Index-Of-URLs: http:/asda/index.url.txt
Index-Of-Metadata: http:/asada/index.metadata.txt

Each entry in these index files would consist of a URL, a line of keywords,
a line of description and a blank line. We found this quite useful internally.
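
A robot could read such an index file with something like the sketch below.
It assumes exactly three non-blank lines per record (URL, keywords,
description) separated by blank lines; the names IndexEntry and parse_index
are made up for the example, and the keyword line is kept as a raw string
because no separator is specified.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    url: str
    keywords: str      # raw keyword line, separator unspecified
    description: str


def parse_index(text):
    """Split an index file into records terminated by blank lines."""
    entries, record = [], []
    # The extra "" makes sure the last record is flushed even when the
    # file does not end with a blank line.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line:
            record.append(line)
        elif record:
            if len(record) == 3:   # skip records that do not fit the format
                entries.append(IndexEntry(*record))
            record = []
    return entries
```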

Dw.