USER_AGENT and Apache 1.2

Rob Hartill (robh@imdb.com)
Mon, 25 Nov 1996 22:03:18 +0000 (GMT)


an old wish revisited..

Apache 1.2 is almost ready to unleash on the world and it has a
feature to allow access denial based on USER-AGENT.

It'd be useful if any changes to the guidelines recommended a string to be
added to the USER-AGENT header to announce that the software wants/tries
to abide by robots.txt (or the next version of it). That would let site
admins quickly and SUCCESSFULLY block access to any region of their
server, and would limit the damage caused by the all too frequent
"accidents".

And on the same theme, robots should be taught how to handle 4xx responses,
so that they know a "403 Forbidden" isn't worth retrying and might be
indicating a problem with the way robots.txt has been processed.
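
Something along these lines, say (again just an illustrative Python
sketch; the names are placeholders):

    # Treat any 4xx reply, 403 in particular, as "don't retry" rather
    # than queuing the URL again.
    import urllib.request
    import urllib.error

    def fetch_once(url, ua="ExampleCrawler/0.1"):
        req = urllib.request.Request(url, headers={"User-Agent": ua})
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if 400 <= e.code < 500:
                # The server refused us; retrying won't help, and a 403
                # may mean robots.txt was mis-processed on our side.
                return None
            raise  # 5xx and the like may be worth retrying later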

Oh, and BTW, Apache 1.2 is going to pump out HTTP/1.1 responses, so make
sure you're not hardcoded to look for "HTTP/1.0" in the status line.
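
In other words, parse the status line for any HTTP/x.y token instead of
matching "HTTP/1.0" literally (illustrative Python only):

    import re

    def parse_status_line(line):
        # e.g. "HTTP/1.1 200 OK" or "HTTP/1.0 403 Forbidden"
        m = re.match(r"HTTP/(\d+)\.(\d+)\s+(\d{3})\s*(.*)", line)
        if m is None:
            raise ValueError("malformed status line: %r" % line)
        version = (int(m.group(1)), int(m.group(2)))
        return version, int(m.group(3)), m.group(4)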

rob

-- 
Rob Hartill.       Internet Movie Database Ltd.    http://www.imdb.com/  

ps - make sure your robots send a "HOST" header (even for HTTP/1.0 requests) so that you index the right site !!
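
For anyone speaking raw HTTP over a socket, a minimal Python sketch of what
that looks like (hostname and path are placeholders):

    import socket

    host, path = "www.example.com", "/robots.txt"
    request = (
        "GET %s HTTP/1.0\r\n"
        "Host: %s\r\n"  # sent even though this is an HTTP/1.0 request
        "User-Agent: ExampleCrawler/0.1\r\n"
        "\r\n" % (path, host)
    )
    with socket.create_connection((host, 80)) as sock:
        sock.sendall(request.encode("ascii"))
        reply = sock.makefile("rb").read()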