Re: Cron <robh@us2> /usr/home/robh/show_robots (fwd)

olly@muscat.co.uk
10 Jan 1997 11:04:54 -0000


Rob Hartill <robh@imdb.com> writes:
>Somehow I expected better from w3c, or perhaps this agent is poorly named.
>The requests are for robots.txt "protected" urls. My check for "Robot"
>in the User-Agent intercepted these:
>
>131.220.162.18 = CIPS02.physik.uni-bonn.de
>
>W3CRobot/4.0D libwww/4.0D via Squid Cache version 1.0.17 @ 131.220.162.18 [970108 12:37:57 GMT] "GET /M/person-exact?+Sheppard,+Morgan"
>[...]

libwww is a Perl library which has its own robots.txt handling code,
although that doesn't mean it's being used in this case. I think the 4.x
releases were for Perl 4 and the 5.x releases are for Perl 5.

I've just had a look through the robot exclusion handling code in libwww
5.04 and at what I assume is the robots.txt you're referring to
(http://www.imdb.com/robots.txt). I can't see any obvious problems, and a
quick test program confirms this.
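For the curious, a minimal test along these lines (not necessarily the
exact program I ran, but the same idea) would use the WWW::RobotRules
module from libwww-perl. The agent name and URL below are just taken
from the log lines quoted above:

    #!/usr/bin/perl
    # Fetch a site's robots.txt and ask whether a given URL is
    # allowed for a given User-Agent, using libwww-perl's parser.
    use strict;
    use WWW::RobotRules;
    use LWP::Simple qw(get);

    my $agent      = 'W3CRobot/4.0D';   # agent name from the log above
    my $robots_url = 'http://www.imdb.com/robots.txt';

    my $content = get($robots_url)
        or die "can't fetch $robots_url\n";

    my $rules = WWW::RobotRules->new($agent);
    $rules->parse($robots_url, $content);

    my $url = 'http://www.imdb.com/M/person-exact';
    print $rules->allowed($url) ? "allowed\n" : "disallowed\n";

If the rules are being honoured, that should print "disallowed" for any
URL the robots.txt protects.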

Olly
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html