> Should a well behaved robot do this:
>
> backdraft-bbn.infoseek.com - - [04/Nov/1996:01:27:35 -0500] "GET /robots.txt
> HTTP/1.0" 404 207
> backdraft-bbn.infoseek.com - - [04/Nov/1996:01:27:40 -0500] "GET
> /list_archives/webph/0066.html HTTP/1.0" 200 2642
> backdraft-bbn.infoseek.com - - [04/Nov/1996:01:28:04 -0500] "GET
> /list_archives/webph/0069.html HTTP/1.0" 200 3181
>
First of all, note that your site doesn't have a 'robots.txt' :-)
In my opinion, infoseek's robot isn't dumb. I just looked at my
access_log and saw the same behavior from other search engines such as
opentext, lycos, atext, etc. If a document has a link to a document on
your server, then infoseek has to try to get robots.txt first. If
another document links to your server as well, then I would say it is
natural that it will try to get robots.txt again.
The question is: if a search engine or a robot doesn't find 'robots.txt'
(the HTTP return code is 404), should it request 'robots.txt' again?
Certainly not, but this might make the life of a robot writer harder,
since the robot then has to remember the result for each server!
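To make the "remember the 404" idea concrete, here is a rough sketch of
how a robot might cache the robots.txt result per host. The names
RobotsCache and fetch, and the 24 hour expiry, are my own assumptions
for illustration, not something infoseek or any other robot is known to
do. The fetch callable is a stand-in for whatever HTTP client the robot
uses. Treating every non-200 answer like a 404 is a simplification.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Remember the robots.txt outcome per host, including a 404."""

    def __init__(self, fetch, ttl_seconds=24 * 3600, clock=time.time):
        # Hypothetical callable: fetch(url) -> (status_code, body_text)
        self._fetch = fetch
        self._ttl = ttl_seconds      # assumed expiry, not a standard value
        self._clock = clock
        self._entries = {}           # host -> (expiry_time, parser or None)

    def _rules_for(self, scheme, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and entry[0] > now:
            return entry[1]          # cached answer, no new request
        status, body = self._fetch(f"{scheme}://{host}/robots.txt")
        parser = None                # None means "no rules found"
        if status == 200:
            parser = RobotFileParser()
            parser.parse(body.splitlines())
        # Simplification: any non-200 status is cached as "no rules".
        self._entries[host] = (now + self._ttl, parser)
        return parser

    def allowed(self, user_agent, url):
        parts = urlsplit(url)
        rules = self._rules_for(parts.scheme, parts.netloc)
        if rules is None:
            return True              # no robots.txt, nothing is excluded
        return rules.can_fetch(user_agent, url)
```

With something like this, the robot asks allowed(agent, url) before
every fetch, and only the first URL on a given host (or the first one
after the entry expires) causes a request for robots.txt.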
Have fun,
Qusay H. Mahmoud
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html