yup, but that shouldn't make a difference here.
> In my opinion... I don't think infoseek's robot is dumb... I just looked
> at my access_log and saw the same from other search engines such as
> opentext, lycos, atext, etc.
hmm, I didn't:
crawl3.atext.com - - [04/Nov/1996:11:59:14 -0500] "GET /robots.txt HTTP/1.0" 407
crawl3.atext.com - - [04/Nov/1996:12:02:19 -0500] "GET /~sports/facilities/inde1
crawl3.atext.com - - [04/Nov/1996:12:07:13 -0500] "GET /~heikkone/search.html H2
crawl3.atext.com - - [04/Nov/1996:12:09:31 -0500] "GET /~psych/ HTTP/1.0" 200 49
crawl3.atext.com - - [04/Nov/1996:12:27:20 -0500] "GET /~jinglis/travels/europe5
crawl3.atext.com - - [04/Nov/1996:12:27:55 -0500] "GET /~ru351/discussion.html 7
crawl3.atext.com - - [04/Nov/1996:12:30:28 -0500] "GET /~lien/ HTTP/1.0" 200 202
crawl3.atext.com - - [04/Nov/1996:12:32:10 -0500] "GET /~publish/catalog/studen9
demonet.opentext.com - - [05/Nov/1996:03:41:01 -0500] "GET /robots.txt HTTP/1.07
demonet.opentext.com - - [05/Nov/1996:03:45:34 -0500] "GET /~dickerso/research/7
demonet.opentext.com - - [14/Nov/1996:07:40:52 -0500] "GET /robots.txt HTTP/1.07
demonet.opentext.com - - [15/Nov/1996:03:46:11 -0500] "GET /robots.txt HTTP/1.07
This one obviously retried at very different times, so it's excused...
> if a document has a link to one document
> in your server then infoseek will have to try to get robots.txt. If there
> is another link from another document then I would say it is natural
> that it will try to get robots.txt again.
Not if it remembers that it already tried to retrieve it. Why repeat the same
mistake twice?
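
Remembering it is not much code either. Here's a rough Python sketch of what I
mean -- the names and the cache layout are just made up for illustration, not
what infoseek (or anyone else) actually does:

    import urllib.error
    import urllib.parse
    import urllib.request
    import urllib.robotparser

    # Hypothetical per-host cache: host -> parser, or None when the earlier
    # fetch came back 404 ("no robots.txt, everything allowed").
    _robots_cache = {}

    def allowed(url, agent="ExampleBot"):
        host = urllib.parse.urlsplit(url).netloc
        if host not in _robots_cache:
            robots_url = "http://%s/robots.txt" % host
            try:
                text = urllib.request.urlopen(robots_url).read().decode("latin-1")
                parser = urllib.robotparser.RobotFileParser(robots_url)
                parser.parse(text.splitlines())
                _robots_cache[host] = parser
            except urllib.error.HTTPError:
                # Remember the failure instead of re-requesting robots.txt
                # every time another page links into this server.
                _robots_cache[host] = None
        parser = _robots_cache[host]
        return True if parser is None else parser.can_fetch(agent, url)

One dictionary lookup per URL, and robots.txt gets fetched once per server
instead of once per referring link.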
> The question is: if a search engine or a robot doesn't find 'robots.txt'
> (return code from HTTP is 404), should it try to request 'robots.txt'
> again? Certainly not, but this might make the life of a robot writer
> harder!
Not that hard :) Re-requesting it makes things hard on the web server, which is worse.
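
If the worry is that a site might add a robots.txt later, cache the 404 with an
expiry instead of forever. A sketch, as a variation on the cache above, with the
one-week figure pulled out of thin air:

    import time

    ROBOTS_TTL = 7 * 24 * 3600   # re-check about once a week (arbitrary choice)

    # host -> (rules_or_None, time_fetched)
    _robots_cache = {}

    def robots_rules(host, fetch_rules):
        """Return cached rules for host; fetch_rules(host) is whatever routine
        actually does the HTTP request (returning None on a 404)."""
        entry = _robots_cache.get(host)
        if entry is None or time.time() - entry[1] > ROBOTS_TTL:
            entry = (fetch_rules(host), time.time())
            _robots_cache[host] = entry
        return entry[0]

That way a server sees a re-check every few days at worst, like the opentext
hits above, instead of one request per page that links to it.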
Otis
--
eZines Database - <URL:http://www.dominis.com/Zines/>
eBooks Dominis Bookstore - <URL:http://www.booksite.com/cgi-bin/zines>