RE: robots.txt unavailability

Louis Monier (monier@pa.dec.com)
Tue, 9 Jul 1996 11:29:12 -0700


Scooter used to do this, with the idea that a 403 is a "keep out" sign,
and that getting this for robots.txt would be a good hint of what the
rest of the site would do. Then I discovered that several servers
(including Apache) do not make a clean distinction between 404 and 403,
so I had to treat a 403 like a 404. The last crawl should have behaved
this way, unless of course I have a bug...
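
As a rough illustration of the two policies discussed here (this is not
Scooter's actual code), the decision could be sketched in Python as
follows. The names RobotsPolicy, policy_for_robots_status and the
treat_403_as_404 flag are made up for this example. Status codes other
than 2xx, 403 and 404 are deliberately rejected, since this message does
not say how they were handled.

```python
from enum import Enum


class RobotsPolicy(Enum):
    """Hypothetical outcomes of fetching /robots.txt."""
    PARSE_FILE = "parse_file"      # a file was returned; obey its rules
    ALLOW_ALL = "allow_all"        # no file; nothing is off-limits
    DISALLOW_ALL = "disallow_all"  # treat the whole site as off-limits


def policy_for_robots_status(status: int,
                             treat_403_as_404: bool = True) -> RobotsPolicy:
    """Map the HTTP status of a /robots.txt fetch to a crawl policy.

    treat_403_as_404=True models the later behavior described above;
    False models the earlier "403 is a keep-out sign" behavior.
    """
    if 200 <= status < 300:
        return RobotsPolicy.PARSE_FILE
    if status == 404:
        return RobotsPolicy.ALLOW_ALL
    if status == 403:
        if treat_403_as_404:
            return RobotsPolicy.ALLOW_ALL
        return RobotsPolicy.DISALLOW_ALL
    # Assumption: anything else is out of scope for this sketch.
    raise ValueError(f"status {status} is not covered by this sketch")
```

With the default flag, a 403 on /robots.txt gives the same result as a
404 (no restrictions). Passing treat_403_as_404=False reproduces the
earlier behavior of staying off the site entirely.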

--Louis

>----------
>From: Daniel T. Martin[SMTP:MARTIND@carleton.edu]
>Sent: Monday, July 08, 1996 7:17 PM
>To: robots@webcrawler.com
>Subject: robots.txt unavailability
>
>Just a short message (sorry)
>I know what the standard is when a site reports (code 404) that the
>/robots.txt URL is not found; what is the standard behavior when other
>responses are received (say if a site reports code 403 - Forbidden)?
>
>I ask because it appears that this is handled inconsistently; opentext and
>lycos apparently treat this response as though it were a "not found"
>response - altavista's scooter, however, treats the response as though it
>were a file consisting of:
>Disallow: *
>That is, it assumes that the site is completely off-limits.
>
>Based on my own experience, I'm inclined to say that it should be treated
>as a "Not Found" response, as should other server errors.
>
>-=-
>Daniel Martin * your sig here *
>