Oh, yeah - that's what I meant.
> I agree, as some servers may be configured to return wrong responses.
> Are there any? Some sites masquerade 404 and 403 to same error for
> security reasons, but I have so far seen only 404 as the error they
> return in both cases.
For an example of an often-returned 403 error, visit any site running the OSU
DECthreads-based server (I know of only three such sites; undoubtedly there
are more, but I only feel comfortable mentioning the one I control) and ask
for some top-level document that isn't there; for example,
http://public.carleton.edu/gonzo.html
The error returned will be 403. This is because the default configuration
of the OSU server, unlike many others, does not simply map /* to
$document_root/* - instead, it allows access only to certain top-level
directories; this can be most confusing, and I won't go into it, as I still
can't find a decent way to explain it thoroughly. Anyway, the upshot of
this is that if a webmaster did not explicitly plan for someone asking for
/robots.txt, then a code 403 is returned. Versions since 1.9c all contain
a default /robots.txt (so that something is returned instead of code 403),
but sites running earlier versions, unless the webmaster has explicitly
planned for it, will return error 403 where the error should really be
"not found."
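To illustrate the idea (this is my own sketch, not the OSU server's actual
code or configuration; the directory and file names are invented), the
dispatch behaves roughly like this:

```python
# Hypothetical model of a server that serves only certain top-level
# directories, returning 403 (rather than 404) for anything else.
ALLOWED_TOP_DIRS = {"www", "pub"}        # invented names for illustration
EXISTING_FILES = {"/www/index.html"}     # invented content for illustration

def status_for(path):
    """Return the HTTP status code such a server might send for `path`."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] not in ALLOWED_TOP_DIRS:
        # Not under an allowed top-level directory: access forbidden,
        # even though the real problem is that nothing is mapped there.
        return 403
    # Under an allowed directory, a missing file is an ordinary 404.
    return 200 if path in EXISTING_FILES else 404
```

So /robots.txt or /gonzo.html draws a 403, while /www/gonzo.html draws a
plain 404.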
This isn't quite what you asked for, since it is still very possible to get
a 404 error from these sites; just try:
http://public.carleton.edu/www/gonzo.html
Also, how should robots handle other responses from the server? Handling a
redirect seems obvious, but what about other responses? Personally, I think
the best solution is the simplest: treat all unexpected response codes as
"not found".
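That rule is easy to state in code. Here's a minimal sketch (the function
name and category strings are my own) of how a robot might classify the
status code it gets back for /robots.txt, handling redirects separately and
collapsing everything else unexpected into "not found":

```python
def classify(status):
    """Map an HTTP status code to a robot's view of the /robots.txt fetch."""
    if 200 <= status < 300:
        return "ok"            # parse the returned robots.txt
    if 300 <= status < 400:
        return "redirect"      # follow the Location header and retry
    # 403, 404, 500, and anything else unexpected: assume no usable
    # robots.txt exists, i.e. treat the site as unrestricted.
    return "not found"
```

With this, the OSU server's 403 and an ordinary 404 are handled identically,
which is exactly the point.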
-=-
Daniel Martin * your sig here *