>Suggested change: the robot should access robots.txt using the same method as
>for the document it tries to fetch if applicable (i.e., for HTTP, HTTPS,
>SHTTP, FTP), or HTTP if that method fails or is not applicable.
As mentioned, I'm rather uncomfortable with assuming that is a sufficient
specification in all situations.
It is the obvious thing though, and I expect time will settle the
desirability and details.
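To make the suggested rule concrete, here is a rough Python sketch of the lookup order it implies. The function name, the "shttp" scheme spelling, and the choice to fall back to the default HTTP port are my own assumptions for illustration, not part of any spec.

```python
from urllib.parse import urlsplit, urlunsplit

# Access methods named in the suggestion. Spelling "shttp" as a URL
# scheme is an assumption made for this sketch.
SAME_METHOD_SCHEMES = {"http", "https", "shttp", "ftp"}


def robots_url_candidates(document_url):
    """Return the robots.txt URLs to try, in order, for a document URL."""
    parts = urlsplit(document_url)
    scheme = parts.scheme.lower()
    candidates = []

    # First choice: the same access method (and port) as the document.
    if scheme in SAME_METHOD_SCHEMES and parts.netloc:
        candidates.append(
            urlunsplit((scheme, parts.netloc, "/robots.txt", "", ""))
        )

    # Fallback: plain HTTP on the default port, used when the first
    # attempt fails or the document's method is not applicable.
    if scheme != "http" and parts.hostname:
        candidates.append(
            urlunsplit(("http", parts.hostname, "/robots.txt", "", ""))
        )

    return candidates
```

A robot would try each candidate in turn and stop at the first one that yields a usable response. What counts as "fails" is exactly the kind of detail this sketch leaves open.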
>Hmmm, robots should also be required to send the Host: information
>defined for HTTP/1.1 probably for non-virtual servers.
Agreed, though this spec is not the right place to define that.
FYI, we have recently changed the WebCrawler robot implementation to
send the Host header.
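For illustration only, a minimal Python sketch of a robots.txt request that carries the Host header. This is not the WebCrawler code; the function name, the User-Agent value, and the use of an HTTP/1.0 request line are assumptions.

```python
from urllib.parse import urlsplit


def build_robots_request(document_url, user_agent="ExampleRobot/0.1"):
    """Build the raw request bytes for /robots.txt on the document's server."""
    parts = urlsplit(document_url)
    host = parts.hostname or ""

    # Include the port in Host only when it is not the HTTP default.
    if parts.port and parts.port != 80:
        host = f"{host}:{parts.port}"

    lines = [
        "GET /robots.txt HTTP/1.0",
        f"Host: {host}",  # lets a virtual-hosted server pick the right site
        f"User-Agent: {user_agent}",  # hypothetical robot name
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

The bytes returned would be written to a TCP connection opened to the server's address.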
>There is no need for HTML code to send an Expires header, as the
>HTTP-EQUIV already says this is part of the HTTP protocol,
Right, but the /robots.txt isn't HTML, so that's irrelevant :-)
>any web server should be configurable to send an Expires:
>header (or, in absence of such a header, the robot would assume 7 days,
>which should be okay for most applications anyway).
Agreed.
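A small Python sketch of that caching rule, for what it's worth: use the Expires header when it parses, otherwise assume 7 days. The function name and the plain-dict representation of the response headers are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Default lifetime when the server sends no usable Expires header.
DEFAULT_LIFETIME = timedelta(days=7)


def robots_txt_expiry(headers, fetched_at):
    """Return the time at which a cached /robots.txt should be refetched.

    `headers` is assumed to be a plain dict of response headers, and
    `fetched_at` a timezone-aware datetime.
    """
    value = headers.get("Expires")
    if value:
        try:
            expires = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            expires = None  # unparseable date: fall through to the default
        if expires is not None:
            if expires.tzinfo is None:
                # Treat a date without zone information as UTC.
                expires = expires.replace(tzinfo=timezone.utc)
            return expires
    return fetched_at + DEFAULT_LIFETIME
```

A robot would compare the returned time against the current time before reusing its cached copy.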
-- Martijn
Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html