Re: RFC, draft 1

Martijn Koster (m.koster@webcrawler.com)
Tue, 19 Nov 1996 11:29:46 -0800


At 2:24 PM 11/18/96, Klaus Johannes Rusch wrote:

>Suggested change: the robot should access robots.txt using the same method as
>for the document it tries to fetch if applicable (i.e., for HTTP, HTTPS,
>SHTTP, FTP), or HTTP if that method fails or is not applicable.

As mentioned, I'm rather uncomfortable with assuming that is a sufficient
specification in all situations.

It is the obvious thing though, and I expect time will settle the
desirability and details.
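
To make the suggested lookup order concrete, here is a minimal sketch in Python. It is only an illustration of the proposal quoted above, not part of the draft: the function name robots_txt_candidates, the scheme set, and the choice to drop the port on the HTTP fallback are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Schemes for which "same method as the document" is taken to apply
# (assumption: the four methods named in the suggestion).
SAME_SCHEME_OK = {"http", "https", "shttp", "ftp"}


def robots_txt_candidates(document_url):
    """Return the /robots.txt URLs to try, in order, for a document URL."""
    parts = urlsplit(document_url)
    candidates = []

    # First choice: the same scheme and host:port as the document itself.
    if parts.scheme in SAME_SCHEME_OK:
        candidates.append(
            urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
        )

    # Fallback: plain HTTP on the default port (assumption: the document's
    # port is not reused, since it may belong to a different protocol).
    fallback = urlunsplit(("http", parts.hostname or "", "/robots.txt", "", ""))
    if fallback not in candidates:
        candidates.append(fallback)

    return candidates
```

A robot would try each candidate in turn and stop at the first one that can be fetched. What counts as "that method fails" is exactly the detail left open above.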

>Hmmm, robots should also be required to send the Host: information
>defined for HTTP/1.1 probably for non-virtual servers.

Agreed, though this spec is not the right place to define that.

FYI, we have recently changed the WebCrawler robot implementation to
send the Host header.
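
For illustration only, a request for /robots.txt that carries the Host header might be built as in the sketch below. This is not the WebCrawler code: the function name build_robots_request, the HTTP/1.0 request line, and the ExampleBot user agent are hypothetical.

```python
def build_robots_request(host, port=80, user_agent="ExampleBot/0.1"):
    """Build the raw bytes of a GET /robots.txt request with a Host header."""
    # The Host value carries the port only when it is not the HTTP default.
    host_value = host if port == 80 else f"{host}:{port}"
    lines = [
        "GET /robots.txt HTTP/1.0",
        f"Host: {host_value}",
        f"User-Agent: {user_agent}",  # hypothetical robot name
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

With the Host header present, a server that hosts several sites on one address can return the /robots.txt of the site the robot actually asked for.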

>There is no need for HTML code to send an Expires header, as the
>HTTP-EQUIV already says this is part of the HTTP protocol,

Right, but the /robots.txt isn't HTML, so that's irrelevant :-)

>any web server should be configurable to send an Expires:
>header (or, in absence of such a header, the robot would assume 7 days,
>which should be okay for most applications anyway).

Agreed.
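
As a sketch of that rule in Python: honour Expires when the server sends it, otherwise assume 7 days. The function name robots_txt_expiry and the plain-dict representation of the response headers are assumptions for the example, and treating an unparsable date like a missing one is my choice, not something the draft says.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

DEFAULT_LIFETIME = timedelta(days=7)  # the 7-day default discussed above


def robots_txt_expiry(headers, fetched_at):
    """Return the time after which a cached /robots.txt should be refetched.

    headers is a dict of response headers; fetched_at is an aware datetime.
    """
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("expires")
    if raw is None:
        return fetched_at + DEFAULT_LIFETIME

    try:
        expires = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Assumption: an unparsable date is treated like a missing header.
        return fetched_at + DEFAULT_LIFETIME

    if expires.tzinfo is None:
        # Dates without a usable zone are taken to be UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    return expires
```

A robot would compare the returned time against the current time before reusing its cached copy.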

-- Martijn

Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html