Re: RFC, draft 1

Klaus Johannes Rusch (e8726057@student.tuwien.ac.at)
Mon, 18 Nov 1996 22:24:46 CET


In <328E3652.5E65@netscape.com>, Darren Hardy <dhardy@netscape.com> writes:
> > 3.1 Access method
> >
> > The instructions must be accessible via HTTP [2] from the site that
> > the instructions are to be applied to, as a resource of Internet
> > Media Type [3] "text/plain" under a standard relative path on the
> > server: "/robots.txt".
>
> Works with HTTPS too, or is that implied?

Suggested change: the robot should access robots.txt using the same method as
for the document it tries to fetch if applicable (i.e., for HTTP, HTTPS,
SHTTP, FTP), or HTTP if that method fails or is not applicable.
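
A minimal sketch of that lookup order, for illustration only. The function name, the scheme set and the example hosts are assumptions made for this sketch, not part of the draft:

```python
from urllib.parse import urlsplit

# Schemes for which "same method as the document" is taken to apply
# (assumption: simply the list given in the suggestion above).
SAME_METHOD_SCHEMES = {"http", "https", "shttp", "ftp"}


def robots_url_candidates(document_url):
    """Return the robots.txt URLs a robot would try, in order."""
    parts = urlsplit(document_url)
    scheme = parts.scheme.lower()
    candidates = []
    if scheme in SAME_METHOD_SCHEMES:
        # Same method, same host (and port) as the document itself.
        candidates.append(f"{scheme}://{parts.netloc}/robots.txt")
    if scheme != "http":
        # Fallback: plain HTTP on the default port of the same host name.
        candidates.append(f"http://{parts.hostname}/robots.txt")
    return candidates


print(robots_url_candidates("ftp://ftp.example.com/pub/file.txt"))
```

The robot would try the first candidate and move on to the next only if that fetch fails.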

> Following redirects really should be required. It would make life for
> servers which serve out documents for multiple web hosts much easier. Big
> servers are often the ones most sensitive to robots. For example,
> a server which serves out dozens of vanity domains could more easily
> implement /robots.txt per domain using redirection like so:
>
> http://www.vanity1.com/robots.txt -> redirect -> /robots/vanity1.txt
> http://www.vanity2.com/robots.txt -> redirect -> /robots/vanity2.txt

Hmmm, robots should also be required to send the Host: header defined for HTTP/1.1,
probably even when talking to non-virtual servers.
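
To illustrate, here is a rough sketch of the request a robot might send and of how it would resolve the redirect from the example above. The function names are made up for this sketch, and a real robot would of course send more headers (User-Agent, for one):

```python
from urllib.parse import urljoin


def build_robots_request(host, path="/robots.txt"):
    """Build a bare HTTP/1.1 request that names the target host explicitly."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",        # lets a multi-host server pick the right site
        "Connection: close",
        "",                     # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines)


def resolve_redirect(requested_url, location):
    """Resolve a (possibly relative) Location value against the request URL."""
    return urljoin(requested_url, location)


print(build_robots_request("www.vanity1.com"))
print(resolve_redirect("http://www.vanity1.com/robots.txt", "/robots/vanity1.txt"))
```

With the Host: header present, the server can tell the vanity1 and vanity2 requests apart even though both ask for the same path on the same machine.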

> It might be very tough for content providers to use the "standard HTTP
> cache-control" mechanisms to specify Expires headers, for example, since
> robots.txt uses the text/plain type, not HTML. Typically, you would use
> HTML to do this:
> <META HTTP-EQUIV="Expires" CONTENT="blah">
> or whatever. Many Web servers have poor support for expiration in HTTP.
>
> So, I'd suggest explicitly adding an Expiration field to the robots.txt
> format, using the HTTP date format, IMHO.

There is no need for HTML code to send an Expires header. As the name HTTP-EQUIV already
says, this is part of the HTTP protocol, and any web server should be configurable to send
an Expires: header (or, in the absence of such a header, the robot would assume 7 days,
which should be okay for most applications anyway).
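
On the robot side this could look roughly like the following sketch. The function name and the handling of unparsable dates are assumptions made here; the 7 day default is the one mentioned above:

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Lifetime assumed when the server sends no usable Expires: header.
DEFAULT_LIFETIME = timedelta(days=7)


def robots_expiry(fetched_at, expires_header=None):
    """Return the time at which a cached robots.txt should be refetched."""
    if expires_header:
        try:
            expires = parsedate_to_datetime(expires_header)
        except (TypeError, ValueError):
            expires = None  # unparsable date: treat as if no header was sent
        if expires is not None:
            if expires.tzinfo is None:
                # Assumption: a date without a usable zone is taken as UTC.
                expires = expires.replace(tzinfo=timezone.utc)
            return expires
    return fetched_at + DEFAULT_LIFETIME


fetched = datetime(1996, 11, 18, 21, 24, 46, tzinfo=timezone.utc)
print(robots_expiry(fetched))
print(robots_expiry(fetched, "Wed, 20 Nov 1996 00:00:00 GMT"))
```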

Klaus Johannes Rusch

--
e8726057@student.tuwien.ac.at, KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html