Re: robots.txt , authors of robots , webmasters ....

Reinier Post (reinpost@win.tue.nl)
Wed, 17 Jan 1996 11:54:42 +0100 (MET)


You (savron@world-net.sct.fr) write:
>
>A few thoughts about the robots stuff :
>
>-- there should be no need to include a line such as :
> /cgi-bin/
> in robots.txt
> because it should come as a standard of indexer robots

That would be a kludge. It doesn't identify CGI scripts exactly
(I do not usually include /cgi-bin/ in references to my CGI scripts)
and it is not necessary to exclude CGI scripts categorically
(I sometimes serve a set of files through a CGI script). Furthermore,
better heuristics exist (e.g. don't follow forms/POST requests).
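
To illustrate the forms heuristic, here is a minimal sketch in Python of a
link extractor that follows only plain anchors and never treats a form's
action as a link. The names LinkCollector and extract_links are made up for
this example; it is one possible way to do it, not a description of any
existing robot.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> targets; <form action> targets are never collected."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are followed. A <form> (GET or POST) is ignored,
        # so the robot never submits anything to a script.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(base_url, html):
    """Return the absolute URLs of all plain links in an HTML page."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Note that this does not depend on the URL containing /cgi-bin/ at all: a
script reached through an ordinary link is still indexed, and a form is
skipped wherever its script happens to live.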

>-- Webmasters complaining about robots indexing partially built
>document trees . So why are they linked to the main tree ???

Well, it would help if WWW servers took more pains to send accurate
Expires: and Last-Modified: headers, so that a robot can tell which
documents are still changing.
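
A rough sketch, in Python, of how a robot might use those two headers if
servers sent them accurately. The function names and the idea of keeping
the headers from the last visit are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Parse an HTTP date header; return None if missing or malformed."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: a date without a zone is treated as GMT.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def still_fresh(cached_headers, now):
    """True if the headers saved at the last visit say the copy has not expired."""
    expires = parse_http_date(cached_headers.get("Expires"))
    return expires is not None and expires > now


def changed_since(head_headers, last_indexed):
    """True if a fresh HEAD response suggests the page changed after indexing."""
    modified = parse_http_date(head_headers.get("Last-Modified"))
    # Without a usable Last-Modified the robot has to assume a change.
    return modified is None or modified > last_indexed
```

A robot would skip the page while still_fresh() holds, and otherwise issue
a HEAD request and refetch only when changed_since() is true.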

>-- I agree with the proposed 'positive' extension of robots.txt to
>include 'these pages should score more than the others of my site'

Perhaps, but once you're on that road, ALIWEB may be a better approach.

>-- I don't understand why , if a web site is publicly accessible it
>shouldn't be indexable and so why there is a need for such a thing as
>robots.txt .

Neither do I (see separate message).

>-- Correct me if I'm wrong on this: if webmasters want to reserve
>access to certain pages for specific users, they can do it without
>password-protecting the pages, by giving the page names to these
>users and not linking the pages to the main tree.

Wrong (see that message): third parties have the right to poke for URLs, IMHO.
Access restriction (password-based or otherwise) will do the job.

>-- Why does the HTTP protocol have no way to state the required
>delay between two successive queries to the same server (see the
>webmasters complaining about rapid-fire queries from robots), which
>the web server would send in the header of each response?

There is an HTTP response status (503) meaning "please don't return for a
while, I'm busy".

http://www.w3.org/pub/WWW/Protocols/HTTP1.0/draft-ietf-http-spec.html#Code503
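
For what it's worth, a robot could honour that response roughly as in the
Python sketch below. It assumes the server may also send a Retry-After
header (either a number of seconds or an HTTP date); the function name and
the 300-second fallback are arbitrary choices for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

DEFAULT_BACKOFF = 300  # seconds; arbitrary fallback for illustration


def backoff_seconds(status, headers, now, default=DEFAULT_BACKOFF):
    """Return how long a robot should stay away from a server, in seconds."""
    if status != 503:
        return 0  # server did not ask us to go away
    value = headers.get("Retry-After", "").strip()
    if not value:
        return default  # busy, but no hint about how long
    if value.isdigit():
        return int(value)  # delay given directly in seconds
    try:
        when = parsedate_to_datetime(value)  # delay given as an HTTP date
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Assumption: a date without a zone is treated as GMT.
        when = when.replace(tzinfo=timezone.utc)
    return max(0, int((when - now).total_seconds()))
```

This only covers the case where the server says it is busy; it does not
give the per-server minimum delay asked for above, which the robot would
still have to choose for itself.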

-- 
Reinier Post						 reinpost@win.tue.nl