That would be a kludge. It doesn't identify CGI scripts exactly
(I do not usually include /cgi-bin/ in references to my CGI scripts)
and it is not necessary to exclude CGI scripts categorically
(I sometimes serve a set of files through a CGI script). Furthermore,
better heuristics exist (e.g. don't follow forms/POST requests).
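To make the point concrete, here is a minimal sketch (not from the original discussion) of such a heuristic: a robot that refuses to submit forms and skips URLs with query strings, which usually mark a script invocation regardless of whether /cgi-bin/ appears in the path. The function name and the exact rules are illustrative assumptions.

```python
from urllib.parse import urlparse

def should_follow(url, method="GET"):
    """Illustrative crawler heuristic: avoid likely CGI invocations
    without relying on a /cgi-bin/ path component."""
    if method.upper() == "POST":   # never submit forms
        return False
    parsed = urlparse(url)
    if parsed.query:               # a '?' usually marks a script call
        return False
    return True
```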
>-- Webmasters complaining about robots indexing partially built
>document trees. So why are they linked to the main tree???
Well, it would help if WWW servers took more pains to send accurate
Expires: and Last-modified: headers.
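As an illustration of what "taking more pains" could look like, here is a hypothetical helper a CGI script might use to emit those two headers, deriving Last-Modified from the file's mtime and Expires from an assumed freshness lifetime (the one-hour default is an arbitrary example value):

```python
import os
import time
from email.utils import formatdate

def cgi_headers(path, max_age=3600):
    """Build a CGI response header block with accurate
    Last-Modified and Expires headers for the file at `path`."""
    mtime = os.path.getmtime(path)
    return (
        "Content-Type: text/html\r\n"
        f"Last-Modified: {formatdate(mtime, usegmt=True)}\r\n"
        f"Expires: {formatdate(time.time() + max_age, usegmt=True)}\r\n"
        "\r\n"
    )
```

With headers like these, a robot (or any cache) can revalidate cheaply instead of re-fetching half-built trees.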
>-- I agree with the proposed 'positive' extension of robots.txt to
>include 'these pages should score more than the others of my site'
Perhaps, but once you're on that road, ALIWEB may be a better approach.
>-- I don't understand why, if a web site is publicly accessible, it
>shouldn't be indexable, and so why there is a need for such a thing as
>robots.txt.
Neither do I (see separate message).
>-- Correct me if I'm wrong on this: If webmasters want to reserve
>access to certain pages to certain specific users, they can do it,
>without needing to password it, by giving the page names to
>these users and not linking them to the main tree.
Wrong (see that message): third parties have the right to poke for URLs, IMHO.
Access restriction (password-based or otherwise) will do the job.
>-- Why in the HTTP protocol is there no such info about the
>required delay between two successive queries to the same server (see
>the webmasters complaining about rapid-fire queries from robots)
>that the webserver should send in the header of each answer?
There is an HTTP response code meaning "please don't return for a while, I'm busy": 503 Service Unavailable, optionally accompanied by a Retry-After header.
http://www.w3.org/pub/WWW/Protocols/HTTP1.0/draft-ietf-http-spec.html#Code503
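A sketch of how a server could use that response to throttle rapid-fire robots, written here with Python's standard http.server for illustration (the 30-second delay is an arbitrary example value, not anything mandated by the spec):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class BusyHandler(BaseHTTPRequestHandler):
    """Answer every request with 503 + Retry-After,
    telling well-behaved robots when to come back."""
    def do_GET(self):
        self.send_response(503)                # Service Unavailable
        self.send_header("Retry-After", "30")  # seconds until retry
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet
```

A robot that honors Retry-After solves the rapid-fire complaint without any new protocol machinery.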
-- Reinier Post reinpost@win.tue.nl