Re: robots.txt , authors of robots , webmasters ....

Wayne Lamb (wlamb@walnut.holli.com)
Wed, 17 Jan 1996 09:43:28 -0500 (EST)


Reinier Post wrote:
>
> You (savron@world-net.sct.fr) write:
> >
> >A few thoughts about the robots stuff :
> >
> >-- there should be no need to include a line such as :
> > /cgi-bin/
> > in robots.txt
> > because it should come as a standard of indexer robots
>
> That would be a kludge. It doesn't identify CGI scripts exactly
> (I do not usually include /cgi-bin/ in references to my CGI scripts)
> and it is not necessary to exclude CGI scripts categorically
> (I sometimes serve a set of files through a CGI script). Furthermore,
> better heuristics exist (e.g. don't follow forms/POST requests).
>
> >-- Webmasters complaining about robots indexing partially built
> >document trees . So why are they linked to the main tree ???
>
> Well, it would help if WWW servers took more pains to send accurate
> Expires: and Last-modified: headers.
>
> >-- I agree with the proposed 'positive' extension of robots.txt to
> >include 'these pages should score more than the others of my site'
>
> Perhaps, but once you're on that road, ALIWEB may be a better approach.
>
> >-- I don't understand why , if a web site is publicly accessible it
> >shouldn't be indexable and so why there is a need for such a thing as
> >robots.txt .
>
> Neither do I (see separate message).
>
> >-- Correct me if I'm wrong on this : If webmasters want to reserve
> >access to certain pages to certain specific users they can do it ,
> >without needing to password-protect them , by giving the page names to
> >these users and not linking them to the main tree .
>
> Wrong (see that message): third parties have the right to poke for URLs, IMHO.
> Access restriction (password-based or otherwise) will do the job.
>
> >-- Why in the HTTP protocol there is not such an info about the
> >required delay between two successive queries to the same server ( see
> >the webmasters complaining about rapid fire queries from robots )
> >that the webserver should send in the header of each answer .
>
> There is an HTTP response meaning "please don't return for a while, I'm busy".
>
> http://www.w3.org/pub/WWW/Protocols/HTTP1.0/draft-ietf-http-spec.html#Code503
>
> --
> Reinier Post reinpost@win.tue.nl
>
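
For readers following the /cgi-bin/ point quoted above, here is a minimal
sketch of how a robot might check such a robots.txt line, using Python's
standard urllib.robotparser. The host example.com, the agent name
"ExampleBot", and the /scripts/ path are made up for illustration; none of
them come from the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt carrying the line discussed in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed(url, agent="ExampleBot"):
    """Return True if the sample robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/index.html",
        "http://example.com/cgi-bin/search",
        "http://example.com/scripts/report",  # a CGI script outside /cgi-bin/
    ):
        print(url, allowed(url))
```

The last URL shows the objection raised in the quoted reply: a rule keyed on
the /cgi-bin/ prefix says nothing about scripts served from other paths.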

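As a rough sketch of how a robot could honour the 503 response mentioned
above: the function below decides how long to stay away from a server. It
assumes the server may also send a Retry-After header (either a number of
seconds or an HTTP date); the 60-second fallback and the function name are
assumptions of this sketch, not something taken from the thread or the spec.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

DEFAULT_DELAY = 60.0  # assumed fallback when the server names no delay


def retry_delay(status, headers, now=None):
    """Return the number of seconds to wait before asking this server again.

    `headers` is a plain dict keyed by the exact name "Retry-After".
    Any status other than 503 yields 0.0 (no wait requested).
    """
    if status != 503:
        return 0.0
    value = headers.get("Retry-After")
    if value is None:
        return DEFAULT_DELAY
    value = value.strip()
    if value.isdigit():
        # Delay given directly as a number of seconds.
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable value: fall back to the assumed default.
        return DEFAULT_DELAY
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date already in the past means no further waiting is needed.
    return max(0.0, (when - now).total_seconds())
```

A robot would call this after every response and sleep for the returned
time before sending the next request to the same host.
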
--