Re: Broadness of Robots.txt (Re: Washington again !!!)

Hrvoje Niksic (hniksic@srce.hr)
21 Nov 1996 05:02:20 +0100


Brian Clark (bclark@radzone.org) wrote:
> I'd still maintain that even a "Page Watcher" is a robot (anything that
> doesn't think about what they are doing and does it either repeatedly or
> broadly.) I don't think "a user told it to do it" fairly qualifies the
> extent of this protocol's usage.

Exactly. My program Wget is also `told' by the user where it should
go and what it should do, but it still honors robots.txt.
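
To illustrate (the host name below is just a placeholder): a recursive
retrieval along the lines of

    wget -r http://www.example.com/

will consult the site's /robots.txt and skip any link the file
disallows, even though the user explicitly pointed Wget at the site.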

On a site where I test the program, we have a CGI Perl script that
converts Info pages to HTML on the fly (via info2html). The
conversion is *very* time-consuming, but normally that's not much of a
problem, since few people know about it. However, when my program
found the link to /cgi-bin/info2html, the machine ground to a halt
under the load.

The solution was, of course, to add "Disallow: /cgi-bin/info2html" to
robots.txt, and that was it. Another case is various temporary files
that change every few seconds or so.
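
For the record, the relevant robots.txt entries would look something
like this (the /tmp/ line is only an illustration of the
temporary-files case, not the actual path on that site):

    User-agent: *
    Disallow: /cgi-bin/info2html
    Disallow: /tmp/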

Any utility that doesn't respect robots.txt will fail utterly in all
of these cases, which is why "robot" should always be defined as
"non-human", not as "completely automated".

The people on this list should take care not to put too many things
into robots.txt, because it will make the authors of such utilities
skeptical about whether they should implement support for it at all
("_my_ utility is _not_ a robot").

-- 
Hrvoje Niksic <hniksic@srce.hr> | Hocemo 101-icu! ("We want 101!")
--------------------------------+--------------------------------
* Q: What is an experienced Emacs user?
* A: A person who wishes that the terminal had pedals.
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html