Exactly. My program Wget is also `told' by the user where it should
go and what it should do, but it still honors robots.txt.
On a site where I test the program, we have a Perl CGI script that
converts Info pages to HTML on the fly (via info2html). The
conversion is *very* time-consuming, but that is normally not much of
a problem, since few people know about it. However, when my program
found the link to /cgi-bin/info2html, the machine was brought to its
knees by the load.
The solution was, of course, to add "Disallow: /cgi-bin/info2html",
and that was it. Another case is temporary files that change every
few seconds or so.
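
Both cases come down to a couple of lines in robots.txt. A sketch of
what the relevant part of the file might look like (the info2html
line is the real one from the site above; the /tmp/ path is only an
invented example for the temporary files):

    # Keep automated clients away from expensive CGI output
    # and short-lived temporary files.
    User-agent: *
    Disallow: /cgi-bin/info2html
    Disallow: /tmp/
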
Any utility that doesn't respect robots.txt will cause exactly this
kind of trouble in all of these cases, because a user-directed tool
follows such links just as mechanically as a fully autonomous crawler
does. That is why "robot" should always be defined as "non-human",
not as "completely automated".
The people on this list should take care not to put too many things
into robots.txt, because overly broad exclusions will make the
authors of such utilities skeptical about whether they should
implement the support at all ("_my_ utility is _not_ a robot").
-- 
Hrvoje Niksic <hniksic@srce.hr> | Hocemo 101-icu!
--------------------------------+--------------------------------
* Q: What is an experienced Emacs user?
* A: A person who wishes that the terminal had pedals.