Always use it.
>Sysadmins probably would like their sites to be as fast as possible, but
>the data on those sites will not be in Ilse, so people will blame Ilse for
>being incomplete...
Well, why don't you provide a page (automatically generated, if you wish) of
'stubborn sites' that use this construct to keep all robots out? Providing
more statistics on the usage of robots.txt would be interesting to the
initiated, but this simple measure should already head off most of the complaints.
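
For reference, the 'closed' construct in question is presumably the blanket
exclusion in robots.txt:

    User-agent: *
    Disallow: /

Any robot that honours the exclusion standard will then stay away from the
entire site.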
If site maintainers refuse access to robots, they don't realise that
(a) indexing robots cause no more than a very small percentage of traffic;
(b) search engines are probably by far the most popular method of finding
    pages; I wouldn't be surprised if they turn out to multiply the number
    of hits on the average site by a factor of five or so;
(c) the maintainer is not fulfilling his moral duty to the Internet community
    at large, assuming that users *from* the site use WWW search
    engines frequently;
(d) the maintainer is not fulfilling his moral duty to his own information
    providers, who probably intend their own pages to be found and visited.
All this, of course, assuming that the information on the site is open for
public access.
Actually, (c) suggests a more devious strategy: bar all users whose host
machine is in the same domain as the stubborn server from using your search
engine. Obviously, many innocent users will be affected, and you may or may
not be prepared to cope with the resulting email :)
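
A minimal sketch of such a filter in Python, assuming the search engine's
front end can see the client's resolved hostname (the domain listed here is
hypothetical):

    # Domains whose servers lock out all robots (hypothetical example entry).
    STUBBORN_DOMAINS = {"stubborn.example.com"}

    def allow_query(client_host):
        # Refuse queries from any host in the same domain as a stubborn server.
        host = client_host.lower()
        return not any(host == d or host.endswith("." + d)
                       for d in STUBBORN_DOMAINS)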
>I can't go contacting all the webmasters/roots at these machines...
Your script can; it's easy to send an automatic message to webmaster@domain
upon finding a 'closed' robots.txt at http://www.domain/robots.txt.
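
A minimal sketch of such a script in Python (the domain list, sender address,
mail host, and message wording are only illustrations, not a recommendation):

    import smtplib
    from email.message import EmailMessage
    from urllib.robotparser import RobotFileParser

    def robots_closed(domain):
        # A robots.txt is 'closed' if it disallows "/" for every robot ("*").
        rp = RobotFileParser("http://www." + domain + "/robots.txt")
        rp.read()
        return not rp.can_fetch("*", "http://www." + domain + "/")

    def notify_webmaster(domain, mailhost="localhost"):
        msg = EmailMessage()
        msg["From"] = "robot-admin@example.com"        # hypothetical sender
        msg["To"] = "webmaster@" + domain
        msg["Subject"] = "robots.txt on www." + domain + " excludes all robots"
        msg.set_content("Your robots.txt currently keeps all indexing robots "
                        "out, so your pages will not appear in search engines.")
        with smtplib.SMTP(mailhost) as smtp:
            smtp.send_message(msg)

    for domain in ["stubborn.example.com"]:            # hypothetical domain list
        if robots_closed(domain):
            notify_webmaster(domain)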
>PS - I'm new to this mailinglist, so if I've asked a faq, don't blame me :-)
I've seen discussion on this issue, but the exact questions were different.
-- Reinier Post reinpost@win.tue.nl