agents ignoring robots.txt

Rob Hartill (robh@imdb.com)
Wed, 16 Oct 1996 17:49:13 +0100 (BST)


I've started logging user agents that attempt to access areas of my server
that robots.txt is supposed to keep them away from.

Autonomy/1.0 showed up many times coming from somewhere in the
stjohns domain (Autonomy home).

WebCrawler/2.0 and Scooter/1.0 made single appearances whereas
MetaCrawler/1.2b made hundreds from various sites, but mostly
from the cs.washington.edu domain.

Despite assurances from Microsoft that their crappy proxy server beta
(MS-Catapult/0.9) should expire, I still see one from "proxy.prebon.co.uk"
doing its best to fill my logfiles with junk.

Apart from that, every other robot/crawler/.. has behaved.
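
For anyone wanting to run the same kind of check, here is a rough sketch
in Python. It is not the setup used for the numbers above. It assumes the
server writes an access log in the combined log format (with the user agent
as the last quoted field), and the function name offending_agents and the
list of disallowed path prefixes are made up for illustration; in practice
the prefixes would be the Disallow entries from your own robots.txt.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def offending_agents(log_lines, disallowed_prefixes):
    """Count requests per (user agent, host) that hit a disallowed path.

    disallowed_prefixes is a hypothetical list of path prefixes, e.g. the
    Disallow entries copied from robots.txt.
    """
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        path = match.group("path")
        if any(path.startswith(prefix) for prefix in disallowed_prefixes):
            counts[(match.group("agent"), match.group("host"))] += 1
    return counts
```

Calling offending_agents(open("access_log"), ["/private/"]) and printing
counts.most_common() gives a list of agents and hosts sorted by number of
offending requests. The file name and the prefix here are placeholders.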