Re: Washington again !!!

Rob Hartill (robh@imdb.com)
Wed, 20 Nov 1996 02:42:19 +0000 (GMT)


Erik Selberg wrote:

>You're right Rob, I do feel differently about your apology after
>having read this. But back to the matter at hand:

More of the story:

Last week a MetaCrawler robot (or robots) malfunctioned while visiting
xxx.lanl.gov. Everyone knows that's not the best place to send a well-behaved
robot/agent (or whatever you want to call it). A broken one is just asking
for trouble.

xxx.lanl.gov had been the victim of an earlier incident with MetaCrawler
and had contacted the folks at cs.washington.edu. After the most recent
incident, someone at xxx.lanl.gov posted a robot alert to the alert mailing
list (robot-alert@zyzzyva.com). I followed up to say that this wasn't the
first problem with MetaCrawler that had been reported.

After I posted that, Erik contacted me and for a while we talked about
the problems. Erik defended MetaCrawler (vigorously) and appeared to be
talking on behalf of the project. I commented that if MetaCrawler became a
nuisance at my site there would be trouble. Within days I noticed MetaCrawler
rapid-firing requests at my CGI (yes, the URLs even had a "?" in them, the
classic marker of a CGI query that robots are supposed to leave alone).
Thousands of requests from at least 4 MetaCrawler locations in
cs.washington.edu, arriving as fast as the network allowed. Each time the
server responded with a "403 Forbidden" and shipped a message saying exactly
why the request was refused.
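(Aside for the curious: "well behaved" isn't a mystery. Here's a rough
sketch, in Python, of the checks a polite robot can make before each fetch.
The bot name "ExampleBot/0.1" and the 10 second delay are invented for the
example, not anything MetaCrawler actually uses:)

    import time
    from urllib import robotparser
    from urllib.error import HTTPError
    from urllib.parse import urlsplit
    from urllib.request import Request, urlopen

    def polite_fetch(url, user_agent="ExampleBot/0.1", delay=10.0):
        """Fetch url only if courtesy and robots.txt allow it."""
        parts = urlsplit(url)

        # A "?" marks a CGI query; the old robot guidelines say to
        # leave those URLs alone.
        if parts.query:
            return None

        # Honour the robots exclusion file before touching anything.
        rp = robotparser.RobotFileParser()
        rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        rp.read()
        if not rp.can_fetch(user_agent, url):
            return None

        # Never hit a host as fast as the network allows.
        time.sleep(delay)
        try:
            req = Request(url, headers={"User-Agent": user_agent})
            with urlopen(req) as resp:
                return resp.read()
        except HTTPError as err:
            if err.code == 403:
                return None  # the server said no; don't retry
            raise

Skip the "?" URLs, honour robots.txt, pause between requests, and take a
403 as a hint to go away. That's all anyone is asking.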

After mailing Erik I posted my warning/rant to the alert list and
robots@webcrawler.com. Soon after that, Erik replied to say it had nothing
to do with him. I apologised to Erik, and I will do so again now, on the
understanding that this has nothing to do with him. I still think
Erik's defence of MetaCrawler's methods, even after they have been shown
to fail big time, defies belief.

Now will the real culprit who knowingly let broken software run from
cs.washington.edu please step forward. You must have a really good
explanation for this one.

As is typical (judging from responses I've seen from other broken-robot
operators), Erik says that these incidents are rare and that he's not aware
of any other serious problems. These problems are not rare; if they were,
the sites which actively track these 'offenders' down (xxx.lanl.gov and
imdb.com being the better known cases) wouldn't keep seeing problems on a
regular basis. And the problems are not confined to those two sites; they're
not special in any way other than that the people who run them are watching
for problems.

In recent months I've mellowed somewhat w.r.t. robots. I've read the robots
mailing list and I can see what you're trying to achieve. Then, bang, along
comes another dumb robot from a site that should know better (after earlier
incidents), and we get the same old story: "it's a one-off", "it's fixed",
"we don't need to read robots.txt on Wednesdays because...", "how did
you let your URLs become so easy to misuse?", "we were testing some new
code", "a network error must have confused it", "you're using a new protocol
we don't support yet", "you need passwords", "use POST for everything",
"don't use mailto: if you don't want a Russian mail order bride".

sigh,
rob
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html