Re: Defenses against bad robots

Martijn Koster (m.koster@webcrawler.com)
Fri, 17 May 1996 17:47:47 -0700


At 1:04 PM 5/17/96, kathy@accessone.com wrote:

>I welcome your frank criticisms and flames about these
>techniques.

>(A) Alert webmaster about excessive requests by a suspected
>robot.

That makes sense.
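
A simple way to spot that is to count requests per host in the access
log. Something along these lines would do; the log path, the format
(Common Log Format) and the threshold are just assumptions for the
sketch:

#!/usr/bin/env python
# Count requests per remote host in a Common Log Format access log and
# report hosts above a threshold; path and threshold are assumptions.
from collections import Counter
import sys

LOGFILE = "/var/log/httpd/access_log"   # assumed location
THRESHOLD = 500                         # requests before we get suspicious

counts = Counter()
with open(LOGFILE) as log:
    for line in log:
        host = line.split(" ", 1)[0]    # first CLF field is the remote host
        counts[host] += 1

for host, n in counts.most_common():
    if n < THRESHOLD:
        break
    sys.stderr.write("suspected robot: %d requests from %s\n" % (n, host))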

>(B) Alert webmaster about excessive error status codes being
>generated in response to imaginary requests by a robot.

Or a user; it often happens that someone puts a wrong link out there.
It'd be good if a webmaster tried to do something about those...
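
The same log tells you which bad URLs are being hit most, so you know
which links to chase. A rough sketch, again assuming Common Log Format
and an invented log path:

#!/usr/bin/env python
# Tally 404 responses by requested URL so the broken links behind them
# can be found and fixed. Assumes Common Log Format.
from collections import Counter

LOGFILE = "/var/log/httpd/access_log"   # assumed location

notfound = Counter()
with open(LOGFILE) as log:
    for line in log:
        fields = line.split()
        # CLF: host ident user [date] "request" status bytes
        if len(fields) < 2 or fields[-2] != "404":
            continue
        try:
            url = line.split('"')[1].split()[1]   # path out of "GET /x HTTP/1.0"
        except IndexError:
            continue
        notfound[url] += 1

for url, n in notfound.most_common(20):
    print("%6d  404s for %s" % (n, url))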

>(C) Alert webmaster about a url being requested that no human
>would request. A link not easily visible to a human being is
>included on a page.

That depends on your browser, of course.

>This link leads to a warning that the
>requester has stumbled upon a robot defense, and should
>immediately exit, and should not delve further, or risk having
>the requestor's host being banned from the website. In the
>event that the robot does delve further, a message is printed
>to a file, etc.

Well, that identifies a robot, not necessarily a bad one.
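
For what it's worth, the page behind such a hidden link (say an anchor
with no visible text pointing at a trap script) can be a small CGI that
prints the warning and logs who hit it. Just a sketch; the file names
are invented:

#!/usr/bin/env python
# CGI behind the hidden link: print the warning page and append the
# visiting host and user-agent to a log file. Names are invented.
import os
import time

TRAPLOG = "/var/log/httpd/robot_trap.log"

host = os.environ.get("REMOTE_HOST") or os.environ.get("REMOTE_ADDR", "unknown")
agent = os.environ.get("HTTP_USER_AGENT", "-")

with open(TRAPLOG, "a") as log:
    log.write("%s %s %s\n" % (time.asctime(), host, agent))

print("Content-Type: text/html")
print("")
print("<html><body>")
print("<h1>Robot trap</h1>")
print("<p>This link is not meant for people.  If you are running a robot,")
print("please stop now; further requests from %s may be refused.</p>" % host)
print("</body></html>")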

>(D) Capture the robot in an infinite loop trap that does not
>use too much resources. In the event that the infinite loop
>trap is tripped, a message is printed to a file, etc. Please
>reply with examples of simple infinite loops.

A recursive CGI script with sleeps in it?
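
Something along these lines, if you want it spelled out: the script
links back to itself with a counter and sleeps on every hit, so a robot
that follows it is stalled at very little cost to the server. The names
and the delay are arbitrary:

#!/usr/bin/env python
# Loop trap: every page links back to this same script with a higher
# counter, and each request sleeps first, so a robot that follows it
# is stalled at very little cost to the server.
import os
import time

DELAY = 10     # seconds to stall each request

query = os.environ.get("QUERY_STRING", "")
try:
    depth = int(query.split("depth=")[1])
except (IndexError, ValueError):
    depth = 0

time.sleep(DELAY)

me = os.environ.get("SCRIPT_NAME", "/cgi-bin/loop.cgi")
print("Content-Type: text/html")
print("")
print("<html><body>")
print("<p>You have followed this loop %d times.</p>" % depth)
print('<a href="%s?depth=%d">keep going</a>' % (me, depth + 1))
print("</body></html>")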

>(E) Trap the robot into retrieving "a gigabyte-size HTML
>document generated on-the-fly" (1). Please reply with
>examples of this technique.

So how does this help you? It just ties up resources at either end.
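
For completeness, "generated on-the-fly" just means a CGI that keeps
writing filler for as long as the client keeps reading, which is
exactly why it costs you as much as it costs them. A sketch:

#!/usr/bin/env python
# Endless on-the-fly document: write filler until the client gives up
# or an (arbitrary) size cap is reached.  Note it occupies one of your
# own server processes for the whole time.
import sys

CAP = 1024 * 1024 * 1024          # stop around a gigabyte
filler = "<p>" + "blah " * 200 + "</p>\n"

sys.stdout.write("Content-Type: text/html\r\n\r\n")
sys.stdout.write("<html><body>\n")
written = 0
while written < CAP:
    sys.stdout.write(filler)
    written += len(filler)
sys.stdout.write("</body></html>\n")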

>Of course robots_deny.txt will not keep out deliberately
>misconfigured robots

/robots.txt you mean...
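
It's a single file at the top of the server, along these lines (the
paths here are only examples):

# /robots.txt, served from the document root
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/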

-- Martijn

Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html