>I welcome your frank criticisms and flames about these
>techniques.
>(A) Alert webmaster about excessive requests by a suspected
>robot.
That makes sense.
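If you want something to watch for that automatically, a rough sketch could look like the following. It assumes the common log format, and the log location and threshold are just guesses -- adjust for your own server.

    #!/usr/bin/env python
    # Sketch: count requests per remote host in a common-format access
    # log and report anything over an arbitrary threshold.
    from collections import Counter

    LOG_FILE = "/var/log/httpd/access_log"   # assumed location
    THRESHOLD = 500                          # requests; pick your own limit

    counts = Counter()
    with open(LOG_FILE) as log:
        for line in log:
            fields = line.split()
            if not fields:
                continue
            counts[fields[0]] += 1           # first field is the remote host

    for host, n in counts.most_common():
        if n < THRESHOLD:
            break
        print("suspected robot: %s made %d requests" % (host, n))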
>(B) Alert webmaster about excessive error status codes being
>generated in response to imaginary requests by a robot.
Or a user; it often happens that someone puts a wrong link out there.
It'd be good if a webmaster tried to do something about those...
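The same access log can tell you which URLs are drawing the errors. Another sketch along the same lines, again assuming the common log format and a made-up log path:

    #!/usr/bin/env python
    # Sketch: tally 404 responses per requested URL so the webmaster
    # can spot broken links being followed repeatedly.
    from collections import Counter

    LOG_FILE = "/var/log/httpd/access_log"   # assumed location

    missing = Counter()
    with open(LOG_FILE) as log:
        for line in log:
            fields = line.split()
            # common log format: host ident user [date tz] "METHOD url proto" status bytes
            if len(fields) < 10 or fields[8] != "404":
                continue
            missing[fields[6]] += 1

    for url, n in missing.most_common(20):
        print("%5d  %s" % (n, url))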
>(C) Alert webmaster about a url being requested that no human
>would request. A link not easily visible to a human being is
>included on a page.
This depends on your browser of course.
>This link leads to a warning that the
>requester has stumbled upon a robot defense, and should
>immediately exit, and should not delve further, or risk having
>the requestor's host being banned from the website. In the
>event that the robot does delve further, a message is printed
>to a file, etc.
Well, that identifies a robot; not necessarily a bad one.
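The "message is printed to a file" part is easy enough, though. A sketch of the trap URL's CGI script (the log path is made up):

    #!/usr/bin/env python
    # Sketch: log whoever fetched the hidden trap URL, then serve the
    # warning page described above.
    import os, time

    TRAP_LOG = "/var/log/httpd/robot_trap_log"   # assumed location

    host = os.environ.get("REMOTE_HOST") or os.environ.get("REMOTE_ADDR", "unknown")
    with open(TRAP_LOG, "a") as log:
        log.write("%s %s\n" % (time.asctime(), host))

    print("Content-Type: text/html")
    print("")
    print("<html><head><title>Robot defence</title></head><body>")
    print("<h1>You have reached a robot defence</h1>")
    print("<p>Please leave now; going further may get %s banned.</p>" % host)
    print("</body></html>")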
>(D) Capture the robot in an infinite loop trap that does not
>use too much resources. In the event that the infinite loop
>trap is tripped, a message is printed to a file, etc. Please
>reply with examples of simple infinite loops.
A recursive CGI script with sleeps in it? For example (a sketch only; the script name, delay and page text are all made up):
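    #!/usr/bin/env python
    # Sketch: every page links back to this same script one level
    # deeper, with a delay so the trap stays cheap on our side.
    import os, time

    time.sleep(10)   # make the robot wait; costs us almost nothing

    path = os.environ.get("PATH_INFO", "")
    depth = len([p for p in path.split("/") if p])
    next_url = os.environ.get("SCRIPT_NAME", "/cgi-bin/loop") + path + "/x"

    print("Content-Type: text/html")
    print("")
    print("<html><body>")
    print("<p>Robot trap, level %d.</p>" % depth)
    print('<a href="%s">deeper</a>' % next_url)
    print("</body></html>")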
>(E) Trap the robot into retrieving "a gigabyte-size HTML
>document generated on-the-fly" (1). Please reply with
>examples of this technique.
So how does this help you? It just ties up resources at either end.
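For what it's worth, the on-the-fly part is trivial: stream filler text rather than building a gigabyte in memory, and trickle it out so your own machine barely notices. A sketch (the filler text and delay are arbitrary):

    #!/usr/bin/env python
    # Sketch: an endless HTML document generated on the fly.
    import sys, time

    sys.stdout.write("Content-Type: text/html\n\n")
    sys.stdout.write("<html><body>\n")
    while True:
        sys.stdout.write("<p>%s</p>\n" % ("blah " * 200))
        sys.stdout.flush()
        time.sleep(1)   # trickle it out; ties up their fetcher, not our CPU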
>Of course robots_deny.txt will not keep out deliberately
>misconfigured robots
/robots.txt you mean...
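And if you do set up trap URLs, list their area in /robots.txt so the well-behaved robots never go near them; something like this (the path is just an example):

    User-agent: *
    Disallow: /cgi-bin/loop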
-- Martijn
Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html