Re: Defenses against bad robots

Benjamin Franz (snowhare@netimages.com)
Fri, 17 May 1996 17:21:03 -0700 (PDT)


On Fri, 17 May 1996, Mordechai T. Abzug wrote:

> "K" == kathy@accessone.com spake thusly:
>
> K> (D) Capture the robot in an infinite loop trap that does not
> K> use too many resources. In the event that the infinite loop
> K> trap is tripped, a message is printed to a file, etc. Please
> K> reply with examples of simple infinite loops.
>
> Easily done with a standard server. Set up a CGI script that's pointed to
> by a URL that doesn't indicate it's a CGI. Now, note that many servers let
> you add on more path info to a CGI URL and pass it to the CGI script in the
> env. For instance, http://www.mydomain/script/111 and
> http://www.mydomain/script/112 can activate the same script with a
> different argument (which happens to be a number). Set up the CGI so it
> reads the rest of the URL after its own location as a sequence number, and
> returns a text/html response that links to the next number in the
> sequence. No human should have the patience to get to 100, particularly if the
> link is the only thing there.
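
A minimal sketch of that sequence trap, as a Python CGI script served
under a /script URL (the path and names here are illustrative, not a
specific server setup):

#!/usr/bin/env python3
# Sequence trap sketch: whatever number trails the script's own URL in
# PATH_INFO is echoed back as a link to the next number, indefinitely.
import os

PREFIX = "/script"   # hypothetical URL the script is served under

extra = os.environ.get("PATH_INFO", "/0").lstrip("/")
try:
    n = int(extra)
except ValueError:
    n = 0

print("Content-Type: text/html\r")
print("\r")
print('<html><body><a href="%s/%d">%d</a></body></html>' % (PREFIX, n + 1, n + 1))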

It would be better to use a directory of static web pages with static
links: a few hundred chained together, with the last one pointing to a
CGI script that notifies you of the trip. That way you don't pay any
CGI load penalty unless the trap is actually sprung. Have the CGI record
all the pertinent info and mail it to you. A short script could easily
generate a few hundred chained static pages in a matter of seconds. Add
that directory to your robots.txt file and the only visitors you should
see there are rogue bots.
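
A minimal sketch of such a generator (Python; the page count, the trap
directory name, and the /cgi-bin/tripwire notifier URL are placeholders
to adjust to taste):

#!/usr/bin/env python3
# Writes PAGES chained static pages: 0.html links to 1.html, 1.html to
# 2.html, and so on; the last page links to the notification CGI.
import os

PAGES = 300                      # "a few hundred"
TRAPDIR = "trap"                 # list this directory in robots.txt
NOTIFIER = "/cgi-bin/tripwire"   # hypothetical notification script

os.makedirs(TRAPDIR, exist_ok=True)
for i in range(PAGES):
    target = NOTIFIER if i == PAGES - 1 else "%d.html" % (i + 1)
    with open(os.path.join(TRAPDIR, "%d.html" % i), "w") as f:
        f.write('<html><body><a href="%s">next</a></body></html>\n' % target)

The notification script itself only needs to dump REMOTE_ADDR,
HTTP_USER_AGENT and the time of the request into a mail message to you;
any of the usual CGI mail recipes will do.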

--
Benjamin Franz