Re: interactive generation of URL's

Benjamin Franz (snowhare@netimages.com)
Mon, 15 Jul 1996 08:18:58 -0700 (PDT)


On Mon, 15 Jul 1996, Fred Melssen wrote:

> On one of my sites, people can access a database by way of a perl
> cgi/method=post form. I'm adding some search utilities to this script.
> The first search is still submitted by a POST method. The results will
> contain URL's (as if they were submitted by a GET method). This means
> that in theory, an unlimited number of URL's can be generated.
>
> I can imagine that when robots (altavista) are querying my page, this
> will result in the generation of a *huge* amount of 'new' URL's on the
> spot, and an indexing of these URL's. This is not my problem, of course ;)
>
> But what is the best method to avoid this? My CGI-scripts trying to
> identify the robot? Use of robots.txt?...

The last. But it doesn't really matter unless the generated page links to
more URL's via HREFs in the infinite URL space. It is reasonably safe to
let the search engines look at result pages that do not link out of
themselves. The robots do not submit POSTs, so they cannot generate new
pages unless the above condition is met. It isn't possible in general to
detect unknown robots unless they machine-gun your server.
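For the robots.txt route, an entry along these lines at the server's
document root would keep compliant robots away from the search script
(the /cgi-bin/search path here is only an illustrative guess at the
script's location):

```
# robots.txt -- served from the document root as /robots.txt
# Keep all robots that honor the exclusion standard out of the
# search CGI, so they never see the generated result URL's.
User-agent: *
Disallow: /cgi-bin/search
```

Note that robots.txt only restrains robots that honor the exclusion
standard; it does nothing against ill-behaved crawlers.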

-- 
Benjamin Franz