Re: alta vista and virtualvin.com

Benjamin Franz (snowhare@netimages.com)
Fri, 17 May 1996 17:07:46 -0700 (PDT)


On Fri, 17 May 1996, Larry Gilbert wrote:

> At 8:56 PM 5/5/96, chris cobb wrote:
> >- What comments do others have about indexing sites of this nature?
>
[...]
> if ($url =~ m|^http://www.psiloveyou.com/|) {
> $url =~ s|cookie=[0-9&]*||;
> $url =~ s|\?$||;
> }

I really wouldn't recommend that. If you ignore my robots.txt and use that
rule while indexing www.psiloveyou.com, you have pretty good odds of
falling down a black hole. If you absolutely insist on trying to index it,
use:

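if ($url =~ m|^http://www\.psiloveyou\.com/|) {
    $url =~ s|cookie=[0-9&]*||;             # session cookie parameter
    $url =~ s|/dcache\d+||;                 # numbered /dcache path segment
    $url =~ s|returnto=[A-Za-z0-9]*&?||;    # returnto parameter
    $url =~ s|\?$||;                        # trailing '?', stripped last
}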

instead. I can't guarantee you won't still find a way to fall down a
black hole, but your odds are considerably better. The site
*intentionally* generates a practically infinite URL space in normal
operation in several different ways. My robots.txt file is for *your*
benefit.
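
As a rough illustration of how a robot might use such rules, here is a minimal sketch that wraps them in a subroutine and remembers which canonical URLs it has already seen. The names canonical_url, should_visit and %seen are hypothetical, and running the trailing '?' cleanup last is an ordering choice made for this sketch. It narrows the URL space but, as noted above, cannot guarantee that the robot stays out of a black hole.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the session-specific parts of a URL so that
# different spellings of the same page compare equal.
sub canonical_url {
    my ($url) = @_;
    if ($url =~ m|^http://www\.psiloveyou\.com/|) {
        $url =~ s|cookie=[0-9&]*||;             # session cookie parameter
        $url =~ s|/dcache\d+||;                 # numbered /dcache path segment
        $url =~ s|returnto=[A-Za-z0-9]*&?||;    # returnto parameter
        $url =~ s|\?$||;                        # '?' left dangling at the end
    }
    return $url;
}

my %seen;    # canonical URLs that have already been offered

# Returns true only the first time a given canonical URL is offered.
sub should_visit {
    my ($url) = @_;
    return !$seen{ canonical_url($url) }++;
}
```

A robot would call should_visit($url) before queueing each link and skip the link when it returns false.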

Attempting to index a CGI-generated site when the robots.txt file has
already warned against it is a dangerous game. If a robots.txt warns
against it, its authors probably know better than you why it is a really
bad idea.
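
For robot authors who want to honor that warning, the check itself is small. The following is a simplified sketch, not a complete implementation of the robots exclusion convention: it only reads the record for "User-agent: *" and treats each Disallow value as a plain path prefix. The subroutine names disallowed_prefixes and path_allowed are hypothetical, and fetching the robots.txt text is left to the caller.

```perl
use strict;
use warnings;

# Collect the Disallow path prefixes from the record that applies to
# every robot ("User-agent: *") in the text of a robots.txt file.
sub disallowed_prefixes {
    my ($robots_txt) = @_;
    my @prefixes;
    my $applies   = 0;    # does the current record cover all robots?
    my $in_agents = 0;    # was the previous line a User-agent line?
    for my $line (split /\r?\n/, $robots_txt) {
        $line =~ s/\s*#.*$//;    # strip comments
        if ($line =~ /^User-agent:\s*(\S+)/i) {
            $applies = 0 unless $in_agents;    # a new record starts here
            $applies = 1 if $1 eq '*';
            $in_agents = 1;
            next;
        }
        $in_agents = 0;
        push @prefixes, $1 if $applies && $line =~ /^Disallow:\s*(\S+)/i;
    }
    return @prefixes;
}

# True if $path does not start with any of the disallowed prefixes.
sub path_allowed {
    my ($path, @prefixes) = @_;
    for my $prefix (@prefixes) {
        return 0 if index($path, $prefix) == 0;
    }
    return 1;
}
```

A robot would build the prefix list once per host and call path_allowed on the path of every URL before requesting it.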

--
Benjamin Franz