Re: Is a robot visiting?

Hallvard B Furuseth (h.b.furuseth@usit.uio.no)
Mon, 4 Nov 1996 20:53:46 +0100 (MET)


Aaron Nabil writes...
>Tim Freeman writes...
>>
>> The purpose is what I said: my software normally gives different URL's
>> to different folks at the main entry to the site, so I don't want a
>> search engine causing a bunch of different folks to have the same URL.

I've seen several sensible uses of this: as user or session IDs, to
force users to link to a service's entry page instead of deeper
into the service, or simply as another way to write the "input" part of
"/cgi-bin/foo?input".
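A minimal sketch of the session-ID case, in Python. It assumes a
hypothetical entry path "/service/" and a caller that has already decided
whether the client is a robot. Humans get a fresh per-visitor URL, while
robots get one stable URL, so a search engine never hands the same session
URL to a bunch of different folks.

```python
import secrets

# Hypothetical entry path, chosen for illustration only.
STABLE_ENTRY = "/service/"


def entry_url(is_robot: bool) -> str:
    """Return the URL to hand out at the site's main entry."""
    if is_robot:
        # One stable URL that is safe for a search engine to index.
        return STABLE_ENTRY
    # A fresh random session ID per human visitor (16 hex characters).
    return f"{STABLE_ENTRY}{secrets.token_hex(8)}/"
```

How "is_robot" gets decided is left open here; that is exactly the hard
part this thread is about.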

I may write a service which returns different pages to users and robots,
because robots.txt is too limited (it lacks Allow: and regexps). Some of
the links won't be given to robots, only to humans. Possibly a few
"robot bait" links will be invisible to humans and will probably just
lead to an error response; they would be used to identify new robots
automatically. Should this service do something to avoid offending robots?
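The robot-bait idea could be sketched like this, again in Python and under
loud assumptions: the "/robot-bait/" prefix, the in-memory set, and
identifying clients by their User-Agent string are all hypothetical choices,
not part of robots.txt or any other standard. A client that fetches a bait
link (which humans never see) gets an error and is remembered as a robot;
later requests from it get the robot variant of the page, without the
human-only links.

```python
from __future__ import annotations

# Hypothetical path that is linked only invisibly, so humans never follow it.
BAIT_PREFIX = "/robot-bait/"

# User-Agent strings of clients that have fetched a bait link.
known_robots = set()


def handle_request(path: str, user_agent: str) -> tuple[int, str]:
    """Return (HTTP status, page variant) for one request."""
    if path.startswith(BAIT_PREFIX):
        # Humans can't see this link, so whoever follows it is probably a robot.
        known_robots.add(user_agent)
        return 404, "error"
    if user_agent in known_robots:
        # Robot variant: the same page minus the human-only links.
        return 200, "robot"
    return 200, "human"
```

A User-Agent string is easy to fake or share, so a real service would
probably want something sturdier; this only shows the shape of the idea.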

> You need to say things like "if we detect what we think is score inflation,
> we may delete your page or your entire site from our index. We decide
> what 'score inflation' is,

Quite so. Agree on it, decide how you want a non-inflating WWW service
to behave, and publish it -- preferably in the HTML standard, which is
where we can see how to write proper WWW pages. (Well, I admit that'd
be kind of hard until you have enough experience with robot exclusion:-)

> and our criteria may change anytime.

That makes it hard for WWW authors to avoid unintentional score inflation.

> If you try it, you might sneak by, but if we catch you..."

...then I hope you'll at least inform the service owner, in case he
didn't mean to be rude.

Regards,

Hallvard