RE: Is a robot visiting?

Greg Fenton (gregf@netcom.ca)
Sat, 2 Nov 1996 19:10:22 -0500


----------
From: Tim Freeman[SMTP:tim@infoscreen.com]
Sent: November 1, 1996 2:52 PM
To: e8726057@student.tuwien.ac.at
Cc: robots@webcrawler.com
Subject: Re: Is a robot visiting?

Looking at the robot list, I see these user agents:

BlackWidow,BackRub/*.*,root/0.1,Deweb/1.01,Hamahakki/0.2

Therefore I don't think searching for keywords in the USER_AGENT would
work. I don't want to try to update the robot list on a deployed
system.

Given a choice among heuristics, I prefer simpler ones. So here is my
plan at the moment: if I find a browser of measurable popularity that
sends HTTP_FROM, I will use this test:

If it takes image/*, it's a browser.
elsif it has an HTTP_FROM, it's a robot.
else it is a browser.

Otherwise, I will simply go by the HTTP_FROM field.
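
For concreteness, the three-line test above could be sketched in Python
roughly as follows. This is only an illustration: the function names and
the CGI-style dictionary of request variables are assumptions, not part
of the original message. It also assumes that a bare */* in HTTP_ACCEPT
does not count as "takes image/*".

```python
def accepts_images(accept_header):
    """True if any media type listed in the Accept header is image/<something>."""
    for part in accept_header.split(","):
        # Drop parameters such as ";q=0.8" and normalise case.
        media_type = part.split(";")[0].strip().lower()
        if media_type.startswith("image/"):
            return True
    return False


def classify(env):
    """Classify a request as 'browser' or 'robot'.

    env is a hypothetical dict of CGI-style variables
    (HTTP_ACCEPT, HTTP_FROM), as a CGI script would see them.
    """
    if accepts_images(env.get("HTTP_ACCEPT", "")):
        return "browser"
    if env.get("HTTP_FROM"):
        return "robot"
    return "browser"
```

The fallback case ("simply go by the HTTP_FROM field") amounts to
skipping the first check.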

How about this: if the USER_AGENT matches one of a list of known
browsers, it is a person; otherwise, it is a robot. I know that there
are a number of browsers out there, but I think it would be fairly easy
to check the server logs, pull out the vast majority of them, and
adjust the list as necessary.

gregf.