Looking at the robot list, I see these user agents:
    BlackWidow, BackRub/*.*, root/0.1, Deweb/1.01, Hamahakki/0.2
None of these names shares a telltale keyword, so I don't think
searching for keywords in the USER_AGENT would work. Nor do I want to
try to keep the robot list up to date on a deployed system.
Given a choice among heuristics, I prefer simpler ones. So here is my
plan at the moment: if I find a browser with any measurable popularity
that sends HTTP_FROM, use this test:

    if it takes image/*, it's a browser.
    elsif it has an HTTP_FROM, it's a robot.
    else it's a browser.

Otherwise, simply go by the HTTP_FROM field: if it is present, it's a
robot; if not, it's a browser.
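
For concreteness, the plan above could be sketched roughly as follows. This is only an illustration, not a tested implementation: the names classify, accepts_images, and browsers_send_from are made up here, the request is assumed to arrive as a CGI-style dictionary of variables, and "takes image/*" is assumed to mean that any image/ type appears in HTTP_ACCEPT.

```python
def accepts_images(accept_header):
    """Return True if the Accept header lists image/* or any image/ subtype.

    Assumption: "takes image/*" in the plan means any image media type.
    """
    if not accept_header:
        return False
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.5" before comparing.
        media_type = item.split(";")[0].strip().lower()
        if media_type.startswith("image/"):
            return True
    return False


def classify(environ, browsers_send_from=False):
    """Guess "robot" or "browser" from CGI-style request variables.

    browsers_send_from is a hypothetical switch: set it to True once a
    reasonably popular browser is known to send HTTP_FROM.
    """
    http_from = (environ.get("HTTP_FROM") or "").strip()

    if not browsers_send_from:
        # Simple case: go by the HTTP_FROM field alone.
        return "robot" if http_from else "browser"

    # Fallback test for when some browsers also send HTTP_FROM.
    if accepts_images(environ.get("HTTP_ACCEPT")):
        return "browser"
    if http_from:
        return "robot"
    return "browser"
```

For example, classify({"HTTP_FROM": "someone@example.org"}) would report a robot, while the same request with browsers_send_from=True and an HTTP_ACCEPT that includes image/gif would be treated as a browser.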
How about this: if the USER_AGENT matches one of a list of known
browsers, then it is a person; else, it is a robot. I know that there
are a number of browsers out there, but I think it would be fairly easy
to check the server logs to pull out the vast majority of them and to
adjust the list as necessary.
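
A rough sketch of what I mean, with some loud assumptions: the browser names below are placeholders and not a vetted list, matching is done by prefix on the USER_AGENT string, and the log helper assumes the user-agent strings have already been pulled out of the server logs by some other means. The function names are made up for this example.

```python
from collections import Counter

# Placeholder list; the real one would be built from the server logs.
KNOWN_BROWSERS = ("Mozilla", "NCSA Mosaic", "Lynx")


def is_person(user_agent, known=KNOWN_BROWSERS):
    """Return True if USER_AGENT starts with a known browser name."""
    agent = (user_agent or "").strip().lower()
    return any(agent.startswith(name.lower()) for name in known)


def common_agents(user_agents, top=20):
    """Tally product names (the part before the first "/") seen in the
    logs, most frequent first, as candidates for the browser list."""
    products = Counter(
        ua.split("/", 1)[0].strip() for ua in user_agents if ua.strip()
    )
    return products.most_common(top)
```

Anything not on the list, including every agent in the robot list quoted above, would come out as a robot. The cost is that a new or obscure browser is misclassified until the list is adjusted.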
gregf.