Re: Is a robot visiting?

Daniel T. Martin (MARTIND@carleton.edu)
Fri, 01 Nov 1996 08:31:54 -0600 (CST)


> Are there any browsers out there that pass something for HTTP_FROM?
> Is there a better idea than assuming that the accessor is a web
> crawler if there is an HTTP_FROM field?

Actually, I'm a bit surprised that Netscape and Mosaic DON'T send From: lines.
For an example of a browser that does, look at lynx, which is generally the
best-behaved browser when it comes to the older revisions of the HTTP spec.

Since not all robots send a From: field, you would only catch the ones that
are well behaved. However, since you don't seem to mind affecting only the
larger search engines, why not make up a list of robot User-Agent lines and
simply match against that list? (Be sure not to include the version number in
the match.) If you wanted to get fancy, the list could even be updated
automatically based on who accesses the /robots.txt URL on your machine. I'm
not certain how to implement this with the smallest lag between the time your
CGI script starts and the time it decides whether the requestor is a robot,
but I can think of a couple of ways it might be done.
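
To make the User-Agent idea concrete, here is a rough sketch in Python of one
way the match might look. It is only an illustration under some assumptions:
the robot names in KNOWN_ROBOTS are made-up placeholders (not real crawlers),
the helper names are my own, and the script is assumed to see the header as
the CGI variable HTTP_USER_AGENT. The version number is dropped by keeping
only the product name before the first "/".

```python
import os

# Hypothetical robot product names (placeholders, not real crawlers),
# stored lowercase and without version numbers.
KNOWN_ROBOTS = {"examplebot", "samplespider"}


def product_name(user_agent):
    """Return the first product token of a User-Agent value, lowercased
    and with any "/version" suffix removed."""
    tokens = user_agent.strip().split()
    if not tokens:
        return ""
    return tokens[0].split("/", 1)[0].lower()


def is_robot(environ=os.environ, known=KNOWN_ROBOTS):
    """True if the requestor's User-Agent product name is in the list."""
    return product_name(environ.get("HTTP_USER_AGENT", "")) in known


def note_robots_txt_fetch(user_agent, known=KNOWN_ROBOTS):
    """Add whoever fetched /robots.txt to the list (the "fancy" variant).
    Saving the list between requests is left out of this sketch."""
    name = product_name(user_agent)
    if name:
        known.add(name)
```

A set lookup like this is cheap once the list is in memory, so most of the
lag would probably come from loading the list when the script starts. One
caveat on the /robots.txt variant: an ordinary browser that happens to fetch
that URL would get added too, so the learned names likely need some review
before they are trusted.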

DANIEL MARTIN