Allow/deny robots from major search services

boogieoogie goobnie (kathy@accessone.com)
Mon, 17 Jun 96 21:51:20 PDT


The robots deny script (norobot.pl) is very useful. I would like to
configure the robots deny script to allow robots only from the major
search services into my site. (Every acne-covered teenager running
Linux on a PPP link does not need to hammer my site.)

Here is a list of the major search services I copied from Netscape today:
(My apologies if your 'bot is not on this list. I merely copied the
list from Netscape for illustrative purposes.)

INFOSEEK GUIDE
EXCITE
OPEN TEXT INDEX
POINT
HOTBOT
IBM INFOMARKET
LYCOS
YAHOO!
ALTA VISTA
C|NET'S SHAREWARE.COM
100HOT WEBSITES
MAGELLAN
THE ELECTRIC LIBRARY
ACCUFIND

Has anyone written a robot FAQ identifying the robots used by the
major search services? My thanks to M. Koster for the List of Active Robots.

Do any of the major search services use the same robots, and hence will
identify themselves using the same user-agent value?

Do the major search services set the "From:" field when operating their
robots?

How can I configure robots deny so that a Mozilla robot with an active
Netscape "From:" address is allowed into my site, while another Mozilla
robot with a "From:" address of teenage_slacker@netcom.com is denied?

Do these major search services always run their robots from the same "From:"
address? If they run their robots from several different "From:" addresses,
how can I configure robots deny to allow their robot from all of them?
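
To make the question concrete, here is a rough sketch of the check I have in
mind. It is written in Python for illustration only; it is not how norobot.pl
works, and I do not know what configuration norobot.pl actually accepts. The
table entries, the "ExampleBot" name, and the function names are all made up.
The idea is to allow a robot only when its User-Agent prefix and the domain
of its "From:" address both appear on an allow list, so that several
addresses at the same service are covered by one entry.

```python
# Hypothetical allow list: User-Agent prefix -> permitted From: domains.
# These entries are invented for illustration, not real robot identities.
ALLOWED_ROBOTS = {
    "Mozilla": ["netscape.com"],
    "ExampleBot": ["search.example.com", "crawl.example.net"],
}


def from_domain(from_header):
    """Return the lowercased domain of a From: address, or None if absent."""
    if not from_header or "@" not in from_header:
        return None
    # Take everything after the last "@"; drop a trailing ">" if present.
    return from_header.strip().rstrip(">").rsplit("@", 1)[1].lower()


def robot_allowed(user_agent, from_header):
    """Allow only when the User-Agent prefix and From: domain both match."""
    domain = from_domain(from_header)
    if domain is None:
        return False
    for agent_prefix, domains in ALLOWED_ROBOTS.items():
        if (user_agent or "").startswith(agent_prefix):
            # Accept the listed domain itself or any subdomain of it.
            if any(domain == d or domain.endswith("." + d) for d in domains):
                return True
    return False
```

With this table, robot_allowed("Mozilla/2.0", "teenage_slacker@netcom.com")
is denied because netcom.com is not listed for Mozilla. I realize both
headers are supplied by the client and can be forged, so a check like this
only filters robots that identify themselves honestly.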

My thanks.