My big question is why this spider doesn't adhere to the Robot Exclusion
Standard. This one (which can run independently of a human operator
and searches websites "all the way down to its bottom pages") is pretty
clearly a robot that should adhere to the standard, rather than an agent
that falls into the "gray zone" of whether it needs to comply with it.
Aside from the issue of harvesting email addresses (which I'll post
something about later on, as it raises some important new issues to consider
in the robots.txt standard) and the ethical issues involved, it seems that
you've got some work to do to incorporate the Exclusion Standard into this
beasty.
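
For what it's worth, the check itself is small. Below is a minimal sketch of the idea, written in Python with the standard library's urllib.robotparser rather than in the robot's own Java. The user-agent token "activeagent" is an assumption borrowed from the robot-id in the record below (the record itself lists no exclusion user-agent), and the sample robots.txt and example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish; not taken from a real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: activeagent
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched from the site."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


rules = build_parser(SAMPLE_ROBOTS_TXT)

# "activeagent" is an assumed token; a real robot would announce its own name.
for agent, url in [
    ("activeagent", "http://example.com/index.html"),
    ("otherbot", "http://example.com/index.html"),
    ("otherbot", "http://example.com/private/list.html"),
]:
    verdict = "allowed" if may_fetch(rules, agent, url) else "excluded"
    print(agent, url, verdict)
```

A robot following the standard would fetch /robots.txt from each host before crawling it and skip every URL for which a check like may_fetch returns False.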
Brian
-------- REPLY, Original message follows --------
Date: Sunday, 13-Oct-96 07:17 PM
From: HipCrime \ Internet: (hipcrime@hipcrime.com)
To: robots@webcrawler.com \ Internet: (robots@webcrawler.com)
Subject: ActiveAgent
robot-id: activeagent
robot-name: ActiveAgent
robot-cover-url: http://www.hipcrime.com
robot-details-url: http://www.hipcrime.com
robot-owner-name: robert returned
robot-owner-url: http://www.hipcrime.com
robot-owner-email: agent@hipcrime.com
robot-status: active
robot-purpose: other
robot-type: applet
robot-platform: all
robot-availability: source
robot-exclusion: no
robot-exclusion-useragent: no
robot-noindex: no
robot-host: anywhere
robot-from: no
robot-useragent: no
robot-language: java
robot-description: crawling email robot and publicity engine
robot-history: hipcrime's Internet art project
robot-environment: research/hobby
modified-date: 10-13-96
modified-by: RR1563
-------- REPLY, End of original message --------