A bad agent?

Nick Dearnaley (njd.autonomy@stjohns.co.uk)
Thu, 26 Sep 1996 18:09:26 +0000


This is a public reply to a message posted by Benjamin Franz
(18/09/96) - "Bad agent...A *very* bad agent".

Dear Mr Franz,

After reading your message about Autonomy, I think you may have a few
misconceptions about the software. Firstly, Autonomy is *not* a robot
as such. It does *not attempt to index sites*, or even read every
page on a site, but actually 'browses' much more like a human user.

Autonomy uses a neural network to read the text on a page, and
compares it with the concepts contained in its training text. It
evaluates the relevance of the page and then follows the link which
appears most relevant to its training. If it has linked to several
irrelevant pages in a row, then it backtracks to the last relevant
page and tries another path. This approach is much closer to how I
find information on the WWW, by finding a likely starting page and
then moving on via the most interesting link on the page. If I find
a boring page then I hit the back button and try another route.
Thus it avoids *infinite tree* problems, as well as loops and
re-requests.
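To make this browsing strategy concrete, here is a rough sketch in Python (my own illustration, not Autonomy's actual code; `relevance()` and `links()` stand in for the neural-network scoring and the page parsing):

```python
def browse(start, relevance, links, threshold=0.5, max_misses=3):
    """Follow the most relevant link from each page, backtracking
    after a run of irrelevant pages -- a sketch of relevance-guided
    browsing, not Autonomy's real implementation."""
    visited = set()        # avoids loops and re-requests
    path = [start]         # trail of pages, so we can backtrack
    results = []
    misses = 0
    while path:
        page = path[-1]
        if page not in visited:
            visited.add(page)
            if relevance(page) >= threshold:
                results.append(page)
                misses = 0
            else:
                misses += 1
                if misses >= max_misses:
                    # several irrelevant pages in a row:
                    # backtrack to the last relevant page
                    while path and relevance(path[-1]) < threshold:
                        path.pop()
                    misses = 0
                    continue
        # follow the unvisited link that looks most relevant,
        # or back up one page at a dead end
        candidates = [l for l in links(page) if l not in visited]
        if candidates:
            path.append(max(candidates, key=relevance))
        else:
            path.pop()
    return results
```

On a toy graph of pages this visits the promising routes first and gives up on dull ones, much as a human reader with a back button would.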

This method gives the agent a default *self-limiting action*, bounded
by relevance rather than by site or quantity. Time and space limitations
are also set, which will limit the agent to a maximum time spent
searching and a maximum amount of data to be retrieved. The 'fire and
forget' mode that you refer to - Agent Kennels - actually provides the
functionality of the "general purpose robot that *caches high request
items*". While it is possible to send your Agent to search from a
Kennel server and disconnect from the Internet, the kennels cache
requests so that commonly sought pages are stored locally. The Kennel
service therefore provides a financial advantage to users (no phone
bills) while reducing the bandwidth requirements of the search.
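The time and space limits, and the Kennels' caching of commonly sought pages, might look roughly like this (an illustrative sketch; the class and function names are my own, not Autonomy's API):

```python
import time

class SearchBudget:
    """The agent stops when it has spent its allotted seconds or
    retrieved its maximum amount of data -- a sketch of the time
    and space limits, not the real implementation."""
    def __init__(self, max_seconds, max_bytes):
        self.deadline = time.monotonic() + max_seconds
        self.bytes_left = max_bytes

    def allows(self, page_size):
        # a page may be fetched only while both budgets remain
        return (time.monotonic() < self.deadline
                and page_size <= self.bytes_left)

    def charge(self, page_size):
        self.bytes_left -= page_size

cache = {}  # a Kennel's shared cache: URL -> page body

def fetch(url, retrieve):
    """Serve commonly sought pages from the local cache, saving
    bandwidth; only cache misses go out to the network."""
    if url not in cache:
        cache[url] = retrieve(url)
    return cache[url]
```

The point of the cache is that two agents asking for the same popular page cost the network only one request.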

Autonomy does identify itself to servers, via the standard HTTP
User-Agent header, rather than as a robot. The reason is that Autonomy does not
function like a robot, and it was felt that it would be misleading if
it declared itself as one. Autonomy is much more like a user than a
robot in its approach to documents. As such it declares itself as the
User Agent of the owner, rather than a robot.
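In other words, the request the agent sends looks something like the following (the exact header string here is my assumption for illustration, not the one Autonomy actually sends):

```python
def user_agent_request(path, host, owner):
    """Build an HTTP/1.0 request that identifies the agent's owner
    in the User-Agent field.  The header format is an assumed
    example, not Autonomy's real header."""
    return (f"GET {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            f"User-Agent: Autonomy (agent of {owner})\r\n"
            "\r\n")
```

A server log would then show which user the agent was browsing on behalf of, rather than an anonymous robot.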

You were quite correct to state that the previously available free beta
of Autonomy does not obey robots.txt files, and this was a flaw. The
current version in testing complies fully with robots.txt (so if
anyone would like to suggest good test subjects or sites, please mail
me). I appreciate the concern that this has caused, and we are aiming
to rectify this as soon as possible. Clearly we want the Internet to
remain the highly useful resource that it is, and we aim to make using
it easier, not slower!
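For anyone curious what "complying with robots.txt" amounts to, a minimal check can be sketched with Python's standard robots.txt parser (an illustration only, not Autonomy's own code):

```python
from urllib.robotparser import RobotFileParser

# parse a small robots.txt, as a well-behaved agent would after
# fetching it from the server's root
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

def may_fetch(url, agent="Autonomy"):
    """Return True only when the site's robots.txt permits this
    agent to retrieve the URL."""
    return rp.can_fetch(agent, url)
```

The agent simply consults `may_fetch` before every request and skips any URL the site has excluded.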

Yours,

Nick Dearnaley.
Developer, Autonomy Systems Ltd.