Date: Wed, 8 Nov 95 10:08 PST
To: robots@webcrawler.com
From: Tim Bray <tbray@opentext.com>
Subject: Re: Preliminary robot.faq (Please Send Questions or Comments)
Cc: robots@webcrawler.com

We're wasting too much time on this. All I meant to say was that the
original language strongly suggested that robots use the following
algorithm:

sub RetrievePage(url)
    text = HttpGet(url)              # fetch the page
    foreach sub_url in text          # for every URL found in the page
        RetrievePage(sub_url)        # follow it immediately, recursively

Whereas lots of robots don't. Obviously the process is recursive in the
sense that you do pull URLs out of pages and eventually follow them, but it
doesn't feel recursive. The 'fuzzy' stuff is a complete red herring: except
for the special case of 'fuzzy logic' (which is not what's being done here),
the word 'fuzzy' in the information-retrieval context is a marketing term
without semantic content.
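
For contrast, here is a minimal sketch of the queue-driven loop many robots
use instead; the names (crawl, page_limit) and the naive href regex are
illustrative assumptions, not any particular robot's code:

import re
import urllib.request
from collections import deque

def crawl(start_url, page_limit=50):
    # Work-queue crawl: newly discovered URLs wait their turn instead of
    # being fetched the instant they are seen, so nothing recurses.
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(seen) < page_limit:
        url = queue.popleft()
        try:
            text = urllib.request.urlopen(url).read().decode("latin-1", "replace")
        except OSError:
            continue                      # unreachable page: just move on
        for sub_url in re.findall(r'href="(https?://[^"]+)"', text):
            if sub_url not in seen:       # note it now, fetch it later
                seen.add(sub_url)
                queue.append(sub_url)

Because new URLs go onto the queue rather than being fetched on the spot,
the traversal stays flat rather than descending a call stack, which is the
sense in which it doesn't feel recursive.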

Cheers, Tim Bray, Open Text Corporation (tbray@opentext.com)