Harvest-like use of spiders

Fred Melssen (MELSSEN@AZNVX1.AZN.NL)
Wed, 03 Jul 1996 18:07:12 +0100 (MET)


We are planning to start a Harvest system to collect and index a limited
set of URLs relevant to our areas of interest. This limited set will
consist of approx. 100-200 URLs in the area of 'Indigenous Knowledge'.

As the Harvest system would put a considerable strain on our network
infrastructure (perhaps even to the point of having to abandon it), I
have to think about alternatives.

An alternative could be to use an existing robot (AltaVista) and limit
the set of URLs a priori (in AltaVista's case, using the url:site
field). In other words: I have to limit the searched database to a set
of defined URLs, and I can impose that limitation within the search
form itself. For example (AltaVista):

q= keyword & (url:site1 & url:site2 & url:site3)

The drawback is that I have to write a cgi-script which creates a URL
(for method=get) or a form (for method=post) to submit to the existing
robot. I think AltaVista is a good choice for this method.
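For concreteness, here is a rough sketch (in Python) of what such a
cgi-script could look like for method=get. The robot's endpoint, the
parameter name "q", the url: prefix and the AND/OR syntax are only
assumptions on my part and would have to be checked against the real
search form; I also use OR between the sites, since a document can
only live on one of them.

  #!/usr/bin/env python
  # Sketch of the cgi-script idea (method=get): read the visitor's
  # keyword from the query string, then redirect to an existing robot
  # with the search restricted to our own list of sites.  The endpoint
  # and parameter names are assumptions, not the robot's real API.

  import os
  import urllib.parse

  SITES = ["site1.example.org", "site2.example.org"]   # our a-priori list

  params = urllib.parse.parse_qs(os.environ.get("QUERY_STRING", ""))
  keyword = params.get("keyword", [""])[0]

  # keyword AND (url:site1 OR url:site2 OR ...)
  restriction = " OR ".join("url:" + s for s in SITES)
  query = "%s AND (%s)" % (keyword, restriction)

  target = ("http://altavista.digital.com/cgi-bin/query?"
            + urllib.parse.urlencode({"q": query}))

  # Answer the browser with a redirect to the existing robot.
  print("Status: 302 Found")
  print("Location: " + target)
  print()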

My questions: What are the pros and cons of using this method? Has
anyone used this method before? Is the method reliable enough, given
that I have to presume the URLs I want to index have indeed been
recursively indexed by the other robot? In other words: what is the
'coverage area' of the method, and how can I measure it?
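
One crude way I can think of to estimate that coverage: take a sample
of pages we know exist on the sites, ask the robot for each one with a
url: query, and count how many it actually returns. A sketch follows;
the endpoint, the query syntax and the test for a hit are placeholders
and would have to be adapted to the robot that is actually used.

  #!/usr/bin/env python
  # Estimate 'coverage': query the robot for a sample of known pages
  # and count how many it can find.  Endpoint and hit test are
  # placeholders, not the robot's documented behaviour.

  import urllib.parse
  import urllib.request

  KNOWN_PAGES = [
      "http://www.nufficcs.nl/ciran/",
      # ... further pages drawn from the 100-200 sites ...
  ]

  ROBOT = "http://altavista.digital.com/cgi-bin/query?"   # assumed endpoint

  found = 0
  for page in KNOWN_PAGES:
      url = ROBOT + urllib.parse.urlencode({"q": "url:" + page})
      with urllib.request.urlopen(url) as reply:
          text = reply.read().decode("latin-1", "replace")
      # Placeholder check: a real script would look for whatever the
      # robot's result page shows when the document is in its index.
      if page in text:
          found += 1

  print("coverage: %d of %d sample pages found" % (found, len(KNOWN_PAGES)))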

And - of course - is there a policy on creating my own form for
querying other robots?

Thank you,

-fred

Indigenous Knowledge Home Page
CIRAN/Nuffic
http://www.nufficcs.nl/ciran/
------------------------------------------------------------------------
Fred Melssen | Manager Electronic Information Services
P.O.Box 9104 | Centre for Pacific Studies | Phone and fax:
6500 HE Nijmegen | University of Nijmegen | 31-024-378.3666 (home)
The Netherlands | Email: melssen@aznvx1.azn.nl | 31-024-361.1945 (fax)
| http://www.kun.nl/~melssen | PGP key available
------------------------------------------------------------------------