Looking for a spider

Alain Desilets (alain@ai.iit.nrc.ca)
Wed, 18 Oct 95 14:31:39 EDT

Next message: Leigh DeForest Dupee: "Re: Unfriendly robot at 205.177.10.2"
Previous message: Reinier Post: "Re: Unfriendly robot at 205.177.10.2"
Next in thread: Alvaro Monge: "Re: Looking for a spider"
Reply: Alvaro Monge: "Re: Looking for a spider"
Maybe reply: Xiaodong Zhang: "Re: Looking for a spider"
Maybe reply: Alain Desilets: "Re: Looking for a spider"
Maybe reply: Alain Desilets: "Re: Looking for a spider"
Maybe reply: Alain Desilets: "Re: Looking for a spider"
Maybe reply: Marilyn R Wulfekuhler: "Re: Looking for a spider"
Maybe reply: Alain Desilets: "Re: Looking for a spider"
Maybe reply: Marilyn R Wulfekuhler: "Re: Looking for a spider"
Maybe reply: Alain Desilets: "Re: Looking for a spider"
Maybe reply: Gene Essman : "Re: Looking for a spider"
Maybe reply: Nick Arnett: "Re: Looking for a spider"
Maybe reply: Ted Sullivan: "Re: Looking for a spider"
Maybe reply: drose@AZStarNet.com: "Re: Looking for a spider"
Maybe reply: Ted Sullivan: "Re: Looking for a spider"
Maybe reply: i.bromwich: "Re: Looking for a spider"

Dear spider developers,

My name is Alain Desilets. I am a researcher in the Interactive
Information Group of the National Research Council of Canada.

We are a small group (6 people) developing tools for interactive
access to information. We approach this problem with AI-based
techniques, in particular Machine Learning and Agents. You can
find out more about our work at http://ai.iit.nrc.ca/II_public/.

In order to test our methods we need to acquire a large corpus of
full HTML files from the Web. We plan to use a spider for that task.

We are aware of the controversy surrounding the creation of new
spiders and therefore do not plan to develop one. That
would not only be a duplication of effort but would also introduce a
new, possibly buggy spider into Koster's already vast list of Web
critters. Instead, we would like to use a publicly available,
well-behaved and proven spider.

Is there such a spider available for serious research purposes?

Or maybe the corpus we need already exists? Is there a CD-ROM or .zip
file that would give us the whole of the Web in full HTML?

Thanks for your help.

Alain Desilets

Institute for Information Technology
National Research Council of Canada
Building M-50
Montreal Road
Ottawa (Ont)
K1A 0R6

e-mail: alain@ai.iit.nrc.ca
Tel: (613) 990-2813
Fax: (613) 952-7151
