crawling FTP sites

Greg Fenton (gregf@opentext.com)
Wed, 11 Sep 1996 11:22:15 -0400


I am working on a robot to crawl FTP sites. It seems to me that
crawling one "URL" at a time is not a friendly way of going about it.
The overhead of opening a connection, retrieving a single piece of
information (or a single file), and then closing the connection just
doesn't seem right.

FTP was designed to be interactive, with a single user traversing
directories and grabbing zero or more files. I would think that
an FTP-friendly robot would behave similarly (minus user abuses).
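
Something along these lines is what I mean by reusing a single
connection: a rough sketch in Python using the standard ftplib
module. The host, starting directory, delay, and depth limit are
just placeholders, not a finished design:

    import time
    from ftplib import FTP, error_perm

    def crawl(ftp, path, delay=2.0, depth=0, max_depth=4):
        """Walk an FTP tree over one control connection, pausing between requests."""
        if depth > max_depth:
            return
        try:
            ftp.cwd(path)                 # only a directory can be entered
        except error_perm:
            print("file:", path)          # CWD failed, so treat the entry as a plain file
            return
        print("dir: ", path)
        # NLST output varies by server (bare names vs. full paths); bare names assumed here
        names = [n for n in ftp.nlst() if n not in (".", "..")]
        for name in names:
            time.sleep(delay)             # be polite: pause between requests
            crawl(ftp, path.rstrip("/") + "/" + name, delay, depth + 1, max_depth)

    ftp = FTP("ftp.example.com")          # placeholder host
    ftp.login()                           # anonymous login
    crawl(ftp, "/pub")                    # placeholder starting directory
    ftp.quit()

The point is that the whole subtree is traversed over one login
rather than one connection per file, with a pause between requests
so the robot doesn't hammer the server.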

Which brings me to my questions:
What would be considered FTP-friendly behaviour for a robot?
How much information should a robot get at one time?

Discussion of all of the above is encouraged,
gregf.

-- 
Greg Fenton, Software Engineer           http://index.opentext.net/
    || Open Text Corporation           mailto:gregf@opentext.com
    || 180 Columbia Street West         Phone:(519) 888-7111 x261
    || Waterloo, Ontario N2L 3L3          Fax:(519) 888-0677