Re: crawling FTP sites

James Black (black@eng.usf.edu)
Wed, 11 Sep 1996 19:34:56 -0400 (EDT)


Hello,

On Wed, 11 Sep 1996, Greg Fenton wrote:

> FTP was designed to be interactive, with a single user traversing
> directories and grabbing zero or more files. I would think that
> an FTP-friendly robot would behave similarly (minus user abuses).
>
> Which brings me to question:
> What would be considered FTP-friendly behaviour for a robot?
> How much information should a robot get at one time?

If you are going to do it this way, then you should check what the
server's local time is, and crawl after midnight, when the machine is
more likely to be quiet. Also make certain to read robots.txt in the top
directory, to see whether they allow robots.
If you are not spending too long on there, you should be able to get
quite a bit. I have a program that gets about 5 MB of FAQs from MIT
whenever I run it, so getting the information shouldn't be a problem;
just be considerate about the time you are spending on the machine.
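The two checks above (is it after midnight at the server, and does the
top directory have a robots.txt?) can be sketched roughly like this in
Python. This is only an illustration, not a tested crawler: the function
names, the midnight-to-6-a.m. "quiet window", and the anonymous-login
assumption are all mine, and you would need to know the server's UTC
offset yourself.

```python
from datetime import datetime, timedelta, timezone
from ftplib import FTP, error_perm

def server_local_hour(utc_offset_hours, now=None):
    """Hour (0-23) at a server in the given UTC offset; 'now' defaults to current UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now + timedelta(hours=utc_offset_hours)).hour

def is_off_peak(hour, start=0, end=6):
    """True if 'hour' falls in the assumed quiet window (midnight to 6 a.m.)."""
    return start <= hour < end

def fetch_robots_txt(host):
    """Try to retrieve robots.txt from the top directory of an anonymous FTP server.

    Returns its text, or None if the file is absent (i.e. no stated policy).
    """
    lines = []
    ftp = FTP(host)
    ftp.login()  # anonymous login
    try:
        ftp.retrlines("RETR robots.txt", lines.append)
    except error_perm:
        return None
    finally:
        ftp.quit()
    return "\n".join(lines)
```

A crawler would then only proceed when `is_off_peak(server_local_hour(offset))`
is true and the robots.txt (if any) permits it.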
Enjoy.

==========================================================================
James Black (Comp Sci/Elec Eng Senior)
e-mail: black@eng.usf.edu
http://www.eng.usf.edu/~black/index.html
"An idea that is not dangerous is unworthy of being called an idea at all."
Oscar Wilde
**************************************************************************