However, I don't have an easy way to determine whether a response has
arrived before firing off a new request, except by setting the time between
hits to be large. This is because of the way my robot is distributed. Do
people think that approach is reasonable?
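As a rough illustration of the fixed-delay idea, here is a minimal sketch of a per-host throttle. It assumes a single scheduler hands out request slots, which may not match how a distributed robot is really organized. The class name `HostThrottle`, the `reserve` method, and the 60-second default are all hypothetical, chosen only for this example.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum gap between requests to the same host.

    Hypothetical helper: it never checks whether a response has arrived,
    it only spaces requests out in time.
    """

    def __init__(self, min_gap_seconds=60.0, clock=time.monotonic):
        self.min_gap = min_gap_seconds
        self.clock = clock
        self.next_allowed = {}  # host -> earliest time of the next request

    def reserve(self, url):
        """Reserve the next slot for url's host; return seconds to wait."""
        host = urlsplit(url).hostname or ""
        now = self.clock()
        # Take the later of "now" and the slot already promised to this host.
        slot = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = slot + self.min_gap
        return slot - now
```

A fetcher would call `time.sleep(throttle.reserve(url))` before each request. Requests to different hosts do not delay each other, so a long gap per host need not slow the crawl as a whole.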
I was also thinking that something we all might collaborate on is a list
of sites that robots shouldn't index at all, or should index only at the
top levels. This might save the net a lot of bandwidth, and everyone a lot
of hassle. We could develop a database to hold this information. Besides
being useful to us, it might provide a good forum for showing site
designers, with real examples, why they should or shouldn't do certain
things.
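To make the shared-list idea concrete, here is a small sketch of how a robot might consult such a list before indexing a URL. The data format is purely an assumption: a mapping from host name to either `None` (do not index) or a maximum path depth (top levels only). The host names, `SITE_POLICIES`, `path_depth`, and `may_index` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical shared list: host -> policy.
#   None    : robots should not index this site at all
#   integer : index only pages with at most that many path segments
# Hosts that are not listed are unrestricted.
SITE_POLICIES = {
    "games.example.com": None,
    "archive.example.org": 1,
}


def path_depth(url):
    """Count the non-empty path segments in a URL ('/' has depth 0)."""
    return len([seg for seg in urlsplit(url).path.split("/") if seg])


def may_index(url, policies=SITE_POLICIES):
    """Return True if the list allows a robot to index this URL."""
    host = urlsplit(url).hostname or ""
    if host not in policies:
        return True
    max_depth = policies[host]
    if max_depth is None:
        return False
    return path_depth(url) <= max_depth
```

With the sample entries, `may_index("http://archive.example.org/a")` is allowed while `may_index("http://archive.example.org/a/b")` is not. A real database would presumably also record why each site is listed, which is what would make it useful as a set of examples for site designers.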
I am using the data for clustering and for some economic models of the web.
I'll send mail to this list when I have my query engine up.
Sorry for any problems,
-Larry
>FYI, the following robot
>
>huron.stanford.edu backrub@pcd.stanford.edu:BackRub/0.5
> and
>grand.stanford.edu backrub@pcd.stanford.edu:BackRub/0.5
>
>is hitting a site once a second and isn't waiting for responses
>before firing off new requests. The owner has been notified.
>