>> I was also thinking that something we all might collaborate on would be a
>> list of sites that as a robot you shouldn't index, or should only index the
>> top levels.
What you shouldn't index depends on the type of robot and the intentions
of its operator. How could you (or anyone) decide for them?
Specifically, what application are you thinking of?
>> This might save the net a lot of bandwidth, and everyone a lot
>> of hassle. We could develop a database to hold this information.
There is a scalability and maintenance issue there.
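One way around the scalability problem is to let each server publish its own exclusion list, which a robot fetches once and consults before indexing. A minimal sketch in Python (not a proposal from this thread; it assumes a simple text format of "Disallow: /prefix" lines):

```python
def parse_exclusions(text):
    """Return the list of disallowed path prefixes from an exclusion file."""
    prefixes = []
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("disallow:"):
            prefixes.append(line.split(":", 1)[1].strip())
    return prefixes

def allowed(path, prefixes):
    """True if the path does not fall under any disallowed prefix."""
    return not any(path.startswith(p) for p in prefixes if p)

# Example: a server declares two areas as off-limits to robots.
exclusions = parse_exclusions("Disallow: /tmp\nDisallow: /cgi-bin")
print(allowed("/index.html", exclusions))     # True
print(allowed("/cgi-bin/search", exclusions)) # False
```

This keeps the policy with the site operator, so no central database needs maintaining.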
>I'd also be interested in making a list of sites that you could index as part
>of the testing process. With more and more robot developers, it could be a
>great resource to have a page of URLs of people with servers they don't mind
>getting hit with a lot of robot traffic, or which have special traps or other
>devices set up on them for testing purposes.
That sounds cool, though it might be even better to publish source
for such sites so people can set it up locally rather than go over
the Net...
-- Martijn
Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html