> Maybe you should be able to request a "load level" from the server
> and if it is low enough then grab the pages.
As we discussed before, the response time (lag) of your request is a
good clue to server load. Although it depends on a broad range of
parameters (connection bandwidth, number of hops, etc.), it is the only
way I know of to get information about the server load. If you want to
be more precise, you can do a ping before your actual request to try to
estimate the pure "net-delay".
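To illustrate, here is a rough Python sketch (not my actual robot code;
the function names are made up): time the fetch itself and, if you like,
a ping beforehand, so the pure net-delay can be separated from the
apparent server load. It assumes a Unix-like `ping -c 1` is available.

import subprocess
import time
import urllib.request


def rough_ping(host, timeout=5):
    """Very rough net-delay probe: time one run of the system `ping`
    command. Includes process start-up overhead, so treat it as an
    upper bound; returns seconds or None on failure."""
    try:
        start = time.monotonic()
        subprocess.run(["ping", "-c", "1", host], check=True,
                       capture_output=True, timeout=timeout)
        return time.monotonic() - start
    except (subprocess.SubprocessError, OSError):
        return None


def timed_fetch(url):
    """Fetch a URL and return (body, elapsed seconds); the elapsed time
    is the load clue described above."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    return body, time.monotonic() - start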
My robot http://www.hotlist.de uses this strategy (without the ping). I
fetch at most one document per server within a period of about 20 x
(time for the last fetch) + 300 sec.
That isn't too much.
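For illustration, a rough sketch of that per-server schedule (the host
keying and names are just for the example):

import time
from urllib.parse import urlparse

next_allowed = {}   # host -> earliest monotonic time of the next fetch


def may_fetch(url):
    """True if the per-server waiting period has passed."""
    host = urlparse(url).netloc
    return time.monotonic() >= next_allowed.get(host, 0.0)


def record_fetch(url, fetch_duration):
    """Call after a fetch, with the time it took in seconds:
    wait 20 x (last fetch time) + 300 sec before the next one."""
    host = urlparse(url).netloc
    next_allowed[host] = time.monotonic() + 20 * fetch_duration + 300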
To limit net load there is no other way than running your robots at
off-peak times (02:00 to 08:00) or slowing them down at other times (see
the sketch below).
Here in Germany there are not so many robots active at these times, and
in my opinion it is better to have a good search engine with extended
features, such as informing customers by email when it gets new hits for
a search term, than to have everybody implement their own (maybe browser
built-in) spider agent.
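A sketch of the off-peak idea (the slow-down factor of 3 is just a
placeholder, not a recommendation):

from datetime import datetime


def delay_multiplier(now=None):
    """1.0 during the off-peak window 02:00-08:00 local time,
    otherwise an extra slow-down factor."""
    hour = (now or datetime.now()).hour
    return 1.0 if 2 <= hour < 8 else 3.0


def polite_delay(base_delay_seconds, now=None):
    """Scale the normal inter-request delay outside the off-peak window."""
    return base_delay_seconds * delay_multiplier(now)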
Another interesting thing to do would be to collect information about
one's special interests or browsing behaviour in order to give a better
service, but German "Datenschutz" (data protection) laws prevent us from
doing this.
--
------------------------------------------------------------------
Michael Göckel                        CyberCon Gesellschaft
Michael@cybercon.technopark.gmd.de    für neue Medien mbH
Tel. 0 22 41 / 93 50 -0               Rathausallee 10
Fax: 0 22 41 / 93 50 -99              53757 St. Augustin
www.cybercon.technopark.gmd.de        Germany
------------------------------------------------------------------