One other thing which may or may not belong in robots.txt is the concept
of an access rate. One of the reasons behind robots.txt is to prevent
robots from overwhelming a site. One idea Denis mentioned:
> Can I suggest this:
>
> use a draft format similar to /robots.txt
>
> if it seems widely agreeable, perhaps it would fold into /robots.txt
>
> on a trial basis, call the file /demand.txt or /load.txt or some such.
>
> include:
>
> Limit: <requests> / <time>
> Factor: <integer>
>
> which can be read as "if you expect to make more than <requests>
> during <time>, you should wait <integer> * <response time> after
> each request" with the expectation that any agent fetching /load.txt
> would measure the response time for that request, and would fetch it if
> it was going to make more than - what - 10 requests? (20, 50?)
>
> A small site could do:
>
> Limit: 15 / 1 m
> Factor: 10
>
> And a large site could do:
>
> Limit: 1000 / 1 h
> Factor: 3
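
To make the quoted proposal concrete, here is a rough Python sketch of
how a robot might read such a file and pick its delay. Nothing here is
part of any standard: the names LoadPolicy, parse_load_txt and
delay_after_request are made up for illustration, and the unit letters
(s, m, h) are an assumption, since the proposal only shows "m" and "h".

```python
import re
from dataclasses import dataclass

# Assumed unit letters; the proposal itself only shows "m" and "h".
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


@dataclass
class LoadPolicy:
    max_requests: int    # <requests> from the "Limit:" line
    window_seconds: int  # <time> from the "Limit:" line, in seconds
    factor: int          # <integer> from the "Factor:" line


def parse_load_txt(text: str) -> LoadPolicy:
    """Parse the two-line format from the proposal (hypothetical file)."""
    limit = re.search(r"^Limit:\s*(\d+)\s*/\s*(\d+)\s*([smh])\s*$", text, re.M)
    factor = re.search(r"^Factor:\s*(\d+)\s*$", text, re.M)
    if not limit or not factor:
        raise ValueError("expected both a Limit: and a Factor: line")
    requests, count, unit = limit.groups()
    return LoadPolicy(
        max_requests=int(requests),
        window_seconds=int(count) * UNIT_SECONDS[unit],
        factor=int(factor.group(1)),
    )


def delay_after_request(policy: LoadPolicy,
                        planned_requests: int,
                        response_time: float) -> float:
    """Seconds to wait after each request.

    planned_requests is how many requests the robot expects to make
    within one window; response_time is the measured response time in
    seconds. Below the limit no extra wait is applied.
    """
    if planned_requests <= policy.max_requests:
        return 0.0
    return policy.factor * response_time


small_site = parse_load_txt("Limit: 15 / 1 m\nFactor: 10\n")
large_site = parse_load_txt("Limit: 1000 / 1 h\nFactor: 3\n")
```

With the small-site example, a robot planning 100 requests per minute
that measures a 0.5 second response would wait 10 * 0.5 = 5 seconds
after each request.
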
My idea would be to define a Self-Tuning Standard, which could be used
instead of or in conjunction with this. The general method is that a
robot adjusts the Request rate R (req / min) as the Transfer rate T
(bytes / sec) changes. I.e. as T goes down, R goes down. There would
be some set limits on how large R could get and some initial defaults,
which could either be set in a standard (yikes!) or by the server.
For example:
R = T / 1000; (e.g. 1 req / minute for every K per sec)
Rmax = 60 (1 per second)
This way, robots could use the server to its full potential on off
hours, and would keep away accordingly when the server is being taxed
by other entities.
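
As a rough sketch of the rule above, in Python: the divisor 1000 and
the cap of 60 come from the example, while the function names and the
idea of measuring T from a single response are only assumptions for
illustration, not a worked-out design.

```python
def transfer_rate(bytes_received: int, elapsed_seconds: float) -> float:
    """Measured transfer rate T in bytes / sec for one response."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_received / elapsed_seconds


def request_rate(t_bytes_per_sec: float,
                 divisor: float = 1000.0,
                 r_max: float = 60.0) -> float:
    """Request rate R in req / min: R = T / 1000, capped at Rmax."""
    return min(t_bytes_per_sec / divisor, r_max)


def seconds_between_requests(t_bytes_per_sec: float) -> float:
    """Pause implied by R; infinite if nothing is getting through."""
    r = request_rate(t_bytes_per_sec)
    if r <= 0:
        return float("inf")
    return 60.0 / r
```

For instance, a server delivering 30000 bytes / sec gives R = 30
req / min, i.e. one request every 2 seconds; at 100000 bytes / sec
the cap applies and the robot stays at 1 request per second.
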
Ideas here?
Cheers,
-Erik
--
Erik Selberg                   "I get by with a little help
selberg@cs.washington.edu       from my friends."
http://www.cs.washington.edu/homes/selberg