Re: Broadness of Robots.txt (Re: Washington again !!!)

Erik Selberg (selberg@cs.washington.edu)
20 Nov 1996 13:25:53 -0800

Next message: Klaus Johannes Rusch: "Re: User-Agent"
Previous message: HipCrime: "robots.txt"
In reply to: Brian Clark: "Broadness of Robots.txt (Re: Washington again !!!)"
Next in thread: Brian Clark: "Re: Broadness of Robots.txt (Re: Washington again !!!)"

[paraphrase of mail to Denis McKeon directly; thought I was sending to the
list]

One other thing that may or may not belong in robots.txt is the concept
of an access rate. One of the reasons behind robots.txt is to prevent
robots from overwhelming a site. One idea Denis mentioned:

> Can I suggest this:
>
> use a draft format similar to /robots.txt
>
> if it seems widely agreeable, perhaps it would fold into /robots.txt
>
> on a trial basis, call the file /demand.txt or /load.txt or some such.
>
> include:
>
> Limit: <requests> / <time>
> Factor: <integer>
>
> which can be read as "if you expect to make more than <requests>
> during <time>, you should wait <integer> * <response time> after
> each request" with the expectation that any agent fetching /load.txt
> would measure the response time for that request, and would fetch it if
> it was going to make more than - what - 10 requests? (20, 50?)
>
> A small site could do:
>
> Limit: 15 / 1 m
> Factor: 10
>
> And a large site could do:
>
> Limit: 1000 / 1 h
> Factor: 3
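
As a rough sketch of how a robot might read Denis's two fields, something
like the following could work. The function names, the dictionary keys, and
the "s" (seconds) unit are my own assumptions for illustration; the proposal
itself only shows "m" and "h" and does not pin down a grammar.

```python
import re

# Unit letters mapped to seconds. Only "m" and "h" appear in the proposal;
# "s" is an assumption added here for completeness.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_load_txt(text):
    """Parse 'Limit: <requests> / <n> <unit>' and 'Factor: <integer>' lines."""
    policy = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"Limit:\s*(\d+)\s*/\s*(\d+)\s*([smh])$", line)
        if m:
            policy["limit_requests"] = int(m.group(1))
            policy["limit_seconds"] = int(m.group(2)) * UNIT_SECONDS[m.group(3)]
            continue
        m = re.match(r"Factor:\s*(\d+)$", line)
        if m:
            policy["factor"] = int(m.group(1))
    return policy


def delay_after_request(policy, planned_requests, response_time):
    """Seconds to wait after each request.

    planned_requests is how many requests the robot expects to make within
    the Limit window; response_time is the measured response time in seconds.
    """
    if planned_requests > policy["limit_requests"]:
        return policy["factor"] * response_time
    return 0.0
```

With the small-site example, a robot planning 20 requests in a minute and
measuring a 0.5 second response would wait 10 * 0.5 = 5 seconds after each
request; a robot planning only 10 requests would not need to wait at all.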

My idea would be to define a Self-Tuning Standard, which could be used
instead of or in conjunction with this. The general method is that a
robot adjusts the Request rate R (req / min) as the Transfer rate T
(bytes / sec) changes. I.e. as T goes down, R goes down. There would
be some set limits on how large R could get and some initial defaults,
which could either be set in a standard (yikes!) or by the server.

For example:

R = T / 1000 (i.e. 1 req / min for every 1000 bytes / sec of T)
Rmax = 60 (1 per second)

This way, robots could use the server to its full potential on off
hours, and would keep away accordingly when the server is being taxed
by other entities.
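
A minimal sketch of the rule above, assuming the example constants
(divisor 1000, Rmax = 60). The function names are hypothetical, and how T
gets measured (e.g. averaged over the last few fetches) is left open here.

```python
def request_rate(transfer_rate, divisor=1000.0, r_max=60.0):
    """Return R (req / min) for a measured T (bytes / sec).

    Implements R = T / divisor, capped at r_max.
    """
    return min(transfer_rate / divisor, r_max)


def delay_between_requests(transfer_rate):
    """Seconds to wait between requests, or None if T is zero or negative."""
    r = request_rate(transfer_rate)
    if r <= 0:
        return None  # no measurable throughput: back off entirely
    return 60.0 / r
```

So a server delivering 5000 bytes / sec would get 5 req / min (one request
every 12 seconds), while anything at or above 60000 bytes / sec hits the
cap of 1 req / sec.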

Ideas here?

Cheers,
-Erik

-- 
				Erik Selberg
"I get by with a little help	selberg@cs.washington.edu
 from my friends."		http://www.cs.washington.edu/homes/selberg
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html
