>The problem at xxx.lanl.gov is dumb robots trying to generate
>gigabytes of postscript/pdf from physics archives for no useful
>purpose whatsoever. The real users are deprived of access when a robot
>goes nuts.
>
Good paragraph, Rob. NOW, for the first time, you have explained the
problem to the rest of us. This is a great first step.
>If all robot writers followed the guidelines and used common sense
>there would be no problem.
So far, I for one agree with you. I do recognize this fact.
> Some people on this list refuse to accept
>this simple fact.
>
I'm not one of them.
What I do not recognize is that you can reasonably expect that every
robot writer WILL follow the guidelines, just because a few of us
decided that they should. THIS is where we differ.
Therefore, I understand your problem, and recognize the simple
fact that you mention above, but I do not agree with your solution
to the problem.
Now here is another simple fact that I agree with:
>Behave responsibly and the problems stop.
Let's apply this second simple fact to xxx.lanl.gov, so that xxx.lanl.gov
will act responsibly, and just as you say, I think
that the problem will stop. Here is my suggestion on how:
1. Take out all threats of retaliation from your site.
They do nothing to stop the problem, since the robots
that cause it neither read nor understand these threats,
and quite a few humans who do understand them seem to
be offended by them. Your objective is not to be righteous,
nor to offend anyone. It is to ensure that your site will
operate smoothly and serve its legitimate users the best
way you can.
2. Take out any seek-and-destroy terrorism from your site.
Replace it with one or more trigger links that mail, to the
address given in the request's "From:" Request-Header
(if it is set by the robot), a NICE message
in which you explain the problem, just as you did in the above
paragraph, and ask them POLITELY to PLEASE comply with
REP, which will avoid the problem in the future. Give them
the reference to REP, just in case they do not know about it.
Mail a maximum of one such message per day to any one address.
Compliance with REP is VOLUNTARY, so nastiness will get you nowhere.
You need the robot writers' cooperation, and you will get
better cooperation without nastiness. Even if you personally
believe that all these "@%*& dumb ignorant robot writers deserve
all the grief for causing you all these headaches", resist
the temptation, and keep this to yourself and out of the message.
3. If the From: field is not set by the robot, mail the
same NICE nonthreatening message to root@IP-address generating
the request, Postmaster@IP-Address, and Webmaster@IP-Address.
4. Refuse connections for xx days (my suggestion: xx=1) from
any IP address that continues to offend after you have warned it in
the above manner. If you wish, and you think it is worth the trouble,
double xx after each new offense from the same site.
5. Refuse connections from any IP-address for one day, every time that
IP-address generates more than zz GETs within an hour, for whatever zz
you deem reasonable. Remember, some IP-addresses are
proxies for many physicists, so if you set zz too low you will
hurt the legitimate users that you want to serve.
6. Post a human-readable NICE warning in a prominent place
on the top of the http://xxx.lanl.gov/ page, something like
"We are sorry, but due to the large overhead of generating
pages we can serve no more than zz requests per hour to any
one site." It seems that zz = 30 is too much for you,
so set it to a lower figure. Whatever is reasonable.
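
To make steps 2, 4 and 5 concrete, here is a rough Python sketch of the
bookkeeping they imply. It is only an illustration under my own
assumptions, not anything that runs at xxx.lanl.gov: the class name
PoliteGate and its methods are made up, zz and xx become the
max_gets_per_hour and base_ban_days parameters, and for brevity it
folds the doubling from step 4 into the hourly limit from step 5.

```python
import time
from collections import defaultdict, deque

HOUR = 3600
DAY = 24 * HOUR


class PoliteGate:
    """Hypothetical sketch of steps 2, 4 and 5 (all names are assumptions)."""

    def __init__(self, max_gets_per_hour, base_ban_days=1):
        self.max_gets = max_gets_per_hour    # "zz" in the text
        self.base_ban = base_ban_days * DAY  # "xx" in the text, in seconds
        self.hits = defaultdict(deque)       # ip -> times of GETs in the last hour
        self.banned_until = {}               # ip -> time at which the ban ends
        self.offenses = defaultdict(int)     # ip -> number of bans so far
        self.last_mail = {}                  # address -> time of last notice

    def allow(self, ip, now=None):
        """Return True if a GET from this IP should be served."""
        now = time.time() if now is None else now
        if self.banned_until.get(ip, 0) > now:
            return False
        window = self.hits[ip]
        # Forget requests that are more than one hour old.
        while window and window[0] <= now - HOUR:
            window.popleft()
        window.append(now)
        if len(window) > self.max_gets:
            # Refuse for xx days, doubling xx on each repeat offense.
            self.banned_until[ip] = now + self.base_ban * 2 ** self.offenses[ip]
            self.offenses[ip] += 1
            window.clear()
            return False
        return True

    def should_mail(self, address, now=None):
        """Return True at most once per day for any one address (step 2)."""
        now = time.time() if now is None else now
        if now - self.last_mail.get(address, float("-inf")) < DAY:
            return False
        self.last_mail[address] = now
        return True
```

A server would call allow() on each GET and, on a refusal, call
should_mail() before sending the polite notice. Sending the mail
itself and choosing zz are left out on purpose.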
Act as described above, and the problem will disappear. You can stop
having to defend yourself in this forum, we can go and start
discussing another thread, no one gets offended,
and you keep control of your server no matter what requests are
generated by robots or humans.
--Steve Simon