Re: nastygram from xxx.lanl.gov

Steve Jones (stevej@gensw.com)
Wed, 10 Jul 1996 16:25:42 -0700


I know this appears to be a heated discussion and I have no
desire to be thrown into the fire, but the original poster
did ask for the opinions of the group. Maybe these observations
can suggest some compromise? There are going to be extremists
at both ends of the scale, and we are going to have to share
the same net for some time to come.

Maybe it isn't the current definition, but it would seem that,
to accommodate sites such as xxx.lanl.gov, any automation in a
position to decide whether its behavior is socially (netwise)
acceptable, whether it be a robot, a browser, or anything else
without a human brain behind it, should consult robots.txt.
Robots.txt really is the answer for anything that isn't capable
of making the kind of truly informed decision a human might make.
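
To make that concrete, here is a minimal sketch in Python of what
"consult robots.txt" means for an automated client. It is purely
illustrative; the agent name and example URL below are made up,
not taken from the thread.

    # Minimal sketch: an automated client (robot, mirroring script,
    # anything without a human at the wheel) checks robots.txt before
    # fetching. Agent name and URL are hypothetical examples.
    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://xxx.lanl.gov/robots.txt")
    rp.read()

    agent = "ExampleAgent/1.0"
    url = "http://xxx.lanl.gov/some/listing"
    if rp.can_fetch(agent, url):
        print("allowed by robots.txt, safe to request:", url)
    else:
        print("disallowed by robots.txt, leave it alone:", url)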

If a resource is presented by an organization as part of the
PUBLIC world wide web, then that organization is participating
in a larger whole, one that is not governed by any single participant.
Just as sending email to this group invites people who might
not agree with me to send their responses (good and bad), the
mere presentation of a URL space invites the rest of the web
to interact with it. If LANL's aim is peaceful coexistence
with the rest of the web, it needs to adopt the generally-accepted
robots.txt policies defined by this group. Otherwise, the rest
of the net will appear deviant to it, when in reality the deviance
is its own. In any case, retaliation is not acceptable social
behavior, and the rest of the web should not permit it.

Especially unacceptable is automatically generated retaliation,
as I gather that is what this was. I also observe that any
resource on the web should, by its very nature, be robust enough
to handle automated requests without being damaged. Sure, it is
antisocial not to abide by the robots.txt convention. However,
unacceptable loading caused by forks, etc., is largely an
internal problem. If the risk of damage to their system, or to
its performance, is so severe when a HEAD is requested every
2 minutes, can't they decide not to publish those areas to the
world, but only to their own user community instead?
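
For what it's worth, a rough back-of-the-envelope simulation shows
how the forks can pile up once each request's script takes longer
than the 2-minute arrival interval and slows down further as more
copies compete for the machine. The 2-minute interval comes from
the thread; the base runtime and slowdown factor below are assumed
numbers, not measurements from xxx.lanl.gov.

    # Assumed numbers: one HEAD arrives every 120 seconds, each one
    # forks a script that takes ~130 seconds on an idle machine and
    # runs slower for every other fork still alive. The count of live
    # forks keeps climbing instead of settling down.
    ARRIVAL_INTERVAL = 120.0   # seconds between HEAD requests (from the thread)
    BASE_RUNTIME = 130.0       # assumed script runtime on an idle machine
    SLOWDOWN_PER_FORK = 1.0    # assumed extra slowdown per concurrent fork

    def simulate(n_requests):
        finish_times = []                  # absolute finish time of each fork
        for i in range(n_requests):
            now = i * ARRIVAL_INTERVAL
            running = sum(1 for t in finish_times if t > now)
            runtime = BASE_RUNTIME * (1.0 + SLOWDOWN_PER_FORK * running)
            finish_times.append(now + runtime)
            print("request %3d: %2d forks alive" % (i, running + 1))

    simulate(20)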

Steve Jones
General Software, Inc.

At 11:17 PM 7/9/96 -0700, you wrote:
>Rob Hartill writes...
>> bolav@skiinfo.no wrote:
>>
>> >> 459 HEAD requests produce 459 server child-forks. Many of the requests
>> >> will operate in parallel. If the server has an upper limit on forking,
>> >> maybe the site will be saved; else the paging volume will be full.
>> >
>> >With a two minute delay between each HEAD?
>> >I don't think so.
>>
>> Think again.
>>
>> At 2 minute intervals xxx.lanl.gov can become overloaded.
>>
>> YOU DON'T KNOW WHAT THESE URLs DO. DON'T MAKE IGNORANT ASSUMPTIONS.
>>
>> The HEAD requests at 2 minute intervals to xxx.lanl.gov cause scripts
>> to run which take more than 2 minutes to finish. Pile them up and the
>> thing melts down.