Re: Safe Methods

Benjamin Franz (snowhare@netimages.com)
Thu, 18 Jul 1996 13:26:39 -0700 (PDT)


On Thu, 18 Jul 1996, Rob Hartill wrote:

> 'Benjamin Franz' wrote
>
> >Safe in this context means that submitting GET or HEAD requests should not
> >change the state of the server for future requests. IOW - they should
> >not have 'side effects'.
>
> Searches don't affect the state of the server, but it has been argued
> by some on this list that searches can be unsafe for robot access.
>
> >> The only reason POST is "safe" at the moment is that the bloody-minded and
> >> clueless section of the robot community are too incompetent to work out
> >> how to correctly handle POST. I'm pretty sure those morons would be doing
> >> POST too if they could figure it out.
>
> > And you are scarcely winning friends among the group of people
> >*you* are asking favors from by insulting them.
>
> Anyone offended by my comment above must be categorizing themselves
> as either "bloody-minded" or "clueless"... the comment clearly refers to
> them and not the majority of the robot community.

Fair enough.

> [irrelevant critique of my HTML deleted]
>
> I started reading this list thinking there were two types of
> robot owner. The responsible type and the type going through the
> learning process and moving towards responsibility as they learn the
> ropes. I now see that there's a 3rd type.. those that know about
> guidelines, exclusion protocols, and the problems that robots *might*
> cause to servers, but they've chosen to attempt to bend the 'rules'
> to serve their own purpose.
>
> Talking to this 3rd group is pointless, and I'm going to resist any
> further flame-bait from these people. Irresponsible actions from those
> who *should* know better will continue to give the robot community a
> bad name. I know some of you are doing a good job, it's a shame you have
> to put up with the others.

Of course there is also a fourth group - those who assume that everyone
who disagrees with them regarding robots is an irresponsible robot owner
instead of, say, a site admin with an interest in robots from the
standpoint of optimizing their sites' interactions with robots.
Guess what.

You like to bluster and insult people, and make demands of a lot of other
people for your own convenience without bothering to do your homework
first - but seem remarkably unwilling to concede that you may have to
change some of your own behaviors to co-exist with robots comfortably. The
HTML critique was a way of saying 'You've got a mighty big beam in your
own eye to be making so much noise about the motes in everyone else's.'

REP is a *voluntary* standard. Any proposal for robots identifying
themselves will be voluntary. Your site *WILL* be hit by robots that fail
to respect robots.txt. It is simple common sense that if your
configuration can't deal with the occasional rogue robot you should change
the site to handle it. This is not a moral issue - it is simple
practicality if they are that much of a problem for your site.

An example of this kind of issue is a site I built for a customer that
prominently stars nude women. Some sites started linking directly to the
handful of publicly available images. This added about 40% to the system
load to absolutely no benefit for the site since there was no hyperlink to
the text of the site when they did this. My solution was to create a
script that moves the images every night and add a line to the server
config to issue redirects to the site home page when non-existent URLs
were requested. Now these previously burdensome external links became free
advertising. You don't argue with the wind - you build windmills. You
*can't* stop robots from trying to traverse your site. You *can* adapt
your site to co-exist with them - and even exploit them.
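
For anyone who wants to try the same trick, here is a minimal sketch of the idea in Python. It is not the script I actually run: the function names `rotate_image_dir` and `resolve`, the `img-` directory naming, and the random token are all hypothetical, and in practice the redirect is a line in the server config rather than application code. It only shows the two moving parts: rename the image directory on a schedule, and answer any request for a path that no longer exists with a redirect to the home page.

```python
import os
import secrets


def rotate_image_dir(site_root, old_name, token=None):
    """Rename site_root/old_name to a fresh name and return the new name.

    Intended to be run nightly (e.g. from cron). The "img-" prefix and the
    random token are illustrative assumptions, not the original scheme.
    """
    new_name = "img-" + (token or secrets.token_hex(4))
    os.rename(os.path.join(site_root, old_name),
              os.path.join(site_root, new_name))
    return new_name


def resolve(site_root, url_path, home="/"):
    """Return (status, location) for a requested URL path.

    Existing files get 200; anything else, including stale deep links to
    yesterday's image directory, gets a 302 redirect to the home page.
    """
    root = os.path.normpath(site_root)
    full = os.path.normpath(os.path.join(root, url_path.lstrip("/")))
    inside_root = full.startswith(root + os.sep)
    if not inside_root or not os.path.isfile(full):
        return 302, home
    return 200, url_path
```

The site's own pages would be regenerated with the new directory name after each rotation, so internal links keep working while external deep links land on the home page.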

You are preaching to the choir here about obeying REP for web traversing
bots. AFAIK, no one here has a vested interest in web traversing bots that
*don't* obey REP. Others here and I made suggestions on how you could
protect yourself better and even improve your site's accessibility from the
web. You have rejected them, apparently in the belief that somehow the
people here can 'kiss and make better' the rogue robot problem without
your having to make changes in how your site operates. We can't. You
won't. What else is left to say?

-- 
Benjamin Franz
"There are two groups of people in the world.
 Those who divide people into two groups, and those who don't."