The overhead is peanuts compared to the network traffic that'll be saved
when robots... add a few bytes and save a few megabytes.
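For illustration, here's roughly what a request might look like if a robot
sent the proposed header (the value syntax is my own guess at a possible
format, not anything that's been agreed):

  GET /index.html HTTP/1.0
  User-Agent: ExampleCrawler/1.0
  Robots: version=1; owner=admin@example.org

That one extra line costs around forty bytes per request, while a server
that recognises it can refuse or trim the response and save the entire body.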
>Poorly
>written/run robots that do not follow REP now are not going to issue a
>'Robots:' HTTP header, either.
True, but it makes it easier to improve a rogue robot without needing
someone intelligent to operate it.
>Additionally - it would increase abuse
>attempts by servers to 'lie to the robots just to improve my search
>position' by allowing servers to more easily serve something *different*
>to robots than it actually serves to people.
So it's a bad idea to add a system that limits abuse of servers, just in
case robot-using indexers get abused? That's not a convincing argument to the
suffering people whose existence justifies robots in the first place.
>It happens now - but not as
>much because the protocol doesn't provide a *direct* way to identify all
>robots (they can, of course, key on the User-Agent or IP, but it requires
>more work on their part).
So you're happy to keep making it difficult (impractical) to detect a
robot from the server side, so that your life as a robot owner is made
easier. That's just selfish.
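To make the "more work" concrete, here's a minimal sketch of the difference
(Python; the header name comes from the proposal above, the bot substrings
are made up for illustration):

  KNOWN_BOT_SUBSTRINGS = ("crawler", "spider", "robot")

  def is_robot(headers):
      # With the proposed header: one unambiguous check.
      if "Robots" in headers:
          return True
      # Today's reality: heuristic User-Agent matching, which a
      # rogue robot dodges simply by changing its string.
      ua = headers.get("User-Agent", "").lower()
      return any(s in ua for s in KNOWN_BOT_SUBSTRINGS)

  print(is_robot({"User-Agent": "ExampleCrawler/1.0"}))  # True, by luck
  print(is_robot({"Robots": "version=1"}))               # True, directly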
The robot owners can trade lists of sites that abuse their service
now and in the future. Those sites can be avoided as punishment.
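The bookkeeping on the robot side is trivial; a sketch, assuming a shared
one-hostname-per-line list format (my invention, trade it however you like):

  ABUSIVE_SITES = set()

  def load_abuse_list(path):
      # One hostname per line; blanks and "#" comments ignored.
      with open(path) as f:
          ABUSIVE_SITES.update(
              line.strip() for line in f
              if line.strip() and not line.startswith("#"))

  def should_crawl(host):
      # Punish known abusers by skipping them entirely.
      return host not in ABUSIVE_SITES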
>Lastly, the place to bring up HTTP protocol changes is not the robots list
>or the Apache list, but the IETF HTTP-WG. It might have interactions with
>the rest of HTTP that require working out.
As an Apache developer, I can suggest whatever I want to the others. As
a member of this list, I think I can do the same here. Roy Fielding and others
from the HTTP camp are on one or both of these lists anyway.
It's probably pointless to suggest it to the HTTP-WG if people like you are
going to be selfish and ignore it.
rob
--
Rob Hartill (robh@imdb.com)
The Internet Movie Database (IMDb)   http://www.imdb.com/
...more movie info than you can poke a stick at.