>What I am proposing is that we re-evaluate the reasoning behind
>robots.txt. The proposals I have seen in this list seem to rely on the
>assumption that robots.txt is enforcible when it is quite clearly not.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>These servers are going to have to learn how to
>prioritise different types of clients based on previous access patterns.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If there were a uniform way to identify a robot/agent, then enforcing
robots.txt would be much easier.
:-)
What doesn't help is having had to add "spYder" (yesterday) to a list
of USER-AGENTs that can't be trusted to follow robots.txt, even though it
was designed to.
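Something like the rough sketch below is all it would take server-side,
*if* agents identified themselves consistently. The names in the blocklist
are just examples, not a real list, and the thresholds are mine:

    # Sketch: refuse requests whose User-Agent matches a locally
    # maintained blocklist of robots known to ignore robots.txt.
    UNTRUSTED_AGENTS = ["spYder", "CyberSpyder", "NetJet"]

    def is_untrusted(user_agent):
        """True if the User-Agent contains any blocklisted substring."""
        ua = user_agent.lower()
        return any(bad.lower() in ua for bad in UNTRUSTED_AGENTS)

    def handle_request(user_agent, path):
        """Return the HTTP status we'd answer with (403 = refused)."""
        if is_untrusted(user_agent):
            return 403   # won't follow robots.txt, won't get served
        return 200       # otherwise serve normally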
-=-=-=
other rants...
"CyberSpyder" turned out to be a "content suitability" checker that blew
up on HTTP/1.1 responses.
"NetJet" is going to be a big problem in the future if it isn't forced to
behave. If you have a popular server, chances are it's already being spammed
from all over the world by NetJet users who have no idea that there are
a thousnad other users thinking exactly the same "I'll grab everything
while I sleep and refresh all those pages I didn't get around the reading
yesterday too".
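The quoted suggestion above -- prioritise clients based on previous access
patterns -- doesn't need anything fancy. A rough sketch of throttling any
client that hammers the server, with invented numbers:

    # Sketch: throttle any client that has made more than MAX_REQUESTS
    # requests in the last WINDOW seconds. Thresholds are made up.
    import time
    from collections import defaultdict, deque

    WINDOW = 60          # seconds
    MAX_REQUESTS = 30    # requests allowed per client per window
    history = defaultdict(deque)   # client id -> recent request times

    def should_throttle(client_id):
        now = time.time()
        q = history[client_id]
        while q and now - q[0] > WINDOW:
            q.popleft()            # forget requests outside the window
        q.append(now)
        return len(q) > MAX_REQUESTS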
There was an outcry when Netscape first released the concurrent download
feature (which is incredibly inefficient for HTTP/1.1 servers that support
multiple requests per connection). NetJet and other "accelerators" are
far, far worse.
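For comparison, this is roughly what a well-behaved HTTP/1.1 client looks
like -- several requests down one persistent connection instead of one
connection per URL. Host and paths are placeholders:

    # Sketch: reuse a single persistent HTTP/1.1 connection for
    # several requests, instead of opening a connection per URL.
    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    for path in ["/", "/a.html", "/b.html"]:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()   # drain the body before reusing the connection
        print(path, resp.status, len(body))
    conn.close()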
Offline browsing is one thing; providing Joe Average with a tool that
lets him download tens of thousands of URLs from a single site while
he sleeps at night or mows the lawn is a recipe for disaster. Each Joe
Average thinks he's doing the net a favour because he believes the lies
he's been fed by the folks who sold him his net-death-accelerator.
rob
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html