> Out of approximately 160,000 discrete sites visited by fido (the robot
>for PlanetSearch), roughly 1.5% of the sites have a robots.txt
>file.
Interesting. Not surprising though.
> My feeling is that until the creation and maintenance of
>robots.txt files is automated, only people who truly understand the
>implications of robots (or whose sites are pounded mercilessly by
>ill-behaved robots) will use them.
More important than automation (I'd say) would be to make more
server maintainers aware of robots.txt and its use.
The current talk about regexp support sounds like an attempt to build
a giant sledgehammer when next to nobody knows there are nuts worth cracking.
> If Netscape, Apache, etc. provided a simple-to-use tool for
>restricting access, etc., then used that to generate a robots.txt file,
>I expect we'd see more of them in use, and more that are syntactically
>correct.
If you (anyone out there) can come up with an example robots.txt that
you'd like to see distributed with Apache, then I'll see what can be
done about adding it. Perhaps some commented-out examples, plus a default
example to deny access to "/cgi-bin/" and a pointer to the online docs?
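
A minimal sketch of what such a default file might look like (this is
only an illustration of the idea, not an official Apache example; the
robot name "SomeBot" is made up):

```
# Example robots.txt for distribution with a web server.
# Each record names one or more robots (User-agent) and the
# URL prefixes they should not retrieve (Disallow).

# Default: keep all robots out of the CGI directory.
User-agent: *
Disallow: /cgi-bin/

# Commented-out examples:
#
# Exclude all robots from the entire server:
# User-agent: *
# Disallow: /
#
# Exclude one particular robot by name:
# User-agent: SomeBot
# Disallow: /
```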
--
Rob Hartill. Internet Movie Database Ltd. http://www.imdb.com/
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send
mail to robots-request@webcrawler.com with the word "unsubscribe" in the
body. For more info see
http://info.webcrawler.com/mak/projects/robots/robots.html