However, where I may disagree with Rob (and probably others on this
list) is whether MetaCrawler should fall under the robots.txt standard.
MetaCrawler does not run autonomously, sucking up whatever it finds. It
simply verifies that pages are available and contain valid data when
instructed by the user, with the expectation that the user will then
visit some of those pages in due course. Therefore, it is unclear to me
whether robots.txt is appropriate here; if a person is able to get to a
page protected by robots.txt, shouldn't that person be able to run an
agent that determines whether the page has good data BEFORE showing it
to the user?
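For concreteness, here's a minimal sketch of the check such an agent
would have to perform if we decided robots.txt DOES apply. It assumes
Python's standard urllib.robotparser module, and "MetaCrawler/1.0" is
just a placeholder user-agent string, not our actual one:

    # Ask the host's robots.txt whether a given user-agent may fetch
    # a page. "MetaCrawler/1.0" is a hypothetical agent name.
    from urllib.parse import urlsplit
    from urllib.robotparser import RobotFileParser

    def may_verify(page_url, agent="MetaCrawler/1.0"):
        parts = urlsplit(page_url)
        rp = RobotFileParser()
        rp.set_url("%s://%s/robots.txt" % (parts.scheme, parts.netloc))
        rp.read()  # fetch and parse the host's robots.txt
        return rp.can_fetch(agent, page_url)

The open question is whether an agent acting on a direct user request
should make this check at all, or simply treat itself as an extension
of the user's browser.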
I suspect this issue will need to be addressed before too long; from
what some folks at MS tell me, IE 4.0 will have a "check favorites for
changes" option, and I suspect there are tons of other agents in
planning or on the shelves that will be doing the same thing. (I've
read that EchoSearch downloads pages just as MetaCrawler does, but it
does so ALL THE TIME, whereas in MetaCrawler it's a user-enabled
feature, turned on about 5% of the time.) Should browser-assisting
agents follow robots.txt? Should the RFC say something about them?
Should there be an entirely different standard?
I can't say I know either way, and I can see arguments on both sides.
Other comments?
Cheers,
-Erik
--
Erik Selberg                                "I get by with a little help
selberg@cs.washington.edu                    from my friends."
http://www.cs.washington.edu/homes/selberg