Re: Washington again !!!

Erik Selberg (selberg@cs.washington.edu)
19 Nov 1996 19:37:31 -0800


I just want to clarify one thing: we're talking about two different
issues with respect to the MetaCrawler. The first is rapid-firing;
this is clearly a bug, it will be fixed, and the NETbot folks have
assured me that further safeguards will be in place so this doesn't
happen again. I'm absolutely NOT defending the lack of these
safeguards.
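
To be concrete about what I mean by a safeguard: something as simple
as a per-host minimum delay is enough to prevent rapid-firing. Here's
a rough sketch in Python, purely illustrative and NOT the
MetaCrawler's actual code; the delay value and names are my own
assumptions:

    import time
    import urllib.parse

    MIN_DELAY = 10.0   # assumed minimum seconds between hits to one host
    _last_hit = {}     # host -> time of the most recent request

    def polite_fetch(url, fetch):
        # Sleep as needed so the same host is never hit more often
        # than once every MIN_DELAY seconds, then do the real fetch.
        host = urllib.parse.urlparse(url).netloc
        wait = MIN_DELAY - (time.time() - _last_hit.get(host, 0.0))
        if wait > 0:
            time.sleep(wait)
        _last_hit[host] = time.time()
        return fetch(url)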

However, where I may disagree with Rob (and probably others on this
list) is whether the MetaCrawler should fall under the robots.txt
standard. The
MetaCrawler does not run autonomously and suck up whatever it
finds. It simply verifies that pages are available and contain valid
data when instructed by the user, with the intent that users will then
be visiting some of those pages in due course. Therefore, it is
unclear to me whether robots.txt is appropriate; if a person is able to get
to a page protected by robots.txt, shouldn't that person be able to run
an agent to determine if that page has good data BEFORE the agent
shows it to the user?
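
For what it's worth, the check itself would be cheap. Here's a rough
sketch in Python of what honoring robots.txt before a verification
fetch might look like (purely illustrative; the agent token and
function name are made up, and it leans on a standard robots.txt
parser module):

    import urllib.parse
    import urllib.robotparser

    AGENT = "MetaCrawler-verify"   # hypothetical user-agent token

    def allowed_by_robots(url):
        # Fetch and parse the target host's robots.txt, then ask
        # whether this agent may retrieve the given URL.
        parts = urllib.parse.urlparse(url)
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url("%s://%s/robots.txt" % (parts.scheme, parts.netloc))
        rp.read()
        return rp.can_fetch(AGENT, url)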

I suspect this issue should be addressed before too long; from what
some folks at MS tell me, IE 4.0 will have the "check favorites for
changes" option, and I suspect there are tons of other agents in
planning or on shelves that will be doing this same thing. (I read
that EchoSearch downloads pages the same way MetaCrawler does, but it
does it ALL THE TIME, whereas with MetaCrawler it's a user-enabled
feature, turned on about 5% of the time.) Should browser-assisting
agents follow robots.txt? Should the RFC include provisions for them?
Should there be an entirely different standard?
I can't say I know either way, and I can see arguments on both sides.
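
To make the change-checking idea concrete, here's a rough sketch of
how such an agent could poll a favorite cheaply with a conditional
GET, so unchanged pages cost almost nothing (again Python, again
purely illustrative; the function and header handling are my own
assumptions, not IE's or EchoSearch's actual behavior):

    import urllib.request
    import urllib.error

    def changed_since(url, last_modified):
        # Ask the server for the page only if it changed after the
        # timestamp we recorded last time (HTTP If-Modified-Since).
        req = urllib.request.Request(
            url, headers={"If-Modified-Since": last_modified})
        try:
            with urllib.request.urlopen(req) as resp:
                return True, resp.headers.get("Last-Modified", last_modified)
        except urllib.error.HTTPError as err:
            if err.code == 304:   # 304 Not Modified: nothing to re-fetch
                return False, last_modified
            raise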

Other comments?

Cheers,
-Erik

-- 
				Erik Selberg
"I get by with a little help	selberg@cs.washington.edu
 from my friends."		http://www.cs.washington.edu/homes/selberg
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html