Re: defining "robot"

Brian Clark (bclark@radzone.org)
Mon, 25 Nov 96 16:59:31 -0500


-- [ From: Brian Clark * EMC.Ver #2.5.02 ] --

Erik wrote:

>
>m.koster@webcrawler.com (Martijn Koster) writes:
>
>> Of course, if all your sources are search-engines generated by robots
>> that do /robots.txt, you should never get a link that violates it >:-))
>
>
>You'd think that were true, wouldn't you? Seems folks on the
>list don't agree for some silly reason... :)

Well, I'd be one of those people who don't agree. My silly reason is the
fact that I can go to most of the search engines and find links to websites
that haven't existed in over a year, which shows they aren't very aggressive
about removing dead links. Now imagine that, the day after a search engine's
spider comes through indexing a site, the webmaster changes their mind about
spiders' right to access a directory. How long before all of the search
engines reflect that change?

Assuming that the search engines' results are somehow "gospel", and thereby
excusing a spider maintainer from having to follow REP (the Robots Exclusion
Protocol), is just a way of inheriting their errors -- unless you are only
summarizing their results. If you are going to send a spider (or any other
kind of robot) out to a website, you *should* check for an appropriate
robots.txt. You might stand on higher philosophical ground than someone like
ActiveAgent when arguing that you needn't, but with Alta Vista updating its
database every two months at best, skipping the check takes a lot of control
out of a webmaster's hands.
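A minimal sketch of the check argued for above. It uses Python, an arbitrary choice since the thread contains no code. The idea is to consult the site's *current* robots.txt at crawl time instead of trusting that links harvested from a search engine are still permitted. The robots.txt contents, the example.com URLs, and the agent name `ExampleSpider` are all hypothetical. A real spider would fetch http://host/robots.txt (for instance with `RobotFileParser.set_url()` and `read()`) rather than parse a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt the webmaster published *after* a search
# engine last indexed the site: /private/ is now off limits.
CURRENT_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(url: str, robots_txt: str, agent: str = "ExampleSpider") -> bool:
    """Return True if the given robots.txt text allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical links taken from a (possibly stale) search-engine index.
candidate_links = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
]

if __name__ == "__main__":
    for link in candidate_links:
        verdict = "fetch" if may_fetch(link, CURRENT_ROBOTS_TXT) else "skip"
        print(link, "->", verdict)
```

Run as-is, it reports "fetch" for the first link and "skip" for the second. The stale index still lists the /private/ page, but the current robots.txt excludes it, and that exclusion is exactly what a spider relying only on search-engine results would miss.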

Brian

--

------------------------------------------------------------------
Brian Clark - President - bclark@radzone.org
http://www.radzone.org/gmd/
------------------------------------------------------------------
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send
mail to robots-request@webcrawler.com with the word "unsubscribe" in the
body. For more info see http://info.webcrawler.com/mak/projects/robots/robots.html