Theo Van Dinter wrote:
> Sounds like a good concept, but what about the people who don't believe in
> relative URLs?
Let me rephrase it then: 'Most commercial engines will only follow links
that share the same BASE URL as the current URL.'
A good robot program would deduce that, if it's visiting
http://www.foo.com and it encounters <A
HREF="http://www.foo.com/foo.html"> vs. <A HREF="foo.html">, the two
point to the same page and both are links within the same site.
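Here's a minimal Python sketch of that check. It resolves each HREF against the page it was found on with urllib.parse.urljoin, then compares hosts. It assumes "same site" means same scheme and host name, so www.foo.com and foo.com count as different sites. The function names resolve_link and same_site are just illustrative.

```python
from urllib.parse import urljoin, urlsplit


def resolve_link(page_url: str, href: str) -> str:
    """Turn an HREF (relative or absolute) into an absolute URL."""
    return urljoin(page_url, href)


def same_site(page_url: str, href: str) -> bool:
    """True if href, once resolved against page_url, is on the same site.

    Assumption: "same site" = same scheme and same host (netloc, which
    also includes any explicit port), compared case-insensitively.
    """
    page = urlsplit(page_url)
    target = urlsplit(resolve_link(page_url, href))
    return (page.scheme.lower(), page.netloc.lower()) == (
        target.scheme.lower(),
        target.netloc.lower(),
    )


# Both forms of the link from the example resolve to the same URL.
print(resolve_link("http://www.foo.com/", "foo.html"))  # http://www.foo.com/foo.html
print(same_site("http://www.foo.com/", "foo.html"))  # True
print(same_site("http://www.foo.com/", "http://www.rfi.fr/"))  # False
```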
Granted, you could write a program to drill down through other links you
find. You could, say, put a "depth limit" on how far it would go (to
avoid the Yahoo problem). But the idea is to only go where you are
invited. So if the robot is welcome at the other sites it finds, it would
already have them in its database. In the original case, Lycos probably
isn't stopping by http://www.rfi.fr because it wasn't invited.
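Here's a rough Python sketch of that kind of crawl. It works breadth-first, stays on the starting host, and stops expanding links at a depth limit. The LINKS table is a hypothetical stand-in for fetched pages, so the example runs without touching the network. The names crawl, get_links and max_depth are illustrative, not any engine's actual code.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urljoin, urlsplit

# Hypothetical link graph standing in for fetched pages: page URL -> HREFs on it.
LINKS = {
    "http://www.foo.com/": ["foo.html", "http://www.rfi.fr/"],
    "http://www.foo.com/foo.html": ["bar.html"],
    "http://www.foo.com/bar.html": ["baz.html"],
    "http://www.foo.com/baz.html": [],
}


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int) -> List[str]:
    """Breadth-first crawl that stays on start_url's host and stops at max_depth."""
    start_host = urlsplit(start_url).netloc.lower()
    seen = {start_url}
    order = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # depth limit reached: keep the page, don't follow its links
        for href in get_links(url):
            target = urljoin(url, href)  # resolve relative HREFs
            if urlsplit(target).netloc.lower() != start_host:
                continue  # off-site link: not invited, so skip it
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


print(crawl("http://www.foo.com/", lambda u: LINKS.get(u, []), max_depth=2))
```

With max_depth=2 this visits the root, foo.html and bar.html. It skips the off-site link to www.rfi.fr and never reaches baz.html.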
-- 
==================================================================
Mitch Allen                  WebDroid
http://www.webdroid.com      P.O. Box 6569
mailto:mitch@webdroid.com    Boston, MA. 02114
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send
mail to robots-request@webcrawler.com with the word "unsubscribe" in the
body. For more info see http://info.webcrawler.com/mak/projects/robots/robots.html