Depth-first traversal hits the same site over and over again. It's
considered friendlier to build a list of the sites you wish to visit and
then visit them in turn, one page at a time. That way, the load from
your robot is spread out over time.
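
Roughly, I mean something like this Python sketch (the per-site queue
layout, the fetch-and-discard body, and the one-second pause are all
placeholder choices of mine, not anything standard):

    import time
    import urllib.request
    from collections import deque

    def crawl_round_robin(site_queues, delay=1.0):
        """Visit one page per site per pass.

        site_queues maps a hostname to a deque of URLs still to fetch.
        Rotating across sites means no single server ever sees
        back-to-back requests from this robot.
        """
        while any(site_queues.values()):
            for host, queue in site_queues.items():
                if not queue:
                    continue
                url = queue.popleft()
                with urllib.request.urlopen(url) as resp:
                    resp.read()       # page body; parse it for links here
                time.sleep(delay)     # brief pause before the next request
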
Robots have been observed to render a server helpless (and useless) by
issuing rapid-fire requests. The server spends so much time trying to
keep up with responses to that one robot that it ceases to be able to
serve "real" clients.
Also, some sites are "infinite" in depth, in that they generate content
on the fly: the path portion of the URL doesn't refer to filesystem
entities but to database records or other data. A robot that traverses
depth-first could get trapped in such a site and never get out. Going
breadth-first avoids the trap, although the robot still might never run
out of URLs to request from that site.
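
So in practice a breadth-first walk also wants a depth cap and a
per-site page budget. A rough sketch (get_links, which would fetch a
page and return the URLs found on it, and the specific limits are
assumptions of mine for illustration):

    from collections import deque

    def crawl_breadth_first(seed, get_links, max_depth=5, page_budget=100):
        """Breadth-first walk with a depth cap and a page budget.

        A dynamically generated ("infinite") site can mint new URLs
        forever, so the budget is what actually ends the crawl.
        """
        seen = {seed}
        queue = deque([(seed, 0)])
        fetched = 0
        while queue and fetched < page_budget:
            url, depth = queue.popleft()
            fetched += 1
            if depth >= max_depth:
                continue              # don't expand links any deeper
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
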
Hope that helps.
Steve
Steve DeJarnett Internet: steved@pmc.philips.com
Philips Multimedia Center PhoneNet: 415-846-4420
"The difference between fiction and reality is that fiction has to make sense."
-- Tom Clancy