Re: depth first vs breadth first

David Eichmann (eichmann@ricis.cl.uh.edu)
Thu, 29 Aug 1996 23:22:14 -0800


-->Robert Nicholson wrote:
-->>
-->>Hi, I was having a discussion with a work collegue about robots and
-->>implementing them in a depth or breadth first fashion.
-->>
-->>I seem to remember an early version of the "Guidelines for Robot Writers"
-->>page discouraging depth first and I'm wondering why.
-->>
-->>Would somebody please explain to me why it is "unfriendly" to implement
-->>a depth first traversal. I know this effects the stack locally but I don't
-->>quite appreciate how it effects the remote sites I'm visiting.
-->>
-->>NOTE: I don't intend to write a robot I'm just interested in the area :-)
-->
--> Depth-first traversals hit the same site over and over again. It's
-->considered more-friendly to develop a list of sites you wish to visit
--and then
-->visit them a page-at-a-time in turn. That way, the load from your robot is
-->spread out over time.
-->
--> Robots have been observed to take a server and render it helpless (and
-->useless) by issuing rapid-fire requests. The server spends so much time
--trying
-->to keep up with responses to that robot that it ceases to be able to serve
-->"real" clients.
-->
--> Also, some sites are "infinite" in depth, in that they generate
-->content on the fly, and the path portion of the URL doesn't really refer to
-->filesystem entities but to database records or other data. A robot
--could get
-->trapped in a site and never get out if it traversed depth-first. By going
-->breadth-first, this won't be a problem, although the robot might not
--ever run
-->out of URLs to request from that site.
-->
--> Hope that helps.
-->
--> Steve
-->
-->Steve DeJarnett Internet: steved@pmc.philips.com
-->Philips Multimedia Center PhoneNet: 415-846-4420
-->"The difference between fiction and reality is that fiction has to make
--sense."
--> -- Tom Clancy

-----------
David Eichmann
RBSE Director of R & D Web: http://ricis.cl.uh.edu/eichmann/
Chair, Software Engineering Program Phone: (713) 283-3875
University of Houston - Clear Lake fax: (713) 283-3869
2700 Bay Area Blvd. Email: eichmann@rbse.jsc.nasa.gov
Houston, TX 77058 or: eichmann@cl.uh.edu
RBSE on the Web: http://rbse.jsc.nasa.gov/rbse/