Re: depth first vs breadth first

steved@pmc02.pmc.philips.com
Thu, 29 Aug 1996 15:23:14 -0700 (PDT)


Robert Nicholson wrote:
>
>Hi, I was having a discussion with a work colleague about robots and
>implementing them in a depth or breadth first fashion.
>
>I seem to remember an early version of the "Guidelines for Robot Writers"
>page discouraging depth first and I'm wondering why.
>
>Would somebody please explain to me why it is "unfriendly" to implement
>a depth first traversal. I know this affects the stack locally but I don't
>quite appreciate how it affects the remote sites I'm visiting.
>
>NOTE: I don't intend to write a robot I'm just interested in the area :-)

Depth-first traversals hit the same site over and over again. It's
considered friendlier to develop a list of sites you wish to visit and then
visit them one page at a time, in turn. That way, the load your robot places on each server is
spread out over time.
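For illustration only, here's a rough Python sketch of that one-page-per-site,
in-turn idea (the site names and the fetch() helper are placeholders I made up,
not part of any real robot):

    from collections import deque

    def fetch(url):
        # Placeholder for an HTTP GET; a real robot would use an HTTP client
        # here and return the links found on the fetched page.
        print("fetching", url)
        return []

    # One FIFO queue of pending pages per site.
    queues = {
        "site-a.example": deque(["http://site-a.example/"]),
        "site-b.example": deque(["http://site-b.example/"]),
    }

    # Visit the sites round-robin: one page from each site per pass, so no
    # single server sees a burst of back-to-back requests.
    while any(queues.values()):
        for site, pending in queues.items():
            if pending:
                url = pending.popleft()
                for link in fetch(url):
                    pending.append(link)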

Robots have been observed to take a server and render it helpless (and
useless) by issuing rapid-fire requests. The server spends so much time trying
to keep up with responses to that robot that it ceases to be able to serve
"real" clients.

Also, some sites are "infinite" in depth, in that they generate
content on the fly, and the path portion of the URL doesn't really refer to
filesystem entities but to database records or other data. A robot could get
trapped in such a site and never get out if it traversed depth-first. Going
breadth-first avoids that trap, although the robot still might never run
out of URLs to request from that site.
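A simple guard against that is to budget how many pages you'll take from any
one site. A rough sketch (the 500-page limit is an arbitrary number for
illustration):

    from urllib.parse import urlparse

    MAX_PAGES_PER_SITE = 500   # assumed per-site page budget
    pages_fetched = {}         # host -> pages already fetched from it

    def should_fetch(url):
        # Skip URLs from a host once its page budget is used up.
        host = urlparse(url).netloc
        count = pages_fetched.get(host, 0)
        if count >= MAX_PAGES_PER_SITE:
            return False
        pages_fetched[host] = count + 1
        return True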

Hope that helps.

Steve

Steve DeJarnett Internet: steved@pmc.philips.com
Philips Multimedia Center PhoneNet: 415-846-4420
"The difference between fiction and reality is that fiction has to make sense."
-- Tom Clancy