Re: New robot turned loose on an unsuspecting public... and a DNS question

Thomas Maslen (tmaslen@Verity.COM)
Thu, 30 Nov 1995 17:40:32 -0800


> What do other robot writers do about name resolution?

In our case... cache the results of lookups so that we only do the
gethostbyname("foo") once for any particular "foo". This still gives pretty
evil behaviour on, say, a page of links to cool places where almost every
link points to a different host (every link is a cache miss, so each one
costs a fresh lookup), but the average behaviour is much better than not
caching.
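
A rough sketch of that kind of cache, in Python rather than C. The class
name, the injectable "lookup" parameter, the lower-casing of names and the
caching of failed lookups are all assumptions made for illustration, not a
description of what our robot actually does:

```python
import socket


class CachingResolver:
    """Remember lookup results so each distinct hostname is resolved once."""

    def __init__(self, lookup=socket.gethostbyname_ex):
        # 'lookup' is injectable (hypothetical parameter) so the cache can
        # be exercised without touching the network.
        self._lookup = lookup
        self._cache = {}

    def addresses(self, hostname):
        # Hostnames are case-insensitive; a trailing dot only marks the
        # name as fully qualified, so fold both away before caching.
        key = hostname.lower().rstrip(".")
        if key not in self._cache:
            try:
                _name, _aliases, addrs = self._lookup(key)
                self._cache[key] = frozenset(addrs)
            except OSError:
                # Assumption: remember failures too, so a dead name is
                # not looked up again for every link that mentions it.
                self._cache[key] = frozenset()
        return self._cache[key]
```

A long-running robot would presumably also want to expire entries after a
while, which this sketch does not attempt.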

Also, if you're looking for a canonical representation for hosts so that you
can test "is this host the same as that one?", I'd suggest that you _not_
try matching the hostnames: rather, do the gethostbyname() on each name and
then look for an intersection in the sets of IP addresses it returns (but be
prepared to rewrite the code next year to deal with IPv6 addresses!). In
other words, the canonical representation for a host should be the set of
IP addresses, not the hostname strings.
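
Something along these lines, again sketched in Python. The function names
and the injectable "lookup" parameter are made up for the example, and
socket.gethostbyname_ex() only returns IPv4 addresses, so an IPv6-aware
version would need socket.getaddrinfo() instead:

```python
import socket


def host_addresses(hostname, lookup=socket.gethostbyname_ex):
    """Return the set of IP addresses for a name (empty if it won't resolve)."""
    try:
        # gethostbyname_ex returns (canonical_name, aliases, address_list).
        return frozenset(lookup(hostname)[2])
    except OSError:
        return frozenset()


def same_host(a, b, lookup=socket.gethostbyname_ex):
    """Treat two names as the same host if their address sets intersect."""
    # An unresolvable name has an empty set, so it never matches anything.
    return bool(host_addresses(a, lookup) & host_addresses(b, lookup))
```

For example, same_host("www.example.com", "example.com") would be true
whenever the two names share at least one address, however differently the
hostname strings are spelled.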

Thomas
tmaslen@verity.com My opinions, not Verity's