New robot turned loose on an unsuspecting public... and a DNS question

Skip Montanaro (skip@automatrix.com)
Thu, 30 Nov 1995 16:15:21 -0500


No, it's not really another Godzilla movie. I started running the Musi-Cal
Robot today. It has the following properties:

1. Understands (and obeys!) the robots.txt protocol.

2. Doesn't revisit the same server more than once every 10 minutes.

3. Doesn't revisit the same URL more than once per month.

4. Only groks HTTP URLs at the moment.

5. Announces itself in requests as "Musi-Cal-Robot/0.1".

6. Gives my email ("skip@calendar.com") in the From: field of the
request.

7. It's looking for music-related sites, so you may never see it.

8. The HTML parser I'm using is rather slow, which helps avoid
network congestion.

9. You should only ever see it running from dolphin.automatrix.com,
a machine connected via 28.8k modem - again, a fine
network/server congestion avoidance tool.

10. It randomizes its list of outstanding URLs after every pass
through the list to minimize beating up a single server.
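
For anyone wanting to try something similar, here is a rough Python sketch
of how the properties above might fit together. It is not the Musi-Cal
Robot's actual code: the PoliteQueue class, its method names, and the
helper functions are hypothetical, "once per month" is assumed to mean
30 days, and the robots.txt lines are assumed to have been fetched
already.

```python
import random
import time
import urllib.request
import urllib.robotparser
from urllib.parse import urlsplit

USER_AGENT = "Musi-Cal-Robot/0.1"    # item 5
FROM_ADDR = "skip@calendar.com"      # item 6
SERVER_DELAY = 10 * 60               # item 2, in seconds
URL_DELAY = 30 * 24 * 60 * 60        # item 3; "month" assumed to be 30 days
NEVER = float("-inf")                # timestamp for "not visited yet"


def allowed(robots_lines, url):
    """Item 1: check a URL against already-fetched robots.txt lines."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(USER_AGENT, url)


def build_request(url):
    """Items 5 and 6: identify the robot and its owner in every request."""
    return urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT, "From": FROM_ADDR})


class PoliteQueue:
    """Hypothetical scheduler enforcing items 2, 3, 4 and 10."""

    def __init__(self, urls, clock=time.time):
        self.pending = list(urls)
        self.clock = clock
        self.last_server_hit = {}    # host name -> time of last request
        self.last_url_hit = {}       # URL -> time of last request

    def due(self, url):
        parts = urlsplit(url)
        if parts.scheme != "http":   # item 4: HTTP only
            return False
        now = self.clock()
        if now - self.last_server_hit.get(parts.hostname, NEVER) < SERVER_DELAY:
            return False
        return now - self.last_url_hit.get(url, NEVER) >= URL_DELAY

    def one_pass(self):
        """Return the URLs that may be fetched now, then reshuffle."""
        ready = []
        for url in self.pending:
            if self.due(url):
                now = self.clock()
                self.last_server_hit[urlsplit(url).hostname] = now
                self.last_url_hit[url] = now
                ready.append(url)
        random.shuffle(self.pending)  # item 10
        return ready
```

The clock is passed in so the timing rules can be exercised without
waiting ten minutes; a real robot would fetch each URL as one_pass()
hands it back, after checking allowed() for that server.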

If there's anything I've forgotten to do (like announce it somewhere on
Usenet) or any parameter needs obvious tweaking, let me know.

I have been struggling with DNS resolution and was wondering if people could
give me some feedback. Ideally, I want to make sure I treat all aliases for
a server as the same server, so I was attempting to execute

gethostbyaddr(gethostbyname('www.wherever.com'))

but that seemed terribly slow and tcpdump traces suggested that it would get
stuck banging on the same server. Then I tried just the gethostbyname(),
but that wasn't much better. For now, I just accept whatever host name I'm
given and map a couple of places I know do round-robin DNS back to their
canonical names.
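
One possible way to combine the two approaches, sketched in Python under
some assumptions: keep the hand-maintained table for known round-robin
sites, and cache the result of the forward-plus-reverse lookup so each
host name costs at most one round of DNS queries. The caching is an
addition for illustration, not something described above, and the
HostCanonicalizer class and the table entry are hypothetical. It will
not cure a slow name server, only avoid asking it twice.

```python
import socket

# Hypothetical hand-maintained table of round-robin hosts.
ROUND_ROBIN = {"www1.wherever.com": "www.wherever.com"}


class HostCanonicalizer:
    """Map host name aliases to one canonical name, with a cache."""

    def __init__(self, overrides=None,
                 resolve=socket.gethostbyname,
                 reverse=socket.gethostbyaddr):
        self.overrides = dict(overrides or {})
        self.resolve = resolve       # name -> IP address string
        self.reverse = reverse       # IP -> (hostname, aliases, addresses)
        self.cache = {}

    def canonical(self, host):
        host = host.lower().rstrip(".")
        if host in self.overrides:   # known round-robin site
            return self.overrides[host]
        if host not in self.cache:
            try:
                name = self.reverse(self.resolve(host))[0].lower()
            except OSError:
                # Lookup failed (socket.gaierror and socket.herror are
                # OSError subclasses): fall back to the name as given.
                name = host
            self.cache[host] = name
        return self.cache[host]
```

Usage would be along the lines of
HostCanonicalizer(ROUND_ROBIN).canonical('www.wherever.com'), with the
returned name used as the key for any per-server bookkeeping.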

What do other robot writers do about name resolution? Feedback appreciated.

Thanks,

Skip Montanaro skip@calendar.com (518)372-5583
Musi-Cal: http://www.calendar.com/concerts/ or mailto:concerts@calendar.com
Internet Conference Calendar: http://www.calendar.com/conferences/
>>> ZLDF: http://www.netresponse.com/zldf <<<