Stupid robots cache DNS and not IMS

Roy T. Fielding (fielding@liege.ICS.UCI.EDU)
Sat, 13 Jul 1996 13:15:29 -0700


Are people out there remembering not to cache DNS results forever?
How about using If-Modified-Since for repeated retrievals?
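Not caching DNS results forever can be sketched in Python. This is a minimal sketch under stated assumptions. The standard-library resolver (`socket.getaddrinfo`) does not report record TTLs, so a fixed, hypothetical `MAX_AGE_SECONDS` stands in for the TTL that a real DNS response would carry. `ExpiringDnsCache` and its injectable `resolve` and `clock` hooks are illustrative names, not part of any library.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical upper bound on how long one lookup is reused. The stdlib
# resolver does not expose record TTLs, so this stands in for the real TTL.
MAX_AGE_SECONDS = 3600.0


def system_resolve(host: str) -> List[str]:
    """Resolve host to a sorted list of IP address strings via the system resolver."""
    infos = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class ExpiringDnsCache:
    """Caches name lookups for a bounded time, never forever."""

    def __init__(self, resolve: Callable[[str], List[str]] = system_resolve,
                 max_age: float = MAX_AGE_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._max_age = max_age
        self._clock = clock
        # lowercased host -> (time of lookup, addresses)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, host: str) -> List[str]:
        now = self._clock()
        key = host.lower()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]  # still fresh: reuse the cached answer
        # Missing or stale: ask the resolver again, so a moved CNAME is noticed.
        addrs = self._resolve(host)
        self._entries[key] = (now, addrs)
        return addrs
```

With a one-hour bound, a robot holding the old address would pick up a moved `www` CNAME within the hour instead of months later.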

There are a number of robots that continue to index our old server
machine, even 6 months after the www CNAME was moved to a different
machine. These robots are coming from

    i0.inktomi.com            (the worst -- retrieves URLs at 500/day
    i1.inktomi.com             over and over and over again, even though
    i15.inktomi.com            they haven't changed in TWO YEARS).

    beastie.atext.com         (not many requests per day, but it comes back
    crawl1.atext.com           every day -- requested /robots.txt 83 times --
    crawl4.atext.com           fails to understand that 404 means Not Found).
    crimpshrine.atext.com
    lolita.atext.com

    galileo.mckinley.com
    doxodox.mckinley.com
    doppler.mckinley.com

    204.162.98.27             (a particularly buggy robot)
    204.162.98.39             (a particularly buggy robot)
    204.162.98.47             (a particularly buggy robot)
    204.62.245.169

    strike.infoseek.com       (Apparently has a lousy URL canonicalizer:
    homer.infoseek.com         50% redirected URLs)
    piglet.infoseek.com
    homer-bbn.infoseek.com

    b.mv.opentext.com         (Retrieves /robots.txt every day, even when
    dialup-b.mv.opentext.com   it doesn't make any other requests)
    demonet.opentext.com
    ip026.opentext.com

    scooter.pa-x.dec.com
    backfire.ultraseek.com    (way too fast)
    jazz.stanford.edu         (only retrieves /robots.txt -- I suppose it
                               wants to index our disallowed parts)

Keep in mind that all of the above requests are being made to a server
which has no current links to it -- it hasn't had any since February.
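The /robots.txt complaints in the list (83 requests for a file that returns 404, daily re-fetches by robots that then request nothing else) come down to not caching the answer. Below is a minimal Python sketch, assuming a once-a-day refresh interval and a hypothetical `fetch` hook that returns the HTTP status and body, so no network code is shown. A 200 is parsed with the standard library's `urllib.robotparser`. A 404 is cached as "nothing excluded", and 401/403 as "stay out", the same way `urllib.robotparser` itself treats those statuses. `RobotsCache` and `ROBOTS_MAX_AGE_SECONDS` are illustrative names.

```python
import time
import urllib.robotparser
from typing import Callable, Dict, Tuple, Union
from urllib.parse import urlsplit

# Hypothetical refresh interval: fetch each host's /robots.txt at most once a day.
ROBOTS_MAX_AGE_SECONDS = 24 * 60 * 60.0

# Hypothetical hook: takes a robots.txt URL, returns (HTTP status, body text).
Fetcher = Callable[[str], Tuple[int, str]]

# Either parsed rules, or a blanket yes/no when there is nothing to parse.
Rules = Union[urllib.robotparser.RobotFileParser, bool]


class RobotsCache:
    """Per-host robots.txt cache that also remembers a 404."""

    def __init__(self, fetch: Fetcher, agent: str,
                 max_age: float = ROBOTS_MAX_AGE_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._agent = agent
        self._max_age = max_age
        self._clock = clock
        self._rules: Dict[str, Tuple[float, Rules]] = {}  # host -> (fetched at, rules)

    def _load(self, scheme: str, host: str) -> Rules:
        status, body = self._fetch(f"{scheme}://{host}/robots.txt")
        if status == 200:
            parser = urllib.robotparser.RobotFileParser()
            parser.parse(body.splitlines())
            parser.modified()  # record that rules have been loaded
            return parser
        if status in (401, 403):
            return False  # access to robots.txt refused: stay out entirely
        if 400 <= status < 500:
            return True  # 404 Not Found: no robots.txt, so nothing is excluded
        return False  # server error or other: stay out until the next refresh

    def allowed(self, url: str) -> bool:
        parts = urlsplit(url)
        host = parts.netloc.lower()
        now = self._clock()
        cached = self._rules.get(host)
        if cached is None or now - cached[0] >= self._max_age:
            # Fetch only when there is no answer yet or the answer has expired;
            # a 404 result is cached just like a real robots.txt.
            cached = (now, self._load(parts.scheme, host))
            self._rules[host] = cached
        rules = cached[1]
        if isinstance(rules, bool):
            return rules
        return rules.can_fetch(self._agent, url)
```

With this cache, a robot asks a host for /robots.txt once per day at most, no matter how many URLs it checks, and a 404 answer stops the asking just as well as a real file does.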

And apparently only backfire.ultraseek.com has the sense to use
Conditional GET requests! You know, that HTTP header field called
If-Modified-Since. Yeah, that mechanism I added to the protocol in
early 1994 so that you only retrieve a file if it has changed, thus
saving the Internet for more useful pursuits.
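A conditional GET can be sketched in Python. This is a minimal sketch, assuming the robot stores the `Last-Modified` value from each 200 response and echoes it back unchanged in `If-Modified-Since`. A 304 Not Modified response carries no body, so the stored copy is reused. `ConditionalGetCache` and `fetch` are hypothetical names. `urllib.request` reports a 304 as an `HTTPError`, which is why `fetch` catches it.

```python
import urllib.error
import urllib.request
from typing import Dict, Mapping, Tuple


class ConditionalGetCache:
    """Remembers Last-Modified per URL and revalidates with If-Modified-Since."""

    def __init__(self) -> None:
        # url -> (Last-Modified value exactly as the server sent it, body)
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def request_headers(self, url: str) -> Dict[str, str]:
        entry = self._entries.get(url)
        if entry is None:
            return {}  # never retrieved before: plain unconditional GET
        # Echo the server's own date back, so the robot's clock never matters.
        return {"If-Modified-Since": entry[0]}

    def handle_response(self, url: str, status: int,
                        headers: Mapping[str, str], body: bytes) -> Tuple[bytes, bool]:
        """Return (current body, True if a new body was transferred)."""
        if status == 304:
            # Not Modified: no body was sent; reuse the copy already stored.
            return self._entries[url][1], False
        last_modified = headers.get("Last-Modified")
        if status == 200 and last_modified:
            self._entries[url] = (last_modified, body)
        return body, True


def fetch(cache: ConditionalGetCache, url: str) -> Tuple[bytes, bool]:
    """Perform a conditional GET over the network with urllib."""
    req = urllib.request.Request(url, headers=cache.request_headers(url))
    try:
        with urllib.request.urlopen(req) as resp:
            return cache.handle_response(url, resp.status, resp.headers, resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib raises HTTPError for 304 Not Modified
            return cache.handle_response(url, 304, err.headers, b"")
        raise
```

For a page that hasn't changed in two years, every revisit then costs one request line and a short 304 response instead of the whole document.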

Shit, you guys should be embarrassed -- I wouldn't even let code
like that be used in beta tests.

...Roy T. Fielding
Department of Information & Computer Science (fielding@ics.uci.edu)
University of California, Irvine, CA 92697-3425 fax:+1(714)824-4056
http://www.ics.uci.edu/~fielding/