The spider I wrote (Pioneer) refreshes /robots.txt files on each new
execution of the robot. During its run it will try to retrieve a
robots.txt file for each new host it encounters. That policy file, if
it exists, is then considered active for that particular site until
the robot shuts down.
This works fine for me because the longest period of time I've
ever run the robot without interruption is 10 hours.
If you're going to rev up the spider and then go on vacation,
perhaps another method is in order.
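In case it helps, here's a minimal sketch of that per-run caching idea
(not Pioneer's actual code) using Python's urllib.robotparser; the
function name, cache variable, and user-agent string are just
illustrative:

    # Per-run robots.txt cache: filled lazily, discarded at shutdown,
    # so policies are re-fetched on the next execution of the robot.
    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    _robots_cache = {}

    def allowed(url, user_agent="Pioneer"):
        parts = urlparse(url)
        host = parts.scheme + "://" + parts.netloc
        parser = _robots_cache.get(host)
        if parser is None:
            # First time this host is seen during the run: fetch its
            # robots.txt and keep the parsed policy active until shutdown.
            parser = RobotFileParser(host + "/robots.txt")
            try:
                parser.read()
            except OSError:
                # robots.txt unreachable: treat the site as unrestricted.
                parser.allow_all = True
            _robots_cache[host] = parser
        return parser.can_fetch(user_agent, url)

The cache lives only as long as the process, which matches the
"active until the robot shuts down" behavior described above.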
-Micah
--
============================================================================
Micah A. Williams  |  Computer Science  |  Fayetteville State University
micah@sequent.uncfsu.edu  |  http://sequent.uncfsu.edu/~micah/
Bjork WebPage: http://sequent.uncfsu.edu/~micah/bjork.html
Though we may not realize it, we all, in some capacity, work for Keyser Soze.
============================================================================