Ian
--
Ian Graham ........................................ ian.graham@utoronto.ca
Information Commons                                      Tel: 416-978-4548
University of Toronto                                    Fax: 416-978-7705
> Allow me to introduce myself,
>
> I am a programmer at a London (England) based internet service
> company. We have been using caching HTTP servers (Apache) for quite
> some time now, but thought it would be nice to be able to prime the
> cache with certain sites, either because we know that users often
> visit them, or because we know of a new popular site that has just
> opened that we would like our users to have immediate access to, or
> indeed to prime and take care of a web cache that is to be used as a
> proxy for other, subsidiary caches...
>
> So I am currently developing just such a
> program/robot/agent/crawler/spider/ant/worm/<favourite term here>.
>
> Why?
>
> Well, because I haven't heard of anything that just goes around
> filling a cache machine. Plus it will hopefully cut down on the
> bandwidth being used by our users if we can cache *once* a lot of
> stuff from popular sites, which is the whole reason for this: cut
> down on the number of hits to outside servers, and provide customers
> with quicker access times.
>
> Yes, OK, given enough time the cache will fill up on its own, but as
> I said this is to be used mainly for *priming* a cache machine, and
> for adding new sites that become popular.
>
> In addition, this is to be a Webmaster's tool down here, not for
> direct use by subscribers. This means that users won't be able to
> send it off to gather porn (enough trouble expiring news without
> caching it all from the web!) :P
>
> How?
>
> I'm using C++ because I like it, and I can use lots of existing
> socket libraries and string manipulation classes. Hey, it makes life
> easier and I can worry about reading robots.txt instead...
>
> Programming isn't the problem, though; features are. If there are
> any simple features that could be added to a new crawler for use by
> a wider community, I would be happy to read proposals from any and
> all sources.
>
> When?
>
> Soon. I am working on it now; I will test locally, since we have a
> plethora of machines to mess around with, and then I will approach
> friends who manage sites to see if I can hit theirs for testing.
>
> This means that you shouldn't see it in any access logs until it has
> been tested locally and on some cooperating outside systems.
> However, if you do see it in your logs and you haven't been
> approached by me regarding testing, PLEASE TELL ME! It will be using
> a User-Agent: field as follows:
>
> User-Agent: Snarf/v0.0-pre-alpha
>
> Well, it probably will...
>
> Other things...
>
> Well, I have read a lot of the archived stuff on this group, and
> consumed Martijn Koster's pages. I expect to conform to robots.txt,
> deal with relative links including the '.' and '..' directories, use
> raw IP addresses to index previously visited servers to get around
> aliasing, possibly limit the depth of searches (although this
> depends on the site), and all that other stuff to make it a 'nice'
> robot...
>
> I'll be on this group from now on to catch any other ideas or
> proposals for 'bots, and if they apply I guess I'll try to stick to
> them.
> Apart from that I'll accept any suggestions of what [not] to do.
>
> Cheers
>
> Nige
>
> +--------------------------------------------------------------------+
> | Nigel A Rantor                 | WEB Ltd                           |
> | e-Mail: nigel@mail.bogo.co.uk  | The Pall Mall Deposit             |
> |                                | 124-128 Barlby Road               |
> | Tel: 0181-960-3050             | London W10 6BL                    |
> +--------------------------------------------------------------------+
> | She lies and says shes in love with him,                           |
> | Can't find a better man,                                           |
> |                       Better Man - Pearl Jam                       |
> +--------------------------------------------------------------------+
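
For anyone wondering what "conform to robots.txt" involved at the time: the original exclusion format is just records of User-agent and Disallow lines separated by blank lines, and a URL path is off limits if it begins with any applicable Disallow prefix. A minimal C++ matcher along those lines might look like the sketch below; the function names and the prefix-only, case-sensitive matching are illustrative assumptions, not code from Snarf.

    // Hypothetical sketch of a matcher for the simple 1994-era robots.txt
    // format (User-agent / Disallow records separated by blank lines).
    // Not taken from Snarf.
    #include <cctype>
    #include <sstream>
    #include <string>
    #include <vector>

    // Collect the Disallow prefixes that apply to a given robot name
    // (a record naming "*" applies to everyone).
    std::vector<std::string> disallowed_prefixes(const std::string& robots_txt,
                                                 const std::string& robot_name)
    {
        std::vector<std::string> prefixes;
        std::istringstream in(robots_txt);
        std::string line;
        bool record_applies = false;
        while (std::getline(in, line)) {
            std::string::size_type hash = line.find('#');
            if (hash != std::string::npos)
                line.erase(hash);                 // strip comments
            while (!line.empty() &&
                   std::isspace(static_cast<unsigned char>(line[line.size() - 1])))
                line.erase(line.size() - 1);      // strip trailing CR/space
            if (line.empty()) {                   // a blank line ends a record
                record_applies = false;
                continue;
            }
            std::string::size_type colon = line.find(':');
            if (colon == std::string::npos)
                continue;
            std::string field = line.substr(0, colon);
            std::string value = line.substr(colon + 1);
            while (!value.empty() &&
                   std::isspace(static_cast<unsigned char>(value[0])))
                value.erase(0, 1);                // strip leading space
            if (field == "User-agent") {
                if (value == "*" || value.find(robot_name) != std::string::npos)
                    record_applies = true;
            } else if (field == "Disallow" && record_applies && !value.empty()) {
                prefixes.push_back(value);
            }
        }
        return prefixes;
    }

    // A path may be fetched only if no applicable Disallow prefix matches
    // the start of it.
    bool path_allowed(const std::string& path,
                      const std::vector<std::string>& prefixes)
    {
        for (std::size_t i = 0; i < prefixes.size(); ++i)
            if (path.compare(0, prefixes[i].size(), prefixes[i]) == 0)
                return false;
        return true;
    }

A real implementation would presumably also fold the field names to lower case and cache the parsed rules per server rather than re-fetching /robots.txt for every URL.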
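
Dealing with relative links "including the '.' and '..' directories" largely comes down to collapsing those segments once a link has been joined onto the base document's path, so that two spellings of the same URL are not fetched twice. A rough sketch, again with hypothetical names rather than anything from Snarf:

    // Hypothetical sketch: collapse "." and ".." segments in a URL path.
    #include <sstream>
    #include <string>
    #include <vector>

    std::string normalise_path(const std::string& path)
    {
        std::vector<std::string> kept;
        std::istringstream in(path);
        std::string segment;
        while (std::getline(in, segment, '/')) {
            if (segment.empty() || segment == ".")
                continue;                  // ignore "//" and "." segments
            if (segment == "..") {
                if (!kept.empty())
                    kept.pop_back();       // ".." discards the previous segment
            } else {
                kept.push_back(segment);
            }
        }
        std::string out;
        for (std::size_t i = 0; i < kept.size(); ++i)
            out += "/" + kept[i];
        return out.empty() ? "/" : out;
    }

    // e.g. normalise_path("/docs/../images/./logo.gif") yields "/images/logo.gif"

Note that this version drops a trailing slash ("/dir/" becomes "/dir"), a distinction a crawler may still care about when resolving further relative links.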
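
As for using "raw IP addresses to index previously visited servers to get around aliasing": the idea is to resolve each hostname once and key per-server bookkeeping (robots.txt rules, pages already fetched, politeness delays) on the resulting address, so that two names for the same machine share a single entry. A possible sketch using the classic BSD resolver calls of the period; this is an assumed approach, not Snarf's actual code:

    // Hypothetical sketch: key visited-server records by dotted-quad IP so
    // that aliases such as www.example.com and example.com collapse to one
    // entry.  Uses gethostbyname()/inet_ntoa(), the usual calls at the time.
    #include <arpa/inet.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <cstring>
    #include <string>

    // Resolve a hostname to its first IPv4 address as text, falling back
    // to the hostname itself if resolution fails.
    std::string server_key(const std::string& host)
    {
        hostent* he = gethostbyname(host.c_str());
        if (he == 0 || he->h_addrtype != AF_INET || he->h_addr_list[0] == 0)
            return host;
        in_addr addr;
        std::memcpy(&addr, he->h_addr_list[0], sizeof(addr));
        return inet_ntoa(addr);
    }

    // Typical use: store robots.txt rules, visited pages and last-access
    // times in a map keyed by server_key(host) instead of by the hostname.

One caveat: name-based virtual hosts share an address, so a crawler taking this route would probably want to key on the IP together with the hostname it sends in the request rather than on the IP alone.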
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html