Re: [...]Re: Cache Filler

Carlos Horowicz (carlos@atina.ar)
Mon, 2 Dec 1996 12:48:12 -0300 (ARG)


>
> On Fri, 29 Nov 1996, Nigel Rantor wrote:
>
> >
> > Well I have read a lot of the archived stuff on this group, and consumed
> > Martijn Koster's pages. I expect to conform to robots.txt, deal with
> > relative links including the '.' and '..' directories, use raw IP
> > addresses to index previously visited servers to get around aliasing,
>
> While this was a good idea when every server had a unique IP address, it
> no longer is. Newer servers can host several distinct virtual servers
> that share a single IP address.
>
> > possibly limit depth of searches (although this depends on the site), and
> > all that other stuff to make it a 'nice' wobot...
> >
> > I'll be on this group from now on to catch any other ideas or proposals
> > for 'bots and if they apply I guess I'll try to stick to it.
> >
> > Apart from that I'll accept any suggestions of what [not] to do,
>
> I am concerned that this proposed robot seems to increase network traffic
> by prefetching pages that may never be viewed solely to improve the
> response for the *first* person to visit the site. Subsequent people would
> see no improvement over what they would see without prefetching the pages
> since the first person will have caused them to be loaded into your
> caching proxy's cache.
>

I wouldn't focus *only* on new sites. The real problem is how to anticipate
users' requests so that the documents are already in your cache, and how to
notice when they should be refreshed.

The real win, if any, could be to set the TTLs of the prefetched pages according
to how often the robot runs. This would improve performance for later visitors
as well, not only for the first visitor to a site.
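
A minimal sketch of tying the TTL to the robot's schedule, in Python. The
class name, the in-memory dictionary, and the once-a-day interval are all
assumptions made for illustration; a real caching proxy would have its own
store and expiry rules.

```python
import time

ROBOT_INTERVAL = 24 * 3600  # assumed: the robot runs once a day (seconds)


class PrefetchCache:
    """Toy in-memory cache; a hypothetical stand-in for the proxy's store."""

    def __init__(self, robot_interval=ROBOT_INTERVAL):
        self.robot_interval = robot_interval
        self.entries = {}  # url -> (body, expiry timestamp)

    def store(self, url, body, now=None):
        now = time.time() if now is None else now
        # The TTL equals the robot interval, so the entry expires
        # at about the time the next robot run is due.
        self.entries[url] = (body, now + self.robot_interval)

    def lookup(self, url, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        if entry is None or entry[1] <= now:
            return None  # never fetched, or TTL expired
        return entry[0]
```

With this choice every visitor who arrives between two robot runs gets a
cache hit, as long as the robot keeps to its schedule.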

On the other hand, the robot's run could be triggered by expiring TTLs, which
signal that some documents need to be refreshed.
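
The expiry-driven alternative could look roughly like this in Python. The
RefreshQueue class is hypothetical: it keeps (expiry, url) pairs in a heap,
and the robot asks it which documents are due instead of running at fixed
times.

```python
import heapq


class RefreshQueue:
    """Hypothetical queue of cached URLs ordered by expiry time."""

    def __init__(self):
        self._heap = []  # (expiry timestamp, url), earliest expiry first

    def schedule(self, url, expiry):
        heapq.heappush(self._heap, (expiry, url))

    def due(self, now):
        """Pop and return the URLs whose TTL has expired by `now`."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[1])
        return expired
```

The robot would refetch each URL returned by due() and then call schedule()
again with the new expiry time.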

> --
> Benjamin Franz
>
> _________________________________________________
> This message was sent by the robots mailing list. To unsubscribe, send mail
> to robots-request@webcrawler.com with the word "unsubscribe" in the body.
> For more info see http://info.webcrawler.com/mak/projects/robots/robots.html
>
Carlos Horowicz (CH69)
Ministry of Foreign Affairs - ARNET
Buenos Aires, Argentina
