> Now, if we had a distributed cache mechanism, I wouldn't need to grab
> their cache file anymore - the robot itself could either access the
> cache files directly, or talk to the local cache handler using the
> between-cache protocol.
Quite.
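For concreteness, here is a minimal sketch (in Python) of what a robot's
query to the local cache handler might look like, assuming a simple UDP
request/response exchange; the host, port, and message format are all
invented for illustration, not taken from any actual between-cache protocol:

    import socket

    CACHE_HANDLER = ("cache.example.com", 3130)   # hypothetical host and port

    def query_cache(url, timeout=2.0):
        # Ask the local cache handler whether it holds a copy of url.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            sock.sendto(b"QUERY " + url.encode("ascii"), CACHE_HANDLER)
            reply, _ = sock.recvfrom(4096)    # e.g. b"HIT" or b"MISS"
            return reply.startswith(b"HIT")
        except socket.timeout:
            return False                      # no answer: treat as a miss
        finally:
            sock.close()

On a hit the robot reads the local copy; on a miss it falls back to
fetching from the origin server as usual.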
> The storage format of the CERN proxy-cache is quite convenient for file
> access by robots (except it should compress the data - I haven't looked
> at it lately, so if it does now please ignore the last comment).
Hmmm... I have the feeling the CERN cache is far from ideal these days.
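Still, if the cache does store each response in a file of its own, a
robot's read routine could cope with compressed and uncompressed entries
alike. A rough sketch, with the one-response-per-file layout assumed for
illustration rather than taken from the actual CERN code:

    import gzip

    def read_cached(path):
        # Return the stored response body, decompressing if needed.
        # Detect gzip by its two-byte magic number.
        with open(path, "rb") as f:
            data = f.read()
        if data[:2] == b"\x1f\x8b":
            return gzip.decompress(data)
        return data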
> Unfortunately, the same problems come up as I described in the last
> message. It is a waste of bandwidth, time and storage to completely
> duplicate entire caches. The ideal way would be to have some
> selection criteria, but what?
You could do all sorts of things, but for the caching side alone you can
use the standard caching mechanisms and base selection on popularity and
the like (see the sketch below). For content-subject selection you'd have
to find some other way, but at least if you're sitting on a complete cache
you have the freedom to choose.
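As a rough sketch of the popularity idea, assuming the cache can hand you
per-URL hit counts (the entry format here is invented):

    def select_popular(entries, limit):
        # entries: list of (url, hit_count) pairs from the cache's logs.
        # Keep only the `limit` most-requested URLs.
        ranked = sorted(entries, key=lambda e: e[1], reverse=True)
        return [url for url, hits in ranked[:limit]]

    # select_popular([("http://a/", 40), ("http://b/", 3)], 1)
    # -> ["http://a/"]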
Wouldn't it be handy if you could run a Java/Safe-perl/whatever
selector on the remote cache, so it can choose for itself according
to _your_ rules instead of the server's? :-)
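From the cache's side, that might amount to little more than applying a
visitor-supplied predicate to each entry. A toy sketch, with everything
hypothetical and none of the sandboxing that Java or Safe-perl would
actually have to provide:

    def run_selector(cache_entries, selector):
        # The cache applies the visitor's rule to each (url, headers)
        # entry and returns only the URLs the visitor wants to fetch.
        return [url for url, headers in cache_entries
                if selector(url, headers)]

    # The visitor's own rule, e.g. "only HTML pages mentioning robots":
    def my_rules(url, headers):
        ctype = headers.get("content-type", "")
        return ctype.startswith("text/html") and "robot" in url

The point being that the selection logic travels to the data, rather than
whole caches travelling to the selector.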
Happy New Year all,
-- Martijn
Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html