Re: Single tar (Re: Inktomi & large scale spidering)

Martin Hamilton (martin@mrrl.lut.ac.uk)
Mon, 27 Jan 1997 14:09:32 +0000


Jaakko Hyvatti writes:

| - Single massive transfer might use all server bandwidth for a long time.
| Though a robot should maybe limit it with read()ing slowly or limiting
| window. Maybe something could be done with TOS also.
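The "read() slowly" idea works because TCP flow control pushes back on the sender when the receiver stops draining its buffer. Below is a minimal sketch, assuming a blocking file-like stream such as the one returned by `socket.makefile("rb")`. The function name `throttled_read` and its parameters are hypothetical, chosen for illustration only.

```python
import io
import time


def throttled_read(stream, max_bytes_per_sec, chunk_size=4096):
    """Read all of `stream`, sleeping so the average rate stays under the cap.

    Hypothetical helper: on a real socket, the pauses let the TCP receive
    window fill up, which slows the server down instead of letting one
    transfer use all of its bandwidth.
    """
    start = time.monotonic()
    received = 0
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF
            break
        chunks.append(chunk)
        received += len(chunk)
        # How long the transfer so far *should* have taken at the capped rate.
        expected = received / max_bytes_per_sec
        elapsed = time.monotonic() - start
        if expected > elapsed:
            time.sleep(expected - elapsed)
    return b"".join(chunks)


if __name__ == "__main__":
    # Pull 20 KB through at no more than 10 KB/s; takes roughly two seconds.
    body = throttled_read(io.BytesIO(b"x" * 20_000), max_bytes_per_sec=10_000)
    print(len(body))  # 20000
```

This caps only the average rate. It does nothing with the TOS bits Jaakko mentions, which would have to be set on the socket itself.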

Ideally, there would be a mechanism similar to Harvest's incremental
updates in the Broker -> Gatherer protocol, i.e. the robot would be
able to say

just send me info about the objects which have changed
since such-and-such a date

(and please compress it ;-)
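Here is a rough sketch of what that "changed since date X, compressed" exchange could look like. The wire format (gzip-compressed JSON lines) and the names `IndexedObject`, `changes_since` and `read_changes` are assumptions for illustration. This is not Harvest's actual Broker/Gatherer protocol.

```python
import gzip
import json
from dataclasses import asdict, dataclass


@dataclass
class IndexedObject:
    """Hypothetical per-object summary a server might keep for robots."""
    url: str
    last_modified: int  # Unix timestamp of the object's last change
    title: str


def changes_since(objects, since):
    """Server side: gzip-compressed JSON lines for objects modified after `since`."""
    changed = [o for o in objects if o.last_modified > since]
    lines = "\n".join(json.dumps(asdict(o)) for o in changed)
    return gzip.compress(lines.encode("utf-8"))


def read_changes(payload):
    """Robot side: decompress the update and parse it back into records."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


if __name__ == "__main__":
    objs = [IndexedObject("http://a/", 100, "A"), IndexedObject("http://b/", 200, "B")]
    update = changes_since(objs, since=150)
    print([r["url"] for r in read_changes(update)])  # ['http://b/']
```

In the degenerate case (`since=0`), this is just a compressed dump of everything, which matches the single-file fetch mentioned below.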

This implies we're doing a little bit more than just fetching a file,
though in the degenerate case that could actually be all we end up
doing. If nothing else, it would give people an incentive to run
fancy HTTP servers that support incremental indexing, so they don't
take such a big hit when a robot comes by.

It also seems to me that in order to be useful for the sorts of WWW
indexing robots we have around at the moment, the content of the index
info delivered would need to be rather more detailed than the sort of
index objects (centroids) currently being shipped around with CIP. CIP
is great if you want to route a query to a particular index server,
but would you want to issue referrals to thousands of leaf-node WWW
servers in response to a query for a Next Generation CIP-enabled
search engine?

Big question: would the level of detail in, say, a typical SOIF object
be sufficient for the people running the big robots ? Could you live
with it if that was all people were prepared to let you have? :-)
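For a sense of what a SOIF object carries, here is a small serialiser sketch. It follows the general SOIF layout: a `@TEMPLATE-TYPE { URL` line, then `Attribute{byte-count}:<TAB>value` lines, then a closing `}`. The function name `to_soif` and the example attribute names are illustrative assumptions.

```python
def to_soif(template_type, url, attributes):
    """Serialise one object as a SOIF-style record (hypothetical helper).

    Each attribute line is Name{N}:<TAB>value, where N is the value's
    length in bytes, so a value may contain newlines or braces.
    """
    lines = [f"@{template_type} {{ {url}"]
    for name, value in attributes.items():
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_soif("FILE", "http://example.com/", {"Title": "Hello"}), end="")
    # @FILE { http://example.com/
    # Title{5}:	Hello
    # }
```

Whatever summary attributes a site chooses to export (title, keywords, partial text and so on) is all a robot would see, which is what the question above turns on.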

Cheerio,

Martin

PS On the other hand, routing a query to one or more of a small-ish
mesh of cooperating index servers sounds like a fairly sensible way of
scaling the existing whole-WWW-indexes. You might already have
developed your own proprietary technology for doing this, though...

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html