Re: Single tar (Re: Inktomi & large scale spidering)

Sigfrid Lundberg (siglun@gungner.ub2.lu.se)
Wed, 12 Feb 1997 12:32:52 +0100 (MET)


On Tue, 11 Feb 1997, Martin Hamilton wrote:

> Erik Selberg writes:
>
> | I think the only reasonable solution in a lot of ways is to make a
> | spider which is attached to a server. This spider would then create
> | the tarballs (and you could have one big file, as well as a week's
> | worth of changes in another for incremental updating). The spider
> | could also do other useful things, like make sure you got all your
> | scripting done right, you didn't forget any links, etc. etc. etc.
>

[snip]

> What I'm really interested in is whether robot authors can be
> persuaded to pick up index data in a small number (ideally one?) of
> common formats and via a small number of common protocols (one?!).
> For example: SOIF and the Harvest Gatherer protocol, or RDM over HTTP.

There is a new type of robot emerging: the archiver bots. They want
the full text (and images), and the commercial bots presumably may
not want SOIF.

>
> How about it ?

Although we would prefer an SGML-based format, SOIF (or RDM) would do
for us, iff anchor texts and URLs were enumerated in a similar fashion
(such that I know for sure which anchor text, or alt text, belongs to
which URL).
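
For concreteness, a rough sketch of the kind of pairing I mean. Only
the outer @TEMPLATE { URL Attribute{size}: value } shape is standard
SOIF; the Anchor-* attribute names here are my own invention for
illustration, and the {n} braces give the byte count of each value:

    @FILE { http://www.example.org/page.html
    Title{12}:        Example page
    Anchor-URL-1{28}:   http://www.lub.lu.se/netlab/
    Anchor-Text-1{6}:   NetLab
    Anchor-Alt-1{11}:   NetLab logo
    }

With parallel numbering like that, a consumer knows unambiguously
which anchor (or alt) text goes with which URL.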

The Harvest Gatherer Protocol is interesting.
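
For anyone who hasn't looked at it: the gatherer exports its object
database over a plain TCP connection, so the robot side of "pick up
index data" reduces to connect-and-read. A minimal sketch in Python,
with loud caveats: the host name is hypothetical, port 8500 is only
gatherd's customary default, and the request line is a placeholder
rather than verified protocol syntax:

    import socket

    HOST = "gatherer.example.org"   # hypothetical gatherer host
    PORT = 8500                     # gatherd's customary default (assumption)

    sock = socket.create_connection((HOST, PORT))
    sock.sendall(b"SEND-UPDATE 0\r\n")  # placeholder "send everything" request

    # The gatherer streams SOIF records and closes the connection when done.
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    sock.close()

    soif = b"".join(chunks).decode("latin-1")
    # Each record has the shape:  @FILE { <url>  Attr{size}: value ... }
    print(soif[:500])

The appeal is exactly the one Martin points to: one format and one
protocol, and the consumer side fits in a couple of dozen lines.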

>
> Cheerio,
>
> Martin
>
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html