> Erik Selberg writes:
>
> | I think the only reasonable solution in a lot of ways is to make a
> | spider which is attached to a server. This spider would then create
> | the tarballs (and you could have one big file, as well as a week's
> | worth of changes in another for incremental updating). The spider
> | could also do other useful things, like make sure you got all your
> | scripting done right, you didn't forget any links, etc. etc. etc.
>
[snip]
> What I'm really interested in is whether robot authors can be
> persuaded to pick up index data in a small number (ideally one?) of
> common formats and via a small number of common protocols (one?!).
> For example: SOIF and the Harvest Gatherer protocol, or RDM over HTTP.
There is a new type of robot emerging: the archiver bot. These bots want
full text (and images), and the commercial ones presumably might not
want SOIF.
>
> How about it?
Although we would prefer an SGML-based format, SOIF (or RDM) would do for us,
if and only if anchor texts and URLs were enumerated in parallel (so that I
knew for certain which anchor text -- or alt text -- belongs to which URL).
The Harvest Gatherer Protocol is interesting.
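
For concreteness, here is a sketch of the kind of record that would satisfy
us. This is only an illustration: the anchor-url-N/anchor-text-N attribute
names are my own invention, not part of the SOIF spec, and the byte counts
in braces are just what SOIF requires for each value. Pairing the attributes
by index number is what removes the ambiguity:

    @FILE { http://www.example.org/index.html
    title{12}:	Example Page
    anchor-url-1{33}:	http://www.example.org/about.html
    anchor-text-1{8}:	About us
    anchor-url-2{31}:	http://www.example.org/faq.html
    anchor-text-2{3}:	FAQ
    }

Any convention would do, as long as anchor-text-N (or the alt text, for
image anchors) is guaranteed to describe the URL in anchor-url-N.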
>
> Cheerio,
>
> Martin
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html