Inktomi & large scale spidering

Otis Gospodnetic (otisg@panther.middlebury.edu)
Sun, 26 Jan 1997 00:59:35 -0500 (EST)


OK, I promise I'll stop, but here is some info for the person who was asking
about large scale spidering.
Look at www.inktomi.com

They claim to have technology that uses multithreading and parallel
processing to index 10M documents per day.
If that is really so, they could kick everybody's butt in about a week (what
are they waiting for?). Don't know...
All this seems to me like a very brute-force method rather than a
well-thought-out, elegant one that doesn't just try to be very fast,
but is also very intelligent about what to index, what to ignore, and so on.
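
To put the claimed figure in perspective, here is a rough back-of-the-envelope
sketch. It takes the 10M documents/day number at face value and assumes the
rate is sustained around the clock; the function names are made up for
illustration and say nothing about how Inktomi actually does it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def docs_per_second(docs_per_day):
    """Average fetch+index rate needed to sustain a daily document count."""
    return docs_per_day / SECONDS_PER_DAY


def docs_in_days(docs_per_day, days):
    """Total documents covered if the daily rate holds for `days` days."""
    return docs_per_day * days


if __name__ == "__main__":
    claimed = 10_000_000  # the claimed 10M documents per day
    print(f"{docs_per_second(claimed):.1f} documents per second on average")
    print(f"{docs_in_days(claimed, 7):,} documents in one week")
```

So the claim works out to a bit over a hundred documents every second, all
day long, and about 70M documents in a week if nothing is fetched twice.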

I have a feeling that if somebody could persuade all Webmasters out there to
publish and make available information about their site(s) for robots, that
person could have a superior spider in no time (not that others couldn't do
the same, but they would not be the first one to use the new approach).
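
The closest thing that exists today is /robots.txt, which only tells a robot
what to stay away from, not what is worth indexing. Still, as a minimal
sketch of a spider consulting site-published information before fetching,
here is how it could look with Python's standard urllib.robotparser module.
The sample file contents, the "ExampleSpider" agent name and the example.com
URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents a webmaster might publish.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, agent, url):
    """Return True if the site's rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    for url in ("http://example.com/index.html",
                "http://example.com/cgi-bin/search"):
        print(url, may_fetch(rules, "ExampleSpider", url))
```

A richer format that also said what a site contains and how often it changes
would let a spider skip work instead of just avoiding forbidden areas, which
is the part that does not exist yet.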

I'd better stop... in the name of love...

Oh, one more thing.
Hotbot: where does their robot come from (which hostname)?
Is it inktomi.com?
Also, atext.com is Excite, right? Why atext.com? Why not excite.com?

Otis
==========================================================================
POPULUS People Locator - The Intelligent White Pages - http://POPULUS.net/
==========================================================================

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html