Re: Client Robot 'Ranjan'

Kevin Hoogheem (khooghee@marys.smumn.edu)
Fri, 21 Jun 96 10:47:13 -0500


Ok, I was not saying anyone jumped the gun here, or trying to say that everyone
should or should not have a crawler, nor am I condoning it or saying that
anyone said it is bad or good.
Anyway...
Why can we not work toward some open system of data indexing for the web??????
Oh no, Kevin has just suggested something stupid again.. how do I make it
if it's an open system of data indexing?
Why do I go and say we should work on a similar way of storing data
for web indexers?
Ok, as a webmaster/network admin/robot writer myself, I have a crawler
on my site that only gathers information from my site for our search
engine located on our site. This allows people to find information
about our site fast and easily. We should encourage more sites
to do their own indexing so that people visiting them will find
the information they need faster and at a better hit ratio than,
say, going to Alta Vista or any other search engine.
Ok, I admit we still need the big boys out there to index more
than just one site.
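As a rough sketch of what such a single-site crawler does (the SITE dict and its pages are made up; a real crawler would fetch pages over HTTP, but the breadth-first walk and the inverted index it builds are the same idea):

```python
from collections import deque

# Hypothetical in-memory "site": page URL -> (text, outgoing links).
# A real crawler would fetch these over HTTP and parse the HTML.
SITE = {
    "/": ("welcome to our campus site", ["/news", "/library"]),
    "/news": ("campus news and events", ["/"]),
    "/library": ("library catalog and search", ["/news"]),
}

def crawl_site(start="/"):
    """Breadth-first crawl of one site, building an inverted index."""
    index = {}              # word -> set of page URLs containing it
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = SITE[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:    # stay on our own site, visit each page once
                seen.add(link)
                queue.append(link)
    return index

index = crawl_site()
```

Since the crawler never leaves its own host, it needs no robots.txt gymnastics at all.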
Hmm, it looks to me like a problem similar to name serving.
Why do I say this is like the name server problem? First,
we have our core search engines: Excite, Alta Vista, Open Text,
and the others. Next we have our other network search engines,
like our private ones that only visit a tiny sampling of our own
networks. What I would suggest is that these smaller ones
would pass their data along to the core search engines to add
to their databases.
Why is this a good idea? A few reasons. First off, it allows the
local administrator to index only what he wants to index, and
he doesn't have to worry about a robots standard that right now
doesn't have the flexibility it needs. He also doesn't
have to worry about rogue runaway robots that index stuff
he has excluded in his robots.txt. (I wonder, has anyone seen
a robot that just looks at robots.txt and then searches in those
directories for juicy stuff? ;)- )
Second, the local robots will take a fraction of the time to
index their own site compared to what it would take Excite or any
other bot to index the site.
Administrators can, once a site has been indexed, go through
the data and update it or delete entries in the database,
so that only the newest files are ever in the database. Hell,
I have seen links that have been dead for ages on some search
engines.
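A pruning pass like that could look something like this (the `is_alive` check is a hypothetical callback; a real one would actually try the URL over HTTP):

```python
def prune_index(index, is_alive):
    """Drop dead URLs from an inverted index so only live files remain.

    `index` maps word -> set of URLs; `is_alive` is a callback that
    reports whether a URL still resolves (hypothetical here).
    """
    pruned = {}
    for word, urls in index.items():
        live = {u for u in urls if is_alive(u)}
        if live:                    # drop words whose every page is dead
            pruned[word] = live
    return pruned
```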
At a given interval the core machines would receive the databases
of the other, smaller robot sites and incorporate their data with
what is stored at the core site, making for a bigger and more
accurate database at the core site.
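The hand-off might look something like this sketch, where one site's local index gets folded into the core's (qualifying each URL with the site name is my own assumption, just so the core can tell sources apart):

```python
def merge_into_core(core, local, site):
    """Fold one site's inverted index into the core index.

    URLs are prefixed with the contributing site's name (an assumed
    naming scheme) so the core knows where each entry came from.
    """
    for word, urls in local.items():
        core.setdefault(word, set()).update(site + u for u in urls)
    return core
```

Run at whatever interval the core likes, this replaces a full recrawl of the site with a single database transfer.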

Another thing the smaller engines would, or should, be able to
do is to link searching with the core engines, so that if a link
is not found on the local site that has been indexed, it could go
and fetch from the core or other informational sites. Yes, I could
make many links to other places, but an option to search
this site alone, or this site and the whole net, would be nice.
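A rough sketch of that local-first search with a core fallback (the `core_search` callback is hypothetical, standing in for a query sent to Lycos or whoever; tagging each hit with its origin anticipates the labeling point below):

```python
def search(query, local_index, core_search, local_only=False):
    """Search the local index first; fall back to the core engine only
    when nothing is found locally and the user asked for the whole net.

    Each hit is tagged "local" or "core" so the user can see where
    it came from.
    """
    hits = [(url, "local") for url in sorted(local_index.get(query, ()))]
    if not hits and not local_only:
        hits = [(url, "core") for url in core_search(query)]
    return hits
```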

Ok, the drawbacks:
The big boys can't claim they have the biggest and best search
engine out there. Who cares? Why should they be saying this in
the first place? The whole idea of the internet, at least when
I was starting to play on it in the mid 80's, was that information
is free and people are out helping each other get that information,
not "I have a cooler and bigger and better site than you do."
Most of the sites won't make their money by saying "I have the biggest
index in the world" but by what they are doing now: advertising on
their sites. Well, think of this then: if I have a local search
engine but allow it to be tied in with a bigger one, I should have
to note which one I am using, and then even when the data
is printed out it should mention if
it is local or if it was fetched from, let's say, Lycos or some
other source.

Oh well, I think there needs to be more group work done on these
types of projects, and not this bickering of "oh, someone is
gonna kill my bandwidth so I can't mud tonight" ;)-

Maybe the solution is this: a local engine on the local user's
machine. He would download indexes of what he wants so he can do
searches, and once in a while he would have to download a new index.
Like, if I wanted to search for sports I would download a sports index
file from someplace; everything is transferred to the user, but then all
searching is done on the home computer. The only time he would have to
re-download the index file is when a substantial amount of new data is
added to the index. Then only the new information is downloaded, not the
whole index again.
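Sketching that incremental download (the per-word version counter is my own assumption; timestamps would work just as well):

```python
def delta_update(client_index, server_index, client_version, server_version):
    """Pull only the entries newer than the client's copy, not the whole index.

    Server entries are (version, urls) pairs; versions are a simple
    counter (an assumption -- any monotonic stamp would do).
    """
    if client_version >= server_version:
        return client_index, client_version          # already up to date
    for word, (version, urls) in server_index.items():
        if version > client_version:                 # new since last download
            client_index.setdefault(word, set()).update(urls)
    return client_index, server_version
```

After the transfer, every search runs on the home computer; the network is only touched for the deltas.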

Kevin Hoogheem
@marys.smumn.edu