Keyword indexing

David Reilly (dodo@fan.net.au)
Thu, 01 Jan 1970 10:00:00 +1000


I'm currently developing a new spider (IntelliAgent) whose purpose is to
find new internet resources within a specific subject domain (for example,
computer programming), and then create an index as a reference for a future
search engine.

My problem is deciding exactly *which* words are important to index, and
how to store such a huge amount of data in a form that a search engine
can query quickly.
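
For the storage half of the question, one common shape is an inverted
index: each word maps to the list of documents that contain it. The sketch
below is only an assumption about how that could look in C. The struct and
function names (index_add, index_find, index_count) and the bucket count
are hypothetical, and everything is held in memory, so a large crawl would
still need some on-disk layout.

```c
#include <stdlib.h>
#include <string.h>

#define BUCKETS 1024   /* hypothetical table size */

struct posting { int doc_id; struct posting *next; };
struct term    { char *word; struct posting *docs; struct term *next; };
struct index   { struct term *buckets[BUCKETS]; };

/* Simple multiplicative string hash, reduced to a bucket number. */
static unsigned long hash_word(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Look up a word; returns NULL if it has never been indexed. */
struct term *index_find(const struct index *ix, const char *word)
{
    struct term *t = ix->buckets[hash_word(word)];
    while (t && strcmp(t->word, word) != 0)
        t = t->next;
    return t;
}

/* Record that `word` occurs in document `doc_id`.
   Returns 0 on success, -1 if memory runs out. */
int index_add(struct index *ix, const char *word, int doc_id)
{
    struct term *t = index_find(ix, word);
    struct posting *p;

    if (!t) {
        unsigned long h = hash_word(word);
        t = malloc(sizeof *t);
        if (!t)
            return -1;
        t->word = malloc(strlen(word) + 1);
        if (!t->word) {
            free(t);
            return -1;
        }
        strcpy(t->word, word);
        t->docs = NULL;
        t->next = ix->buckets[h];
        ix->buckets[h] = t;
    }
    /* Skip the repeat when the newest posting is already this document. */
    if (t->docs && t->docs->doc_id == doc_id)
        return 0;
    p = malloc(sizeof *p);
    if (!p)
        return -1;
    p->doc_id = doc_id;
    p->next = t->docs;
    t->docs = p;
    return 0;
}

/* Number of distinct documents recorded for a word. */
size_t index_count(const struct index *ix, const char *word)
{
    const struct term *t = index_find(ix, word);
    const struct posting *p;
    size_t n = 0;
    for (p = t ? t->docs : NULL; p; p = p->next)
        n++;
    return n;
}
```

A search for a single word is then one hash lookup followed by a walk of
that word's document list. The duplicate check only catches repeats when
a document's words are added together, which fits a spider that processes
one page at a time.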

Does anyone have suggestions on how to approach this? Should I maintain a
fixed list of keywords for my spider to index, or should I index every single
word (including common short ones such as "if", "the", "and" and "but")?
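
To make the second option concrete, here is a minimal sketch in C of
splitting text into words and dropping the common short ones before they
reach the index. The stop list contents, the MAX_WORD limit and the names
is_stop_word and extract_keywords are all hypothetical, not part of
IntelliAgent.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64   /* hypothetical limit; longer words are truncated */

/* Hypothetical stop list; a real one would be longer. */
static const char *const STOP_WORDS[] = {
    "a", "an", "and", "but", "if", "of", "or", "the", "to"
};
#define STOP_COUNT (sizeof STOP_WORDS / sizeof STOP_WORDS[0])

/* Return 1 if the (lowercase) word is on the stop list, else 0. */
int is_stop_word(const char *word)
{
    size_t i;
    for (i = 0; i < STOP_COUNT; i++)
        if (strcmp(word, STOP_WORDS[i]) == 0)
            return 1;
    return 0;
}

/* Split `text` into lowercase alphanumeric words and copy those that are
   not stop words into `out`, up to `max_out` entries.
   Returns the number of words stored. */
size_t extract_keywords(const char *text, char out[][MAX_WORD], size_t max_out)
{
    char word[MAX_WORD];
    size_t count = 0, len = 0;
    const char *p = text;

    for (;;) {
        unsigned char c = (unsigned char)*p;
        if (c != '\0' && isalnum(c)) {
            if (len < MAX_WORD - 1)
                word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {              /* a word just ended */
                word[len] = '\0';
                if (!is_stop_word(word) && count < max_out)
                    strcpy(out[count++], word);
                len = 0;
            }
            if (c == '\0')
                break;
        }
        p++;
    }
    return count;
}
```

Filtering on a stop list rather than on word length matters for a subject
such as computer programming: a one-letter term like "c" passes through
here, whereas a rule that discards all short words would lose it.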

I'm currently developing the application in C.

=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=
David Reilly,
Computer Programmer, dodo@fan.net.au
http://www.fan.net.au/~dodo s1523@sand.it.bond.edu.au
=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=