Re: Keyword indexing

Scott 'Webster' Wood (swood@thewild.com)
Tue, 18 Jun 1996 12:52:09 -0400 (EDT)


>
> At 10:00 AM 1/1/70, David Reilly wrote:
> >I'm currently developing a new spider (IntelliAgent) whose purpose is to
> >find new internet resources within a specific subject domain...
>
> Are you going to crawl the entire web...?
> That can easily get wasteful and unproductive
> (would waste network, would take forever and you'd miss many).

> It might make more sense to seed the robot with a list of known (or
> suspected) relevant pages...

> Still I can't escape the feeling that there must be a better way
> to find _new_ resources for a subject, other than blind crawling
> on the Web. Like monitoring newsgroups, mailing lists, parse specific
> pages etc, not to mention human editorship :-)

I am interested in just such a search engine. If anyone is working on
something that they need a place to test, or is getting ready to offer such a
harvesting/indexing tool, I would be very interested in talking to them.
Unfortunately, I am getting too busy with other things to spend a lot of
time working out the kinks in my own code, especially since I have not
worked extensively with raw TCP/IP coding.

I have a specific list of starting points that I have in fact collected
using newsgroups, mailing lists and user referrals. The list is extensive,
and I would start to manually build an index if only I had a good format to
use for creating the database. Again, I do not currently have the time to
research current tools/file structures and would much prefer a working package
to automate as much of the collecting/sorting/structuring/indexing as possible.
Preferably something pre-bundled with a CGI search tool to hunt through the
database and perhaps admin programs for managing the data collected.
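
To make the "format" question concrete: one simple shape for such a database
is an inverted index, a map from each keyword to the set of URLs whose text
contains it. The sketch below is only an illustration under assumptions. The
names build_index and search, the tokenizing rule, and the sample URLs are
all hypothetical and are not taken from any existing package.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase runs of letters and digits.
WORD = re.compile(r"[a-z0-9]+")

def build_index(pages):
    """Map each word to the set of URLs whose text contains it.

    `pages` is a dict of URL -> plain text (an assumed input shape).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Made-up sample data, for illustration only.
pages = {
    "http://example.com/a": "Wolves and other wild animals",
    "http://example.com/b": "Wild flowers of the prairie",
}
index = build_index(pages)
print(search(index, "wild animals"))
```

A search script would only need to load the index and call search() with the
user's query; storing the index on disk is left out of this sketch.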

I would be especially interested in something that can create new lists of
starting points by finding URIs in the documents I ask it to start from, and
then check those quickly for matching keywords to create a new list of
starting points I might use for my next harvest. (Preferably also looking for
repeats of items already in my index.)
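
The harvesting step described above could be sketched roughly as follows,
using only the Python standard library. This is a sketch under assumptions,
not a finished robot: next_seeds, extract_links, and the fetch callback are
hypothetical names, keyword matching is a plain case-insensitive substring
test, and the tiny in-memory "web" stands in for real HTTP retrieval (which
would also need robots.txt handling and rate limiting).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)

def extract_links(base_url, html):
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links

def next_seeds(seeds, fetch, keywords, indexed):
    """Return (new_seeds, repeats) found one hop away from the seeds.

    `fetch` is a caller-supplied function URL -> HTML text or None
    (hypothetical); `indexed` is the set of URLs already in the index.
    """
    new_seeds, repeats, seen = [], [], set(seeds)
    for seed in seeds:
        html = fetch(seed)
        if html is None:
            continue
        for link in extract_links(seed, html):
            if link in seen:
                continue
            seen.add(link)
            if link in indexed:
                repeats.append(link)  # already harvested, just report it
                continue
            body = fetch(link)
            if body and any(k.lower() in body.lower() for k in keywords):
                new_seeds.append(link)
    return new_seeds, repeats

# Made-up pages standing in for the network, for illustration only.
fake_web = {
    "http://example.com/": '<a href="/wolves.html">w</a> <a href="cars.html">c</a>',
    "http://example.com/wolves.html": "All about wolves",
    "http://example.com/cars.html": "Used cars",
}
print(next_seeds(["http://example.com/"], fake_web.get, ["wolves"], set()))
```

Feeding the returned new_seeds back in as the next call's seeds gives the
repeated harvest; the repeats list shows which links were already indexed.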

I don't ask for much, do I? Anyone know of anything that can do the magic
I desire that will run on my Linux box? I have a project that I want to
get rolling here, but can't find the tools to do the job. I hate pounding
nails with my head, and would much prefer a 'hammer' to do the job right...

Scott