Re: Looking for subcontracting spider-programmers

Hani Yakan (hani@i-online.com)
Fri, 06 Sep 1996 15:57:38 -0500


Mr. Rossi

Sounds like a very interesting project. The following are questions
based on your message:

> We have a list of about 60,000 URLs that we need to index, for searching
> purposes.

How are these URLs currently stored? Did you gather this information
using a particular robot? If so, will the input to the new program(s)
be in the same format that the 60,000 URLs are stored in, or is
gathering new URLs also part of the project?

> 1) Will write a program to verify those links, and build a database (for
> searching purposes) containing the url, title and first 200 words (cleaning
> the HTML).

Do you have a specific kind of database in mind (home grown, relational,
object oriented) and a specific database vendor (ORACLE, Objectivity), or
is this undecided? Also, are there any development language requirements,
or can the development be in C++?

> 2) Build a search interface for the web and install it on our unix server.

What kind of UNIX servers do you have (even though the developed code
should be portable across most UNIX variants and Windows)?

> 3) Update the database once a month.

Is there a specific technical reason for the "once a month" update, or
is it just a period picked for managerial reasons?

Thanks,
Hani Yakan
MailTo:hani@i-online.com