Re: Robot-HTML Web Page?

Ricardo Eito Brun (x8035952@fedro.ugr.es)
Mon, 27 May 1996 13:49:55 +0100


At 19:07 23/05/96 -0700, you wrote:
>I'm looking for a DEFINITIVE guide that completely explains how robots
>"read" HTML pages so that I can incorporate appropriate HTML codes to
>instruct the spider to submit appropriate summaries, keywords, etc. to
>various search engines that are based on spiders. (META?)
>
>I sincerely appreciate your help.
>
>Please reply through e-mail. I'm not subscribed to this list.
>
>Thank you very much.
>
I can only give you information about a few indexes.
I have read that Lycos indexes web pages by taking the
100 most frequent words in the first 20 lines of the text.
Others, like Jumpstation, index the words inside the
<TITLE> and <Hn> tags (every heading from H1 to H6) plus a set
of the most frequently appearing words (called the 'subject').
WWWW takes the words appearing in the <TITLE>, in the text
of the anchors, and in the URL of the document (I suspect
this information isn't accurate). Finally,
URL-Search takes all the words in the HTML page (excluding
the tags, of course).
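
To make the heuristics above concrete, here is a rough Python sketch of
two of them: the "most frequent words in the first lines" rule and the
"words inside TITLE, headings and anchors" rule. It is only an
illustration under my own assumptions. The names frequent_words,
TaggedWordParser and tagged_words are hypothetical, the tokenizing is
simplistic, and none of this is the actual code any of these indexes runs.

```python
import re
from collections import Counter
from html.parser import HTMLParser

WORD_RE = re.compile(r"[a-z0-9]+")


def frequent_words(text, max_lines=20, top_n=100):
    """Return up to top_n most frequent words found in the first max_lines lines."""
    head = "\n".join(text.splitlines()[:max_lines])
    counts = Counter(WORD_RE.findall(head.lower()))
    return [word for word, _ in counts.most_common(top_n)]


class TaggedWordParser(HTMLParser):
    """Collect words that appear inside TITLE, H1..H6 and A elements."""

    # HTMLParser reports tag names in lower case.
    TRACKED = {"title", "a", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of tracked elements currently open
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.TRACKED and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text seen while at least one tracked element is open.
        if self.depth > 0:
            self.words.extend(WORD_RE.findall(data.lower()))


def tagged_words(html):
    """Return the words found inside the tracked elements, in document order."""
    parser = TaggedWordParser()
    parser.feed(html)
    parser.close()
    return parser.words
```

For example, tagged_words() applied to a page would keep the words of its
title, headings and link text and drop ordinary paragraph text, while
frequent_words() only looks at the raw text it is given, so the tags
would have to be stripped first.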

Other indexes, like Archie, are compiled in a different
way, and I don't think we can properly call them 'robots'.
I have also read about an attempt by OCLC to build a
meta database using MARC codes. However, I don't know
of any index that uses the META tag KEYWORD or a
similar one.