The Internet Archive robot

ACHAKS@inf.com
Fri, 06 Sep 96 13:17:42 IST


Dear Friends,

I am planning to implement a robot in Java.

If any information or source code is already available, could you please
point me to it? (I do not want to reinvent the wheel, or at least not
all of it.)

To extract data from a document, I look only at the contents of the
<body> ... </body> region and the <title> ... </title> region. I ignore
the tags themselves, as well as anything inside comments.
Do you think this is enough to get reliable and correct information
about a web page? Please suggest alternatives if you think otherwise.
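For illustration, here is a minimal Java sketch of that approach: it
keeps the text inside <title> and <body>, drops comments and tags, and
collapses whitespace. It assumes a modern JDK (java.util.regex) and
reasonably well-formed markup. The class name PageText is hypothetical,
and character entities such as &amp; are not decoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts the visible text of <title> and <body>,
 * ignoring tags and comments. Regex-based, so it is only a rough sketch
 * for reasonably well-formed HTML, not a full HTML parser.
 */
class PageText {
    // <!-- ... --> comments, possibly spanning several lines
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Any remaining tag, e.g. <p>, </a>, <img src="x">
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // HTML tag names are case-insensitive, so <Body> and <body> both match
    private static final Pattern TITLE =
        Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern BODY =
        Pattern.compile("<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Removes comments and tags, then collapses runs of whitespace. */
    static String plain(String html) {
        String noComments = COMMENT.matcher(html).replaceAll(" ");
        String noTags = TAG.matcher(noComments).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    /** Text of the first region matched by p, or "" if the region is absent. */
    static String region(Pattern p, String html) {
        // Strip comments first so a commented-out <title> is not picked up.
        Matcher m = p.matcher(COMMENT.matcher(html).replaceAll(" "));
        return m.find() ? plain(m.group(1)) : "";
    }

    static String title(String html) { return region(TITLE, html); }

    static String body(String html) { return region(BODY, html); }

    public static void main(String[] args) {
        String page = "<html><head><TITLE>Robots</TITLE></head>"
            + "<Body><h1>Hello</h1><!-- hidden --> <p>world</p></body></html>";
        System.out.println(title(page)); // Robots
        System.out.println(body(page));  // Hello world
    }
}
```

A regex pass like this can be confused by a stray "<" in ordinary text,
and a page without a closing </body> yields an empty body string, so a
real HTML parser is the more robust choice once pages get messy.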

I plan to make the robot engine (the source code of the classes) freely
available once it is finished.
Can you suggest where it would be best to upload it for maximum access?

Are there any existing Java classes or C code that implement the Robots
Exclusion Standard?
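One possible sketch of a robots.txt check in Java, assuming the 1994
convention: records of User-agent lines followed by Disallow lines, "#"
comments, and a Disallow value that excludes every path starting with
it. The class RobotsRules and the robot name "JavaBot" are hypothetical,
and later extensions such as Allow lines and wildcards are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical robots.txt checker: prefix-matches Disallow values for
 * the record that names this robot, or for the "*" record otherwise.
 */
class RobotsRules {
    private final List<String> disallowed;

    RobotsRules(String robotsTxt, String robotName) {
        String me = robotName.toLowerCase(Locale.ROOT);
        List<String> specific = new ArrayList<>(); // rules from records naming this robot
        List<String> wildcard = new ArrayList<>(); // rules from "User-agent: *" records
        boolean matchedSpecific = false;
        boolean forMe = false, forAll = false;
        boolean newRecord = true; // next User-agent line starts a new record

        for (String raw : robotsTxt.split("\\R")) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank, comment-only or malformed line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                if (newRecord) { forMe = false; forAll = false; newRecord = false; }
                String agent = value.toLowerCase(Locale.ROOT);
                if (agent.equals("*")) {
                    forAll = true;
                } else if (!agent.isEmpty() && agent.contains(me)) {
                    // Case-insensitive substring match on the robot's name
                    forMe = true;
                    matchedSpecific = true;
                }
            } else if (field.equals("disallow")) {
                newRecord = true;
                if (value.isEmpty()) continue; // empty Disallow: nothing excluded
                if (forMe) specific.add(value);
                if (forAll) wildcard.add(value);
            }
        }
        // A record naming this robot overrides the default "*" record.
        disallowed = matchedSpecific ? specific : wildcard;
    }

    /** True if the URL path does not start with any applicable Disallow value. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String robotsTxt = String.join("\n",
            "# example robots.txt",
            "User-agent: *",
            "Disallow: /cgi-bin/",
            "Disallow: /tmp/",
            "",
            "User-agent: JavaBot", // hypothetical robot name
            "Disallow: /private/");
        RobotsRules rules = new RobotsRules(robotsTxt, "javabot");
        System.out.println(rules.isAllowed("/cgi-bin/search")); // true
        System.out.println(rules.isAllowed("/private/a.html")); // false
        RobotsRules other = new RobotsRules(robotsTxt, "otherbot");
        System.out.println(other.isAllowed("/tmp/x"));          // false
    }
}
```

In use, the robot would fetch /robots.txt once per host, build the rules
for its own name, and call isAllowed on the path of each URL before
requesting it. Letting a record that names the robot override the "*"
record follows the standard's description of "*" as the default policy
for robots not matched by any other record.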

Please reply to achaks@inf.com.


Angsuman