This raises a big question -- how will you measure relevancy? Given the
nature of Web links (as I discovered from trying this sort of thing), you
have to take a very fine-grained approach to deciding whether to follow
a link. From a brief attempt at this kind of robot, it appeared to me that
you'd need to make the recursion decision based on following a series of
links, not merely on the contents of individual pages.
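To make that concrete, here's a rough sketch of the kind of path-based
decision I have in mind; the fetch() and score() helpers and the
window/threshold numbers are placeholders, not anything real:

    # Rough sketch of a path-based crawl decision (names and thresholds are
    # placeholders). Instead of judging each page in isolation, carry the
    # relevance scores of the chain of links that led here, and only recurse
    # while the chain as a whole still looks promising.

    def should_recurse(path_scores, new_score, window=3, threshold=0.5):
        # Average over the last few hops, so one off-topic but link-rich page
        # doesn't end the crawl, and one lucky hit doesn't keep it alive.
        recent = (path_scores + [new_score])[-window:]
        return sum(recent) / len(recent) >= threshold

    def crawl(url, path_scores, fetch, score, max_depth=5):
        if len(path_scores) >= max_depth:
            return
        page = fetch(url)    # fetch() returns the page text plus its links
        s = score(page)      # score() is whatever relevancy ranker you have
        if not should_recurse(path_scores, s):
            return
        for link in page.links:
            crawl(link, path_scores + [s], fetch, score, max_depth)

The point is just that the decision looks at the series of scores along the
path, not at the latest page alone.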
The difficult problem is that many Web pages touch a variety of
subjects -- often the most interesting jumping-off points cover several
topics. Traditional approaches to relevancy ranking tend not to take this
into account: a page's overall relevancy score says little about how useful
any individual link on it will be, even when the page itself is quite
relevant.
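One way around that, at least on paper, would be to score the text around
each link rather than the page as a whole -- again a sketch, with made-up
page and link structures (link.anchor_text, link.offset, page.text):

    # Sketch: score each link by its local context rather than by the page's
    # overall relevancy score. The page/link structures are made up here.

    def score_links(page, score_text, context_chars=200):
        results = []
        for link in page.links:
            start = max(0, link.offset - context_chars)
            end = link.offset + len(link.anchor_text) + context_chars
            snippet = link.anchor_text + " " + page.text[start:end]
            results.append((link.url, score_text(snippet)))
        return results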
In short, it became clear to me that this kind of robot will take some time
to develop and maintain. It'll also follow many dead ends -- even the best
relevancy ranking systems today have an accuracy of perhaps 80 percent or
so. Compound that 20 percent error across each level of recursion and the
robot will spin its wheels a bit.
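To put rough numbers on it: at 80 percent per decision, a chain of three
correct decisions happens only about half the time (0.8^3 is roughly 0.51),
and by five hops it's down to about a third.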
Lots of trial and error is needed, but it's very interesting work, I
suspect. Verity might have an interest in supporting such research.
Nick