web topology

Fred K. Lenherr (lenherr@tiac.com)
Tue, 25 Jun 1996 09:43:52 -0400


Where can I find some detailed facts about overall web topology?

I'm interested in things like connectivity statistics. For example,
given two pages with _some_ path between them, what is the average
length of the shortest path between them?
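
To make the statistic concrete, here is a minimal sketch of how it could
be computed on a link graph that has already been collected. It assumes
the graph fits in memory as a dict mapping each page to the pages it
links to. The function name and the toy graph are made up for
illustration; they are not real web data.

```python
from collections import deque


def average_shortest_path(links):
    """Mean shortest-path length over ordered pairs (u, v), u != v,
    for which some directed path from u to v exists.

    links: dict mapping a page to an iterable of pages it links to.
    Only pages that appear as keys are used as starting points.
    """
    total = 0
    pairs = 0
    for source in links:
        # Breadth-first search gives shortest hop counts from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            page = queue.popleft()
            for child in links.get(page, ()):
                if child not in dist:
                    dist[child] = dist[page] + 1
                    queue.append(child)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Hypothetical toy graph, for illustration only.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(average_shortest_path(toy))
```

This runs one breadth-first search per page, so it is only practical on
a sample of the web, not the whole thing.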

Like several others here, I'm working on a domain-specific (domain of
knowledge, not of the Internet) robot. My idea so far is to:

1. Create a large set of known-relevant top level pages.

2. Index them, retrieve their children and qualify the children for
relevance to the domain of interest; index the relevant children.

3. Recurse on the relevant results from step 2; recurse only
to a limited depth on the irrelevant results from step 2.

For example, allow up to 1 or 2 or whatever irrelevant docs
before discontinuing recursion on that sub-graph.
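
The three steps above can be sketched roughly as follows. This is only
an outline under assumptions: get_children and is_relevant are
hypothetical callables standing in for page retrieval and the domain
relevance test, the seeds are taken to be relevant, and the limit is
read as "at most max_irrelevant consecutive irrelevant docs on the path
to a page".

```python
from collections import deque


def focused_crawl(seeds, get_children, is_relevant, max_irrelevant=1):
    """Return the set of relevant pages that would be indexed.

    get_children(page) -> iterable of linked pages   (hypothetical)
    is_relevant(page)  -> bool                       (hypothetical)
    """
    indexed = set(seeds)
    # best[page] = shortest run of consecutive irrelevant pages
    # ending at this page, over all paths explored so far.
    best = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        run = best[page]
        for child in get_children(page):
            if is_relevant(child):
                child_run = 0  # a relevant page resets the count
            else:
                child_run = run + 1
                if child_run > max_irrelevant:
                    continue  # give up on this sub-graph
            if child in best and best[child] <= child_run:
                continue  # already reached at least as cheaply
            best[child] = child_run
            if child_run == 0:
                indexed.add(child)
            queue.append(child)
    return indexed


# Hypothetical toy data, for illustration only.
links = {"seed": ["r1", "x1"], "x1": ["x2"], "x2": ["r2"]}
relevant = {"seed", "r1", "r2"}
found = focused_crawl(["seed"], lambda p: links.get(p, []),
                      lambda p: p in relevant, max_irrelevant=2)
print(sorted(found))
```

In the toy data, r2 sits behind two irrelevant pages, so it is found
with max_irrelevant=2 but missed with max_irrelevant=1.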

This should work if a reasonably large proportion of all relevant
docs can be reached via no more than n consecutive irrelevant ones.
Of course there will be misses, but if I had some data, I could choose
the cutoff n more intelligently.
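
One way such data might be used, sketched under the assumption that a
small sample of the web has been crawled exhaustively and labeled by
hand: measure what fraction of the relevant docs is reachable for each
value of n. The names below and the sample graph are hypothetical.

```python
def reachable_relevant(links, relevant, seeds, n):
    """Relevant pages reachable from seeds when at most n consecutive
    irrelevant pages may be crossed. Seeds are assumed relevant."""
    best = {seed: 0 for seed in seeds}  # page -> shortest irrelevant run
    stack = list(seeds)
    while stack:
        page = stack.pop()
        run = best[page]
        for child in links.get(page, ()):
            child_run = 0 if child in relevant else run + 1
            if child_run > n or best.get(child, n + 1) <= child_run:
                continue
            best[child] = child_run
            stack.append(child)
    return {page for page in best if page in relevant}


def coverage_by_budget(links, relevant, seeds, max_n):
    """Fraction of all relevant pages reached for n = 0 .. max_n."""
    return [len(reachable_relevant(links, relevant, seeds, n)) / len(relevant)
            for n in range(max_n + 1)]


# Hypothetical labeled sample, for illustration only.
sample_links = {"seed": ["r1", "x1"], "x1": ["x2"], "x2": ["r2"]}
sample_relevant = {"seed", "r1", "r2"}
for n, frac in enumerate(coverage_by_budget(sample_links, sample_relevant,
                                            ["seed"], 3)):
    print(n, round(frac, 2))
```

The smallest n at which the coverage curve flattens out would be a
reasonable first guess for the cutoff, at least for that sample.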

Any suggestions much appreciated.

-- Fred