I'm interested in things like connectivity statistics (e.g.,
given two pages with _some_ path between them, what's the average
shortest path length between them?). Stuff like that.
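For concreteness, here's the kind of number I mean, as a brute-force
BFS over a toy link graph (Python; the dict-of-lists representation
and the function name are just for illustration):

    # Toy brute-force version of the statistic: mean shortest-path
    # length over all ordered pairs of pages with _some_ path.
    from collections import deque

    def avg_shortest_path(graph):
        """graph: dict mapping each page to the pages it links to."""
        total, pairs = 0, 0
        for source in graph:
            # BFS from `source` gives shortest link distances to
            # everything reachable from it.
            dist = {source: 0}
            queue = deque([source])
            while queue:
                page = queue.popleft()
                for nbr in graph.get(page, []):
                    if nbr not in dist:
                        dist[nbr] = dist[page] + 1
                        queue.append(nbr)
            for target, d in dist.items():
                if target != source:
                    total += d
                    pairs += 1
        return total / pairs if pairs else float("nan")

    # avg_shortest_path({"a": ["b"], "b": ["c"], "c": []}) == 4/3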
Like several others here, I'm working on a domain-specific (domain of
knowledge, not of the Internet) robot. My idea so far is to:
1. Create a large set of known-relevant top-level pages.
2. Index them, retrieve their children, and qualify the children for
relevance to the domain of interest; index the relevant children.
3. Recurse on the relevant results from step 2; recurse only
to a limited depth on the irrelevant results from step 2.
For example, allow up to 1 or 2 (or whatever) irrelevant docs
before discontinuing recursion on that sub-graph; see the
sketch after this list.
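Here's a rough sketch of steps 1-3, just to pin down the cutoff.
fetch_links() and is_relevant() are hypothetical stand-ins for the
fetcher and the domain classifier, and counting _consecutive_
irrelevant docs (resetting on a relevant hit) is only one possible
reading of step 3; counting total irrelevant docs along the path
would be a one-line change.

    from collections import deque

    def crawl(seeds, max_irrelevant=2):
        """Breadth-first crawl that tolerates up to `max_irrelevant`
        consecutive irrelevant docs on a path before pruning it."""
        seen = set(seeds)
        # Queue entries: (url, irrelevant docs seen in a row so far)
        queue = deque((url, 0) for url in seeds)
        indexed = []

        while queue:
            url, run = queue.popleft()
            if is_relevant(url):             # hypothetical classifier
                indexed.append(url)          # index the relevant doc
                run = 0                      # reset the irrelevance run
            else:
                run += 1
                if run > max_irrelevant:
                    continue                 # prune this sub-graph
            for child in fetch_links(url):   # hypothetical link extractor
                if child not in seen:
                    seen.add(child)
                    queue.append((child, run))
        return indexed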
This should work if a reasonably large proportion of all relevant
docs can be reached via no more than n irrelevant ones. Of course
there will be misses, but with some real connectivity data I could
choose the cutoff n more intelligently.
Any suggestions much appreciated.
-- Fred