Re: Links (don't bother checking; I've done it for you)

Darrin Chandler (dchandler@abilnet.com)
Fri, 29 Mar 1996 10:11:37 -0700


At 10:56 3/29/96 +0100, you wrote:
>Then, at the end of his page he had the following keywords repeated
>
>California
>Northwest
>Animation
>Promotion Web Site
>Development Web Site
>Knitter Web
>Money
>
>about 50 times each..
>
>okay, so the action to take in his case is obvious (deindex any siGHt(e)
>belonging to him), but what is the general case to stop this? It's very
>difficult as far as I can see. It's deliberate worthless junk, trying to
>get in at the level of people who are providing worthwhile information
>about California/web sites etc.

Well, if you were to analyze the comments of a web page and compute the
ratio of unique words to total words, you could cull a large percentage of
these types of pages, since a keyword repeated 50 times drives that ratio
toward zero. Even better, you could use that metric to decide whether your
indexer should include the page's comments at all, which means you can
still index the page, but without the bogus keywords.
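
Here is a rough sketch of that check in Python, just to make the idea
concrete. The names (CommentCollector, unique_word_ratio,
should_index_comments) and the 0.3 cutoff are my own assumptions for
illustration, not something taken from an existing indexer; a real cutoff
would have to be tuned against real pages.

```python
import re
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every <!-- ... --> comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def unique_word_ratio(text):
    """Return unique words / total words; 1.0 if the text has no words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 1.0
    return len(set(words)) / len(words)


def should_index_comments(html, threshold=0.3):
    """True if the comments look like ordinary text, False if they look stuffed.

    The threshold is a hypothetical value chosen for this sketch.
    """
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return unique_word_ratio(" ".join(collector.comments)) >= threshold
```

With this, the page body is always indexed, and the comments are only
skipped when should_index_comments() returns False, as it would for a
comment block that repeats a handful of keywords 50 times each.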
______________________________________________

_/| _| _| _|
_/_| _| _| _| _| _|_|_|
_/ _| _| _| _|
_/_|_| _|_|_| _| _| _| _| _| _|
_/ _| _| _| _| _| _| _| _| _|
_/ _| _|_|_| _| _| _| _|_| _|_|_|
_|
_|_|_|

Darrin Chandler, Duke of URL
Ability Software & Productions
Email: dchandler@abilnet.com
WWW: http://www.abilnet.com/
______________________________________________