Re: word spam

Benjamin Franz (snowhare@netimages.com)
Mon, 15 Apr 1996 08:14:05 -0700 (PDT)


On Mon, 15 Apr 1996, Trevor Jenkins wrote:

> On 12 Apr 96 at 0:22, chris cobb wrote:
>
> > This may have been discussed in past sections, but it comes to mind regarding the use
> > of large blocks of random or repetitive keyword text in web pages - either to obstruct a crawler's indexing
> > mechanism or to increase the ranking of a page.
> >
> > There is a feature in many word processors - Word comes to mind - as part of the
> > "Grammar Check" section.
>
> Sadly, some index engines are including grammatically correct pages
> that are not really pages. For example, use the Alta Vista engine and
> look for "posix". You will get "hundreds" of perl, Tcl/Tk or other
> scripts that include "posix.pl" or similar. These scripts are
> syntactically correct but are not what I was expecting to see. The
> grammar checker, in this case, has verified that the content is okay but I
> would contend that the page should have been excluded from
> consideration.

posix AND NOT (posix.pl)

Add in some ranking keywords for the specific area of interest and Alta
Vista can produce quite good results.

It gets back to what I said about it being the responsibility of the
searcher to tailor the search. There is no feasible way for either the
indexing engines or the content providers to *exclude* things as being
irrelevant to searchers without assistance from the person making the
query. Almost everything is relevant to *someone*, even if not to you. If
a search returns a large number of irrelevant hits, retune the query to
exclude the non-relevant material.
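
To make the idea concrete, here is a rough sketch of what an include/exclude
query like the one above does to a result set. It is only an illustration:
the function names and the sample pages are made up, and plain
case-insensitive substring matching is an assumption, not a description of
how Alta Vista actually matches terms.

```python
# Toy illustration of "posix AND NOT (posix.pl)".
# Assumption: simple case-insensitive substring matching. A real search
# engine tokenizes and ranks; this only shows the include/exclude logic.

def matches(text, include, exclude):
    """Return True if text contains `include` but not `exclude`."""
    lowered = text.lower()
    return include.lower() in lowered and exclude.lower() not in lowered


def refine(pages, include, exclude):
    """Return the titles of pages that survive the include/exclude filter."""
    return [title for title, text in pages.items()
            if matches(text, include, exclude)]


# Hypothetical sample pages, invented for this example.
pages = {
    "POSIX threads overview": "An introduction to POSIX threads and signals.",
    "dirlist.pl": "#!/usr/bin/perl\nrequire 'posix.pl';\nprint \"listing\\n\";",
    "Gardening tips": "How to grow tomatoes.",
}

results = refine(pages, "posix", "posix.pl")
print(results)
```

With these sample pages, only the overview page is returned: the perl script
is dropped because it mentions posix.pl, and the gardening page never
matched in the first place. The exclusion comes from the searcher, not from
the indexer or the page author.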

--
Benjamin Franz