Re: Should I index all ...

Terry O'Neill (toneill@mariner.com)
Fri, 05 Jul 1996 11:20:17 -0600


Jaakko Hyvatti wrote:
>
> ...
> ...
> ...
>

> I'm not sure there would be any significant reduction in database size
> if a few suspected stopwords were excluded. And as you can not be sure
> about the language used or acronyms included, your stopword list might
> exclude meaningful information. If you also make searching phrases
> possible, stopwords may be significant in them.

No doubt you exclude some meaningful information when you
use stopwords, but two benefits often outweigh this consideration.
First, your database does get smaller, easily by 25%, although
many factors affect this number. The second, and often more
important, benefit is that eliminating stopwords prevents
users from clobbering your database by searching for combinations
of common words, e.g. find all documents that contain "S" and "THE"
and return them in relevance order. Also note that even if you
didn't care about either of these issues, search systems may
implement stopwords to prevent conflicts between query operators
and query terms, such as a literal "and" in a query being read as
the AND operator.
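
To make both points concrete, here is a minimal sketch in Python. It is
not how any particular search engine does it; the stopword list, the
tokenizer, and the names build_index and search_all are all made up for
illustration. Stopwords are dropped when the index is built (smaller
database) and again when the query is parsed (no expensive lookups on
common words).

```python
import re

# Hypothetical stopword list; a real one depends on language and collection.
STOPWORDS = {"a", "an", "and", "of", "or", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs, stopwords=STOPWORDS):
    """Build an inverted index (term -> set of doc ids), skipping stopwords."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            if term in stopwords:
                continue  # never stored, so the index stays smaller
            index.setdefault(term, set()).add(doc_id)
    return index


def search_all(index, query, stopwords=STOPWORDS):
    """Return ids of documents containing every non-stopword query term."""
    terms = [t for t in tokenize(query) if t not in stopwords]
    if not terms:
        # The query held nothing but stopwords: answer cheaply with no hits
        # instead of scanning postings for nearly every document.
        return set()
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings)
```

With this sketch a query such as "the and" returns nothing at once,
while "the cat" is reduced to a lookup on "cat" alone. The cost is the
one Jaakko points out: a phrase or acronym made up of stopwords can no
longer be found.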

Terry O'Neill
mariner.com