Re: Should I index all ...

Nick Arnett (narnett@pinkfloyd.verity.com)
Mon, 8 Jul 1996 21:40:46 -0700 (PDT)


>No doubt that you exclude some meaningful information when you
>use stopwords, but two benefits often outweigh this consideration.
>First, your database does get smaller, easily 25%, although there are
>many factors that affect this number.

What's your source for that number? Unless you have a very primitive index
and a very aggressive stopword list, I believe the size reduction is nowhere
near 25%.
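
One way to check a figure like this against your own collection is to count
index entries with and without a stopword list. The sketch below is only an
illustration: the stopword list, the function names, and the toy
document-level index are assumptions, not any particular engine's format. A
positional index, or one with compressed postings, would give a different
ratio.

```python
from collections import defaultdict

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"the", "of", "and", "a", "in", "to", "is"})


def build_index(docs, stopwords=frozenset()):
    """Build a toy inverted index mapping each term to a set of document ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            if term not in stopwords:
                index[term].add(doc_id)
    return index


def posting_count(index):
    """Count the (term, document) entries stored in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


def stopword_savings(docs, stopwords=STOPWORDS):
    """Return the fraction of postings removed by dropping stopwords."""
    full = posting_count(build_index(docs))
    if full == 0:
        return 0.0
    reduced = posting_count(build_index(docs, stopwords))
    return (full - reduced) / full
```

Calling stopword_savings() on a representative sample of documents gives a
rough fraction for that sample under these assumptions. It says nothing
about on-disk size in a real engine, which depends on how postings are
stored.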

>The second, and often more
>important benefit, is that the elimination of stopwords prevents
>users from clobbering your database by searching for combinations
>of common words, e.g. find all documents that contain "S" and "THE"
>and return them in relevance order. Also note that even if you
>didn't care about either of these issues, search systems may
>implement stop words to prevent conflicts between query operators
>and query terms.

Unfortunately, eliminating stopwords from the index also prevents users from
finding "The United States of America" and similar phrases, which tends to
confuse people quite a bit. To avoid the problem you describe, stopwords can
instead be stripped from the query at search time. Our engine does so as a
matter of routine when you use our free text parser, for example.
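
As a rough illustration of stripping stopwords at query time rather than at
index time, here is a minimal sketch. It is not the parser described above:
the function name, the stopword list, and the use of double quotes to mark
phrases are all assumptions made for the example. Quoted phrases are kept
intact, so they can still be matched against a full index, while loose
common words are dropped before the search runs.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"the", "of", "and", "a", "in", "to", "is", "s"})


def parse_free_text(query, stopwords=STOPWORDS):
    """Split a query into quoted phrases and loose terms.

    Quoted phrases are returned unchanged, stopwords included.
    Loose terms are lowercased and stopwords are removed.
    """
    phrases = re.findall(r'"([^"]+)"', query)
    # Remove the quoted parts so only the loose terms remain.
    remainder = re.sub(r'"[^"]*"', " ", query)
    terms = [t for t in remainder.lower().split() if t not in stopwords]
    return phrases, terms
```

With this approach a query made up only of common words reduces to an empty
term list, which the caller can reject or handle cheaply, while a phrase
query still carries its stopwords through to the index lookup.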

Nick