What's your source for that number? Unless you have a very primitive index
and a very aggressive stopword list, the size reduction is nowhere near
that large, I believe.
>The second, and often more
>important benefit, is that the elimination of stopwords prevents
>users from clobbering your database by searching for combinations
>of common words, e.g. find all documents that contain "S" and "THE"
>and return them in relevance order. Also note that even if you
>didn't care about either of these issues, search systems may
>implement stop words to prevent conflicts between query operators
>and query terms.
Unfortunately, it also prevents users from finding "The United States of
America" and similar phrases, which tends to confuse people quite a bit.
Stop words can instead be stripped from the query at search time, when
necessary, to avoid the problem you describe. Our engine does so as a
matter of routine when you use our free text parser, for example.
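
A minimal sketch of that idea in Python, not any particular engine's
implementation. The stopword list, the function names and the sample
documents are all made up for illustration. The index keeps every word with
its position, so exact phrases still match, while the free text path drops
stop words from the query before touching the index.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "is", "of", "the"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every term, stop words included, to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def free_text_search(index, query):
    """AND search; stop words are stripped from the query, not the index."""
    terms = [t for t in tokenize(query) if t not in STOPWORDS]
    if not terms:
        # The query held only stop words: return nothing rather than
        # walking the huge posting lists for "the", "and", and so on.
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result


def phrase_search(index, phrase):
    """Exact phrase match; stop words stay in, so positions line up."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    hits = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            # Every later term must sit at the next consecutive position.
            if all(start + i in index.get(term, {}).get(doc_id, ())
                   for i, term in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits
```

With this layout a free text query such as "the and" returns nothing at
essentially no cost, while a phrase query for "The United States of America"
can still be answered from the positions stored in the index.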
Nick