Re: Should I index all ...

Jaakko Hyvatti (Jaakko.Hyvatti@Elma.FI)
Fri, 5 Jul 1996 14:32:45 +0300 (EET DST)


Michael Göckel:
> The clue is: Use a stopword list. There's not much sense in showing all
> your documents as a result of a search. Exclude words, which are found
> too often. This helps getting your database smaller (if you index a
> great number of documents). But you should notify the user about having
> entered a stop word.

I'm not sure that excluding a few suspected stopwords would reduce the
database size significantly. And since you cannot be sure which
language is used or which acronyms appear, your stopword list might
exclude meaningful information. If you also allow phrase searches,
stopwords may be significant in them.
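
To illustrate the phrase and acronym cases, here is a minimal sketch
in Python. It assumes a toy positional index; the stopword list, the
sample documents and the function names are all made up for the
example and are not taken from any particular indexer.

```python
from collections import defaultdict

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"to", "be", "or", "not", "it", "the", "a", "who"}

# Made-up sample documents.
DOCS = {
    1: "to be or not to be",
    2: "the WHO report on health",
    3: "who wrote it",
}

def build_index(docs, stopwords=frozenset()):
    """Map each word to {doc_id: [positions]}, skipping stopwords."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            if word not in stopwords:
                index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted ids of documents containing the phrase's words consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    hits = []
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            # Every following word must sit at the next position in the same document.
            if all(start + i in index[w].get(doc_id, ())
                   for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)

full = build_index(DOCS)               # nothing excluded
pruned = build_index(DOCS, STOPWORDS)  # stopwords dropped at index time
```

With the full index, phrase_search(full, "to be or not to be") finds
document 1 and phrase_search(full, "WHO") finds documents 2 and 3.
With the pruned index both queries come back empty, because the words
never made it into the database.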

On the other hand, in some statistics or such it might be a good idea
to exclude stopwords, but excluding them from the database gives no
real benefit.

How you construct the responses to a query is a separate question of
user interface. You might let users limit the number of matches so
that their Netscape will not dump core on the large document
produced :-) But if they *want* to know where a stopword, or an
acronym that looks just like a stopword, is used, they might as well
be given the answer. It does not cost anything; disabling and
restricting the use of words costs you.
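
A minimal sketch of the "limit the number of matches" idea, in Python.
The function name and the limit parameter are hypothetical; the point
is only that the query is always answered and the cap applies to what
is displayed, not to which words may be searched.

```python
def limited_results(matches, limit=None):
    """Return (shown, total): at most `limit` matches plus the full count.

    `limit` is a user-chosen cap (an assumption for this sketch);
    None means show everything.
    """
    total = len(matches)
    shown = matches if limit is None else matches[:limit]
    return shown, total
```

For example, limited_results(hits, limit=20) returns the first 20
entries of hits together with the total count, so the response page
can say how many more matches exist.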