Re: Search accuracy

Nick Arnett (narnett@Verity.COM)
Thu, 4 Apr 1996 18:11:28 -0800


At 8:22 PM 4/4/96, Robert Raisch, The Internet Company wrote:

>All in all, I prefer approaches like PLS' where the document is
>subjected to a statistical analysis, one where each word is
>indexed as well as its relationship with all the other words in
>the document. Along with the standard arsenal of boolean,
>fielded, and adjacency search features, I feel this represents a
>very reasonable "middle-ground" for the typical searcher.

I think you're saying two things at once here -- statistical analysis
helps, but a variety of algorithms/operators is important. This would
seem to be quite true; as was said here earlier, the more evidence, the
better. Statistical analysis (aside from its indexing speed and size issues)
is done on a corpus, not individual documents. This presents the problem
of combining search results from multiple corpora. That's not an issue
until you try to search across a bunch of indexes whose corpora
have different co-occurring word frequencies. We find that customers don't
generally turn on our statistical operators when they're available. Do you
get better search results with co-occurring word ("concept") search
turned on?
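
To make the multi-corpus problem concrete, here is a minimal sketch in
Python. It assumes a plain smoothed tf-idf weighting, which is only a
stand-in for whatever statistics a given engine actually keeps; the two toy
corpora and the names idf and score are made up for illustration. The same
document and the same one-word query get different scores depending on
which corpus supplied the statistics, so the raw scores from two indexes
can't simply be merged into one ranked list.

```python
import math

def idf(term, corpus):
    """Smoothed inverse document frequency of `term` in one corpus.

    `corpus` is a list of documents, each a list of tokens.
    The smoothing formula is an assumption chosen for illustration.
    """
    df = sum(1 for doc in corpus if term in doc)
    return math.log((1 + len(corpus)) / (1 + df)) + 1.0

def score(query_terms, doc, corpus):
    """Sum of term frequency * idf, using statistics from `corpus` only."""
    return sum(doc.count(t) * idf(t, corpus) for t in query_terms)

# Two hypothetical corpora: "virus" is common in one and rare in the other.
medical = [
    ["virus", "infection", "patient"],
    ["virus", "vaccine", "trial"],
    ["virus", "cold", "medicine"],
]
computing = [
    ["virus", "scanner", "disk"],
    ["network", "router", "packet"],
    ["search", "index", "query"],
]

doc = ["virus", "report"]
s_med = score(["virus"], doc, medical)     # "virus" carries little weight here
s_comp = score(["virus"], doc, computing)  # "virus" is rarer, so it weighs more

print(f"medical index:   {s_med:.3f}")
print(f"computing index: {s_comp:.3f}")
```

Any scheme for merging results across indexes has to either share corpus
statistics or rescale the scores before comparing them.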

Search algorithms are like cold medicine -- if you combine a bunch of them,
you minimize the side effects.
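
One way to picture "combining a bunch of them" is the sketch below. It
assumes each operator returns a per-document score, rescales each set to
the 0..1 range, and adds them up (a simple sum-of-normalized-scores
fusion). This is not a description of how any particular product combines
evidence; the operator names, document ids, and scores are invented for
the example.

```python
def min_max(scores):
    """Rescale one operator's scores to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tied: treat them as equally good.
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def combine(*result_sets):
    """Sum normalized scores per document and return ids, best first."""
    total = {}
    for scores in result_sets:
        for doc, s in min_max(scores).items():
            total[doc] = total.get(doc, 0.0) + s
    # Ties are broken by document id so the ordering is deterministic.
    return sorted(total, key=lambda d: (-total[d], d))

# Hypothetical scores from three different operators for the same query.
boolean_hits = {"doc1": 1.0, "doc2": 1.0, "doc3": 0.0}
statistical = {"doc1": 0.2, "doc2": 0.9, "doc3": 0.8}
proximity = {"doc1": 0.1, "doc2": 0.6, "doc3": 0.7}

ranking = combine(boolean_hits, statistical, proximity)
print(ranking)
```

A document that only one operator likes (doc1 here) ends up below one that
several operators agree on, which is the "more evidence" effect.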

>Of course, most of this is more than a little academic since
>the *vast* majority of all searches initiated online are for
>single keywords rather than more complexly constructed,
>multi-termed queries.

I'm not sure this ends up being true; even though each search may add just
one term, people often are building multi-word searches through trial and
error. There aren't many one-word searches that yield useful results on
the big Web indexes, in my experience.

I suspect that search is like page layout when PageMaker came out. No one
thought they'd need to learn typesetting "language," but they did. Today,
people don't think they'll learn query languages... but I predict that the
basics of a query language will be familiar to most Internet users within a
few years. Of course, the question is, what query language... ;-)

Nick