Search accuracy

Nick Arnett (narnett@Verity.COM)
Mon, 1 Apr 1996 09:29:25 -0800


>On Fri, 29 Mar 1996, Darrin Chandler wrote:

>The individual words usually cough up very different sub-sets of pages
>related to rabbits. A *good* search request would look for all of them - in
>the absence of such searches, I would keyword a page to all of them. And I
>would be correct to do so. But your unique words rejection heuristic would
>likely deny the page.

You're saying that a search on "buns" should return pages about rabbits? ;-)

The nit I'd like to pick here is that you're describing good recall
(finding all of the relevant documents), which is only half of the search
accuracy problem. The other half is precision, which is finding only
relevant documents. A thesaurus/dictionary-based semantic network could
return all of the documents that you describe... but the problem would
remain that it would *also* return many, many other documents whose words
have some sort of linguistic connection to the search terms, whether or
not those documents have anything to do with rabbits.
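
To put numbers on the two halves, here's a minimal sketch in Python. The
document names and the precision_recall() helper are made up purely for
illustration; they don't come from any particular search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four pages really are about rabbits.
relevant = {"rabbit-care", "bunny-breeds", "hare-habitat", "buns-faq"}

# A narrow query finds one of them and nothing else.
narrow = {"rabbit-care"}

# A thesaurus-expanded query finds all four, plus four unrelated pages.
broad = relevant | {"hot-cross-buns", "hair-salon", "bread-recipes", "cartoons"}

print(precision_recall(narrow, relevant))  # (1.0, 0.25)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

The narrow query is perfectly precise but misses most of the rabbit pages;
the expanded query has perfect recall but half of what it returns is noise.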

Balancing precision and recall is the big problem in search. Robots that
compile additional evidence can help in ways that go beyond just indexing
the words. For example, capturing HTML zone information (whether a word
appears in the title, a heading or the body text) lets a search engine
score documents based on where words appear, not just on whether they
appear.
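
As a rough sketch of what scoring by zone might look like, here's a small
Python example built on the standard library's html.parser. The zone
weights, the ZoneScorer class and the zone_score() helper are my own
assumptions for illustration, not a description of how any real engine
weights things.

```python
import re
from html.parser import HTMLParser

# Assumed weights: a hit in the title counts more than one in a heading,
# which counts more than one anywhere else on the page.
ZONE_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.0}
DEFAULT_WEIGHT = 1.0


class ZoneScorer(HTMLParser):
    """Accumulate a score for one term, weighted by the enclosing zone."""

    def __init__(self, term):
        super().__init__()
        self.term = term.lower()
        self.stack = []   # currently open weighted zones, innermost last
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        if tag in ZONE_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching zone.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        zone = self.stack[-1] if self.stack else None
        weight = ZONE_WEIGHTS.get(zone, DEFAULT_WEIGHT)
        words = re.findall(r"[a-z0-9]+", data.lower())
        self.score += weight * words.count(self.term)


def zone_score(html, term):
    scorer = ZoneScorer(term)
    scorer.feed(html)
    scorer.close()
    return scorer.score


# Hypothetical pages: page_a mentions the term four times, page_b five.
page_a = ("<html><head><title>Rabbit care</title></head><body>"
          "<h1>Feeding a rabbit</h1>"
          "<p>A rabbit needs hay. Every rabbit needs water.</p>"
          "</body></html>")
page_b = ("<html><head><title>Baking buns</title></head><body>"
          "<p>rabbit rabbit rabbit rabbit rabbit</p>"
          "</body></html>")

print(zone_score(page_a, "rabbit"))  # 10.0
print(zone_score(page_b, "rabbit"))  # 5.0
```

The second page repeats the word more often, but the first one scores
higher because its hits land in the title and a heading.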

Nick