Re: Search accuracy

Judy Feder (Judy_Feder@cq.com)
4 Apr 96 7:05:32


You're saying that a search on "buns" should return pages about rabbits? ;-)

The nit I'd like to pick here is that you're describing good recall
(finding all of the relevant documents), which is only half of the search
accuracy problem. The other half is precision, which is finding only
relevant documents. A thesaurus/dictionary-based semantic network could
return all of the documents that you describe... but the problem would
remain that it would *also* return many, many other documents that have
words with some sort of linguistic connection to these.
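
To make the two measures concrete, here is a minimal Python sketch. The
document IDs and the function name precision_recall are made up for
illustration and do not describe any particular engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are truly relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow literal match finds only relevant pages, but misses half of them.
narrow = ["d1", "d2"]

# A broad expansion over linked words finds all four, plus four unrelated pages.
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]

narrow_pr = precision_recall(narrow, relevant)
broad_pr = precision_recall(broad, relevant)
print("narrow:", narrow_pr)
print("broad: ", broad_pr)
```

The broad query gains recall only by giving up precision, which is the
tradeoff described above.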

Balancing precision and recall is the big problem in search. Robots that
compile additional evidence can help in ways that go beyond just indexing
the words. For example, capturing HTML zone information can help score
documents based on where words appear.
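
One way zone scoring could work is sketched below in Python, using the
standard html.parser module. The zone weights and the names ZONE_WEIGHTS,
ZoneScorer and zone_score are assumptions chosen for illustration, not
values taken from any real robot.

```python
from html.parser import HTMLParser

# Hypothetical weights: a hit in the title counts more than one in the body.
ZONE_WEIGHTS = {"title": 5.0, "h1": 3.0, "body": 1.0}


class ZoneScorer(HTMLParser):
    """Accumulate a score for one term, weighted by the enclosing zone."""

    def __init__(self, term):
        super().__init__()
        self.term = term.lower()
        self.stack = []   # currently open zones, innermost last
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        if tag in ZONE_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if not self.stack:
            return  # text outside any weighted zone is ignored
        weight = ZONE_WEIGHTS[self.stack[-1]]
        # Naive whitespace tokenizing; punctuation is not stripped.
        count = data.lower().split().count(self.term)
        self.score += weight * count


def zone_score(html, term):
    scorer = ZoneScorer(term)
    scorer.feed(html)
    scorer.close()
    return scorer.score


page_a = ("<html><head><title>Stock prices</title></head>"
          "<body><p>Markets today.</p></body></html>")
page_b = ("<html><head><title>Recipes</title></head>"
          "<body><p>Add stock to the soup.</p></body></html>")

print(zone_score(page_a, "stock"), zone_score(page_b, "stock"))
```

Both pages contain the word once, but the page with the word in its title
ranks higher.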

Nick

Re: Nick's comments on semantic networks. I'm very pleased to see him giving a
plug for semantics, but I'd like to clarify one thing. A true semantic network
(which today is only offered by Excalibur Technologies' RetrievalWare) does not
force the user to make the precision/recall tradeoff. Yes, the semantic
network does boost recall by building in literally millions of word links (so,
stock is linked to equity, share, trade, bond, security, etc.). However,
unlike a thesaurus, or any other tool used in search engines today, the
semantic network also lets you specify word meaning. Thus, you can specify a
search on stock as "shares issued by a company...," telling the system to
ignore references to soup stock, live stock, retail stock, etc.
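
The idea of searching on a word in one chosen meaning can be sketched in a
few lines of Python. This is a toy under stated assumptions, not a
description of how RetrievalWare works: the SENSES table, its cue words and
the overlap rule are all invented for illustration.

```python
# Hypothetical sense inventory: each meaning of a word has a few cue words.
SENSES = {
    "stock": {
        "finance": {"equity", "share", "shares", "bond", "security",
                    "company", "trade"},
        "cooking": {"soup", "broth", "simmer"},
        "farming": {"cattle", "herd", "farm"},
    }
}


def tokens(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip(".,;:!?\"'()").lower() for w in text.split()]


def best_sense(term, text):
    """Pick the sense whose cue words overlap the text most, or None."""
    words = set(tokens(text))
    scores = {sense: len(cues & words)
              for sense, cues in SENSES[term].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def search(term, sense, docs):
    """Return names of documents that use `term` in the requested sense."""
    return [name for name, text in docs.items()
            if term in tokens(text) and best_sense(term, text) == sense]


docs = {
    "a": "The company issued new stock and each share rose.",
    "b": "Simmer the bones to make a rich soup stock.",
    "c": "The farm sold its stock of cattle.",
}

results = search("stock", "finance", docs)
print(results)
```

All three documents contain "stock", but only the one about company shares
is returned for the finance meaning.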

I agree that leveraging fielded or zone information is also a very useful part
of the mix. But the bottom line is that semantic networks provide the most
accurate searching -- precision and recall -- available to users and Web site
developers today. For more on this, see the TREC results at the NIST WWW site,

Judy