Re: Search accuracy

Benjamin Franz (snowhare@netimages.com)
Mon, 1 Apr 1996 18:01:11 -0800 (PST)


On Mon, 1 Apr 1996, Nick Arnett wrote:

> At 3:13 PM 4/1/96, Benjamin Franz wrote:
> >On Mon, 1 Apr 1996, Nick Arnett wrote:
>
> >> You're saying that a search on "buns" should return pages about rabbits? ;-)
> >
> >Actually, yes. ;-). Those who frequently talk about rabbits use 'buns' as a
> >synonym for rabbits. And a search on buns on Alta Vista *does* return pages
> >involving rabbits. Along with discussions of food, hair, long distance
> >running in cold weather, as well as human and non-human anatomy. This is
> >where skill in constructing a search to exclude things that are *not* of
> >interest comes in handy.
>
> I'm not sure that this is a good direction -- expecting people to define
> the subjects to exclude. After all, this isn't how we tend to "search"
> when we have a human being to help us. If you walked up to a reference
> librarian and said "Rabbits," what kind of response would you expect? I
> think it would be something along the lines of "What about rabbits?" Yet
> we expect computers to be better mind readers than humans!

Not really. The whole search page is an implicit 'information on X'
request. More significantly, a great deal of library science deals
with the issues of categorization and cross-referencing. The lack of
both on the web is the fundamental issue that led to full body text
indexing becoming the search mechanism of choice on the WWW in the
first place.

> Fuzzy logic -- the more evidence, the better -- seems to get people to
> relevant documents with fewer iterations. For example, you could probably
> come up with a query that would get rid of the documents that use "buns" to
> refer to anatomy (though it's not obvious to me, actually), but why not
> spend that energy and time providing more words, phrases and other evidence
> that a document is about rabbits, so that the anatomy documents fall to the
> bottom of the relevancy list?

Better yet might be iterated searching with ratings. You do an initial
search, then mark matches on the first page of returned results as
relevant or not and resubmit the search. The search engine could then
re-rank the returned results by placing the docs in an N-dimensional
space based on word frequencies and other measurable properties of the
documents, and favoring those closest to the matches you marked as
relevant. In a primitive way, Alta Vista kind of does this with its
ranking options for search terms.
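
To make the idea concrete, here is a minimal sketch of one way such
rated, iterated searching could work. It uses Rocchio-style relevance
feedback over plain word counts, which is my choice for illustration
and not a description of what Alta Vista actually does. The function
names (vectorize, refine_query, rerank) and the alpha/beta/gamma
weights are assumptions made up for this example.

```python
import math
from collections import Counter


def vectorize(text):
    """Map a document to a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def refine_query(query, relevant, non_relevant,
                 alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward docs marked relevant and away from the rest.

    The weights are illustrative defaults, not tuned values.
    """
    refined = Counter()
    for term, weight in query.items():
        refined[term] += alpha * weight
    for doc in relevant:
        for term, weight in doc.items():
            refined[term] += beta * weight / len(relevant)
    for doc in non_relevant:
        for term, weight in doc.items():
            refined[term] -= gamma * weight / len(non_relevant)
    # Keep only terms that still count as positive evidence.
    return Counter({t: w for t, w in refined.items() if w > 0})


def rerank(query_text, docs, relevant_ids=(), non_relevant_ids=()):
    """Return doc ids ordered by similarity to the feedback-adjusted query."""
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    query = refine_query(
        vectorize(query_text),
        [vectors[i] for i in relevant_ids],
        [vectors[i] for i in non_relevant_ids],
    )
    return sorted(docs, key=lambda i: cosine(query, vectors[i]), reverse=True)
```

A first pass would call rerank("buns", docs) with no ratings. After
the user marks one hit as relevant and another as not, a second call
with relevant_ids and non_relevant_ids set pulls documents that share
vocabulary with the relevant hit up the list, even ones that never
mention "buns" at all.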

-- 
Benjamin Franz