Re: Need help on Search Engine accuracy test.

Nick Craswell (Nick.Craswell@anu.edu.au)
Thu, 30 Jan 1997 12:23:40 +1100


HiPromote@aol.com wrote:
>
> Hi folks,
>
> I need help. I don't understand how the "recall" figure is worked out.
> How does Excite know the total number of relevant documents in other
> services?

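Recall is the fraction of all the relevant documents in the collection
that an engine retrieves, so to measure it you need to know every
relevant document. That means hiring people to read the collection and
judge whether each document is relevant to your query. If your
collection is the WWW, however, this is almost laughably impossible.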
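To make the denominator problem concrete, here is a minimal Python
sketch of recall next to precision. The function names and document IDs
are hypothetical, made up for illustration. Precision only needs
judgments on what the engine returned. Recall needs the complete set of
relevant documents, which is exactly what nobody has for the web.

```python
def recall(retrieved, relevant):
    """Fraction of ALL relevant documents that appear in the retrieved list."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined with no relevant documents")
    return len(relevant & set(retrieved)) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        raise ValueError("precision is undefined with no retrieved documents")
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical judgments: d1..d4 are the only relevant documents anywhere.
relevant_docs = {"d1", "d2", "d3", "d4"}
engine_results = ["d1", "d7", "d3", "d9", "d8"]
print(recall(engine_results, relevant_docs))     # 0.5
print(precision(engine_results, relevant_docs))  # 0.4
```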

One method people use when their collection is too big is the "pooling"
method. You submit the query to a number of engines, take their top
documents, pool those into a big list (eliminating duplicates) and then
you only need to hire enough relevance assessors to read through the
pool.
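A rough Python sketch of building a pool, assuming each engine's results
arrive as a ranked list of document IDs. The engine names, URLs-as-IDs
and the `build_pool` helper are hypothetical. Only the documents in the
returned pool would be handed to the assessors.

```python
def build_pool(ranked_lists, depth):
    """Merge the top `depth` results from each engine, dropping duplicates.

    ranked_lists maps an engine name to its ranked list of document IDs.
    First-seen order is kept only to make the pool easier to read.
    """
    pool = []
    seen = set()
    for results in ranked_lists.values():
        for doc in results[:depth]:
            if doc not in seen:
                seen.add(doc)
                pool.append(doc)
    return pool


runs = {
    "engine_a": ["u1", "u2", "u3", "u4"],
    "engine_b": ["u2", "u5", "u1", "u6"],
    "engine_c": ["u7", "u3", "u8"],
}
pool = build_pool(runs, depth=3)
print(pool)       # ['u1', 'u2', 'u3', 'u5', 'u7', 'u8']
print(len(pool))  # 6: between depth (3) and depth * engines (9)
```

With the top 20 from each of 5 engines, the same arithmetic gives a
pool of between 20 and 100 documents per query.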

Unfortunately, the collection (i.e. the WWW) is very very very big, and
the pool in Excite's test, which is the top 20 from each of 5 engines,
is very small. Each query's pool holds at most 100 documents (if the
engines share no results) and at least 20 (if all five return the same
20), and to claim that you have identified all relevant documents on the
web from a pool of at most 100 is rubbish.

So they can't really measure recall on the web.
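One way to see why: relevant pages that no engine returned never enter
the pool, so they are missing from the denominator. A small Python
sketch with invented numbers (assuming the engine is scored only on the
results it contributed to the pool) shows the pooled figure overstating
true recall.

```python
def recall(retrieved, relevant):
    """Fraction of the given relevant set that appears in `retrieved`."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined with no relevant documents")
    return len(relevant & set(retrieved)) / len(relevant)


# Invented ground truth: 10 relevant pages exist on the "web", but only
# r0..r3 were returned by any engine, so only they end up in the pool.
true_relevant = {f"r{i}" for i in range(10)}
pool = ["r0", "r1", "r2", "r3", "x1", "x2"]   # x1, x2 judged non-relevant
relevant_in_pool = true_relevant & set(pool)  # {"r0", "r1", "r2", "r3"}

engine_top = ["r0", "r1", "x1"]               # one engine's contribution
print(recall(engine_top, relevant_in_pool))   # 0.5 (what a pooled test reports)
print(recall(engine_top, true_relevant))      # 0.2 (actual recall)
```

Because the pooled denominator can only be smaller than the true one,
pooled recall is an upper bound, and on the web the gap is unknown.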

Having said all that, I think that Excite gives the best-quality results
of any of the search engines for an everyday query. But that's just a
feeling I have after using all the different engines.

-- 
Nick Craswell                     ph: 249 4001 (w)
Department of Computer Science  Mail: Nick.Craswell@anu.edu.au
Australian National University   Web: http://pastime.anu.edu.au/nick/
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html