To measure recall you need to know every relevant document in the
collection, which means hiring people to read the collection and judge
whether each document is relevant to your query. If your collection is
the WWW, however, this is almost laughably impossible.
One method people use when their collection is too big is the "pooling"
method. You submit the query to a number of engines, take their top
documents, pool those into a big list (eliminating duplicates) and then
you only need to hire enough relevance assessors to read through the
pool.
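Here is a minimal sketch of how such a pool might be built. The function
name, the `depth` parameter and the toy result lists are my own
illustration, not anything taken from a particular study.

```python
def build_pool(rankings, depth=20):
    """Merge the top `depth` results of each engine, dropping duplicates.

    `rankings` is a list of ranked result lists, one per engine.
    The pool keeps documents in the order they were first seen.
    """
    pool = []
    seen = set()
    for ranking in rankings:
        for doc in ranking[:depth]:
            if doc not in seen:
                seen.add(doc)
                pool.append(doc)
    return pool


# Hypothetical results from three engines for one query.
engines = [
    ["a", "b", "c"],
    ["b", "d", "a"],
    ["e", "a", "f"],
]

pool = build_pool(engines, depth=2)
print(pool)  # ['a', 'b', 'd', 'e']
```

The assessors then read only the documents in `pool`, so anything that no
engine ranked within `depth` is never judged at all.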
Unfortunately, the collection (i.e. the WWW) is very, very big, and
their pool, which is the top 20 from each of 5 engines, is very small.
Their pool for each query holds at least 20 and at most 100 documents,
and to claim that you have identified all relevant documents on the web
from a pool of 100 or fewer is rubbish.
So they can't really measure recall on the web.
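To put rough numbers on that, here is a small sketch comparing recall
computed against the pool with recall computed against the whole
collection. All the counts are made up for illustration; nobody knows
the real number of relevant documents on the web, which is exactly the
problem.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that appear in `retrieved`."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical: assessors found 30 relevant documents in the pool ...
pool_relevant = {f"doc{i}" for i in range(30)}
# ... but 970 more relevant documents exist that no engine pooled.
unseen_relevant = {f"web{i}" for i in range(970)}

# One engine's top 20: 12 relevant documents and 8 irrelevant ones.
engine_results = [f"doc{i}" for i in range(12)] + [f"junk{i}" for i in range(8)]

pooled = recall(engine_results, pool_relevant)
true = recall(engine_results, pool_relevant | unseen_relevant)

print(pooled)  # 0.4
print(true)    # 0.012
```

The pooled figure can only overstate recall, because its denominator
leaves out every relevant document that never made it into the pool.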
Having said all that, I think that Excite gives the best quality results
of any of the search engines on everyday queries. But that's just a
feeling I have after using all the different engines.
--
Nick Craswell                     ph: 249 4001 (w)
Department of Computer Science    Mail: Nick.Craswell@anu.edu.au
Australian National University    Web: http://pastime.anu.edu.au/nick/