Re: Search accuracy

Robert Raisch, The Internet Company (raisch@internet.com)
Thu, 4 Apr 1996 20:22:09 -0400


It should be noted that while semantic networks offer greater
accuracy in the assessment of search results, they imply a
rather crushing editorial burden on *someone* as well. They
also address only one form of search and not the most prevalent
form at that.

It is all well and fine to limit your search to stock (a term
related to finance and corporate worth) until you search for
feldarcarb, a recently coined term related to initial
offerings of Internet-related companies whose value has been
artificially inflated by ignorance on the part of the market
and the public. (Apologies to Battlestar Galactica.)

Unless someone has updated the network to include this new
terminology and how it relates to the rest of the corpus, your
search may prove less than completely satisfying. And herein lies
the problem: how often is the semantic network enriched, and by
whom? Since language and meaning are not static, easily
encompassed things, an effective semantic network must be
considered to be a verb rather than a noun.
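
To make the maintenance problem concrete, here is a minimal sketch of a
toy semantic network. The dictionary layout, the related_terms helper
and the concept labels are all assumptions made for illustration, not
how any particular product stores its network.

```python
# Toy semantic network: each term maps to the concepts an editor has
# linked it to. Every name and label here is a hypothetical example.
network = {
    "stock": {"finance", "corporate worth"},
    "share": {"finance"},
}

def related_terms(network, term, concept):
    """Return all terms sharing `concept` with `term`.

    If nobody has linked `term` to `concept`, the network cannot help
    and an empty set comes back.
    """
    if concept not in network.get(term, set()):
        return set()
    return {t for t, concepts in network.items() if concept in concepts}

# Before an editor adds the new coinage, the search finds nothing.
before = related_terms(network, "feldarcarb", "finance")

# Someone has to do this by hand for every new term.
network["feldarcarb"] = {"finance", "initial offerings"}
after = related_terms(network, "feldarcarb", "finance")
```

The second lookup only succeeds because someone edited the network
between the two calls, which is exactly the editorial burden at issue.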

The "whom" above is usually an expert in the chosen field rather
than a general purpose editor or information specialist. This
is another problem with semantic networks, as the needed
experts would rather be doing something interesting, something
other than creating what amounts to a very detailed dictionary
of the specific terminology within a particular field.

This is why semantic-network-based indexing is popular in
highly technical fields such as medicine, where the changes to
the dictionary are not especially frequent. "Femur" has been
around a very long time indeed.

This raises another issue I have with semantic-network indexing,
that of the difference between what I call the Expanders and the
Contractors.

Expanders are your typical searchers: they broaden the
initial search, casting the net ever wider to catch more
information rather than less. This kind of behaviour is typical
of the general public and allows one to capture results that
otherwise might have eluded the searcher, usually because of a
lack of forethought in constructing the search. Most of us do
not spend much time building an effective search schema before
we jump into the index, and thus depend on the serendipitous
nature of this kind of search to show us something for which we
were originally unprepared and might not originally have
expected to find without help.

Contractors are those, usually in a highly technical field, who
know there is one perfect match for the search, if only they
could specify their query properly. Semantic networks are
perfect for this kind of searcher, providing much of the
"inferred" value of the search -- where *I* know which stock I
mean when I start out and only wish to see those results that
have something very specifically to do with finance.
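
The two behaviours can be sketched side by side. This is only an
illustration under assumed data: the three sample documents, the
synonyms table and the sense_words table are hypothetical stand-ins for
whatever links a real semantic network would hold.

```python
# Hypothetical sample corpus.
docs = {
    1: "stock prices fell as the market closed",
    2: "simmer the chicken stock for an hour",
    3: "share values rose after the initial offering",
}

# Hypothetical editor-maintained links.
synonyms = {"stock": {"share"}}
sense_words = {"finance": {"market", "prices", "share", "offering", "values"}}

def words(text):
    return set(text.split())

def expand(term):
    """Expander: widen the net to the term OR any linked synonym."""
    terms = {term} | synonyms.get(term, set())
    return sorted(d for d, text in docs.items() if terms & words(text))

def contract(term, sense):
    """Contractor: keep only documents that contain the term AND
    at least one word tied to the intended sense."""
    return sorted(
        d for d, text in docs.items()
        if term in words(text) and sense_words[sense] & words(text)
    )

expanded = expand("stock")                # every document matches
contracted = contract("stock", "finance") # only the finance document
```

The Expander gets the soup recipe along with the market report; the
Contractor gets only the document about finance.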

All in all, I prefer approaches like PLS' where the document is
subjected to a statistical analysis, one where each word is
indexed as well as its relationship with all the other words in
the document. Along with the standard arsenal of boolean,
fielded, and adjacency search features, I feel this represents a
very reasonable "middle-ground" for the typical searcher.
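
As a rough sketch of what indexing "each word as well as its
relationship with all the other words" could look like, the following
builds an inverted index plus simple co-occurrence counts. It is not a
description of PLS's actual method, which is not given here; the
build_index and associated helpers and the sample documents are
assumptions for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_index(docs):
    """Index every word and count how often word pairs share a document."""
    inverted = defaultdict(set)  # word -> ids of documents containing it
    cooccur = Counter()          # (word_a, word_b) -> documents containing both
    for doc_id, text in docs.items():
        vocab = sorted(set(text.lower().split()))
        for word in vocab:
            inverted[word].add(doc_id)
        # Sorted vocab guarantees each pair is stored once, as (a, b) with a < b.
        for a, b in combinations(vocab, 2):
            cooccur[(a, b)] += 1
    return inverted, cooccur

def associated(cooccur, word, top=3):
    """Return the words that most often appear alongside `word`."""
    scores = Counter()
    for (a, b), n in cooccur.items():
        if a == word:
            scores[b] += n
        elif b == word:
            scores[a] += n
    return [w for w, _ in scores.most_common(top)]

# Hypothetical sample corpus.
sample = {
    1: "stock market prices",
    2: "stock market crash",
    3: "chicken stock",
}
inverted, cooccur = build_index(sample)
closest = associated(cooccur, "stock", top=1)
```

No editor had to tell the index that "stock" and "market" belong
together; the association falls out of the documents themselves.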

Of course, most of this is more than a little academic since
the *vast* majority of all searches initiated online are for
single keywords rather than more complexly constructed,
multi-termed queries.

Go figure. We have a wealth of brushes, paints and inks within
easy reach and yet most of us would rather photocopy pretty
pictures out of coffee table books. ;)

</rr>