Re: META tag standards, search accuracy

Nick Arnett (narnett@Verity.COM)
Mon, 14 Oct 1996 10:14:52 -0700


At 7:56 AM 10/14/96, Benjamin Franz wrote:

>Outside of academia,
>no scheme that requires referencing *external* schema for META data is
>going to work and achieve general usage.

I don't quite understand why you say this. There is tremendous momentum in
a number of markets for this. For example, the financial, semiconductor,
chemical manufacturing and many other industries have invested a great deal
of time and money in developing specialized ontologies that will add tremendous
value to their data. You could ask them to reduce their meta-data to
something very simple, but they won't listen to you. This sort of issue
comes up in sports, grocery shopping and other consumer activities that are
coming to the Web, not just vertical industries.

Even for general search services the architecture will allow a simple
schema. That's one of its advantages; it can be simple or complex as
needed. The average Josephine page author doesn't need to know or care
that the definitions are external; the biologist, market analyst and
semiconductor engineer can take advantage of the fact that they are.

>The Dublin core is unfortunately yet another example of where the
>people *envisioning* the system are forgetting yet again that the
>people *using* the system are (A) Not technical people (B) Are not even
>aware of there being a formal standard to do something.

Sorry, but I think that you are forgetting that some of the people are
technical, using specialized vocabularies and information. The Web isn't
just for popular, general-purpose information (whatever that is).

>Worse, it doesn't even address keywords
>- which are at the heart of every search engine query!

This shows a basic misunderstanding of full-text search. Keyword search is
not what full-text search engines do; they look at all of the words and
increasingly take into account parameters such as density, proximity, case,
statistics, thesauri, etc.
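
To make the distinction concrete, here is a rough sketch in Python of how a
ranking might weigh density and proximity instead of just matching keywords.
The function names and the scoring formula are my own illustration, not how
any particular engine works; case, statistics and thesauri are left out.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def density(tokens, term):
    """Fraction of the document's tokens that equal the term."""
    if not tokens:
        return 0.0
    return tokens.count(term) / len(tokens)

def min_distance(tokens, a, b):
    """Smallest gap, in tokens, between an occurrence of a and one of b.

    Returns None if either term is absent from the document.
    """
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)

def score(text, query_terms):
    """Illustrative score: summed term density plus a proximity bonus."""
    tokens = tokenize(text)
    total = sum(density(tokens, t) for t in query_terms)
    # Reward documents in which consecutive query terms appear close together.
    for a, b in zip(query_terms, query_terms[1:]):
        d = min_distance(tokens, a, b)
        if d is not None and d > 0:
            total += 1.0 / d
    return total
```

Two documents can contain exactly the same query words and still score
differently, which is the point: the engine looks at all of the words and
where they sit, not at a hand-picked keyword list.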

>Lastly, in conflict with
>existing widespread usage, it renamed 'description' as 'subject'. That
>is just begging to be sandbagged.

This shows a basic misunderstanding of the problem! The same field names
mean quite different things in different systems, a problem that librarians
have known about for quite some time and that we don't have the privilege of
ignoring. Take the word "title," for example. In one context, it might
mean the title of the document. In another, it might mean the author's job
title. If you're suggesting that it can *only* mean the former, then we're
trying to reduce the meaning of the word in the English language, which is
widely used by non-technical people who don't care a whit about Web geeks'
notions of simplifying things. ;-)
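
As a rough illustration of why the schema matters, here is a small Python
sketch in which the same field name resolves to different meanings depending
on which schema it is qualified by. The schema names, the "schema.field"
naming convention and the definitions are all hypothetical, made up for this
example rather than taken from any standard.

```python
from html.parser import HTMLParser

# Hypothetical schema registry: each schema defines its own field names.
SCHEMAS = {
    "bibliographic": {"title": "name of the document"},
    "personnel": {"title": "job title of the author"},
}

class MetaCollector(HTMLParser):
    """Collects name/content pairs from META tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.fields[attributes["name"]] = attributes["content"]

def describe(html):
    """Map each META name to (content, meaning of the field in its schema)."""
    collector = MetaCollector()
    collector.feed(html)
    described = {}
    for name, content in collector.fields.items():
        # Assumed convention: "schema.field", e.g. "personnel.title".
        schema, _, field = name.partition(".")
        meaning = SCHEMAS.get(schema, {}).get(field, "unknown")
        described[name] = (content, meaning)
    return described
```

The page author only types a name and a value; the definition of what
"title" means lives with the schema, where the people who care about it can
maintain it.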

Simple forms of communication don't let you say much...

Nick