Re: META tag standards, search accuracy

Eric Miller (emiller@oclc.org)
Mon, 14 Oct 1996 13:10:26 -0400


Benjamin Franz writes:
> The Dublin core is unfortunately yet another example of where the
> people *envisioning* the system are forgetting yet again that the
> people *using* the system are (A) Not technical people (B) Are not even
> aware of there being a formal standard to do something.

Excellent points... and, in point of fact, they are *indeed* recognized.
However, we keep coming back to the question of options... It is
increasingly difficult to find things effectively on the web. OK: now
what? There seem to be (at least) two underlying hypotheses here...

1) That some cataloging by untrained professionals (i.e. most people
on the web) is (better||worse) than no cataloging
2) That good indexing and bad cataloging is (better||worse) than good
cataloging

I would assert that we do not know the answer to either of these...
One of the things that the Dublin Core (or any simple resource
description model) provides is a place to test these hypotheses.

It's clear that library cataloging is entirely too expensive (and
time-consuming) for describing all of the resources on the net... no one
(I hope) is going to argue this. It's also clear that full-text indexing
will not work (someone may argue against this...). Resource description
therefore might be viewed as an axis from

|---------|-----------|----------|
Full-text   Dublin-core   Rich Descriptive Standards
Indexing                  (e.g. MARC, FGDC, TEI, etc.)

The Dublin Core is an attempt at defining a "middle ground" between
these two extremes by providing "commonly understood" descriptive
elements that may be used to describe something as simple as:

<META name = "DC.subject" CONTENT = "foo">
<META name = "DC.author" CONTENT = "bar">

to a more complex description or an external reference to a DTD.
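
For the simple case, harvesting such elements takes very little
machinery. What follows is only a minimal Python sketch, not anything
defined by the Dublin Core: it assumes the "DC." name prefix used in the
example above, and the names DublinCoreParser and extract_dublin_core
are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <META NAME="DC.*" CONTENT="..."> pairs from an HTML page.

    Hypothetical helper: the "DC." prefix convention is an assumption
    taken from the two-line example, not from a formal specification.
    """

    def __init__(self):
        super().__init__()
        self.elements = {}  # element name -> list of values

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()  # strip the "DC." prefix
            self.elements.setdefault(element, []).append(content)


def extract_dublin_core(html):
    """Return a dict of Dublin Core style elements found in the markup."""
    parser = DublinCoreParser()
    parser.feed(html)
    parser.close()
    return parser.elements


page = """
<HTML><HEAD>
<META name = "DC.subject" CONTENT = "foo">
<META name = "DC.author" CONTENT = "bar">
<META NAME="keywords" CONTENT="not dublin core">
</HEAD></HTML>
"""

record = extract_dublin_core(page)
print(record)
```

Anything without the prefix (the "keywords" line here) is simply
ignored, and repeated elements accumulate in a list, so a page may carry
more than one subject.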

Most people are not technical, most people are not catalogers, most
people are not aware of "metadata" issues... but most people know what
they want and know when they can't find what they need.

Knowing this... we have to start somewhere, if for no other reason than
to have a basis for comparison when deciding what to do. I would *love*
the ability for all resources to be self-describing. I would *love* to
throw some of the cool harvesting/database/data-mining technologies at
this (and we are). But we also need to know how these various strategies
compare with one another with regard to effectiveness...

> mis-use by the unknowledgable: The Dublin core is not. The *first* thing
> that is going to disappear are the <LINK>s to the schema, since people
> aren't going to understand them. Worse, it doesn't even address keywords
> - which are at the heart of every search engine query! This is a

LINKs are LINKs: they're used for illustrative purposes; they are *not*
Dublin Core.

> ridiculous omission for a META data standard. Lastly, in conflict with
> existing widespread usage, it renamed 'description' as 'subject'. That
> is just begging to be sandbagged.

This is the difference between resource description and resource
discovery... Semantics are difficult to agree upon within a single
discipline (let alone a global environment). Continuing examples are
needed...

eric j. miller <URL:http://purl.oclc.org/net/eric>
emiller@oclc.org Office of Research, OCLC, Inc.
emiller@cis.ohio-state.edu Dept. of Geography, The Ohio State University