Re: Description or Abstract?

Martijn Koster (m.koster@webcrawler.com)
Fri, 5 Jul 1996 08:23:12 -0700


At 10:56 AM 7/5/96, Davide Musella wrote:

>I had already read that doc. But I've not found any real proposal
>to standardize the description of an html file. they'll do (we hope).

I think we agreed on how to do it if we wanted to do it, but didn't
commit to doing it. Or something :-)

>I'm not sure it was more important to define
>the robot metadata than the description metadata;

That may be, but if you put 5 indexing robot authors in a room,
and give them five minutes per topic, then you do what you need,
and what you can. NAME=ROBOTS is easier for us to define because it
is our domain... Description meta info is a much more hairy beast.

>the proposal about
>how to define the scheme used is not clear and it bring more confusion.

Not sure which scheme you're talking about now, but it's only a matter
of writing it up nicely, which I hope to do (or see).

>The description metadata must be used by authors and they need
>something of not complex.

What we said about NAME=DESCRIPTION is that it is text, no internal
structure or dictionary. Can't get much simpler than that.
What we said about NAME=KEYWORDS is that it is a list of comma-separated
phrases, which is also something everybody understands.
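
To make that concrete, here is a rough Python sketch of how a robot
might read those two tags: DESCRIPTION is taken as plain text, and
KEYWORDS is split on commas. The names MetaExtractor and read_meta and
the sample page are made up for illustration; the whitespace trimming
is my assumption, not something we agreed on.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta(html):
    """Return (description, keywords) found in the page's META tags."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    # DESCRIPTION: plain text, no internal structure, so use it as-is.
    description = parser.meta.get("description", "")
    # KEYWORDS: comma-separated phrases; trimming spaces is an assumption.
    raw = parser.meta.get("keywords", "")
    keywords = [k.strip() for k in raw.split(",") if k.strip()]
    return description, keywords


page = """<HTML><HEAD>
<META NAME="DESCRIPTION" CONTENT="Notes on indexing robots and META tags.">
<META NAME="KEYWORDS" CONTENT="robots, meta tags, web indexing">
</HEAD><BODY></BODY></HTML>"""

description, keywords = read_meta(page)
print(description)
print(keywords)
```

Note that a phrase like "meta tags" stays one keyword; only the commas
separate entries.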

What we specifically said we didn't want to see is confusing internal
structure in CONTENT, or unexpected semantics with common formatting
practice.

The NAME=ROBOTS stuff requires some background on how robots work,
but the normal incantation, NOINDEX, is fairly self-explanatory.
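
As a small sketch of the robot's side of that, assuming the CONTENT
value is a comma-separated, case-insensitive list of directives (that
list format and the function name allows_indexing are my assumptions
here, for illustration only):

```python
def allows_indexing(robots_content):
    """Return False if the NAME=ROBOTS CONTENT value asks not to be indexed."""
    # Assumed format: comma-separated directives, compared case-insensitively.
    directives = {d.strip().upper() for d in robots_content.split(",")}
    return "NOINDEX" not in directives


print(allows_indexing("NOINDEX"))
print(allows_indexing(""))
```

A page with no ROBOTS tag, or an empty CONTENT, is treated as indexable
in this sketch.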

>many catalogation-schemes are for librarians not for normal author and they
>ask for a method (maybe derived from dublin?) easy to use,

Dublin core easy to use? Have you looked at their latest version?
It's quite complex because they go into all sorts of syntax and
semantic relationships between different bits. I think they overshot
their objective on the "simple" aspect. In fact, from hearing Stu
a few times it appears that the move is away from the "single, simple"
scheme that doesn't satisfy everyone, to an extensible framework where
experts-in-the-field can slot in their own standards. I'd have to read
the Warwick papers to be sure. For web-wide indexing services this is
probably of little value as the users won't be experts.

>without a separation of the metainfo from the main file.

I'm not convinced this is a good thing; I believe it's impossible to
satisfy both an "easy to use" and a "powerful" (language, charset,
field-value pairs, hierarchical relationships) objective in an HTML
tag you expect people to type. I also think that marrying the meta
information to the content format (HTML) may slow both down.

>who began this thread was looking for a simple scheme semantically
>understood by any robot, so to insert metadata in his files, and how many
>times have you met someone with the same problems? too many!

I wish computer serial ports all used DB25 plugs, so that I could use
a standard cable with any computer (he says as he plugs in another
5-segment cable :-). But while everybody does things differently,
and has different philosophies, it's just hard to standardise.

-- Martijn

Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html