HEAD request [was Re: Server name in /robots.txt]

Davide Musella (davide@jargo.itim.mi.cnr.it)
Tue, 23 Jan 1996 19:26:19 -0100


> The server is supposed to parse the document, and slam the value into
> an HTTP header. This is of course a waste of server CPU and bandwidth
> for the majority of cases, and opens a whole can of worms with the
> semantics of HTTP header namespace collisions.
That isn't the only way to handle the META info. The WN server does it using
a table, so it parses the document only once a day.
OK, it isn't the best way, but there are many ways to solve the problem.

> This makes far more sense -- let user-agents decide what they want to do
> with the data;

Yes, but if they can work with only the data carried in an HTTP header,
why request the whole document?
You can save 90% of the retrieval time, and the load on the net will be a bit lower.
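
As a rough sketch of what I mean from the robot's side (Python, standard library only): the client sends a HEAD request and reads the indexing info from the response headers, without fetching the body. DemoHandler, fetch_indexing_headers and demo are names invented for this illustration, and the tiny local server only simulates a server that already maps META HTTP-EQUIV values to headers.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server that already exposes META HTTP-EQUIV values as headers."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Keywords", "Italy Products, Italy Tourism")
        self.send_header("Author", "Pennac, Rossi")
        self.end_headers()  # a HEAD response carries no body

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_indexing_headers(host, port, path,
                           wanted=("keywords", "author", "abstract")):
    """Send a HEAD request and return the indexing headers that are present."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # nothing to read for HEAD; completes the exchange
        found = {}
        for name in wanted:
            value = response.getheader(name)  # header lookup is case-insensitive
            if value is not None:
                found[name] = value
        return found
    finally:
        conn.close()


def demo():
    """Run the local demo server on a free port and query it once."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        return fetch_indexing_headers("127.0.0.1", server.server_address[1],
                                      "/doc.html")
    finally:
        server.shutdown()
        server.server_close()
```

The robot never parses the HTML here; it depends entirely on the server having done the mapping.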

> So the idea is that you can do both HTTP-EQUIV=foo and NAME=bar in the
> same META tag. The last draft I saw on the subject had HTTP-EQUIV
> as the main thing, with NAME being optional. I think it makes far
> more sense to have NAME, and abolish HTTP-EQUIV, or at least make
> it a secondary choice.
> In fact it'd be good if robots started to promote this. I'd add it
> to WebCrawler if I wasn't buried in other work...
But if WebCrawler can index a doc by the content of the META NAME tag,
it can also use the META HTTP-EQUIV tag. Then it can use a HEAD request to
get the indexing info without parsing the document, and be sure to have
the best indexing info about that doc, 'cause the author has indexed it for you.
I've made some alterations to that draft, to make it clearer and more exhaustive.

You can find the draft here. Suggestions are welcome.

Davide

-----------
Davide Musella
davide@jargo.itim.mi.cnr.it

INTERNET DRAFT Davide Musella
draft-musella-html-metatag-02.txt National Research Council

The META Tag of HTML

[...]

1. Introduction

At present the syntax of the META HTTP-EQUIV tag is not strict, so
different keywords can be used to define the same thing.
Elements like these:

<META HTTP-EQUIV = "author" CONTENT = "Pennac, Rossi">
or:
<META HTTP-EQUIV = "writer" CONTENT = "Pennac, Rossi">

could represent the same concept with two different syntaxes.
The aim of this Draft is to define which words to use to
describe the contents of an HTML document.
There are also some simple rules for implementing Boolean logic (AND or
OR) in the CONTENT field.

2. The META Tag

The META element is used within the HEAD element to embed document
meta-information not defined by other HTML elements. Such information
can be extracted by servers/clients for use in identifying, indexing
and cataloging specialized document meta-information.

Although it is generally preferable to use named elements that have
well defined semantics for each type of meta-information, such as
title, this element is provided for situations where strict SGML
parsing is necessary and the local DTD is not extensible.

In addition, HTTP servers can read the contents of the document head
to generate response headers corresponding to any elements defining
a value for the attribute HTTP-EQUIV. This provides document authors
with a mechanism (not necessarily the preferred one) for identifying
information that should be included in the response headers of an
HTTP request.
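
The mapping from document head to response headers can be sketched as follows. This is only an illustration in Python using the standard html.parser module, not a description of how any particular server does it. The names MetaHeaderCollector and headers_from_document are invented for the example, and the sketch does not check that the META elements actually sit inside HEAD.

```python
from html.parser import HTMLParser


class MetaHeaderCollector(HTMLParser):
    """Collect (name, value) pairs from META elements that carry HTTP-EQUIV."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("http-equiv")
        value = attributes.get("content")
        # META elements that use NAME instead of HTTP-EQUIV are skipped:
        # they are not meant to appear in the response headers.
        if name and value is not None:
            self.headers.append((name, value))


def headers_from_document(html_text):
    """Return the header pairs a server could emit for this document."""
    collector = MetaHeaderCollector()
    collector.feed(html_text)
    collector.close()
    return collector.headers
```

A server could run this once per document and cache the result, rather than parsing on every request.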

The META element has three attributes:

- HTTP-EQUIV
- NAME
- CONTENT

The HTTP-EQUIV and NAME attributes are mutually exclusive.

3. HTTP-EQUIV.

This attribute binds the element to an HTTP response header. If the
semantics of the HTTP response header named by this attribute are
known, then the contents can be processed based on a well defined
syntactic mapping, whether or not the DTD includes anything about it.
HTTP header names are case insensitive. If HTTP-EQUIV is absent, the NAME
attribute should be used to identify the meta-information, and it
should not be used within an HTTP response header.
Any text string may be used, but the following properties must be
defined with these words:

keywords: to indicate the keywords of the document
author: to indicate the author of the document
timestamp: to indicate when the document was authored
(HTTP-date format)
expire: to indicate the expiry date of the document
(HTTP-date format)
language: to indicate the language of the document
(using ISO 3166 code or ISO 639 code)
abstract: to indicate the abstract of the document
organization: to indicate the organization of the author
revision: to indicate the revision number of the document
(format: 00, 01, 02, or 000, 001, ...)

An HTTP server must process these tags for an HTTP HEAD request.
Do not name an HTTP-EQUIV attribute the same as a response header
that should typically only be generated by the HTTP server. Some
inappropriate names are "Server", "Date", and "Last-Modified".
Whether a name is inappropriate depends on the particular server
implementation. It is recommended that servers ignore any META
elements that specify HTTP equivalents (case insensitively) to their
own reserved response headers.
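
A minimal sketch of this recommendation, in Python: the server drops any META-derived header whose name matches one of its reserved response headers, compared case insensitively. The reserved set below contains only the three names given as examples above; the real set depends on the server implementation. The function name drop_reserved is invented for the illustration.

```python
# Names taken from the examples in the text; a real server would
# substitute its own list of reserved response headers.
RESERVED_HEADERS = frozenset({"server", "date", "last-modified"})


def drop_reserved(meta_headers, reserved=RESERVED_HEADERS):
    """Keep only the (name, value) pairs whose name is not reserved."""
    kept = []
    for name, value in meta_headers:
        # Header names are case insensitive, so compare in lower case.
        if name.strip().lower() not in reserved:
            kept.append((name, value))
    return kept
```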

4. NAME.

This attribute can be used to define properties such as "number
of pages" or "preferred browser", or any information an author wants
to insert in the document. The keywords listed in the previous section
for HTTP-EQUIV are also valid in the NAME context.
An example:

<META NAME= "Maybe Published By" CONTENT = "McDraw Bill">
or
<META NAME= "keywords" CONTENT = "manual, scouting">

Do not use the META element to define information that should be
associated with an existing HTML element.


5. CONTENT

Used to supply a value for a named property.
It can contain more than one piece of information; the Boolean
operators AND and OR can be used to give the field a Boolean
definition.
The AND operator is represented by the SPACE (ASCII[32]) and the
OR operator by the COMMA (ASCII[44]).
The AND operator is processed before the OR operator. So a string
like this: "Red ball, White pen" means: "(Red AND ball) OR (White AND pen)".
Example:

<META HTTP-EQUIV= "Keywords" CONTENT= "Italy Products, Italy Tourism">

The spaces between a comma and a word or vice versa are ignored.
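
The rules above can be sketched in Python as follows. A CONTENT string becomes a list of OR alternatives, each of which is a list of terms joined by AND. The function names parse_content and matches are invented for the illustration, and the case-insensitive comparison in matches is an assumption, since the draft does not say how terms are compared.

```python
def parse_content(content):
    """Split a CONTENT value into OR alternatives, each a list of AND terms."""
    alternatives = []
    for part in content.split(","):   # COMMA separates OR alternatives
        terms = part.split()          # SPACE separates AND terms; extra spaces ignored
        if terms:
            alternatives.append(terms)
    return alternatives


def matches(content, words):
    """True if any alternative has all of its terms present in words.

    Comparison is case insensitive (an assumption of this sketch).
    """
    available = {word.lower() for word in words}
    return any(
        all(term.lower() in available for term in terms)
        for terms in parse_content(content)
    )
```

For example, parse_content("Red ball, White pen") returns [["Red", "ball"], ["White", "pen"]], that is "(Red AND ball) OR (White AND pen)".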

6. Cataloguing an HTML document

These keywords were conceived specifically for cataloguing HTML documents.
They allow software agents to index your document as well as possible.
For preliminary indexing, it is important to use at least the
HTTP-EQUIV meta-tag "keywords".