Re: Possible robots.txt addition

Francois Rouaix (Francois.Rouaix@inria.fr)
Wed, 06 Nov 1996 16:37:34 +0100


Ian Graham noticed:

> A common problem (at least within our organization) is the expiry and/or
> change of domain names: internal departmental or divisional reorganizations
> lead to changes in domain names (in our case, *.utirc.utoronto.ca to
> *.hprc.utoronto.ca) and the eventual elimination of 'obsolete' domains,
> after some period of coexistence.
>
> Unfortunately, it is currently impossible to tell robots which
> domain name should be used for a particular site.

and suggested an addition to robots.txt.

There is a recent addition to the HTTP protocol, proposed for HTTP/1.1
but trivial for any HTTP client to implement: the Host header, which
carries the host name part of the URL, as known by the client.
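
To make this concrete, here is a minimal sketch in Python of how a robot
might build a request that carries the Host header. The function name
build_request, the User-Agent string, and the www. host names used with
it are illustrative assumptions, not part of any existing robot; the
sketch also assumes it is given an absolute http:// URL.

```python
from urllib.parse import urlsplit


def build_request(url: str) -> str:
    """Build a minimal GET request that includes a Host header.

    Hypothetical helper: assumes `url` is an absolute http:// URL.
    """
    parts = urlsplit(url)

    # The request line only carries the path (and query), not the host.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # The Host header carries the host name as the client knows it,
    # plus the port when the URL names one explicitly.
    host = parts.hostname
    if parts.port is not None:
        host += f":{parts.port}"

    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "User-Agent: example-robot/0.1",  # made-up robot name
        "",
        "",
    ]
    return "\r\n".join(lines)
```

Calling build_request("http://www.utirc.utoronto.ca/robots.txt") yields a
request whose first line is still just "GET /robots.txt HTTP/1.0", with
the host name travelling separately in the Host line.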

Since a normal HTTP request includes only the path part of the URL, it
used to be impossible to host different servers on the same machine and
the same port. With the Host header this becomes possible.
One application of this is to *redirect* (301) requests whose Host header
refers to the old domain. This way, a canonical URL can be given for each
actual location.
Of course, this requires robot writers to add the Host: header to their
requests (trivial), and the webmaster to implement this redirection (I admit
I haven't checked whether that's easy).
It may be more difficult for the webmaster, but it has the advantage of
not being limited to robots: any client will notice the change.
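
As a rough sketch of the server side, the decision could look like the
Python function below. It is only an illustration under stated
assumptions: the name canonical_redirect is hypothetical, the two domain
suffixes are taken from Ian's example, plain http:// is assumed, and how
a real server would hook this in depends entirely on the server software.

```python
OLD_SUFFIX = ".utirc.utoronto.ca"  # obsolete domain, from Ian's example
NEW_SUFFIX = ".hprc.utoronto.ca"   # current domain, from Ian's example


def canonical_redirect(host_header, path):
    """Return (301, location) if the Host header names the old domain.

    Hypothetical helper: returns None when no redirect is needed,
    including when the client sent no Host header at all.
    """
    if not host_header:
        return None

    # Split off an optional ":port" and compare host names case-insensitively.
    host, sep, port = host_header.strip().lower().partition(":")
    if not host.endswith(OLD_SUFFIX):
        return None

    # Swap the old suffix for the new one, keeping the machine name and port.
    new_host = host[: -len(OLD_SUFFIX)] + NEW_SUFFIX
    return 301, f"http://{new_host}{sep}{port}{path}"
```

For example, a request for /index.html with "Host: www.utirc.utoronto.ca"
would be answered with a 301 pointing at
http://www.hprc.utoronto.ca/index.html, while a request that already
names the new domain gets no redirect.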

--f