This seems like a really useful idea. It certainly isn't robot exclusion, but
robots.txt seems as appropriate a place as any for it. (It could go in another
file as well.) I'd like to see something like this, though:
use-name: myhost.mydomain.edu
This would let a robot know which hostname to use in URLs for a site
when that host goes by many different names.
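To make the idea concrete, here is a minimal sketch of how a robot might pull such a line out of a fetched robots.txt. Note that "use-name:" is only my proposed field, not part of any existing specification, so the parsing shown is purely illustrative:

```python
def preferred_hostname(robots_txt: str):
    """Return the host named by a proposed use-name: line, or None.

    The use-name: field is hypothetical; this just scans each line,
    drops trailing comments, and matches the field case-insensitively.
    """
    for line in robots_txt.splitlines():
        line = line.split('#', 1)[0].strip()
        if line.lower().startswith('use-name:'):
            return line.split(':', 1)[1].strip()
    return None
```

A robot that found a use-name: line could then rewrite the URLs it records for that site to use the preferred host.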
Anyone?
Issac
>
> A robot could then check these domain names against the one it originally
> used to access the site, make sure they all map to the same IP
> address ( ;-) ), and then switch to the preferred name. In principle, it
> could also iterate through existing names, and update those that are
> obsolete.
>
> Admittedly this is a bit outside the bounds of robot restriction rules,
> and perhaps more properly belongs as part of a server meta-information
> document or the HTTP protocol. But, neither of these two solutions
> exist, and the required modifications to robots.txt are both simple
> and quickly beneficial to site maintainers and robot-based indexers.
>
> --
> Ian Graham ......................................ian.graham@utoronto.ca
> Information Commons Tel: 416-978-4548
> University of Toronto Fax: 416-978-0440
>
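The IP-check step Ian describes could be sketched roughly as follows. This only compares the first address each name resolves to; a real robot would have to cope with multi-homed hosts, round-robin DNS, and lookup failures:

```python
import socket

def same_host(name_a: str, name_b: str) -> bool:
    """Crude check: do two hostnames resolve to the same IP address?

    Compares only the first A record of each name; returns False if
    either name fails to resolve.
    """
    try:
        return socket.gethostbyname(name_a) == socket.gethostbyname(name_b)
    except socket.gaierror:
        return False
```

If the preferred name and the originally used name pass this check, the robot can safely switch to the preferred name in its index.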