Possible robots.txt addition

Ian Graham (ianweb@smaug.java.utoronto.ca)
Mon, 4 Nov 1996 14:35:49 -0500 (EST)


Not knowing a more appropriate forum, I thought I would bring up this
issue here.

A common problem (at least within our organization) is the expiry and/or
change of domain names: internal departmental or divisional reorganizations
lead to changes in domain names (in our case, *.utirc.utoronto.ca to
*.hprc.utoronto.ca) and the eventual elimination of 'obsolete' domains,
after some period of coexistence.

Unfortunately, there is currently no way to tell robots which domain
name should be used for a particular site. Consequently, a robot can
continue to index a site under an obsolete name until the domain actually
disappears and all the references become invalid.

I therefore propose adding two new fields to robots.txt, to indicate both
preferred and obsolete domain names for a given server. For example:

use-domain: preferred.domain.name
obsolete-domain: obsolete.domain.name
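
As a rough sketch of how a robot might read these two fields, assuming
they follow the same "field: value" line format as the existing
User-agent and Disallow lines, that only the first use-domain counts,
and that several obsolete-domain lines may appear (none of this is
specified anywhere; the function name is hypothetical):

```python
def parse_domain_fields(robots_txt):
    """Collect the proposed use-domain / obsolete-domain values.

    Returns (preferred, obsolete): preferred is a host name or None,
    obsolete is a list of host names. Other fields are ignored.
    """
    preferred = None
    obsolete = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip().lower()  # host names are case-insensitive
        if not value:
            continue
        if field == "use-domain" and preferred is None:
            # Assumption: the first use-domain line wins.
            preferred = value
        elif field == "obsolete-domain":
            obsolete.append(value)
    return preferred, obsolete
```

A robots.txt without either field simply yields (None, []), so robots
that understand the fields would behave as before on unmodified sites.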

A robot could then check these domain names against the one it originally
used to access the site, make sure they all map to the same IP
address ( ;-) ), and then switch to the preferred name. In principle, it
could also iterate through existing names, and update those that are
obsolete.
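
The check described above could be sketched as follows. This is only
an illustration under some assumptions of my own: the robot switches
only when the name it used is listed as obsolete, "same IP address" is
taken to mean the two names share at least one address, and the
function names are hypothetical.

```python
import socket


def resolve_addresses(host):
    """Return the set of IP addresses a host name maps to (empty on failure)."""
    try:
        infos = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def choose_host(accessed, preferred, obsolete, resolver=resolve_addresses):
    """Pick the name a robot should index a site under.

    accessed  -- the name the robot originally used
    preferred -- value of use-domain, or None if absent
    obsolete  -- list of obsolete-domain values
    """
    accessed = accessed.lower()
    if preferred is None or accessed == preferred:
        return accessed
    if accessed not in obsolete:
        # Assumption: only switch away from names declared obsolete.
        return accessed
    # Guard against a robots.txt that points at somebody else's server.
    if resolver(accessed) & resolver(preferred):
        return preferred
    return accessed


def update_hosts(hosts, preferred, obsolete, resolver=resolve_addresses):
    """Apply the same rule to every host name already in an index."""
    return [choose_host(h, preferred, obsolete, resolver) for h in hosts]
```

The resolver is passed in as a parameter so that the rule can be tried
against a fixed table of names and addresses without touching the DNS.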

Admittedly, this is a bit outside the bounds of robot restriction rules,
and perhaps more properly belongs in a server meta-information document
or in the HTTP protocol. However, neither of these solutions exists yet,
and the required modifications to robots.txt are simple and would quickly
benefit both site maintainers and robot-based indexers.

--
Ian Graham ......................................ian.graham@utoronto.ca
Information Commons                                   Tel: 416-978-4548
University of Toronto                                 Fax: 416-978-0440