A common problem (at least within our organization) is the expiry and/or
change of domain names: internal departmental or divisional reorganizations
lead to new domain names (in our case, *.utirc.utoronto.ca to
*.hprc.utoronto.ca) and, after some period of coexistence, the eventual
elimination of the 'obsolete' domains.
Unfortunately, there is currently no way to tell robots which domain
name should be used for a particular site. Consequently, a robot can
continue to index under an obsolete name until the domain actually
disappears and all of its references become invalid.
I therefore propose adding two new fields to robots.txt, to indicate both
preferred and obsolete domain names for a given server. For example:
use-domain: preferred.domain.name
obsolete-domain: obsolete.domain.name
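
In context, these fields would simply sit alongside the usual records.
A hypothetical robots.txt for our own transition (the Disallow rule is
just for illustration) might read:

use-domain: www.hprc.utoronto.ca
obsolete-domain: www.utirc.utoronto.ca

User-agent: *
Disallow: /private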
A robot could then check these domain names against the one it originally
used to access the site, make sure they all map to the same IP
address ( ;-) ), and then switch to the preferred name. In principle, it
could also iterate through the names already in its index and update
those that are obsolete.
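
To make that checking step concrete, here is a rough sketch in Python of
how a robot might handle these fields. Since the fields are only a
proposal, everything here is hypothetical; the same-IP sanity check is
the one described above:

import socket
import urllib.request

def preferred_host(current_host):
    # Fetch robots.txt from the host the robot is currently using.
    url = "http://%s/robots.txt" % current_host
    with urllib.request.urlopen(url) as resp:
        lines = resp.read().decode("latin-1").splitlines()

    # Pick out the proposed use-domain / obsolete-domain fields,
    # ignoring the standard records (User-agent, Disallow, ...).
    preferred, obsolete = None, []
    for line in lines:
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "use-domain":
            preferred = value
        elif field == "obsolete-domain":
            obsolete.append(value)

    # Switch to the preferred name only if it maps to the same IP
    # address as the name originally used -- the ;-) sanity check.
    if preferred and preferred != current_host:
        try:
            same_ip = (socket.gethostbyname(preferred)
                       == socket.gethostbyname(current_host))
        except socket.error:
            same_ip = False
        if same_ip:
            return preferred
    return current_host

An indexer could run the same check over the hostnames already in its
database, replacing any name that appears in an obsolete-domain line
with the corresponding use-domain value.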
Admittedly this is a bit outside the bounds of robot restriction rules,
and perhaps more properly belongs in a server meta-information
document or in the HTTP protocol itself. But neither of those solutions
exists yet, while the required modifications to robots.txt are simple
and would quickly benefit both site maintainers and robot-based indexers.
--
Ian Graham ......................................ian.graham@utoronto.ca
Information Commons                                  Tel: 416-978-4548
University of Toronto                                Fax: 416-978-0440