Finding the canonical name for a server

Jaakko Hyvatti (Jaakko.Hyvatti@Elma.FI)
Mon, 10 Jun 1996 14:11:15 +0300 (EET DST)


There has been some earlier discussion about redundancy on web
servers caused by symbolic links or other replication of files
under different URLs.

There is another, minor redundancy issue: a server may have
multiple aliases in DNS or even multiple IP addresses, or may be
mirrored at multiple sites. I believe that, at least for DNS aliases
that share the same IP, one should register URLs in a search service
under the host part of only one of these aliases.

The problem is to find the native DNS name corresponding to the
www server. Resolving the reverse name for the IP does not qualify,
as the result is usually something other than the name the server
wants to present as its own.

The DNS canonical name (the target of a CNAME record) is no better,
as www servers are usually configured in DNS like this:

alpha2 IN A 1.2.3.4
www IN CNAME alpha2

The reverse is like:

$ORIGIN 3.2.1.IN-ADDR.ARPA.
4 IN PTR alpha2.company.com.

(With numerical URL host parts the reverse is of course better than
nothing.)
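
To make the two DNS candidates concrete, here is a minimal sketch in
Python. The function name dns_names and its injectable forward/reverse
parameters are my own assumptions for illustration; the defaults are
the standard library's socket.gethostbyname_ex and socket.gethostbyaddr.
With the zone shown above, both candidates come out as alpha2 rather
than www.

```python
import socket


def dns_names(host, forward=socket.gethostbyname_ex, reverse=socket.gethostbyaddr):
    """Return (primary_name, ptr_names) for host.

    primary_name is what the resolver reports as the primary name
    (the CNAME target, if host is an alias); ptr_names holds the
    reverse (PTR) name for each address that has one.
    """
    primary_name, _aliases, addresses = forward(host)
    ptr_names = []
    for ip in addresses:
        try:
            ptr_names.append(reverse(ip)[0])
        except OSError:
            # No reverse mapping for this address: skip it.
            pass
    return primary_name, ptr_names
```

Neither value is necessarily the name the server itself uses, which
is why something beyond DNS is needed.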

The correct canonical name is, I believe, the name configured into
the HTTP server for use in the Location: headers it returns with 301
or 302 redirects. For example, with NCSA and Apache servers this is
set with the ServerName directive in httpd.conf. How to obtain it?

- index the site until you get a 301 or 302 redirect, and extract
the host part from its Location: header. Check that the IPs match.
- index the site until you find a URL pathname with at least 1
hierarchical level. Issue a HEAD request with a deliberately incorrect
URL: the hierarchy (directory) name with the trailing / stripped off.
If you get a 301 or 302 redirect, extract the host part from its
Location: header and check that the IPs match.
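
The second method could be sketched in Python roughly as follows. This
is only an illustration under my own assumptions: all function names
are hypothetical, "the IPs match" is taken to mean that the two names
share at least one address, and http.client sends an HTTP/1.1 request
with a Host: header rather than a bare HTTP/1.0 request line.

```python
import http.client
import socket
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302}


def probe_path(url):
    """First directory level of url's path without the trailing '/', or None."""
    segments = urlsplit(url).path.split("/")  # "/design/x.html" -> ["", "design", "x.html"]
    if len(segments) < 3 or not segments[1]:
        return None
    return "/" + segments[1]


def host_from_redirect(status, location):
    """Host part of an absolute Location: URL, if status is a redirect."""
    if status not in REDIRECT_CODES or not location:
        return None
    return urlsplit(location).hostname  # None for a relative Location


def same_ips(host_a, host_b, resolve=socket.gethostbyname_ex):
    """Assumption: the names match if they share at least one address."""
    return bool(set(resolve(host_a)[2]) & set(resolve(host_b)[2]))


def head_probe(host, path, timeout=10):
    """Send HEAD for path; http.client does not follow redirects itself."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def find_canonical_host(host, seen_url, probe=head_probe, resolve=socket.gethostbyname_ex):
    """Return the redirect host if the probe yields one with matching IPs."""
    path = probe_path(seen_url)
    if path is None:
        return None
    status, location = probe(host, path)
    candidate = host_from_redirect(status, location)
    if candidate and same_ips(host, candidate, resolve):
        return candidate
    return None
```

The first method needs only host_from_redirect and same_ips, applied
to whatever redirect the indexer happens to meet.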

I'll demonstrate this with a live example:

$ telnet www.natsemi.com 80
Trying 199.2.26.194...
Connected to www.natsemi.com.
Escape character is '^]'.
HEAD /design HTTP/1.0

HTTP/1.0 302 Found
Server: Netscape-Commerce/1.12
Date: Monday, 10-Jun-96 11:03:37 GMT
Location: http://www.national.com/design/
Content-type: text/html
Content-length: 217

Connection closed by foreign host.
$

Here the canonical name of www.natsemi.com is www.national.com.
The IPs match as well. Admittedly, in this case the DNS reverse
lookup would have returned the same result.

Use whichever of the two methods above becomes possible first, and
re-register all pages already fetched under the newly found canonical
server name. Make sure that any future requests are made with that
name, unless the DNS information changes (that is the difficult part).
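
Re-registering the pages already fetched amounts to rewriting the
host part of their URLs. A minimal sketch in Python, where the names
rewrite_host and reregister are hypothetical and any user:password
part of a URL is assumed absent (it would be dropped):

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(url, alias, canonical):
    """Replace host alias with canonical in url; keep an explicit port."""
    parts = urlsplit(url)
    if (parts.hostname or "") != alias.lower():
        return url  # different host: leave untouched
    netloc = canonical if parts.port is None else f"{canonical}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def reregister(urls, alias, canonical):
    """Rewrite every URL and drop the duplicates that result, keeping order."""
    rewritten = (rewrite_host(url, alias, canonical) for url in urls)
    return list(dict.fromkeys(rewritten))
```

Pages that were indexed under both names collapse into a single
entry, which is the point of the exercise.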

What do you think? Is this implemented somewhere? Are there
problems with it?