Re: Domains and HTTP_HOST

Benjamin Franz (snowhare@netimages.com)
Thu, 7 Nov 1996 09:57:00 -0800 (PST)


On Thu, 7 Nov 1996, Klaus Johannes Rusch wrote:

> URL: http://ipaddressoraname/document
>
> HTTP/1.0 compliant, present a list of virtual servers to pick from
> (or, if you can identify the virtual server based on the unique
> URL, redirect to the /~aname/document).
>
> The only assumption here is that the server does NOT run a
> non-virtual server at the same IP address, i.e. that /document
> is not a valid URL (except, of course, for the virtual /~document URLs).

This is a *really* bad assumption. Large multi-server sites *routinely*
use the same local URL path on different servers for ease of
maintenance. And systems with many independent clients will see
overlapping namespaces all the time: '/image/', for example.
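
To make the overlap concrete, here is a minimal sketch (Python, with
made-up hostnames - not any particular server's configuration) of why a
name-based server has to dispatch on the 'Host: ' header *and* the path,
rather than on the path alone:

    # Hypothetical dispatch table for a name-based virtual host setup.
    # Both sites legitimately serve '/image/' - the path alone is ambiguous.
    DOC_ROOTS = {
        'www.client-a.example': '/home/client-a/htdocs',
        'www.client-b.example': '/home/client-b/htdocs',
    }

    def resolve(host_header, path):
        """Map (Host: header, URL path) to a file on disk."""
        root = DOC_ROOTS.get(host_header.lower())
        if root is None:
            raise KeyError('unknown virtual host: %r' % host_header)
        return root + path

    # Without the Host: header, these two requests are indistinguishable:
    #   GET /image/logo.gif   (client A's logo or client B's?)
    print(resolve('www.client-a.example', '/image/logo.gif'))
    print(resolve('www.client-b.example', '/image/logo.gif'))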

>
> Case 4:
> -------
> URL: http://ipaddressoraname/~aname/document
>
> A non-HTTP 1.1 compliant client accessing the virtual location, just
> give it the document.
>
>
> Side note: I wonder why Host: was introduced other than for increasing
> complexity of server configuration and indexing, running virtual servers
> using different interfaces is much easier to handle (but slightly more
> expensive of course).

No - it isn't *any* easier to handle different interfaces. As soon as the
percentage of browsers passing 'Host: ' rises above 95% (it is currently
around 85%), we will shift all of our client sites to non-IP virtual hosts.
This will save us tons of IP addresses and reduce our startup overhead
when configuring new servers, since we won't have to reconfigure our
interfaces each time we add a server. I suspect that *many* other sites
will do the same thing soon, and that this will precipitate the 'Great
Browser Upgrade', where old browsers (Netscape 1.2 and earlier, MSIE 2.0
and earlier, Lynx 2.4 and earlier, and most others) are rapidly discarded
as people find they can no longer browse many web sites.
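
For anyone who hasn't looked at what a non-IP virtual host request
actually is on the wire: it is just an ordinary HTTP/1.0 GET with one
extra header, so every site can sit behind a single IP address. A rough
Python sketch (the hostname and path are placeholders, not a real site):

    import socket

    def fetch(host, path='/', port=80):
        """Send an HTTP/1.0 GET with a Host: header and return the raw reply."""
        s = socket.create_connection((host, port))
        try:
            request = ('GET %s HTTP/1.0\r\n'
                       'Host: %s\r\n'   # the only addition over a plain 1.0 request
                       '\r\n' % (path, host))
            s.sendall(request.encode('latin-1'))
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
            return b''.join(chunks)
        finally:
            s.close()

    # The server tells its sites apart purely by the Host: header;
    # an old HTTP/1.0 server that ignores the header still answers.
    print(fetch('www.example.com', '/document')[:200])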

The most important advantage of 'Host: ' is that it allows new browsers
that can handle non-IP virtual servers (but are not HTTP/1.1 capable) to
interoperate with old HTTP/1.0 servers, rather than breaking them by
embedding the host in the passed URL. As for indexing - if the robots
just pass the 'Host: ' header - it is no different from indexing the
older unique-IP servers. The 'is
this the same server under a different name' problem is insoluble in the
general case anyway. If you are determined to try to reduce that problem
- your best approach is to checksum every page and use the checksums to
'fingerprint' sites. Of course - you will get a bunch of 'false idents'
from mirror sites that way.
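
If anyone wants to play with the fingerprinting idea, a minimal sketch
might look like the following (Python; the hostnames and the choice of
page to hash are placeholders, and MD5 here is just 'a checksum', not a
recommendation):

    import hashlib
    import urllib.request

    def fingerprint(host, path='/'):
        """Checksum one page from a host to use as a crude site fingerprint."""
        data = urllib.request.urlopen('http://%s%s' % (host, path)).read()
        return hashlib.md5(data).hexdigest()

    def group_by_fingerprint(hosts, path='/'):
        """Group hostnames whose chosen page hashes identically.

        Identical fingerprints *suggest* the same server under different
        names, but mirrors will produce false idents, as noted above.
        """
        groups = {}
        for host in hosts:
            groups.setdefault(fingerprint(host, path), []).append(host)
        return groups

    # Placeholder hostnames, for illustration only.
    print(group_by_fingerprint(['www.example.com', 'web.example.com']))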

-- 
Benjamin Franz