Re: HOST: header

Rob Hartill (robh@imdb.com)
Wed, 17 Jul 1996 11:26:57 -0500 (CDT)



>> A request from an Apache developer (not me)..
>>
>> Please could robot owners start sending HOST headers. Apparently this'll
>> help server admins to filter out bad URLs. This is in everyone's interest,
>> unless there's someone out there that thinks robot writers have a right
>> to use whatever URLs they want (nothing would surprise me any more).
>>
>> thanks.
>> rob
>>
>You may want to tell your friend that they can configure the apache server
>itself to restrict access for hosts without the need for robots.txt. Also,
>a simple firewall instruction can restrict such requests from specific hosts.

I think you're missing the point. The HOST header tells the server which
host address is in the URL being used to access the site. There are lots
of bad URLs out in net-land that work, but refer to the wrong hostname.
Some server admins care enough to want to make an effort to correct those
bad URLs by detecting them and reacting differently.

For example, us1.imdb.com is a valid address in a URL for us.imdb.com, but
people shouldn't be using it, and robots shouldn't index it if I can stop
them. The only way I can stop them is if they tell me (the server) which
hostname they are using. HTTP/1.0 doesn't do that, but HTTP/1.1's HOST
header does.
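
To make the server side concrete, here is a minimal sketch in Python of
the kind of check a HOST header makes possible. It is not taken from
Apache or from any real configuration: the function name check_host, the
choice of us.imdb.com as the preferred hostname, and the decision to react
with a redirect URL are all assumptions for illustration.

```python
def check_host(host_header, path, canonical_host="us.imdb.com"):
    """Return a URL to redirect to, or None if no correction is needed.

    host_header is the value of the request's Host header, or None when
    the client (for example a plain HTTP/1.0 robot) did not send one.
    canonical_host is a hypothetical "preferred" name for the site.
    """
    if not host_header:
        # Without a Host header the server cannot tell which hostname
        # appeared in the URL, so there is nothing to correct.
        return None

    # Drop an optional ":port" suffix, surrounding whitespace, and a
    # trailing dot, then compare case-insensitively.
    name = host_header.split(":", 1)[0].strip().lower().rstrip(".")

    if name == canonical_host:
        return None

    # The request used some other name for this site: point the client
    # at the same path on the preferred hostname.
    return f"http://{canonical_host}{path}"
```

With these assumptions, check_host("us1.imdb.com", "/index.html") yields
"http://us.imdb.com/index.html", while a request that arrives with no
HOST header yields None, which is exactly the case where the server
cannot help.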

Anyway, I'm not using this yet, but the person who asked me to forward
the request is using this feature to clean up links to his site.

Check the HTTP/1.1 spec for more info on the HOST header.
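
For robot writers, sending the header is cheap. The sketch below, in
Python, builds a raw HTTP/1.0 GET request that carries a HOST header
taken from the URL being fetched. The function name build_request and the
"ExampleRobot/0.1" user agent are made up for illustration, and the code
assumes plain http URLs with a default port of 80.

```python
from urllib.parse import urlsplit


def build_request(url, user_agent="ExampleRobot/0.1"):
    """Build the bytes of an HTTP/1.0 GET request that includes Host."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no hostname: {url!r}")

    # The Host value is the hostname exactly as it appeared in the URL
    # (lowercased by urlsplit), plus the port when it is not the default.
    host = parts.hostname
    if parts.port is not None and parts.port != 80:
        host = f"{host}:{parts.port}"

    # Request target: path plus query string, defaulting to "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    lines = [
        f"GET {target} HTTP/1.0",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

The point of the design is that the header is derived from the URL the
robot was actually given, not from a reverse lookup or a canonical name
the robot worked out for itself, since the server wants to learn which
hostname was in the link.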

-- 
Rob Hartill (robh@imdb.com)
The Internet Movie Database (IMDb)  http://www.imdb.com/
           ...more movie info than you can poke a stick at.