Re: More Robot Talk

Nick Arnett (narnett@Verity.COM)
Fri, 17 Jan 1997 08:47:48 -0800


At 03:54 PM 1/17/97 +0200, Jaakko Hyvätti wrote:
>
> Nice to have something of a technical nature for a change here.
>
> The heuristics for URL canonicalization presented here do not yet take
>into account the new HTTP/1.1 Host: request header. It complicates
>matters further once some servers actually have multiple virtual hosts
>on a single IP address, differentiated by Host: headers or absolute
>URIs in request headers (see RFC 2068).

Or worse, multiple hosts *and* multiple IP addresses. Some of the large
sites, notably Netscape, have this situation, which cannot be resolved
from host names and IP addresses alone.
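
To make the problem concrete, here is a minimal Python sketch. Everything
in it is an assumption for illustration: the host names, the addresses
(taken from the documentation range), the DNS table, and the two key
functions are invented and are not anyone's actual canonicalization code.
It shows that keying URLs by IP address merges unrelated virtual hosts,
while keying by host name cannot tell whether two names are aliases.

```python
from urllib.parse import urlsplit

# Hypothetical DNS data, invented for this example.
DNS = {
    "www.example.com": {"192.0.2.10", "192.0.2.11"},   # one site, two IPs
    "home.example.com": {"192.0.2.10", "192.0.2.11"},  # maybe an alias, maybe not
    "shop.example.org": {"192.0.2.10"},                # separate virtual host
}

def key_by_ip(url):
    """Naive canonical key: swap the host name for one of its IP addresses."""
    parts = urlsplit(url)
    ip = min(DNS[parts.hostname])          # pick one address deterministically
    return (ip, parts.port or 80, parts.path or "/")

def key_by_name(url):
    """Canonical key that keeps the host name, as a Host: header would."""
    parts = urlsplit(url)
    return (parts.hostname, parts.port or 80, parts.path or "/")

a = "http://www.example.com/"
b = "http://shop.example.org/"
c = "http://home.example.com/"

# Keying by IP wrongly treats two different virtual hosts as one page.
ip_collision = key_by_ip(a) == key_by_ip(b)

# Keying by name keeps them apart, but also keeps apart two names that
# share every address, so it cannot confirm or rule out an alias.
name_distinct = key_by_name(a) != key_by_name(b)
alias_undecided = DNS["www.example.com"] == DNS["home.example.com"] \
    and key_by_name(a) != key_by_name(c)
```

Telling real aliases from distinct virtual hosts needs more evidence than
names and addresses, for example comparing the content that comes back.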

Nick

---------------------------------------
Verity Inc.
Connecting People with Information

Product Manager, Categorization and Visualization
408-542-2164; home office 408-369-1233; fax 408-541-1600
http://www.verity.com

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html