Re: do robots send HTTP_HOST?

Aaron Nabil (nabil@teleport.com)
Mon, 14 Oct 1996 06:33:57 -0700 (PDT)


Joe Pruett writes...
> i want to keep robots out if they aren't using the canonical name for my
> web server (www.foo.com). but i can't find anywhere that documents if
> robots send the HTTP_HOST header defined in http 1.0.

Some do, some don't. It probably depends mostly on which HTTP library the
crawler is built on.

Version 5 of libwww for Perl has an option to send it. My crawler uses that
library and does send the header.
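For anyone writing their own crawler, sending the header just means adding one
line to the request. The sketch below is not libwww-perl's interface. It is a
hand-rolled Python function that builds a raw HTTP/1.0 GET, and the
`build_get_request` name and the `ExampleBot/0.1` User-Agent are hypothetical.

```python
def build_get_request(host, path="/", user_agent="ExampleBot/0.1", send_host=True):
    """Build a raw HTTP/1.0 GET request, optionally including a Host header.

    build_get_request and the default user_agent are hypothetical names
    used only for illustration.
    """
    lines = [f"GET {path} HTTP/1.0", f"User-Agent: {user_agent}"]
    if send_host:
        # The Host header tells the server which of its names the client
        # asked for, so a server answering to several names can tell them apart.
        lines.append(f"Host: {host}")
    # Header lines end in CRLF; an empty line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_get_request("www.foo.com", "/robots.txt").decode("ascii"))
```

With `send_host=False` the request looks like one from an older client, which is
the case a server cannot distinguish by name.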

The W3C's libwww might as well.

For kicks, you might compare what these two URLs return in Lynx 2.4 (which
doesn't send a Host header) and Lynx 2.5 (which does). Maybe the nice people
at HotWired will share their CGI with you.

http://www.hotwired.com/robots.txt
http://hard.hotwired.com/robots.txt
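I don't know how HotWired's CGI actually works, but a CGI that serves
robots.txt per host could look roughly like the Python sketch below. It assumes
the server is configured to run the script for /robots.txt and passes the Host
header in the HTTP_HOST environment variable. CANONICAL_HOST and
`robots_txt_for` are hypothetical names, with www.foo.com taken from the
question.

```python
import os

# Hypothetical canonical server name, taken from the question.
CANONICAL_HOST = "www.foo.com"

ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(host_header):
    """Choose robots.txt contents from the Host value the client sent, if any."""
    # Drop an optional ":port" suffix and compare case-insensitively.
    host = (host_header or "").split(":", 1)[0].strip().lower()
    if host == CANONICAL_HOST:
        return ALLOW_ALL
    # Either no Host header at all or a non-canonical name: keep robots out.
    return DENY_ALL


def main():
    # Assumes the server exposes the Host header as HTTP_HOST (CGI convention).
    body = robots_txt_for(os.environ.get("HTTP_HOST"))
    # A CGI response is a header block, a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    print(body, end="")


if __name__ == "__main__":
    main()
```

The policy for a missing Host header is a judgment call. This sketch denies
everything, which also shuts out robots that do use www.foo.com but don't send
the header. Returning ALLOW_ALL in that case is the more lenient alternative.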

-- 
Aaron Nabil
nabil@teleport.com