wwwbot.pl problem

Andrew Daviel (andrew@andrew.triumf.ca)
Thu, 23 Nov 1995 12:42:51 -0800 (PST)


(I sent a request to libwww-perl-request just before my last message
to the list, so I might not be subscribed yet. Please Cc any replies to me.)

I was having trouble with wwwbot from the libwww-perl-0.40 library.
I continued to work on the problem after posting to the perl list.

It seems that botcache is not defined precisely enough, so that
a site whose robots.txt says "User-agent: *" and "Disallow: /" would
kill subsequent GETs to a site that was previously in the cache. I have
made a patch which adds the address to the cache and fixes a couple of
other odd cases: one where the address is not fully qualified because
you are working within a domain, and one where host names such as
ypsun, ypsun2, etc. would become confused with the path count.
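
To illustrate the idea only (this is not the patch itself, and it is in
Python rather than the Perl of wwwbot.pl), here is a minimal sketch of a
robots cache keyed by the full address. The names cache_key, RobotsCache
and DEFAULT_DOMAIN, and the domain example.org, are hypothetical and do
not come from libwww-perl. The key is a (host, port) pair, so a blanket
Disallow from one host cannot affect another, short names are qualified
with the local domain, and ypsun and ypsun2 stay distinct because no
count is ever appended to the host string.

```python
from urllib.parse import urlsplit

# Hypothetical local domain used to qualify short host names.
DEFAULT_DOMAIN = "example.org"


def cache_key(url, default_domain=DEFAULT_DOMAIN):
    """Return a (host, port) key identifying the site a URL belongs to."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Qualify short names such as "ypsun" so that they match the
    # fully qualified form of the same host.
    if host and "." not in host:
        host = f"{host}.{default_domain}"
    port = parts.port or 80  # assume plain HTTP when no port is given
    return (host, port)


class RobotsCache:
    """Stores disallowed path prefixes separately for each site."""

    def __init__(self):
        self._rules = {}  # (host, port) -> list of disallowed prefixes

    def store(self, url, disallowed):
        self._rules[cache_key(url)] = list(disallowed)

    def allowed(self, url):
        rules = self._rules.get(cache_key(url))
        if rules is None:
            # Nothing cached for this site; a real robot would fetch
            # the site's robots.txt before deciding.
            return True
        path = urlsplit(url).path or "/"
        return not any(path.startswith(prefix) for prefix in rules if prefix)


cache = RobotsCache()
cache.store("http://ypsun/robots.txt", ["/"])   # Disallow: /
cache.store("http://ypsun2/robots.txt", [])     # nothing disallowed
print(cache.allowed("http://ypsun/index.html"))
print(cache.allowed("http://ypsun2/index.html"))
```

Keeping the host and port as a tuple, rather than joining them with any
counter into one string, is what avoids the ypsun/ypsun2 ambiguity in
this sketch.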

See ftp://andrew.triumf.ca/pub/wwwbot.patch

Andrew Daviel         email: advax@triumf.ca
TRIUMF                voice: 604-222-7376
4004 Wesbrook Mall    fax:   604-222-7307
Vancouver BC          http://andrew.triumf.ca/~andrew
Canada V6T 2A3        49D14.7N 123D13.6W