Re: image map traversal

Benjamin Franz (snowhare@netimages.com)
Mon, 25 Mar 1996 09:37:27 -0800 (PST)


-----BEGIN PGP SIGNED MESSAGE-----

On Mon, 25 Mar 1996, Mark Norman wrote:

> Hello,

> I previously asked the question: how can you retrieve an image map file from a
> web server? (the ascii file with the urls and coordinates). Someone answered
> that web servers do not allow you to retrieve this file. This leads me to
> wonder how a robot can descend the document tree of a web site if, as is the
> case for many sites, the path to lower level documents is through image maps
> and not through explicit hyperlinks?

The simple answer is: you can't. One more *very* good reason not to hide
a site exclusively behind imagemaps. The search engines probably won't be
able to index your site.

> You could interrogate the map completely if you know its dimensions, but I
> don't know how to get the dimensions.

There are a number of tools that could figure out its dimensions, either via
the HEIGHT and WIDTH attributes of the IMG tag or by looking in the image
file itself. In fact, given the occasional use of HEIGHT and WIDTH to resize
images, you should use both. Look for a perl program called 'wwwimagesize'.
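
As a rough illustration of "looking in the image file itself", here is a
minimal Python sketch that reads the dimensions out of a GIF header. It
assumes the map image is a GIF and that you already have its first bytes in
hand; the function name gif_dimensions is made up for this example and has
nothing to do with how 'wwwimagesize' works internally.

```python
import struct

def gif_dimensions(data):
    """Return (width, height) read from the first 10 bytes of a GIF file.

    A GIF starts with a 6-byte signature ("GIF87a" or "GIF89a") followed
    by the logical screen width and height as little-endian 16-bit
    unsigned integers.
    """
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    width, height = struct.unpack("<HH", data[6:10])
    return width, height
```

Compare the result with any HEIGHT and WIDTH values found in the HTML; if
they differ, the page is resizing the image.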

> But this would require many, many queries of the image map to find its
> hotspots.

Yup. Order X * Y accesses. Even a modestly sized image map is too large to
explore *completely* via automation. You could accelerate the exploration
by using a low resolution grid, say every 15 pixels or so (which would
still require about 278 accesses to explore a 250x250 map), with the
trade-off that you could miss very small hotzones. You should also
record each page under the *final* URL the map sends you to, not the
XY coordinate query used to reach it, so that the same page reached from
several different coordinates isn't counted as several different pages.
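
The grid idea can be sketched in Python along these lines. This is only a
sketch under stated assumptions: the map is queried the usual ISMAP way by
appending "?x,y" to the map URL, and resolve is a hypothetical function you
supply that performs the request and returns the final URL (or None if the
point hits nothing). The names grid_points and explore_map are invented for
the example.

```python
def grid_points(width, height, step=15):
    """Return the (x, y) sample points of a coarse grid over the image."""
    return [(x, y)
            for y in range(0, height, step)
            for x in range(0, width, step)]

def explore_map(map_url, width, height, resolve, step=15):
    """Probe an image map on a grid and collect the distinct targets.

    resolve is a caller-supplied (hypothetical) callable that takes a
    query URL and returns the final URL it leads to, or None.
    The result maps each final URL to the first (x, y) that reached it,
    so many coordinates pointing at one page count only once.
    """
    found = {}
    for x, y in grid_points(width, height, step):
        final_url = resolve("%s?%d,%d" % (map_url, x, y))
        if final_url is not None:
            found.setdefault(final_url, (x, y))
    return found
```

For a 250x250 map with a 15 pixel step, grid_points returns 17 x 17 = 289
points, in line with the rough (250/15)^2 estimate above. A real robot
would also want to pause between requests rather than fire them all at once.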

Benjamin Franz

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBMVbZhOjpikN3V52xAQEj5gP+J193pvHZiVFDOC3w0t4RhnkcIjPgalWh
fw8hW5tzFBtTq85ZapMdgPi7R4l6Bkmqca54M8kefR0ehe9I1Be5KF154BHN3b7E
jXf41H8AkRmMK7JBmEX3E0qIUgdXmu2fttvGJdvAx4Kpt2qKt/z9aUEgwjkTJixR
Ql05Jg4cuYs=
=BLTl
-----END PGP SIGNATURE-----