Crawlers and "dynamic" URLs

David Koblas (koblas@novomedia.com)
Mon, 9 Dec 1996 16:23:28 -0800


I knew this would eventually be a problem, but right now I'm being
"overcrawled" by Excite.

The general problem is that we rewrite URLs to contain a session key
so that we can do behavior analysis (we do support cookies too). The
trouble is that this ID is only good for an hour, so while a search
engine is crawling the site the ID eventually expires and a new one
is issued. To the crawler, of course, this looks like a whole new set
of unexplored URLs.
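
One obvious workaround would be to recognize crawlers by their
User-Agent and hand them session-free URLs, so that the expiring ID
never creates "new" pages for them. A rough sketch (the agent
substrings and the URL scheme here are only illustrative, not our
actual setup):

    #!/usr/bin/perl
    # Hypothetical CGI fragment: give known robots the bare URL so
    # every crawl sees the same canonical page names.
    my $agent    = $ENV{'HTTP_USER_AGENT'} || '';
    my $is_robot = ($agent =~ /ArchitextSpider|Scooter|crawler|robot/i);

    sub make_url {
        my ($path, $session_id) = @_;
        # Robots get the bare URL; browsers get the session key.
        return $is_robot ? $path : "$path;sid=$session_id";
    }

    print make_url("/catalog/item42.html", "a1b2c3"), "\n";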

Any ideas on how to deal with this type of problem?

David
--koblas@novomedia.com

ps. Ideas like the "sitelist.txt" would be nice, since it would let
us provide an index of our site to the crawler, rather than having
the crawler discover it on its own and deal with the IDs.
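
To make the idea concrete, a sitelist.txt could be as simple as one
canonical (session-free) URL per line with an optional last-modified
date; the format below is purely hypothetical, since nothing of the
sort has been standardized:

    # sitelist.txt -- one canonical URL per line, optional date
    http://www.novomedia.com/              1996-12-09
    http://www.novomedia.com/catalog/      1996-12-08
    http://www.novomedia.com/about.html    1996-11-30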

pps. I have a perl script that builds/rebuilds a sitelist.txt file;
now if only crawlers would support it.
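
For the curious, something along these lines would do the job (this
is just a sketch, not the actual script; the document root, hostname,
and output format are placeholders):

    #!/usr/bin/perl
    # Walk the document root and emit one canonical URL per HTML
    # file, with its last-modified date, into sitelist.txt.
    use File::Find;

    my $docroot = "/home/httpd/htdocs";         # placeholder
    my $base    = "http://www.novomedia.com";   # placeholder

    open(OUT, ">$docroot/sitelist.txt")
        || die "can't write sitelist.txt: $!";
    find(sub {
        return unless -f $_ && /\.html?$/;      # HTML files only
        my $path = $File::Find::name;
        $path =~ s/^\Q$docroot\E//;             # docroot -> URL path
        my ($d, $m, $y) = (localtime((stat($_))[9]))[3..5];
        printf OUT "%s%s %04d-%02d-%02d\n",
            $base, $path, $y + 1900, $m + 1, $d;
    }, $docroot);
    close(OUT);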

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html