Re: Crawlers and "dynamic" urls

Klaus Johannes Rusch (e8726057@student.tuwien.ac.at)
Tue, 10 Dec 1996 18:13:18 CET


In <19961210002328.AAA22826@novo2.novomedia.com>, koblas@novomedia.com (David Koblas) writes:
> The general problem is that we rewrite URLs to contain a session key
> so that we can do behavior analysis (we do support cookies too). The
> problem is that this ID is only good for an hour; in the process of
> being crawled by a search engine this ID eventually expires and a
> new one is issued. Of course this means that to the crawler there
> is a whole new set of unexplored URLs.
>
> Any ideas on how to deal with this type of problem?

What about putting the session key last and restricting access to those
URLs? For example, if

http://yoursite/special/document.html
http://yoursite/special/document.html/sessionkey1/
http://yoursite/special/document.html/sessionkey2/

are all equivalent, you would add the following to robots.txt:

Disallow: /special/document.html/

(Disallow rules are prefix matches against the full path, so the entry
needs the complete path; the trailing slash keeps /special/document.html
itself crawlable while blocking the session-key variants.)
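The server side of the equivalence might look something like this, a
minimal sketch in Python that assumes the session key, when present, is
always the final path segment after the document name (the function name
split_session_key and the ".html" test are my assumptions, not anything
your server already does):

# Treat /special/document.html/<sessionkey>/ as the same resource
# as /special/document.html, extracting the key if one is present.
def split_session_key(path):
    """Return (document_path, session_key or None)."""
    # Drop a trailing slash, then check whether the last segment
    # follows the document name rather than being part of it.
    segments = path.rstrip("/").split("/")
    if len(segments) > 1 and segments[-2].endswith(".html"):
        return "/".join(segments[:-1]), segments[-1]
    return path.rstrip("/") or "/", None

# Examples:
#   split_session_key("/special/document.html/sessionkey1/")
#     -> ("/special/document.html", "sessionkey1")
#   split_session_key("/special/document.html")
#     -> ("/special/document.html", None)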

Klaus Johannes Rusch

--
e8726057@student.tuwien.ac.at, KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html