Also, perhaps the importance of dealing with this could be made more
prominent, with the potential problems for the site explained (I
experienced some data corruption myself), in the caching-related sections
of generic CGI FAQs.
Respectfully,
-Ann Cantelow
-------------------The Interactive Poetry Pages----------------------
Collaborative poetry in real time- across the net.
http://www.csd.net/~cantelow/poem_welcome.html
---------------------------------------------------------------------
---------------------------------------
On Sat, 1 Jun 1996, Louis Monier wrote:
> This is an old thread, but I was out of town, then busy.
>
> If one thing about this whole robot field worries me, it is the
> existence of sites like this one. If you think about it, this scheme is
> bad for everyone:
> 1. the robot, which can get trapped and visit the same pages (or worse,
> slightly different versions of the same pages) over and over.
> 2. the site, whose access stats and visitor database are all screwed up.
> 3. the users of the index, who inherit a large number of bogus URLs, and
> further contribute to (2) by inheriting one of the robot's IDs.
>
> Need I say more? I think this scheme is detestable. Cookies may be the
> way to go, and if one does not want to rely on them, at least use a
> decent syntax so that robots can guess the trick, say by making it
> obvious that a script is being invoked with arguments. Having one common
> encoding (a 10-digit number as first path element) would be good, but
> it's too late. Another idea would be for these sites to recognize
> robots somehow, and only generate "clean" URLs, so robots would take
> only one trip through the site. But again, that's a lot of people to
> convince.
>
> So in the meantime, we use a semi-automatic solution: such sites are
> suspected, manually confirmed, and added to a s---list so that only
> their top-level page is indexed. I suspect that people trying to run
> fast robots right now, and who have not yet found out about this
> phenomenon, are simply accumulating junk from these sites. Ah ah!
>
> Seriously, this is a big problem. My friends at w3 tell me not to worry
> because cookies will eventually eradicate such schemes, but in the
> meantime this is a real problem. Any thoughts?
>
>
> --Louis
>
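
For anyone running a robot, here is a rough sketch of the two ideas in the
quoted message: guessing the ID when it appears as a 10-digit first path
element, and keeping a list of confirmed sites for which only the top-level
page is indexed. The function names, the host list, and the example.com
host are made up for illustration, and the 10-digit pattern is only the
encoding suggested above, not something every such site uses.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: the per-visitor ID is a 10-digit number in the first path element.
SESSION_SEGMENT = re.compile(r"\d{10}")

# Hypothetical list of hosts manually confirmed to embed IDs in their URLs.
TOP_LEVEL_ONLY_HOSTS = {"www.example.com"}


def strip_session_id(url):
    """Return the URL with a leading 10-digit path element removed, if present."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # For an absolute path, segments[0] is "" and segments[1] is the first element.
    if len(segments) > 1 and SESSION_SEGMENT.fullmatch(segments[1]):
        del segments[1]
    path = "/".join(segments) or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def should_index(url):
    """Index only the top-level page of listed hosts; index everything elsewhere."""
    parts = urlsplit(url)
    if parts.hostname in TOP_LEVEL_ONLY_HOSTS:
        return parts.path in ("", "/")
    return True
```

A robot could run strip_session_id on every link before checking whether
the page has already been visited, so that two visits with different IDs
collapse to one URL. A numeric first path element that is not a visitor ID
would be stripped too, which is one reason the manual confirmation step
still matters.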