alta vista and virtualvin.com

chris cobb (c-cobb@ix.netcom.com)
Mon, 6 May 1996 00:56:11 -0400


This is a question I hope Louis will be able to answer, but the topic should
be of interest to others.
---

Virtual Vineyards (VV) (www.virtualvin.com) is an example of a
site that customizes all internal links for each visitor by assigning
a 9-digit number to each new home page viewer.

When a person retrieves the home page for this site, each link
on the home page (which points to another part of virtual vineyards)
contains this newly created number. For example, the "what's new"
link from the home page for me might be:

www.virtualvin.com/vvdata/026684189/whatsnew.htm

and you may have

www.virtualvin.com/vvdata/552378463/whatsnew.htm

Both of us see the same "what's new" page but the server is
keeping track of us.

On the "what's new" page, each of us might have
a link back to the home page, but mine would continue to have
my ID and you yours. Neither of us would get a new ID unless
we reloaded the home page.

VV appears to use a custom server which strips this number
from each request and uses it to record the progress of each
user through the site. The site does allow purchases of items
with a "basket" metaphor - apparently using these IDs instead
of a cookie to identify incoming requests.
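
A minimal sketch of how a server might separate the ID from the
request path, assuming the URL layout seen in the example links
above (/vvdata/<9-digit ID>/<page>). The function names, the
in-memory table and the rewritten path are hypothetical; nothing
here is known about how VV's server is actually implemented.

```python
import re

# Assumed layout, taken from the example links: /vvdata/<9-digit ID>/<page>
SESSION_PATH = re.compile(r"^/vvdata/(\d{9})/(.*)$")

# Hypothetical in-memory log: visitor ID -> list of pages requested
visits = {}


def split_session(path):
    """Return (visitor_id, path_without_id); visitor_id is None if absent."""
    match = SESSION_PATH.match(path)
    if match is None:
        return None, path
    # Assumption: the real document lives at the same path minus the ID.
    return match.group(1), "/vvdata/" + match.group(2)


def record(path):
    """Log the request under its visitor ID and return the path to serve."""
    visitor_id, real_path = split_session(path)
    if visitor_id is not None:
        visits.setdefault(visitor_id, []).append(real_path)
    return real_path
```

Calling record("/vvdata/026684189/whatsnew.htm") and
record("/vvdata/552378463/whatsnew.htm") would serve the same page
to both visitors while logging it under two separate IDs.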

The problem arises when a web crawler catalogs a site of this nature.
Every link the crawler recorded (other than the home page itself)
would contain an ID. As numerous unrelated people visited the site
by way of the crawler's index, the host site would become confused and
the entire tracking and shopping mechanism would break down. If two
people searched for the same chardonnay at Alta Vista, visited
the exact same page using the query results and
clicked 'purchase', the site would see two purchase
requests from what looked to be the same person. The IDs would
remain the same as long as these users roamed the site -
even if they went back to the home page of VV while doing
so. Even more distressing, the ID that the crawler recorded
would progressively become days, weeks, and possibly months
old but would still be used on a frequent basis by different people.
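
The collision can be illustrated with a small sketch. It assumes the
server keys each basket on the ID found in the URL; the page name
chardonnay.htm, the item names and the basket table are made up for
the example.

```python
# Hypothetical basket table: visitor ID -> list of items
baskets = {}


def purchase(url, item):
    """Add an item to the basket named by the ID embedded in the URL."""
    # Assumed layout: www.virtualvin.com/vvdata/<ID>/<page>
    visitor_id = url.split("/")[2]
    baskets.setdefault(visitor_id, []).append(item)
    return visitor_id


# The link as a crawler might have stored it, ID included.
indexed_link = "www.virtualvin.com/vvdata/026684189/chardonnay.htm"

# Two unrelated people follow the same search result and buy.
first = purchase(indexed_link, "chardonnay (person A)")
second = purchase(indexed_link, "chardonnay (person B)")

# Both purchases land in one basket because the ID is identical.
print(first == second, len(baskets[first]))
```

Under these assumptions the server has no way to tell the two
buyers apart, since the only identifying token is the one the
crawler recorded.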

To examine this in practice, I searched Alta Vista for a low-level (not
home) page from VV. I wanted to see if Alta Vista did indeed record a
user ID when indexing these pages.

Alta Vista did not return a match for the low-level page query, but
its index did contain the VV home page.

I initially thought that a 'robots.txt' file was having an effect - perhaps
Alta Vista was not visiting any lower pages because the
creator of the VV site realized the potential problems of their
approach and attempted to guard against it by limiting crawler
access. I did not, however, find a robots file.
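
For comparison, here is a sketch of what such a guard could have
looked like. The robots.txt contents are hypothetical (as noted, the
site did not appear to have one); the check uses Python's standard
urllib.robotparser module to show that a single Disallow rule on the
/vvdata/ prefix would keep a compliant crawler away from every
ID-bearing link while leaving the home page indexable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real site did not appear to serve one.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /vvdata/
"""

parser = RobotFileParser()
parser.parse(HYPOTHETICAL_ROBOTS_TXT.splitlines())

home_page = "http://www.virtualvin.com/"
id_link = "http://www.virtualvin.com/vvdata/026684189/whatsnew.htm"

# A compliant crawler may fetch the home page but not the ID-bearing link.
home_allowed = parser.can_fetch("*", home_page)
id_link_allowed = parser.can_fetch("*", id_link)
print(home_allowed, id_link_allowed)
```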

My questions:
- Why doesn't Alta Vista index the lower levels of this site? Even though
there is no robots.txt file, it seems that Alta Vista is aware of what is
going on and manages to avoid the problem. How is this done?
- What comments do others have about indexing sites of this nature?

Chris Cobb