Re: Alta Vista searches WHAT?!?

Wayne Lamb (wlamb@walnut.holli.com)
Wed, 17 Jan 1996 09:42:45 -0500 (EST)


Reinier Post wrote:
>
> You (Ed Carp @ TSSUN5) write:
> >
> >There has been a concern raised on another list that I belong to, about the
> >privacy implications of robots and such.
>
> >The specific example was that the
> >Alta Vista web crawler didn't only index linked documents, but any and all
> >documents that it could find at a site!
>
> Did you also get the messages in which the author explained that
> this isn't true?
>
> >Is this true, and if so, how is it doing it? How does one keep documents
> >private? I sure don't want my personal correspondence sitting out on
> >someone's database just because my home directory happens to be readable!
>
> I have a big problem with your phrase 'happens to be'.
>
> There have been more discussions like this, in which people were quite happy
> to make a bunch of documents available without restriction, except to
> indexers.
> Their main idea was that it is common practice to keep documents 'out of
> sight' without actually indicating access restrictions explicitly. I think
> this is plainly wrong. On Unix, if you want to indicate who is allowed
> access
> to your files, you use file permissions. If a certain file of mine is world
> readable, the implication is that I, the author, intentionally allow the
> rest
> of the world to read my file. (Here, 'the world' means any user with access
> to the file system.) I have, occasionally, browsed other people's
> directories
> and found stuff that wasn't intended to be read by me; I always assumed a
> mistake on their part, and decided not to read on, as a matter of courtesy.
> But the mistake was theirs.
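>
> For instance (the file name is made up, just to illustrate), a single
> chmod is all it takes to withdraw a file from 'the world', and ls -l
> confirms the result:
>
>     chmod o-r ~/mail/private-letter.txt   # remove world read permission
>     ls -l ~/mail/private-letter.txt       # last permission triplet no longer shows 'r'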
>
> The same principle has always been assumed on the Internet, I guess.
> If you serve files off a WWW server without access restrictions,
> you intend to make them available to the rest of the world.
> There is no way of knowing the purpose of the accesses you get for your
> documents: it may be an individual user, a WWW indexer, or a secret program
> operated by the FBI/Mossad/KGB/whoever to scan for suspect activities.
>
> It's the access permissions that specify your intentions, not the existence
> of explicit references to the files, or the set of users to whom you have
> explicitly given your site's URLs, or anything else.
>
> In my opinion, it's a mistake to accuse robots of malicious behaviour
> when all they do is find files that have been made available to them.
>
> robots.txt should be regarded as a service to robots, a way of saying:
> don't bother to index this, the results won't justify the load it will
> place on the network and on my system. To honour this is a matter of
> courtesy. If you don't want robots to get access to your documents at
> all, then set proper access restrictions on the documents themselves.
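>
> As an illustration (the directory name is hypothetical), a minimal
> /robots.txt that asks every robot to stay out of one directory is
> just two lines:
>
>     User-agent: *
>     Disallow: /drafts/
>
> But this is still only a request; unlike file permissions or server
> access restrictions, it enforces nothing.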
>
> The only problem I see is that 'the world' is not the same for everybody.
>
> For example, suppose user A wants all files to be readable for all
> other users on the system. To user A, 'the world' is all users
> on the system. User A makes all files world readable.
>
> Now suppose that user B runs a WWW server, making all files on the system
> available to the whole Internet. (User B will think twice before doing this
> on purpose, but it may be a configuration error.) Suddenly, user A's files
> have become available to the whole Internet community. Suppose that user C
> (a WWW indexer) finds user A's files. It is unreasonable for user A to
> blame C, when B is at fault. Obviously, there must be a way for A to
> correct
> the problem, and get the files removed from C's index. This is possible in
> most WWW indexers. But if A is indignant at the mere fact that C found his
> files, s/he's barking up the wrong tree.
>
> --
> Reinier Post reinpost@win.tue.nl
> a.k.a. <A HREF="http://www.win.tue.nl/win/cs/is/reinpost/">me</A>
>

Please take me off of your list

--