Re: robots.txt, authors of robots, webmasters ...

Kevin Hoogheem (khooghee@marys.smumn.edu)
Thu, 18 Jan 96 13:22:37 -0600


>
>
> Begin forwarded message:
>
>
> Previously, I wrongly accused Alta-Vista of indexing pages that
> I had no interest in having indexed. It turned out that rather
> than poking each TCP port for an HTTP server, Alta-Vista
> actually did what every other 'bot does and follows all the
> links it can find. I spent some tube-time sleuthing and
> discovered that the pages were indeed referenced from other,
> generally accessible pages.
>
> I now believe my indignation at the possibility of this
> port-poking behavior was based on two separate considerations:
>
> 1. that the poking of ports would impose an unwelcome
> burden on my servers, and

First, I don't think many web-robot writers would write their
robots to probe all ports on a machine; rather, they would look
at another port only if the operator asked for it on the command
line, or if a different port were mentioned in a URL.
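In other words, a robot learns about a nonstandard port only when a URL names one explicitly; otherwise it assumes the HTTP default of 80. A minimal sketch in modern Python (the `target_port` helper is illustrative, not from any particular robot):

```python
from urllib.parse import urlsplit

def target_port(url):
    # A robot discovers a nonstandard port only when a URL
    # names one explicitly; otherwise it uses the default 80.
    parts = urlsplit(url)
    return parts.port if parts.port is not None else 80
```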

>
> Where did we get the idea that just because a thing is
> accessible, we have the moral right to access it, perhaps
> against the interests of its owner?
>
> In another message, Reinier states his belief that if a user
> makes the mistake of exposing his home directory to the web,
> that we (as robot owners) can index anything we find there with
> impunity; that the error is on the part of the web-master and
> not on the part of the robot's designer.
>
> Let me see if I understand Reinier's point and can perhaps
> state it another way: If I leave my house unlocked, I have
> given my permission for any and all to come in and read my
> personal papers. Does this strike anyone else as somewhat
> absurd?
>
> In our enthusiasm to become the cartographers of this new
> region of the information universe, do we not run the risk of
> violating the privacy of the indigenous peoples we find there?
>
> I believe that this "-WE- are the most comprehensive index of
> cyberspace" mentality is very dangerous and suggests a kind of
> information vigilantism that I find personally distasteful.
>
> Perhaps what is really needed is a reevaluation of the role of
> the robots.txt file. If we take the stance, as I believe we
> should, that the decision to be indexed belongs in the hands of
> the owner of the data, not in the mechanical claws of wild
> roving robots, then the robots.txt file should become a source of
> permission for indexing, not of exclusion from it. Most
> importantly, the expectation should be one of privacy, not exposure.
>
> In other words, we should not index a web-site if there is no
> robots.txt file to be retrieved that gives explicit permission
> to do so.
>
> It should be noted that there is a fairly strong case to be
> made that a robot threshing through a non-published web site is
> an illegal activity under the abuse of computing facilities
> statute in U.S. law.
First off, I do think that we as users of Unix systems expect some
level of protection for our documents: if something is meant to be
private, it should be protected. But if someone sets up a web
directory, they are saying that this information is PUBLIC and that
anyone who wishes to seek it out may freely look at it. If they want
otherwise, they should take the time and trouble to lock it up so
that no one but the intended people can see it. True, if I leave my
house unlocked I don't want anyone going into it, but that is the
risk I take, isn't it?

Also, I don't believe anyone is really out there doing an IP sweep
of every address, trying to connect to port 80, to find every server
they possibly can. Not only would that take forever, it would put a
big burden on their own machine and users -- and all that time
wasted to find what?

I also feel that web-robot writers share some of the responsibility:
they should write robots that do not try to go outside the published
WWW directories, and that perhaps do not even look into folders that
appear to hold test documents. Why look into a folder called "test"
unless a freely published HTML document refers to it?
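The opt-in reading of robots.txt proposed in the quoted message could be sketched along these lines. This is a minimal sketch in modern Python; the `may_index` helper and `ExampleBot` agent string are illustrative, and the caller is assumed to pass `None` when the server has no robots.txt at all:

```python
from urllib import robotparser

def may_index(robots_txt, url, agent="ExampleBot"):
    # Opt-in policy: robots_txt is the body of the site's
    # robots.txt file, or None if it could not be retrieved.
    if robots_txt is None:
        # No robots.txt: under the opt-in proposal the default
        # is privacy, so the robot must not index anything.
        return False
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # Index only what the file explicitly permits.
    return rp.can_fetch(agent, url)
```

Today's convention is the opposite: a missing robots.txt is read as blanket permission, and only an explicit Disallow rule excludes a robot.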