Re: robots.txt, authors of robots, webmasters ....

Benjamin Franz (snowhare@netimages.com)
Thu, 18 Jan 1996 09:03:56 -0800 (PST)


On Thu, 18 Jan 1996, Robert Raisch, The Internet Company wrote:

>
> Perhaps what is really needed is a reevaluation of the role of
> the robots.txt file. If we take the stance, as I believe we
> should, that the decision to be indexed belongs in the hands of
> the owner of the data, not in the mechanical claws of wild
> roving robots, the robots.txt file should become a source of
> permission not exclusion from indexing. And most importantly,
> that the expectation should be one of privacy, not exposure.
>
> In other words, we should not index a web-site if there is no
> robots.txt file to be retrieved that gives explicit permission
> to do so.

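For reference: under the 1994 robots exclusion format the only fields are
User-agent and Disallow, so the 'explicit permission' file proposed above
would presumably look something like this (an empty Disallow value grants
access to everything):

    User-agent: *
    Disallow:
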
If you will review recent messages here, you will discover that only about
5% of sites *have* a robots.txt file. This means that following the
prescription 'don't index unless there is a robots.txt file' would result
in about one site in twenty being indexed *at best*. Because the
probability of a site with a robots.txt file linking to *another* site
with a robots.txt file is so low, the reality would be orders of magnitude
worse than that: if only one link in twenty leads from one opt-in site to
another, a robot's crawl frontier dries up almost as soon as it starts. A
robot would have to be exceptionally lucky to find even a few hundred
sites that way.
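
To make the arithmetic concrete, here is a rough sketch in Python of a
permission-only crawl over a random link graph. The numbers (10,000 sites,
5% adoption, 10 outbound links per site) are illustrative assumptions, not
measurements of the real web:

    import random

    SITES = 10000        # hypothetical web of 10,000 sites
    ADOPTION = 0.05      # ~5% of sites publish a robots.txt
    OUT_LINKS = 10       # distinct outbound links per site (assumed)

    random.seed(1)
    has_robots = [random.random() < ADOPTION for _ in range(SITES)]
    links = [random.sample(range(SITES), OUT_LINKS) for _ in range(SITES)]

    # Crawl that may only enter sites which have a robots.txt file.
    seed = next(i for i in range(SITES) if has_robots[i])
    seen, frontier = {seed}, [seed]
    while frontier:
        site = frontier.pop()
        for target in links[site]:
            if has_robots[target] and target not in seen:
                seen.add(target)
                frontier.append(target)

    print("opt-in sites reached:", len(seen), "of", SITES)

With those numbers each opt-in site links to only 0.5 other opt-in sites
on average, so the crawl dies out after a few hops no matter where it is
seeded.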

In other words - it completely destroys the usefulness of robots for
resource discovery.

It is, and must be, the responsibility of each site to provide its own
document security. If you don't want your pages indexed - add access
control or *don't put them on the web*. It is *trivial* on most servers to
block directory trees from remote access. You could even specifically
target the search engines for blocking.
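
For example - a minimal sketch of such blocking, assuming an NCSA-style
server or Apache with per-directory .htaccess files enabled (the hostname
is a placeholder; adapt the directives to your own httpd):

    # .htaccess placed at the top of the tree you want to protect
    <Limit GET>
    order deny,allow
    deny from all
    allow from .internal.example.com
    </Limit>

To target a particular robot instead, you can deny from the host it crawls
from, or list it by the name it announces in a robots.txt entry
('ExampleRobot' here is a placeholder):

    User-agent: ExampleRobot
    Disallow: /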

If you don't want people reading your material - don't leave it on the
table in the reading room of the library (which is what you are doing
when you place documents on the WWW with no access control).

-- 
Benjamin Franz