Re: Suggestion to help robots and sites coexist a little better

Rob Hartill (robh@imdb.com)
Wed, 17 Jul 1996 08:33:54 -0600 (MDT)


Nick Arnett wrote:
>
>This is not a practical approach. First, it's a serious uphill battle to
>get widespread acceptance of changes on servers. It can be done, of
>course, but it takes time and political action...

I can push hard to get it into Apache. It also lets CGI scripts protect
themselves. CGI is probably the most vulnerable type of URL.
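To illustrate the point, a CGI script can refuse robot requests on its own, before doing any expensive work. A minimal sketch in Python, assuming robots can be recognised by their User-Agent header (the substring list and the helper name `is_robot` are purely illustrative, not from this post):

```python
#!/usr/bin/env python3
# Hypothetical sketch (not from the original post): a CGI
# script that rejects robot requests before doing any real
# work, identifying robots by their User-Agent header.
import os
import sys

# Illustrative substrings; a real deployment would need a
# maintained list of known robot User-Agents.
ROBOT_AGENTS = ("crawler", "spider", "robot")

def is_robot(user_agent):
    """Return True if the User-Agent string looks like a robot."""
    ua = user_agent.lower()
    return any(tag in ua for tag in ROBOT_AGENTS)

if is_robot(os.environ.get("HTTP_USER_AGENT", "")):
    # Reject up front, before the expensive CGI work happens.
    sys.stdout.write("Status: 403 Forbidden\r\n"
                     "Content-Type: text/plain\r\n\r\n"
                     "Robots are not welcome here.\n")
    sys.exit(0)
```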

>Second, this doesn't solve one of the really big problems, which is that
>many people who publish information on the Web don't have any kind of
>administrative access to the server. They can't modify robots.txt, they
>can't set any server parameters... nor should they be able to.

Servers can implement robot protection with more than a single top-level
robots.txt. e.g. a server could honour robots.txt (or something
equivalent) in individual directories, giving more people write access.
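For reference, the current convention is one robots.txt at the server root; a per-directory equivalent would presumably reuse the same syntax. The paths below are illustrative, not taken from any real site:

```
# robots.txt -- today a single file at the server root;
# a per-directory variant could use the same syntax
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```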

The fact that it won't be useful to everyone shouldn't be used as an
excuse to reject the idea.

>This is why Fuzzy Mauldin led an effort at the W3C workshop on distributed
>search to come up with robot instructions that could be encoded in
>individual pages. This way, even if a thousand people are publishing on a
>single server, each one has the ability to tell robots what to do with
>regard to their pages.

This is useful, but it doesn't *protect* people from robots. By the
time my CGI script has output HTML containing a "don't index me"
message, it's too late: I've already been hit with an unwanted CGI request.
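The per-page instructions in question are presumably along the lines of a robots META tag embedded in each document's head, which any author can add without server access (the exact attribute values here are illustrative):

```
<head>
<title>Example page</title>
<!-- per-page robot instruction: don't index this page,
     don't follow its links -->
<meta name="robots" content="noindex,nofollow">
</head>
```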

>If the servers want to leverage that info somehow, that'll come, but any
>proposal like this has to take into account the limited privileges of Web
>publishers.

Let's have both. There's no reason to have one or the other when both
are useful.

-- 
Rob Hartill (robh@imdb.com)
The Internet Movie Database (IMDb)  http://www.imdb.com/
           ...more movie info than you can poke a stick at.