Re: Suggestion to help robots and sites coexist a little better

Scott 'Webster' Wood (swood@thewild.com)
Wed, 17 Jul 1996 17:39:14 -0400 (EDT)


> >Nick Arnett wrote:
> >Servers can implement robot protection with more than robots.txt, e.g.
> >a server could allow robots.txt (or something equivalent) in various
> >directories (allowing more people write access).
> >
> >The fact that it won't be useful to everyone shouldn't be used as an
> >excuse to reject the idea.
>
> True, but the fact that a second solution would be required *is* a valid
> reason. Getting the market to adopt any change is difficult; the simpler
> it is, the more chance that it'll be adopted and sooner.

Again, let me comment. Presumably, the administrator is going to be
the one worried about the robots, and presumably the administrator
will be willing to do at least something reasonably dynamic to keep
control over them. The existing robots.txt can either forbid all
robots from everything (simple) or be more elaborate, with more
dynamic and specific restrictions (i.e., any solution can be designed
for both).
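
To make the two cases concrete, here is a rough sketch, assuming Python
and its standard urllib.robotparser module. The robot name, paths, and
host below are made up for illustration; only the User-agent/Disallow
syntax comes from the existing robots.txt convention.

```python
from urllib.robotparser import RobotFileParser

# Simple case: forbid all robots from everything.
FORBID_ALL = """\
User-agent: *
Disallow: /
"""

# More elaborate case: one robot is shut out entirely, everyone else
# is only kept out of two directories.
# "examplebot" and the paths are hypothetical.
ELABORATE = """\
User-agent: examplebot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/index.html"
    print(allowed(FORBID_ALL, "anybot", page))
    print(allowed(ELABORATE, "examplebot", page))
    print(allowed(ELABORATE, "otherbot", page))
```

The same checking code serves both the blanket file and the detailed
one; only the file the administrator writes differs.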

If separate instruction files were used, additional instructions
could be added to let the robot know whether it should look for
robots.txt files in deeper directories. It might make the robot
writer's job harder, but it would allow users more control while still
allowing the administrator of the site to prevent robots from
endlessly looking for robots.txt files in directories where they do
not exist. Simpler yet, the root of the server could include
information telling the robot specifically where it may or may not
find additional instructions, giving the administrator complete
control (e.g., look for additional instructions under any ~username/).
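
A minimal sketch of that last idea, in Python. Nothing like this exists
in robots.txt today: the "Sub-robots" field name, its pattern syntax,
and both function names are assumptions invented for this example. The
root file lists where additional robots.txt files may live, and the
robot asks for one only when the path it wants falls under a listed
pattern.

```python
from fnmatch import fnmatchcase

# Root robots.txt with a hypothetical "Sub-robots" field. The field
# name and the "/~*/" pattern syntax are made up for illustration.
ROOT_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
Sub-robots: /~*/
"""


def sub_robots_patterns(root_txt):
    """Collect the directory patterns named by Sub-robots lines."""
    patterns = []
    for line in root_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "sub-robots" and value.strip():
            patterns.append(value.strip())
    return patterns


def extra_robots_path(path, patterns):
    """Return the additional robots.txt to fetch for `path`, or None.

    Only the top-level directory of `path` is considered, so a robot
    never probes directories the administrator did not list.
    """
    parts = path.split("/")
    if len(parts) < 3 or not parts[1]:
        return None  # resource sits directly in the server root
    top = "/" + parts[1] + "/"
    if any(fnmatchcase(top, pattern) for pattern in patterns):
        return top + "robots.txt"
    return None


if __name__ == "__main__":
    patterns = sub_robots_patterns(ROOT_ROBOTS)
    print(extra_robots_path("/~swood/pages/index.html", patterns))
    print(extra_robots_path("/docs/manual.html", patterns))
```

With the root file above, a robot would look for /~swood/robots.txt
before fetching anything under /~swood/, and would not go hunting for
a robots.txt under /docs/ at all.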

Hairy, but effective...

Personally, I would like to see support for a single file that would
not only include simple instructions/restrictions for robots, but
could also be much more complex and include actual index information,
so that the robot need not retrieve any resources other than that
file. It would also give the admin control over what is indexed and
how it would read. Adding a META key at the beginning of a 100K+ file
may make the robot happy and give me control over keywords and
descriptions, but it still means that the server is going to send out
100K+ to a robot that does not know better.
(Perhaps someone can fill me in on whether robots can simply request
the HEAD; I have not yet started coding my internal test robot.)
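
To show what I have in mind for the single file, here is a sketch in
Python. The layout (URL/Keywords/Description records separated by blank
lines), the sample entries, and the parse_site_index name are all
assumptions made up for this example, not an existing format. The point
is only that a robot could build its index entries from this one file
without fetching the documents themselves.

```python
# A hypothetical single-file site index. The field names and the
# sample entries are invented for illustration.
SITE_INDEX = """\
URL: /reports/annual.html
Keywords: annual report, finances
Description: Yearly summary of site activity.

URL: /projects/robot-notes.html
Keywords: robots, indexing
Description: Notes on an internal test robot.
"""


def parse_site_index(text):
    """Parse blank-line-separated records into a list of dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
                current = {}
            continue
        field, _, value = line.partition(":")
        current[field.strip().lower()] = value.strip()
    if current:
        records.append(current)
    # Split the comma-separated keyword list into individual keywords.
    for record in records:
        raw = record.get("keywords", "")
        record["keywords"] = [k.strip() for k in raw.split(",") if k.strip()]
    return records


if __name__ == "__main__":
    for record in parse_site_index(SITE_INDEX):
        print(record["url"], record["keywords"], record["description"])
```

The admin decides which resources appear and how each one is
described, and the robot transfers a few hundred bytes per entry
instead of the full document.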

Scott