RE: Broadness of Robots.txt (Re: Washington again !!!)

Martin.Soukup (martin.soukup@fulcrum.com)
Fri, 29 Nov 1996 16:54:56 -0500


>----------
>From: Erik Selberg[SMTP:selberg@cs.washington.edu]
>Sent: Thursday, November 21, 1996 2:50 AM
>To: Captain Napalm
>Cc: robots@webcrawler.com
>Subject: Re: Broadness of Robots.txt (Re: Washington again !!!)
>
>> But you don't need to add a category. If the internal robot can be given
>> a unique name, you can add it to robots.txt with more lax rules:
>>
>> User-agent: internalcrawler
>> Disallow: /tmp
>> Disallow: /cgi-bin
>>
>> User-agent: netnanny
>> Disallow: /tmp
>> Disallow: /Internal
>> Disallow: /cgi-bin
>
>so what happens when I get a second internalcrawler? the problem
>inherent w/ robots.txt is the administration --- it needs to be easy!
>aside from the sysadmin only problem, requiring the sysadmin to keep a
>detailed inventory of all the crawlers and all their behaviors is
>unreasonable, and it will force new crawlers to spoof other crawlers
>that are similar. (MS IE spoofing Netscape ring any alarm bells for
>folks?). The behavior category is more abstract and allows for better
>administration than faking it with lots of user-agents.
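
For context (and this is only a sketch): a robot's side of the
standard is usually just a case-insensitive substring match of its
own name against the User-agent lines, falling back to the "*"
record, which is exactly why every new crawler either needs its own
record in robots.txt or ends up borrowing someone else's name.
Roughly, in Python, with all the names below invented for
illustration:

# Sketch only: pick the Disallow list that applies to this robot,
# using the substring matching the 1994 draft recommends.
def applicable_rules(robots_txt, robot_name):
    records, agents, disallows = [], [], []
    for raw in robots_txt.splitlines():
        line = raw.split('#', 1)[0].strip()
        if not line:                        # blank line ends a record
            if agents:
                records.append((agents, disallows))
            agents, disallows = [], []
            continue
        field, _, value = line.partition(':')
        field, value = field.strip().lower(), value.strip()
        if field == 'user-agent':
            agents.append(value.lower())
        elif field == 'disallow':
            disallows.append(value)
    if agents:
        records.append((agents, disallows))

    name = robot_name.lower()
    default = []
    for agents, disallows in records:
        for agent in agents:
            if agent == '*':
                default = disallows         # remember the fallback record
            elif agent in name:             # "internalcrawler" in "internalcrawler/2.0"
                return disallows
    return default

# e.g. applicable_rules(open("robots.txt").read(), "internalcrawler/2.0")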

Wouldn't the whole idea be much simpler, and far easier to use (and
with that, used by more people), if a generation/admin program were
included as part of the standard, or alongside it? If one or more
programmers on this list maintained an admin package, the standard
could be very detailed and even carry a fair degree of complexity
without turning sysadmins off. Just an idea, but the way I see it,
robot writers (and even agent writers) could really use a good RES;
the more we try to add to it, though, the less likely admins are to
use it. It also worries me that Microsoft doesn't seem to be wholly
on board. Either way, if anyone is interested in working on an admin
package, or getting one, I'll whip up the basis of one in a weekend.
Thoughts?
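
To make it concrete, the kind of thing I have in mind is roughly
this: keep the per-crawler policies in one small table and
regenerate robots.txt from it, so the admin never edits the file by
hand. A Python sketch, with all names and categories invented for
illustration:

# Sketch of an "admin package" core: two tables in, robots.txt out.
CATEGORIES = {
    # abstract behaviour class -> paths that class may not touch
    "internal": ["/tmp", "/cgi-bin"],
    "external": ["/tmp", "/Internal", "/cgi-bin"],
}

CRAWLERS = {
    # user-agent name -> behaviour class; "*" is the default record
    "internalcrawler": "internal",
    "netnanny":        "external",
    "*":               "external",
}

def generate_robots_txt(crawlers=CRAWLERS, categories=CATEGORIES):
    lines = []
    for agent, category in crawlers.items():
        lines.append("User-agent: " + agent)
        for path in categories[category]:
            lines.append("Disallow: " + path)
        lines.append("")                  # blank line closes the record
    return "\n".join(lines)

if __name__ == "__main__":
    print(generate_robots_txt())

Adding a second internal crawler would be one line in the CRAWLERS
table, and the package would rewrite /robots.txt; a real package
would also validate paths and install the file, but that's the shape
of it.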

Martin
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html