Re: Broadness of Robots.txt (Re: Washington again !!!)

Erik Selberg (selberg@cs.washington.edu)
20 Nov 1996 23:50:25 -0800


> But you don't need to add a category. If the internal robot can be given
> a unique name, you can add it to robots.txt with more lax rules:
>
> User-agent: internalcrawler
> Disallow: /tmp
> Disallow: /cgi-bin
>
> User-agent: netnanny
> Disallow: /tmp
> Disallow: /Internal
> Disallow: /cgi-bin

So what happens when I get a second internalcrawler? The problem
inherent in robots.txt is administration --- it needs to be easy!
Leaving aside the sysadmin-only problem, requiring the sysadmin to keep
a detailed inventory of all the crawlers and all their behaviors is
unreasonable, and it will force new crawlers to spoof other crawlers
that are similar. (Does MS IE spoofing Netscape ring any alarm bells
for folks?) A behavior category is more abstract and allows for better
administration than faking it with lots of user-agents.
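
To make that concrete, here's a hypothetical sketch (the name
internalcrawler2 is made up) of what the per-agent approach turns into
once a second internal crawler shows up --- the sysadmin has to copy
the whole record and keep both in sync by hand:

	User-agent: internalcrawler
	Disallow: /tmp
	Disallow: /cgi-bin

	# second internal crawler: same rules, duplicated manually
	User-agent: internalcrawler2
	Disallow: /tmp
	Disallow: /cgi-bin

Every new crawler means another manual edit to robots.txt, which is
exactly the administration burden a behavior category would avoid.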

-- 
				Erik Selberg
"I get by with a little help	selberg@cs.washington.edu
 from my friends."		http://www.cs.washington.edu/homes/selberg
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html