Re: Broadness of Robots.txt (Re: Washington again !!!)

Martijn Koster (m.koster@webcrawler.com)
Thu, 21 Nov 1996 12:53:59 -0800


At 6:50 PM 11/20/96, Captain Napalm wrote:

> But you don't need to add a category. If the internal robot can be given
>a unique name, you can add it to robots.txt with more lax rules:

Yup.

You could even play tricks by defining your categories as User-agents,
and have robots look for each in turn, like I mentioned before:

# we trust the link checker LCPro
User-agent: lcpro
Allow: /

# all link checkers are bogus
User-agent: alinkchecker
Disallow: /

Then a robot could search for its own name, then its categories,
then '*'.

Messy, but hey :-)
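
A minimal sketch in Python of that lookup order, assuming the records
have already been parsed into a table keyed by User-agent token; the
robot name, category list, and find_record helper are all made up here:

def find_record(records, name, categories):
    """Return the first matching record: the robot's own name,
    then each of its categories in turn, then the '*' wildcard;
    None if nothing matches."""
    for token in [name] + categories + ['*']:
        if token in records:
            return records[token]
    return None

# Records as parsed from the robots.txt example above.
records = {
    'lcpro': {'allow': ['/']},
    'alinkchecker': {'disallow': ['/']},
}

# LCPro matches its own name and is allowed everywhere.
print(find_record(records, 'lcpro', ['alinkchecker']))
# An unnamed link checker falls through to its category record.
print(find_record(records, 'checklinks', ['alinkchecker']))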

> But, this does raise an issue: What about areas that are common to all
>robots (or rules)? It might (keyword, might) be a good idea to have a set
>of "global rules" that all robots follow, in addition to any specific rules.

You can either put that complexity in the spec and all robots, or leave it
up to the servers that want it to do that in a pre-processing step, and
put the expanded output in /robots.txt.
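
A sketch of what that pre-processing step could look like, in Python;
the GLOBAL_RULES and SPECIFIC tables are invented for illustration, and
in practice you would redirect the output into /robots.txt:

# Append the same "global rules" to every robot-specific record,
# so the published robots.txt is already fully expanded and robots
# need no extra merging logic.

GLOBAL_RULES = [
    'Disallow: /cgi-bin/',
    'Disallow: /tmp/',
]

SPECIFIC = {
    'lcpro': ['Allow: /'],
    '*': ['Disallow: /private/'],
}

for agent, rules in SPECIFIC.items():
    print('User-agent: %s' % agent)
    for rule in rules + GLOBAL_RULES:
        print(rule)
    print()   # blank line separates records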

> Then again, how many different rule sets does a typical robots.txt file
>have? Also, do specific rules for a robot override the "global rules"?
>Maybe not ...

Agreed :-)

-- Martijn

Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html