Re: Limiting robots to top-level page only (via robots.txt)?

Jaakko Hyvatti (Jaakko.Hyvatti@www.fi)
Tue, 26 Mar 1996 12:02:40 +0200 (EET)


doucette@tinkerbell.macsyma.com (Chuck Doucette):
> So, if indeed I wanted to prevent a robot from indexing any page other
> than the default one for the top-level (http://www.macsyma.com/), how
> could I do that?

A Disallow: line in robots.txt excludes every URL path that starts
with the specified string. If you want to allow / but disallow every
longer path (/..* as a regular expression), you can list all your
top-level files and directories:

User-agent: *
Disallow: /xyz.html
Disallow: /sales
...

You have to update that list every time you add new files or
directories. Alternatively, you could disallow files by their first
letter and list every letter a filename can start with, so that
robots.txt rarely needs maintenance:

User-agent: *
Disallow: /a
Disallow: /b
Disallow: /c
Disallow: /d
...
Disallow: /z
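
To check that first-letter rules like these behave as intended, one can
feed them to a parser before publishing them. The following is only a
minimal sketch in Python: it assumes the standard library's
urllib.robotparser as a stand-in for a real robot (actual robots may
differ), and the robot name "ExampleBot", the helper build_rules, and
the test path /Sales are made up for illustration.

```python
import string
from urllib.robotparser import RobotFileParser


def build_rules(first_chars):
    """Return robots.txt lines that disallow every path starting with
    '/' followed by one of the given characters (hypothetical helper)."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: /{c}" for c in first_chars]
    return lines


# Assumption: every top-level name starts with a lowercase ASCII letter.
rules = build_rules(string.ascii_lowercase)

parser = RobotFileParser()
parser.parse(rules)  # parse() accepts an iterable of robots.txt lines

BASE = "http://www.macsyma.com"

# The bare top-level page matches no Disallow prefix, so it stays allowed.
top_allowed = parser.can_fetch("ExampleBot", BASE + "/")

# Anything under /sales starts with "/s" and is therefore excluded.
sales_allowed = parser.can_fetch("ExampleBot", BASE + "/sales/index.html")

# Prefix matching is case-sensitive here: "/S" is not covered by "/s".
upper_allowed = parser.can_fetch("ExampleBot", BASE + "/Sales")

print(top_allowed, sales_allowed, upper_allowed)
```

With this parser, the last check shows why the list has to cover every
possible starting character: names beginning with an uppercase letter,
a digit, or punctuation need their own Disallow: lines, which could be
generated by passing a larger character set to build_rules.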