www.kollar.com/robots.html

John D. Pritchard (jdp@cs.columbia.edu)
Thu, 06 Jun 1996 16:13:50 -0400


hi,

i would like robots.txt to provide an "Allow" attribute, specifying which
pages a robot should go to. As sites become increasingly dynamic, it will
be more productive for a webhacker to specify which pages are static or
should be indexed than to enumerate which are not static or should not be
indexed.
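
A minimal sketch of how a robot might read such an allow-list, assuming a
hypothetical "Allow" field that takes a path prefix in the same way
"Disallow" does. The paths, the function names (parse_allow, may_fetch) and
the rule that anything unlisted is skipped are all illustrative assumptions,
not part of any existing robots.txt convention.

```python
# Hypothetical robots.txt using the proposed "Allow" attribute.
# The paths below are made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Allow: /index.html
Allow: /docs/
"""


def parse_allow(text, agent="*"):
    """Return the Allow path prefixes from records that apply to `agent`."""
    allowed, active = [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Simplification: each User-agent line starts a new record.
            active = value in ("*", agent)
        elif field == "allow" and active and value:
            allowed.append(value)
    return allowed


def may_fetch(path, allowed):
    """Assumed allow-list semantics: fetch only paths under a listed prefix."""
    return any(path.startswith(prefix) for prefix in allowed)


allowed = parse_allow(ROBOTS_TXT)
print(may_fetch("/docs/intro.html", allowed))  # under an allowed prefix
print(may_fetch("/cgi-bin/search", allowed))   # unlisted, so skipped
```

With this reading, the site lists its few static entry points once, and
everything dynamic is left out by default.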

Many sites will have some number of static top-level or high-level pages
which are entry points, beyond which nearly everything may be dynamic.

The need for "Allow" is obviated only when the top-most page has links to
every other entry page. There is no reason why this would generally be the
case, especially when other pages are reached via paths that are disallowed
for the robot -- but not necessarily unavailable to the robot by alternative
paths -- or are available only by very circuitous routes. Another
supporting case would be I/Code login, or other authentication intended for
readership polling rather than for security, behind which are pages whose
owners would like them indexed and which are available to robots via
alternative, unlinked paths.

Generally i am thinking about sites where fewer nodes would be indexed than
not -- and where paths are varied and not simple, as in a dynamic site.

-john