Re: Broadness of Robots.txt (Re: Washington again !!!)

John D. Pritchard (jdp@cs.columbia.edu)
Thu, 21 Nov 1996 17:32:28 -0500


> >I totally agree with these statements. I would suggest a slightly
> >different implementation. The standard should include a list of general
> >behavior classifications and assign a fictitious "User-Agent" name to each
> >class.
>
> Whoops, you beat me to it :-)

how about using feature negotiation to rationalize a robot class namespace?

-*-*-*-*-*-*-*-*-*-*-*-*-

draft-holtman-http-negotiation-03.txt
(Transparent Content Negotiation in HTTP)

ABSTRACT

HTTP allows one to put multiple versions of the same
information under a single URL. Transparent content
negotiation is a mechanism, layered on top of HTTP, for
automatically selecting the best version when the URL is
accessed. This enables the smooth deployment of new web data
formats and markup tags.

Design goals for transparent content negotiation include: low
overhead on the request message size, downwards compatibility,
extensibility, enabling the rapid introduction of new areas of
negotiation, scalability, low cost for minimal support, end
user control, and good cacheability.

-*-*-*-*-*-*-*-*-

an HTTP request header

robots-class-list: "name and version"

could let the server dereference the version of /robots.txt appropriate to
the named robot class namespace.
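e.g., a minimal sketch of such a request in python (the header name, the
class name "searcher/1.0", and the host are assumptions made up for
illustration, not part of any spec):

# sketch: a robot fetching /robots.txt with the proposed (hypothetical)
# robots-class-list request header alongside the usual user-agent.
import http.client

conn = http.client.HTTPConnection("www.example.com")
conn.request("GET", "/robots.txt", headers={
    "User-Agent": "ExampleBot/1.0",
    "Robots-Class-List": "searcher/1.0",  # hypothetical class name + version
})
resp = conn.getresponse()
print(resp.status, resp.getheader("Content-Type"))
print(resp.read().decode("latin-1"))  # the variant served for this class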

benefit for robot builders: multiple robot class lists could coexist under
this class-list namespace.

implementors would choose from various class lists to fit their needs.

one class list could be extensive, another simple; one aimed at search
engines, another at previewers.

benefit to the server: a request for /robots.txt reveals a
robots-class-list as well as a user-agent, yielding more valuable hit
data. running robots would be identified by both their user-agent and
their robots-class-list.
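a minimal sketch of the server side under the same assumptions (the class
names and variant filenames are invented for illustration; a real server
would hook this into its request handling and access log):

# sketch: select a /robots.txt variant from the (hypothetical)
# robots-class-list header and log both identifiers as hit data.
def serve_robots_txt(headers, log):
    variants = {
        "searcher/1.0":  "robots-searcher.txt",   # extensive class list
        "previewer/1.0": "robots-previewer.txt",  # simple class list
    }
    class_list = headers.get("robots-class-list", "")
    user_agent = headers.get("user-agent", "-")
    log.write("%s %s\n" % (user_agent, class_list or "-"))
    # fall back to the plain robots.txt when no class list is offered
    with open(variants.get(class_list, "robots.txt")) as f:
        return f.read()

with something like this in place, the access log carries both fields,
which is the hit-data benefit described above.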

-john

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html