Agent Categories (was Re: Broadness of Robots.txt (Re: Washington again !!!))

Martijn Koster (m.koster@webcrawler.com)
Thu, 21 Nov 1996 12:54:09 -0800


At 9:43 AM 11/21/96, Erik Selberg wrote:

>Hmm... rather than using a User-Agent, perhaps using a User-Type
>similar to Content-type (someone else made a similar suggestion, now
>that I think about it). For example:
>
>User-Type: robot/webcrawler # WebCrawler's robot
>User-Type: robot/scooter # DEC's robot
>User-Type: watcher/metacrawler # MetaCrawler
>
>Or is this too much headache in terms of trying to add yet another
>MIME type to the HTTP standard?

Well, there might actually be a benefit to that beyond /robots.txt.
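
For illustration only (none of this syntax is settled), a request
carrying such a header might look something like:

    GET /index.html HTTP/1.0
    User-Agent: WebCrawler/3.0
    User-Type: robot/webcrawler

A server could then key access decisions off the category instead of
keeping a list of individual agent names.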

Maybe the categories for the robots database can be a starting point:

purpose = indexing | maintenance | statistics
type = standalone | browser | plugin

Maybe indexing should be "indexing-public | indexing-private | indexing-live"
Maybe maintenance should be "link-validator | content-refresher |
htmlcheck" etc.

I have some severe doubts about being able to come up with reasonable
categories, though (as you can see above). Take "watcher"; what kind of
watching is that? Should we also have
"watcher that doesn't overload my server",
"watcher that doesn't strip adverts",
"watcher that has paid for this month's access",
"watcher that runs >24 hours", etc.?

We are arguing about the definition of a robot every other day;
now we can start arguing about the definitions for 20 categories :-)

-- Martijn

Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html