>Hmm... rather than using a User-Agent, perhaps using a User-Type
>similar to Content-type (someone else made a similar suggestion, now
>that I think about it). For example:
>
>User-Type: robot/webcrawler # WebCrawler's robot
>User-Type: robot/scooter # DEC's robot
>User-Type: watcher/metacrawler # MetaCrawler
>
>Or is this too much headache in terms of trying to add yet another
>MIME type to the HTTP standard?
Well, there might actually be benefit to that beyond /robots.txt.
Maybe the categories for the robots database can be a starting point:
purpose = indexing | maintenance | statistics
type = standalone | browser | plugin
Maybe indexing should be "indexing-public | indexing-private | indexing-live"
Maybe maintenance should be "link-validator | content-refresher | htmlcheck" etc.
I have some severe doubts about being able to come up with reasonable
categories, though (as you can see above). Take "watcher"; what kind of
watching is that? Should we also have
"watcher that doesn't overload my server",
"watcher that doesn't strip adverts",
"watcher that has paid for this month's access",
"watcher that runs >24 hours" Etc.?
We are arguing about the definition of a robot every other day;
now we can start arguing about the definitions for 20 categories :-)
-- Martijn
Email: m.koster@webcrawler.com
WWW: http://info.webcrawler.com/mak/mak.html
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html