Something must be said about these limits.
- Some minimum which one should expect a robot to handle.  (I would
  say "MUST handle", but of course robots do as they please.)
- What should a robot do when it reaches a limit?  Assume Disallow by
  default, or Allow, or somehow make it depend on the record for the
  user-agent in question, or fall back to the User-Agent: * record (if
  one was found before the limit), or...?
  A related point: it might be useful to let robots tell web sites
  that they did not like their robots.txt.  E.g. a line like
	Errors-To: /cgi-bin/robot-message
  early in a bad http://www.uio.no/robots.txt might cause the robot to
  access
	http://www.uio.no/cgi-bin/robot-message ?
		error=Too+big+regexp &
		robots-txt-line=32 &
		URL=/failed/on/this/url
  where anything after the '?' is up to the robot, not specified by the
  RFC.  (It would have to be read by a human anyway, so little would be
  gained by mandating a specific format.)  A rough sketch of the robot
  side follows below.
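For concreteness, here is a rough sketch (in Python, purely illustrative)
of what the robot side might look like.  The Errors-To path and the
parameter names (error, robots-txt-line, URL) are just the example from
this message, not anything specified anywhere.

from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

def report_robots_txt_error(site, errors_to_path, message, line_no, failed_url):
    """Best-effort report of a robots.txt problem to the site's Errors-To URL."""
    # Everything after the '?' is this robot's own choice of format.
    query = urlencode({
        "error": message,            # e.g. "Too big regexp"
        "robots-txt-line": line_no,  # offending line in robots.txt
        "URL": failed_url,           # URL the robot could not evaluate
    }, safe="/")
    report_url = urljoin(site, errors_to_path) + "?" + query
    try:
        urlopen(report_url, timeout=10).close()
    except OSError:
        pass  # never let error reporting interfere with the crawl itself

# Mirroring the example above:
# report_robots_txt_error("http://www.uio.no/", "/cgi-bin/robot-message",
#                         "Too big regexp", 32, "/failed/on/this/url")

Either way, the report would have to be best-effort: if the Errors-To URL
itself fails, the robot should just drop the report.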
Regards,
Hallvard