Re: Broadness of Robots.txt (Re: Washington again !!!)

Erik Selberg (selberg@cs.washington.edu)
21 Nov 1996 15:30:14 -0800


Rob Hartill <robh@imdb.com> writes:

> Erik Selberg wrote:
>
> >Hmm... rather than using a User-Agent, perhaps using a User-Type
> >similar to Content-type (someone else made a similar suggestion, now
> >that I think about it). For example:
> >
> >User-Type: robot/webcrawler # WebCrawler's robot
> >User-Type: robot/scooter # DEC's robot
> >User-Type: watcher/metacrawler # MetaCrawler
>
> Some more new robot headers to consider:
>
> Robot-confidence: 45         # owner's 45% confident that his code won't
>                              # explode. This could be computed on the basis of
>                              # the ratio of sites visited to sites trashed.
> Days-since-last-accident: 2  # when the robot last screwed up.
> Responses-not-understood: 301 302 40*  # response codes that are a foreign
>                              # language to the robot and not
>                              # worth sending unless you want to
>                              # see how fast the robot can connect.
> Janitor-phonenumber: 555 1234 567  # Just in case the power cord needs
>                              # yanking from the wall.
> Standard-excuses: 1,5,9      # hashes to look up text excuses in an RFC; this
>                              # saves the server admin contacting the site for
>                              # an explanation and saves the robot owner
>                              # responding. A big bandwidth saver.
>

Great! Terrific! But I don't think I could support adding them unless we
also tossed in:

Robot-sneeze-and-destroy: 5  # delay between when you post to the robots list
                             # mentioning it and when the robot comes
                             # to your site.

Robot-bug-delay: 3 # delay between when you find and report
# an errant robot and when the problem is
# fixed.

Robot-bs-or-not: 30% # Confidence that the bug has actually
# been fixed since last reported bug,
# and confidence that another bug
# hasn't crept in (although maybe that
# should be a separate header?)
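
On the serious side: if the User-Type idea quoted above were ever adopted (it
isn't part of any spec today, so take this as a sketch with made-up values), a
request from MetaCrawler might look roughly like:

    GET /some/page.html HTTP/1.0
    User-Agent: MetaCrawler/1.0
    User-Type: watcher/metacrawler
    From: selberg@cs.washington.edu

The point being that a server could key its access rules off the broad type
(robot vs. watcher) instead of pattern-matching every User-Agent string, and
servers that don't care can simply ignore the unknown header.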

-Erik

-- 
				Erik Selberg
"I get by with a little help	selberg@cs.washington.edu
 from my friends."		http://www.cs.washington.edu/homes/selberg
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html