Database Format
---------------

Records
-------
Records are formatted like RFC 822 message headers: each field is a
line of the form "name: value".

Unless otherwise specified, values may not contain HTML or empty lines,
but may contain 8-bit characters.
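As an illustration, here is a minimal Python sketch of a reader for this format. Beyond what is stated above, it assumes that records in a file are separated by blank lines (safe because values may not contain empty lines) and that, as in RFC 822, a line beginning with whitespace continues the previous field's value, as in the robot-description example. The function name parse_records is hypothetical.

```python
def parse_records(text):
    """Parse RFC 822-style records into a list of dicts.

    Assumptions (not stated by the format): records are separated by
    blank lines, and a line starting with whitespace continues the
    value of the previous field. Field names are lowercased.
    """
    records = []
    current = {}
    last_name = None
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current record, if any.
            if current:
                records.append(current)
            current, last_name = {}, None
        elif line[0] in " \t":
            # Continuation line: join it to the previous field's value.
            if last_name is None:
                raise ValueError(f"continuation without a field: {line!r}")
            previous = current[last_name]
            piece = line.strip()
            current[last_name] = f"{previous} {piece}" if previous else piece
        else:
            # "name: value" line; split on the first colon only, so URLs survive.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"expected 'name: value', got {line!r}")
            last_name = name.strip().lower()
            current[last_name] = value.strip()
    if current:
        records.append(current)
    return records
```

Splitting on the first colon only matters because many values (every *-url field) contain colons themselves.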

Where a value contains "one or more" tokens, they are
to be separated by a comma followed by a space.
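A small Python sketch of splitting such a value; split_tokens is a hypothetical helper. It splits on commas and strips surrounding spaces, so it also tolerates a missing space after a comma. It should only be applied to fields documented as "one or more", since other values (for example modified-date) legitimately contain commas.

```python
def split_tokens(value):
    """Split a "one or more" field value into its tokens.

    Tokens are meant to be separated by ", "; stripping each piece
    also accepts a bare "," and drops empty pieces.
    """
    return [token.strip() for token in value.split(",") if token.strip()]
```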

Fields can be repeated, and related fields grouped, by appending a
sequence number (1, 2, and up) to the field name, for example:
  robot-owner-name1: Mr A. RobotAuthor
  robot-owner-url1: http://webrobot.com/~a/a.html
  robot-owner-name2: Mr B. RobotCoAuthor
  robot-owner-url2: http://webrobot.com/~b/b.html
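One possible way to collect numbered fields into groups, sketched in Python under the assumption that a record is already available as a dict mapping field names to values. A trailing run of digits on a field name is taken as the group number; the helper name group_numbered_fields is hypothetical.

```python
import re
from collections import defaultdict

# A field name ending in a group number, e.g. "robot-owner-name2".
NUMBERED_FIELD = re.compile(r"^(?P<base>.*?[^0-9])(?P<num>[0-9]+)$")

def group_numbered_fields(record):
    """Return {number: {base_field_name: value}}; unnumbered fields are skipped."""
    groups = defaultdict(dict)
    for name, value in record.items():
        match = NUMBERED_FIELD.match(name)
        if match:
            groups[int(match.group("num"))][match.group("base")] = value
    return dict(groups)
```

Grouping by number keeps each owner's name and URL together, which a flat dict of repeated "robot-owner-name" fields could not do.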


Fields Schema
-------------
robot-id:
  Short name for the robot,
  used internally as a unique reference.
  Should match [a-z-_]+, i.e. only lowercase letters, hyphens
  and underscores.
  Example: webcrawler
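A sketch of checking robot-id values in Python. The pattern below is [a-z-_]+ rewritten as [a-z_-]+, which accepts the same characters but puts the hyphen last so it cannot be misread as part of a range. Since the rule is a "should", a tool might warn rather than reject; the helper names are hypothetical.

```python
import re
from collections import Counter

# Same character set as [a-z-_]+: lowercase ASCII letters, '-' and '_'.
# The hyphen is placed last so it is unambiguously a literal character.
ROBOT_ID_PATTERN = re.compile(r"[a-z_-]+")

def is_valid_robot_id(robot_id):
    """Return True if the whole string uses only the recommended characters."""
    return ROBOT_ID_PATTERN.fullmatch(robot_id) is not None

def duplicate_robot_ids(robot_ids):
    """Return, sorted, the ids that appear more than once (ids must be unique)."""
    counts = Counter(robot_ids)
    return sorted(rid for rid, n in counts.items() if n > 1)
```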

robot-name:
  Full name of the robot,
  for presentation purposes.
  Example: WebCrawler

robot-details-url:
  URL of the robot home page,
  containing further technical details on the robot, 
  background information etc.
  Example: http://webcrawler.com/WebCrawler/Facts/HowItWorks.html

robot-cover-url:
  URL of the robot product,
  containing marketing details about either the robot,
  or the service to which the robot is related.
  Example: http://webcrawler.com/

robot-owner-name:
  Name of the owner. For service robots this is the person
  running the robot, who can be contacted in case of specific
  problems.
  In the case of robot products this is the person
  maintaining the product, who can be contacted if the
  robot has bugs.
  Example: Brian Pinkerton
                      
robot-owner-url:
  Home page of the owner named in robot-owner-name.
  Example: http://info.webcrawler.com/bp/bp.html

robot-owner-email:
  Email address of the owner.
  Example: np@webcrawler.com
  
robot-status:
  Deployment status of the robot. One of:
  - development: robot under development
  - active: robot actively in use
  - retired: robot no longer used
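Because robot-status takes exactly one value, a checker can reject anything not in the list. A minimal Python sketch, assuming values are matched exactly and case-sensitively (the format does not say); check_robot_status is a hypothetical name.

```python
# Allowed values for robot-status, taken from the list above.
ROBOT_STATUSES = ("development", "active", "retired")

def check_robot_status(value):
    """Return the status if it is exactly one allowed value, else raise ValueError."""
    status = value.strip()
    if status not in ROBOT_STATUSES:
        raise ValueError(f"robot-status must be one of {ROBOT_STATUSES}, got {value!r}")
    return status
```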

robot-purpose:
  Purpose of the robot. One or more of:
  - indexing: gather content for an indexing service
  - maintenance: link validation, html validation etc.
  - statistics: used to gather statistics
  Further details can be given in robot-description.

robot-type:
  Type of robot software. One or more of:
  - standalone: a separate program
  - browser: built into a browser
  - plugin: a plugin for a browser
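robot-purpose and robot-type instead take one or more values from a fixed list. A Python sketch of validating them, assuming the ", " separator described under Records and exact, case-sensitive matching; check_multi_valued and the ALLOWED table are hypothetical.

```python
# Hypothetical table of permitted tokens, copied from the lists above.
ALLOWED = {
    "robot-purpose": {"indexing", "maintenance", "statistics"},
    "robot-type": {"standalone", "browser", "plugin"},
}

def check_multi_valued(field, value):
    """Split a "one or more" value and reject tokens outside the field's list."""
    tokens = [t.strip() for t in value.split(",") if t.strip()]
    if not tokens:
        raise ValueError(f"{field} needs at least one value")
    unknown = [t for t in tokens if t not in ALLOWED[field]]
    if unknown:
        raise ValueError(f"{field}: unknown value(s) {unknown}")
    return tokens
```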

robot-platform:
  Platforms the robot runs on. One or more of:
  - unix
  - windows, windows95, windowsNT
  - os2
  - mac
  etc.

robot-availability:
  Availability of robot to general public. One or more of:
  - source: source code available
  - binary: binary form available
  - data: bulk data gathered by robot available
  - none
  Further details can be given at robot-details-url or robot-cover-url.

robot-exclusion:
  Whether the Standard for Robots Exclusion (/robots.txt) is supported:
  yes or no

robot-exclusion-useragent:
  Substring to use in /robots.txt User-agent lines to address this robot.
  Example: webcrawler
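On the crawling side, a robot whose record says robot-exclusion: yes would honour /robots.txt entries that name its robot-exclusion-useragent substring. Python's standard urllib.robotparser can illustrate this: in CPython, a User-agent line applies when its text occurs, case-insensitively, in the name the robot passes to can_fetch. The wrapper function allowed is hypothetical, and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, useragent, url):
    """Return True if robots_txt permits useragent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(useragent, url)
```

For instance, with "User-agent: webcrawler" followed by "Disallow: /private/", a robot identifying as "webcrawler" would be kept out of /private/ while other robots fall through to any "User-agent: *" entry.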

robot-noindex:
  Whether the <meta name="robots" content="noindex"> directive is supported:
  yes or no
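A sketch of how a robot claiming robot-noindex: yes might detect the directive, using Python's standard html.parser. It assumes the content attribute is a comma-separated, case-insensitive list of directives; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())

def has_noindex(html):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives
```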

robot-host:
  Hosts the robot is run from, as one or more DNS names and/or IP
  addresses; '*' may be used as a wildcard.
  If the robot is available to the general public, add '*'.
  Example: spidey.webcrawler.com, *.webcrawler.com, 192.216.46.*
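The format does not define the exact pattern syntax. One plausible reading, sketched here in Python, treats '*' as a shell-style wildcard via the standard fnmatch module (which also gives '?' and '[...]' special meaning); host_matches is a hypothetical helper, and the comparison is made case-insensitive by lowercasing both sides.

```python
from fnmatch import fnmatchcase

def host_matches(host, patterns):
    """Return True if host matches any comma-separated robot-host pattern.

    Assumption: '*' behaves like a shell wildcard and may span dots.
    """
    host = host.lower()
    for pattern in patterns.split(","):
        pattern = pattern.strip().lower()
        if pattern and fnmatchcase(host, pattern):
            return True
    return False
```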

robot-from:
  Whether the HTTP From header field, as defined in RFC 1945, can be set:
  yes or no

robot-useragent:
  The value the robot sends in the HTTP User-Agent header field,
  as defined in RFC 1945.
  Example: WebCrawler/1.0 libwww/4.0
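For illustration, a Python sketch of a robot sending these headers with the standard urllib.request module. build_request is a hypothetical helper; From is only added when an address is supplied, mirroring the robot-from yes/no choice. RFC 1945 describes From as the e-mail address of the human user who controls the requesting agent.

```python
from urllib.request import Request

def build_request(url, useragent, from_address=None):
    """Build a GET request carrying the robot's User-Agent and optional From header."""
    headers = {"User-Agent": useragent}
    if from_address:
        # Contact address for whoever runs the robot (RFC 1945 From field).
        headers["From"] = from_address
    return Request(url, headers=headers)
```

Note that urllib.request normalizes stored header names with str.capitalize(), so the User-Agent header is retrieved afterwards as "User-agent".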

robot-language:
  Languages the robot is written in. One or more of:
  c, c++, perl, perl4, perl5, java, tcl, python, etc.

robot-description:
  Text description of the robot's functions.
  More details should go on the page given by robot-details-url.
  Example: The WebCrawler robot is used to build the database
           for the WebCrawler search service operated by GNN
           (part of AOL).
           The robot runs weekly, and visits sites in a random order.

robot-history:
  Text description of the origins of the robot.
  Example: This robot finds its roots in a research project
           at the University of Washington in 1994.

robot-environment:
  The environment the robot operates in. One or more of:
  - service: builds a commercial service
  - commercial: is a commercial product
  - research: used for research
  - hobby: written as a hobby

modified-date:
  The date this record was last modified, formatted as an HTTP date
  (RFC 1123 format in GMT, as used by RFC 1945).
  Example: Fri, 21 Jun 1996 17:28:52 GMT
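A sketch of reading and writing modified-date values with Python's standard email.utils, which handles RFC 822/1123-style dates such as the example above. The helper names are hypothetical, and a date that parses without a usable time zone is assumed to be GMT.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime

def parse_modified_date(value):
    """Parse a value such as 'Fri, 21 Jun 1996 17:28:52 GMT' into an aware datetime."""
    dt = parsedate_to_datetime(value)
    # Assumption: a date without a usable zone is taken to be GMT/UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt

def format_modified_date(dt):
    """Format an aware datetime in the 'Fri, 21 Jun 1996 17:28:52 GMT' style."""
    # usegmt=True requires a UTC tzinfo, so convert first.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)
```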