Forgot to answer the second part of your question. A well-behaved robot would
at least send the following headers:
- User-agent: a unique identification for your robot in the format
yourrobotsname/version
- From: your email address, so people have a chance to
contact you in case of a problem with your robot, e.g.
Mark Norman <mnorman@hposl41.cup.hp.com>
- Accept: a list of MIME types your robot can handle, e.g. text/html
- If-Modified-Since: the date when you last indexed a resource, so you don't
need to get the whole document if it hasn't changed anyway
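
As a rough illustration of the four headers above, here is a minimal sketch in Python using only the standard library. The robot name "ExampleBot/1.0", the contact address, the example.org URL and the helper name build_robot_request are all placeholders made up for this example, not anything a real robot uses.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def build_robot_request(url, last_indexed=None):
    """Build a request carrying the headers a polite robot would send.

    All header values below are hypothetical placeholders.
    """
    headers = {
        # Unique identification in the form robotname/version
        "User-Agent": "ExampleBot/1.0",
        # Contact address of the robot's operator
        "From": "robot-admin@example.org",
        # MIME types the robot can actually process
        "Accept": "text/html",
    }
    if last_indexed is not None:
        # HTTP dates are expressed in GMT, e.g. "... 12:00:00 GMT";
        # last_indexed must be a timezone-aware UTC datetime.
        headers["If-Modified-Since"] = format_datetime(last_indexed, usegmt=True)
    return urllib.request.Request(url, headers=headers)


# Example: pretend the page was last indexed on 1 November 1996, 12:00 UTC.
req = build_robot_request(
    "http://www.example.org/",
    last_indexed=datetime(1996, 11, 1, 12, 0, tzinfo=timezone.utc),
)
for name, value in req.header_items():
    print(f"{name}: {value}")
```

The sketch only builds and prints the request; nothing is sent. If you pass it to urllib.request.urlopen, an unchanged document comes back as "304 Not Modified", which urllib reports as an HTTPError with code 304, so the robot should catch that case and keep its existing copy.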
See <URL:http://info.webcrawler.com/mak/projects/robots/guidelines.html> for
further guidelines for robots, and <URL:http://www.w3.org/> for details of the
HTTP protocol.
Klaus Johannes Rusch
-- e8726057@student.tuwien.ac.at, KlausRusch@atmedia.net http://www.atmedia.net/KlausRusch/