Re: robots.txt

Jaakko Hyvatti (Jaakko.Hyvatti@www.fi)
Thu, 15 Feb 1996 10:38:43 +0200 (EET)


Tangy Verdell <TVerdell@dca.com>:
> how standardized is the robots.txt file. has anyone ran into problems where a
> significant number of sites have made typos or erros in their robots.txt file.

Of 810 Finnish (*.fi) sites, 731 have no robots.txt, and of the remaining
79, the following 10 have something wrong with theirs. I'll list them
here because they are so few, just for laughs and to give you some material.
There are no actual typos, but there are other mistakes: no empty lines
between entries, multiple directories on one line, and an empty User-agent.
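
A rough sketch of how a robot could flag these mistakes automatically.
The function name lint_robots and the message strings are made up for
illustration, and the rules are only my reading of the exclusion convention:
records are separated by blank lines, User-agent needs a value, and each
Disallow line carries a single path.

```python
def lint_robots(text):
    """Return a list of (line_number, message) for suspicious lines."""
    problems = []
    seen_disallow = False  # True once the current record has a Disallow line
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            # Only a truly blank line ends a record; a comment-only line does not.
            if not raw.strip():
                seen_disallow = False
            continue
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep or field not in ("user-agent", "disallow"):
            problems.append((number, "unrecognized line"))
        elif field == "user-agent":
            if seen_disallow:
                # A new record started without a separating blank line.
                problems.append((number, "no blank line before new record"))
                seen_disallow = False
            if not value:
                problems.append((number, "empty User-agent"))
        else:  # a Disallow line; an empty value is allowed
            seen_disallow = True
            if len(value.split()) > 1:
                problems.append((number, "several paths on one Disallow line"))
    return problems
```

Run on the teknet.tky.hut.fi file listed further down, this reports the
Disallow line; HTML error pages come back as unrecognized lines.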

(I have mailed the webmasters, but it doesn't always help... and if the
server software returns status 200 OK instead of 404 for a missing file, how
can we expect to get a conforming robots.txt?)
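
One way a robot might guard against the 200-instead-of-404 case is to look at
the response before parsing it. This is only a sketch: the function name
classify_robots_response and the four verdict strings are hypothetical, and
the caller is assumed to have already fetched the status code, Content-Type
header, and body.

```python
def classify_robots_response(status, content_type, body):
    """Guess whether a /robots.txt response is worth parsing."""
    if status == 404:
        return "absent"          # the server correctly says there is no file
    if status != 200:
        return "unknown"         # redirects, server errors, etc. are not judged here
    # Ignore parameters such as "; charset=..." when comparing the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html" or body.lstrip().startswith("<"):
        return "not-robots-txt"  # an HTML page served with 200 OK
    return "candidate"           # plain text; still needs line-by-line parsing


print(classify_robots_response(200, "text/html", "<BODY><H1>Error 404</H1>"))
```

A plain-prose reply served as text, like the kevdog.abo.fi one, still comes
out as "candidate", so this check cannot replace parsing the lines themselves.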

******** http://pmtpc2.hut.fi/robots.txt

# robots.txt for http://pmtpc2.hut.fi/

User-agent: *
Disallow:

User-agent: JumpStation
Disallow:

User-agent: Webcrawler/0.00000001
Disallow:

User-agent: Lycos/x.x
Disallow:

User-agent: EIT-Link-Verifier-Robot/0.2
Disallow:

User-agent:
Disallow:

******** http://www.cardinal.fi/robots.txt

<text/html><body>

<p><strong>Error: file '/usr/Web/cardinal/robots.txt' can not open No such file or directory</strong></p>

******** http://www.kemi.fi/robots.txt

<TITLE>WWW.KEMI.FI </TITLE>

<img align=LEFT src="/www/img/vaakuna.gif" >
<h1> WWW.KEMI.FI </H1>

<address>City of Kemi</address>
<address>Kemin Kaupungin ATK-osasto</address>
<address>Keskuspuistokatu 20 </address>
<address>Puhelin: 9698- 259224</address>
<address>e-mail: webmaster@kemi.fi</address>
<p>
<center><img align=middle src=/www/line/or_bar.gif></center>
<p>
<h1><center> URL-osoite meni metsään !!! </center></h1>
<h3><center><a
href="//www.kemi.fi/www/index.htmlx">www.kemi.fi</a></h3></center>
<p>
<center><img align=middle src=//www.kemi.fi/www/img/postilaa.gif>
<a href=//www.kemi.fi/www/webmail.html>webmaster@kemi.fi</a></center>
<p>

******** http://www.datum.fi/robots.txt

<BODY><H1>Error 404</H1>
Unable to open the specified file<hr><ADDRESS>httpd-info@glaci.com</ADDRESS></BODY>

******** http://joynws1.joensuu.fi/robots.txt

<BODY><H1>Error 404</H1>
Unable to open the specified file<hr><ADDRESS>httpd-info@glaci.com</ADDRESS></BODY>

******** http://kevdog.abo.fi/robots.txt

Help me out here. You've requested a file called "robots.txt".
That file does not exist on this site, nor has it ever existed.
There just simply has never been such a file here. (Actually,
that is not entirely true. The file you're currently reading is
called "robots.txt".)

Each day, though, 5 or 10 people try to check out "robots.txt".
It's annoying. So, could you be so kind as to e-mail me and tell
me what site gave you the URL saying that there was a "robots.txt"
file on this site. I'll contact the webmaster there and make a
suitable plea that they remove the link.

Or perhaps I should make some goofy "robots.txt" file.

Kev
kev@ray.abo.fi

******** http://kirke.helsinki.fi/robots.txt

<TITLE>This is the file "robots.txt"
</TITLE>
<H1>This is the file "robots.txt"</H1>

Here is an extract of the server error log:<P>

<TT>[Wed Jun 29 05:56:09 1994] httpd: access to /usr/local/www/robots.txt failed for beta.xerox.com, reason: file does not exist<P>

[Wed Jun 29 13:36:05 1994] httpd: access to /usr/local/www/robots.txt failed for beta.xerox.com, reason: file does not exist<P>

[Thu Jun 30 14:46:31 1994] httpd: access to /usr/local/www/robots.txt failed for beta.xerox.com, reason: file does not exist<P>

[Mon Aug 1 18:46:59 1994] httpd: access to /usr/local/www/robots.txt failed for beta.xerox.com, reason: file does not exist<P>

[Sat Aug 20 09:13:31 1994] httpd: access to /usr/local/www/robots.txt failed for pentland.stir.ac.uk, reason: file does not exist<P>

[Mon Aug 22 22:32:36 1994] httpd: access to /usr/local/www/robots.txt failed for halsoft.com, reason: file does not exist<P>

[Tue Aug 23 01:11:08 1994] httpd: access to /usr/local/www/robots.txt failed for indy1.lri.fr, reason: file does not exist<P>

[Tue Aug 23 01:35:45 1994] httpd: access to /usr/local/www/robots.txt failed for indy1.lri.fr, reason: file does not exist</TT>

As you can see the file robotxs.txt has not existed on this server. I created it solely to get this
message through. If you read this message I'd appreciate if you'd send me the document address containg the link to http://kirke.helsinki.fi/robots.txt . <P>

It is no big deal, but I am curious!<P>

Regards,<P>

<ADDRESS>Heikki Lehv&auml;slaiho, Server Manager, Heikki.Lehvaslaiho@Helsinki.FI
</ADDRESS>

******** http://laaksonen.csc.fi/robots.txt

no robots

******** http://teknet.tky.hut.fi/robots.txt

# Robots.txt file for teknet.tky.hut.fi, robots welcome!
User-agent: *
Disallow: /cgi-bin /linux /inet /gallup

******** http://www.csc.fi/robots.txt

User-agent: *
Disallow: /app-defaults/
Disallow: /backup/
Disallow: /cgi-bin/
Disallow: /htbin/
Disallow: /tools/
Disallow: /math_topics/backup/
Disallow: /math_topics/data/
Disallow: /math_topics/icons/
Disallow: /math_topics/scripts/
Disallow: /math_topics/texts/
Disallow: /math_topics/wais/
Disallow: /programming/wais/
User-agent: Peregrinator-Mathematics
Disallow: /math_topics/GAMS/
Disallow: /math_topics/opt/
Disallow: /math_topics/net/