Re: An updated extended standard for robots.txt

Captain Napalm (spc@armigeron.com)
Tue, 12 Nov 1996 15:53:28 -0500 (EST)


It should be noted that the 2.0 standard is still in flux and is liable to
change without warning (I should probably add that to the page). Also, note
that it has ALREADY changed, so you should check the page again.

With that said:

It was thus said that the Great Art Matheny once stated:
>
> I am uncertain about how my version 2.0 robot should interpret the
> following robots.txt directive:
>
> Robot-version: 2.0
> Disallow: /images
>
> Should it retrieve "/images/huge.gif" or not?
>
Actually, that's an ambiguous condition. Version 1.0 of the standard had
the semantics for the Disallow: directive that the robot should treat the
"/images" string as "/images*". I had defined new semantics for Disallow:
for 2.0, one for an explicit match and one for regular expressions.

> If "yes", does that mean that the robot should treat the "/images" string
> as an implicit regular expression equivalent to "/images*"?
>
If you followed the spec (it may have been poorly written) then this would
be treated as an explicit disallow; that is, the file named "/images" would
not be allowed, but "/images.ext" or "/images/image.ext" would be.
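The difference between the two semantics can be sketched as below. This is
only an illustration, not the spec itself: the function names are made up,
and the actual 2.0 regex syntax is whatever the page linked further down says,
which (as noted above) is still in flux.

```python
import re

def disallowed_v1(path, rule):
    """1.0 semantics: "Disallow: /images" behaves like the prefix "/images*"."""
    return path.startswith(rule)

def disallowed_v2_explicit(path, rule):
    """Proposed 2.0 explicit-match semantics: only the exact path is blocked."""
    return path == rule

def disallowed_v2_regex(path, pattern):
    """Proposed 2.0 regex semantics (illustrative syntax only)."""
    return re.fullmatch(pattern, path) is not None

# Under 1.0, "Disallow: /images" blocks "/images/huge.gif";
# under 2.0 explicit matching it does not.
print(disallowed_v1("/images/huge.gif", "/images"))           # True
print(disallowed_v2_explicit("/images/huge.gif", "/images"))  # False
print(disallowed_v2_regex("/images/huge.gif", "/images/.*"))  # True
```

So a 2.0 robot reading "Disallow: /images" with explicit-match semantics
would retrieve "/images/huge.gif", which is exactly the ambiguity Art is
asking about.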

> If "no", then the "Robot-version" directive is more than a documentation
> line since it modifies the robot's behavior.
>
The original intent was yes, the version modified the robot's behavior.
The consensus seems to be that might not be such a good idea.

The updated version is located at:

http://www.armigeron.com/people/spc/robots2.html

-spc (It's not called cutting technology for nothing ... )

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html