Re: An extended version of the Robots...

Hallvard B Furuseth (h.b.furuseth@usit.uio.no)
Fri, 15 Nov 1996 13:09:03 +0100 (MET)


At 12 Nov 1996, Martijn Koster wrote:
>At 10:41 AM 11/12/96, Hallvard B Furuseth wrote:
>
>>Again, I think someone should provide a Perl library
>
> There is: libwww-perl5. I've not used that myself.
>
>>and a C library which implements the entire RES;
>
> As discussed before, I'm happy to share the WebCrawler /robots.txt
> parsing code (ANSI C) with anyone who asks. :-)

Great. Mention them in norobots.html and similar places.
(I don't need them myself; I just need robot authors to find them :-)

>>A better approach might be to use headers with *options*. Example:
>>
>> Hide: /Foobar # explicit
>> Hide;R: /Foo.*barbaz # regexp
>> Show;R;I: /foo.*bar # regexp; ignore case
>> Show;3-: /baz.* # Only for robot versions 3 and later
>
> Complex...

Why, no.

;R is no more complex than parsing the Allow/Disallow: arguments to
find out whether they are regexps, explicit matches or general
matches.  And it's safer, because an explicit match can't
accidentally be parsed as a regexp if we insert a URL with an
unusual character.

;I is no more complex than an Ignore-Case: field if we want that
functionality.

;3- *is* maybe more complex than a Robot-Version: field, if that is
introduced somehow. Like I said, I'm not sure I like that one.

Regards,

Hallvard
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html