Re: An extended version of the Robots...

Captain Napalm (spc@armigeron.com)
Mon, 11 Nov 1996 14:37:35 -0500 (EST)


It was thus said that the Great Hrvoje Niksic once stated:
>
> Captain Napalm (spc@armigeron.com) wrote:
> > > This makes implementation outside Perl a near impossibility.
> > > Normal v7 regexps are far easier to deal with.
> > What would that be? What /bin/sh uses? Tcsh? grep? Perl? Do you use
> > '?' or '.' to match a single character? What about one or more characters?
> > This may be impossible to do correctly, given the number of different
> > regexp dialects and meta-characters that are possible (I like the set used in
>
> I suppose v7 are those used by grep (there is far less of the stuff
> than in Perl). What sh uses is a simple globbing syntax that is easy
> to cover, e.g.:
> `*' - matches 0 or more characters
> `?' - matches exactly 1 character
> `[...]' - introduces character ranges, regexp-style
> `\' - escapes the next character
>
Okay. I came across some C code by John Kercheval that implements this
style of regexes. The code was written in 1991 and the email address I have
for the author is 'johnk@wrq.com'. I have no idea if it is still valid.

> So, the string "*" would match any string (even empty ones), "?*"
> would match strings with 1 or more characters, "*abc" would match
> strings ending with "abc", whereas "abc*" would match strings
> beginning with "abc". "[a-z]*" matches a string beginning with a
> lower-case letter, and "\**" matches a string beginning with an
> asterisk. This is quite logical, and not too hard to implement. You
> use this style in your examples.
>
Yes.
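
To make the quoted rules concrete, here is a minimal sketch of such a matcher
in Python. It is not John Kercheval's C code; the names glob_match,
_class_end and _in_class are made up for illustration, and treating a leading
`!' or `^' inside `[...]' as negation is an assumption the rules above do not
spell out. The whole string has to match the pattern.

```python
def _class_end(pattern, start):
    """Index of the ']' closing the class opened at pattern[start], or -1."""
    i = start + 1
    if i < len(pattern) and pattern[i] in "!^":   # assumed negation marker
        i += 1
    if i < len(pattern) and pattern[i] == "]":    # a leading ']' is a literal member
        i += 1
    return pattern.find("]", i)


def _in_class(body, ch):
    """True if ch belongs to the class whose text between '[' and ']' is body."""
    negate = body[:1] in ("!", "^")
    if negate:
        body = body[1:]
    i, found = 0, False
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":   # a range such as a-z
            found = found or body[i] <= ch <= body[i + 2]
            i += 3
        else:                                          # a single member
            found = found or body[i] == ch
            i += 1
    return found != negate


def glob_match(pattern, text):
    """Return True if all of text matches the sh-style pattern."""
    def match(p, t):
        while p < len(pattern):
            c = pattern[p]
            if c == "*":
                while p < len(pattern) and pattern[p] == "*":
                    p += 1                    # a run of '*' acts like one '*'
                if p == len(pattern):
                    return True               # trailing '*' matches the rest
                # Try every possible length for the text consumed by '*'.
                return any(match(p, k) for k in range(t, len(text) + 1))
            if t >= len(text):
                return False                  # pattern needs a char, none left
            if c == "?":
                p += 1
            elif c == "[" and _class_end(pattern, p) != -1:
                end = _class_end(pattern, p)
                if not _in_class(pattern[p + 1:end], text[t]):
                    return False
                p = end + 1
            elif c == "\\" and p + 1 < len(pattern):
                if text[t] != pattern[p + 1]:
                    return False              # escaped char must match literally
                p += 2
            else:
                if text[t] != c:
                    return False              # ordinary literal character
                p += 1
            t += 1
        return t == len(text)

    return match(0, 0)
```

With this sketch, glob_match("*abc", "xyzabc") and glob_match("\\**", "*abc")
are true, while glob_match("?*", "") is false, in line with the examples
quoted above. The backtracking on `*' is the simplest thing that works and
can get slow on patterns with many asterisks.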

> I would like robots.txt to use the normal shell-style globbing syntax,
> since it is much simpler and faster to use.
>
I was unaware that Perl had different regex than sh and grep (I don't use
Perl - don't like it). I would recomend using the sh style.
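
As a rough illustration of how a robot might apply sh-style patterns, the
sketch below checks a URL path against a list of patterns using fnmatchcase
from Python's standard fnmatch module. The function name is_disallowed and
the sample patterns are hypothetical, not part of any robots.txt proposal,
and fnmatch itself has no `\' escape (a literal asterisk is written `[*]'
there), so it only approximates the syntax discussed above.

```python
from fnmatch import fnmatchcase


def is_disallowed(path, patterns):
    """True if path matches any of the sh-style patterns (case-sensitive)."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Hypothetical patterns a site might want robots to stay away from.
sample_patterns = ["/cgi-bin/*", "*.gif", "/tmp/?"]
```

Here is_disallowed("/cgi-bin/search", sample_patterns) is true, while
is_disallowed("/index.html", sample_patterns) is false.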

-spc (since I already have code for it 8-)