Regexps (Was: Re: An extended version of the Robots...)

Hallvard B Furuseth (h.b.furuseth@usit.uio.no)
Mon, 11 Nov 1996 16:49:33 +0100 (MET)


Hrvoje Niksic <hniksic@srce.hr> wrote:
>Captain Napalm (spc@armigeron.com) wrote:
>>> This makes implementation outside Perl a near impossibility.
>>> Normal v7 regexps are far easier to deal with.
>> What would that be? What /bin/sh uses? Tcsh? grep? Perl? Do you use
>> '?' or '.' to match a single character? What about one or more characters?

>> This may be impossible to do correctly, given the different number of
>> regexprs and meta-characters that are possible

We can't have different robots interpreting the same regexp differently,
so we must define exactly which regexps robots.txt allows. We should
provide at least public-domain C and Perl implementations, so it will be
easy to follow the standard. Otherwise many robot writers won't bother.
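
To illustrate the ambiguity, here is a minimal sketch in Python. The pattern and paths are made up for the example, and Python's fnmatch and re modules merely stand in for "shell-style globbing" and "some regexp dialect"; neither is being proposed as the robots.txt definition.

```python
import fnmatch
import re


def interpretations(pattern, path):
    """Return how two hypothetical robots would judge the same pattern.

    'glob' treats the pattern as shell-style globbing ('?' = any one char).
    'regexp' treats it as a regular expression ('?' = previous item optional).
    """
    return {
        "glob": fnmatch.fnmatchcase(path, pattern),
        "regexp": re.fullmatch(pattern, path) is not None,
    }


# Hypothetical robots.txt pattern and two candidate paths.
pattern = "/cgi-bin/a?c"

for path in ("/cgi-bin/abc", "/cgi-bin/ac"):
    print(path, interpretations(pattern, path))
```

The two readings disagree on both paths: the glob reading accepts "/cgi-bin/abc" and rejects "/cgi-bin/ac", while the regexp reading does the opposite. A robot author guessing the wrong dialect would silently exclude the wrong URLs.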

> Perl has an even more powerful regexp syntax than this.

Powerful but too hard to implement.

> I would like robots.txt to use the normal shell-style globbing syntax,
> since it is much simpler and faster to use.

And it's also much less useful.

Plain grep regexps may be a good compromise.
What's this 'v7' stuff? Anybody got a (pointer to a) definition?
Anybody got a good public domain regexp implementation?

Regards,

Hallvard