ietf-nntp Proposed wildmat text

Charles Lindsey chl at clw.cs.man.ac.uk
Thu Dec 7 04:03:43 PST 2000


In <20001206121735.I31363 at demon.net> "Clive D.W. Feather" <clive at demon.net> writes:


>Charles Lindsey said:
>>>  5.1 Wildmat syntax
>> 
>> Please exclude '!' also and let people escape it (\!) if they ever need
>> it. That would be consistent with the treatment of '*', '?', ',', etc.

>No it wouldn't. The other characters can occur anywhere within a wildmat
>pattern; it's just that they aren't interpreted as a wildmat-exact when
>they do. Allowing ! anywhere except at the start of a non-negated wildmat
>reduces the number of "can't happen" cases and simplifies the parser.

Just trying to eliminate one of the ambiguities you had to fix by text :-( .

>The purposes of \u are:
>- for people who want to insert non-ASCII characters and don't have UTF-8
>- to escape control characters
>- as another way of escaping special characters like comma
>I really don't know if we need it without PAT - opinions ?

I can see slight merit for people who cannot generate genuine UTF-8. But I
could live without it if PAT goes.

>> And to specify SP, I would still like a 2-hex-digit option (since people
>> who think in purely ASCII terms won't see the need for \u0020 when \x20
>> looks more familiar). (but if there is no PAT, I don't care.)

>That's what \s was for, modulo the discussion about white-space folding.

\s should certainly go if PAT goes.

>> No, you are trying to define meanings for all sorts of stupid things like
>> [a-------b] which are quite useless, but will complicate implementation
>> tremendously.

>On the contrary, the rather hairy-looking syntax there actually expresses
>the *simplest* implementation: scan left-to-right for "x-y" triples
>that don't overlap previous triples.

My problem was that I could not understand your disambiguating rule.

>>  - Where a wildmat-set-body contains one or more "-"s, they shall be
>>    examined from left to right; each shall be the centre of a
>>    wildmat-set-1-range or wildmat-set-2-range if the whole wildmat-set-body
>>    can be thus parsed.

It is just about possible to see what your rule is saying if you already
know the algorithm, but not to deduce the algorithm given only your rule.

Let me suggest a better wording:

    Where a wildmat-set-body contains one or more "-"s, they shall be
    examined from left to right; whenever such a "-" can be construed as
    the centre of a wildmat-set-1-range or wildmat-set-2-range, then any
    otherwise valid parse which does not so construe it shall be
    eliminated.

For example [%---] is to be parsed as [(%--)-] rather than [%(---)]
            [a-b-c] is to be parsed as [(a-b)-c] rather than [a-(b-c)]
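
As an illustration only, the left-to-right reading can be sketched in
Python. This is not taken from any draft: the name parse_set_body and the
tuple representation are invented for the example. It ignores escapes and
negation, does not check whether a range runs backwards, and assumes that
whatever is left over after a range is read as single characters (the
"a-b plus - plus c" reading quoted below, which is itself still under
debate).

```python
def parse_set_body(body):
    """Split a wildmat set body into ranges and single characters.

    Hypothetical helper: scans left to right and takes each "x-y"
    triple as soon as it can, so no triple overlaps an earlier one.
    """
    items = []
    i = 0
    while i < len(body):
        # A "-" with a character on each side is the centre of a range.
        if i + 2 < len(body) and body[i + 1] == "-":
            items.append(("range", body[i], body[i + 2]))
            i += 3
        else:
            # Anything else (including a leftover "-") is a single character.
            items.append(("char", body[i]))
            i += 1
    return items


print(parse_set_body("%---"))   # the [(%--)-] grouping
print(parse_set_body("a-b-c"))  # the [(a-b)-c] grouping
```

The first call groups the body as a range from "%" to "-" followed by a
lone "-"; the second as a range from "a" to "b" followed by "-" and "c".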


>If a>b, then the entire contents of the set becomes undefined, but it
>should still match exactly one character (IMO). But I'm not too bothered
>about that. [%--:] is the range % to - plus the character :, in both
>regex and my syntax. The only debate is about a-b-c, which I think is a-b
>plus - plus c, and you disagree. I'm willing to be persuaded that this
>should change.

Yes, I see now that there are fewer differences between regex(5) and your
grammar than I had supposed.


>> Here is a grammar that encompasses what I would now suggest:
>[...]
>> BTW, I see that no grammar has been given for UTF-8-non-ascii. That needs
>> to be fixed.

>That can just be a cross-reference to the grammar at the end of the
>document.

[which could also do with some close scrutiny before it is finalized]

>> Observe that I have not actually forbidden 'a-b' where a>b. This could be
>> done by a suitable piece of text, but OTOH one could argue no harm is done
>> by leaving it in.

>That's also the approach I took. I think, however, that in this case the
>set contents are unspecified but the set still matches one character from
>the set contents.

Agreed.

>> Here is some output from a modern version of 'ed'. I claim it is accepting
>> and rejecting exactly those cases which would be accepted and rejected by
>> the above grammar.
>> 
>> chl% ed
>> a
>> &
>> -
>> 9

>Huh ?

What is your problem? Try it out for yourself on your friendly
neighbourhood UNIX box.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Email:     chl at clw.cs.man.ac.uk  Web:   http://www.cs.man.ac.uk/~chl
Voice/Fax: +44 161 436 6131      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9     Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5


