ietf-nntp [a-z] in Wildmats

Clive D.W. Feather clive at demon.net
Mon Jun 25 00:35:32 PDT 2001


Charles Lindsey said:
>> I'm a bit confused here... are you talking about having character ranges
>> apply to graphemes instead of characters?  That's heading in the direction
>> of collation-aware character classes, which is a *huge* amount of work and
>> is way more weight than the poor little rickety framework of wildmat is
>> going to be able to bear.  For stuff like that, you really want to go all
>> the way to POSIX regexes.
> 
> I would not seriously suggest going that route. The problem is that the
> unaware user might have expected it to work that way.
> 
> BTW, is "grapheme" defined anywhere, or is it the same as "glyph"?

All of this, and more, is addressed in Unicode Technical Report 18.

It defines three levels of support for character sets ([...]):

* Level 1: locale-independent support for characters:
  - [\u1234] to get individual characters
  - [{L},{Nd}] to get categories
  - [{L}-[{Lt},Q,q]] to exclude subsets
* Level 2: locale-independent support for characters and graphemes. A
  grapheme is defined by:

        grapheme = [base] *combiner
        base     = <any character not in category M or C>
        combiner = mark / virama letter / hangul-c / %xFF9E / %xFF9F
        mark     = <any character in category M>
        virama   = <any character in combining class 9>
        letter   = <any character in category L>
        hangul-c = %x1160-11FF
  [This includes some nonsense sequences like A-virama-sigma, but includes
   all sensible graphemes and excludes all sensible pairs of graphemes.]

* Level 3: locale-sensitive support (for example, ranges that follow a
  locale's collation order).
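For anyone who wants something concrete, the two locale-independent levels above can be sketched in Python. This is a minimal sketch under stated assumptions, not UTR 18 itself. It uses Python's unicodedata module, whose tables are newer than the Unicode version UTR 18 was written against. The function names are hypothetical. Emitting a category C character as its own segment is also an assumption, because the grapheme grammar above has no production that covers it.

```python
import unicodedata

# Level 1: the set [{L}-[{Lt},Q,q]], i.e. any letter except titlecase
# letters and the literal characters Q and q. (Hypothetical helper name.)
def in_letter_set_minus(ch: str) -> bool:
    cat = unicodedata.category(ch)
    if not cat.startswith("L"):          # {L}: any letter category
        return False
    if cat == "Lt" or ch in ("Q", "q"):  # -[{Lt},Q,q]: excluded subset
        return False
    return True

# Level 2: predicates for the grapheme grammar quoted above.
def is_base(ch: str) -> bool:      # not in category M or C
    return unicodedata.category(ch)[0] not in "MC"

def is_mark(ch: str) -> bool:      # category M
    return unicodedata.category(ch).startswith("M")

def is_virama(ch: str) -> bool:    # canonical combining class 9
    return unicodedata.combining(ch) == 9

def is_letter(ch: str) -> bool:    # category L
    return unicodedata.category(ch).startswith("L")

def is_hangul_c(ch: str) -> bool:  # %x1160-11FF
    return 0x1160 <= ord(ch) <= 0x11FF

def combiner_len(text: str, i: int) -> int:
    """Length of the combiner starting at text[i], or 0 if there is none."""
    ch = text[i]
    # "virama letter": take the two-character form when it applies
    if is_virama(ch) and i + 1 < len(text) and is_letter(text[i + 1]):
        return 2
    if is_mark(ch) or is_hangul_c(ch) or ch in "\uFF9E\uFF9F":
        return 1
    return 0

def graphemes(text: str) -> list[str]:
    """Split text greedily into grapheme = [base] *combiner."""
    out = []
    i, n = 0, len(text)
    while i < n:
        start = i
        if is_base(text[i]):             # optional [base]
            i += 1
        while i < n:                     # *combiner
            k = combiner_len(text, i)
            if k == 0:
                break
            i += k
        if i == start:
            # Assumption: a character that is neither base nor combiner
            # (e.g. category C) becomes a segment on its own.
            i += 1
        out.append(text[start:i])
    return out
```

For example, graphemes("e\u0301a") returns ["e\u0301", "a"]. The sketch also reproduces the quirk noted above: "A\u094d\u03c3" (A, virama, sigma) comes out as a single grapheme.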

I am *NOT* suggesting that we put any of this into NNTP, but anyone looking
at sets in regular expressions is strongly advised to study UTR 18.

> But since I have just changed my opinion in favour of omitting [...], and
> you have more or less agreed (and Clive too), then maybe that is the way
> the consensus is now moving.

I think so; is anyone still arguing for including them?

-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:  +44 20 8371 1138
Internet Expert     | Home:  <clive at davros.org>  | Fax:  +44 20 8371 1037
Demon Internet      | WWW: http://www.davros.org | DFax: +44 20 8371 4037
Thus plc            |                            | Mobile: +44 7973 377646 