ietf-nntp [a-z] in Wildmats
Clive D.W. Feather
clive at demon.net
Mon Jun 25 00:35:32 PDT 2001
Charles Lindsey said:
>> I'm a bit confused here... are you talking about having character ranges
>> apply to graphemes instead of characters? That's heading in the direction
>> of collation-aware character classes, which is a *huge* amount of work and
>> is way more weight than the poor little rickety framework of wildmat is
>> going to be able to bear. For stuff like that, you really want to go all
>> the way to POSIX regexes.
>
> I would not seriously suggest going that route. The problem is that the
> unaware user might have expected it to work that way.
>
> BTW, is "grapheme" defined anywhere, or is it the same as "glyph"?
All of this, and more, is addressed in Unicode Technical Report 18.
This has three levels of support for [...]:
* Level 1: locale-independent support for characters:
  - [\u1234] to get individual characters
  - [{L},{Nd}] to get categories
  - [{L}-[{Lt},Q,q]] to exclude subsets

* Level 2: locale-independent support for characters and graphemes. A
  grapheme is defined by:

      grapheme = [base] *combiner
      base     = <any character not in category M or C>
      combiner = mark / virama letter / hangul-c / %xFF9E / %xFF9F
      mark     = <any character in category M>
      virama   = <any character in combining class 9>
      letter   = <any character in category L>
      hangul-c = %x1160-11FF

  [This includes some nonsense sequences like A-virama-sigma, but includes
  all sensible graphemes and excludes all sensible pairs of graphemes.]

* Level 3: locale-sensitive stuff.
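
To make the Level 2 grapheme grammar above concrete, here is a rough Python
sketch of a splitter built directly on those rules. It is one reading of the
grammar, not anything taken from UTR 18 itself. Two points are assumptions:
"virama letter" is tried before "mark" (a virama is itself in category M, so
the two-character alternative would otherwise never match), and a character
that is neither a base nor a combiner (a control character, say) is emitted
as a cluster of its own so that the loop always advances. The function names
are made up for the example.

```python
import unicodedata


def is_base(ch):
    # base = any character not in category M or C
    return unicodedata.category(ch)[0] not in "MC"


def is_mark(ch):
    # mark = any character in category M
    return unicodedata.category(ch).startswith("M")


def is_virama(ch):
    # virama = any character in combining class 9
    return unicodedata.combining(ch) == 9


def is_letter(ch):
    # letter = any character in category L
    return unicodedata.category(ch).startswith("L")


def is_hangul_c(ch):
    # hangul-c = %x1160-11FF
    return 0x1160 <= ord(ch) <= 0x11FF


def combiner_length(s, i):
    """Return how many characters of s, starting at i, form one combiner (0 if none)."""
    if i >= len(s):
        return 0
    ch = s[i]
    # Assumption: prefer the two-character "virama letter" form over "mark".
    if is_virama(ch) and i + 1 < len(s) and is_letter(s[i + 1]):
        return 2
    if is_mark(ch) or is_hangul_c(ch) or ch in "\uFF9E\uFF9F":
        return 1
    return 0


def graphemes(s):
    """Split s into clusters matching: grapheme = [base] *combiner."""
    out = []
    i = 0
    while i < len(s):
        start = i
        if is_base(s[i]):
            i += 1
        while True:
            n = combiner_length(s, i)
            if n == 0:
                break
            i += n
        if i == start:
            # Assumption: neither base nor combiner, so emit it alone.
            i += 1
        out.append(s[start:i])
    return out
```

With this reading, graphemes("A\u094D\u03C3") comes back as a single
cluster, which is the A-virama-sigma nonsense case noted above.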
I am *NOT* suggesting that we put any of this into NNTP, but anyone looking
at sets in regular expressions is strongly advised to study UTR 18.
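
For anyone who wants a feel for the Level 1 category sets, the following
Python sketch tests membership by hand using the general categories from the
standard unicodedata module. It does not parse the [{L}-[{Lt},Q,q]] notation;
the two function names are invented for the example and each simply hard-codes
one of the sets listed under Level 1.

```python
import unicodedata


def in_letters_or_digits(ch):
    # Hand-coded equivalent of [{L},{Nd}]: any letter or decimal digit.
    cat = unicodedata.category(ch)
    return cat.startswith("L") or cat == "Nd"


def in_letters_minus_subset(ch):
    # Hand-coded equivalent of [{L}-[{Lt},Q,q]]:
    # any letter except titlecase letters and the characters Q and q.
    cat = unicodedata.category(ch)
    return cat.startswith("L") and cat != "Lt" and ch not in "Qq"
```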
> But since I have just changed my opinion in favour of omitting [...], and
> you have more or less agreed (and Clive too), then maybe that is the way
> the consensus is now moving.
I think so; is anyone still arguing for including them?
--
Clive D.W. Feather | Work: <clive at demon.net> | Tel: +44 20 8371 1138
Internet Expert | Home: <clive at davros.org> | Fax: +44 20 8371 1037
Demon Internet | WWW: http://www.davros.org | DFax: +44 20 8371 4037
Thus plc | | Mobile: +44 7973 377646