ietf-nntp [a-z] in Wildmats

Mon Jun 25 10:15:07 PDT 2001

Charles Lindsey <chl at clw.cs.man.ac.uk> writes:
> Russ Allbery <rra at stanford.edu> writes:

>> The difference between grapheme and glyph is that a glyph is a single
>> visual unit of a script, whereas a grapheme is a single logical unit;
>> for example, until recently, "ch" and "ll" were graphemes in Spanish
>> even though both were composed of two glyphs.  (Or at least that's my
>> understanding of it.)

Clive's excerpt from the standard indicates that Unicode is using the term
slightly differently, so don't put too much stock in this definition.
Looks like Unicode is using it to mean a base character plus combining
marks, so something that may appear to be a single visual element but is
actually composed of multiple Unicode code points.
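
To make that concrete, here is a small Python sketch (my own illustration,
not anything taken from the standard) showing the same visible letter
stored either as one code point or as a base character plus a combining
mark.  The variable names are made up for the example.

```python
import unicodedata

# The same visible "e with acute accent", stored two ways.
composed = "\u00e9"       # one precomposed code point
decomposed = "e\u0301"    # base "e" followed by COMBINING ACUTE ACCENT

def code_points(s):
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

print(code_points(composed))
print(code_points(decomposed))

# A nonzero combining class marks U+0301 as a combining character.
print(unicodedata.combining("\u0301") != 0)

# NFC normalization folds the pair into the precomposed form.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Both strings render as one visual element, but anything that counts code
points sees one in the first case and two in the second.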

> Ugh! Too many terms! I think "glyph" is the concept we need for the
> object the user thought he had written (or caused to appear on his
> screen or on his printout). So even if he is Spanish, he is well aware
> he had to take two distinct actions to make "ch" appear. But he might be
> more surprised to learn that one or both of them was regarded by the
> software as made up of multiple characters (well, not in the case of "c"
> or "h", but in more complex cases).

Yeah, it sounds like the Spanish ll and ch are an easier case than what
Unicode is using grapheme to refer to.

> So in the case of [...], the user might naively expect it to be made up
> of glyphs, whereas in practice it has to be made up of characters (too
> hard to implement it otherwise). Which is why I suggest omitting the
> feature (unless someone can persuade me that filename globbing in Unix
> systems already does it sensibly).

I don't know what, if anything, the Austin Group currently has for filename
globbing in a Unicode locale, but I definitely agree that this is getting
into stuff we don't want to tackle in NNTP.

Let's just leave out [...] and define ? as matching any single Unicode
code point and leave it at that.  In practice, that may mean that ? isn't
particularly useful for group names in some languages, but I have a
sneaking suspicion that eventually someone will need to work out an
extension that includes something like Unicode regexes to do the matching
that people really want to do.
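
A rough Python sketch of what that would mean, under some loud
assumptions: only * and ? are special, [...] is treated as literal text,
and negation, comma-separated lists, and escaping are ignored entirely.
The function name simple_match is hypothetical.  Python strings are
sequences of code points, so "." in the regex matches exactly one of them.

```python
import re

def simple_match(pattern, name):
    """Match name against a pattern where * matches any run of code
    points and ? matches exactly one code point (sketch only)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            # Everything else, including [ and ], is literal here.
            parts.append(re.escape(ch))
    regex = "".join(parts)
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None

print(simple_match("comp.lang.?", "comp.lang.c"))
print(simple_match("caf?", "caf\u00e9"))     # precomposed e-acute
print(simple_match("caf?", "cafe\u0301"))    # e + combining acute
print(simple_match("caf??", "cafe\u0301"))
```

The last two calls show the limitation: when the accented letter is stored
as a base character plus a combining mark, a single ? no longer covers
what the user sees as one letter.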

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>