ietf-nntp wildmat routines and text

Russ Allbery rra at stanford.edu
Thu Jul 27 16:16:44 PDT 2000


Clive D W Feather <clive at demon.net> writes:
> Russ Allbery said:

>> I would really prefer to avoid all of this and just stick with wildmat
>> the way it's implemented right now but with the minimal change of
>> specifying the character set as being UTF-8.

> Understandable.

My main question here is how much are we really gaining?  UTF-8 characters
can be sent in NNTP commands to pretty much all existing servers without
any difficulty, so adding backslash escapes doesn't result in any changes
in expressiveness.  Is it mostly just a change to make debugging and
manual command entry easier?  I guess I can understand that... that's
probably a reasonable thing to want.

> Would it be less significant if the property were only removed for ASCII
> alphanumerics, and retained elsewhere?  I can see people writing \: for
> safety, but not \a\l\t.

Depends on how simple and stupid the algorithm they're using is.  The
"simplest thing that could possibly work" is to just backslash every
character without caring whether it's alphanumeric or not.  It complicates
the meaning of backslash any way you cut it; the question is whether
anyone was relying on backslash before an arbitrary character meaning that
character literally.  It also makes the wildmat parser mildly more
complicated, because now it has to know how to do ISO 10646 to UTF-8
conversion and has to handle more ill-formed wildmats.  One of the rather
nice things about wildmats is that, outside of character classes, I think
the only ill-formed wildmat is one ending in a backslash.
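
To make the "backslash every character" case concrete, here is a minimal
sketch in Python.  The language and the helper name escape_all are
assumptions made for illustration; nothing like it appears in the thread.

```python
def escape_all(text: str) -> str:
    """Quote a literal string for a wildmat the lazy way.

    Hypothetical helper: it puts a backslash in front of every
    character, alphanumeric or not, which is only safe while
    backslash-anything means "that character, literally".
    """
    return "".join("\\" + ch for ch in text)


print(escape_all("alt.test"))  # \a\l\t\.\t\e\s\t
```

If backslash followed by an ASCII alphanumeric stopped meaning the literal
character, output like this would silently change meaning, which is the
compatibility risk being weighed here.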

The nice thing about wildmats is that they're extremely simple
(particularly if you ignore character classes, which almost no one uses).
The farther away we move from simplicity, the more we should just be using
regexes and be done with it.  :/
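
As a rough illustration of how little there is to a wildmat once character
classes are ignored, the following Python sketch handles only "*", "?" and
backslash-literal.  The function names are hypothetical, and matching on
decoded strings (so that "?" consumes one character rather than one octet)
is an assumption, not something settled in this thread.

```python
def check(pattern: str) -> None:
    """Reject the one ill-formed class-free wildmat: a trailing backslash."""
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            if i + 1 == len(pattern):
                raise ValueError("wildmat ends in a backslash")
            i += 2  # skip the backslash and the character it quotes
        else:
            i += 1


def _match(pattern: str, text: str) -> bool:
    if not pattern:
        return not text
    head = pattern[0]
    if head == "*":
        # "*" matches any run of characters, including none.
        return any(_match(pattern[1:], text[i:]) for i in range(len(text) + 1))
    if head == "\\":
        # Backslash makes the next character literal.
        return bool(text) and text[0] == pattern[1] and _match(pattern[2:], text[1:])
    if not text:
        return False
    if head == "?" or head == text[0]:
        return _match(pattern[1:], text[1:])
    return False


def wildmat(pattern: str, text: str) -> bool:
    """Match text against a wildmat without character classes."""
    check(pattern)
    return _match(pattern, text)


print(wildmat("comp.*", "comp.lang.c"))  # True
print(wildmat("\\a\\l\\t", "alt"))       # True
```

The naive recursion on "*" can be slow for patterns with many stars; it is
kept here only because it is short, not as a recommended implementation.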

>> (The changes to character classes I'm less concerned about, since in
>> practice I think they're pretty rarely used.)

> So what do you think \ should mean in a class?

I guess I don't really care all that much; it means more work for the
parser, but I only have to write that once.  :)  I doubt that there are
enough users of character classes out there already to notice the change.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>


