[ietf-nntp] Further syntax

Charles Lindsey chl at clerew.man.ac.uk
Fri Mar 5 09:12:48 PST 2004


In <87k720h598.fsf at windlord.stanford.edu> Russ Allbery <rra at stanford.edu> writes:

>Okay, given that, the concern that I have is that in practice, many
>servers will not be doing this yet.  I think standardizing on UTF-8 is
>clearly the Right Thing To Do, but I don't really want to issue a standard
>that says this without any warnings at all to client authors that in
>practice they're going to get back stuff that is not in UTF-8.  We need to
>put in some warning somewhere.

>If we want to accurately reflect current practice, I think we have to say
>something like SHOULD be UTF-8, clients MUST be prepared for character
>sets that are not in UTF-8.  I don't know what that does to the syntax.
>:/

But I am not sure that "clients MUST be prepared" actually means anything.
AFAICS, the only places where UTF-8 might occur (you listed them) are such
that the NNTP server itself never needs to look at them as part of the
protocol. It just gets a load of octets from some place (a Newsgroups file,
for example) and serves them up. So a cooperating subnet that chose to use
one of those Chinese GB* charsets could still use a bog-standard NNTP
server already in exact compliance with our standard (assuming, that is,
that it just took the bunch of octets on trust and did not try to use one
of those "valid UTF-8 detectors" anywhere).
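
To illustrate the distinction, here is a minimal Python sketch, not taken
from any real NNTP server. The function names and the sample newsgroup
description lines are hypothetical. It contrasts a server that takes the
octets on trust with one that runs a "valid UTF-8 detector" over them.

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets decode as strict UTF-8 (the 'detector')."""
    try:
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def serve_on_trust(octets: bytes) -> bytes:
    """Pass a stored line through untouched; only the line ending is added."""
    return octets + b"\r\n"


# Hypothetical description lines, one stored as UTF-8 and one as GB2312.
utf8_line = "misc.test\tTests d'accents: \u00e9".encode("utf-8")
gb_line = "misc.test\t\u6d4b\u8bd5".encode("gb2312")

# The pass-through server never inspects the charset, so both lines go out.
for line in (utf8_line, gb_line):
    assert serve_on_trust(line) == line + b"\r\n"

# Only a server that chose to run the detector would notice the difference.
print(is_valid_utf8(utf8_line), is_valid_utf8(gb_line))
```

A server built like serve_on_trust would carry the GB2312 line unchanged;
only one that calls something like is_valid_utf8 would ever object to it.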

I think the only place where the internals of an NNTP server _might_ need
to look closely at any UTF-8 would be if USEFOR were to declare that
newsgroup-names were to be in UTF-8 (note that USEFOR has currently
postponed any such decision to a future document).

So I suspect that the simplest thing for our standard to say is that
everything MUST be (or be assumed to be) in UTF-8. If some cooperating
subnet discovers that they can get away with other charsets, then good
luck to them, but don't give them the least encouragement to do so.
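
On the client side, "assumed to be in UTF-8" could be sketched as follows.
This is only an illustration in Python; the function name is hypothetical,
and substituting U+FFFD for undecodable octets is an assumption of this
sketch, not something the draft specifies.

```python
def decode_assuming_utf8(octets: bytes) -> str:
    """Decode server output as UTF-8 without guessing at other charsets.

    Assumption: octets that are not valid UTF-8 are replaced with U+FFFD
    rather than rejected or reinterpreted in some other charset.
    """
    return octets.decode("utf-8", errors="replace")


good = decode_assuming_utf8("caf\u00e9".encode("utf-8"))
bad = decode_assuming_utf8(b"\xb2\xe2")  # octets that are not valid UTF-8
```

A subnet using some other charset would see replacement characters in such
a client, which gives them no encouragement to continue.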

>We can say everything MUST be ASCII or UTF-8, but, well, that's not what
>Usenet looks like right now, and it's not going to change on a dime.

But, strictly speaking, any such non-ASCII encountered on Usenet right now
is already in breach of some standard, so breaching one more standard is
neither here nor there. :-(

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
