Message from Ned Freed: ietf-nntp NNTP and 16-bit charsets
Stan O. Barber
sob at verio.net
Sun May 6 00:49:07 PDT 2001
> > > > NOTE: Texts using charsets which represent characters as
> > > > sequences of 16 or 32 bits (e.g. UCS-2 and UCS-4) cannot be
> > > > reliably conveyed in the above format.
> > > False. 16-bit or 32-bit character sets that have an encoding that
> > > avoids NUL, CR, or LF work fine. Possibly a pedantic point, but it
> > > wouldn't surprise me if there are legacy Asian character sets with that
> > > property.
> > They may well get it right in the low order byte, but I would be
> > surprised if they all got it right in the high order byte.
> Surely you would agree that designing a 16-bit or 32-bit character set
> with this property is not particularly hard? There's a very obvious range
> of character numbers that you simply don't assign.
> I don't know whether anyone has designed such a character set, but it's
> clearly possible. I believe that makes the note above factually
> incorrect.
Not only have such charsets been designed, they are the rule rather than the
exception for multibyte charsets. The exception is UTF-16, and it really is
just that: an exception.
I already explained in a previous message that the various legacy Asian
charsets work this way. I've written code to handle most of the Asian charsets,
so this happens to be something I know a lot about.
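
For anyone who wants to check this for themselves, here is a minimal sketch,
assuming Python 3 and its bundled Japanese codecs. The helper name
forbidden_octets and the sample string are only illustrative, and a single
sample is a spot check, not a proof for the whole charset.

```python
# Octets that must not appear outside the CRLF line ending.
FORBIDDEN = {0x00, 0x0A, 0x0D}  # NUL, LF, CR

def forbidden_octets(text, codec):
    """Return the sorted NUL/CR/LF octet values found in text encoded with codec."""
    return sorted(FORBIDDEN.intersection(text.encode(codec)))

# Illustrative sample only: a short run of kanji and kana.
sample = "日本語のテキスト"

for codec in ("euc_jp", "shift_jis", "iso2022_jp"):
    # An empty list means the encoded sample contains none of the three octets.
    print(codec, forbidden_octets(sample, codec))
```

The multibyte sequences in these encodings are built from octet ranges that
exclude NUL, CR and LF, which is why the lists come back empty.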
> > Anyway, I have made my copy read "cannot, in general, be reliably
> > conveyed". But I have to keep reminding myself that I am not the editor
> > of this list, so the text is just a suggestion for Stan to act upon, of
> > course.
> Why not just say exactly what we mean?
That's what we should do here.
> NOTE: Texts using encodings (such as UTF-16 or UTF-32) that may
> contain the NUL octet or the CR or LF octets in contexts other than
> the CRLF line ending cannot be reliably conveyed in the above format.
> I believe that UTF-16 and UTF-32 are the correct things to reference, not
> UCS-2 or UCS-4, but someone who has a firmer grasp on the difference
> between a charset and an encoding may want to check me on that.
You are quite correct: The charsets to reference are UTF-16 and UTF-32.
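
To make the problem with UTF-16 and UTF-32 concrete, here is a rough sketch,
again assuming Python 3. The function name unsafe_octets is hypothetical, and
the big-endian codec variants are used only so that no byte order mark is
added to the output.

```python
def unsafe_octets(text, codec):
    """Return the names of the NUL, CR and LF octets present in the encoded text."""
    data = text.encode(codec)
    checks = (("NUL", 0x00), ("CR", 0x0D), ("LF", 0x0A))
    return {name for name, octet in checks if octet in data}

# "A" is U+0041, which UTF-16BE encodes as the octets 00 41.
print(unsafe_octets("A", "utf-16-be"))

# U+0D0A (in the Malayalam block) encodes in UTF-16BE as 0D 0A,
# which looks exactly like a CRLF line ending on the wire.
print(unsafe_octets("\u0d0a", "utf-16-be"))

# UTF-32BE pads every ASCII character with three NUL octets.
print(unsafe_octets("A", "utf-32-be"))

# The same characters in UTF-8 contain none of the three octets.
print(unsafe_octets("A\u0d0a", "utf-8"))
```

The second case is the nasty one: a single legitimate character produces a
false line ending, not just a stray NUL.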
Ned