ietf-nntp NNTP and 16-bit charsets
rra at stanford.edu
Fri May 4 19:48:45 PDT 2001
Charles Lindsey <chl at clw.cs.man.ac.uk> writes:
> Russ Allbery <rra at stanford.edu> writes:
>> Charles Lindsey <chl at clw.cs.man.ac.uk> writes:
>>> NOTE: Texts using charsets which represent characters as
>>> sequences of 16 or 32 bits (e.g. UCS-2 and UCS-4) cannot be
>>> reliably conveyed in the above format.
>> False. 16-bit or 32-bit character sets that have an encoding that
>> avoids NUL, CR, or LF work fine. Possibly a pedantic point, but it
>> wouldn't surprise me if there are legacy Asian character sets with that
>> property.
> They may well get it right in the low order byte, but I would be
> surprised if they all got it right in the high order byte.
Surely you would agree that designing a 16-bit or 32-bit character set
with this property is not particularly hard? There's a very obvious range
of character numbers that you simply don't assign.
I don't know whether anyone has designed such a character set, but it's
clearly possible. I believe that makes the note above factually
incorrect.
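As a rough illustration of that unassigned range, here is a minimal sketch
that counts the 16-bit values in which neither octet is NUL, CR, or LF. The
names FORBIDDEN and is_safe_code_unit are made up for this example, and it
does not describe any real character set.

```python
# Octets that cause trouble in this context: NUL, LF, CR.
# FORBIDDEN and is_safe_code_unit are hypothetical names for illustration.
FORBIDDEN = {0x00, 0x0A, 0x0D}


def is_safe_code_unit(value: int) -> bool:
    """Return True if neither octet of a 16-bit value is NUL, CR, or LF."""
    high, low = value >> 8, value & 0xFF
    return high not in FORBIDDEN and low not in FORBIDDEN


# Every 16-bit value a hypothetical character set could assign safely.
safe = [v for v in range(0x10000) if is_safe_code_unit(v)]
print(len(safe))  # 253 * 253 = 64009
```

Under those assumptions, roughly 98% of the 16-bit space stays available, and
the property holds regardless of byte order because both octets are checked.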
> Anyway, I have made my copy read "cannot, in general, be reliably
> conveyed". But I have to keep reminding myself that I am not the editor
> of this list, so the text is just a suggestion for Stan to act upon, of
> course.
Why not just say exactly what we mean?
NOTE: Texts using encodings (such as UTF-16 or UTF-32) that may
contain the NUL octet or the CR or LF octets in contexts other than
the CRLF line ending cannot be reliably conveyed in the above format.
I believe that UTF-16 and UTF-32 are the correct things to reference, not
UCS-2 or UCS-4, but someone who has a firmer grasp on the difference
between a charset and an encoding may want to check me on that.
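To make the condition in the proposed note concrete, here is a small sketch
that scans encoded text for NUL octets and for CR or LF octets that are not
part of a CRLF pair. The function name unsafe_octets is an assumption for
this example; it is not part of any NNTP implementation.

```python
from typing import List


def unsafe_octets(data: bytes) -> List[int]:
    """Return offsets of NUL octets and of CR/LF octets outside a CRLF pair.

    unsafe_octets is a hypothetical helper written for illustration only.
    """
    bad = []
    i = 0
    while i < len(data):
        # A CR immediately followed by LF is treated as a line ending.
        if data[i] == 0x0D and i + 1 < len(data) and data[i + 1] == 0x0A:
            i += 2
            continue
        if data[i] in (0x00, 0x0A, 0x0D):
            bad.append(i)
        i += 1
    return bad


print(unsafe_octets("A".encode("utf-8")))           # []
print(unsafe_octets("A".encode("utf-16-be")))       # [0]
print(unsafe_octets("A".encode("utf-32-be")))       # [0, 1, 2]
print(unsafe_octets("\u010a".encode("utf-16-be")))  # [1]
```

Even plain ASCII text trips the check once it is encoded as UTF-16 or UTF-32,
and U+010A shows an LF octet appearing inside a character that has nothing to
do with line endings.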
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>