ietf-nntp NNTP and 16-bit charsets

Fri May 4 19:48:45 PDT 2001

Charles Lindsey <chl at clw.cs.man.ac.uk> writes:
> Russ Allbery <rra at stanford.edu> writes:
>> Charles Lindsey <chl at clw.cs.man.ac.uk> writes:

>>>        NOTE: Texts using charsets which represent characters as
>>>        sequences of 16 or 32 bits (e.g. UCS-2 and UCS-4) cannot be
>>>        reliably conveyed in the above format.

>> False.  16-bit or 32-bit character sets that have an encoding that
>> avoids NUL, CR, or LF work fine.  Possibly a pedantic point, but it
>> wouldn't surprise me if there are legacy Asian character sets with that
>> property.

> They may well get it right in the low order byte, but I would be
> surprised if they all got it right in the high order byte.

Surely you would agree that designing a 16-bit or 32-bit character set
with this property is not particularly hard?  There's a very obvious set
of character numbers that you simply don't assign: any value containing
a 0x00, 0x0A, or 0x0D octet.
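A quick Python sketch (mine, not from the draft) makes the arithmetic
concrete.  It assumes a hypothetical fixed-width charset that leaves
unassigned every code unit containing a 0x00, 0x0A, or 0x0D octet.  The
function names are illustrative only.

```python
# Octets that must not appear inside a character in a hypothetical
# NNTP-safe fixed-width charset: NUL, LF, CR.
FORBIDDEN_OCTETS = {0x00, 0x0A, 0x0D}


def is_safe_code_unit(value: int, width_bytes: int) -> bool:
    """True if no octet of `value` (width_bytes wide) is NUL, LF, or CR.

    Byte order is irrelevant because every octet is checked.
    """
    return all(b not in FORBIDDEN_OCTETS for b in value.to_bytes(width_bytes, "big"))


def count_safe_code_units(width_bytes: int) -> int:
    """Count assignable code units analytically: 256 - 3 choices per octet."""
    return (256 - len(FORBIDDEN_OCTETS)) ** width_bytes


if __name__ == "__main__":
    # Brute-force the 16-bit case and compare with the closed form.
    brute = sum(is_safe_code_unit(v, 2) for v in range(1 << 16))
    print(brute, count_safe_code_units(2))  # 64009 64009
```

That leaves 64009 of the 65536 16-bit values, so the design gives up
fewer than 2.5% of the code space; the same per-octet argument gives
253^4 values in the 32-bit case.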

I don't know whether anyone has designed such a character set, but it's
clearly possible.  I believe that makes the note above factually
incorrect.

> Anyway, I have made my copy read "cannot, in general, be reliably
> conveyed". But I have to keep reminding myself that I am not the editor
> of this list, so the text is just a suggestion for Stan to act upon, of
> course.

Why not just say exactly what we mean?

    NOTE: Texts using encodings (such as UTF-16 or UTF-32) that may
    contain the NUL octet or the CR or LF octets in contexts other than
    the CRLF line ending cannot be reliably conveyed in the above format.
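The condition in that wording is also easy to test mechanically.  Here
is a rough Python sketch, with a hypothetical function name, that checks
only what the note says: no NUL octet, and no CR or LF octet outside a
CRLF pair.

```python
def conveyable_as_nntp_text(body: bytes) -> bool:
    """Hypothetical check: True if `body` contains no NUL octet and every
    CR or LF octet is part of a CRLF line ending."""
    if b"\x00" in body:
        return False
    # Removing every CRLF pair must leave no stray CR or LF octets behind.
    stripped = body.replace(b"\r\n", b"")
    return b"\r" not in stripped and b"\n" not in stripped


if __name__ == "__main__":
    sample = "Hi\r\n"
    print(conveyable_as_nntp_text(sample.encode("utf-8")))      # True
    print(conveyable_as_nntp_text(sample.encode("utf-16-be")))  # False: "H" is 00 48
```

UTF-16 can fail even without a NUL: U+0A41 encodes in big-endian form
as 0A 41, which contains a bare LF.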

I believe that UTF-16 and UTF-32 are the correct things to reference, not
UCS-2 or UCS-4, but someone who has a firmer grasp on the difference
between a charset and an encoding may want to check me on that.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>