ietf-nntp Message from Ned Freed: NNTP and 16-bit charsets

Charles Lindsey chl at clw.cs.man.ac.uk
Fri Apr 27 03:59:11 PDT 2001


In <3AE8B960.3A1CC3F9 at verio.net> "Stan O. Barber" <sob at verio.net> writes:

>> An article with its headers written in UTF-8, and with a Content-Type that
>> specifies charset=some-16-bit-set....

>Actually, whether or not it is legal depends on the content-type, the charset,
>and the encoding. There are restrictions that forbid some combinations.

OK, it seems we shall have to outlaw them, but first let us get all the
technical information out in the open. So please can you give chapter and
verse for some of the restrictions you mention?

>In particular, charsets that use NUL characters or which represent CR or LF as
>anything other than 8bit CR or LF characters cannot be represented using either
>a 7bit or 8bit encoding. Your only encoding choices for such a charset are
>quoted-printable, base64, or binary.

I am not quite convinced there. Suppose a charset uses 0xff as a line
terminator, but never uses 0x0d and 0x0a for anything. Then it would look
like one long line, and would go through the existing spec provided you
stuck a CRLF.CRLF on the end. Not a pretty sight, I agree. BTW, we have
specified no limit on line length. Is everyone happy with that?


>Well, while it doesn't come out and say it, it is clear that the response is
>line-oriented and that lines must be terminated by CRLFs. This implicitly
>disallows the use of the binary encoding.

Not quite. It does not actually say what is supposed to happen if an
implementation encounters a naked CR or LF. Are these 'line terminators'?
If so, they must be changed to CRLF. Or are they an error (i.e. the server
or client should have fixed them before submitting them to NNTP)? And it
does not actually say anywhere whether NUL is a permitted character or not
(and the 16-bit charset business could be handled much more easily if NNTP
could be assumed to transport NUL without demur). So these things all need
to be stated one way or the other.
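For concreteness, a hypothetical pre-flight check for the ambiguities just listed (the function name and the exact definition of "naked" are illustrative only, not anything the draft specifies):

```python
# Sketch of a check that flags the octets whose status the current
# spec leaves open: NUL, a CR not followed by LF, and an LF not
# preceded by CR.

def audit(article: bytes) -> list[str]:
    problems = []
    if b"\x00" in article:
        problems.append("contains NUL")
    for i, c in enumerate(article):
        if c == 0x0D and article[i:i + 2] != b"\r\n":
            problems.append("naked CR at offset %d" % i)
        elif c == 0x0A and (i == 0 or article[i - 1] != 0x0D):
            problems.append("naked LF at offset %d" % i)
    return problems
```

Whether such octets are repaired, rejected, or passed through untouched is exactly the policy question the spec would need to answer.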

>> So I think the wording needs to be clarified. But first, we have to decide
>> what we INTEND. I see the following possibilities:

>> 1. We say that such an arbitrary octet stream is to be subjected to a
>> transformation ...

>This approach has been tried in the messaging world. It has been found to be
>surprisingly hard to get right, and it is why we implemented an extension to
>SMTP that uses counted chunks for transferring binary material rather than
>overloading dot-stuffing. (Dot stuffing is also claimed by some to be an
>efficiency concern. I don't buy it personally, but the claim has been made.)

Yes, it does sound tricky, so we should not go that route unless existing
implementations can be shown to do it correctly already. But I am still
curious to know what happens with the existing usage which is claimed to
occur in Asian countries that use totally incomprehensible 16-bit
charsets (not even Unicode). Do those people use NNTP currently, and if so
how?
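The counted-chunk alternative Ned refers to (SMTP's CHUNKING extension, RFC 3030) avoids scanning the payload altogether: the sender announces the octet count up front, so the data needs no dot-stuffing and may contain NUL, bare CR, or bare LF. A rough sketch, with `send_chunk` and its interface assumed purely for illustration:

```python
# Sketch of BDAT-style counted-chunk framing (after RFC 3030).
# `sock_send` stands in for whatever writes raw octets to the peer.

def send_chunk(sock_send, data: bytes, last: bool = False) -> None:
    cmd = b"BDAT %d%s\r\n" % (len(data), b" LAST" if last else b"")
    sock_send(cmd)
    sock_send(data)   # raw octets, length-delimited -- no escaping needed
```

Because the receiver reads exactly the announced number of octets, the "what is a line terminator" question never arises for the payload.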

>> 2. We say that only 8-bit charsets are supported, ...

>The right thing to say in this case is that you don't allow the binary encoding
>or any new encoding with a comparable output range. Problem solved given the
>existing restrictions on other encodings.

Yes, it is looking that way.

>                                Ned
Eh? It said From: Stan Barber at the top!

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
