[NNTP] Re: New NNTP drafts approaching IETF Last Call

Mon Mar 14 16:56:49 PST 2005

Mark Crispin <MRC at CAC.Washington.EDU> writes:

> The text in section 3.2:

>     Note that texts using an encoding (such as UTF-16 or UTF-32) that
>     may contain the octets NUL, LF, or CR other than a CRLF pair cannot
>     be reliably conveyed in the above format (that is, they violate the
>     MUST requirement above).  However, except when stated otherwise,
>     this specification does not require the content to be UTF-8 and
>     therefore it MAY include octets above and below 128 mixed
>     arbitrarily.

> seems silly to me.  Nobody sends UTF-16, UTF-32, UCS-2, or UCS-4 data in
> Internet protocol commands.  Viewed one way, it's a tautology; viewed
> another, it confuses contexts.

This section is covering multiline responses, which are used (among other
things) for conveying the actual article.  While no one uses those
character sets for *commands*, they are all valid character sets to use in
a MIME object, which is why this comes up.
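
To make the problem concrete, here is a small Python sketch (an
illustration only, not anything taken from the draft; the helper name
problem_octets is made up for this example).  It encodes short strings in
a few charsets and reports whether the resulting octets include NUL, LF,
or CR:

```python
# Illustration only: encode text in a given charset and report which of the
# octets NUL, LF, and CR appear in the encoded form.
def problem_octets(text, charset):
    data = text.encode(charset)
    return {
        "nul": b"\x00" in data,
        "lf": b"\n" in data,
        "cr": b"\r" in data,
    }

# Plain ASCII-range text: harmless in UTF-8, full of NULs in UTF-16/UTF-32.
utf8 = problem_octets("Hi", "utf-8")
utf16 = problem_octets("Hi", "utf-16-le")
utf32 = problem_octets("Hi", "utf-32-le")

# U+010A encodes as the octets 0x0A 0x01 in UTF-16LE, so the encoded form
# contains a stray LF octet even though the text has no line break.
stray_lf = problem_octets("\u010a", "utf-16-le")
```

The point is that the offending octets come from the encoding itself
rather than from anything the author of the text wrote, so no amount of
care in composing the article avoids them.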

> Furthermore, the second sentence, while obviously intended to maintain
> compatibility with the past, is short-sighted and will lead to
> compatibility problems forever.

Actually, it's not for backward compatibility.  It's for MIME
compatibility.  We can't require any specific character set be used for
all Usenet news articles since news articles are simply MIME messages and
MIME allows the use of a variety of different character sets.  I don't
think it would be the correct choice to try to force all Usenet news
articles into UTF-8.

What we're basically trying to say here is that the data conveyed via
multiline responses must comply with the MIME restrictions on the 8bit
content-transfer-encoding (CTE), except that NNTP doesn't have the line
length limitation of the MIME 8bit CTE.  There may very well be a better
way of phrasing this.  We were
a bit reluctant to reference MIME directly since NNTP as a protocol
doesn't know anything about MIME structure and doesn't actually care; it
just has some similar restrictions due to the inability of existing
software to correctly handle embedded NUL characters.
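
The restriction described above can be sketched as a simple check in
Python.  This is a rough illustration under stated assumptions, not the
draft's wording or any server's actual code: the function name
multiline_safe is hypothetical, and it ignores dot-stuffing and the
terminating line entirely.  It rejects NUL and any CR or LF that is not
part of a CRLF pair, accepts every other octet value, and places no limit
on line length:

```python
import re

# Matches a CR that is not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")

def multiline_safe(payload: bytes) -> bool:
    """Return True if payload has no NUL and no CR or LF outside a CRLF pair.

    Octets above 127 are allowed, and no line-length limit is applied.
    """
    if b"\x00" in payload:
        return False
    return _BARE_CR_OR_LF.search(payload) is None
```

For example, multiline_safe(b"caf\xe9\r\n") is true, while a payload
containing a bare LF such as b"a\nb" is not.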

Does that make more sense?

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>