[NNTP] Re: New NNTP drafts approaching IETF Last Call

Russ Allbery rra at stanford.edu
Mon Mar 14 17:46:16 PST 2005


Mark Crispin <MRC at CAC.Washington.EDU> writes:
> On Mon, 14 Mar 2005, Russ Allbery wrote:

>> We can't require any specific character set be used for all Usenet news
>> articles since news articles are simply MIME messages and MIME allows
>> the use of a variety of different character sets.  I don't think it
>> would be the correct choice to try to force all Usenet news articles
>> into UTF-8.

> OK, that was not clear.

> I think that you need to do two things here:
>   1) Clarify that, as of this specification, all command responses are
>      UTF-8.  Death to all ISO 8859-x, Shift-JIS, etc. responses.

Here's the practical problem:  Some of the command responses (such as OVER
and HDR in particular) are article headers.  Usenet in practice has a
significant "just send 8-bits" culture in some hierarchies, and while I
agree with you on the merits of that culture, it's pretty hard, as a news
administrator, to disallow those messages.  It's a substantial portion of
Usenet traffic, particularly in some European and Asian hierarchies.
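As a rough illustration of what this means for a client, here is a sketch that the draft itself does not specify. OVER and HDR hand back header bytes as they appear in the article, so a client has to be ready for fields that aren't valid UTF-8. The function names and the ISO-8859-1 fallback are hypothetical choices; a real client might choose the fallback per hierarchy. RFC 2047 encoded-words are plain ASCII, so they pass through here undecoded.

```python
def decode_header_bytes(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode one header value taken from an OVER or HDR response.

    Assumption: the server passes article headers through unchanged, so the
    bytes may be UTF-8 or raw 8-bit text in some legacy charset. Try UTF-8
    first, then fall back to a caller-chosen (hypothetical) legacy charset.
    ISO-8859-1 maps every byte, so that fallback never raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


def parse_over_line(line: bytes, fallback: str = "iso-8859-1") -> list[str]:
    """Split one OVER response line on TAB and decode each field separately,
    so that one field in a legacy charset doesn't affect the others."""
    return [decode_header_bytes(field, fallback) for field in line.split(b"\t")]
```

Decoding each field on its own matters because a single overview line can mix a UTF-8 From with a Latin-1 Subject.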

So, given that, what can NNTP do about it?  We can say it's not allowed,
but that won't change anything in the real world; people will simply
ignore the specification, and client authors end up worse off because the
specification they're writing to is widely ignored.

Maybe we need to say something about responses built from article data,
similar to what's said below about article text.

Certainly, I agree with the above for any data that isn't taken from the
news articles themselves.  I believe all of that data already has ABNF in
the draft making it clear that it's UTF-8, so that case is already covered.
The paragraph you were looking at was limited to multi-line responses,
which already excludes nearly all of the data not taken from articles.

>   2) have some wording such as what section 4.3.1 in IMAP has, e.g.
>  	Article texts MAY contain 8-bit or multi-octet characters,
>  	but SHOULD do so when the [CHARSET] is identified via
>  	[MIME-IMB] and/or [MIME-HDRS].
>      [CHARSET] being a tag for RFC 2978 (or successor)
>      [MIME-IMB] being a tag for RFC 2045 (or successor)
>      [MIME-HDRS] being a tag for RFC 2047 (or successor).

> That leaves the deplorable practice of "just send 8-bits" for headers
> instead of using [MIME-HDRS].  We need a very strong SHOULD here that it
> should be UTF-8 and/or [MIME-HDRS] compliant for server responses, and a
> MUST for client postings (that is, a client which complies with this
> specification MUST either use UTF-8 or [MIME-HDRS]).
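A posting client could check the rule proposed above with something like the following sketch. It relies on the fact that RFC 2047 encoded-words are plain ASCII, and ASCII is a subset of UTF-8, so "UTF-8 or [MIME-HDRS]" reduces to "the header bytes are valid UTF-8". The function names are hypothetical; `email.header.Header` is Python's standard-library RFC 2047 encoder.

```python
from email.header import Header


def header_is_compliant(raw: bytes) -> bool:
    """True if the raw header bytes are valid UTF-8.

    Pure ASCII, including RFC 2047 encoded-words, is valid UTF-8, so this
    accepts both forms allowed by the proposed rule and rejects raw 8-bit
    text in a legacy charset.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def encode_subject(text: str, use_encoded_words: bool) -> bytes:
    """Produce a Subject value in one of the two proposed forms:
    RFC 2047 encoded-words (ASCII only) or raw UTF-8."""
    if use_encoded_words:
        # Header.encode() returns a str containing only ASCII encoded-words.
        return Header(text, "utf-8").encode().encode("ascii")
    return text.encode("utf-8")
```

Whether a client picks encoded-words or raw UTF-8 is left open here, as it is in the proposal; both pass the same check.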

We were kind of hoping to bail on questions of article format by treating
NNTP as a transport mechanism that hands back an article as an opaque wad
of data and leaving the fights over character sets to the article format.
Do you think we have to jump into this area in the NNTP specification,
when NNTP itself doesn't particularly care?  The farther we can run away
from this the better, I think -- the article format standard *has* to deal
with it, but NNTP can hopefully treat some of it as opaque.
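Treating the article as opaque still leaves NNTP's own framing to handle. Under the multi-line data block rules in the draft (later RFC 3977), lines end in CRLF, the block ends with a line containing only ".", and any line starting with "." has an extra "." prepended by the sender. Here is a minimal sketch of a reader that undoes only that framing and never decodes a character set. The function name is hypothetical, and it assumes the input yields one CRLF-terminated line per item, as a socket's `makefile("rb")` does.

```python
from typing import Iterable


def read_multiline_block(lines: Iterable[bytes]) -> bytes:
    """Collect an NNTP multi-line data block as raw bytes.

    Each item is one line including its CRLF. The block ends at a line
    consisting solely of ".". A leading "." added by the sender's
    dot-stuffing is removed. No character-set decoding is done: the
    article comes back as an opaque byte string.
    """
    out = []
    for line in lines:
        if line == b".\r\n":
            return b"".join(out)
        if line.startswith(b"."):
            # Undo dot-stuffing: the sender doubled the leading ".".
            line = line[1:]
        out.append(line)
    raise ValueError("connection closed before end of multi-line block")
```

The 8-bit byte in a header passes through untouched; deciding what it means is left to the article format.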

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>
