ietf-nntp NNTP and 16-bit charsets

Russ Allbery rra at stanford.edu
Tue May 1 19:44:08 PDT 2001


Charles Lindsey <chl at clw.cs.man.ac.uk> writes:

> I think the length has to be at least 998 octets, and preferably
> unlimited. Then any length restrictions can be applied by the format
> standards, but at least let it not be said that the NNTP transport
> protocol is the excuse for any restriction. One hopes such restrictions
> will eventually disappear. I gather that Russ is happy to bring INN into
> line with this.

I believe that INN already handles such articles just fine, and just has a
check in nnrpd for new posts that rejects articles with long lines (in the
"be conservative in what you generate" department).  So I think that all
that's needed is removal of that check.

> It might also be argued that forbidding NUL is unnecessary (does any
> current implementation in fact have a problem with it?),

Yes.

>     7. When examining a multi-line response, the client MUST check to see if
>        the line begins with the termination octet. If so and if octets
>        other than US-ASCII CRLF follow, the first octet of the line (the
>        termination octet) MUST be stripped away.

I don't really like this wording.  What does "MUST be stripped away" mean?
It's quite common these days for news servers to store articles in "wire
format," which means that they never remove the dot-stuffing added during
transit.  Only the end news reader needs to do so.

I think RFC 2119 language is inappropriate here.  Instead, we should
describe the bidirectional relationship between the canonical article and
the encoded article; if a piece of news software wants to show the encoded
article rather than the canonical article to its users, that's outside the
scope of the standard.
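
For illustration, the bidirectional relationship between the canonical
article and the encoded article can be sketched in a few lines of Python.
This is only a sketch under stated assumptions: the function names
dot_stuff and dot_unstuff are hypothetical, the canonical article is
assumed to be a sequence of CRLF-terminated lines, and nothing here is
taken from INN or any other implementation.

```python
CRLF = b"\r\n"
TERMINATOR = b"." + CRLF


def dot_stuff(canonical: bytes) -> bytes:
    """Canonical article -> encoded (wire format) article.

    Assumes every line of the canonical article ends in CRLF.
    """
    if canonical and not canonical.endswith(CRLF):
        raise ValueError("canonical article must end with CRLF")
    lines = canonical.split(CRLF)[:-1] if canonical else []
    out = []
    for line in lines:
        # A line that begins with "." gets one extra leading ".".
        if line.startswith(b"."):
            line = b"." + line
        out.append(line + CRLF)
    out.append(TERMINATOR)
    return b"".join(out)


def dot_unstuff(wire: bytes) -> bytes:
    """Encoded (wire format) article -> canonical article."""
    lines = wire.split(CRLF)
    # A well-formed encoded article ends with the line "." followed by CRLF.
    if len(lines) < 2 or lines[-1] != b"" or lines[-2] != b".":
        raise ValueError("missing terminating line")
    out = []
    for line in lines[:-2]:
        if line == b".":
            raise ValueError("terminating line before end of data")
        # Drop the extra leading "." added by dot_stuff.
        if line.startswith(b"."):
            line = line[1:]
        out.append(line + CRLF)
    return b"".join(out)
```

A server that stores articles in wire format simply never calls the
second function; only software that wants the canonical form needs it.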

>        NOTE: Where such an octet stream represents the body of an
>        article, its interpretation as characters in some charset will be
>        as determined by the standards defining the format of articles
>        (i.e. [RFC 1036] or some successor thereof). It is, however,
>        useful to note that the charsets regularly used with news
>        articles (including in particular US-ASCII, the series defined by
>        [ISO 8859], and UTF-8) all have the property that the sequence
>        0x0d0a represents CRLF, and therefore denotes the end of a
>        line. On the other hand, charsets which represent characters as
>        sequences of 16 or 32 bits (e.g. UCS-2 and UCS-4) could not be
>        used as they stand, but would need to be encoded in some manner
>        (in fact, UTF-8 itself is such an encoding, and the encodings
>        defined by [RFC 2046] are also suitable for the purpose).

I don't think any of this needs to be said.  Interpretation of the body of
the article is outside the scope of the NNTP standard.  All that needs to
be said is that the octets NUL, CR, and LF may not be sent (except for
CRLF line terminators).  The rest of this is obvious fallout from that and
just adds verbosity that could confuse the issue since it actually doesn't
set any new requirements.
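
As a rough illustration of that single rule (no NUL, and no CR or LF
except as part of a CRLF pair), here is a minimal Python sketch. The
function name find_forbidden_octet is hypothetical and is not part of
any existing implementation; it only shows that the rule can be checked
without knowing anything about the charset of the body.

```python
from typing import Optional

NUL, LF, CR = 0x00, 0x0A, 0x0D


def find_forbidden_octet(data: bytes) -> Optional[int]:
    """Return the offset of the first NUL, bare CR, or bare LF, else None."""
    i = 0
    n = len(data)
    while i < n:
        octet = data[i]
        if octet == NUL:
            return i
        if octet == CR:
            # CR is acceptable only when immediately followed by LF.
            if i + 1 < n and data[i + 1] == LF:
                i += 2
                continue
            return i
        if octet == LF:
            # Any LF reached here was not preceded by CR.
            return i
        i += 1
    return None
```

Under this check, text in UTF-8 passes as long as its line endings are
CRLF, while the same text in a raw 16-bit encoding fails on its first
NUL octet:

    find_forbidden_octet("Hi\r\n".encode("utf-8"))      # None
    find_forbidden_octet("Hi\r\n".encode("utf-16-be"))  # 0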

>        Note also that, although this standard does not limit the length
>        of a line in any way, the standards that define the format of
>        articles may do so.

Likewise here.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>


