ietf-nntp NNTP and 16-bit charsets

Wed May 2 06:06:20 PDT 2001

In <yln18w1nd3.fsf at windlord.stanford.edu> Russ Allbery <rra at stanford.edu> writes:

>> It might also be argued that forbidding NUL is unnecessary (does any
>> current implementation in fact have a problem with it?),

>Yes.

Fairy Nuff! But it would be of interest to hear which ones, and what the
problem is.

>>     7. When examining a multi-line response, the client MUST check to see if
>>        the line begins with the termination octet. If so and if octets
>>        other than US-ASCII CRLF follow, the first octet of the line (the
>>        termination octet) MUST be stripped away.

>I don't really like this wording.

Well it's the wording in the present text :-( .

>  What does "MUST be stripped away" mean?
>It's quite common these days for news servers to store articles in "wire
>format," which means that they never remove the dot-stuffing added during
>transit.  Only the end news reader needs to do so.

OK, how about:

    7. When interpreting a multi-line response, the "byte stuffing" MUST
       be undone; i.e. the client MUST ensure that, in any line beginning
       with the termination octet followed by octets other than US-ASCII
       CRLF, that initial termination octet is disregarded. 
    8. Likewise, the terminating line ".CRLF" (in US-ASCII) MUST NOT be
       considered part of the multi-line response; i.e. the client MUST
       ensure that any line beginning with the termination octet followed
       immediately by US-ASCII CRLF is disregarded.

Observe the use of words like "interpreting" and "considered". It is up to
the client exactly when it does any "interpreting" or "considering", but
it still has a clear obligation to do it sooner or later, certainly before
any users get to see it.
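
To make proposed rules 7 and 8 concrete, here is a minimal sketch in Python
of what "undoing the byte stuffing" might look like on the client side. The
function name and the choice to raise an error on a missing terminator are
assumptions made for illustration; nothing here comes from the draft or from
any existing implementation.

```python
from typing import List

TERMINATION_OCTET = b"."
CRLF = b"\r\n"


def unstuff_multiline(raw: bytes) -> List[bytes]:
    """Return the lines of a multi-line response with dot-stuffing undone.

    `raw` is the octet stream as received, including the terminating
    ".CRLF" line. Hypothetical helper, for illustration only.
    """
    lines = []
    for line in raw.split(CRLF):
        if line == TERMINATION_OCTET:
            # Rule 8: the terminating line is not part of the response.
            return lines
        if line.startswith(TERMINATION_OCTET):
            # Rule 7: disregard the initial termination octet.
            line = line[1:]
        lines.append(line)
    raise ValueError("multi-line response has no terminating line")
```

A server that stores articles in "wire format" never needs to call anything
like this; only the component that finally presents the text does, which is
the point of wording the rules in terms of interpretation.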

>>        NOTE: Where such an octet stream represents the body of an
>>        article, its interpretation as characters in some charset will be
>>        as determined by the standards defining the format of articles
>>        (i.e. [RFC 1036] or some successor thereof). It is, however,
>>        useful to note that the charsets regularly used with news
>>        articles (including in particular US-ASCII, the series defined by
>>        [ISO 8859], and UTF-8) all have the property that the sequence
>>        0x0d 0x0a represents CRLF, and therefore denotes the end of a
>>        line. On the other hand, charsets which represent characters as
>>        sequences of 16 or 32 bits (e.g. UCS-2 and UCS-4) could not be
>>        used as they stand, but would need to be encoded in some manner
>>        (in fact, UTF-8 itself is such an encoding, and the encodings
>>        defined by [RFC 2046] are also suitable for the purpose).

>I don't think any of this needs to be said.  Interpretation of the body of
>the article is outside the scope of the NNTP standard.

I think some of it needs to be said, but maybe not at such length. But it
is not a matter of interpretation of the body, but rather one of pointing
out a limitation of the NNTP transport which those building systems around
it need to be aware of.
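
The limitation can be shown with a small Python sketch. It assumes UTF-16
(big-endian) as a stand-in for "a charset which represents characters as
sequences of 16 bits"; the helper name is hypothetical. It checks whether the
encoded form of some text contains octets that the transport treats
specially: the CRLF pair and, given the discussion above, NUL.

```python
def transport_hazards(text: str, encoding: str) -> dict:
    """Report whether `text`, encoded with `encoding`, contains octets
    that a line-oriented, NUL-averse transport would misread.
    Hypothetical helper, for illustration only."""
    octets = text.encode(encoding)
    return {
        "crlf": b"\r\n" in octets,  # would be taken as an end of line
        "nul": b"\x00" in octets,   # would fall foul of a ban on NUL
    }


# U+0D0A (MALAYALAM LETTER UU) is a single character containing no
# line break at all, yet its 16-bit form is exactly the octets 0x0D 0x0A.
sample = "A\u0d0a"

raw_16bit = transport_hazards(sample, "utf-16-be")
as_utf8 = transport_hazards(sample, "utf-8")
```

With the 16-bit form both hazards appear (the "A" contributes a NUL octet and
U+0D0A looks like CRLF); with UTF-8 neither does, which is why an encoding
step is needed before such text can travel over NNTP.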

What do others think?

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5