ietf-nntp NNTP and 16-bit charsets

Charles Lindsey chl at clw.cs.man.ac.uk
Tue May 1 09:53:21 PDT 2001


In <20010430115931.S88807 at demon.net> "Clive D.W. Feather" <clive at demon.net> writes:
>I think that we need to be fairly conservative about what will work, but
>not overly so.

>Based on other related discussions, I think that the sender needs to
>assume:
>- NUL is unsafe;
>- CR and LF may only appear in CRLF pairs;
>- any other octet value will survive transmission unchanged;
>- the space between CRLF pairs should be no more than 510 octets apart;
>and needs to implement dot stuffing (that is, the sequence CRLF"." within
>the transmitted data is changed to CRLF"..").

I think the length has to be at least 998 octets, and preferably
unlimited. Then any length restrictions can be applied by the format
standards, but at least let it not be said that the NNTP transport
protocol is the excuse for any restriction. One hopes such restrictions
will eventually disappear. I gather that Russ is happy to bring INN into
line with this. It might also be argued that forbidding NUL is unnecessary
(does any current implementation in fact have a problem with it?), even
though it is unlikely to be permitted by the format standards.

Essentially, I am asking that this standard, as a transport standard,
should be as liberal as possible, even though the format standards may not
yet be able to use everything provided.

So time for some text, I think. Here is a proposal to replace the relevant
paragraph of Section 4 of the draft.


Each response MUST start with a three-digit response code that is
sufficient to distinguish all responses. Certain valid responses are
defined to be multi-line; for all others, the response is contained in a
single line. All multi-line responses MUST adhere to the following format:

    1. First comes the first line of the response, as with a single line
       response, terminated with a US-ASCII CRLF.
    2. This is followed by a stream of octets which MUST NOT include 0x00
       (US-ASCII NUL), NOR either of 0x0d (US-ASCII CR) or 0x0a (US-ASCII LF)
       EXCEPT in the combination 0x0d0a (CRLF).
    3. Each such CRLF marks the end of a "line" of the stream.
    4. If any such line begins with the "termination octet" (0x2e or US-ASCII
       "."), that line MUST be "byte-stuffed" by prepending an additional
       termination octet (0x2e) to that line of the response.
    5. The last line of the stream MUST be terminated by a CRLF, and be
       followed by a terminating line consisting of a single termination octet
       (0x2e or US-ASCII ".") followed by a CRLF. Hence, a multi-line response
       is always terminated with the five octets "CRLF.CRLF" (in US-ASCII).
    6. There is NO limit on the length of a line.
    7. When examining a multi-line response, the client MUST check to see if
       the line begins with the termination octet. If so and if octets other
       than US-ASCII CRLF follow, the first octet of the line (the termination
       octet) MUST be stripped away.
    8. If the line begins with the termination octet and US-ASCII CRLF
       immediately follows it, then the response from the NNTP server is ended
       and the line containing ".CRLF" (in US-ASCII) MUST NOT be considered
       part of the multi-line response.

       NOTE: Where such an octet stream represents the body of an article, its
       interpretation as characters in some charset will be as determined by
       the standards defining the format of articles (i.e. [RFC 1036] or some
       successor thereof). It is, however, useful to note that the charsets
       regularly used with news articles (including in particular US-ASCII,
       the series defined by [ISO 8859], and UTF-8) all have the property that
       the sequence 0x0d0a represents CRLF, and therefore denotes the end of a
       line. On the other hand, charsets which represent characters as
       sequences of 16 or 32 bits (e.g. UCS-2 and UCS-4) could not be
       used as they stand, but would need to be encoded in some manner (in
       fact, UTF-8 itself is such an encoding, and the encodings defined by
       [RFC 2046] are also suitable for the purpose).

       Note also that, although this standard does not limit the length of a
       line in any way, the standards that define the format of articles may
       do so.
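
As an aside, and not as part of the proposed text, the charset point in the
NOTE can be checked with a few lines of Python. This is only a sketch: the
function name transport_problems is invented for the illustration, and
Python's "utf-16-be" codec stands in for UCS-2 (the two agree for the
characters used here).

```python
import base64


def transport_problems(octets: bytes) -> list:
    """List the ways a byte string breaks the rule on NUL, CR and LF."""
    problems = []
    if b"\x00" in octets:
        problems.append("NUL")
    # After splitting on CRLF, any CR or LF still present is a bare one.
    for line in octets.split(b"\r\n"):
        if b"\r" in line or b"\n" in line:
            problems.append("bare CR or LF")
            break
    return problems


text = "A\u010d"                  # "A" followed by U+010D
utf8 = text.encode("utf-8")       # octets 41 c4 8d
ucs2 = text.encode("utf-16-be")   # octets 00 41 01 0d

print(transport_problems(utf8))                    # nothing to report
print(transport_problems(ucs2))                    # NUL and a bare CR
print(transport_problems(base64.b64encode(ucs2)))  # safe once encoded

# A single 16-bit character can also masquerade as a line ending:
print("\u0d0a".encode("utf-16-be") == b"\r\n")
```

The 16-bit form fails on both counts, while UTF-8 and a base64 encoding of
the same octets pass untouched.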

Where a response is multi-line, the description of the command will define the
format of the response before "byte-stuffing" takes place.
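
To make the sender and receiver rules in the proposed text concrete, here is
a minimal sketch in Python. It is an illustration only, not part of the
proposed wording: the names check_octets, stuff and unstuff are invented for
the example, and it assumes the whole response (after its first line) is held
in memory rather than read from a live connection.

```python
CRLF = b"\r\n"


def check_octets(payload: bytes) -> None:
    """Item 2: no NUL, and CR or LF only in the combination CRLF."""
    if b"\x00" in payload:
        raise ValueError("NUL octet not allowed")
    for line in payload.split(CRLF):
        if b"\r" in line or b"\n" in line:
            raise ValueError("CR and LF may only appear as CRLF")


def stuff(payload: bytes) -> bytes:
    """Sender side (items 4 and 5): byte-stuff and add the terminating line.

    payload is the stream of lines, each one ended by CRLF.
    """
    check_octets(payload)
    if payload and not payload.endswith(CRLF):
        raise ValueError("last line must be terminated by CRLF")
    out = bytearray()
    # The final element after split() is the empty tail after the last CRLF.
    for line in payload.split(CRLF)[:-1]:
        if line.startswith(b"."):
            line = b"." + line          # prepend one termination octet
        out += line + CRLF              # item 6: no length check at all
    out += b"." + CRLF
    return bytes(out)


def unstuff(wire: bytes) -> bytes:
    """Receiver side (items 7 and 8): undo the stuffing, stop at ".CRLF"."""
    out = bytearray()
    for line in wire.split(CRLF):
        if line == b".":
            return bytes(out)           # terminating line is not content
        if line.startswith(b"."):
            line = line[1:]             # strip the added termination octet
        out += line + CRLF
    raise ValueError("terminating line not found")


body = b"hello\r\n.signature\r\n..\r\n" + b"x" * 2000 + b"\r\n"
wire = stuff(body)
print(wire.endswith(b"\r\n.\r\n"), unstuff(wire) == body)
```

Note that the 2000-octet line goes through unchanged, since nothing here
imposes a length limit; any such limit would come from the format standards.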



-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5


