ietf-nntp NNTP and 16-bit charsets

Charles Lindsey chl at clw.cs.man.ac.uk
Thu May 3 03:57:19 PDT 2001


In <20010502094412.H61327 at demon.net> "Clive D.W. Feather" <clive at demon.net> writes:


>Charles Lindsey said:

>>     1. First comes the first line of the response, as with a single line
>>        response, terminated with an US-ASCII CRLF.
>>     2. This is followed by a stream of octets which MUST NOT include 0x00
>>        (US-ASCII NUL), NOR either of 0x0a (US-ASCII LF) or 0x0d (US-ASCII CR)
>>        EXCEPT in the combination 0x0a0d (CRLF).

>0x0d0a, surely ? And is that the right notation ? How about:

OOPS!

>      2. This is followed by a stream of octets which consists of zero
>         or more "lines". Each line ends with a US-ASCII CRLF; with that
>         exception, the stream MUST NOT include 0x00, 0x0a, or 0x0d
>         (US-ASCII NUL, LF, and CR).

>[There is then no item 3.]

OK, or something like that. Actually, it would be better to make that #1,
and then the initial response line is just the first line of the stream
(gets rid of all those "if any"s).

>>     5. The last line of the stream MUST be terminated by a CRLF, and be
>>        followed by a terminating line consisting of a single termination octet
>>        (0x2e or US_ASCII ".") followed by a CRLF. Hence, a multi-line response
>>        is always terminated with the five octets "CRLF.CRLF" (in US-ASCII).

>      5. The last line of the stream (if any) MUST be followed by a
>         terminating line consisting of a single termination octet
>         (0x2e; US_ASCII ".") followed by CRLF in the normal way. The
>         terminating line is not part of the stream but (except when there
>         are no lines in the stream) the CRLF at the end of the last line
>         is part of the stream. A multi-line response is therefore always
>         terminated with the five octets CRLF, dot, CRLF.

Actually, with your new (2), it is already clear that the "last line" must
be complete (i.e. CRLF terminated). So all we actually need to say is that
the lines are followed by a "terminating line".

>The next bit isn't part of the format, it's a note:

I think I still prefer (7) and (8) as I rewrote them yesterday, which Russ
seems happy with.


>Looking back, I realize that this is worded purely in terms of responses.
>Probably it ought to be rewritten to work in both directions, but that can
>wait until we have the basics sorted out.

I have prepared texts for POST and IHAVE. Are there any others needed?

>>        NOTE: Where such an octet stream represents ...

>This is awfully verbose. Do we need it all, or do we want to just say that
>NNTP does not put any interpretation on the contents of some responses
>(it's not just bodies of articles, there are headers as well).

Shortened version follows.

So here is the whole thing, as I have now got it. Further comments
welcomed.


Each response MUST start with a three-digit response code that is
sufficient to distinguish all responses. Certain valid responses are
defined to be multi-line; for all others, the response is contained in a
single line. All multi-line responses MUST adhere to the following format:

    1. The response consists of a sequence of one or more "lines", each
       being a stream of octets ending with 0x0d0a (US-ASCII CRLF). Apart
       from those line endings, the stream MUST NOT include the octets
       0x00, 0x0a, or 0x0d (US-ASCII NUL, LF, and CR).
    2. The first such line contains the response code as with a single
       line response.
    3. If any subsequent line begins with the "termination octet" (0x2e or
       US-ASCII "."), that line MUST be "byte-stuffed" by prepending an
       additional termination octet (0x2e) to that line of the response.
    4. The lines of the response MUST be followed by a terminating line
       consisting of a single termination octet (0x2e or US-ASCII ".")
       followed by CRLF in the normal way. Thus a multi-line response is
       always terminated with the five octets "CRLF.CRLF" (in US-ASCII).
    5. There is NO limit on the length of a line.
    6. When interpreting a multi-line response, the "byte stuffing" MUST
       be undone; i.e. the client MUST ensure that, in any line beginning
       with the termination octet followed by octets other than US-ASCII
       CRLF, that initial termination octet is disregarded.
    7. Likewise, the terminating line ".CRLF" (in US-ASCII) MUST NOT be
       considered part of the multi-line response; i.e. the client MUST
       ensure that any line beginning with the termination octet followed
       immediately by US-ASCII CRLF is disregarded.
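
To check that items 1 to 7 hang together, here is a minimal sketch in Python
of framing a multi-line response and undoing it again. The names
encode_multiline and decode_multiline are purely illustrative and appear
nowhere in the draft, and the decoder assumes it is handed exactly one
complete response.

```python
CRLF = b"\r\n"
DOT = b"."  # the "termination octet", 0x2e


def _check_line(line: bytes) -> None:
    # Item 1: apart from the line endings, no NUL, LF or CR may appear.
    if any(octet in line for octet in (b"\x00", b"\n", b"\r")):
        raise ValueError("line contains NUL, LF or CR")


def encode_multiline(first_line: bytes, lines: list[bytes]) -> bytes:
    """Frame a multi-line response (items 1 to 4)."""
    _check_line(first_line)
    out = [first_line + CRLF]          # item 2: line carrying the response code
    for line in lines:
        _check_line(line)
        if line.startswith(DOT):       # item 3: byte-stuff a leading dot
            line = DOT + line
        out.append(line + CRLF)
    out.append(DOT + CRLF)             # item 4: terminating line
    return b"".join(out)


def decode_multiline(data: bytes) -> tuple[bytes, list[bytes]]:
    """Undo the framing (items 6 and 7) for one complete response."""
    parts = data.split(CRLF)
    # A complete response ends "CRLF.CRLF", so the split ends [..., b".", b""].
    if len(parts) < 3 or parts[-1] != b"" or parts[-2] != DOT:
        raise ValueError("missing terminating line")
    first_line, body = parts[0], parts[1:-2]
    if DOT in body:
        raise ValueError("terminating line seen before the end of the data")
    # Item 6: drop the initial dot of every byte-stuffed line.
    lines = [line[1:] if line.startswith(DOT) else line for line in body]
    return first_line, lines
```

For example, a body line ".hidden" travels as "..hidden", and a response with
no lines after the first is just the first line followed by ".CRLF".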

       NOTE: Texts using charsets which represent characters as sequences
       of 16 or 32 bits (e.g. UCS-2 and UCS-4) cannot be reliably conveyed
       in the above format. However, there is no problem with the charsets
       regularly used with news articles (including in particular
       US-ASCII, the series defined by [ISO 8859], and UTF-8) which all
       have the property that the sequence 0x0d0a represents CRLF, and
       therefore denotes the end of a line.

       Note also that, although this standard does not limit the length of
       a line in any way, the standards that define the format of articles
       may do so.
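
To illustrate the point about 16-bit and 32-bit charsets, here is a small
sketch in Python. The helper forbidden_octets is a made-up name, and the
sample text is my own; it uses U+010A because its UTF-16 form contains the
octet 0x0a even though no line feed is intended. The codec names are those of
the Python standard library.

```python
def forbidden_octets(data: bytes) -> set[int]:
    """Return which of NUL, LF and CR occur in data outside CRLF pairs."""
    stripped = data.replace(b"\r\n", b"")   # remove legitimate line endings
    return {octet for octet in (0x00, 0x0A, 0x0D) if octet in stripped}


# One line of text ending in CRLF; U+010A is LATIN CAPITAL LETTER C WITH DOT ABOVE.
sample = "\u010a ok\r\n"

for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    found = sorted(forbidden_octets(sample.encode(codec)))
    print(codec, [hex(octet) for octet in found])
```

With UTF-8 nothing forbidden turns up, because CRLF is still the octet pair
0x0d0a and no other character uses those octets. With the wider encodings the
padding octets are 0x00 and CR and LF are no longer adjacent, so the check
reports all three forbidden octets.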

In the POST command:

If posting is permitted, the article MUST be presented to the server by
the client in the format specified by RFC 1036 (or by any of its
successors or extensions). The text forming the header and body of the
message to be posted MUST be sent by the client in the format already
defined for multi-line responses (except that there is no initial line
containing a response code).  Thus a single period (".") on a line
indicates the end of the text, and lines starting with a period in the
original text have that period doubled during transmission.

In the IHAVE command:

If transmission of the article is requested, the client MUST send the
entire article, including header and body, in the format already defined
for multi-line responses (except that there is no initial line containing
a response code). Thus a single period (".") on a line indicates the end
of the text, and lines starting with a period in the original text have
that period doubled during transmission. The server MUST then return a
response code indicating success or failure of the transferal of the
article.
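
For both POST and IHAVE, what the client sends after being told to go ahead
can be sketched as follows. This is only an illustration in Python:
frame_article is a hypothetical name, and it assumes the article is already
held as CRLF-terminated lines with no stray NUL, CR or LF.

```python
CRLF = b"\r\n"
DOT = b"."


def frame_article(article: bytes) -> bytes:
    """Dot-stuff and terminate an article for POST or IHAVE.

    Unlike a multi-line response there is no initial line with a
    response code; the header lines come first.
    """
    if not article.endswith(CRLF):
        raise ValueError("article must end with CRLF")
    lines = article.split(CRLF)[:-1]        # drop the empty piece after the final CRLF
    out = []
    for line in lines:
        if line.startswith(DOT):            # double a leading period
            line = DOT + line
        out.append(line + CRLF)
    out.append(DOT + CRLF)                  # single period on a line ends the text
    return b"".join(out)


article = (
    b"From: poster@example.invalid\r\n"
    b"Subject: test\r\n"
    b"\r\n"
    b".signature follows\r\n"
    b"bye\r\n"
)
wire = frame_article(article)
```

Here the line ".signature follows" goes out as "..signature follows", and the
octets sent end with CRLF, dot, CRLF.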


-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5


