ietf-nntp NNTP and 16-bit charsets

Clive D.W. Feather clive at demon.net
Wed May 2 01:44:12 PDT 2001


Charles Lindsey said:
>> Based on other related discussions, I think that the sender needs to
>> assume:
>> - the space between CRLF pairs should be no more than 510 octets apart;

> I think the length has to be at least 998 octets, and preferably
> unlimited.

I was trying to be conservative. I will, of course, bow to whatever the
list thinks is a sensible limit (if any).

> Essentially, I am asking that this standard, as a transport standard,
> should be as liberal as possible, even though the format standards may not
> yet be able to use everything provided.

That's an understandable position if it will work.

> Each response MUST start with a three-digit response code that is
> sufficient to distinguish all responses. Certain valid responses are
> defined to be multi-line; for all others, the response is contained in a
> single line. All multi-line responses MUST adhere to the following format:
> 
>     1. First comes the first line of the response, as with a single line
>        response, terminated with an US-ASCII CRLF.
>     2. This is followed by a stream of octets which MUST NOT include 0x00
>        (US-ASCII NUL), NOR either of 0x0a (US-ASCII LF) or 0x0d (US-ASCII CR)
>        EXCEPT in the combination 0x0a0d (CRLF).

0x0d0a, surely? And is that the right notation? How about:

      2. This is followed by a stream of octets which consists of zero
         or more "lines". Each line ends with a US-ASCII CRLF; with that
         exception, the stream MUST NOT include 0x00, 0x0a, or 0x0d
         (US-ASCII NUL, LF, and CR).

[There is then no item 3.]
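To make the reworded item 2 concrete, here is a minimal Python sketch of a
check on the stream. The function name split_stream is made up for
illustration and is not taken from any draft; it assumes the whole stream is
already in memory as bytes.

```python
def split_stream(stream: bytes) -> list[bytes]:
    """Split a stream into lines, enforcing the reworded item 2.

    Returns the lines without their trailing CRLF. Raises ValueError if
    the stream is not a sequence of CRLF-terminated lines, or if NUL, LF,
    or CR appears anywhere other than in a line-ending CRLF.
    """
    if not stream:
        return []  # zero lines is explicitly allowed
    if not stream.endswith(b"\r\n"):
        raise ValueError("every line must end with CRLF")
    # Drop the final CRLF, then cut at the remaining CRLF pairs.
    lines = stream[:-2].split(b"\r\n")
    for line in lines:
        # Anything left over is a bare NUL, LF, or CR, which is forbidden.
        if any(octet in line for octet in (0x00, 0x0A, 0x0D)):
            raise ValueError("stray NUL, LF, or CR in line")
    return lines
```

An empty stream yields no lines, while a stream consisting of just CRLF
yields one empty line.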

>     4. If any such line begins with the "termination octet" (0x2e or US_ASCII
>        "."), that line MUST be "byte-stuffed" by pre-pending an additional
>        termination octet (0x2e) to that line of the response.
>     5. The last line of the stream MUST be terminated by a CRLF, and be
>        followed by a terminating line consisting of a single termination octet
>        (0x2e or US_ASCII ".") followed by a CRLF. Hence, a multi-line response
>        is always terminated with the five octets "CRLF.CRLF" (in US-ASCII).

      5. The last line of the stream (if any) MUST be followed by a
         terminating line consisting of a single termination octet
         (0x2e; US-ASCII ".") followed by CRLF in the normal way. The
         terminating line is not part of the stream but (except when there
         are no lines in the stream) the CRLF at the end of the last line
         is part of the stream. A multi-line response is therefore always
         terminated with the five octets CRLF, dot, CRLF.
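As a sanity check on items 4 and 5, the sending side can be sketched in
Python as follows. The name encode_multiline and the "100" first line in the
test data are illustrative assumptions only; the lines are assumed to have
already been validated and to carry no CRLF of their own.

```python
def encode_multiline(first_line: bytes, lines: list[bytes]) -> bytes:
    """Build the octets of a multi-line response.

    first_line is the initial response line without its CRLF; lines are
    the lines of the stream, also without their CRLF.
    """
    out = [first_line + b"\r\n"]
    for line in lines:
        # Item 4: byte-stuff any line that begins with the termination octet.
        if line.startswith(b"."):
            line = b"." + line
        out.append(line + b"\r\n")
    # Item 5: the terminating line, which is not part of the stream.
    out.append(b".\r\n")
    return b"".join(out)
```

Even with no lines in the stream, the result still ends with CRLF, dot,
CRLF, because the first CRLF is then the one that ends the initial response
line.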

The next bit isn't part of the format, it's a note:

    Note: when examining a multi-line response, the client should check
    each line to see if it begins with the termination octet. If so:
    * if that is the entire line, this is the end of the response and
      the line is not part of the response;
    * otherwise the initial dot is not part of the original line and should
      be stripped (the recipient may wish to defer this if the stream is
      likely to be forwarded using this protocol).
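The client-side check described in the note could look something like the
following Python sketch. The name decode_multiline is hypothetical, and the
sketch assumes the complete response has been read into one bytes object,
which a real client reading from a socket would not normally do.

```python
def decode_multiline(data: bytes) -> tuple[bytes, list[bytes]]:
    """Split a multi-line response into its first line and its stream lines.

    Stops at the terminating line and undoes byte-stuffing. Raises
    ValueError if no terminating line is present.
    """
    wire_lines = data.split(b"\r\n")
    first_line, rest = wire_lines[0], wire_lines[1:]
    lines = []
    for line in rest:
        if line == b".":
            # A lone dot ends the response and is not part of it.
            return first_line, lines
        if line.startswith(b"."):
            # Otherwise the initial dot was added by the sender; strip it.
            line = line[1:]
        lines.append(line)
    raise ValueError("no terminating line found")
```

A recipient that intends to forward the stream over this protocol could
skip the stripping step and pass the stuffed lines through unchanged.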

Looking back, I realize that this is worded purely in terms of responses.
Probably it ought to be rewritten to work in both directions, but that can
wait until we have the basics sorted out.

>        NOTE: Where such an octet stream represents the body of an article, its
>        interpretation as characters in some charset will be as determined by
>        the standards defining the format of articles (i.e. [RFC 1036] or some
>        successor thereof). It is, however, useful to note that the charsets
>        regularly used with news articles (including in particular US-ASCII,
>        the series defined by [ISO 8859], and UTF-8) all have the property that
>        the sequence 0x0a0d represents CRLF, and therefore denotes the end of a
>        line. On the other hand, charsets which represent characters as
>        sequences of 16 or 32 bits (e.g. UCS-2 and UCS-4) could not be
>        used as they stand, but would need to be encoded in some manner (in
>        fact, UTF-8 itself is such an encoding, and the encodings defined by
>        [RFC 2046] are also suitable for the purpose).
> 
>        Note also that, although this standard does not limit the length of a
>        line in any way, the standards that define the format of articles may
>        do so.

This is awfully verbose. Do we need it all, or do we just want to say that
NNTP does not put any interpretation on the contents of some responses?
(It's not just bodies of articles; there are headers as well.)

> Where a response is multi-line, the description of the command will define the
> format of the response before "byte-stuffing" takes place.

-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:  +44 20 8371 1138
Internet Expert     | Home:  <clive at davros.org>  | Fax:  +44 20 8371 1037
Demon Internet      | WWW: http://www.davros.org | DFax: +44 20 8371 4037
Thus plc            |                            | Mobile: +44 7973 377646 


