[ietf-nntp] Further syntax

Clive D.W. Feather clive at demon.net
Fri Mar 5 10:02:07 PST 2004


Russ Allbery said:
> Note that LIST DISTRIBUTIONS in the current draft says one or more spaces
> or tabs.  If we want to stay strictly consistent, we should remove tabs
> from there.

Oops, the syntax said one or more spaces! Okay, I've changed the text to
forbid tabs.
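
As a rough illustration of what forbidding tabs means for a parser, here is a minimal Python sketch. It assumes each LIST DISTRIBUTIONS line is a name, one or more spaces, then a description; the function name and the sample line are made up for illustration and are not taken from the draft.

```python
from typing import Tuple


def split_distribution_line(line: str) -> Tuple[str, str]:
    """Split one response line into (name, description).

    Assumption: the format is name, one or more spaces, description.
    Only SP (0x20) counts as the separator, so a TAB is never a delimiter.
    """
    name, sep, rest = line.partition(" ")
    if not sep or not name:
        raise ValueError("expected name, one or more spaces, description")
    if "\t" in name:
        # A tab where the separator should be means the line is malformed.
        raise ValueError("tab is not a valid separator")
    # Skip any additional separator spaces before the description.
    return name, rest.lstrip(" ")
```

A call such as `split_distribution_line("uk   United Kingdom")` yields the name and description, while a tab-separated line is rejected rather than silently accepted.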

>> Um, not quite. We have put some structure into an article, even though
>> it's much looser than 1036 or 2822:
> 
>>     article = 1*header CRLF body
>>     header = header-name ":" [CRLF] SP header-content CRLF
>>     header-content = *(P-CHAR / [CRLF] WS)
>>     body = *(*B-CHAR CRLF)
> 
>> P-CHAR is ASCII printable or UTF-8. B-CHAR is raw bytes except NUL CR
>> and LF. header-name is ASCII printable other than colon.
> 
>> So we've already limited header contents to UTF-8. See below.
> 
> Ack.  I had completely forgotten that, which is my fault for not being
> more on top of things.  I'm very sorry to be the source of so many delays.

'sokay.
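
To make the quoted article grammar concrete, here is a minimal Python sketch of a reader for it. It is not taken from any implementation: the function name is hypothetical, it works on raw bytes, and it checks only the overall shape (headers, blank line, body) and the folding rule, not the P-CHAR or B-CHAR restrictions.

```python
from typing import List, Tuple


def parse_article(raw: bytes) -> Tuple[List[Tuple[bytes, bytes]], bytes]:
    """Split an article into unfolded (name, content) headers and a body."""
    # article = 1*header CRLF body: the first empty line ends the headers.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line separating headers from body")
    headers: List[Tuple[bytes, bytes]] = []
    for line in head.split(b"\r\n"):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation: a CRLF that was placed before SP or TAB.
            if not headers:
                raise ValueError("continuation line before any header")
            name, content = headers[-1]
            headers[-1] = (name, content + line)
        else:
            name, colon, content = line.partition(b":")
            if not colon or not name:
                raise ValueError("header line without a name and colon")
            headers.append((name, content))
    # Drop the single SP that follows the colon (possibly after a fold).
    unfolded = [(n, c[1:] if c[:1] == b" " else c) for n, c in headers]
    return unfolded, body
```

Unfolding here removes only the CRLF and keeps the whitespace that followed it, so the content is the same whether or not a header was folded.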

> Okay, given that, the concern that I have is that in practice, many
> servers will not be doing this yet.  I think standardizing on UTF-8 is
> clearly the Right Thing To Do, but I don't really want to issue a standard
> that says this without any warnings at all to client authors that in
> practice they're going to get back stuff that is not in UTF-8.  We need to
> put in some warning somewhere.

Okay. In 3.4:

    [...] There MAY be more than one header line with the same name.
-   The content MUST NOT contain CRLF but is otherwise unrestricted;
-   in particular, it MAY be empty. A header may be
+   The content MUST NOT contain CRLF; it MAY be empty. A header may be
    "folded"; that is, a CRLF pair may be placed before any TAB or space
    [...]
    and servers MAY transfer it to the other without re-folding it.
+
+   The content of a header SHOULD be in UTF-8 and clients MUST only use
+   UTF-8 in header contents. However, if a server receives an article
+   from elsewhere that uses octets in the range 128 to 255 in some other
+   manner, it MAY pass it to a client without modification. Therefore
+   clients MUST be prepared to receive such headers and also data
+   derived from them (e.g. in the responses from the OVER extension)
+   and MUST NOT assume that they are always UTF-8.

    Each article MUST have a unique message-id; two articles offered by
    [...]
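
For client authors, the practical effect of the new paragraph can be sketched as follows. This is a minimal Python example, not a recommendation from the draft: the function name is hypothetical, and the Latin-1 fallback is purely an assumption, since the text above deliberately does not say what the other encoding might be.

```python
def decode_header_content(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode header content, preferring UTF-8 but never assuming it."""
    try:
        # The content SHOULD be UTF-8, so try that first.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Octets 128 to 255 used in some other manner. The fallback charset
        # is a guess (an assumption of this sketch, not part of the draft).
        return raw.decode(fallback, errors="replace")
```

The same care applies to data derived from headers, such as OVER responses: a client that decodes those lines strictly as UTF-8 will fail on exactly the articles this paragraph warns about.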

> If we want to accurately reflect current practice, I think we have to say
> something like SHOULD be UTF-8, clients MUST be prepared for character
> sets that are not in UTF-8.  I don't know what that does to the syntax.
> :/

I can sort the syntax. Basically, I'll replace relevant "a UTF-8 character"
entities in the syntax with "any octet but SHOULD be UTF-8" entities.

> We can say everything MUST be ASCII or UTF-8, but, well, that's not what
> Usenet looks like right now, and it's not going to change on a dime.

Okay.

>> For LIST ACTIVE.TIMES, I think we want UTF-8 to be consistent with
>> mailboxes.  Distributions are like newsgroups, I think: the names are
>> presently probably ASCII and should extend to UTF-8, while the
>> descriptions should be like LIST NEWSGROUPS.
> After thinking about it for a bit, I think you're entirely correct on both
> of those.

Okay, done.

> So the iffy parts for me are:
>  * Article headers, including the results of OVER and HDR.
>  * LIST NEWSGROUPS.
>  * The description portion in LIST DISTRIBUTIONS.
> 
> We can change the third without much impact, as that command is not widely
> used.  The first two have significant existing practice using random local
> character sets.

I did say that the third should be consistent with the second, but the
different delimiters are making it painful. So I've made it UTF-8.

The first two are now "SHOULD be UTF-8", but can be other encodings.

>> The remaining case-sensitivity in the grammar is:
>> * status field in LIST ACTIVE response;
>> * extension labels (but not their arguments).
> The first is fine.  I'm not entirely sure about the second, but I'm
> guessing we're just copying practice from other protocols, and that's
> great and what we should be doing.

It's been like that for a long time (before I took over) and nobody has
complained.

-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive at davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


