[ietf-nntp] Further syntax
Clive D.W. Feather
clive at demon.net
Fri Mar 5 10:02:07 PST 2004
Russ Allbery said:
> Note that LIST DISTRIBUTIONS in the current draft says one or more spaces
> or tabs. If we want to stay strictly consistent, we should remove tabs
> from there.
Oops, the syntax said one or more spaces! Okay, I've changed the text to
forbid tabs.
>> Um, not quite. We have put some structure into an article, even though
>> it's much looser than 1036 or 2822:
>
>> article = 1*header CRLF body
>> header = header-name ":" [CRLF] SP header-content CRLF
>> header-content = *(P-CHAR / [CRLF] WS)
>> body = *(*B-CHAR CRLF)
>
>> P-CHAR is ASCII printable or UTF-8. B-CHAR is raw bytes except NUL CR
>> and LF. header-name is ASCII printable other than colon.
>
>> So we've already limited header contents to UTF-8. See below.
>
> Ack. I had completely forgotten that, which is my fault for not being
> more on top of things. I'm very sorry to be the source of so many delays.
'sokay.
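
For illustration only, the loose article grammar quoted above can be sketched in Python as below. This is a minimal sketch, not part of the draft: the function name split_article is hypothetical, and the code assumes the whole article is already in memory as bytes. It separates headers from body at the first empty line, unfolds continuation lines (a CRLF followed by a space or TAB), and leaves header content as raw bytes.

```python
def split_article(raw: bytes):
    """Split an article into ([(name, content_bytes), ...], body_bytes).

    Hypothetical helper following the quoted grammar:
        article = 1*header CRLF body
        header  = header-name ":" [CRLF] SP header-content CRLF
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line between headers and body")

    headers = []
    for line in head.split(b"\r\n"):
        if line[:1] in (b" ", b"\t") and headers:
            # Folded continuation: drop the CRLF, keep the whitespace.
            name, content = headers[-1]
            headers[-1] = (name, content + line)
        else:
            name, colon, content = line.partition(b":")
            if not colon or not name:
                raise ValueError("malformed header line: %r" % line)
            # header-name is ASCII printable other than colon.
            headers.append((name.decode("ascii"), content))

    # Remove the single SP that the grammar requires after the colon.
    cleaned = [(n, c[1:] if c.startswith(b" ") else c) for n, c in headers]
    return cleaned, body
```

Header content is deliberately returned undecoded, since the grammar alone does not guarantee what a server will actually hand back.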
> Okay, given that, the concern that I have is that in practice, many
> servers will not be doing this yet. I think standardizing on UTF-8 is
> clearly the Right Thing To Do, but I don't really want to issue a standard
> that says this without any warnings at all to client authors that in
> practice they're going to get back stuff that is not in UTF-8. We need to
> put in some warning somewhere.
Okay. In 3.4:
[...] There MAY be more than one header line with the same name.
- The content MUST NOT contain CRLF but is otherwise unrestricted;
- in particular, it MAY be empty. A header may be
+ The content MUST NOT contain CRLF; it MAY be empty. A header may be
"folded"; that is, a CRLF pair may be placed before any TAB or space
[...]
and servers MAY transfer it to the other without re-folding it.
+
+ The content of a header SHOULD be in UTF-8 and clients MUST only use
+ UTF-8 in header contents. However, if a server receives an article
+ from elsewhere that uses octets in the range 128 to 255 in some other
+ manner, it MAY pass it to a client without modification. Therefore
+ clients MUST be prepared to receive such headers and also data
+ derived from them (e.g. in the responses from the OVER extension)
+ and MUST NOT assume that they are always UTF-8.
Each article MUST have a unique message-id; two articles offered by
[...]
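
A client-side reading of the proposed text might look like the following Python sketch. It is an assumption-laden illustration rather than anything the draft mandates: the function name decode_header_content is hypothetical, and the choice of ISO 8859-1 ("latin-1") as the fallback is arbitrary, since the draft text does not say what non-UTF-8 octets mean.

```python
def decode_header_content(raw: bytes, fallback: str = "latin-1"):
    """Return (text, was_utf8) for raw header content.

    Tries UTF-8 first (the SHOULD case); if the octets are not valid
    UTF-8, decodes with an assumed fallback charset instead of failing,
    so the client stays "prepared to receive such headers".
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # The fallback charset is a local guess, not something the
        # protocol tells us; callers can inspect was_utf8 to know.
        return raw.decode(fallback, errors="replace"), False
```

The same treatment would apply to data derived from headers, such as fields in an OVER response.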
> If we want to accurately reflect current practice, I think we have to say
> something like SHOULD be UTF-8, clients MUST be prepared for character
> sets that are not in UTF-8. I don't know what that does to the syntax.
> :/
I can sort the syntax. Basically, I'll replace relevant "a UTF-8 character"
entities in the syntax with "any octet but SHOULD be UTF-8" entities.
> We can say everything MUST be ASCII or UTF-8, but, well, that's not what
> Usenet looks like right now, and it's not going to change on a dime.
Okay.
>> For LIST ACTIVE.TIMES, I think we want UTF-8 to be consistent with
>> mailboxes. Distributions are like newsgroups, I think: the names are
>> presently probably ASCII and should extend to UTF-8, while the
>> descriptions should be like LIST NEWSGROUPS.
> After thinking about it for a bit, I think you're entirely correct on both
> of those.
Okay, done.
> So the iffy parts for me are:
> * Article headers, including the results of OVER and HDR.
> * LIST NEWSGROUPS.
> * The description portion in LIST DISTRIBUTIONS.
>
> We can change the third without much impact, as that command is not widely
> used. The first two have significant existing practice using random local
> character sets.
I did say that the third should be consistent with the second, but the
different delimiters are making it painful. So I've made it UTF-8.
The first two are now SHOULD be UTF-8, can be other encodings.
>> The remaining case-sensitivity in the grammar is:
>> * status field in LIST ACTIVE response;
>> * extension labels (but not their arguments).
> The first is fine. I'm not entirely sure about the second, but I'm
> guessing we're just copying practice from other protocols, and that's
> great and what we should be doing.
It's been like that for a long time (before I took over) and nobody has
complained.
--
Clive D.W. Feather | Work: <clive at demon.net> | Tel: +44 20 8495 6138
Internet Expert | Home: <clive at davros.org> | *** NOTE CHANGE ***
Demon Internet | WWW: http://www.davros.org | Fax: +44 870 051 9937
Thus plc | | Mobile: +44 7973 377646