[NNTP] LISTGROUP
Clive D.W. Feather
clive at demon.net
Mon Mar 28 06:02:47 PST 2005
Charles Lindsey said:
>> The formal syntax uses special non-terminals S-CHAR, S-NONTAB, and S-TEXT
>> that have two separate definitions: one that MUST be accepted, and one that
>> SHOULD be generated.
>> * Header content in articles (when unfolded) is S-CHAR.
>> * Header content in HDR/OVER responses is S-NONTAB.
>> * Newsgroup description is S-TEXT.
>> Apart from article bodies and the HELP output, the entire remaining syntax
>> is UTF-8 based.
>
>> For the record:
>
>>             MUST accept                SHOULD generate
>> S-CHAR      %x21-FF                    any UTF-8 from U+0021 upwards
>> S-NONTAB    any except TAB             any UTF-8 except TAB
>> S-TEXT      any but not beginning      any UTF-8, but beginning
>>             with TAB or SP             with U+0021 or above
>
>> (in all the above, "any" excludes NUL, CR, and LF).
>
> Now I am even more confused, because one has to define carefully when
> "accept" applies and when "generate" applies.
The above was my paraphrasing. The actual text reads:
3.6:
The content of a header SHOULD be in UTF-8. However, if a server
receives an article from elsewhere that uses octets in the range 128
to 255 in some other manner, it MAY pass it to a client without
modification. Therefore clients MUST be prepared to receive such
headers and also data derived from them (e.g. in the responses from
the OVER (Section 8.3) command) and MUST NOT assume that they are
always UTF-8. How the client will then process those headers,
including identifying the encoding used, is outside the scope of this
document.
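To make the client-side obligation concrete, here is a minimal Python sketch. It assumes the client reads OVER lines as raw bytes with the CRLF already removed. It splits on TAB before decoding and uses Python's standard "surrogateescape" error handler, so octets that are not valid UTF-8 are kept rather than guessed at. Identifying the real encoding stays out of scope, as the text says. The function names are hypothetical.

```python
def decode_field(raw: bytes) -> tuple[str, bool]:
    """Return (text, is_utf8). Octets that are not valid UTF-8 become lone
    surrogates (U+DC80..U+DCFF) instead of raising or being guessed at."""
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        return raw.decode("utf-8", "surrogateescape"), False


def parse_over_line(line: bytes) -> list[tuple[str, bool]]:
    """Split one OVER response line (CRLF already stripped) on TAB.

    Splitting happens before decoding. TAB (0x09) never occurs inside a
    multi-byte UTF-8 sequence, and these fields are S-NONTAB, so the split
    is safe whatever encoding the original header used.
    """
    return [decode_field(field) for field in line.split(b"\t")]


def reencode_field(text: str) -> bytes:
    """Recover the original octets, e.g. to pass them on unmodified."""
    return text.encode("utf-8", "surrogateescape")
```

Because surrogateescape round-trips losslessly, a client can display what it can and still hand the untouched octets to whatever later decides the encoding.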
We don't say anything specific in the description of LIST NEWSGROUPS.
Finally, the formal syntax says:
The following non-terminals require special consideration. They
represent situations where material SHOULD be restricted to UTF-8,
but implementations MUST be able to cope with other character
encodings. Therefore there are two sets of definitions for them.
Implementations MUST accept any content that meets this syntax:
S-CHAR = %x21-FF
S-NONTAB = CTRL / SP / S-CHAR
S-TEXT = (CTRL / S-CHAR) *B-CHAR
Implementations SHOULD only generate content that meets this syntax:
S-CHAR = P-CHAR
S-NONTAB = U-NONTAB
S-TEXT = U-TEXT
Clearly this needs to be made clearer.
> So an article is POSTed containing the header "Subject: !@#$". That is a
> "MUST accept", so the server accepts it. What does the server then do with
> it? That is not really our business, but having accepted it we should not
> be surprised if it stores it and/or attempts to relay it to other sites.
The intent is that that behaviour is conforming.
> So the server has stored it, and now some other client tries to READ it.
> Are you saying that your "SHOULD generate" is violated if the article is
> now sent, including that "!@#$", in response to the READ? Likewise, is that
> "SHOULD generate" violated if the server becomes a client and says IHAVE
> that article to another server, and then sends it as-is (in which case
> it is a "MUST accept" for the other site)?
No to both.
> In fact, I think it is clear that all existing implementations will simply
> include that "!@#$" in all the relevant places, simply because it is too
> much hassle and a waste of resources to try and detect these obscure
> happenings.
Exactly.
I think the text in 3.6 is approximately correct, but it mixes up roles.
I will change it:
- The content of a header SHOULD be in UTF-8. However, if a server
+ The content of a header SHOULD be in UTF-8. However, if an implementation
receives an article from elsewhere that uses octets in the range 128
- to 255 in some other manner, it MAY pass it to a client without
+ to 255 in some other manner, it MAY pass it to a client or server without
- modification. Therefore clients MUST be prepared to receive such
+ modification. Therefore implementations MUST be prepared to receive such
headers and also data derived from them (e.g. in the responses from
the OVER (Section 8.3) command) and MUST NOT assume that they are
- always UTF-8. How the client will then process those headers,
+ always UTF-8. Any external processing of those headers,
including identifying the encoding used, is outside the scope of this
document.
I've added to LIST NEWSGROUPS:
The description SHOULD be in UTF-8. However, servers sometimes
obtain the information from an external source which has used a
different encoding (one that uses octets in the range 128 to 255
in some other manner). In this case they MAY pass it on unchanged
and clients MUST be prepared to receive such descriptions.
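The following sketch shows a client honouring this LIST NEWSGROUPS text. It makes two assumptions, neither stated in this message: that each line is the group name, then one or more spaces or TABs, then the description, and that group names themselves are UTF-8. The helper name is hypothetical. The description is decoded leniently, so a non-UTF-8 one is flagged rather than rejected.

```python
import re
from typing import Tuple

# Assumed line shape: group name, one or more SP/TAB octets, then the
# description (possibly absent). Matching is done on raw bytes.
_NEWSGROUPS_LINE = re.compile(rb"([^\t ]+)(?:[\t ]+(.*))?", re.DOTALL)


def parse_newsgroups_line(line: bytes) -> Tuple[str, str, bool]:
    """Return (group, description, description_is_utf8) for one line of a
    LIST NEWSGROUPS response, CRLF already stripped."""
    m = _NEWSGROUPS_LINE.fullmatch(line)
    if m is None:
        raise ValueError("malformed LIST NEWSGROUPS line: %r" % line)
    group = m.group(1).decode("utf-8")  # assumption: group names are UTF-8
    raw = m.group(2) or b""
    try:
        return group, raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Not UTF-8: keep the octets recoverable instead of guessing a
        # charset; .encode("utf-8", "surrogateescape") restores them.
        return group, raw.decode("utf-8", "surrogateescape"), False
```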
Finally, I've changed the formal syntax to:
Implementations MUST accept any content that meets this syntax:
S-CHAR = %x21-FF
S-NONTAB = CTRL / SP / S-CHAR
S-TEXT = (CTRL / S-CHAR) *B-CHAR
and MAY pass such content on unaltered.
When generating new content or re-encoding existing content,
implementations SHOULD conform to this syntax:
S-CHAR = P-CHAR
S-NONTAB = U-NONTAB
S-TEXT = U-TEXT
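Here is a Python sketch of how the two sets of definitions divide the work. It classifies a run of S-NONTAB octets (an OVER/HDR field) and an S-TEXT value (a newsgroup description). This message does not quote CTRL, B-CHAR, P-CHAR, U-NONTAB or U-TEXT. The code uses them as I understand them from the draft's general non-terminals (for example CTRL = %x01-08 / %x0B-0C / %x0E-1F and A-CHAR = %x21-7E), so check them against the current draft before relying on this. Content marked "accept-only" MUST be accepted and MAY be passed on unaltered, but SHOULD NOT be generated.

```python
from typing import Optional

# Assumed from the draft's general non-terminals, not quoted in this message.
CTRL = set(range(0x01, 0x09)) | {0x0B, 0x0C} | set(range(0x0E, 0x20))
NUL, TAB, LF, CR, DEL = 0x00, 0x09, 0x0A, 0x0D, 0x7F


def _utf8(data: bytes) -> Optional[str]:
    """Strict UTF-8 decode; None if the octets are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def accepts_s_nontab(data: bytes) -> bool:
    # MUST accept: S-NONTAB = CTRL / SP / S-CHAR with S-CHAR = %x21-FF,
    # i.e. any octet except NUL, TAB, LF and CR.
    return all(b not in (NUL, TAB, LF, CR) for b in data)


def generates_s_nontab(data: bytes) -> bool:
    # SHOULD generate: U-NONTAB = CTRL / SP / A-CHAR / UTF8-non-ascii,
    # i.e. valid UTF-8 without NUL, TAB, LF, CR or DEL.
    text = _utf8(data)
    return text is not None and all(
        ord(c) not in (NUL, TAB, LF, CR, DEL) for c in text)


def accepts_s_text(data: bytes) -> bool:
    # MUST accept: S-TEXT = (CTRL / S-CHAR) *B-CHAR, where
    # B-CHAR = CTRL / TAB / SP / %x21-FF (any octet but NUL, LF, CR).
    if not data or not (data[0] in CTRL or data[0] >= 0x21):
        return False
    return all(b not in (NUL, LF, CR) for b in data[1:])


def generates_s_text(data: bytes) -> bool:
    # SHOULD generate: U-TEXT = P-CHAR *U-CHAR, where
    # P-CHAR = A-CHAR / UTF8-non-ascii and
    # U-CHAR = CTRL / TAB / SP / A-CHAR / UTF8-non-ascii.
    text = _utf8(data)
    if not text:
        return False
    first = ord(text[0])
    if not (0x21 <= first <= 0x7E or first >= 0x80):
        return False
    return all(ord(c) not in (NUL, LF, CR, DEL) for c in text[1:])


def classify(data: bytes, generates, accepts) -> str:
    if generates(data):
        return "generate"     # fine for new or re-encoded content
    if accepts(data):
        return "accept-only"  # MUST accept, MAY pass on unaltered
    return "reject"
```

The stricter SHOULD syntax is tested first, so anything classed "generate" is automatically also acceptable under the MUST syntax.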
I could easily be convinced that that SHOULD should be a MUST.
--
Clive D.W. Feather | Work: <clive at demon.net> | Tel: +44 20 8495 6138
Internet Expert | Home: <clive at davros.org> | Fax: +44 870 051 9937
Demon Internet | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc | |