[ietf-nntp] Further syntax

Tue Mar 9 01:46:50 PST 2004

Russ Allbery said:
>> Therefore, it would be much better to say something like:
> 
>>     "In such cases, clients will have to interpret such headers as best
>>     they can, possibly relying on out of band information not provided by
>>     the protocol."
> 
> I don't really care about the wording, but I do think that we shouldn't
> let this hold up last call.  Either one is fine; it's just a stylistic
> choice about how to word this as far as I'm concerned.  I'm happy to let
> Clive pick unless he feels he needs more input.

I'm happy with the existing wording.

We are specifying a *protocol*. What the client or server does with the
data is up to it. Some implementations will attempt to display it, and may
make various guesses or look for other headers. Some will assume UTF-8 or
$LOCALE. Some will just pass it to another process to worry about. But all
of this is outside our context.

>> Is it not clear that non-UTF-8 headers are in violation of the standard,
>> but that they MAY be tolerated (essentially that the server need not
>> waste its time doing a check)?
> They're a violation of a SHOULD, which means something slightly different
> than what you say, I think.

Indeed.

>> If a non-UTF-8 header is *not* to be a violation of the standard, then
>> we have to get rid of all that syntax for UTF8-non-ascii (or has that
>> syntax already gone)?
> I believe that Clive is changing the syntax.

Firstly, all the UTF8-non-ascii stuff is staying, because it's used in
places where we *have* said it MUST be UTF-8 (notably wildmats).

Secondly, the way I've addressed this is to have two alternative
productions for three key non-terminals:
* S-CHAR is any non-white-space character allowed in a header.
* S-NONTAB is any character allowed in a HDR or OVER response.
* S-TEXT is the description field in a LIST NEWSGROUPS response.

Draft 22 reads:

   The following non-terminals require special consideration. They
   represent situations where material SHOULD be restricted to UTF-8,
   but implementations MUST be able to cope with other character
   encodings. Therefore there are two sets of definitions for them.

   Implementations MUST accept any content that meets this syntax:

     S-CHAR   = %x21-FF
     S-NONTAB = CTRL / SP / S-CHAR
     S-TEXT   = (CTRL / S-CHAR) *B-CHAR

   Implementations SHOULD only generate content that meets this syntax:

     S-CHAR   = P-CHAR
     S-NONTAB = U-NONTAB
     S-TEXT   = U-TEXT

[CTRL is a control character, excluding NUL, CR, LF, or tab. SP is space.
P-CHAR is a UTF-8 printable character. U-NONTAB is P-CHAR, CTRL, or SP.
B-CHAR is any octet except NUL, CR, or LF. U-TEXT is a UTF-8 string not
beginning with a space, tab, or CTRL.]
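
To make the accept/generate split concrete, here is a rough Python sketch
of the two rule sets applied to raw octets. It is only an illustration
under stated assumptions, not text from the draft: the function names are
made up, the CTRL range is taken as 0x01-0x1F minus TAB, LF, and CR, the
ASCII part of P-CHAR is taken as 0x21-0x7E, and Python's strict UTF-8
decoder stands in for the draft's UTF8-non-ascii production.

```python
# Octet classes from the bracketed note above. The exact CTRL range
# (0x01-0x1F minus TAB, LF, CR) is an assumption of this sketch.
CTRL = frozenset(range(0x01, 0x20)) - {0x09, 0x0A, 0x0D}
SP = 0x20
S_CHAR = frozenset(range(0x21, 0x100))                  # %x21-FF
B_CHAR = frozenset(range(0x01, 0x100)) - {0x0A, 0x0D}   # no NUL, CR, LF
A_CHAR = frozenset(range(0x21, 0x7F))                   # assumed ASCII part of P-CHAR


def accepts_nontab(data: bytes) -> bool:
    """MUST-accept form: every octet is CTRL / SP / S-CHAR."""
    return all(b in CTRL or b == SP or b in S_CHAR for b in data)


def accepts_text(data: bytes) -> bool:
    """MUST-accept form of S-TEXT: (CTRL / S-CHAR) *B-CHAR."""
    if not data:
        return False
    return (data[0] in CTRL or data[0] in S_CHAR) and all(
        b in B_CHAR for b in data[1:]
    )


def generates_nontab(data: bytes) -> bool:
    """SHOULD-generate form: a run of U-NONTAB (P-CHAR / CTRL / SP)."""
    try:
        text = data.decode("utf-8")   # strict decoding rejects malformed UTF-8
    except UnicodeDecodeError:
        return False
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 and not (cp in CTRL or cp == SP or cp in A_CHAR):
            return False              # NUL, TAB, LF, CR, or DEL
    return True
```

With these definitions a Latin-1 field such as b"caf\xe9" passes
accepts_nontab() but fails generates_nontab(), which is the intended
asymmetry: it has to be tolerated on input but ought not to be produced.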

-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive at davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646