ietf-nntp draft-ietf-nntpext-base-17

Fri Mar 21 04:36:43 PST 2003

In <20030320122240.GI356 at finch-staff-1.thus.net> "Clive D.W. Feather" <clive at demon.net> writes:

>Charles Lindsey said:

>> P10. OUTSTANDING ISSUE (initial response line)
>>       No.

>Why not?

I am agin all arbitrary limits. The 512 limit on command length is widely
seen to have been a bad choice. Let us not compound the error.

(I even suggested an extension to extend the command line limit once, but
it died.)

>> p18. '?' in wildmats
>>       This is supposed to work with UTF-8 chars. Clearly any existing
>>       implementation of '?' will not so work.

>True, but the change to UTF-8 is known to be incompatible.

>Someone is confusing this with [...]. We agreed the wildmat text about a
>year ago!

Yes I know, but I cannot remember _why_ we allowed '?' to remain. Not that
I oppose it, but I still think it is not Bruce Lilly-proof :-( .
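
To illustrate the incompatibility, here is a minimal sketch in Python.
The standard fnmatch module merely stands in for a wildmat matcher (real
wildmats differ in detail), and the helper name is invented for this
example. Matching on raw octets lets '?' consume one byte; matching on
decoded text lets it consume one UTF-8 character.

```python
import fnmatch


def question_mark_matches(name, pattern):
    """Return (byte-wise result, character-wise result) for one match.

    fnmatch is only an assumed stand-in for a wildmat implementation.
    """
    # Octet-oriented matching, as an existing pre-UTF-8 matcher might do.
    bytewise = fnmatch.fnmatchcase(name.encode("utf-8"),
                                   pattern.encode("utf-8"))
    # Character-oriented matching, as the draft intends for UTF-8.
    charwise = fnmatch.fnmatchcase(name, pattern)
    return bytewise, charwise
```

With the name "caf\u00e9" and the pattern "caf?", the octet-oriented
match fails because the last character occupies two octets, while the
character-oriented match succeeds.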

>> p26. Extension labels and parameters
>>       It would be useful to say what characters are allowed in these.

>In the absence of any other specification they are just non-space/TAB
>sequences. I wouldn't want to restrict extensions as to what they can use,
>but I'll add a bit of text.

>>       The syntax at the end seems to say ASCII letters (hence uppercase
>>       letters) but nowhere is the syntax of parameters given.

>Um, what are you reading in the syntax that suggests just ASCII?

Sorry, it doesn't say it there, but in 6.1.2 it says the extension-label
MUST be in uppercase. Ergo it is a letter (but not necessarily an ASCII
one). I think it safest to stick with ASCII, and letters, digits and minus
should be quite sufficient (ditto for parameters). Otherwise, somebody is
going to try and do clever things with control characters, and the like.
Look at RFC 3066 for an example of restraint in what is "reasonable" to
allow.
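
A rough sketch of the restriction I have in mind, in Python. The function
name is hypothetical and the rule is only my suggestion above (ASCII
uppercase letters, digits and minus), not anything the draft currently
says.

```python
import re

# Suggested rule only: one or more of A-Z, 0-9 and "-".
_CONSERVATIVE_LABEL = re.compile(r"[A-Z0-9-]+")


def is_conservative_label(label):
    """True if label uses only ASCII uppercase letters, digits and minus."""
    return _CONSERVATIVE_LABEL.fullmatch(label) is not None
```

Under this rule "OVER" and "LIST-EXT2" pass, while a lowercase label, a
non-ASCII uppercase letter, or anything containing a control character
is rejected.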

>> p34-19   header, -> headers,
>>       Also, would it be better to say 'empty' rather than 'blank'? It
>>       needs to be absolutely clear that this line has no WS in it.

>Actually that's a Usefor matter: NNTP has the concept that there are two
>parts to an article, and ARTICLE returns both while HEAD and BODY return
>one only, but - as mentioned elsewhere - we aren't as picky.

Yes, but the semantics of the ARTICLE command says to take the headers and
the body and to join them together with a blank/empty line, rather than
saying to deliver the complete article as received. So you have to be very
careful to ensure that these amount to the same thing.
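
A small Python sketch of why care is needed. The helper names are made
up, and it assumes the split is made at the first completely empty line.
If the separator line as received contains whitespace, joining the two
parts with an empty line no longer reproduces the article.

```python
CRLF = "\r\n"


def split_article(article):
    """Split at the first truly empty line into (headers, body)."""
    head, _sep, body = article.partition(CRLF + CRLF)
    return head, body


def join_article(head, body):
    """Join headers and body with exactly one empty line."""
    return head + CRLF + CRLF + body


def round_trips(article):
    """True if split followed by join gives back the article as received."""
    return join_article(*split_article(article)) == article
```

An article whose separator is a genuinely empty line survives the round
trip; one whose "blank" line holds a single space does not.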

>> p55.  "If the optional wildmat parameter is specified, the list is limited to
>>       only the groups whose names match the wildmat (and therefore may
>>       be empty)."
>> 
>>       I believe an empty list is possible even when no wildmat is provided.

>If the server has no groups, right? Okay.

If no new groups have been added since the server was set up.

>> p67.  OUTSTANDING ISSUE
>>       Should this be 502 ("not permitted") or 503 ("there is no overview
>>       database")? In which case, why provide the command?
>>       
>>       Maybe the client did an AUTHINFO and acquired a higher privilege
>>       which made the command available to him.

>Hmm.

Just clutching at straws :-)

>> p73.  OUTSTANDING ISSUE
>>       Should this be changed to require the name to *begin* with a colon?
>> 
>>       Sorry! Don't understand the question.

>At present the proposed rule is "contains colon = metadata item,
>no colon = header". The question is whether this should be changed to
>"begins with colon = metadata item, no colon = header,
>colon elsewhere = error".

OK. In that case I agree.
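
As I understand the revised rule, it amounts to something like the
following Python sketch. The function name and the returned strings are
purely illustrative.

```python
def classify_field_name(name):
    """Classify a name under the proposed 'begins with colon' rule."""
    if name.startswith(":"):
        return "metadata"   # begins with colon = metadata item
    if ":" in name:
        # colon elsewhere = error
        raise ValueError("colon not at start of name: %r" % name)
    return "header"         # no colon = header
```

So ":bytes" would be a metadata item, "Subject" a header, and something
like "Sub:ject" an error.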

>>       BTW, is this section normative?

>Yes, it is. It shouldn't contradict anything else. Do we need to say that,
>in case of discrepancy, this section wins?

I would have thought the other sections win, especially in the case of
defining commands.

>> p77. "UTF8-1"
>>       See draft-yergeau-rfc2279bis-04.txt which is currently up for IETF
>>       last call. It includes a full syntax for UTF8, and you need to
>>       check that you conform to that and use their notation. Note also
>>       that they now exclude entirely all those cases which could give
>>       rise to code points beyond U+10FFFF.

>I'll note this. What's the timescale on this?

It is currently at IETF Last Call, so my impression is that it will
happen, and happen soon.

>> p80.  "o replacing such sequences by a "guessed" valid sequence (based on
>>       properties of the UTF-8 encoding);"
>> 
>>       That "guessing" is a definite MUST NOT in both RFC 2279 and RFC
>>       2279bis (and also in Usefor). I suggest you take it out.

>Okay. I thought I got this from the last inter-list discussion.

>Are you saying that, if I receive 0xC1 0x96, I MUST NOT convert it to 0x56?

From RFC 2279bis:

   Implementations of the decoding algorithm above MUST protect against
   decoding invalid sequences.  For instance, a naive implementation may
   decode the overlong UTF-8 sequence C0 80 into the character U+0000,
   or the surrogate pair ED A1 8C ED BE B4 into U+233B4. Decoding
   invalid sequences may have security consequences or cause other
   problems.  See Security Considerations (Section 10) below.

RFC 2279 said the same, and I wrote a similar wording into Usefor. There
are also serious suggestions that if you see something that is not valid
UTF-8 you MAY try to treat it as whatever Chinese charset you think it
might be (yes, such usage would be contrary to standards, but since when
has that been a bar to the Chinese?)
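
To make the RFC 2279bis requirement concrete, a minimal Python sketch
(helper names invented for illustration). The first function is the
naive decoding the RFC warns against; the second relies on Python's
built-in strict UTF-8 decoder, which rejects overlong forms and encoded
surrogates.

```python
def naive_two_byte(data):
    """Decode a 2-octet sequence WITHOUT validity checks (what not to do)."""
    lead, trail = data[0], data[1]
    return chr(((lead & 0x1F) << 6) | (trail & 0x3F))


def decode_strict(data):
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

The naive routine turns C1 96 into "V" (0x56), which is exactly the
"guessing" in question; the strict one refuses C1 96, C0 80 and
ED A1 8C ED BE B4 alike, and accepts U+233B4 only in its proper
four-octet form F0 A3 8E B4.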

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5