[NNTP] Future-proofing for including capabilities in responses

Mon Apr 4 01:12:05 PDT 2005

"Clive D.W. Feather" <clive at demon.net> writes:

> Is there really a need for this? Is the initial CAPABILITIES command really
> a problem? If you do it this way, you've still got the parsing problem,
> in that the line has to be parsed in a totally different manner to the
> normal CAPABILITIES response, and you've also got the question of what to
> do if the capabilities list is too long to fit into the initial
> greeting.

Right. The fewer parsers required, the easier this is to implement.
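
To illustrate how little the regular multi-line form asks of a client,
here is a minimal Python sketch of a parser for it. It assumes the
response starts with a 101 status line and ends with a lone "." line;
the function name and the sample capability list are made up for
illustration and are not taken from any real server.

```python
def parse_capabilities(raw: bytes) -> dict:
    """Parse a multi-line CAPABILITIES response into {label: [arguments]}."""
    lines = raw.decode("utf-8").split("\r\n")
    if not lines[0].startswith("101"):
        raise ValueError("unexpected status line: " + lines[0])
    caps = {}
    for line in lines[1:]:
        if line == ".":           # a lone dot terminates the multi-line block
            break
        if not line.strip():      # tolerate a stray empty line
            continue
        label, *args = line.split()
        caps[label.upper()] = args
    return caps


# Hypothetical sample response, for illustration only.
sample = (b"101 Capability list:\r\n"
          b"VERSION 2\r\n"
          b"READER\r\n"
          b"LIST ACTIVE NEWSGROUPS\r\n"
          b".\r\n")
print(parse_capabilities(sample))
```

A capability list squeezed into the greeting line would need a second
routine next to this one, which is the duplication objected to above.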

> Oh, and we haven't reserved a character that can be used to replace the
> CRLF between capabilities. Finally, you're dumping a significant amount of
> extra data on the client when it hasn't even asked for it.

This is moot: as long as clients were never allowed to interpret what
they are given there, so that NNTP-1 clients can still cope with the
data, it is fine.

> I can see lots of mess building up and I just don't see the benefits.

Good point.

> That's not what I'm suggesting (I agree it's far too late). I'm suggesting
> a "multi-line response" extension, or - if you prefer - a "response can
> include capability information" extension. But it's an *extension* that
> only gets invoked when the client asks for it (and so expects it).

OK, this is an important point I'd support.

> Having said all that (yet again), let me point out that if you really do
> want to put data in the free text, then using a delimiter like [] is a bad
> idea for at least two reasons:
> - it's not unlikely that it's been used already (having semantics in
>   natural languages);
> - it leads to more questions, such as whether you can have multiple []
>   clauses and, if so, which ones have meaning.

The same arguments can be used against relying on ANY "very unlikely"
assumption. Such assumptions don't hold in the real world, and whenever
I hear someone who appears to have grown up with English as their native
language talk about character sets, I'm alarmed - nothing good usually
comes of it.

> A better approach is to define a character sequence WHICH IS VERY UNLIKELY
> TO OCCUR IN THE WILD as an "end of free text" delimiter.

No. In the first place, the sequence must be readable without special
equipment. Hence, only printable (non-blank) ASCII characters are
eligible.
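
As a rough illustration of that constraint, a candidate delimiter could
be checked like this. This is a Python sketch only; the function name
and the sample candidates are made up for illustration.

```python
def is_printable_nonblank_ascii(octets: bytes) -> bool:
    """True if octets is non-empty and every octet lies in 0x21..0x7E."""
    return bool(octets) and all(0x21 <= b <= 0x7E for b in octets)


# Sample candidates: brackets, text with a blank, a control character,
# and a non-ASCII code point encoded as UTF-8.
for candidate in (b"[]", b"a b", b"\x1c", "\u2063".encode("utf-8")):
    print(candidate, is_printable_nonblank_ascii(candidate))
```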

> Alternatively, say that the sequence only has special meaning at the
> *start* of the free text, and then indicates that the "free text"
> isn't.

That overloads the "free-text" field, and the data is then likely to
show up in places where the user looks.

> I would suggest that suitable choices for the delimiter would be:
> - %x1C (the "field separator" control character);

not printable, unlike the rest of NNTP.

> - %xC0.9C (which is the same thing encoded in an invalid UTF-8 way; this
>   technically violates the existing syntax rules, which may or may not be
>   a good idea);

illegal and hence not an option. The client would need two UTF-8
decoders: a regular one, and one - along with a whole set of library
functions or string operations - that tolerates broken data.
No one will follow suit; IOW, this is going to be stillborn.
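
To make the problem concrete, here is a small Python sketch (the helper
name is made up for illustration). A strict UTF-8 decoder, which is what
most libraries ship, accepts the plain %x1C octet but refuses the
overlong %xC0.9C form outright:

```python
def decodes_as_utf8(octets: bytes) -> bool:
    """True if a strict UTF-8 decoder accepts the octets."""
    try:
        octets.decode("utf-8")   # Python's decoder is strict by default
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as_utf8(b"\x1c"))       # the regular one-octet encoding
print(decodes_as_utf8(b"\xc0\x9c"))   # the overlong two-octet encoding
```

Any client that feeds responses through such a decoder would fail on
the whole line before it ever got to look for a delimiter.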

> - a rare control code, such as %xC2.95 (Unicode U+0095 "Message Waiting")
>   or %xE2.81.A3 (U+2063 "Invisible Separator");

not printable ASCII, hence not an option.

> - %xCD.B3.CE.87 (U+0373 U+0387 "erotimatiko" followed by "ano teleia",
>   which I picked because these characters are unlikely to occur in that
>   order, are only two octets in UTF-8, *and* are not preserved by any of
>   the four Unicode normalisation forms);

Looks ugly. Probably is. And it's not printable ASCII...
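
For anyone who wants to check the normalisation claim, here is a small
Python sketch. The helper name is made up for illustration, and the
results depend on the Unicode data shipped with the interpreter, so
treat it as a way to look rather than as an answer:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def forms_that_change(text: str) -> list:
    """Return the normalisation forms under which text is not preserved."""
    return [f for f in FORMS if unicodedata.normalize(f, text) != text]


# The two code points named in the quoted suggestion.
for cp in (0x0373, 0x0387):
    ch = chr(cp)
    print("U+%04X" % cp, unicodedata.name(ch, "<unnamed>"), forms_that_change(ch))
```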

> - if you really feel it has to be ASCII printable text, then something
>   like "~fReE!" or something equally unreadable.

Urgh. :-)

> But I repeat that I feel it's a bad idea, and you've not justified the
> need.

OK.

-- 
Matthias Andree