[ietf-nntp] Further syntax

Russ Allbery rra at stanford.edu
Thu Mar 4 09:49:28 PST 2004


Clive D W Feather <clive at demon.net> writes:
> Russ Allbery said:

>> We could specify the delimiter for LIST EXTENSIONS as one or more spaces,
> [...]

> Okay, made that one change and nothing else. This means that the
> "normal" delimiter is one or more spaces, with the others being
> exceptions for good reasons.

Note that LIST DISTRIBUTIONS in the current draft says one or more spaces
or tabs.  If we want to stay strictly consistent, we should remove tabs
from there.  (In practice, this probably means that servers should clean
up whatever is in the disk file, since the distributions file is often
hand-edited and news administrators may introduce tabs.  But this is such
a minor command that I think consistency wins over worrying about that
issue.)
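
As a rough illustration of the cleanup mentioned above, a server could
normalize each line of a hand-edited distributions file before sending it,
so that the name and description are separated by exactly one space.  This
is only a sketch: the function name normalize_distribution_line is made up
for the example, and the assumption that each line has the form "name
<whitespace> description" is mine, not text from the draft.

```python
import re


def normalize_distribution_line(line: str) -> str:
    """Hypothetical helper: rewrite one distributions-file line so the
    name and description are separated by a single space.

    Assumes the on-disk form is: name, a run of spaces and/or tabs,
    then a free-form description.
    """
    stripped = line.strip(" \t\r\n")
    # Split only on the first run of spaces/tabs; the description may
    # itself contain whitespace, which is left untouched.
    parts = re.split(r"[ \t]+", stripped, maxsplit=1)
    if len(parts) == 1:
        # Name with no description (or an empty line).
        return parts[0]
    name, description = parts
    return f"{name} {description}"
```

The admin can keep using tabs in the file, and clients only ever see the
delimiter the specification promises.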

> Um, not quite. We have put some structure into an article, even though
> it's much looser than 1036 or 2822:

>     article = 1*header CRLF body
>     header = header-name ":" [CRLF] SP header-content CRLF
>     header-content = *(P-CHAR / [CRLF] WS)
>     body = *(*B-CHAR CRLF)

> P-CHAR is ASCII printable or UTF-8. B-CHAR is raw bytes except NUL CR
> and LF. header-name is ASCII printable other than colon.

> So we've already limited header contents to UTF-8. See below.
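
For concreteness, here is a minimal sketch of how software might split an
article along the lines of the quoted grammar.  It relies on one property of
that grammar: inside a header, CRLF is always followed by a space or tab, so
the first CRLF CRLF marks the end of the headers.  The name split_article is
hypothetical, and the sketch does not check P-CHAR or B-CHAR validity.

```python
def split_article(raw: bytes):
    """Hypothetical helper: split raw article bytes into a list of
    (header-name, unfolded header-content) pairs and the body."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line separating headers from body")

    headers = []
    for line in head.split(b"\r\n"):
        if line[:1] in (b" ", b"\t"):
            # Continuation line: CRLF followed by whitespace.
            if not headers:
                raise ValueError("continuation line before any header")
            name, value = headers[-1]
            headers[-1] = (name, value + line)
        else:
            name, colon, value = line.partition(b":")
            if not colon or not name:
                raise ValueError("malformed header line")
            headers.append((name, value))

    # Drop the whitespace that follows the colon (and any trailing WS).
    return [(n, v.strip(b" \t")) for n, v in headers], body
```

Note that this also handles the "header-name: CRLF SP content" form, where
the first physical line of the header carries no content at all.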

Ack.  I had completely forgotten that, which is my fault for not being
more on top of things.  I'm very sorry to be the source of so many delays.

Okay, given that, the concern that I have is that in practice, many
servers will not be doing this yet.  I think standardizing on UTF-8 is
clearly the Right Thing To Do, but I don't really want to issue a standard
that says this without any warnings at all to client authors that in
practice they're going to get back stuff that is not in UTF-8.  We need to
put in some warning somewhere.

If we want to accurately reflect current practice, I think we have to say
something like: content SHOULD be UTF-8, and clients MUST be prepared for
content that is not in UTF-8.  I don't know what that does to the syntax.
:/

We can say everything MUST be ASCII or UTF-8, but, well, that's not what
Usenet looks like right now, and it's not going to change on a dime.
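
To make the "MUST be prepared" part concrete, a client could try UTF-8 first
and fall back to something else when decoding fails.  The following is a
sketch only: the function name decode_header and the choice of Latin-1 as
the fallback are assumptions for the example, since nothing in the protocol
tells the client which local character set a server actually used.

```python
def decode_header(raw: bytes, fallback: str = "latin-1") -> str:
    """Hypothetical client-side helper: decode header bytes as UTF-8,
    falling back to an assumed local character set on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The fallback charset is a guess; the server gives no hint.
        # errors="replace" keeps the client from failing outright.
        return raw.decode(fallback, errors="replace")
```

Pure ASCII decodes identically either way, so only data with the high bit
set ever reaches the fallback path.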

> Whatever we do for article headers (see below) should apply to HDR and
> OVER.

Absolutely agreed.

> So you would say that "UTF-8" in each of the above should be relaxed to
> "arbitrary high-bit-set characters", but the remaining limitations
> (e.g. no spaces for distributions) should remain?

> I can do this, but on the other hand we *do* say that a purpose of this
> update is to make UTF-8 the primary character set.

Yes, but that applies to NNTP commands and their parameters, which we've done.
Multiline responses are somewhat of a different case in that the major
ones (article fragments of various kinds) are mostly outside the scope of
the NNTP standard and are governed by various other standards and
not-standards that haven't made this jump yet.

> For LIST ACTIVE.TIMES, I think we want UTF-8 to be consistent with
> mailboxes.  Distributions are like newsgroups, I think: the names are
> presently probably ASCII and should extend to UTF-8, while the
> descriptions should be like LIST NEWSGROUPS.

After thinking about it for a bit, I think you're entirely correct on both
of those.

So the iffy parts for me are:

 * Article headers, including the results of OVER and HDR.
 * LIST NEWSGROUPS.
 * The description portion in LIST DISTRIBUTIONS.

We can change the third without much impact, as that command is not widely
used.  The first two have significant existing practice using random local
character sets.

> The remaining case-sensitivity in the grammar is:
> * status field in LIST ACTIVE response;
> * extension labels (but not their arguments).

The first is fine.  I'm not entirely sure about the second, but I'm
guessing we're just copying practice from other protocols, and that's
great and what we should be doing.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>


