[ietf-nntp] Further syntax

Wed Mar 3 22:39:09 PST 2004

Clive D W Feather <clive at demon.net> writes:

> Please note:

> * Delimiters between fields vary:
>   HDR                - single space
>   LIST ACTIVE        - one or more spaces
>   LIST ACTIVE.TIMES  - one or more spaces
>   LIST DISTRIB.PATS  - colon
>   LIST DISTRIBUTIONS - one or more spaces
>   LIST EXTENSIONS    - single space
>   LIST NEWSGROUPS    - white space
>   NEWGROUPS          - one or more spaces
>   OVER               - single tab

> I appreciate that many of these are for historical reasons, but is there
> any tidying up we want to do?

We could specify the delimiter for LIST EXTENSIONS as one or more spaces,
which would make all of the LIST commands consistent except for LIST
DISTRIB.PATS (which I don't care enough about to change and where a change
would be substantially incompatible) and LIST NEWSGROUPS (where we really
can't get rid of the tab for historical reasons).
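
For a client parser, "one or more spaces" and "single space" are different rules. Here is a minimal Python sketch, assuming a LIST ACTIVE line has the four fields group, high, low, status. The function name is hypothetical. It collapses runs of spaces but, unlike a bare str.split(), does not also split on tabs.

```python
def parse_list_active(line: str) -> tuple[str, int, int, str]:
    """Parse one LIST ACTIVE line: group, high, low, status (hypothetical helper).

    Fields are separated by one or more spaces. Splitting on " " and
    dropping empty strings collapses runs of spaces but, unlike a bare
    str.split(), does not treat tabs as delimiters.
    """
    fields = [f for f in line.split(" ") if f]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    group, high, low, status = fields
    return group, int(high), int(low), status
```

For example, parse_list_active("misc.test  3002322 3000234   y") returns ("misc.test", 3002322, 3000234, "y").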

HDR I think needs to be a single space so that we can accurately represent
headers that begin with whitespace.
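
Here is a sketch of why a single-space delimiter matters for HDR, assuming each response line is a first field followed by one space and then the raw header content. The helper name is hypothetical.

```python
def parse_hdr_line(line: str) -> tuple[str, str]:
    """Split one HDR response line at the first space only (hypothetical helper).

    The single space is the whole delimiter, so any whitespace that
    starts the header content itself is kept in the returned value.
    """
    first, sep, value = line.partition(" ")
    if not sep:
        raise ValueError(f"no space delimiter in HDR line: {line!r}")
    return first, value
```

parse_hdr_line("3000234  indented") returns ("3000234", " indented"), so the leading space survives. Splitting on runs of whitespace would silently drop it.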

OVER is unchangeable at this point if we want implementations to be able to
largely reuse XOVER code with a few cleanups.
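
Existing XOVER parsers treat every tab as exactly one delimiter. Here is a minimal sketch, assuming the traditional default field order (article number, Subject, From, Date, Message-ID, References, byte count, line count). The function and field labels are illustrative and not taken from the draft.

```python
# Assumed default overview order; labels are illustrative only.
DEFAULT_FIELDS = ["number", "Subject", "From", "Date", "Message-ID",
                  "References", "Bytes", "Lines"]


def parse_over_line(line: str) -> dict:
    """Split an OVER/XOVER line on single tabs (hypothetical helper).

    Each tab is one delimiter, so an empty field (e.g. no References)
    appears as "" instead of being skipped.
    """
    fields = line.split("\t")
    record = dict(zip(DEFAULT_FIELDS, fields))
    # Any further fields (extra headers from LIST OVERVIEW.FMT) are kept
    # positionally rather than interpreted.
    record["extra"] = fields[len(DEFAULT_FIELDS):]
    return record
```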

> * In general leading and trailing space is forbidden unless it's part of
> the underlying data (e.g. it's allowed for ARTICLE but not for
> LISTGROUP; LIST NEWSGROUPS can't have leading space but can have
> trailing).

This sounds fine to me.
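
The rule above can be sketched as a per-response check. The helper and its allow_trailing flag are hypothetical. LISTGROUP would use the default, LIST NEWSGROUPS would pass allow_trailing=True, and ARTICLE would not be checked at all.

```python
def check_outer_space(line: str, allow_trailing: bool = False) -> None:
    """Reject leading space, and trailing space unless allowed (hypothetical).

    Not for responses like ARTICLE, where outer space may be real data.
    """
    if line.startswith(" "):
        raise ValueError(f"leading space in {line!r}")
    if not allow_trailing and line.endswith(" "):
        raise ValueError(f"trailing space in {line!r}")
```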

> * I had to guess what characters are allowed in some contents. Here's the
> choices I've made:
>   HDR and OVER - the header contents are UTF-8 printable, they cannot
>     contain tabs or other controls, but may contain spaces.

Hm.  In practice, HDR and OVER are going to return the raw octets from the
article.  This may or may not be in UTF-8.  I'm not sure what to do about
this.  Obviously, 8-bit characters in article headers mean that the
articles are violating the underlying article format standard, at least at
present (and non-UTF-8 characters will probably always mean that), but on
the other hand we've tried to keep NNTP agnostic about the underlying
articles and people are definitely using NNTP with locally-defined
character sets.

I think we should specify HDR and OVER as containing raw bytes excluding
the usual suspects (CR, LF, NUL) and, in the case of OVER, TAB.  Isn't
that what we basically do for ARTICLE and friends right now?
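
Here is a sketch of the "raw bytes minus the usual suspects" rule, working on octets and never decoding. The function name is hypothetical. 8-bit octets pass whether or not they form valid UTF-8.

```python
BASE_FORBIDDEN = {0x00, 0x0A, 0x0D}  # NUL, LF, CR


def check_raw_field(data: bytes, is_over: bool = False) -> None:
    """Accept any octets except CR, LF and NUL, plus TAB for OVER (hypothetical).

    No character-set decoding is attempted, so local 8-bit character
    sets pass through unchanged.
    """
    forbidden = (BASE_FORBIDDEN | {0x09}) if is_over else BASE_FORBIDDEN
    bad = sorted({b for b in data if b in forbidden})
    if bad:
        raise ValueError(f"forbidden octets {bad!r} in {data!r}")
```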

>   HELP - as for article body; in particular, no requirement to be UTF-8.

Seems fine.

>   LIST ACTIVE - fourth field is UTF-8 printable, no spaces or controls.

Right, since it can contain a newsgroup name and we've specified those to
be UTF-8.
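
Here is a rough sketch of checking the fourth LIST ACTIVE field as UTF-8 with no spaces or controls. The helper is hypothetical. "Printable" is approximated here by rejecting Unicode categories Z* (separators, including space) and C* (controls, format, unassigned and so on). That is an assumption, not the draft's definition.

```python
import unicodedata


def check_active_status(field: bytes) -> str:
    """Decode the status field as UTF-8 and reject spaces/controls (hypothetical).

    Raises UnicodeDecodeError (a ValueError) if the octets are not UTF-8.
    """
    text = field.decode("utf-8")
    if not text:
        raise ValueError("empty status field")
    for ch in text:
        # Assumed approximation of "printable": no Z* or C* categories.
        if unicodedata.category(ch)[0] in ("Z", "C"):
            raise ValueError(f"disallowed character {ch!r} in {text!r}")
    return text
```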

>   LIST ACTIVE.TIMES - second field has no limit on the number of digits;
>     third field is UTF-8 and must start with a printable character, but
>     may contain spaces or controls.
>   LIST DISTRIBUTIONS - both fields are UTF-8; the distribution must not
>     contain spaces or controls, but the description can.
>   LIST NEWSGROUPS - the description is UTF-8 and must start with a
>     printable character, but may contain spaces or controls.

These are all other places where people may be using local character sets.
I'm not sure that we really want to dive into specifying a character set
here, although in practice if the newsgroup names are in UTF-8, it doesn't
work very well for the descriptions to be in any other character set.  On
the other hand, there is *substantial* existing practice for LIST
NEWSGROUPS containing random local character sets, and it's not clear what
servers should really do about that.

I think that really resolving this is a bit out of the scope of our
working group, since the newsgroup description is really a USEFOR thing.
I'm hesitant to dive into the middle of these particular fights, and would
prefer to bail and specify that they're in the same sort of unspecified
character set as we say for ARTICLE.  I know this isn't ideal, but for the
client author it *is* accurate and is what they will encounter in practice
when using these existing commands.

>   LIST OVERVIEW.FMT - all the text quoted in 8.5.2.2 is case-sensitive.

Hm.  It doesn't make a lot of sense to me for this text to be
case-sensitive when everywhere else in NNTP header names are
case-insensitive.
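
If OVERVIEW.FMT followed the rest of NNTP, a client would match entries case-insensitively. Here is a sketch, assuming entries look like "Subject:" or "Xref:full", where the header name is the part before the first colon. The helper is hypothetical.

```python
def fmt_entry_matches(entry: str, header: str) -> bool:
    """Return True if an OVERVIEW.FMT entry names the given header (hypothetical).

    Only the text before the first colon is compared, ignoring case.
    """
    name, _, _ = entry.partition(":")
    return name.lower() == header.lower()
```

Under this reading, "SUBJECT:" and "Subject:" name the same field, as header names do everywhere else in the protocol.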

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>