[NNTP] Suggestions for NNTP extensions (CAPABILITIES)

Sun Sep 21 20:45:04 PDT 2008

Julien ÉLIE <julien at trigofacile.com> writes:

>>>    * LIST DISTRIBUTIONS (with a wildmat for the area?)
>>>    * LIST MODERATORS
>>>    * LIST MOTD
>>>    * LIST SUBSCRIPTIONS
>>
>> I certainly agree with standardizing all of these based on the INN
>> implementation.  The implementations of these haven't changed in quite
>> some time and for whatever they're worth to the world, we may as well
>> write a specification for them.
>
> The same as what there is in INN?  So no second argument for LIST
> DISTRIBUTIONS or the others?  No new LIST keywords?

There definitely should be LIST capabilities added.  I didn't mean to
imply not doing that.

For any changes to the commands, well, if you feel inspired to change what
INN is currently doing and then document the new behavior, I think that's
certainly fine.  I think that these commands have existed for a long time
without a lot of interest in adding new parameters, though, so I don't
think it would be a big loss to just end up with what INN does now.  But I
suppose while we're writing standards, we should make them as general as
possible.

> All right.  Easy to do for nnrpd.
> I believe the same applies to innd (junk for j and ignore for x).

Yes.

The only possible concern is what it would do to some actsync setups, if
people are relying on replication of those flags.

>> So I think INN should actually not return any flags other than the ones
>> already standardized and the alias flag.  Standardizing aliasing
>> requires deciding how it's supposed to work, but I think it's a useful
>> capability.
>
> Isn't what INN does right for them?
>
>    If the <flag> field begins with an equal sign, the newsgroup is an
>    alias.  Articles cannot be posted to that newsgroup, but they can be
>    received from other sites.  Any articles received from peers for that
>    newsgroup are treated as if they were actually posted to the group
>    named after the equal sign.  Note that the Newsgroups: header of the
>    articles is not modified.  (Alias groups are typically used during a
>    transition and are typically created manually with ctlinnd(8).)  An
>    alias should not point to another alias.
>
> Just a bit of rewording for an RFC is needed.

I don't think INN currently returns the aliased group in the POST error,
which would be nice to have.  It used to, but we lost that with the new
overview mechanism.  Otherwise I think INN's implementation is probably a
good reference.
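
As a rough illustration of the aliasing semantics quoted above (this is
not INN's actual code), the lookup could be sketched in Python as follows.
The flags dictionary and the resolve_alias name are invented for the
example; only the leading "=" convention and the "no alias to an alias"
rule come from the quoted documentation.

```python
def resolve_alias(flags, group):
    """Return the group under which articles for `group` are filed.

    `flags` is a hypothetical mapping of newsgroup name -> flag field,
    roughly what one would get from reading an active file.
    """
    flag = flags[group]              # KeyError if the group is unknown
    if not flag.startswith("="):
        return group                 # ordinary group, not an alias
    target = flag[1:]                # name after the equal sign
    if flags[target].startswith("="):
        # The documentation says an alias should not point to another
        # alias, so this sketch refuses to follow a chain.
        raise ValueError(f"{group} is aliased to another alias: {target}")
    return target
```

A server that wanted to mention the alias target in its POST rejection
could call something like this to find the name to report.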

> We could say that a server which complies with PAT does not parse the
> pattern argument the same way it would for other commands.
>
> RFC 4643 suggests that something like that could be done:
>
>   Note that a server MAY (but is not required to) allow white space
>   characters in usernames and passwords.  A server implementation MAY
>   blindly split command arguments at white space and therefore may not
>   preserve the exact sequence of white space characters in the username
>   or password.
>
> It would remove the problem of spaces for a really good PAT.

Yeah, we could do that.  I don't really like it -- note that this wording
occurs only in the deprecated AUTHINFO command, and we talked about that
some at the time, IIRC -- but it's probably the only sensible thing that
can be done without changing the command completely.
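
To make the idea concrete, here is a minimal Python sketch of a parser
that does not blindly split at white space.  It assumes a hypothetical PAT
command with the same argument order as XPAT (header, range, then a single
pattern); the split_pat_command name is made up for the example.

```python
def split_pat_command(line):
    """Split 'PAT header range pattern', keeping spaces in the pattern.

    Only the first three fields are split at white space; everything
    after them is handed back untouched as the pattern.
    """
    parts = line.rstrip("\r\n").split(None, 3)
    if len(parts) != 4:
        raise ValueError("expected: PAT header range pattern")
    keyword, header, article_range, pattern = parts
    return keyword.upper(), header, article_range, pattern
```

A blind line.split() on the same input would cut a pattern such as
"*hello   world*" into two arguments and lose the run of spaces.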

>> The second, as you mention below, is the encoding problem, which is
>> very hairy and difficult to deal with.
>
> Only compare octets then.  It is what XPAT does by the way.
> Otherwise, if we care about encoding, we will not find a decent solution...

Well, you can require the pattern be in UTF-8 and do RFC 2047 decoding of
the header before doing the matching, which would then work properly for
all RFC-2822-compatible articles.  Doesn't help with articles with 8-bit
headers that aren't in UTF-8, but the standards say you're not supposed to
do that anyway.
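
A small Python sketch of that approach, under two assumptions that are not
part of any specification: the pattern arrives as UTF-8 and has already
been decoded to a string, and Python's fnmatchcase stands in for a real
wildmat matcher (the two syntaxes are similar but not identical).

```python
from email.header import decode_header, make_header
from fnmatch import fnmatchcase


def header_matches(raw_value, pattern):
    """Match `pattern` against a header value after RFC 2047 decoding.

    `raw_value` is the header body as it appears in the article;
    `pattern` is the already-decoded pattern string.
    """
    decoded = str(make_header(decode_header(raw_value)))
    return fnmatchcase(decoded, pattern)
```

With a header value of "Julien =?ISO-8859-1?Q?=C9LIE?=", a pattern ending
in the decoded name matches, whereas an octet-for-octet comparison against
the raw value would not.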

The body is way harder.  You'd need a full-blown MIME implementation in
the server.

>> IMAP has addressed the search problem at some length, and my impression
>> was that it wasn't at all simple to deal with.  I'm afraid that doing a
>> good job of it is going to require quite a lot of work.

> It would be another extension (SEARCH for instance, if anybody wishes to
> write it) and not tackled by PAT.  That would then allow us to standardize
> a real PAT, and not only an informational draft for XPAT; I do not know
> what the best thing to do is.

Well, I think that if we're going to wait for someone to write up SEARCH
for NNTP, we're going to be in for a long wait.  So if we want to
standardize PAT, we probably have to do it with most of its existing
limitations.  I think the main question is whether it's work worth doing,
but I can't see how it would hurt.

>>> 3/ Something to deal with large article numbers.  What can be done?
>>>   An extension?  But what kind of capability and use?

>> IIRC, Clive had an extension proposal for how to deal with this.

> Oh, great.  It should then be proposed to Giganews, as we discussed last
> month on news.software.nntp.

It would have been on the list shortly before the final publication, which
was when we discussed at the last minute what to do about this problem.
I'm pretty sure he wrote something up, although I don't remember how far
we got in discussing it.

>> XBATCH is worth documenting.  I don't know if it's worth standardizing
>> as BATCH, but I wouldn't mind at all.  Most of the work there will be
>> defining the batch format, and that will depend on how many different
>> transforms people feel like writing up (c7unbatch, cunbatch, gunbatch,
>> etc.).

> Why should it be specified what the user does with the batch?  Let's
> give an opaque stream of n octets.  What the news server does with that
> is of no concern.  POST does not define storage methods for instance.
> Or is there something I am missing?

A standard needs to specify how the data is used.  I suppose we could
standardize a pure transport mechanism, but that's not horribly
interesting and doesn't include the information required to write real
interoperable implementations that do anything useful.

In practice, there are five interesting batch formats: straight rnews,
the binary compressed formats for compress, gzip, and bzip2 (cunbatch,
gunbatch, and bunbatch respectively), and c7unbatch, which is cunbatch
with a transport encoding that I'm not sure anything else uses:

    The encoding uses characters from 0x20 (' ') through 0x7A ('z').
    (That fits nicely into the UUCP 'f' protocol by Piet Beertema.)
    First, expand three eight-bit characters into four six-bit ones.
    Collect until we have 13, and spread the last one over the first 12,
    so that we have 12 6.5-bit characters.  Since there are very few
    half-bit machines, collect them into pairs, making six 13-bit
    characters.  We can do this as A * 91 + B where A and B are less than
    91 after we add 0x20 to make it printable.
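
The arithmetic of the last step can be checked with a short Python sketch.
This only illustrates how one 13-bit value fits into two printable
characters; the order of the two characters and the earlier bit packing
are assumptions here, not something taken from the real encoder.

```python
OFFSET = 0x20   # first printable character used (' ')
BASE = 91       # 91 * 91 = 8281, enough for 2**13 = 8192 values


def encode13(value):
    """Encode one 13-bit value as two characters in ' ' through 'z'."""
    if not 0 <= value < (1 << 13):
        raise ValueError("value does not fit in 13 bits")
    a, b = divmod(value, BASE)      # value == a * 91 + b
    return chr(a + OFFSET) + chr(b + OFFSET)


def decode13(pair):
    """Inverse of encode13 for a two-character string."""
    a = ord(pair[0]) - OFFSET
    b = ord(pair[1]) - OFFSET
    return a * BASE + b
```

Since the largest 13-bit value is 8191 = 90 * 91 + 1, neither character
ever goes past 0x20 + 90 = 0x7A ('z'), which agrees with the range given
in the description.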

All of them except c7unbatch are fairly easy to document.  c7unbatch we
might be able to get away with just declaring obsolete.
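
For comparison, straight rnews really is simple.  As far as I know, each
article in a batch is preceded by a line of the form "#! rnews <size>",
where <size> is the article length in octets.  A minimal Python sketch of
a splitter under that assumption (the function name is invented):

```python
def split_rnews_batch(data):
    """Split an uncompressed rnews batch (bytes) into its articles."""
    marker = b"#! rnews "
    articles = []
    pos = 0
    while pos < len(data):
        eol = data.index(b"\n", pos)         # end of the "#! rnews" line
        header = data[pos:eol]
        if not header.startswith(marker):
            raise ValueError("missing '#! rnews' separator")
        size = int(header[len(marker):])     # article length in octets
        start = eol + 1
        end = start + size
        if end > len(data):
            raise ValueError("batch is truncated")
        articles.append(data[start:end])
        pos = end
    return articles
```

The compressed formats would then only need a description of the leading
"#! cunbatch" style line and of the compression applied to what follows.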

>> Does the Diablo implementation do streaming for header feeds?  If so,
>> we need a header-only equivalent for TAKETHIS, maybe TAKEHEADER.
>
> I think it does streaming but I am not sure.  And CHECK also needs to be
> split in order to know whether the news server only has the headers (?)
> And IHAVEHDR?

We'd need more information about how it's used.  Does it rely on being
able to distinguish between CHECK for a header and CHECK for a full
article?  What does a Diablo implementation with a header feed do if it
gets a traditional CHECK for an article for which it has only the header?
It may be that it's not that complicated.

Given that it's a new protocol, it's not clear to me that there's any
reason to bother with an IHAVE equivalent.  Again, though, we'd have to
look at Diablo and see if it implements both.

> It makes me think that another extension could be to change the
> newsgroups subscriptions of the remote peer.  FEEDADD wildmat, FEEDDEL
> wildmat for instance.  Server A connects to server B and *asks* for the
> feed.  But the protocol will not be proper because A needs to ask for an
> article and then wait.  Server B could answer a code within n seconds to
> say it has no articles.  And A could ask again...  A bit complicated,
> though.

Yeah, this has been kicked around for a while, although I don't think
anyone's gone as far as writing up an I-D for it.

Also see http://packages.qa.debian.org/g/gup.html

>>> 8/ INN also implements a wider syntax for wildmats (with "[" and "]" for
>>>   instance).  Ranges like "-12" and "-" are also recognized in order to
>>>   be more symmetric with the already defined "12-".
>>>   Notwithstanding, I do not think it needs an extension for that...

>> The wildmat extensions would be nice to write up formally, but I'm not
>> sure it's worth a full extension.

> But in what kind of document could it be written?  An informational
> draft for only 12 lines of real content?  :-) Or in a new man page for
> INN?

Well, one of the things that would be nice to do is to move RFC 2980 to
Historic.  One of the things that's currently only in RFC 2980 is the
extended wildmat syntax with character sets.  (Others are the LIST
extensions that you mention above, and PAT.)  If we got those into some
other published RFCs, we could declare RFC 2980 historic; the other stuff
in it is all deprecated and no longer generally implemented.
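
The open-ended ranges from point 8/ ("-12" and "-" alongside "12-") are
small enough to show in a few lines of Python.  The semantics below are an
assumption for illustration: a missing low bound means article 1 and a
missing high bound means no upper limit.

```python
def parse_range(text):
    """Parse 'n', 'n-', 'n-m', '-m' or '-' into a (low, high) tuple.

    high is None when the range has no upper bound.
    """
    low, sep, high = text.partition("-")
    if not sep:                      # a single article number
        number = int(low)
        return number, number
    return (int(low) if low else 1,
            int(high) if high else None)
```

Written this way, "-12" and "-" fall out of the same code path as "12-",
which is the symmetry being asked for.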

> Thanks, Russ.
> I saw XML templates, which are a good starting point:
>    http://tools.ietf.org/tools/templates/

You can also grab the XML source of the current USEPRO draft as another
example at:

http://www.eyrie.org/~eagle/usefor/drafts/draft-ietf-usefor-usepro-12.xml

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>