[NNTP] NNTP URI draft

Charles Lindsey chl at clerew.man.ac.uk
Tue Mar 8 09:29:33 PST 2005


In <87y8czuyd9.fsf at windlord.stanford.edu> Russ Allbery <rra at stanford.edu> writes:

>Charles Lindsey <chl at clerew.man.ac.uk> writes:


>So are you taking over as editor of this draft, or is Pete planning on
>publishing a new version if you get consensus on other mailing lists?  (I
>have no idea why usefor would be involved; this has nothing to do with
>usefor.)

Pete has agreed to incorporate my texts if I can persuade him that the
Netnews community have been consulted and are happy. I put it before
Usefor because they are a likely bunch of people who might want to
comment. Officially, it should be discussed on the uri at w3.org list, but I
am happy for it to be tossed around here for a bit if you are willing.

>..., although that means the draft will have to wait until we publish the
>base NNTP standard so that there's a definition of wildmat to refer to.

I think we can rely on the uri at w3.org process to drag on until the
nntp-draft is well into its IESG review :-) .

>It's not at all clear to me that we want to support commas and "!",
>though.  Maybe the news URL should just use wildmat-item, appropriately
>escaped?

I don't see what harm the full wildmat would do. I imagine that any
system implementing this form of the URI would simply issue a LIST ACTIVE
command with that wildmat, and then say to the user "here are all the
groups you asked about - which one would you like to
read/subscribe/whatever?".
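
For illustration only, a minimal Python sketch of that step. The function
name is hypothetical, and it assumes that the wildmat arrives %-encoded in
the URI and that the server accepts LIST ACTIVE with a wildmat argument:

```python
from urllib.parse import unquote

def list_active_command(uri_wildmat: str) -> str:
    """Build the NNTP command line for a wildmat taken from a news URI."""
    # Assumption: the wildmat was %-encoded for the URI, so undo that
    # before handing the raw pattern to the server.
    wildmat = unquote(uri_wildmat)
    return f"LIST ACTIVE {wildmat}\r\n"
```

The agent would then present whatever list of groups comes back and let the
user pick one.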


>> Anyway, here is my current working text.

>> 2.  The News URI Scheme

>>       newsgroup-name  = 1*%d33-126

>Use the definition from NNTP.  Wildmat patterns, if we add them, should be
>a separate production since they act differently.

OK, I now have:
      newsgroup-name  = %x21-29 / %x2B / %x2D-3E / %x40-5A / %x5E-7E 
                          ; excludes "*" "," "?" "[" "\" "]" 
I have omitted the UTF8-non-ascii, because that is covered by the next
bit.
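
As a rough check of those ranges, here is a small Python sketch. The helper
name is hypothetical, and it assumes the production is meant to be applied
to each character of a name of one or more characters:

```python
# Octet ranges copied from the newsgroup-name production above.
ALLOWED_RANGES = [(0x21, 0x29), (0x2B, 0x2B), (0x2D, 0x3E),
                  (0x40, 0x5A), (0x5E, 0x7E)]

def is_newsgroup_name(name: str) -> bool:
    """Return True if every character of name lies in an allowed range."""
    if not name:
        return False  # assumption: an empty name is not acceptable
    return all(
        any(lo <= ord(ch) <= hi for lo, hi in ALLOWED_RANGES)
        for ch in name
    )
```

With this, "comp.lang.perl.modules" passes, whereas anything containing one
of the six excluded wildmat characters is rejected.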

>>    All other characters MAY be used freely to represent themselves. It
>>    is not precluded that future extensions to the Netnews standard may
>>    permit octets outside of the given ranges, in which case they too
>>    MUST be %-encoded (except perhaps when used in an IRI [RFC 3987]).

>Referring to UTF-8 here would be a good idea to provide some motivation as
>to why this might change.

Hmmmm! I think a low-profile on UTF-8 would be advisable given previous
furores on that issue. How about:

   .... It is not
   precluded that future extensions for internationalized <newsgroup-name>s
   may permit octets outside of the given ranges, in which case they too MUST
   be %-encoded (except perhaps when used in an IRI [RFC 3987]).
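
To make the intent concrete, a sketch in Python of %-encoding such octets.
It works on raw octets so as to assume nothing about the character set. The
function name is hypothetical, and it deals only with octets outside
%x21-7E plus "%" itself; which other ASCII characters need escaping is
governed by the part of the draft not quoted here:

```python
def encode_octets(raw: bytes) -> str:
    """%-encode octets outside the printable ASCII range %x21-7E."""
    out = []
    for b in raw:
        # Assumption: "%" is escaped too, so the result stays unambiguous.
        if 0x21 <= b <= 0x7E and b != 0x25:
            out.append(chr(b))
        else:
            out.append(f"%{b:02X}")
    return "".join(out)
```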

>> 2.1  The newsURI contains an <article>

>>    A <message-id> corresponds to the <msg-id> of [RFC 2822] and to the
>>    Message-ID of section 2.1.5 of [RFC 1036], but without the enclosing
>>    "<" and ">". It MUST be the message identifier of an actual Netnews
>>    article

>Bad use of MUST.  Referring to a non-existent article is not a violation
>of the standard.  As you've worded it right now, a client would need to
>ensure through some other means that the article it's asking for actually
>exists before using a news URI referencing it.


Hmmmm! It is really a requirement to be observed by users (for sure, if
they violate it they ain't going to get any article returned). I was
trying to avoid arguments about the syntactic form of a message
identifier, and this seemed a simple way of saying "it MUST be a valid
message-id" without needing to say what "valid" actually meant. Would you
settle for "SHOULD"?


>>    The resource retrieved by this URI is the Netnews article with the
>>    given <message-id>.  In a properly working Netnews system, the same
>>    article will be obtained whatever server is accessed for the purpose
>>    (assuming the server in question carried that article in the first
>>    place and that it has not expired).

>There's got to be a better way of phrasing this.

It was really aimed at URI-savvy people who might not understand fully the
nature of Usenet. I am open to suggestions for rewording.

>> 2.2  The newsURI contains a <group>

>>    According to [RFC 1036], the <newsgroup-name> will in practice be a
>>    period-delimited hierarchical name, such as "comp.lang.perl.modules".

>I don't see any need to refer to 1036 here or, really, anywhere else in
>this document.

I think mention of RFC 1036 (or, one day, of USEFOR) is essential
somewhere (after all, the nntpext draft references 1036). What is said
here is similar to what was said above regarding message-ids. You can
write any <newsgroup-name> you like, but it ain't going to work unless it is
a real one. The wording there actually comes from RFC 1738, which was
exceedingly vague on the whole issue. How about:

   The <newsgroup-name> SHOULD be that of an existing newsgroup, such as
   "comp.lang.perl.modules", and hence will in practice conform to the
   syntax defined in [RFC 1036] or in any subsequent standard for
   Netnews articles.

(cf the wording for message-ids).

> ....  Since this is a URI scheme for NNTP, it should be
>sufficient to just refer to the NNTP draft, which already defines such
>things as message IDs and newsgroup names.

Actually, it isn't just a URI scheme for NNTP. It might be used to access
a local server directly, or it might be used to retrieve groups/articles
from an IMAP server. Which, come to think of it, is a good reason NOT to
use wildmats.

>> 2.3  The newsURI contains an <all-groups>

>>    If the newsURI is of one of the following forms:
>>       <URI:news:*>
>>       <URI:news://news.example.com/*>
>>       <URI:news://news.example.com/>
>>       <URI:news://news.example.com>
>>    it refers to "all available news groups"....

>> [Issue: Do we really want all those forms? Only the first was in [RFC
>> 1738], but many agents currently accept the others. Moreover, some
>> agents are known to barf on anything with '*' in it.

>Tough for them.  It's very clearly in the standard.  I'm comfortable with
adding the fourth forms; I think the second form should only be added if
>we're adding wildmat support in general, as in:

It was in RFC 1738, but has anybody actually encountered it in the wild?

It is quite unlike anything in any other URI scheme, whereas the 3rd and
4th forms are typical of many schemes with the meaning of "show me
everything you have", or "show me the default". I am suggesting that we
either drop the "*" bit entirely (my preference) or turn it into something
useful (like a wildmat).

[In any case, the danger with all of those forms is that they may
institute a download of the complete active file. Try that on supernews,
and you will sit there for 5 minutes waiting for it all to appear :-( .]

>> [2nd alternative]

>>       newsURI     = "news:" ( article / group )
>>       article     = [ news-server "/" ] message-id
>>       group       = [ news-server "/" ] wildmat

>although you need to distinguish between a literal group name and a
>wildmat, since the latter retrieves a different resource normally (namely
>a list of possible newsgroups, rather than taking you directly to a
>newsgroup).

Yes, that would have to come in the rewording if we adopt that
alternative. But, having realized the possible use of IMAP and other
servers with this URI, I am getting rather doubtful. Is the facility
likely to be useful enough to be worth the trouble? As I said above, did
the "*" ever actually happen in the wild?

>It's not clear to me whether we should allow the trailing slash to be
>optional.  What did HTTP do here?  I know that browsers support leaving it
>off, but is that fixed internal to the browser, or actually allowed in the
>protocol?

As Clive has just shown, HTTP did all sorts of amazing things, the net
effect of which was that you got the same effect with or without the "/".
Consequently, the average user now believes that a "/" in that position is
optional. I think it better to go along with that belief, because that is
what people are going to do anyway, and existing servers seem to agree.
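
By way of illustration, a Python sketch of an agent that treats the forms
with and without the "/" alike. The function name is hypothetical, and it
assumes all four of the <all-groups> forms listed under 2.3 are accepted,
which is of course exactly what is still under discussion:

```python
from urllib.parse import urlsplit

def is_all_groups(uri: str) -> bool:
    """Return True if a news URI means "all available news groups"."""
    parts = urlsplit(uri)
    if parts.scheme != "news":
        return False
    if parts.netloc:
        # news://server, news://server/ and news://server/* are equivalent.
        return parts.path in ("", "/", "/*")
    # Without a server, only the bare "*" form applies.
    return parts.path == "*"
```

Dropping the "*" forms would simply mean removing the two branches that
mention "*".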

>> It would be readily implemented, but it is quite certain that nowhere is
>> it implemented currently.

>Whoops.  :)

Indeed. Which is why I ask whether it is really useful enough to bring it
in.

>> 3.  The nntp URI scheme

>>    The nntp URI scheme is used to refer to individual Netnews articles,
>>    as defined in [RFC 1036].

>Again, refer to NNTP not RFC 1036.  (Even more so here, since the whole
>concept of an article number is purely an NNTP construction.)

Yes, but these are the introductory remarks for this URI scheme, and
getting hold of Netnews articles is its whole purpose. So I don't think
mention of 1036 can be omitted. OTOH, I agree there ought to be a mention
of nntpext.

   The nntp URI scheme is used to refer to individual Netnews articles,
   as defined in [RFC 1036], enabling them to be retrieved via the NNTP
   protocol [draft-ietf-nntpext-base-*.txt].


>Hm, I see that our current draft doesn't allow -nnn or -.  INN does and
>has for basically forever, but RFC 2980 doesn't mention it either.  I'll
>start a separate thread about that.

OK. I gather that the conclusion was to make no change there.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5


