[NNTP] Re: Updated news-nntp-uri I-D

Charles Lindsey chl at clerew.man.ac.uk
Tue Nov 6 09:06:51 PST 2007


In <Jr15sp.MMo at clerew.man.ac.uk> "Charles Lindsey" <chl at clerew.man.ac.uk> writes:

>In <fg9ngk$1q3$1 at ger.gmane.org> "Frank Ellermann" <nobody at xyzzy.claranet.de> writes:

>>Hi, I've posted an updated news-nntp-uri I-D, and will
>>fix a new ltr-bug "local part (right hand side)" later:
>>http://tools.ietf.org/html/draft-ellermann-news-nntp-uri

>>Counting all attempts to update the news URI scheme in
>>RFC 1738 that's number 16, and I guess it's now at a
>>point where further modifications would likely make it
>>worse.

>Russ agreed some while back that this document could be discussed on the
>nntp mailing list (ietf-nntp at lists.eyrie.org) which, now that RFC3977 is
>agreed, is still maintained to watch over future nntp developments such as
>this. This message is therefore copied to that list, and I shall comment on
>the draft there.

OK, so my comments follow. They are mostly wording niggles, but there are
some serious issues also.

>                    The 'news' and 'nntp' URI Schemes
>                     draft-ellermann-news-nntp-uri-06

> 2.  Background

>    User agents like Web browsers supporting these schemes use the NNTP
>    protocol to access the corresponding resources.  The details how they
                                                                 ^
                                                                 of
>    do this, e.g. employing a separate or integrated newsreader, depend
>    on the implementation.  The default <port> associated with NNTP in
>    [RFC3977] is 119.

> 2.1.  'nntp' URIs

>    For these reasons the use of the 'nntp' URI scheme is limited, and
>    it's less widely supported by user agents than the similar 'news' URI
>    scheme.

s/it's/it is/
Generally speaking, such contractions are deprecated in written
English (though common in speech), especially in formal documents such as
technical specifications. There are many more examples of this in your
text, so I won't mention it again.

> 2.2.  'news' URIs

>    ....  More general user agents use the 'news' URI
>    scheme to distinguish "Message-IDs" from similar constructs like
>    other URI schemes in contexts like a plain text message body.

s/like/such as/ (twice)

>    cases involving gateways not withstanding.  To distinguish
>    "Message-IDs" and newsgroup names the 'news' URI scheme uses the "@"
                                                             ^^^^
                                                           relies on
>    between local part (left hand side) and domain part (right hand side)
>    of "Message-IDs".

>    [RFC1738] offered only one wildcard for sets of newsgroups in 'news'
>    URIs, a "*" used to refer to "all available newsgroups".  In common
>    practice this was extended to varying degrees by some user agents, an
                                                      ^^^^
                                                    different
>    NNTP extension known as <wildmat> specified in [RFC2980] and now part
>    of the base NNTP specification allows pattern matching in the style
>    of the "find" command.  For the purpose of this memo this means that
           ^
         UNIX
>    some additional special characters have to be allowed in 'news' URIs,
>    some of them percent-encoded as required by the overall [RFC3986] URI
>    syntax.  User agents and NNTP servers might not (yet) implement all
                                  ^                  ^^^^^
                not yet compliant with [RFC3977]     XXXXX
>    parts of this new feature.

>    Another commonly supported addition to the [RFC1738] syntax is the
>    optional specification of a server at the begin of 'news' URIs. ....
                                               ^^^^^
                                             beginning

> 2.3.  Query parts, fragments, and normalization

>    There are no special "." or ".." path segments in 'news' and 'nntp'
>    URLs.  Please note that "." and ".." are no valid <newsgroup-name>s.
                                              ^^
                                              not

>    URI producers have to percent-encode some characters as specified
>    below (Section 4), otherwise they MUST treat a "Message-ID" without
>    angle brackets for 'news' URLs as is, i.e. case-sensitive, preserving
>    quoted pairs and quoted strings.

However that might yet change. RFC2822bis seems to have removed quoted
strings from msg-ids (though they seem more reluctant to fix the problems
with quoted pairs). So we may yet try to bring [USEFOR] into line with
that before it gets published. So it might be wiser not to submit this
draft for a proposed standard until these uncertainties have been
resolved.

> 3.  Syntax of 'nntp' URIs

>    An 'nntp' URI identifies an article by its number in a given
>    newsgroup of a specified server, or it identifies the newsgroup
               ^^
               on
>    without article number.

>    A <wildmat-exact> newsgroup name as specified in [RFC3977] allows (in
>    theory) any <UTF8-non-ascii> and most printable US-ASCII characters
>    excluding "!", "*", ",", "?", "[", "\", and "]".  To keep the syntax
>    here simple all additional characters in <wildmat-exact> not (yet)
>    allowed in [I-D.ietf-usefor-usefor] are covered by <pct-encoded> as
>    defined in [RFC3986], although percent-encoding is not strictly
>    necessary for some of these additional characters like ":", ";", and
>    "~".  Most of the additional characters have to be percent-encoded,
>    example:

I don't think that is sufficiently clear for those who may not have
detailed knowledge of [RFC3977] and [USEFOR], so I would
suggest rewording it as follows:

    A <wildmat-exact> newsgroup name as specified in [RFC3977] allows (in
    theory) any <UTF8-non-ascii> and most printable US-ASCII characters
    excluding "!", "*", ",", "?", "[", "\", and "]". However,
    [I-D.ietf-usefor-usefor] does not (yet) permit characters outside of
    <group-char> and so, to keep the syntax simple, the additional
    characters are here covered by <pct-encoded> as defined in [RFC3986],
    since most of them have to be percent-encoded anyway (with a few
    exceptions such as ":", ";", and "~"). For example:

> 4.  Syntax of 'news' URIs

>      newsURL         = "news:" [ server "/" ] ( article / newsgroups )
>      article         = mid-left "@" mid-right
>      newsgroups      = *( group-char / pct-encoded / "*" )
> 
>      mid-left        = 1*( mid-atext / "." ) /      ; <dot-atom-text>
>                        ( "%22" mid-quote "%22" )    ; <no-fold-quote>
>      mid-quote       = 1*( mid-atext / "." /        ; <mqtext> incl.
>                            mid-special /            ; '\"' / "[" / "]"
>                            "%5C%22" / "%5B" / "%5D" )
> 
>      mid-right       = 1*( mid-atext / "." ) /      ; <dot-atom-text>
>                        ( "%5B" mid-literal "%5D" )  ; <no-fold-literal>
>      mid-literal     = 1*( mid-atext / "." /        ; <mdtext> incl.
>                            mid-special /            ; '"' / "\[" / "\]"
>                            "%22" / "%5C%5B" / "%5C%5D" )
> 
>      mid-special     = "(" / ")" / "," / ":" / ";" /
>                        "%3C" / "%40" / "%5C%5C"     ; "<" / "@" / "\\"
> 
>      mid-atext       = ALPHA / DIGIT /              ; RFC 2822 <atext>
>                        "!" / "$" / "&" / "'" /      ; allowed sub-delims
>                        "*" / "+" / "=" /            ; allowed sub-delims
>                        "-" / "_" / "~" /            ; allowed unreserved
>                        "%23" / "%25" / "%2F" /      ; "#" / "%" / "/"
>                        "%3F" / "%5E" / "%60" /      ; "?" / "^" / "`"
>                        "%7B" / "%7C" / "%7D"        ; "{" / "|" / "}"

No! Please do not attempt to write (yet another) full syntax for msg-ids
here. We have already done that once in [USEFOR] (ugly but necessary).
Moreover, if RFC2822bis abolishes quoted strings much of that ugly syntax
might yet get removed. So all you need to say here is:

      newsURL         = "news:" [ server "/" ] ( article / newsgroups )
      article         = msg-id-core  ; defined in [I-D.ietf-usefor-usefor]
      newsgroups      = wildmat      ; defined in [RFC3977]
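
To illustrate how little a consumer needs from that three-line grammar, here
is a minimal Python sketch (not part of the draft). The function name
split_news_uri is hypothetical, and the sketch assumes that <server> is
written as "//" followed by an authority, as in the draft's own examples. It
relies only on the rule from Section 2.2 that an "@" distinguishes an
<article> from <newsgroups>.

```python
from urllib.parse import unquote

def split_news_uri(uri):
    """Split a 'news:' URI into (server, kind, value).

    kind is 'article' when the part after the optional server contains
    a literal "@", otherwise 'newsgroups'. Illustrative sketch only.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "news":
        raise ValueError("not a news URI: %r" % uri)
    server = None
    if rest.startswith("//"):
        # Optional server part: news://server/...
        server, _, rest = rest[2:].partition("/")
    # Decide on the raw text, so an encoded "%40" does not count as "@".
    kind = "article" if "@" in rest else "newsgroups"
    return server, kind, unquote(rest)

print(split_news_uri("news://server.example/ab.cd@example.com"))
print(split_news_uri("news:comp.lang.*"))
```

The second URI is an invented example; validating the result against
<msg-id-core> or <wildmat> is deliberately left to the referenced documents.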

>    The form identifying an <article> corresponds to the <msg-id-core>
>    construct in [I-D.ietf-usefor-usefor], it's a "Message-ID" without
>    angle brackets.  Characters not directly allowed in this part of an
>    [RFC3986] URI have to be percent-encoded, minimally anything that is
>    not <unreserved>, no ":" (colon), and doesn't belong to the
>    <sub-delims>.

That paragraph is then no longer needed, though you might want to say
something like

    A <msg-id-core> is simply a <msg-id> without the angle brackets. It
    may contain a few characters that need to be percent-encoded, notably
    "[" and "]" if a <no-fold-literal> is present. However, it is never
    necessary to percent-encode the "@" within a <msg-id-core>, since it
    constitutes a <path> from the viewpoint of [RFC3986] and "@" is a
    permitted character within a <path>.

Actually, I believe "[" and "]" would also be safe unencoded, since they
only have a special meaning within an <authority>, and any <authority>
will already have been parsed by the time we get here. All other things
that might need to be percent-encoded are in improbable contexts, such as
<quoted-string>s (if they survive) and in <no-fold-quote>s of a form now
deprecated by 2822bis.
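
As a rough illustration of the encoding question, the following Python sketch
(an assumption-laden example, not the draft's algorithm) builds a 'news' URI
from a Message-ID using the standard urllib.parse.quote. The name
msgid_to_news_uri and the keep_brackets switch are hypothetical; the switch
exists only to show both readings of whether "[" and "]" must be encoded.

```python
from urllib.parse import quote

# Left unencoded: the RFC 3986 <sub-delims>, plus ":" and "@", which are
# allowed in a path segment. quote() never encodes letters, digits,
# "-", ".", "_" or "~".
PATH_SAFE = "!$&'()*+,;=:@"

def msgid_to_news_uri(msg_id, server=None, keep_brackets=False):
    """Turn '<left@right>' into a news URI (illustrative sketch)."""
    core = msg_id.strip()
    if core.startswith("<") and core.endswith(">"):
        core = core[1:-1]          # <msg-id-core>: no angle brackets
    safe = PATH_SAFE + ("[]" if keep_brackets else "")
    prefix = "news://%s/" % server if server else "news:"
    return prefix + quote(core, safe=safe)

print(msgid_to_news_uri("<ab.cd@[2001:DB8::CD30]>"))
print(msgid_to_news_uri("<ab.cd@[2001:DB8::CD30]>", keep_brackets=True))
```

With keep_brackets left False the first call reproduces the draft's
"news:ab.cd@%5B2001:DB8::CD30%5D" example; in both cases the "@" stays as is.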

>    Several details of a canonical <msg-id-core> are omitted here, e.g.
>    leading, adjacent, or trailing dots are not allowed in
>    <dot-atom-text>.  The syntax mainly shows which characters MUST be
>    percent-encoded in a <mid-left> (local part) or <mid-right> (domain
>    part).

Obviously, that paragraph can be omitted if the detailed syntax is
omitted.

>    Please note that "%20" (space) and "%3E" (">") are not allowed.  A
>    "%5C" (backslash "\") can only occur in four combinations as shown
>    above.  Examples:

Again, that would be the three (not four) combinations allowed in
<no-fold-quote>.
> 
>        news://server.example/ab.cd@example.com
>        news:%22do..ts%22@example.com

Why do the DQUOTEs need to be percent-encoded? Actually, 3986 seems to
omit all mention of them.

>        news:ab.cd@%5B2001:DB8::CD30%5D

And I am not convinced that the %5B and %5D are necessary there.

>    The form identifying <newsgroups> corresponds to the [RFC3977]
>    <wildmat-pattern>, a newsgroup name with wildcards "*" and "?".  Any
>    "?" has to be percent-encoded as "%3F" in this part of an URI.
>    Examples, the first two are equivalent:

I disagree entirely with your use of <wildmat-pattern> here, when
<wildmat> would do perfectly well.

These two URIs are both intended to interface with NNTP. Hitherto, there
has been much variation in the wildcarding allowed by various
implementations (only a single "*" was allowed by RFC1738, but most now
allow more than that). However, now that we have an agreed standard for
NNTP, we have to assume that, as implementations come to be upgraded, they
will be upgraded to conform to that new standard. Therefore, the proper
course is to allow exactly what the new standard allows (since all that
implementations will need to do is to give whatever <newsgroups> is
provided in the URL to the NNTP LIST ACTIVE command, and process whatever
comes back). The last thing we need is arbitrary restrictions for which
there is no technical justification.
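
To make the point concrete, here is a minimal Python sketch of that hand-off,
assuming the <newsgroups> part only needs to be percent-decoded before being
passed on. The function name list_active_command and the example URIs are
hypothetical, and no network connection is made; the sketch merely builds the
command line that RFC3977 defines as "LIST ACTIVE [wildmat]".

```python
from urllib.parse import unquote

def list_active_command(news_uri):
    """Return the NNTP command line for the <newsgroups> part of a news URI."""
    rest = news_uri.split(":", 1)[1]
    if rest.startswith("//"):
        rest = rest[2:].partition("/")[2]    # drop the optional server
    if "@" in rest:
        raise ValueError("URI names an article, not newsgroups")
    wildmat = unquote(rest)                  # e.g. "%3F" becomes "?"
    # An empty <newsgroups> asks for every group the server offers.
    return "LIST ACTIVE %s\r\n" % wildmat if wildmat else "LIST ACTIVE\r\n"

print(repr(list_active_command("news:comp.lang.%3F*")))
print(repr(list_active_command("news://news.example/alt.*")))
```

Nothing in the sketch inspects or restricts the pattern itself, which is the
whole argument: the server, not the URI syntax, decides what matches.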

>    Without wildcards this form of the URL identifies a single group if
>    it's not empty, and user agents would typically try to present an
>    overview of the articles available in this group, probably somehow
                                                       ^^^^^^^^
                                                       possibly
>    limiting this overview to the newest unread articles up to a
>    configured maximum.

>    With wildcards user agents could try to list matching group names on
>    the specified or default server.  Some user agents support only a
>    specific <group> without wildcards, or an optional single "*".

Add "This situation may be expected to improve as agents are upgraded to
comply with RFC3977."

> 5.  Acknowledgments

This section is more verbose than is customary. It should be pruned to
just a list of names.

> 6.  Internationalization Considerations

>    The URI schemes were updated to support percent-encoded UTF-8
>    characters in NNTP newsgroup names as specified in [RFC3977] and
>    [RFC3987].

Not quite. RFC3977 provided this capability if and when Usefor chooses to
implement it (which may well happen once EAI is done). So it would be
useful to point this out, and also to mention that the use of IRIs rather
than URIs would then become appropriate to save having to percent-encode
such UTF-8.

>    The work on E-mail Address Internationalization (EAI) started in
>    [RFC4952] most likely won't change the syntax of a "Message-ID".  The
               ^^^^^^^^^^^^^^^^^
               is not expected to
>    work on a successor of [RFC2822] might end up with a significantly
>    simplified syntax for at least the local part (right hand side) of a
>    "Message-ID".

As mentioned before, it would be better not to offer this draft as a
proposed standard until the matter of 2822bis is clarified, which should
not be more than a few months.

> 8.  IANA Considerations
> 
>    The IANA registry of URI schemes could be updated to point to this
                                      ^^^^^
                                      should
>    memo instead of [RFC1738] for the 'news' and 'nntp' URI schemes.

> 8.1.  'snews' URIs
> 
>    This section contains the [RFC4395] template for the registration of
>    the historical 'snews' scheme specified in [I-D.gilman-news-url].

I think we need to decide whether to mention this one at all, even as
"historical", since it never formally got beyond an I.D. Was it ever
implemented, and are implementations still around? If not, then I suggest
it is best forgotten.

> 8.2.  nntp.uri.arpa NAPTR
> 
>    This section contains the [RFC3405] template for the registration of
>    the 'nntp' URI scheme with the Dynamic Delegation Discovery System.
> 
>    Key:               nntp
>    Authority:         RFCXXXX
>    Record:
>      nntp IN NAPTR 0 0 "" "" "!^nntp://([^/?#]*@)?([^:/?#]*).*$!\\2!i" .

This is totally obscure unless you provide some motivation for it and a
reference to wherever these things are defined. And are they actually used
anywhere in the Real World? If not, then it is arguable whether they
should be mentioned.
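
For what it is worth, the regexp field itself can be read as "extract the host
name from an 'nntp' URI". The Python sketch below is only my reading of the
record, under the assumptions that the first character of the field is the
delimiter, that "\\2" in the zone file stands for the backreference \2, and
that the extended regular expression behaves the same under Python's re
module. The helper name apply_naptr_regexp and the sample URIs are
hypothetical.

```python
import re

# The regexp field of the record, with the zone-file "\\2" unescaped to "\2".
NAPTR_REGEXP = r"!^nntp://([^/?#]*@)?([^:/?#]*).*$!\2!i"

def apply_naptr_regexp(field, uri):
    """Apply a delimiter/ERE/replacement/flags field to a URI (sketch)."""
    delim = field[0]
    _, ere, repl, flags = field.split(delim)
    re_flags = re.IGNORECASE if "i" in flags else 0
    match = re.match(ere, uri, re_flags)
    # Group 1 is optional userinfo, group 2 is the host up to ":" or "/".
    return match.expand(repl) if match else None

print(apply_naptr_regexp(NAPTR_REGEXP, "nntp://news.example:119/comp.lang.misc/1234"))
print(apply_naptr_regexp(NAPTR_REGEXP, "NNTP://user@news.example/comp.lang.misc"))
```

If that reading is right, a sentence saying so in the draft would answer most
of the objection above.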

> 8.3.  'news-message-ID' access type
> 
>    The MIME 'news-message-ID' access type was erroneously listed as
>    subtype.  IANA should remove 'news-message-ID' from the application
>    subtype registry, and add it to the access type registry defined in
>    [RFC4289]: <http://www.iana.org/assignments/access-types>.

For sure this needs to be removed from the wrong registry, but we need
some discussion as to whether it needs to be replaced, or whether it is
another candidate for forgetting. Did anybody ever use it, based on
son-of-1036? At most, it should probably be "historical".

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

