ietf-nntp Wildmats (was: draft posted)

Russ Allbery rra at stanford.edu
Tue Nov 27 15:28:47 PST 2001


Charles Lindsey <chl at clw.cs.man.ac.uk> writes:

> Last thing I remember is that Clive had two texts (both should still be
> on his website) and we had reached consensus on the shorter one, without
> the character classes.

I'm including that change for reference.  It seems fine to me.  Does
anyone have any objections to making the following changes in the draft?

(Note that the mention of LIST NEWSGROUPS being very efficient with a
single group name has since been removed from the draft and shouldn't be
reintroduced by this change.  The other changes to text elsewhere in the
document should likewise be reviewed against -14.)

NNTP proposed text
Section 5
Working copy
Last changed 2001-05-02 14:45 UTC

This text consists of my updated proposal of 2001-04-30 with wildmat
sets removed. I've diff-marked the changes in section 5 relative to the
updated proposal. This proposal is a strict subset of the previous one
(that is, every wildmat in the previous proposal either has the same
meaning in this one or is not permitted).


  5. The WILDMAT format

  The WILDMAT format described here is based on the version
  first developed by Rich Salz [5], which in turn was derived from
  the format used in the UNIX "find" command to articulate file names.
  It was developed to provide a uniform mechanism for matching
  patterns in the same manner that the UNIX shell matches filenames.

  5.1 Wildmat syntax

  A wildmat is described by the following augmented BNF[9] syntax
  (note that this syntax contains ambiguities and special cases described
  at the end):

    wildmat = wildmat-pattern *("," ["!"] wildmat-pattern)

    wildmat-pattern = 1*wildmat-item

    wildmat-item = wildmat-exact / wildmat-wild

!   wildmat-exact = %x21-29 / %x2B / %x2D-3E / %x40-5A / %x5E-7F /
!       UTF-8-non-ascii  ; exclude * , ? [ \ ]

!   wildmat-wild = "*" / "?"
-
  UTF-8-non-ascii is defined in section 13.

  This syntax must be interpreted subject to the following rule:

  - Where a wildmat-pattern is not immediately preceded by "!", it shall
    not begin with a "!".
-
! NOTE: the characters \ , [ and ] are not allowed in wildmats, while *
! and ? are always wildcards. This should not be a problem since these
! characters cannot occur in newsgroup names, which is the only current
! use of wildmats. Backslash is commonly used to suppress the special
! meaning of characters, and brackets to introduce sets, but there is no
! existing standard practice for these in wildmats and so they were omitted
! from this specification. A future extension to this document may provide
! semantics for these characters.
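
As an illustration only (not part of the proposed text), the grammar and
the "!" rule above could be checked with a sketch like the one below.  The
names is_exact_char and parse_wildmat are hypothetical.  The sketch assumes
the wildmat has already been decoded from UTF-8, that UTF-8-non-ascii means
any code point above U+007F, and that the first wildmat-pattern may not
begin with "!" (my reading of the grammar together with the rule).

```python
def is_exact_char(ch):
    """True if ch is a wildmat-exact character under the 5.1 grammar."""
    cp = ord(ch)
    return (0x21 <= cp <= 0x29 or cp == 0x2B or 0x2D <= cp <= 0x3E
            or 0x40 <= cp <= 0x5A or 0x5E <= cp <= 0x7F
            or cp > 0x7F)  # assumption: UTF-8-non-ascii is anything above U+007F


def parse_wildmat(text):
    """Split a wildmat into (negated, pattern) pairs or raise ValueError."""
    result = []
    for index, element in enumerate(text.split(",")):
        negated = False
        if element.startswith("!"):
            # Only a pattern after a comma can carry the "!" prefix.
            if index == 0:
                raise ValueError("first wildmat-pattern may not begin with '!'")
            negated = True
            element = element[1:]  # strip exactly one "!"
        if not element:
            raise ValueError("empty wildmat-pattern")
        for ch in element:
            if ch not in "*?" and not is_exact_char(ch):
                raise ValueError("character not allowed in wildmat: %r" % ch)
        result.append((negated, element))
    return result
```

With this sketch, "a,!!b" parses as a plain "a" followed by a negated
pattern "!b", while "a[bc]" and "a,,b" are rejected.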

  5.2 Wildmat semantics

  A wildmat is tested against a string, and either matches or does not
  match. To do this, each constituent wildmat-pattern is matched against
  the string and the rightmost pattern that matches is identified. If
  that wildmat-pattern is not preceded by "!", the whole wildmat matches.
  If it is preceded by "!", or if no wildmat-pattern matches, the whole
  wildmat does not match.

  For example, consider the wildmat "a*,!*b,*c*":

    the string "aaa" matches because the rightmost match is with "a*"
    the string "abb" does not match because the rightmost match is with "*b"
    the string "ccb" matches because the rightmost match is with "*c*"
    the string "xxx" does not match because no wildmat-pattern matches
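
The rightmost-match rule can be sketched as a left-to-right scan in which
each matching pattern overwrites the verdict (illustration only, not part
of the proposed text).  The function name wildmat_matches is hypothetical.
Python's fnmatch.fnmatchcase stands in for wildmat-pattern matching here;
it also gives "[" a special meaning, which is harmless only because 5.1
forbids that character.  The sketch does not validate its input.

```python
from fnmatch import fnmatchcase


def wildmat_matches(wildmat, string):
    """Apply the rightmost-match rule of 5.2 to an already valid wildmat."""
    verdict = False  # no wildmat-pattern matches: the wildmat does not match
    for element in wildmat.split(","):
        negated = element.startswith("!")
        pattern = element[1:] if negated else element
        if fnmatchcase(string, pattern):
            # A later match overrides any earlier one, so after the loop
            # the verdict reflects the rightmost matching pattern.
            verdict = not negated
    return verdict


for s in ("aaa", "abb", "ccb", "xxx"):
    print(s, wildmat_matches("a*,!*b,*c*", s))
```

Running this against the four strings above gives True, False, True and
False respectively, in agreement with the example.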

  A wildmat-pattern matches a string if the string can be broken into
  components, each of which matches the corresponding wildmat-item in
  the pattern; the matches must be in the same order, and the whole string
  must be used in the match. The pattern is "anchored"; that is, the first
  and last characters in the string must match the first and last item
  respectively (unless that item is an asterisk matching zero characters).

  A wildmat-exact matches the same character (which may be more than one
  octet in UTF-8).

  "?" matches exactly one character (which may be more than one octet).

  "*" matches zero or more characters. It can match an empty string, but
  it cannot match a subsequence of a UTF-8 sequence that is not aligned
  to the character boundaries.
-
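
One way to honour the character-boundary requirement is to decode both
the pattern and the string from UTF-8 before matching, so that "?" and "*"
operate on characters rather than octets.  The sketch below does this by
translating a wildmat-pattern into an anchored regular expression; it is an
illustration under that assumption, not part of the proposed text, and the
names pattern_to_regex and pattern_matches are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate one decoded wildmat-pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # wildmat-exact: the same character
    return re.compile("".join(parts), re.DOTALL)


def pattern_matches(pattern_octets, string_octets):
    """Match octet strings by decoding to characters first."""
    pattern = pattern_octets.decode("utf-8")
    string = string_octets.decode("utf-8")
    # fullmatch anchors the pattern at both ends of the string.
    return pattern_to_regex(pattern).fullmatch(string) is not None


pound = b"\xc2\xa3"  # the two octets 0xC2 0xA3
print(pattern_matches(b"?", pound))    # one character, two octets
print(pattern_matches(b"??", pound))   # would need two characters
```

The first call prints True and the second False: the two octets form a
single character, so "?" consumes both and "??" has nothing left to match.
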
  5.3  Extensions

  An NNTP server or extension MAY extend the syntax or semantics of
  wildmats provided that all wildmats that meet the requirements of
  section 5.1 have the meaning ascribed to them by section 5.2.
  Future editions of this document may also extend wildmats.

  5.4  Examples

  In these examples, $ and @ are used to represent the two octets 0xC2
  and 0xA3 respectively; $@ is thus the UTF-8 encoding for the pound
  sterling symbol, shown as # in the descriptions.

  Wildmat    Description of strings that match

  abc        the one string "abc"
  abc,def    the two strings "abc" and "def"
  $@         the one character string "#"
  a*         any string that begins with "a"
  a*b        any string that begins with "a" and ends with "b"
  a*,*b      any string that begins with "a" or ends with "b"
  a*,!*b     any string that begins with "a" and does not end with "b"
  a*,!*b,c*  any string that begins with "a" and does not end with "b", and
             any string that begins with "c" no matter what it ends with
  a*,c*,!*b  any string that begins with "a" or "c" and does not end
             with "b"
  ?a*        any string with "a" as its second character
  ??a*       any string with "a" as its third character
  *a?        any string with "a" as its penultimate character
  *a??       any string with "a" as its antepenultimate character
-

========

[The following changes also need to be made to other sections for
consistency. In addition the formal grammar will need updating.]


  6. Format for Keyword Descriptions

[...]

  The name "wildmat" for a parameter indicates that it is a
  wildmat format pattern as defined in section 5. If the parameter
  does not meet the requirements of that section (for example, if
  it does not fit the grammar of 5.1) the NNTP server MAY place some
  interpretation on it (not specified by this document) or otherwise
  MUST generate a 501 response.

  9.4 The LIST Keyword

  9.4.1 LIST

[...]

  If the optional wildmat parameter is specified, the list is
  limited to only those groups whose names match the wildmat. This
  will normally be very efficient if the wildmat is a simple group
  name.

  9.4.2 LIST ACTIVE.TIMES

  LIST ACTIVE.TIMES [wildmat]

[...]

  If the optional wildmat parameter is specified, the list is
  limited to only those groups whose names match the wildmat. This
  will normally be very efficient if the wildmat is a simple group
  name.

  9.4.4 LIST DISTRIB.PATS

  LIST DISTRIB.PATS

  The distrib.pats file is maintained by some news transport
  systems to allow clients to choose a value for the
  Distribution: line in the header of a news article being
  posted. The information returned consists of lines, in no
  particular order, each of which contains three fields
  separated by colons: a weight, a wildmat (which may be a simple
  group name), and a Distribution: value, in that order.
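
For illustration (not part of the proposed text), one returned line could
be split into its three fields as sketched below.  The names DistribPat and
parse_distrib_pat and the sample line are hypothetical, and the sketch
assumes the weight is a decimal integer, which the text above does not
state.

```python
from typing import NamedTuple


class DistribPat(NamedTuple):
    weight: int         # assumption: weights are decimal integers
    wildmat: str
    distribution: str


def parse_distrib_pat(line):
    """Split 'weight:wildmat:distribution' into its three fields."""
    # Split on the first two colons only; raises ValueError if a field
    # is missing or the weight is not an integer.
    weight, wildmat, distribution = line.split(":", 2)
    return DistribPat(int(weight), wildmat, distribution)


print(parse_distrib_pat("10:local.*:local"))  # hypothetical sample line
```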

[...]

  9.4.5 LIST NEWSGROUPS

     LIST NEWSGROUPS [wildmat]

[...]
     If the information is not available, the
     server will return the 503 response. If the server does not
     recognize the command it should return a 501 response. If
     the optional wildmat parameter is specified, the list is
     limited to only those groups that match the wildmat (no
     matching is done on the group descriptions). This will
     normally be very efficient if the wildmat is a simple group
     name. If nothing is matched, an empty list is returned, not
     an error.

  11.4 NEWNEWS

  NEWNEWS wildmat date time [GMT]

  The message-ids of all articles added to a set of newsgroups
  since the given date-time will be listed. The set consists
  of all newsgroups whose name matches the wildmat.
  The format of the listing will be one message-id per line, as
  though text were being sent. Each message-id MUST appear only
  once in a response. The order of the response has no specific
  significance and may vary from response to response in the
  same session. Date and time are in the same format as the
  NEWGROUPS command.

  Note that an empty list (i.e., the text body returned by this
  command consists only of the terminating period) is a possible
  valid response, and indicates that there is currently no new
  news.

  Clients SHOULD make all queries in Coordinated Universal Time
  when possible.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>


