ietf-nntp Wildmats

Clive D.W. Feather clive at demon.net
Tue Mar 13 03:27:07 PST 2001


I've uploaded a new version of the text, based on comments so far.
It's at <http://www.davros.org/nntp-texts/section-5a.txt> and is included here
for convenience.


This text consists of my proposal of 2001-03-07 with changes following
discussion. I've diff-marked the changes in section 5 relative to that
document.


  5. The WILDMAT format

  The WILDMAT format described here is based on the version
  first developed by Rich Salz [5], which in turn was derived from
  the format used in the UNIX "find" command to articulate file names.
  It was developed to provide a uniform mechanism for matching
  patterns in the same manner that the UNIX shell matches filenames.

  5.1 Wildmat syntax
-
  A wildmat is described by the following augmented BNF[9] syntax
  (note that this syntax contains ambiguities and special cases described
  at the end):

    wildmat = wildmat-pattern *("," ["!"] wildmat-pattern)

    wildmat-pattern = 1*wildmat-item

!   wildmat-item = wildmat-exact / wildmat-wild

    wildmat-exact = %x21-29 / %x2B / %x2D-3E / %x40-5A / %x5D-7F /
        UTF-8-non-ascii  ; exclude * , ? [ \

    wildmat-wild = "*" / "?" / wildmat-set

    wildmat-set = "[" ["^"] wildmat-set-body "]"

    wildmat-set-body = 1*wildmat-set-item

    wildmat-set-item = wildmat-set-char / wildmat-set-range

!   wildmat-set-char = %x21-2B / %x2D-5B / %x5D-7F / UTF-8-non-ascii
!       ; exclude , \

    wildmat-set-range = wildmat-set-char wildmat-range-delim wildmat-set-char

    wildmat-range-delim = "-"

  UTF-8-non-ascii is defined in section 13.

  This syntax must be interpreted subject to the following rules:

  - Where a wildmat-pattern is not immediately preceded by "!", it shall
    not begin with a "!".

  - Where a wildmat-set-body is not immediately preceded by "^", it shall
    not begin with a "^".

  - The character "]" may only appear in a wildmat-set-body if it is the
    very first character of that body.

  - Within a wildmat-set-body, the character "-" shall be parsed as being
    a wildmat-range-delim unless:
    * it is the first or last character in the wildmat-set-body, or
    * either of the two immediately preceding characters is a "-" that can
!     be parsed as a wildmat-range-delim (this determination is made from
      left to right, so that in "[%----b-c]" only the first and fourth
      dashes are wildmat-range-delims).

+ NOTE: the characters \ and , are not allowed within a wildmat, and a
+ literal * ? or [ can only be included by making them a wildmat-set of
+ one character. This should not be a problem since these characters
+ cannot occur in newsgroup names, which is the only current use of
+ wildmats.
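
  The left-to-right rule for "-" given in the list above can be sketched
  in Python as follows. This is only an illustration, not part of the
  proposed text: the function name parse_set_body and its return shape
  are assumptions, and the body is taken to be already stripped of the
  surrounding brackets and of any leading "^".

```python
def parse_set_body(body):
    """Split a wildmat-set-body into single characters and (low, high) ranges.

    Hypothetical helper: 'body' excludes "[", "]" and any leading "^".
    """
    n = len(body)
    is_delim = [False] * n
    for i, ch in enumerate(body):
        if ch != "-":
            continue
        if i == 0 or i == n - 1:
            continue  # first or last character: a literal dash
        if is_delim[i - 1] or (i >= 2 and is_delim[i - 2]):
            continue  # one of the two preceding characters is a delimiter
        is_delim[i] = True

    items = []
    i = 0
    while i < n:
        if i + 1 < n and is_delim[i + 1]:
            items.append((body[i], body[i + 2]))  # a wildmat-set-range
            i += 3
        else:
            items.append(body[i])  # a wildmat-set-char
            i += 1
    return items
```

  Applied to "%----b-c" this marks only the first and fourth dashes as
  delimiters, giving the ranges "%" to "-" and "-" to "b" followed by
  the single characters "-" and "c".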
-
  5.2 Wildmat semantics

  A wildmat is tested against a string, and either matches or does not
  match. To do this, each constituent wildmat-pattern is matched against
  the string and the rightmost pattern that matches is identified. If
  that wildmat-pattern is not preceded by "!", the whole wildmat matches.
  If it is preceded by "!", or if no wildmat-pattern matches, the whole
  wildmat does not match.

  For example, consider the wildmat "a*,!*b,*c*":

    the string "aaa" matches because the rightmost match is with "a*"
    the string "abb" does not match because the rightmost match is with "*b"
    the string "ccb" matches because the rightmost match is with "*c*"
    the string "xxx" does not match because no wildmat-pattern matches
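
  The rightmost-match rule can be sketched in Python as follows. This is
  a simplified illustration under stated assumptions: the function names
  are invented for the example, only literal characters, "*" and "?" are
  handled (sets are left out), and no syntax checking is done, so an
  invalid wildmat is not rejected.

```python
import re


def pattern_to_regex(pattern):
    """Translate a wildmat-pattern made of literals, "*" and "?" only.

    Sets ("[...]") are deliberately not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildmat_matches(wildmat, string):
    """Apply the rightmost-match rule to a comma-separated wildmat."""
    result = False  # if no pattern matches, the wildmat does not match
    for item in wildmat.split(","):
        negated = item.startswith("!")
        pattern = item[1:] if negated else item
        # fullmatch keeps the pattern anchored at both ends
        if pattern_to_regex(pattern).fullmatch(string):
            result = not negated  # a later match overrides an earlier one
    return result
```

  Scanning left to right and letting each matching pattern overwrite the
  result has the same effect as finding the rightmost matching pattern.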

  A wildmat-pattern matches a string if the string can be broken into
  components, each of which matches the corresponding wildmat-item in
  the pattern; the matches must be in the same order, and the whole string
  must be used in the match. The pattern is "anchored"; that is, the first
  and last characters in the string must match the first and last item
  respectively (unless that item is an asterisk matching zero characters).

  A wildmat-exact matches the same character (which may be more than one
  octet in UTF-8).

  "?" matches exactly one character (which may be more than one octet).

  "*" matches zero or more characters. It can match an empty string, but
! it cannot match a subsequence of a UTF-8 sequence that is not aligned
! to the character boundaries.
-
  5.2.1 Wildmat sets

  A wildmat-set matches exactly one character in the string. Which
  characters are matched depends on the wildmat-set-body.

  If the body is preceded by "^", the set is "inverted". That is, it
  matches a character if and only if the set without a "^" prefix would
  not match the character, and vice versa.

  The body is split into wildmat-set-ranges and wildmat-set-chars.
  Each wildmat-set-char specifies a single character that the set will
  match. Each wildmat-set-range specifies a range of characters that the
  set will match; this range consists of every character whose code lies
  between the two characters in the range, inclusive. Thus "[a-dg]"
  is equivalent to "[abcdg]"; each matches any of the five characters "a",
  "b", "c", "d", or "g". Note that the codes are always those of
! ISO 10646, no matter what the local character set is; it is helpful
! to note that UTF-8 collating order is the same as that of ISO 10646.

  If the first char in a range has a higher code than the second one, the
  characters represented by the range are determined by the implementation.
  This must be done in a consistent manner, so that, for example,
  "[d-a],[^d-a]" will match every possible character.
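
  Set membership by character code can be sketched in Python as follows.
  The function name set_matches and the representation of the set (a
  list of single characters and (low, high) pairs) are assumptions made
  for this example. Treating a reversed range as empty is just one
  possible implementation choice; the text leaves it open.

```python
def set_matches(items, ch, inverted=False):
    """Test one character against an already-parsed wildmat-set.

    'items' is a hypothetical representation: each entry is either a
    single character or a (low, high) pair describing a range.
    """
    hit = False
    for item in items:
        if isinstance(item, tuple):
            low, high = item
            # Compare ISO 10646 code points, not local collation order.
            # A reversed range (low > high) matches nothing here.
            if ord(low) <= ord(ch) <= ord(high):
                hit = True
        elif item == ch:
            hit = True
    # An inverted set matches exactly when the plain set does not.
    return hit != inverted
```

  Because the inverted result is always the negation of the plain one,
  the consistency requirement for reversed ranges holds whatever the
  plain set is chosen to match.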

  Implementers must be careful to apply the pattern-matching process
  to whole characters encoded in UTF-8, and not to individual octets.
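
  The difference between octets and characters can be shown with a small
  Python sketch. The function name is invented for the example; it only
  answers whether a lone "?" would match a UTF-8 encoded string, by
  decoding first and counting characters rather than octets.

```python
def question_mark_matches(octets):
    """Return True if the wildmat "?" would match the UTF-8 string.

    Hypothetical helper: decodes first, so the test is made on whole
    characters. Invalid UTF-8 raises UnicodeDecodeError.
    """
    text = octets.decode("utf-8")
    return len(text) == 1
```

  The pound sterling symbol is the two octets 0xC2 0xA3 but one
  character, so it is matched by "?" and not by "??".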

+ 5.3  Extensions
+
+ An NNTP server or extension MAY extend the syntax or semantics of
+ wildmats provided that all wildmats that meet the requirements of
+ section 5.1 have the meaning ascribed to them by section 5.2.
+ Future editions of this document may also extend wildmats.

! 5.4  Examples

  In these examples, $ and @ are used to represent the two octets 0xC2
  and 0xA3 respectively; $@ is thus the UTF-8 encoding for the pound
  sterling symbol, shown as # in the descriptions.

  Wildmat    Description of strings that match

  abc        the one string "abc"
  abc,def    the two strings "abc" and "def"
  $@         the one character string "#"
  a*         any string that begins with "a"
  a*b        any string that begins with "a" and ends with "b"
  a*,*b      any string that begins with "a" or ends with "b"
  a*,!*b     any string that begins with "a" and does not end with "b"
  a*,!*b,c*  any string that begins with "a" and does not end with "b", and
             any string that begins with "c" no matter what it ends with
  a*,c*,!*b  any string that begins with "a" or "c" and does not end
             with "b"
  ?a*        any string with "a" as its second character
  ??a*       any string with "a" as its third character
  *a?        any string with "a" as its penultimate character
  *a??       any string with "a" as its antepenultimate character
  [abc]      the three strings "a", "b", and "c"
  [^abc]     any one character string except the three "a", "b", and "c"
  [a-zA-Z]   any one character string consisting of an ASCII letter
  [0-9]*     any string beginning with an ASCII digit
  [a$@]      the two strings "a" and "#"
  [a-$@]     the 67 one character strings from "a" to "#"
  [a^bc]     the four strings "a", "^", "b", and "c"
  [a-c-]     the four strings "a", "b", "c", and "-"
  [a-c-f]    the five strings "a", "b", "c", "-", and "f"
  [-a0-]     the three strings "-", "a", and "0"
  [-a0-2]    the five strings "-", "a", "0", "1", and "2"
  [--0]      the four strings "-", ".", "/", and "0"
  []abc]     the four strings "]", "a", "b", and "c"
  [ab]c]     the two strings "ac]" and "bc]"
  [b-a]      some unspecified set of one character strings
  [^b-a]     all one character strings not matched by the previous pattern

========

[The following changes also need to be made to other sections for consistency.
In addition the formal grammar will need updating.]


  6. Format for Keyword Descriptions

[...]

  The name "wildmat" for a parameter indicates that it is a
! wildmat format pattern as defined in section 5. If the parameter
! does not meet the requirements of that section (for example, if
! it does not fit the grammar of 5.1) the NNTP server MAY place some
! interpretation on it (not specified by this document) or otherwise
! MUST generate a 501 response.

  9.4 The LIST Keyword

  9.4.1 LIST

[...]

  If the optional wildmat parameter is specified, the list is
! limited to only those groups whose names match the wildmat. This
! will normally be very efficient if the wildmat is a simple group
! name.

  9.4.2 LIST ACTIVE.TIMES

  LIST ACTIVE.TIMES [wildmat]

[...]

  If the optional wildmat parameter is specified, the list is
! limited to only those groups whose names match the wildmat. This
! will normally be very efficient if the wildmat is a simple group
! name.

  9.4.4 LIST DISTRIB.PATS

  LIST DISTRIB.PATS

  The distrib.pats file is maintained by some news transport
  systems to allow clients to choose a value for the
  Distribution: line in the header of a news article being
  posted. The information returned consists of lines, in no
  particular order, each of which contains three fields
! separated by colons: a weight, a wildmat (which may be a simple
! group name), and a Distribution: value, in that order.

[...]

  9.4.5 LIST NEWSGROUPS

     LIST NEWSGROUPS [wildmat]

[...]
     If the information is not available, the
     server will return the 503 response. If the server does not
     recognize the command it should return a 501 response. If
!    the optional wildmat parameter is specified, the list is
!    limited to only those groups that match the wildmat (no
!    matching is done on the group descriptions). This will
!    normally be very efficient if the wildmat is a simple group
!    name. If nothing is matched
     an empty list is returned, not an error.

  11.4 NEWNEWS

! NEWNEWS wildmat date time [GMT]

! The message-ids of all articles added to a set of newsgroups
! since the given date-time will be listed. The set consists
! of all newsgroups whose name matches the wildmat.
  The format of the listing will be one message-id per line, as
  though text were being sent. Each message-id SHALL appear only
  once in a response. The order of the response has no specific
  significance and may vary from response to response in the
! same session. Date and time are in the same format as the
! NEWGROUPS command.

  Note that an empty list (i.e., the text body returned by this
  command consists only of the terminating period) is a possible
  valid response, and indicates that there is currently no new
  news.

  Clients SHOULD make all queries in Coordinated Universal Time
  when possible.


-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:  +44 20 8371 1138
Internet Expert     | Home:  <clive at davros.org>  | Fax:  +44 20 8371 1037
Demon Internet      | WWW: http://www.davros.org | DFax: +44 20 8371 4037
Thus plc            |                            | Mobile: +44 7973 377646 


