ietf-nntp Wildmats
Clive D.W. Feather
clive at demon.net
Tue Mar 13 03:27:07 PST 2001
I've uploaded a new version of the text, based on comments so far.
It's <http://www.davros.org/nntp-texts/section-5a.txt> and is included here
for convenience.
This text consists of my proposal of 2001-03-07 with changes following
discussion. I've diff-marked the changes in section 5 relative to that
document.
5. The WILDMAT format
The WILDMAT format described here is based on the version
first developed by Rich Salz [5], which was in turn derived from
the format used in the UNIX "find" command to specify file names.
It was developed to provide a uniform mechanism for matching
patterns in the same manner that the UNIX shell matches filenames.
5.1 Wildmat syntax
-
A wildmat is described by the following augmented BNF [9] syntax
(note that this syntax contains ambiguities and special cases described
at the end):
wildmat = wildmat-pattern *("," ["!"] wildmat-pattern)
wildmat-pattern = 1*wildmat-item
! wildmat-item = wildmat-exact / wildmat-wild
wildmat-exact = %x21-29 / %x2B / %x2D-3E / %x40-5A / %x5D-7F /
UTF-8-non-ascii ; exclude * , ? [ \
wildmat-wild = "*" / "?" / wildmat-set
wildmat-set = "[" ["^"] wildmat-set-body "]"
wildmat-set-body = 1*wildmat-set-item
wildmat-set-item = wildmat-set-char / wildmat-set-range
! wildmat-set-char = %x21-2B / %x2D-5B / %x5D-7F / UTF-8-non-ascii
! ; exclude , \
wildmat-set-range = wildmat-set-char wildmat-range-delim wildmat-set-char
wildmat-range-delim = "-"
UTF-8-non-ascii is defined in section 13.
This syntax must be interpreted subject to the following rules:
- Where a wildmat-pattern is not immediately preceded by "!", it shall
not begin with a "!".
- Where a wildmat-set-body is not immediately preceded by "^", it shall
not begin with a "^".
- The character "]" may only appear in a wildmat-set-body if it is the
very first character of that body.
- Within a wildmat-set-body, the character "-" shall be parsed as being
a wildmat-range-delim unless:
* it is the first or last character in the wildmat-set-body, or
* either of the two immediately preceding characters is a "-" that can
! be parsed as a wildmat-range-delim (this determination is made from
left to right, so that in "[%----b-c]" only the first and fourth
dashes are wildmat-range-delims).
+ NOTE: the characters \ and , are not allowed within a wildmat-pattern,
+ and a literal *, ?, or [ can only be included by making it a
+ wildmat-set of one character. This should not be a problem, since these
+ characters cannot occur in newsgroup names, and matching newsgroup names
+ is the only current use of wildmats.
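The character "," is excluded from both wildmat-exact and wildmat-set-char. A wildmat can therefore be split into its patterns with a plain split on commas, before any other parsing is done. The sketch below does this and enforces some of the rules above. It rejects a backslash, spaces and control characters, and an empty pattern. It also rejects a leading "!" on a pattern that is not negated. That includes the first pattern, which the grammar never allows to be negated. Finally, it requires a closing "]" for every set. The function names are hypothetical, and the checking is deliberately partial.

```python
from typing import List, Tuple


def _check_sets(pattern: str) -> None:
    """Raise ValueError if some "[" in `pattern` never gets its closing "]"."""
    i = 0
    while i < len(pattern):
        if pattern[i] != "[":
            i += 1
            continue
        j = i + 1
        if pattern.startswith("^", j):   # "^" right after "[" always inverts
            j += 1
        if pattern.startswith("]", j):   # "]" is literal only as the first body character
            j += 1
        close = pattern.find("]", j)
        if close == -1:
            raise ValueError("unterminated wildmat-set in " + repr(pattern))
        i = close + 1


def split_wildmat(wildmat: str) -> List[Tuple[bool, str]]:
    """Split `wildmat` into (negated, wildmat-pattern) pairs.

    Only some of the syntax rules of section 5.1 are checked; this is a
    sketch, not a complete validator."""
    if "\\" in wildmat:
        raise ValueError("backslash is not allowed in a wildmat")
    pairs: List[Tuple[bool, str]] = []
    # Every comma separates two patterns, so a plain split is safe.
    for index, piece in enumerate(wildmat.split(",")):
        # The grammar never allows the first pattern to be negated.
        negated = index > 0 and piece.startswith("!")
        pattern = piece[1:] if negated else piece
        if not pattern:
            raise ValueError("empty wildmat-pattern")
        if not negated and pattern.startswith("!"):
            raise ValueError("pattern not preceded by '!' begins with '!'")
        if any(ord(ch) < 0x21 for ch in pattern):
            raise ValueError("space or control character in wildmat")
        _check_sets(pattern)
        pairs.append((negated, pattern))
    return pairs


print(split_wildmat("a*,!*b,*c*"))
```

The check also rejects "[a,b]". The split turns it into "[a" and "b]", and "[a" is an unterminated set. This is consistent with "," being excluded from wildmat-set-char.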
-
5.2 Wildmat semantics
A wildmat is tested against a string, and either matches or does not
match. To do this, each constituent wildmat-pattern is matched against
the string and the rightmost pattern that matches is identified. If
that wildmat-pattern is not preceded by "!", the whole wildmat matches.
If it is preceded by "!", or if no wildmat-pattern matches, the whole
wildmat does not match.
For example, consider the wildmat "a*,!*b,*c*":
the string "aaa" matches because the rightmost match is with "a*"
the string "abb" does not match because the rightmost match is with "*b"
the string "ccb" matches because the rightmost match is with "*c*"
the string "xxx" does not match because no wildmat-pattern matches
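The rightmost-match rule can be expressed by scanning the patterns from right to left and stopping at the first one that matches. This sketch makes two assumptions. First, the wildmat has already been split into (negated, pattern) pairs. Second, Python's `fnmatch.fnmatchcase` stands in for a real wildmat-pattern matcher. It agrees with wildmat only for patterns made of literal characters, "*" and "?", which is enough for the example above.

```python
from fnmatch import fnmatchcase
from typing import Callable, List, Tuple


def wildmat_matches(patterns: List[Tuple[bool, str]], text: str,
                    pattern_matches: Callable[[str, str], bool]) -> bool:
    """Return the verdict of the rightmost wildmat-pattern that matches `text`;
    if that pattern is negated, or nothing matches, the wildmat does not match."""
    for negated, pattern in reversed(patterns):
        if pattern_matches(text, pattern):
            return not negated
    return False


# The wildmat "a*,!*b,*c*", already split into (negated, pattern) pairs.
example = [(False, "a*"), (True, "*b"), (False, "*c*")]
for s in ["aaa", "abb", "ccb", "xxx"]:
    print(s, wildmat_matches(example, s, fnmatchcase))
```

Scanning from the right means the patterns to the left of the first match never need to be tried.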
A wildmat-pattern matches a string if the string can be broken into
components, each of which matches the corresponding wildmat-item in
the pattern; the matches must be in the same order, and the whole string
must be used in the match. The pattern is "anchored"; that is, the first
and last characters in the string must match the first and last items
respectively (unless the item concerned is an asterisk matching zero
characters).
A wildmat-exact matches the same character (which may be more than one
octet in UTF-8).
"?" matches exactly one character (which may be more than one octet).
"*" matches zero or more characters. It can match an empty string, but
! it cannot match a subsequence of a UTF-8 sequence that is not aligned
! to the character boundaries.
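Anchored matching of exact characters, "?" and "*" can be sketched with the usual greedy scan that backtracks to the most recent "*". One simple way to make "?" and "*" work on whole characters is to decode the UTF-8 octets before matching. A Python `str` is a sequence of code points, so the matcher then never splits a multi-octet character. The function names are hypothetical, and wildmat-sets are left out of this step.

```python
def match_simple(pattern: str, text: str) -> bool:
    """Anchored match of a pattern built from exact characters, "?" and "*"
    (no wildmat-sets) against `text`, one character at a time."""
    p = t = 0
    star_p = -1   # pattern index just after the most recent "*"
    star_t = 0    # text index where that "*" currently stops consuming
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            star_p, star_t = p + 1, t   # first let "*" match zero characters
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", text[t]):
            p += 1
            t += 1
        elif star_p != -1:
            star_t += 1                 # let the last "*" absorb one more character
            p, t = star_p, star_t
        else:
            return False
    # Once the text is used up, only trailing "*" items may remain.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)


def match_utf8(pattern: bytes, text: bytes) -> bool:
    """Decode both sides so that "?" and "*" see characters, not octets."""
    return match_simple(pattern.decode("utf-8"), text.decode("utf-8"))


pound = "\u00a3".encode("utf-8")             # the two octets 0xC2 0xA3
print(match_utf8(b"?", pound))               # one character
print(match_simple("?", pound.decode("latin-1")))  # octet-wise: two "characters"
```

The second call shows what goes wrong when octets are treated as characters: "?" no longer matches the pound sign, and "??" would.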
-
5.2.1 Wildmat sets
A wildmat-set matches exactly one character in the string. Which
characters it matches depends on the wildmat-set-body.
If the body is preceded by "^", the set is "inverted". That is, it
matches a character if and only if the set without a "^" prefix would
not match the character, and vice versa.
The body is split into wildmat-set-ranges and wildmat-set-chars.
Each wildmat-set-char specifies a single character that the set will
match. Each wildmat-set-range specifies a range of characters that the
set will match; this range consists of every character whose code lies
between the two characters in the range, inclusive. Thus "[a-dg]"
is equivalent to "[abcdg]"; each matches any one of the five characters "a",
"b", "c", "d", or "g". Note that the codes are always those of
! ISO 10646, no matter what the local character set is; it is helpful
! to note that UTF-8 collating order is the same as that of ISO 10646.
If the first character in a range has a higher code than the second one, the
characters represented by the range are determined by the implementation.
This must be done in a consistent manner, so that, for example,
"[d-a],[^d-a]" will match every possible character.
Implementers must be careful to apply the pattern-matching process
to whole characters encoded in UTF-8, and not to individual octets.
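Suppose the set body has already been split into characters and ranges. Matching one decoded character then reduces to comparing ISO 10646 code points, which Python's `ord` returns. For reversed ranges, this sketch treats "[d-a]" as empty. That is only one consistent choice an implementation might make. Because inversion is a plain complement, "[d-a],[^d-a]" still matches every character. The name `set_matches` is hypothetical.

```python
from typing import List, Tuple, Union

# A parsed set member: one character, or an inclusive (low, high) range.
SetItem = Union[str, Tuple[str, str]]


def set_matches(members: List[SetItem], inverted: bool, ch: str) -> bool:
    """Report whether the single character `ch` is matched by a wildmat-set
    whose body has already been split into characters and ranges."""
    code = ord(ch)  # ISO 10646 code point, whatever the local character set
    hit = False
    for member in members:
        if isinstance(member, tuple):
            low, high = ord(member[0]), ord(member[1])
            # A reversed range such as "d-a" has low > high, so this test is
            # never true: this sketch's (consistent) choice is an empty range.
            if low <= code <= high:
                hit = True
        elif code == ord(member):
            hit = True
    # "^" complements the plain set, so "[x],[^x]" covers every character.
    return hit != inverted


print(set_matches([("a", "d"), "g"], False, "c"))           # "[a-dg]" against "c"
print(set_matches([("a", "\u00a3")], False, "\u00a3"))      # "[a-$@]" against the pound sign
```

Comparing the encoded octet strings byte by byte would give the same ordering as `ord`, because UTF-8 preserves code point order.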
+ 5.3 Extensions
+
+ An NNTP server or extension MAY extend the syntax or semantics of
+ wildmats provided that all wildmats that meet the requirements of
+ section 5.1 have the meaning ascribed to them by section 5.2.
+ Future editions of this document may also extend wildmats.
! 5.4 Examples
In these examples, $ and @ are used to represent the two octets 0xC2
and 0xA3 respectively; $@ is thus the UTF-8 encoding for the pound
sterling symbol, shown as # in the descriptions.
Wildmat       Description of strings that match
abc           the one string "abc"
abc,def       the two strings "abc" and "def"
$@            the one-character string "#"
a*            any string that begins with "a"
a*b           any string that begins with "a" and ends with "b"
a*,*b         any string that begins with "a" or ends with "b"
a*,!*b        any string that begins with "a" and does not end with "b"
a*,!*b,c*     any string that begins with "a" and does not end with "b",
              and any string that begins with "c" no matter what it ends with
a*,c*,!*b     any string that begins with "a" or "c" and does not end
              with "b"
?a*           any string with "a" as its second character
??a*          any string with "a" as its third character
*a?           any string with "a" as its penultimate character
*a??          any string with "a" as its antepenultimate character
[abc]         the three strings "a", "b", and "c"
[^abc]        any one-character string except the three "a", "b", and "c"
[a-zA-Z]      any one-character string consisting of an ASCII letter
[0-9]*        any string beginning with an ASCII digit
[a$@]         the two strings "a" and "#"
[a-$@]        the 67 one-character strings from "a" to "#"
[a^bc]        the four strings "a", "^", "b", and "c"
[a-c-]        the four strings "a", "b", "c", and "-"
[a-c-f]       the five strings "a", "b", "c", "-", and "f"
[-a0-]        the three strings "-", "a", and "0"
[-a0-2]       the five strings "-", "a", "0", "1", and "2"
[--0]         the four strings "-", ".", "/", and "0"
[]abc]        the four strings "]", "a", "b", and "c"
[ab]c]        the two strings "ac]" and "bc]"
[b-a]         some unspecified set of one-character strings
[^b-a]        all one-character strings not matched by the previous pattern
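The sketch below puts the pieces together. It is self-contained and follows sections 5.1 and 5.2 closely enough to reproduce the table above. It assumes the wildmat is syntactically valid. It also assumes that both the wildmat and the string have already been decoded from UTF-8 into Python `str` values, so "$@" is written as "\u00a3". Reversed ranges are treated as empty. All names are hypothetical.

```python
from typing import List, Tuple, Union

SetItem = Union[str, Tuple[str, str]]  # a set member: one character or a range
Item = tuple  # ("exact", ch), ("?",), ("*",) or ("set", inverted, members)


def parse_set_body(body: str) -> List[SetItem]:
    """Split a set body into characters and ranges using the "-" rules of 5.1."""
    n = len(body)
    delim = [False] * n
    for i in range(1, n - 1):
        if body[i] == "-" and not delim[i - 1] and not (i >= 2 and delim[i - 2]):
            delim[i] = True
    members: List[SetItem] = []
    i = 0
    while i < n:
        if i + 2 < n and delim[i + 1]:
            members.append((body[i], body[i + 2]))
            i += 3
        else:
            members.append(body[i])
            i += 1
    return members


def parse_pattern(pattern: str) -> List[Item]:
    """Turn one wildmat-pattern into a list of wildmat-items."""
    items: List[Item] = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c in "*?":
            items.append((c,))
            i += 1
        elif c == "[":
            start = i + 2 if pattern.startswith("^", i + 1) else i + 1
            close = pattern.index("]", start + 1)  # a leading "]" is a member
            items.append(("set", start == i + 2, parse_set_body(pattern[start:close])))
            i = close + 1
        else:
            items.append(("exact", c))
            i += 1
    return items


def item_matches(item: Item, ch: str) -> bool:
    """Match one non-"*" item against one character (code point)."""
    if item[0] == "?":
        return True
    if item[0] == "exact":
        return item[1] == ch
    _, inverted, members = item
    code = ord(ch)
    hit = False
    for m in members:
        if isinstance(m, tuple):
            hit = hit or ord(m[0]) <= code <= ord(m[1])  # reversed range: empty
        else:
            hit = hit or ord(m) == code
    return hit != inverted


def pattern_matches(items: List[Item], text: str) -> bool:
    """Anchored match; "*" backtracks over whole characters."""
    p = t = 0
    star_p, star_t = -1, 0
    while t < len(text):
        if p < len(items) and items[p][0] == "*":
            star_p, star_t = p + 1, t
            p += 1
        elif p < len(items) and item_matches(items[p], text[t]):
            p, t = p + 1, t + 1
        elif star_p != -1:
            star_t += 1
            p, t = star_p, star_t
        else:
            return False
    while p < len(items) and items[p][0] == "*":
        p += 1
    return p == len(items)


def wildmat_match(wildmat: str, text: str) -> bool:
    """The rightmost matching wildmat-pattern decides; "!" negates it."""
    pieces = wildmat.split(",")
    for index in range(len(pieces) - 1, -1, -1):
        source = pieces[index]
        negated = index > 0 and source.startswith("!")
        if pattern_matches(parse_pattern(source[1:] if negated else source), text):
            return not negated
    return False


print(wildmat_match("[a-\u00a3]", "\u00a3"), wildmat_match("a*,!*b", "ab"))
```

If the input arrives as octets, decode it with `.decode("utf-8")` before calling `wildmat_match`.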
========
[The following changes also need to be made to other sections for consistency.
In addition, the formal grammar will need updating.]
6. Format for Keyword Descriptions
[...]
The name "wildmat" for a parameter indicates that it is a
! wildmat format pattern as defined in section 5. If the parameter
! does not meet the requirements of that section (for example, if
! it does not fit the grammar of section 5.1), the NNTP server MAY
! place some interpretation on it (not specified by this document)
! or otherwise MUST generate a 501 response.
9.4 The LIST Keyword
9.4.1 LIST
[...]
If the optional wildmat parameter is specified, the list is
! limited to only those groups whose names match the wildmat. This
! will normally be very efficient if the wildmat is a simple group
! name.
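The efficiency remark can be illustrated with a server-side sketch. A wildmat containing none of "*", "?", "[" or "," can only match the identical group name. It can therefore be answered with a single lookup instead of testing every group. The data layout and the `matches` callback are assumptions made for illustration. The layout is a dict from group name to a pre-formatted LIST line, and the sample data is invented. `fnmatchcase` is again only a stand-in matcher.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List

WILD = set("*?[,")  # characters that make a wildmat more than a plain name


def list_matching(active: Dict[str, str], wildmat: str,
                  matches: Callable[[str, str], bool]) -> List[str]:
    """Return the LIST lines of groups whose names match `wildmat`.
    `active` maps each group name to its LIST line; `matches(wildmat, name)`
    is a caller-supplied wildmat matcher (hypothetical interface)."""
    if not WILD.intersection(wildmat):
        # No "*", "?", "[" or ",": the wildmat can only match the name
        # itself, so one dictionary lookup replaces a scan of every group.
        line = active.get(wildmat)
        return [] if line is None else [line]
    return [line for name, line in active.items() if matches(wildmat, name)]


def stand_in(wildmat: str, name: str) -> bool:
    # Adequate only for single patterns built from literals, "*" and "?".
    return fnmatchcase(name, wildmat)


active = {
    "comp.lang.c": "comp.lang.c 0002 0001 y",
    "comp.lang.python": "comp.lang.python 0010 0005 y",
    "misc.test": "misc.test 0003 0001 y",
}
print(list_matching(active, "misc.test", stand_in))
print(list_matching(active, "comp.*", stand_in))
```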
9.4.2 LIST ACTIVE.TIMES
LIST ACTIVE.TIMES [wildmat]
[...]
If the optional wildmat parameter is specified, the list is
! limited to only those groups whose names match the wildmat. This
! will normally be very efficient if the wildmat is a simple group
! name.
9.4.4 LIST DISTRIB.PATS
LIST DISTRIB.PATS
The distrib.pats file is maintained by some news transport
systems to allow clients to choose a value for the
Distribution: line in the header of a news article being
posted. The information returned consists of lines, in no
particular order, each of which contains three fields
! separated by colons: a weight, a wildmat (which may be a simple
! group name), and a Distribution: value, in that order.
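":" is a permitted wildmat-exact character, so a client parsing a distrib.pats line should not simply split on every colon. The sketch below takes the weight up to the first colon and the Distribution value after the last one. It assumes that neither of those fields contains a colon and that the weight is an integer. The sample lines are invented, and the names are hypothetical.

```python
from typing import NamedTuple


class DistribPat(NamedTuple):
    weight: int
    wildmat: str
    distribution: str


def parse_distrib_pats_line(line: str) -> DistribPat:
    """Split one distrib.pats line into its three colon-separated fields.
    A line with fewer than two colons raises ValueError."""
    line = line.rstrip("\r\n")
    weight, rest = line.split(":", 1)            # weight ends at the first colon
    wildmat, distribution = rest.rsplit(":", 1)  # Distribution follows the last one
    return DistribPat(int(weight), wildmat, distribution)


print(parse_distrib_pats_line("10:local.*:local\r\n"))
```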
[...]
9.4.5 LIST NEWSGROUPS
LIST NEWSGROUPS [wildmat]
[...]
If the information is not available, the
server will return the 503 response. If the server does not
recognize the command it should return a 501 response. If
! the optional wildmat parameter is specified, the list is
! limited to only those groups that match the wildmat (no
! matching is done on the group descriptions). This will
! normally be very efficient if the wildmat is a simple group
! name. If nothing is matched,
an empty list is returned, not an error.
11.4 NEWNEWS
! NEWNEWS wildmat date time [GMT]
! The message-ids of all articles added to a set of newsgroups
! since the given date-time will be listed. The set consists
! of all newsgroups whose name matches the wildmat.
The format of the listing will be one message-id per line, as
though text were being sent. Each message-id SHALL appear only
once in a response. The order of the response has no specific
significance and may vary from response to response in the
! same session. Date and time are in the same format as for the
! NEWGROUPS command.
Note that an empty list (i.e., the text body returned by this
command consists only of the terminating period) is a possible
valid response, and indicates that there is currently no new
news.
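The requirement that each message-id appear only once matters for crossposted articles, which a server's index may hold once per newsgroup. This sketch assumes a hypothetical per-group index of (message-id, newsgroup, arrival time) entries. It treats an article that arrived exactly at the given date-time as new. `fnmatchcase` stands in for a wildmat matcher, and the sample data is invented.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List, Tuple

# Hypothetical per-group index entry: (message-id, newsgroup, arrival time).
# A crossposted article therefore appears once per newsgroup.
IndexEntry = Tuple[str, str, datetime]


def newnews(index: Iterable[IndexEntry], wildmat: str, since: datetime,
            matches: Callable[[str, str], bool]) -> List[str]:
    """Message-ids of articles that arrived at or after `since` in a group
    whose name matches `wildmat`, each listed at most once."""
    seen = set()
    result: List[str] = []
    for message_id, group, arrived in index:
        if arrived >= since and message_id not in seen and matches(wildmat, group):
            seen.add(message_id)
            result.append(message_id)
    return result  # an empty list means "no new news", not an error


def stand_in(wildmat: str, name: str) -> bool:
    return fnmatchcase(name, wildmat)


utc = timezone.utc
index = [
    ("<1@example.invalid>", "comp.lang.c", datetime(2001, 3, 13, 2, 0, tzinfo=utc)),
    ("<1@example.invalid>", "comp.std.c", datetime(2001, 3, 13, 2, 0, tzinfo=utc)),
    ("<2@example.invalid>", "misc.test", datetime(2001, 3, 13, 2, 30, tzinfo=utc)),
    ("<3@example.invalid>", "comp.lang.c", datetime(2001, 3, 12, 9, 0, tzinfo=utc)),
]
print(newnews(index, "comp.*", datetime(2001, 3, 13, tzinfo=utc), stand_in))
```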
Clients SHOULD make all queries in Coordinated Universal Time
when possible.
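A client following the UTC recommendation can convert its local time before building the command. The NEWGROUPS text of this draft is not reproduced here, so this sketch assumes the 6-digit YYMMDD and HHMMSS fields that RFC 977 defines for NEWGROUPS. It also appends the optional GMT token. `newnews_command` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def newnews_command(wildmat: str, since: datetime) -> str:
    """Build a NEWNEWS command whose date and time are given in UTC.

    Assumes RFC 977's 6-digit YYMMDD and HHMMSS fields for NEWGROUPS."""
    if since.tzinfo is None:
        raise ValueError("use an aware datetime so it can be converted to UTC")
    utc = since.astimezone(timezone.utc)
    return f"NEWNEWS {wildmat} {utc:%y%m%d} {utc:%H%M%S} GMT"


# 04:27:07 at UTC+01:00 is the same instant as 03:27:07 UTC.
local = datetime(2001, 3, 13, 4, 27, 7, tzinfo=timezone(timedelta(hours=1)))
print(newnews_command("comp.lang.*", local))
```

Rejecting naive datetimes is a design choice: without an offset, the conversion to UTC would have to guess the client's time zone.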
--
Clive D.W. Feather | Work: <clive at demon.net> | Tel: +44 20 8371 1138
Internet Expert | Home: <clive at davros.org> | Fax: +44 20 8371 1037
Demon Internet | WWW: http://www.davros.org | DFax: +44 20 8371 4037
Thus plc | | Mobile: +44 7973 377646