ietf-nntp Wildmats (was: draft posted)
Russ Allbery
rra at stanford.edu
Tue Nov 27 15:28:47 PST 2001
Charles Lindsey <chl at clw.cs.man.ac.uk> writes:
> Last thing I remember is that Clive had two texts (both should still be
> on his website) and we had reached consensus on the shorter one, without
> the character classes.
I'm including that change for reference. It seems fine to me. Does
anyone have any objections to making the following changes in the draft?
(Note that the mention of LIST NEWSGROUPS being very efficient with a
single group name has since been removed from the draft and shouldn't be
reintroduced by this change, and the other changes to the text elsewhere
in the document should be reviewed against -14.)
NNTP proposed text
Section 5
Working copy
Last changed 2001-05-02 14:45 UTC
This text consists of my updated proposal of 2001-04-30 with wildmat
sets removed. I've diff-marked the changes in section 5 relative to the
updated proposal. This proposal is a strict subset of the previous one
(that is, every wildmat in the previous proposal either has the same
meaning in this one or is not permitted).
5. The WILDMAT format
The WILDMAT format described here is based on the version
first developed by Rich Salz [5], which in turn was derived from
the format used in the UNIX "find" command to select file names.
It was developed to provide a uniform mechanism for matching
patterns in the same manner that the UNIX shell matches filenames.
5.1 Wildmat syntax
A wildmat is described by the following augmented BNF [9] syntax
(note that this syntax contains ambiguities and special cases described
at the end):
wildmat = wildmat-pattern *("," ["!"] wildmat-pattern)
wildmat-pattern = 1*wildmat-item
wildmat-item = wildmat-exact / wildmat-wild
! wildmat-exact = %x21-29 / %x2B / %x2D-3E / %x40-5A / %x5E-7F /
! UTF-8-non-ascii ; exclude * , ? [ \ ]
! wildmat-wild = "*" / "?"
-
UTF-8-non-ascii is defined in section 13.
This syntax must be interpreted subject to the following rule:
- Where a wildmat-pattern is not immediately preceded by "!", it shall
not begin with a "!".
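As a rough aid to reading the grammar and the "!" rule together, here is a minimal Python sketch of a validator. It assumes the wildmat has already been decoded from UTF-8 into a Python str, so any non-ASCII code point stands in for UTF-8-non-ascii; the function name is hypothetical.

```python
def is_valid_wildmat(wildmat: str) -> bool:
    """Sketch: check a decoded wildmat against the section 5.1 grammar."""
    # "," is never a wildmat-item, so splitting on it recovers the patterns.
    for index, pattern in enumerate(wildmat.split(",")):
        if index > 0 and pattern.startswith("!"):
            # Negation marker: drop it. The pattern itself may then begin
            # with "!", because it is immediately preceded by "!".
            pattern = pattern[1:]
        elif pattern.startswith("!"):
            # Only the first pattern reaches here; it has no "!" marker
            # before it, so it must not begin with "!".
            return False
        if not pattern:
            return False  # wildmat-pattern = 1*wildmat-item
        for ch in pattern:
            code = ord(ch)
            if code >= 0x80:
                continue  # UTF-8-non-ascii (already decoded)
            if code < 0x21 or ch in "[\\]":
                return False  # controls, space, and the excluded [ \ ]
    return True
```

For example, "a*,!*b,*c*" and "a,!!b" pass, while "!abc", "a,,b" and "comp.[ab]" are rejected.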
-
! NOTE: the characters \ , [ and ] are not allowed in wildmats, while *
! and ? are always wildcards. This should not be a problem since these
! characters cannot occur in newsgroup names, which is the only current
! use of wildmats. Backslash is commonly used to suppress the special
! meaning of characters, and brackets to introduce sets, but there is no
! existing standard practice for these in wildmats and so they were omitted
! from this specification. A future extension to this document may provide
! semantics for these characters.
5.2 Wildmat semantics
A wildmat is tested against a string, and either matches or does not
match. To do this, each constituent wildmat-pattern is matched against
the string and the rightmost pattern that matches is identified. If
that wildmat-pattern is not preceded by "!", the whole wildmat matches.
If it is preceded by "!", or if no wildmat-pattern matches, the whole
wildmat does not match.
For example, consider the wildmat "a*,!*b,*c*":
the string "aaa" matches because the rightmost match is with "a*"
the string "abb" does not match because the rightmost match is with "*b"
the string "ccb" matches because the rightmost match is with "*c*"
the string "xxx" does not match because no wildmat-pattern matches
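The "rightmost match decides" rule can be sketched in Python by scanning the patterns from right to left and stopping at the first one that matches. This sketch assumes a wildmat that is valid under section 5.1; it delegates single-pattern matching to the standard library's fnmatch.fnmatchcase, which is safe here only because 5.1 excludes "[" and "]", the characters fnmatch would otherwise treat as a set.

```python
from fnmatch import fnmatchcase


def wildmat_matches(wildmat: str, string: str) -> bool:
    """Sketch of section 5.2: the rightmost matching pattern decides."""
    patterns = wildmat.split(",")
    for index in range(len(patterns) - 1, -1, -1):
        pattern = patterns[index]
        negated = index > 0 and pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if fnmatchcase(string, pattern):
            # First hit from the right is the rightmost match.
            return not negated
    return False  # no wildmat-pattern matched at all
```

With "a*,!*b,*c*" this gives True for "aaa" and "ccb", and False for "abb" and "xxx", as in the examples above.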
A wildmat-pattern matches a string if the string can be broken into
components, each of which matches the corresponding wildmat-item in
the pattern; the matches must be in the same order, and the whole string
must be used in the match. The pattern is "anchored"; that is, the first
and last characters in the string must match the first and last item
respectively (unless that item is an asterisk matching zero characters).
A wildmat-exact matches the same character (which may be more than one
octet in UTF-8).
"?" matches exactly one character (which may be more than one octet).
"*" matches zero or more characters. It can match an empty string, but
it cannot match a subsequence of a UTF-8 sequence that is not aligned
to the character boundaries.
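A single anchored wildmat-pattern can be matched without regular expressions. The Python sketch below uses the common greedy technique of remembering only the most recent "*" and backtracking to it on a mismatch, which is sufficient for patterns built from "*", "?" and exact characters. It assumes both pattern and string have been decoded from UTF-8 first, so "?" and "*" always consume whole characters and can never stop inside a multi-octet sequence; the name `pattern_matches` is hypothetical.

```python
def pattern_matches(pattern: str, string: str) -> bool:
    """Sketch: anchored match of one wildmat-pattern (no commas, no "!" marker)."""
    p = s = 0
    star = -1  # index of the most recent "*" in pattern, or -1 if none yet
    mark = 0   # string index where that "*" began matching
    while s < len(string):
        if p < len(pattern) and pattern[p] == "*":
            star, mark = p, s  # first try letting "*" match zero characters
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", string[s]):
            p += 1  # "?" or an identical exact character consumes one
            s += 1
        elif star != -1:
            mark += 1  # let the last "*" absorb one more character
            p, s = star + 1, mark
        else:
            return False
    # The string is used up; anything left in the pattern must be "*"
    # items matching zero characters.
    return all(item == "*" for item in pattern[p:])
```

For example, `pattern_matches("?", b"\xc2\xa3".decode("utf-8"))` is True: the two octets form one character, so a single "?" matches it.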
-
5.3 Extensions
An NNTP server or extension MAY extend the syntax or semantics of
wildmats provided that all wildmats that meet the requirements of
section 5.1 have the meaning ascribed to them by section 5.2.
Future editions of this document may also extend wildmats.
5.4 Examples
In these examples, $ and @ are used to represent the two octets 0xC2
and 0xA3 respectively; $@ is thus the UTF-8 encoding for the pound
sterling symbol, shown as # in the descriptions.
Wildmat      Description of strings that match
abc          the one string "abc"
abc,def      the two strings "abc" and "def"
$@           the one character string "#"
a*           any string that begins with "a"
a*b          any string that begins with "a" and ends with "b"
a*,*b        any string that begins with "a" or ends with "b"
a*,!*b       any string that begins with "a" and does not end with "b"
a*,!*b,c*    any string that begins with "a" and does not end with "b",
             and any string that begins with "c" no matter what it ends
             with
a*,c*,!*b    any string that begins with "a" or "c" and does not end
             with "b"
?a*          any string with "a" as its second character
??a*         any string with "a" as its third character
*a?          any string with "a" as its penultimate character
*a??         any string with "a" as its antepenultimate character
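The rows of this table can be spot-checked mechanically. The Python sketch below pairs each wildmat with one or two sample strings of its own choosing (not taken from the table) and an expected result; it assumes valid wildmats and uses fnmatch.fnmatchcase for the individual patterns, which works only because "[" and "]" cannot appear.

```python
from fnmatch import fnmatchcase


def wildmat_matches(wildmat: str, string: str) -> bool:
    # Later (further right) matching patterns override earlier ones,
    # so the rightmost match decides; "!" negates (section 5.2).
    result = False
    for index, pattern in enumerate(wildmat.split(",")):
        negated = index > 0 and pattern.startswith("!")
        if fnmatchcase(string, pattern[1:] if negated else pattern):
            result = not negated
    return result


POUND = b"\xc2\xa3".decode("utf-8")  # the two octets written "$@" above

# (wildmat, sample string, expected result); samples are illustrative.
EXAMPLES = [
    ("abc", "abc", True),
    ("abc", "abcd", False),
    ("abc,def", "def", True),
    (POUND, POUND, True),
    ("a*", "apple", True),
    ("a*b", "axxb", True),
    ("a*,*b", "xb", True),
    ("a*,!*b", "axb", False),
    ("a*,!*b", "axc", True),
    ("a*,!*b,c*", "cxb", True),
    ("a*,c*,!*b", "cxb", False),
    ("?a*", "bar", True),
    ("??a*", "bba", True),
    ("*a?", "xab", True),
    ("*a??", "xabc", True),
]


def failing_examples():
    """Return the rows whose actual result differs from the expected one."""
    return [row for row in EXAMPLES if wildmat_matches(row[0], row[1]) != row[2]]
```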
-
========
[The following changes also need to be made to other sections for consistency.
In addition the formal grammar will need updating.]
6. Format for Keyword Descriptions
[...]
The name "wildmat" for a parameter indicates that it is a
wildmat format pattern as defined in section 5. If the parameter
does not meet the requirements of that section (for example, if
it does not fit the grammar of 5.1) the NNTP server MAY place some
interpretation on it (not specified by this document) or otherwise
MUST generate a 501 response.
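One way a server might apply this rule, sketched in Python under the assumption that it chooses not to place its own interpretation on malformed parameters: express the section 5.1 grammar as a regular expression and answer 501 otherwise. The names `check_wildmat_parameter` and `_WILDMAT` and the text after the 501 code are illustrative placeholders, not taken from the draft.

```python
import re
from typing import Optional

# One wildmat-item: any character except controls, space, ",", "[", "\" and "]".
_ITEM = r"[^\x00-\x20,\[\\\]]"
# The first pattern may not begin with "!". Each later pattern is either
# "!" followed by a pattern (which may itself begin with "!"), or a
# pattern that does not begin with "!".
_WILDMAT = re.compile(rf"(?!!){_ITEM}+(?:,(?:!{_ITEM}+|(?!!){_ITEM}+))*")


def check_wildmat_parameter(param: str) -> Optional[str]:
    """Return None if param fits section 5.1, else a 501 response line."""
    if _WILDMAT.fullmatch(param):
        return None
    # This sketch takes the "otherwise" branch and rejects the parameter.
    return "501 Syntax error in wildmat"
```

The explicit alternation after each comma matters: a simpler `,!?` prefix would let a trailing ",!" slip through as a pattern "!" that begins with "!" without being preceded by one.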
9.4 The LIST Keyword
9.4.1 LIST
[...]
If the optional wildmat parameter is specified, the list is
limited to only those groups whose names match the wildmat. This
will normally be very efficient if the wildmat is a simple group
name.
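The efficiency remark rests on a simple observation: a valid wildmat containing no "*", "?" or "," is a single pattern of exact characters, so it can only match the one name it spells out, and a lookup replaces a scan of every group. A Python sketch, assuming the active list is held in a dict keyed by group name; the LIST line contents in the usage note are made-up sample data, and the same reasoning would apply to LIST ACTIVE.TIMES.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def wildmat_matches(wildmat: str, name: str) -> bool:
    # Later (rightmost) matching patterns override earlier ones.
    result = False
    for index, pattern in enumerate(wildmat.split(",")):
        negated = index > 0 and pattern.startswith("!")
        if fnmatchcase(name, pattern[1:] if negated else pattern):
            result = not negated
    return result


def list_active(active: Dict[str, str], wildmat: str) -> List[str]:
    """Return the LIST lines for groups whose names match wildmat."""
    if not any(ch in wildmat for ch in "*?,"):
        # A simple group name: one dictionary lookup, no full scan.
        line = active.get(wildmat)
        return [] if line is None else [line]
    return [line for name, line in active.items() if wildmat_matches(wildmat, name)]
```

With a three-group dict, `list_active(active, "misc.test")` touches one entry, while `"comp.*,!*python"` must examine every name.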
9.4.2 LIST ACTIVE.TIMES
LIST ACTIVE.TIMES [wildmat]
[...]
If the optional wildmat parameter is specified, the list is
limited to only those groups whose names match the wildmat. This
will normally be very efficient if the wildmat is a simple group
name.
9.4.4 LIST DISTRIB.PATS
LIST DISTRIB.PATS
The distrib.pats file is maintained by some news transport
systems to allow clients to choose a value for the
Distribution: line in the header of a news article being
posted. The information returned consists of lines, in no
particular order, each of which contains three fields
separated by colons: a weight, a wildmat (which may be a simple
group name), and a Distribution: value, in that order.
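A client-side sketch in Python of reading these lines. How the weight is used is not stated here; this sketch assumes the client picks the value from the highest-weight line whose wildmat matches the newsgroup being posted to, and the sample lines and function names are hypothetical. Because section 5.1 permits ":" inside a wildmat, each line is split once from the left and once from the right rather than on every colon.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def wildmat_matches(wildmat: str, name: str) -> bool:
    # Later (rightmost) matching patterns override earlier ones.
    result = False
    for index, pattern in enumerate(wildmat.split(",")):
        negated = index > 0 and pattern.startswith("!")
        if fnmatchcase(name, pattern[1:] if negated else pattern):
            result = not negated
    return result


def choose_distribution(lines: List[str], newsgroup: str) -> Optional[str]:
    """Assumed policy: highest-weight matching line wins; None if none match."""
    best_weight: Optional[int] = None
    best_value: Optional[str] = None
    for line in lines:
        weight_text, rest = line.split(":", 1)  # weight has no colon
        wildmat, value = rest.rsplit(":", 1)    # value assumed colon-free
        weight = int(weight_text)
        if wildmat_matches(wildmat, newsgroup) and (
            best_weight is None or weight > best_weight
        ):
            best_weight, best_value = weight, value
    return best_value
```

With the lines "10:local.*:local", "5:*:world" and "20:local.secret:site", a post to local.test gets "local", local.secret gets "site", and comp.lang.c gets "world".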
[...]
9.4.5 LIST NEWSGROUPS
LIST NEWSGROUPS [wildmat]
[...]
If the information is not available, the
server will return the 503 response. If the server does not
recognize the command, it should return a 501 response. If
the optional wildmat parameter is specified, the list is
limited to only those groups that match the wildmat (no
matching is done on the group descriptions). This will
normally be very efficient if the wildmat is a simple group
name. If nothing is matched
an empty list is returned, not an error.
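A Python sketch of the filtering rule, assuming each newsgroups line is a group name followed by whitespace and a description (the sample lines are hypothetical). Only the name is tested, so a wildmat that happens to match text in a description selects nothing, and an empty result is returned normally rather than treated as an error.

```python
from fnmatch import fnmatchcase
from typing import List


def wildmat_matches(wildmat: str, name: str) -> bool:
    # Later (rightmost) matching patterns override earlier ones.
    result = False
    for index, pattern in enumerate(wildmat.split(",")):
        negated = index > 0 and pattern.startswith("!")
        if fnmatchcase(name, pattern[1:] if negated else pattern):
            result = not negated
    return result


def list_newsgroups(lines: List[str], wildmat: str) -> List[str]:
    """Return the lines whose group name (first field) matches wildmat."""
    selected = []
    for line in lines:
        name = line.split(None, 1)[0]  # descriptions are never matched
        if wildmat_matches(wildmat, name):
            selected.append(line)
    return selected  # may be empty; that is a valid result, not an error
```

For instance, "*Python*" selects nothing from a group named comp.lang.python whose description mentions Python, since matching is case-sensitive and looks only at the name.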
11.4 NEWNEWS
NEWNEWS wildmat date time [GMT]
The message-ids of all articles added to a set of newsgroups
since the given date-time will be listed. The set consists
of all newsgroups whose name matches the wildmat.
The format of the listing will be one message-id per line, as
though text were being sent. Each message-id MUST appear only
once in a response. The order of the response has no specific
significance and may vary from response to response in the
same session. Date and time are in the same format as the
NEWGROUPS command.
Note that an empty list (i.e., the text body returned by this
command consists only of the terminating period) is a possible
valid response, and indicates that there is currently no new
news.
Clients SHOULD make all queries in Coordinated Universal Time
when possible.
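A server-side sketch of NEWNEWS in Python, assuming a hypothetical in-memory spool that maps each group name to (arrival time, message-id) pairs with timezone-aware UTC times. A cross-posted article appears in several groups, so a set of already-listed message-ids enforces the "only once" requirement; treating the date-time boundary as inclusive is also an assumption of this sketch.

```python
from datetime import datetime
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple


def wildmat_matches(wildmat: str, name: str) -> bool:
    # Later (rightmost) matching patterns override earlier ones.
    result = False
    for index, pattern in enumerate(wildmat.split(",")):
        negated = index > 0 and pattern.startswith("!")
        if fnmatchcase(name, pattern[1:] if negated else pattern):
            result = not negated
    return result


def newnews(
    spool: Dict[str, List[Tuple[datetime, str]]], wildmat: str, since: datetime
) -> List[str]:
    """Return response body lines: unique message-ids, then the final "."."""
    seen = set()
    body = []
    for group, articles in spool.items():
        if not wildmat_matches(wildmat, group):
            continue
        for arrived, message_id in articles:
            # Inclusive boundary (>=) is an assumption of this sketch.
            if arrived >= since and message_id not in seen:
                seen.add(message_id)  # a crosspost is listed only once
                body.append(message_id)
    body.append(".")  # a body of just "." means there is no new news
    return body
```

An article cross-posted to comp.lang.c and comp.lang.python is listed once for "comp.*", and a wildmat matching no groups yields only the terminating period.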
--
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>