ietf-nntp wildmat routines and text

Clive D.W. Feather clive at demon.net
Thu Jul 27 15:32:51 PDT 2000


Russ Allbery said:
> Below is the documentation that I wrote for INN on how wildmat patterns
> work, with all the references to @ removed.  This may be suitable for the
> standard, although it could probably use some pruning before put into an
> RFC since it's intended to be wordy and clear right now.

I've taken this, plus the other comments made on wildmats, and attempted to
write a new section 5. This is it. People may find it rather formal, but I
felt that that was better than leaving ambiguities in. I've included a
*lot* more examples.

I added \u at the outer level, but not special meanings for \ inside
character classes. The wording can easily be adjusted if we want to
remove the first or add the second.


  5. The WILDMAT format

! The WILDMAT format described here is based on the version
! first developed by Rich Salz [5], which was derived from the format
  used in the UNIX "find" command to articulate file names. It
  was developed to provide a uniform mechanism for matching
  patterns in the same manner that the UNIX shell matches
! filenames.

  5.1 Wildmat structure

! A wildmat pattern consists of one or more component patterns. If
! there is more than one, they are separated by commas. Each component
! pattern can optionally be prefixed with an exclamation mark (which is
! not part of the component pattern). A string is tested against a
! wildmat pattern as follows:
! * test the string against each component and note which match;
! * if none match, the string does not match the wildmat;
! * if the rightmost component that matches is prefixed with an exclamation
!   mark, the string does not match the wildmat;
! * otherwise the string matches the wildmat.
!
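The "rightmost matching component wins" rule above can be sketched in
a few lines of Python. This is only an illustration, not part of the
proposed text. The names split_components and wildmat_matches are made
up for this sketch, and fnmatchcase from the Python standard library is
used purely as a stand-in component matcher: it agrees with wildmat for
simple patterns such as "a*" but differs in its handling of backslash
and of negated sets. Commas inside set specifiers are not considered.

```python
from fnmatch import fnmatchcase


def split_components(wildmat):
    """Split a wildmat on commas that are not escaped by a backslash."""
    parts, current, i = [], "", 0
    while i < len(wildmat):
        ch = wildmat[i]
        if ch == "\\" and i + 1 < len(wildmat):
            current += wildmat[i:i + 2]  # keep the escape pair intact
            i += 2
        elif ch == ",":
            parts.append(current)
            current = ""
            i += 1
        else:
            current += ch
            i += 1
    parts.append(current)
    return parts


def wildmat_matches(wildmat, string, match_component=fnmatchcase):
    """Apply the rule: the rightmost matching component decides."""
    result = False  # if no component matches, the string does not match
    for part in split_components(wildmat):
        negated = part.startswith("!")
        component = part[1:] if negated else part
        if match_component(string, component):
            result = not negated  # a later match overrides earlier ones
    return result


print(wildmat_matches("a*,!*b", "ac"))     # True
print(wildmat_matches("a*,!*b", "ab"))     # False
print(wildmat_matches("a*,!*b,c*", "cb"))  # True
```

Scanning left to right and overwriting the result is equivalent to
finding the rightmost matching component and checking its prefix.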
! A component pattern consists of one or more units (there is no separator
! between the units). A unit is one of the following:
! [1] any ASCII character in the range %x22 to %x7E except for %x2A, %x2C,
!     %x3F, %x5B, and %x5C (thus the excluded characters are control codes,
!     space, exclamation, asterisk, comma, question mark, open square
!     bracket, backslash, and delete);
! [2] any multi-octet UTF-8 character;
! [3] backslash, "u", and then four hexadecimal digits;
! [4] backslash, "U", and then eight hexadecimal digits;
! [5] asterisk;
! [6] question mark;
! [7] backslash followed by any non-alphanumeric ASCII character in the
!     range %x21 to %x7E;
! [8] a set specifier.
!
! A string is matched against a component pattern by matching each
! character in the string against a corresponding unit in the pattern.
! (Apart from asterisk, each unit matches exactly one character;
! asterisk matches any number of characters including zero.) The
! pattern is "anchored"; that is, the first and last characters in the
! string must match the first and last unit respectively (unless that
! unit is an asterisk matching zero characters). The various units
! match characters as follows:
! [1] and [2] match precisely that character.
! [3] and [4] match the character that has the ISO 10646 code given by
!     the hexadecimal number (so "\u00a3" matches the pound sterling
!     character, which is represented as the two octets %xC2 %xA3 in UTF-8).
! [5] matches any number of characters, including zero.
! [6] matches any one character.
! [7] matches the ASCII character following the backslash (this may itself
!     be a backslash).
! [8] matches any of the characters in the set.
!
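As a rough illustration of how units [1] to [7] might be matched, here
is a Python sketch. It is not a reference implementation: the names
parse_units and match_component are invented for this example, set
specifiers (unit [8]) are deliberately left out, and literal characters
are not validated against the exact ranges given for unit [1]. The
asterisk is handled by simple backtracking, which is easy to read but
can be slow on patterns with many asterisks.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_units(pattern):
    """Turn a component pattern into a list of (kind, value) units."""
    units, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            units.append(("star", None))
            i += 1
        elif ch == "?":
            units.append(("any", None))
            i += 1
        elif ch == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("dangling backslash")
            nxt = pattern[i + 1]
            if nxt in "uU":
                # units [3] and [4]: four or eight hexadecimal digits
                width = 4 if nxt == "u" else 8
                digits = pattern[i + 2:i + 2 + width]
                if len(digits) != width or not set(digits) <= HEX_DIGITS:
                    raise ValueError("bad \\u or \\U escape")
                units.append(("lit", chr(int(digits, 16))))
                i += 2 + width
            elif "!" <= nxt <= "~" and not nxt.isalnum():
                # unit [7]: backslash plus a non-alphanumeric ASCII character
                units.append(("lit", nxt))
                i += 2
            else:
                raise ValueError("unsupported escape")
        elif ch == "[":
            raise ValueError("set specifiers are not handled in this sketch")
        else:
            # units [1] and [2]: the character stands for itself
            units.append(("lit", ch))
            i += 1
    return units


def match_component(text, pattern):
    """Anchored match of text against one component pattern."""
    units = parse_units(pattern)

    def match_from(u, t):
        if u == len(units):
            return t == len(text)  # anchored: all of text must be used
        kind, value = units[u]
        if kind == "star":
            # try every length for the asterisk, including zero
            return any(match_from(u + 1, k) for k in range(t, len(text) + 1))
        if t == len(text):
            return False
        if kind == "any" or text[t] == value:
            return match_from(u + 1, t + 1)
        return False

    return match_from(0, 0)


print(match_component("abc", "a\\u0062c"))  # True
print(match_component("a*", "a\\*"))        # True
print(match_component("ab", "a\\*"))        # False
print(match_component("cat", "*a?"))        # True
```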
! A set specifier consists of:
! * open square bracket ([)
! * an optional caret (^)
! * one or more set values, which are either:
!   - an individual character (which may be multioctet)
!   - a range specifier, given by two characters (which may be multioctet)
!     separated with a minus (-)
! * close square bracket (])
!
! A caret, minus, or close square bracket is always taken to have its
! special meaning where possible. Thus a close square bracket can only
! be a set value if it is the first one; a minus can only be a set value
! if it is the first or the last one, or if it is the second character
! of a range specifier; and a caret cannot be the first set value unless
! the optional caret is also present.
!
! If the set specifier includes the optional caret, the set consists
! of all the characters that would not be in the set if the caret were
! omitted.
!
! If the set specifier does not include the optional caret, then the
! set consists of:
! - all the individual characters;
! - for each range, all the characters whose codes are greater than or
!   equal to that of the first character and less than or equal to that
!   of the second character.
! In character ranges, the codes used are those of ISO 10646, no matter
! what the local character set is. If the first character has a higher
! code than the second, the meaning is undefined.
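
One possible reading of the set specifier rules, sketched in Python for
illustration only. The names parse_set and in_set are invented here.
Two choices in the sketch are assumptions rather than anything the text
requires: a reversed range (whose meaning is undefined) is rejected
with an error, and a close square bracket immediately after the
optional caret is treated as the first set value. Python's ord() gives
the Unicode code point, which is used as the ISO 10646 code.

```python
def parse_set(pattern, start):
    """Parse the set specifier whose '[' is at pattern[start].

    Returns (negated, singles, ranges, end), where end is the index of
    the character just after the closing ']'.
    """
    if pattern[start] != "[":
        raise ValueError("not a set specifier")
    i = start + 1
    negated = i < len(pattern) and pattern[i] == "^"
    if negated:
        i += 1
    singles, ranges = set(), []
    first = True
    while i < len(pattern):
        ch = pattern[i]
        if ch == "]" and not first:
            return negated, singles, ranges, i + 1
        first = False
        # a minus is a range operator only when a character follows it
        # and that character is not the closing bracket
        if (i + 2 < len(pattern) and pattern[i + 1] == "-"
                and pattern[i + 2] != "]"):
            lo, hi = ch, pattern[i + 2]
            if ord(lo) > ord(hi):
                raise ValueError("reversed range (meaning is undefined)")
            ranges.append((lo, hi))
            i += 3
        else:
            singles.add(ch)
            i += 1
    raise ValueError("unterminated set specifier")


def in_set(ch, negated, singles, ranges):
    """Test one character against a parsed set."""
    member = ch in singles or any(
        ord(lo) <= ord(ch) <= ord(hi) for lo, hi in ranges)
    return member != negated  # the caret inverts membership


negated, singles, ranges, end = parse_set("[a-c-]", 0)
print(in_set("b", negated, singles, ranges))  # True
print(in_set("-", negated, singles, ranges))  # True
print(in_set("d", negated, singles, ranges))  # False
print(parse_set("[ab]c]", 0)[3])              # 4
```

In the last call the set ends at the first close square bracket that
is not the first set value, so the remaining "c]" is matched as two
ordinary characters.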

  Implementers must be careful to apply the pattern-matching process
  to whole characters encoded in UTF-8, and not to individual octets.
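
To make the difference between characters and octets concrete, here is
a small Python sketch. The helper name characters is hypothetical. It
decodes the octets first, so that a unit such as "?" is compared
against one whole character and not against one octet of a multi-octet
sequence.

```python
def characters(octets):
    """Decode UTF-8 octets into a list of whole characters."""
    # strict decoding: malformed UTF-8 raises UnicodeDecodeError
    return list(octets.decode("utf-8"))


pound = b"\xc2\xa3"            # U+00A3 encoded in UTF-8
print(len(pound))              # 2 (octets)
print(len(characters(pound)))  # 1 (character)
```

A matcher working on the raw octets would need two "?" units to match
the pound sterling character; one working on decoded characters needs
only one.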

  5.2  Examples

! Wildmat    Description of strings that match
!
! abc        the one string "abc"
! abc,def    the two strings "abc" and "def"
! a*         any string that begins with "a"
! a*b        any string that begins with "a" and ends with "b"
! a*,*b      any string that begins with "a" or ends with "b"
! a*,!*b     any string that begins with "a" and does not end with "b"
! a*,!*b,c*  any string that begins with "a" and does not end with "b",
!            or any string that begins with "c"
! a*,c*,!*b  any string that begins with "a" or "c" and does not end
!            with "b"
! a\u0062c   the one string "abc"
! a\u002a    the one string "a*"
! a\*        the one string "a*"
! abc\,def   the one string "abc,def"
! ?a*        any string with "a" as its second character
! ??a*       any string with "a" as its third character
! *a?        any string with "a" as its penultimate character
! *a??       any string with "a" as its antepenultimate character
! [abc]      the three strings "a", "b", and "c"
! [^abc]     any one-character string except the three "a", "b", and "c"
! [a-zA-Z]   any one-character string consisting of an ASCII letter
! [0-9]*     any string beginning with an ASCII digit
! [a^bc]     the four strings "a", "^", "b", and "c"
! [a-c-]     the four strings "a", "b", "c", and "-"
! []abc]     the four strings "]", "a", "b", and "c"
! [ab]c]     the two strings "ac]" and "bc]"
! [a\]c]     the two strings "ac]" and "\c]"

-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:  +44 20 8371 1138
Internet Expert     | Home:  <clive at davros.org>  | Fax:  +44 20 8371 1037
Demon Internet      | WWW: http://www.davros.org | DFax: +44 20 8371 4037
Thus plc            |                            | Mobile: +44 7973 377646 


