ietf-nntp OVER and PAT

Clive D.W. Feather clive at demon.net
Sat Nov 18 08:09:14 PST 2000


Charles Lindsey said:
> I promised to summarise where I thought we had reached when we last
> discussed these things at the end of August.

Thanks for this.

> OVER
> ----
[...]

> However, no one seemed to have read what the draft actually said (and still
> says in 9.5.2.2), which is:
> 
> 	"Any sequence of US-ASCII space or non-printing characters in a
> 	field MUST be replaced by a single US-ASCII space."
> 
> IOW, all whitespace, folding and control characters are collapsed to a
> single SP, which is a nice simple rule to understand, and makes it easy to
> do pattern matching against it (especially when using wildmats in XPAT
> style, where there is no means to specify multiple spaces in the wildmat,
> nor any means to specify arbitrary whitespace, even if \020 is
> introduced).

Yes, I like that.

> So can we first of all agree to accept that rule for OVER, even though
> no one currently implements it, on the grounds that there is no consistent
> interpretation in use anyway?

Fine by me.

> I would however add a NOTE:
> 
> 	NOTE: Neither a non-breaking space character (such as that
> 	represented by the two octets 0xC2A0 in UTF-8) nor any
> 	non-printing character outside of the strict US-ASCII subset of
> 	UTF-8 is to be replaced in this manner.
> 
> Indeed, the text I quoted should actually say "... or US-ASCII non-printing
> characters ...".

I would phrase it in the positive:

    NOTE: the only characters to be replaced in this manner are those with
    ASCII codes 0 to 32 (inclusive) and 127. Characters 128 upwards
    (which are encoded with 2 or more octets in UTF-8) are not replaced,
    even when described as "non-printing".
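
A minimal sketch of that rule in Python, assuming the field has already
been decoded from UTF-8 into a string. The name canonicalize_field is
hypothetical and does not come from the draft:

```python
import re

# Runs of characters with ASCII codes 0 to 32 (inclusive) and 127.
# Characters 128 upwards, such as U+00A0 NO-BREAK SPACE, are not in the class.
_ASCII_SPACE_OR_CONTROL = re.compile(r"[\x00-\x20\x7f]+")


def canonicalize_field(value):
    """Replace each run of ASCII space/control characters with one SP."""
    return _ASCII_SPACE_OR_CONTROL.sub(" ", value)


print(repr(canonicalize_field("foo\r\n\t  bar")))
print(repr(canonicalize_field("foo\u00a0bar")))
```

The first call collapses the folded whitespace to a single space; the
second leaves the non-breaking space untouched.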

> PAT
> ---
> 
> The text in the latest draft is clearly wrong, since the syntax allows
> exactly one wildmat, whereas the text speaks of "one or more wildmats".

I'd noticed that, but was waiting for this argument to resolve.

> I think the consensus we reached was that we should try to implement the
> intention of XPAT. I.e. we would allow several wildmats, and the gap
> between the wildmats would match some form of whitespace/folding, which is
> how XPAT is usually implemented currently.
[...]
> The Bad News is that it does not work well with "comma as indication of
> alternative" for the reasons I have already posted. Please read that
> example carefully.

I did, but I still don't understand it properly.

I think we should do one of two things.

(1) PAT takes one wildmat argument, which is just like everyone else's.

(2) PAT takes multiple wildmat arguments, just like everyone else's,
and each one is tested separately; all must pass.

[In both cases we use \, to escape commas, and \u0020 or something to
escape space.]

Under (2), for example,

    PAT Subject 123- foo*,bar*,baz* *jim,*fred

then matches the six subjects:

    Subject: foo any junk here jim
    Subject: foo any junk here fred
    Subject: bar any junk here jim
    Subject: bar any junk here fred
    Subject: baz any junk here jim
    Subject: baz any junk here fred
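
A rough sketch of option (2) in Python, to make the "each one is tested
separately; all must pass" reading concrete. It assumes Python's fnmatch
is a close enough stand-in for wildmat (it is not identical), it ignores
the \, and space escapes, and the name pat_matches is hypothetical:

```python
from fnmatch import fnmatchcase


def pat_matches(value, wildmats):
    """Return True if every wildmat argument matches the whole value.

    Within one argument, commas separate alternatives and any one of
    them may match. Escaped commas are NOT handled in this sketch.
    """
    return all(
        any(fnmatchcase(value, alternative) for alternative in wildmat.split(","))
        for wildmat in wildmats
    )


arguments = "foo*,bar*,baz* *jim,*fred".split()
print(pat_matches("foo any junk here jim", arguments))
print(pat_matches("foo any junk here", arguments))
```

Here the value is the header content without the "Subject: " name; whether
the name is stripped is a separate question.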

> I grant you that talk in recent days has been going in the opposite
> direction, but I think that sticking with XPAT has much to commend it,
> other things being equal (which they probably are not).

I thought the problems were that nobody really understood XPAT in the hard
cases.

> The way we were describing it back in August was "first canonicalize the
> header, then match against the wildmat(s)". The canonicalization needed to
> be the same as that in OVER, so that implementors could use the overview
> database.

Agreed.

> It was agreed that the matching against the wildmat(s) needed to
> be "anchored at both ends" of the header, because that is standard wildmat
> semantics.

Seems fine to me (though is the header *name* stripped off or not?).

> So if we accept the canonicalization in OVER as collapsing all
> whitespace to single SP (see above), then the semantics of the whole thing
> become clear and even clean (relatively speaking).

Agreed.
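
To illustrate why the order matters, here is a small Python sketch of
"canonicalize, then match anchored at both ends". It again uses fnmatch as
an approximation of wildmat, and matches_canonical is a hypothetical name:

```python
import re
from fnmatch import fnmatchcase


def matches_canonical(raw_value, wildmat):
    """Collapse ASCII whitespace/control runs to one SP, then match.

    fnmatchcase must match the entire string, so the match is anchored
    at both ends.
    """
    canonical = re.sub(r"[\x00-\x20\x7f]+", " ", raw_value)
    return fnmatchcase(canonical, wildmat)


folded = "Re: OVER\r\n\tand   PAT"
print(fnmatchcase(folded, "*OVER?and?PAT"))        # raw, folded value
print(matches_canonical(folded, "*OVER?and?PAT"))  # canonicalized value
```

Against the raw folded value the pattern fails, because "?" stands for
exactly one character; after canonicalization each gap is a single space
and the pattern matches.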

> There should be an explicit statement that implementors need only do PAT
> matching against the headers they choose to include in their Overviews
> (they MAY do more, of course). So we need an extra response code, for
> which I suggest:
> 	521  Sorry, we don't do that header

Also agreed.
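
A sketch of how a server might apply that, in Python. The set of overview
headers and the function name are assumptions for illustration only; the
response text is the one suggested above:

```python
# Assumed example set; a real server would use whatever it actually
# stores in its overview database.
OVERVIEW_HEADERS = {"subject", "from", "date", "message-id", "references"}


def check_pat_header(header, supported=OVERVIEW_HEADERS):
    """Return the 521 response line for an unsupported header, else None."""
    # Header names are compared case-insensitively.
    if header.lower() not in supported:
        return "521 Sorry, we don't do that header"
    return None


print(check_pat_header("Subject"))
print(check_pat_header("X-Face"))
```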

> There was also a suggestion to return an "article number" of zero when using
> PAT with an article specified by <message-id>, rather than returning a
> "412 no newsgroup selected" which is really a bit silly.

That was my suggestion; the use of zero comes from ARTICLE/HEAD/BODY/STAT
in the same situation.

-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:  +44 20 8371 1138
Internet Expert     | Home:  <clive at davros.org>  | Fax:  +44 20 8371 1037
Demon Internet      | WWW: http://www.davros.org | DFax: +44 20 8371 4037
Thus plc            |                            | Mobile: +44 7973 377646 


