ietf-nntp OVER and PAT

Russ Allbery rra at stanford.edu
Sat Nov 18 12:17:49 PST 2000


Charles Lindsey <chl at clw.cs.man.ac.uk> writes:

> However, no one seemed to have read what the draft actually said (and still
> says in 9.5.2.2), which is:

> 	"Any sequence of US-ASCII space or non-printing characters in a
> 	field MUST be replaced by a single US-ASCII space."

> IOW, all whitespace, folding and control characters are collapsed to a
> single SP, which is a nice simple rule to understand, and makes it easy
> to do pattern matching against it (especially when using wildmats in
> XPAT style, where there is no means to specify multiple spaces in the
> wildmat, nor any means to specify arbitrary whitespace, even if \020 is
> introduced).

> So can we first of all agree to accept that rule for OVER, even though
> no one currently implements it, on the grounds that there is no
> consistent interpretation in use anyway?
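For concreteness, here is a minimal sketch of what the quoted 9.5.2.2 rule would do to a single field. It is written in Python purely for illustration. It assumes that "non-printing" means the US-ASCII control octets 0x00-0x1F plus DEL (0x7F), and it works on raw octets so that anything at or above 0x80 passes through untouched, as the proposed NOTE further down asks. The name collapse_whitespace is hypothetical.

```python
def collapse_whitespace(field: bytes) -> bytes:
    """Replace each run of US-ASCII space or non-printing octets with one SP.

    Assumption: "non-printing" means the control octets 0x00-0x1F plus
    DEL (0x7F). Octets >= 0x80 (e.g. the UTF-8 no-break space 0xC2 0xA0)
    are copied unchanged.
    """
    out = bytearray()
    in_run = False  # True while inside a run that has already produced one SP
    for b in field:  # iterating over bytes yields ints 0-255
        if b <= 0x20 or b == 0x7F:
            if not in_run:
                out.append(0x20)
                in_run = True
        else:
            out.append(b)
            in_run = False
    return bytes(out)


print(collapse_whitespace(b"Re: OVER\r\n\tand  PAT"))  # b'Re: OVER and PAT'
```

Folding, doubled spaces and control characters all come out as one SP. This is what makes a single-space wildmat such as "OVER and PAT" sufficient to match the field.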

I think that given the charter of the working group, when we're dealing
with existing concepts (rather than something new like PAT that's just
based on XPAT, XHDR, and a few other things), the phrase "even though no
one currently implements it" should raise all sorts of warning bells.

If Andrew says doing this causes problems in practice, I'm inclined to go
with that as a good reason not to do things this way.  It's theoretically
cleaner, but theory and practice have had a long time to diverge.

I think a survey of implementations found that most people were replacing
tabs and newlines with spaces and most of the disagreement was over
whether CRLF was replaced by two spaces or one space, yes?  It seems to me
that we should standardize on the results of that survey, then, and pick
how many spaces CRLF should map to (I vote for one; I think it's
conceptually a single character).
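Here is a sketch of the survey-based alternative, again in Python and purely illustrative. CRLF maps to a configurable number of spaces (one, per the vote above). Each remaining TAB, CR, or LF maps to one space, and runs are not collapsed. TAB has to go in any case, because overview fields are themselves separated by TABs. The name overview_field and the crlf_spaces parameter are hypothetical.

```python
def overview_field(field: bytes, crlf_spaces: int = 1) -> bytes:
    """Map a header body to an overview field, following the surveyed practice.

    Assumptions: CRLF becomes `crlf_spaces` spaces; every remaining TAB, CR,
    or LF becomes one space; runs are NOT collapsed; other octets are untouched.
    """
    field = field.replace(b"\r\n", b" " * crlf_spaces)
    # TAB must not survive: overview fields are themselves TAB-separated.
    return field.translate(bytes.maketrans(b"\t\r\n", b"   "))


print(overview_field(b"foo\r\n\tbar"))                 # b'foo  bar'
print(overview_field(b"foo\r\n\tbar", crlf_spaces=2))  # b'foo   bar'
```

With the default, a folded header ends up with two spaces, one from the CRLF and one from the continuation TAB. Mapping CRLF to two spaces gives three instead, which is the one-versus-two disagreement the survey turned up.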

Note that mapping non-printing characters to space may break character
encodings that switch character sets with ESC sequences, such as ISO-2022-JP.
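Here is a small Python illustration of the ESC problem. It assumes the strict rule in which every run of octets 0x00-0x20 or 0x7F becomes one space. The input is the ISO-2022-JP encoding of two hiragana characters: ESC $ B switches to JIS X 0208 and ESC ( B switches back to ASCII. The function name strict_collapse is hypothetical.

```python
def strict_collapse(field: bytes) -> bytes:
    # Assumed rule: each run of octets 0x00-0x20 or 0x7F becomes a single SP.
    out = bytearray()
    in_run = False
    for b in field:
        if b <= 0x20 or b == 0x7F:
            if not in_run:
                out.append(0x20)
            in_run = True
        else:
            out.append(b)
            in_run = False
    return bytes(out)


# ISO-2022-JP for U+3053 U+3093: ESC $ B ... ESC ( B around the JIS octets.
original = b"\x1b$B$3$s\x1b(B"
damaged = strict_collapse(original)
print(damaged)  # b' $B$3$s (B'
```

Once both ESC octets have become spaces, a reader no longer sees Japanese text. The field is left as the plain ASCII string " $B$3$s (B".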

> I would however add a NOTE:

> 	NOTE: Neither a non-breaking space character (such as that
> 	represented by the two octets 0xC2 0xA0 in UTF-8) nor any
> 	non-printing character outside of the strict US-ASCII subset of
> 	UTF-8 is to be replaced in this manner.

We want this regardless, yup.

> I think the consensus we reached was that we should try to implement the
> intention of XPAT. I.e. we would allow several wildmats, and the gap
> between the wildmats would match some form of whitespace/folding, which
> is how XPAT is usually implemented currently.
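Here is a rough Python sketch of the multi-wildmat idea as described here, for illustration only. Each pattern is translated to a regular expression, and consecutive patterns are joined by a gap that matches one or more spaces, TABs, CRs, or LFs. It is not a description of how INN or any other server actually implements XPAT. The names wildmat_to_regex and xpat_match are hypothetical. The translator handles only '*', '?', and backslash escapes, so bracketed sets are treated literally.

```python
import re
from typing import Sequence

# Assumed meaning of the "gap" between patterns: any whitespace or folding.
GAP = r"[ \t\r\n]+"


def wildmat_to_regex(pat: str) -> str:
    """Translate a *simplified* wildmat into a regex fragment.

    Only '*', '?', and backslash escapes are special; every other character,
    including '[', is matched literally (a deliberate simplification).
    """
    out = []
    i = 0
    while i < len(pat):
        c = pat[i]
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "\\" and i + 1 < len(pat):
            i += 1
            out.append(re.escape(pat[i]))  # escaped char is literal
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def xpat_match(field: str, patterns: Sequence[str]) -> bool:
    """Hypothetical multi-pattern match: patterns separated by a GAP."""
    regex = GAP.join(wildmat_to_regex(p) for p in patterns)
    # DOTALL lets '*' and '?' also match CR/LF inside a folded field.
    return re.fullmatch(regex, field, re.DOTALL) is not None


print(xpat_match("ietf-nntp OVER and PAT", ["*OVER", "and", "P?T"]))
print(xpat_match("ietf-nntp OVER and\r\n\tPAT", ["*OVER", "and", "P?T"]))
```

Both calls print True, because the gap absorbs the folding between "and" and "PAT". The ugliness the reply objects to shows up here too: the caller can express "some whitespace" only by splitting the pattern, never inside a single wildmat.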

Well, my memory may be hazy, but that's definitely not a consensus that I
agree with.  I think that XPAT as currently implemented in INN is just way
too ugly to standardize on.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>


