ietf-nntp OVER and PAT

Russ Allbery rra at stanford.edu
Tue Nov 21 15:07:40 PST 2000


Clive D W Feather <clive at demon.net> writes:

> I think the concern was that the result isn't either predictable or
> convenient for a client. For example, the following headers:

>     Subject: Re: This is a test header

>     Subject: Re: This is a
>         test header

> should be equivalent,

Why do you feel that those two headers should be equivalent?  DRUMS
explicitly says that they're not.  DRUMS says that the following two
headers are equivalent:

    Subject: Re: This is a test header

    Subject: Re: This is a
     test header

and the headers that you gave are not.  Quoting DRUMS:

| 2.2.3. Long Header Fields
| 
| Each header field is logically a single line of characters comprising 
| the field name, the colon, and the field body. For convenience however, 
| and to deal with the 998/78 character limitations per line, the field 
| body portion of a header field can be split into a multiple line 
| representation; this is called "folding". The general rule is that 
| wherever this standard allows for folding white space (not simply WSP 
| characters), a CRLF may be inserted before any WSP. For example, the 
| header field:
| 
|         Subject: This is a test
| 
| can be represented as:
| 
|         Subject: This
|          is a test
| 
| The process of moving from this folded multiple-line representation of a 
| header field to its single line representation is called "unfolding". 
| Unfolding is accomplished by simply removing any CRLF that is 
| immediately followed by WSP. Each header field should be treated in its 
| unfolded form for further syntactic and semantic evaluation.

News has a tradition of following the lead of mail here, so I think we
should use the same approach.
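
To make the difference concrete, here is a minimal sketch of the DRUMS
unfolding rule quoted above, applied to the example headers.  The function
name unfold and the variable names are mine, not from any draft, and the
sketch assumes the header is held as a string with CRLF line endings.

```python
import re

def unfold(raw: str) -> str:
    """Remove every CRLF that is immediately followed by WSP (space or tab)."""
    # The lookahead keeps the WSP itself; only the CRLF is dropped.
    return re.sub(r"\r\n(?=[ \t])", "", raw)

one_line = "Subject: Re: This is a test header"
folded_one_space = "Subject: Re: This is a\r\n test header"
folded_eight_spaces = "Subject: Re: This is a\r\n        test header"

print(unfold(folded_one_space) == one_line)     # True
print(unfold(folded_eight_spaces) == one_line)  # False
```

The second fold leaves eight spaces between "a" and "test" after
unfolding, which is why it is not the same header as the single-line form.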

*Ideally*, I'd like to use exactly the above definition of unfolding for
overview data, additionally convert any tab to a space, and otherwise make
no changes to the header.  Headers containing a lone NUL, CR, or LF should
really be rejected, but if we want to be complete, we could say that those
three characters are also replaced with a space if present.  I think we
should be able to preserve all the other control characters.
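
As a rough sketch of what that transformation could look like (not a
proposed implementation; the name overview_field is hypothetical, and I'm
assuming the input is one header with its terminating CRLF already
stripped):

```python
import re

def overview_field(raw: str) -> str:
    """Sketch: unfold a header, then make it safe for tab-separated overview data."""
    # Step 1: DRUMS unfolding, drop each CRLF that is directly followed by WSP.
    unfolded = re.sub(r"\r\n(?=[ \t])", "", raw)
    # Step 2: replace tab with space; for completeness, do the same for any
    # remaining NUL, CR, or LF.  All other control characters are preserved.
    return re.sub(r"[\t\x00\r\n]", " ", unfolded)

sample = "Subject: Re:\tThis is a\r\n\ttest\x01header"
print(repr(overview_field(sample)))
# 'Subject: Re: This is a test\x01header'
```

Note that the whitespace from the fold is kept as-is apart from the
tab-to-space conversion, so a fold with several leading spaces still
yields several spaces in the overview data.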

> and I want to write a single wildmat that matches either. Personally I'd
> also like to be able to match:

>     Subject: Re:  This is a test header

> as well.

I think we're getting towards regular expressions here, honestly.  Maybe
PAT is an inherently bad idea and what we really want to do is standardize
XHDR and introduce a separate command that uses regexes.  Bleh.
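
For illustration only, this is the sort of thing a regex-based command
would allow.  The pattern is a hypothetical example using Python's re
syntax, not a proposal for the syntax such a command would use, and the
variants are the three headers above in their unfolded form:

```python
import re

# Hypothetical pattern: one or more whitespace characters between words.
pattern = re.compile(r"Subject: Re:\s+This\s+is\s+a\s+test\s+header")

variants = [
    "Subject: Re: This is a test header",
    "Subject: Re: This is a        test header",  # unfolded eight-space fold
    "Subject: Re:  This is a test header",        # two spaces after "Re:"
]
matches = [bool(pattern.fullmatch(v)) for v in variants]
print(matches)  # [True, True, True]
```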

>> Note that we certainly don't have any mechanism for matching arbitrary
>> whitespace in a header right now, and I've not seen any huge cry for
>> one.

> Should we be prevented from looking ahead ?

I'm just really leery of it, as I'd like to see us kick this out as an RFC
before too much longer.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>


