ietf-nntp Draft summary of IETF 48 meeting
Russ Allbery
rra at stanford.edu
Thu Aug 17 15:34:47 PDT 2000
Andrew Gierth <andrew at erlenstar.demon.co.uk> writes:
> Are there any cases where non-printing controls (other than CR+LF, NUL
> or TAB) are legitimately used in headers?
Much to my surprise, RFC 1036 appears to allow them by way of RFC 822:
   3.1.2. STRUCTURE OF HEADER FIELDS

        Once a field has been unfolded, it may be viewed as being
        composed of a field-name followed by a colon (":"), followed by a
        field-body, and terminated by a carriage-return/line-feed.
        The field-name must be composed of printable ASCII characters
        (i.e., characters that have values between 33. and 126.,
        decimal, except colon). The field-body may be composed of any
        ASCII characters, except CR or LF. (While CR and/or LF may be
        present in the actual text, they are removed by the action of
        unfolding the field.)
It explicitly says "any ASCII characters", not "printable ASCII characters"
as it does just above when describing the header names. Obviously a NUL
would break things in practice, but apart from that it sounds like control
characters are at least theoretically allowed. (In fact, the RFC 822
grammar allows NUL as well, along with bare CR and bare LF.)
   optional-field =
               /  "Message-ID"        ":"   msg-id
               /  "Resent-Message-ID" ":"   msg-id
               /  "In-Reply-To"       ":"  *(phrase / msg-id)
               /  "References"        ":"  *(phrase / msg-id)
               /  "Keywords"          ":"  #phrase
               /  "Subject"           ":"  *text
               /  "Comments"          ":"  *text
               /  "Encrypted"         ":" 1#2word
               /  extension-field              ; To be defined
               /  user-defined-field           ; May be pre-empted

   text        =  <any CHAR, including bare    ; => atoms, specials,
                   CR & bare LF, but NOT       ;  comments and
                   including CRLF>             ;  quoted-strings are
                                               ;  NOT recognized.

                                               ; (  Octal, Decimal.)
   CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
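To make the RFC 822 rules concrete, here is a minimal Python sketch of the section 3.1.2 checks for a header line that has already been unfolded. The function names are hypothetical, and the sketch follows the 3.1.2 prose (which rejects CR and LF in the unfolded body) rather than the looser "text" production. Note that NUL, ESC, and BS all pass.

```python
# A sketch of RFC 822 section 3.1.2 for one header line that has already
# been unfolded. Function names are hypothetical, for illustration only.


def rfc822_field_name_ok(name: bytes) -> bool:
    """Field-name: one or more printable ASCII octets (33-126), no colon."""
    return len(name) > 0 and all(33 <= b <= 126 and b != ord(":") for b in name)


def rfc822_field_body_ok(body: bytes) -> bool:
    """Field-body: any ASCII octet (0-127) except LF (10) and CR (13).

    NUL, ESC, BS and the other control characters are all accepted.
    """
    return all(b <= 127 and b not in (10, 13) for b in body)
```

For example, rfc822_field_body_ok(b"a\x00b") is true, while a body containing a bare b"\r" is rejected.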
DRUMS allows much the same except for forbidding NUL:
   2.2. Header Fields

   Header fields are lines composed of a field name, followed by a colon
   (":"), followed by a field body, and terminated by CRLF. A field name
   MUST be composed of printable US-ASCII characters (i.e., characters that
   have values between 33 and 126, inclusive), except colon. A field body
   may be composed of any US-ASCII characters, except for CR and LF.
   However, a field body may contain CRLF when used in header "folding" and
   "unfolding" as described in section 2.2.3. All field bodies MUST conform
   to the syntax described in sections 3 and 4 of this standard.
and
   NO-WS-CTL       =       %d1-8 /         ; US-ASCII control characters
                           %d11 /          ;  that do not include the
                           %d12 /          ;  carriage return, line feed,
                           %d14-31 /       ;  and white space characters
                           %d127

   FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
                           obs-FWS

   utext           =       NO-WS-CTL /     ; Non white space controls
                           %d33-126 /      ; The rest of US-ASCII
                           obs-utext

   unstructured    =       *([FWS] utext) [FWS]
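For comparison, a Python sketch of the same kind of check against the DRUMS grammar quoted above, again with hypothetical names. It assumes the field body has already been unfolded (so FWS has been reduced to runs of SP and HTAB) and it ignores the obs-utext and obs-FWS alternatives, whose definitions aren't quoted here. Under those assumptions the allowed octets work out to 1-127 except CR and LF, i.e. the RFC 822 set minus NUL.

```python
# Octets allowed by the non-obsolete DRUMS "unstructured" production once
# folding (the CRLF inside FWS) has been removed. obs-utext and obs-FWS are
# deliberately ignored in this sketch.

WSP = {9, 32}  # HTAB and SP: the white space that survives unfolding of FWS


def is_no_ws_ctl(b: int) -> bool:
    """NO-WS-CTL = %d1-8 / %d11 / %d12 / %d14-31 / %d127"""
    return 1 <= b <= 8 or b in (11, 12) or 14 <= b <= 31 or b == 127


def is_utext(b: int) -> bool:
    """utext without its obs-utext alternative."""
    return is_no_ws_ctl(b) or 33 <= b <= 126


def drums_unstructured_ok(unfolded: bytes) -> bool:
    """True if every octet is utext or WSP (so NUL, bare CR, bare LF fail)."""
    return all(is_utext(b) or b in WSP for b in unfolded)
```

Since any run of WSP can stand for one FWS, checking octet membership is enough here; no real parsing is needed for the unfolded form.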
The only non-printing control that I've seen in practice in headers, apart
from CR, LF, and TAB, is BS, and that generally by mistake. However, given
the above, it wouldn't surprise me if at least some people were using this
ability for one thing or another. Isn't there a Japanese text encoding that
uses ESC to introduce its character-set switching sequences?
I don't see any obvious reasons not to just include control characters
other than NUL, TAB, CR, and LF in the overview data verbatim.
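A Python sketch of that approach when building an overview line. The message doesn't say what should happen to NUL, TAB, CR, and LF themselves; this assumes, hypothetically, that each is replaced by a single space (TAB separates overview fields and CRLF ends the line, so those can't pass through as-is). Every other octet, including ESC and BS, is copied verbatim. The function names are made up for illustration, and the caller is assumed to supply already-unfolded field values in the order the overview format expects.

```python
from typing import Sequence

# Octets this sketch does not pass through verbatim: NUL, TAB (the overview
# field separator), LF and CR (the line terminator). Mapping them to SP is
# an assumption, not something the message specifies.
_SPECIAL = {0, 9, 10, 13}


def overview_field(unfolded_value: bytes) -> bytes:
    """Replace NUL/TAB/CR/LF with SP; copy every other octet unchanged."""
    return bytes(32 if b in _SPECIAL else b for b in unfolded_value)


def overview_line(article_number: int, values: Sequence[bytes]) -> bytes:
    """Article number, then each cleaned value, TAB-separated, ending in CRLF."""
    fields = [str(article_number).encode("ascii")]
    fields.extend(overview_field(v) for v in values)
    return b"\t".join(fields) + b"\r\n"
```

For instance, a Subject containing an ISO-2022-JP escape sequence keeps its ESC bytes intact in the overview line, while an embedded TAB becomes a space.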
--
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>