[NNTP] Internationalisation, attempt 2
Matthias Andree
matthias.andree at gmx.de
Tue May 3 09:31:22 PDT 2005
"Clive D.W. Feather" <clive at demon.net> writes:
> Okay, here's the second attempt.
>
>
> 10. Internationalisation Considerations
>
> 10.1 Introduction and historical situation
>
> RFC 977 [RFC977] was written at a time when internationalisation was
> not seen as a significant issue.
That's a pretty friendly way of wording US-American i18n ignorance. Never mind.
> As such, it was written on the
> assumption that all communication would be in ASCII and use only a
> 7-bit transport layer, although in practice all known implementations
> are 8-bit clean.
I suggest replacing "all known implementations" with "most known
implementations" -- unless you can prove that you knew all of them at
the time of writing.
> Since then, Usenet and NNTP have spread throughout the world. In the
> absence of standards for handling the issues of language and
> character sets, countries, newsgroup hierarchies, and individuals
> have found a variety of solutions that work for them but are not
> necessarily appropriate elsewhere. For example, some have adopted a
> default 8-bit character set appropriate to their needs (such as
> ISO8859-1 in Western Europe or KOI-8 in Russia), others have used
This should be spaced "ISO 8859-1" or, even better, use the official IANA
spelling "ISO-8859-1". I haven't checked on KOI-8.
> ASCII (either US-ASCII or national variants) in headers but local 16-
> bit character sets in article bodies, and still others have gone for
> a combination of MIME [RFC2045] and UTF-8. With the increased use of
> MIME in email, it is becoming more common to find NNTP articles
> containing MIME headers identifying the character set of the body,
> but this is far from universal.
>
> The resulting confusion does not help interoperability.
>
> One point that has been generally accepted is that articles can
> contain octets with the top bit set, and NNTP is only expected to
> operate on 8-bit clean transport paths.
>
> 10.2 This specification
>
> Part of the role of this present specification is to eliminate this
> confusion and promote interoperability as far as possible. At the
> same time, it is necessary to accept the existence of the present
> situation and not gratuitously break existing implementations and
> arrangements, even if they are less than optimal. Therefore the
> current practice described above has been taken into consideration in
> producing this specification.
>
> This specification extends NNTP from US-ASCII [ANSI1986] to UTF-8
> [RFC3629]. Except in the two areas discussed below, UTF-8 (which is
> a superset of ASCII)
This should probably say "US-ASCII", not just "ASCII".
> is mandatory and implementations MUST NOT use
> any other encoding.
>
> Firstly, the use of MIME for article headers and bodies is strongly
> recommended. However, given widely divergent existing practices, an
> attempt to require a particular encoding and tagging standard would
> be premature at this time. Accordingly, this specification allows
> the use of arbitrary 8-bit data in articles subject to the following
> requirements and recommendations.
>
> o The names of headers (e.g. "From" or "Subject") MUST be in US-
> ASCII.
>
> o Header values SHOULD use US-ASCII or an encoding based on it such
> as RFC 2047 [RFC2047] until such time as another approach has been
> standardised. 8-bit encodings (including UTF-8) MAY be used but
> are likely to cause interoperability problems.
I don't think this is strong enough. 8-bit encoded Subjects are almost
guaranteed to cause trouble, so I think the paragraph should only allow
UTF-8 and RFC 2047. Suggestion:
" Header values SHOULD use US-ASCII or an encoding based on it such
as RFC 2047 [RFC2047] until such time as another approach has been
standardised. 8-bit encodings other than UTF-8 SHOULD NOT be used,
because they are likely to cause interoperability problems."
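
To make the three cases in that wording concrete (US-ASCII including
RFC 2047 encoded-words, UTF-8, and other 8-bit encodings), here is a rough
Python sketch. The function names are made up for illustration and are not
part of any draft or library; the UTF-8 check is only a heuristic, since a
few legacy 8-bit sequences happen to be valid UTF-8 as well.

```python
from email.header import Header, decode_header, make_header


def classify_header_value(raw: bytes) -> str:
    """Return 'us-ascii', 'utf-8' or 'other-8bit' for raw header octets."""
    if raw.isascii():
        # Covers plain text and RFC 2047 encoded-words alike.
        return "us-ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Octets with the top bit set that are not well-formed UTF-8,
        # e.g. ISO-8859-1 or KOI8-R sent without any tagging.
        return "other-8bit"
    return "utf-8"


def to_rfc2047(text: str) -> str:
    """Encode a header value as RFC 2047 encoded-words (pure ASCII)."""
    return Header(text, charset="utf-8").encode()


def from_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded-words back into a Unicode string."""
    return str(make_header(decode_header(value)))
```

A server or reader following the suggested wording would accept the first
two classes and could warn about, or refuse to generate, the third.
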
> o The character set of article bodies SHOULD be indicated in the
> article headers, and this SHOULD be done in accordance with
> MIME.
Please consider "MUST" for the second "SHOULD". I see no sane solution
other than MIME at this time, and MIME is the most widespread.
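
For illustration, this is roughly what "indicated in accordance with MIME"
amounts to from a reader's point of view. It is a minimal sketch using
Python's standard email package; the function name and the sample articles
are invented for the example.

```python
from email import message_from_bytes


def declared_body_charset(article: bytes):
    """Return the charset named in Content-Type, or None if untagged."""
    msg = message_from_bytes(article)
    # get_content_charset() returns the lower-cased charset parameter,
    # or None when the article carries no such parameter.
    return msg.get_content_charset()


# Hypothetical article that tags its 8-bit body via MIME headers.
TAGGED = (
    b"From: poster@example.invalid\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Gr\xfc\xdfe\r\n"
)

# The same body without any tagging: the reader has to guess.
UNTAGGED = (
    b"From: poster@example.invalid\r\n"
    b"Subject: test\r\n"
    b"\r\n"
    b"Gr\xfc\xdfe\r\n"
)
```

With the tagged article the reader knows how to decode the body; with the
untagged one it can only fall back on a local default, which is exactly the
interoperability problem.
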
> Secondly, the following requirements are placed on the newsgroups
> list returned by the LIST NEWSGROUPS (Section 7.6.6) command:
>
> o Although this specification allows UTF-8 for newsgroup names, they
> SHOULD be restricted to US-ASCII until a successor to RFC 1036
> [RFC1036] standardises another approach. 8-bit encodings MAY be
> used but are likely to cause interoperability problems.
>
> o The newsgroup description SHOULD be in US-ASCII or UTF-8 unless
> and until a successor to RFC 1036 standardises other encoding
> arrangements. 8-bit encodings other than UTF-8 MAY be used but are
> likely to cause interoperability problems.
Again, these paragraphs ought to read "8-bit encodings other than UTF-8
SHOULD NOT be used, because they are ...(interoperability)..."
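
As a rough sketch of what checking a LIST NEWSGROUPS line against these
two recommendations could look like (US-ASCII names, US-ASCII or UTF-8
descriptions): the function name and the warning strings are hypothetical,
and the line is assumed to be a group name followed by whitespace and a
description.

```python
def check_newsgroups_line(line: bytes) -> list:
    """Return a list of warnings for one LIST NEWSGROUPS response line."""
    warnings = []
    # Split on the first run of ASCII whitespace: name, then description.
    parts = line.rstrip(b"\r\n").split(None, 1)
    if not parts:
        return ["empty line"]
    name = parts[0]
    description = parts[1] if len(parts) > 1 else b""

    if not name.isascii():
        warnings.append("name is not US-ASCII")
    try:
        # US-ASCII is a subset of UTF-8, so one check covers both.
        description.decode("utf-8")
    except UnicodeDecodeError:
        warnings.append("description is not valid UTF-8")
    return warnings
```

An empty result means the line stays within the recommendations; anything
else is where the interoperability problems start.
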
--
Matthias Andree