[NNTP] Internationalisation, attempt 2

Matthias Andree matthias.andree at gmx.de
Tue May 3 09:31:22 PDT 2005


"Clive D.W. Feather" <clive at demon.net> writes:

> Okay, here's the second attempt.
>
>
> 10.  Internationalisation Considerations
>
> 10.1  Introduction and historical situation
>
>    RFC 977 [RFC977] was written at a time when internationalisation was
>    not seen as a significant issue.

That's a pretty friendly way of describing US-American i18n ignorance. Never mind.

>    As such, it was written on the
>    assumption that all communication would be in ASCII and use only a
>    7-bit transport layer, although in practice all known implementations
>    are 8-bit clean.

I suggest replacing "all known implementations" with "most known
implementations", unless you can prove that you knew all of them at the
time of writing.

>    Since then, Usenet and NNTP have spread throughout the world.  In the
>    absence of standards for handling the issues of language and
>    character sets, countries, newsgroup hierarchies, and individuals
>    have found a variety of solutions that work for them but are not
>    necessarily appropriate elsewhere.  For example, some have adopted a
>    default 8-bit character set appropriate to their needs (such as
>    ISO8859-1 in Western Europe or KOI-8 in Russia), others have used

This should be written with a space, "ISO 8859-1", or, even better, use
the IANA preferred MIME name "ISO-8859-1". I haven't checked on KOI-8.

>    ASCII (either US-ASCII or national variants) in headers but local 16-
>    bit character sets in article bodies, and still others have gone for
>    a combination of MIME [RFC2045] and UTF-8.  With the increased use of
>    MIME in email, it is becoming more common to find NNTP articles
>    containing MIME headers identifying the character set of the body,
>    but this is far from universal.
>
>    The resulting confusion does not help interoperability.
>
>    One point that has been generally accepted is that articles can
>    contain octets with the top bit set, and NNTP is only expected to
>    operate on 8-bit clean transport paths.
>
> 10.2  This specification
>
>    Part of the role of this present specification is to eliminate this
>    confusion and promote interoperability as far as possible.  At the
>    same time, it is necessary to accept the existence of the present
>    situation and not gratuitously break existing implementations and
>    arrangements, even if they are less than optimal.  Therefore the
>    current practice described above has been taken into consideration in
>    producing this specification.
>
>    This specification extends NNTP from US-ASCII [ANSI1986] to UTF-8
>    [RFC3629].  Except in the two areas discussed below, UTF-8 (which is
>    a superset of ASCII)

This should probably be "US-ASCII", not just "ASCII".

> is mandatory and implementations MUST NOT use
>    any other encoding.
>
>    Firstly, the use of MIME for article headers and bodies is strongly
>    recommended.  However, given widely divergent existing practices, an
>    attempt to require a particular encoding and tagging standard would
>    be premature at this time.  Accordingly, this specification allows
>    the use of arbitrary 8-bit data in articles subject to the following
>    requirements and recommendations.
>
>    o  The names of headers (e.g.  "From" or "Subject") MUST be in US-
>       ASCII.
>
>    o  Header values SHOULD use US-ASCII or an encoding based on it such
>       as RFC 2047 [RFC2047] until such time as another approach has been
>       standardised. 8-bit encodings (including UTF-8) MAY be used but
>       are likely to cause interoperability problems.

I don't think this is strong enough. 8-bit encoded Subject headers are
almost guaranteed to cause trouble, so I think the paragraph should allow
only UTF-8 and RFC 2047. Suggestion:

" Header values SHOULD use US-ASCII or an encoding based on it such
  as RFC 2047 [RFC2047] until such time as another approach has been
  standardised. 8-bit encodings other than UTF-8 SHOULD NOT be used,
  because they are likely to cause interoperability problems."
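
As a rough illustration of the "US-ASCII or RFC 2047" rule for header
values, here is a minimal Python sketch using the standard library's
email.header module. The function names are hypothetical, and choosing
UTF-8 as the charset inside the encoded-word is an assumption made for
the example, not something the draft requires.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(value: str) -> str:
    """Return value unchanged if it is pure US-ASCII,
    otherwise wrap it in RFC 2047 encoded-words (charset UTF-8)."""
    try:
        value.encode("ascii")
        return value
    except UnicodeEncodeError:
        # Header.encode() produces an ASCII-only "=?utf-8?...?=" form.
        return Header(value, charset="utf-8").encode()


def decode_header_value(encoded: str) -> str:
    """Reverse of encode_header_value: decode any RFC 2047 encoded-words."""
    return str(make_header(decode_header(encoded)))


subject = encode_header_value("Grüße aus Köln")
print(subject)                       # ASCII-only on the wire
print(decode_header_value(subject))  # original text for display
```

The encoded form contains only US-ASCII octets, so it survives paths
and readers that mishandle raw 8-bit header values.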

>    o  The character set of article bodies SHOULD be indicated in the
>       article headers, and this SHOULD be done in accordance with
>       MIME.

Please consider "MUST" for the second "SHOULD". I see no sane solution
other than MIME at this time, and MIME is the most widespread.
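
To show what "indicated in accordance with MIME" buys a reader, here is
a minimal Python sketch that looks up the declared body charset with
the standard email package. The function names and the two sample
articles are made up for illustration; returning None for an untagged
article is a choice of this sketch, standing in for the guesswork a
real reader would otherwise have to do.

```python
from email import message_from_bytes

# Hypothetical sample articles: one tagged with MIME headers, one not.
TAGGED = (
    b"From: poster@example.invalid\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"\r\n"
    b"Gr\xfc\xdfe\r\n"
)
UNTAGGED = (
    b"From: poster@example.invalid\r\n"
    b"Subject: test\r\n"
    b"\r\n"
    b"Gr\xfc\xdfe\r\n"
)


def declared_body_charset(raw_article: bytes):
    """Return the charset named in Content-Type (lowercased), or None."""
    return message_from_bytes(raw_article).get_content_charset()


def decode_body(raw_article: bytes):
    """Decode the body using the declared charset; None if undeclared.

    An unknown charset name would raise LookupError; not handled here.
    """
    msg = message_from_bytes(raw_article)
    charset = msg.get_content_charset()
    if charset is None:
        return None  # no MIME tagging: the charset can only be guessed
    return msg.get_payload(decode=True).decode(charset)


print(declared_body_charset(TAGGED))
print(declared_body_charset(UNTAGGED))
```

Both sample bodies carry identical octets; only the tagged one can be
decoded without guessing.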

>    Secondly, the following requirements are placed on the newsgroups
>    list returned by the LIST NEWSGROUPS (Section 7.6.6) command:
>
>    o  Although this specification allows UTF-8 for newsgroup names, they
>       SHOULD be restricted to US-ASCII until a successor to RFC 1036
>       [RFC1036] standardises another approach. 8-bit encodings MAY be
>       used but are likely to cause interoperability problems.
>
>    o  The newsgroup description SHOULD be in US-ASCII or UTF-8 unless
>       and until a successor to RFC 1036 standardises other encoding
>       arrangements. 8-bit encodings other than UTF-8 MAY be used but are
>       likely to cause interoperability problems.

Again, these paragraphs ought to read "8-bit encodings other than UTF-8
SHOULD NOT be used, because they are ...(interoperability)..."
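
A check along these lines is cheap to implement. The following Python
sketch flags LIST NEWSGROUPS lines whose name is not US-ASCII or whose
description is not valid UTF-8. It assumes each line consists of the
newsgroup name, whitespace, and then the description; the function
name, the messages, and the group names used below are hypothetical.

```python
def check_newsgroups_line(line: bytes) -> list:
    """Return a list of problems found in one LIST NEWSGROUPS line.

    Assumed layout: <name> <whitespace> <description>.
    """
    problems = []
    parts = line.split(None, 1)  # split on the first run of ASCII whitespace
    name = parts[0] if parts else b""
    description = parts[1] if len(parts) > 1 else b""

    if not name.isascii():
        problems.append("name is not US-ASCII")
    try:
        description.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("description is not valid UTF-8")
    return problems


text = "de.test\tTestgruppe für Umlaute"
print(check_newsgroups_line(text.encode("utf-8")))
print(check_newsgroups_line(text.encode("iso-8859-1")))
```

Note that a strict UTF-8 decode rejects this ISO-8859-1 sample because
the octet 0xFC can never appear in UTF-8. Some legacy 8-bit sequences
do happen to be valid UTF-8, so a clean result from this check is not
proof that the description was really written in UTF-8.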

-- 
Matthias Andree


