[NNTP] Internationalisation, attempt 2

Clive D.W. Feather clive at demon.net
Fri Apr 29 00:18:19 PDT 2005


Okay, here's the second attempt.


10.  Internationalisation Considerations

10.1  Introduction and historical situation

   RFC 977 [RFC977] was written at a time when internationalisation was
   not seen as a significant issue.  As such, it was written on the
   assumption that all communication would be in ASCII and use only a
   7-bit transport layer, although in practice all known implementations
   are 8-bit clean.

   Since then, Usenet and NNTP have spread throughout the world.  In the
   absence of standards for handling the issues of language and
   character sets, countries, newsgroup hierarchies, and individuals
   have found a variety of solutions that work for them but are not
   necessarily appropriate elsewhere.  For example, some have adopted a
   default 8-bit character set appropriate to their needs (such as
   ISO 8859-1 in Western Europe or KOI-8 in Russia), others have used
   ASCII (either US-ASCII or national variants) in headers but local
   16-bit character sets in article bodies, and still others have gone
   for a combination of MIME [RFC2045] and UTF-8.  With the increased
   use of MIME in email, it is becoming more common to find NNTP
   articles containing MIME headers identifying the character set of
   the body, but this is far from universal.

   The resulting confusion does not help interoperability.

   One point that has been generally accepted is that articles can
   contain octets with the top bit set, and NNTP is only expected to
   operate on 8-bit clean transport paths.

10.2  This specification

   Part of the role of this present specification is to eliminate this
   confusion and promote interoperability as far as possible.  At the
   same time, it is necessary to accept the existence of the present
   situation and not gratuitously break existing implementations and
   arrangements, even if they are less than optimal.  Therefore the
   current practice described above has been taken into consideration in
   producing this specification.

   This specification extends NNTP from US-ASCII [ANSI1986] to UTF-8
   [RFC3629].  Except in the two areas discussed below, UTF-8 (which is
   a superset of ASCII) is mandatory and implementations MUST NOT use
   any other encoding.
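
   As an illustration only (not part of the specification text), an
   implementation might check that octets outside the two excepted
   areas are well-formed UTF-8 along the following lines.  The function
   name is hypothetical, and the sketch assumes that Python's strict
   "utf-8" decoder is an adequate stand-in for the RFC 3629 rules.

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets decode as strict UTF-8.

    Hypothetical helper: pure US-ASCII input always passes, because
    UTF-8 is a superset of ASCII.
    """
    try:
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Example inputs: an ASCII command line, a UTF-8 string, and the same
# word encoded as ISO 8859-1 (which is not well-formed UTF-8).
ascii_line = b"GROUP misc.test"
utf8_text = "caf\u00e9".encode("utf-8")
latin1_text = "caf\u00e9".encode("latin-1")
```

   Under these assumptions the first two inputs are accepted and the
   third is rejected.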

   Firstly, the use of MIME for article headers and bodies is strongly
   recommended.  However, given widely divergent existing practices, an
   attempt to require a particular encoding and tagging standard would
   be premature at this time.  Accordingly, this specification allows
   the use of arbitrary 8-bit data in articles subject to the following
   requirements and recommendations.

   o  The names of headers (e.g.  "From" or "Subject") MUST be in US-
      ASCII.

   o  Header values SHOULD use US-ASCII or an encoding based on it such
      as RFC 2047 [RFC2047] until such time as another approach has been
      standardised. 8-bit encodings (including UTF-8) MAY be used but
      are likely to cause interoperability problems.

   o  The character set of article bodies SHOULD be indicated in the
      article headers, and this SHOULD be done in accordance with MIME.

   o  Where an article is obtained from an external source an
      implementation MAY pass it on, and derive data from it (such as
      the response to the HDR command), even though the article or the
      data does not meet the above requirements.  Implementations MUST
      transfer such articles and data correctly.  (Nevertheless, a
      client or server MAY elect not to post or forward the article if,
      after further examination of the article, it deems it
      inappropriate to do so.)

   These requirements affect the ARTICLE (Section 6.2.1), BODY
   (Section 6.2.3), HDR (Section 8.5), HEAD (Section 6.2.2), IHAVE
   (Section 6.3.2), OVER (Section 8.3), and POST (Section 6.3.1)
   commands.
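
   The header rules above can be sketched as a simple classifier.  This
   is a minimal illustration rather than a complete header parser: the
   function name and the three result labels are hypothetical, and the
   sketch only distinguishes US-ASCII from 8-bit octets.  It does not
   validate RFC 2047 encoded words, which are themselves US-ASCII.

```python
def check_header(line: bytes) -> str:
    """Classify one unfolded header line (hypothetical helper).

    Returns:
      "invalid-name"  the header name is missing or not US-ASCII (MUST)
      "ok"            name and value are both US-ASCII (SHOULD)
      "8bit-value"    the value contains 8-bit octets (MAY, but likely
                      to cause interoperability problems)
    """
    name, sep, value = line.partition(b":")
    if not sep or not name or not name.isascii():
        return "invalid-name"
    if value.isascii():
        return "ok"
    return "8bit-value"
```

   A server that receives an "8bit-value" header from an external
   source would still be expected to pass the octets on unchanged; the
   label is only a diagnostic.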

   Secondly, the following requirements are placed on the newsgroups
   list returned by the LIST NEWSGROUPS (Section 7.6.6) command:

   o  Although this specification allows UTF-8 for newsgroup names, they
      SHOULD be restricted to US-ASCII until a successor to RFC 1036
      [RFC1036] standardises another approach. 8-bit encodings MAY be
      used but are likely to cause interoperability problems.

   o  The newsgroup description SHOULD be in US-ASCII or UTF-8 unless
      and until a successor to RFC 1036 standardises other encoding
      arrangements. 8-bit encodings other than UTF-8 MAY be used but are
      likely to cause interoperability problems.

   o  Implementations which obtain this data from an external source
      MUST correctly handle it even if it does not meet the above
      requirements.  Implementations (in particular, clients) MUST
      handle such data correctly.
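
   A client might apply these rules to one line of the newsgroups list
   roughly as follows.  This is a sketch under stated assumptions: the
   line is assumed to consist of the newsgroup name, whitespace, and
   the description, and all names below are hypothetical.  The raw
   octets are kept unchanged so that non-conforming data can still be
   handled and passed on.

```python
from typing import NamedTuple


class NewsgroupEntry(NamedTuple):
    name: bytes                # raw octets of the newsgroup name
    description: bytes         # raw octets of the description
    name_is_ascii: bool        # SHOULD be True
    description_is_utf8: bool  # SHOULD be True (US-ASCII also passes)


def _is_utf8(octets: bytes) -> bool:
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def parse_newsgroups_line(line: bytes) -> NewsgroupEntry:
    """Split a line into name and description without altering octets."""
    parts = line.split(None, 1)  # split on the first run of whitespace
    name = parts[0] if parts else b""
    description = parts[1] if len(parts) > 1 else b""
    return NewsgroupEntry(name, description,
                          name.isascii(), _is_utf8(description))
```

   The flags only report whether the recommendations are met; the
   entry is still usable when they are not.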

10.3  Outstanding issues

   While the primary use of NNTP is for transmitting articles that
   conform to RFC 1036 (Netnews articles), it is also used for other
   formats (see Appendix A).  It is therefore most appropriate that
   internationalisation issues related to article formats be addressed
   in the relevant specifications.  For Netnews articles, this is any
   successor to RFC 1036.  For email messages, it is RFC 2822 [RFC2822].

   Of course, any article transmitted via NNTP needs to conform to this
   specification as well.

   Restricting newsgroup names to UTF-8 is not a complete solution.  In
   particular, when new newsgroup names are created or a user is asked
   to enter a newsgroup name, some form of canonicalisation will need to
   take place.  This specification does not attempt to define that
   canonicalisation; servers are expected to match newsgroup names
   octet-by-octet for the time being.  Further work is needed in this
   area in conjunction with the article format specifications.

   In the meantime, any implementation experimenting with UTF-8
   newsgroup names is strongly cautioned that a future specification may
   require that those names be canonicalised when used with NNTP in a
   way that is not compatible with their experiments.
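
   To illustrate why octet-by-octet matching is not a complete
   solution, the following sketch compares two UTF-8 encodings of what
   a user would see as the same name.  The newsgroup name and the
   helper function are hypothetical; Unicode normalisation forms NFC
   and NFD are used here only as an example of the kind of difference a
   future canonicalisation would have to address.

```python
import unicodedata


def same_group(a: bytes, b: bytes) -> bool:
    """Octet-by-octet comparison, as servers are expected to do for now."""
    return a == b


visible_name = "fr.caf\u00e9"  # hypothetical newsgroup name

# Precomposed form: "e with acute" is one code point (U+00E9).
composed = unicodedata.normalize("NFC", visible_name).encode("utf-8")

# Decomposed form: "e" followed by a combining acute accent (U+0301).
decomposed = unicodedata.normalize("NFD", visible_name).encode("utf-8")

# The two names render identically but differ as octet strings, so a
# server matching octet-by-octet treats them as different newsgroups.
names_match = same_group(composed, decomposed)
```

   Here names_match is False, even though both octet strings are valid
   UTF-8 and display as the same text.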

   Since the primary use of NNTP is with Netnews, and since newsgroup
   descriptions are normally distributed through specially formatted
   articles, it is recommended that the internationalisation issues
   related to them be addressed in any successor to RFC 1036.

-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive at davros.org>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc            |                            |


