[NNTP] Internationalisation, attempt 2
Clive D.W. Feather
clive at demon.net
Fri Apr 29 00:18:19 PDT 2005
Okay, here's the second attempt.
10. Internationalisation Considerations
10.1 Introduction and historical situation
RFC 977 [RFC977] was written at a time when internationalisation was
not seen as a significant issue. As such, it was written on the
assumption that all communication would be in ASCII and use only a
7-bit transport layer, although in practice all known implementations
are 8-bit clean.
Since then, Usenet and NNTP have spread throughout the world. In the
absence of standards for handling the issues of language and
character sets, countries, newsgroup hierarchies, and individuals
have found a variety of solutions that work for them but are not
necessarily appropriate elsewhere. For example, some have adopted a
default 8-bit character set appropriate to their needs (such as
ISO 8859-1 in Western Europe or KOI-8 in Russia), others have used
ASCII (either US-ASCII or national variants) in headers but local
16-bit character sets in article bodies, and still others have gone for
a combination of MIME [RFC2045] and UTF-8. With the increased use of
MIME in email, it is becoming more common to find NNTP articles
containing MIME headers identifying the character set of the body,
but this is far from universal.
The resulting confusion does not help interoperability.
One point that has been generally accepted is that articles can
contain octets with the top bit set, and NNTP is only expected to
operate on 8-bit clean transport paths.
10.2 This specification
Part of the role of this present specification is to eliminate this
confusion and promote interoperability as far as possible. At the
same time, it is necessary to accept the existence of the present
situation and not gratuitously break existing implementations and
arrangements, even if they are less than optimal. Therefore the
current practice described above has been taken into consideration in
producing this specification.
This specification extends NNTP from US-ASCII [ANSI1986] to UTF-8
[RFC3629]. Except in the two areas discussed below, UTF-8 (which is
a superset of ASCII) is mandatory and implementations MUST NOT use
any other encoding.
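As a non-normative illustration of the UTF-8 requirement, the sketch below checks whether a line of octets is well-formed UTF-8 before it is treated as protocol text. The function name and the sample command lines are assumptions made for this example only; they are not part of the specification.

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets decode as well-formed UTF-8."""
    try:
        # Strict decoding rejects stray lead bytes, truncated sequences,
        # overlong forms and encoded surrogates.
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample lines, used only to exercise the check.
ascii_line = b"GROUP misc.test"                       # ASCII is valid UTF-8
utf8_line = "GROUP misc.t\u00e9st".encode("utf-8")      # e-acute as 2 octets
latin1_line = "GROUP misc.t\u00e9st".encode("iso-8859-1")  # lone 0xE9 octet

for line in (ascii_line, utf8_line, latin1_line):
    print(line, is_valid_utf8(line))
```

Because US-ASCII is a subset of UTF-8, existing 7-bit traffic passes this check unchanged; only legacy 8-bit encodings such as ISO 8859-1 are rejected.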
Firstly, the use of MIME for article headers and bodies is strongly
recommended. However, given widely divergent existing practices, an
attempt to require a particular encoding and tagging standard would
be premature at this time. Accordingly, this specification allows
the use of arbitrary 8-bit data in articles subject to the following
requirements and recommendations.
o The names of headers (e.g. "From" or "Subject") MUST be in
US-ASCII.
o Header values SHOULD use US-ASCII or an encoding based on it such
as RFC 2047 [RFC2047] until such time as another approach has been
standardised. 8-bit encodings (including UTF-8) MAY be used but
are likely to cause interoperability problems.
o The character set of article bodies SHOULD be indicated in the
article headers, and this SHOULD be done in accordance with MIME.
o Where an article is obtained from an external source, an
implementation MAY pass it on, and derive data from it (such as
the response to the HDR command), even though the article or the
data does not meet the above requirements. Implementations MUST
transfer such articles and data correctly. (Nevertheless, a
client or server MAY elect not to post or forward the article if,
after further examination of the article, it deems it
inappropriate to do so.)
This requirement affects the ARTICLE (Section 6.2.1), BODY
(Section 6.2.3), HDR (Section 8.5), HEAD (Section 6.2.2), IHAVE
(Section 6.3.2), OVER (Section 8.3), and POST (Section 6.3.1)
commands.
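The first two items in the list above can be sketched as follows. This is only one possible approach, assuming a posting agent written in Python that uses the standard email.header module to produce RFC 2047 encoded-words; the helper names are hypothetical, and the choice of UTF-8 as the charset inside the encoded-word is an assumption of the example, not something this specification requires.

```python
from email.header import Header, decode_header


def header_name_is_ascii(name: str) -> bool:
    """Header names such as "From" or "Subject" MUST be US-ASCII."""
    return name.isascii()


def encode_header_value(value: str) -> str:
    """Return a US-ASCII form of a header value.

    Pure US-ASCII values are returned unchanged; anything else is
    wrapped in an RFC 2047 encoded-word (charset assumed to be UTF-8).
    """
    if value.isascii():
        return value
    return Header(value, charset="utf-8").encode()


def decode_header_value(value: str) -> str:
    """Reverse encode_header_value for display purposes."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


subject = "Gr\u00fc\u00dfe"                 # hypothetical non-ASCII subject
wire_form = encode_header_value(subject)    # an "=?utf-8?...?=" encoded-word
print(wire_form, decode_header_value(wire_form))
```

The library chooses between base64 and quoted-printable itself, so the exact octets of the encoded-word are not fixed by this sketch; long values may also be folded across lines.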
Secondly, the following requirements are placed on the newsgroups
list returned by the LIST NEWSGROUPS (Section 7.6.6) command:
o Although this specification allows UTF-8 for newsgroup names, they
SHOULD be restricted to US-ASCII until a successor to RFC 1036
[RFC1036] standardises another approach. 8-bit encodings MAY be
used but are likely to cause interoperability problems.
o The newsgroup description SHOULD be in US-ASCII or UTF-8 unless
and until a successor to RFC 1036 standardises other encoding
arrangements. 8-bit encodings other than UTF-8 MAY be used but are
likely to cause interoperability problems.
o Implementations that obtain this data from an external source
MUST handle it correctly even if it does not meet the above
requirements. Implementations (in particular, clients) MUST
handle such data correctly.
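One way a client might meet the last requirement is to decode each line of the newsgroups list as UTF-8 while preserving any octets that are not valid UTF-8, so that nothing is lost or rejected. The sketch below assumes the line terminator has already been removed and that the name and description are separated by ASCII whitespace; the function name and sample lines are hypothetical.

```python
from typing import Tuple


def parse_newsgroups_line(line: bytes) -> Tuple[str, str]:
    """Split one newsgroups-list line into (name, description).

    Invalid UTF-8 octets are kept as lone surrogates ("surrogateescape")
    so the original octets can be recovered exactly when re-encoding.
    """
    fields = line.split(None, 1)          # split on the first whitespace run
    name = fields[0] if fields else b""
    description = fields[1] if len(fields) > 1 else b""
    return (
        name.decode("utf-8", errors="surrogateescape"),
        description.decode("utf-8", errors="surrogateescape"),
    )


# A conforming line, and a legacy ISO 8859-1 description (hypothetical data).
good = b"misc.test\tGeneral testing"
legacy = "fr.test Essais g\u00e9n\u00e9raux".encode("iso-8859-1")

for raw in (good, legacy):
    name, description = parse_newsgroups_line(raw)
    # Re-encoding with the same error handler restores the original octets.
    print(name, description.encode("utf-8", errors="surrogateescape"))
```

The design choice here is to never guess at a legacy character set: the data is carried through unchanged, and any charset detection for display is left to a separate step.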
10.3 Outstanding issues
While the primary use of NNTP is for transmitting articles that
conform to RFC 1036 (Netnews articles), it is also used for other
formats (see Appendix A). It is therefore most appropriate that
internationalisation issues related to article formats be addressed
in the relevant specifications. For Netnews articles, this is any
successor to RFC 1036. For email messages, it is RFC 2822 [RFC2822].
Of course, any article transmitted via NNTP needs to conform to this
specification as well.
Restricting newsgroup names to UTF-8 is not a complete solution. In
particular, when new newsgroup names are created or a user is asked
to enter a newsgroup name, some form of canonicalisation will need to
take place. This specification does not attempt to define that
canonicalisation; servers are expected to match newsgroup names
octet-by-octet for the time being. Further work is needed in this
area in conjunction with the article format specifications.
In the meantime, any implementation experimenting with UTF-8
newsgroup names is strongly cautioned that a future specification may
require that those names be canonicalised when used with NNTP in a
way that is not compatible with their experiments.
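The following sketch illustrates why octet-by-octet matching is not a complete solution. Two spellings of what a user would regard as the same newsgroup name differ at the octet level unless some canonicalisation is applied first. Unicode Normalization Form C is used here purely as an example of such a step; this specification does not select it, and the newsgroup name is hypothetical.

```python
import unicodedata


def same_group_octetwise(a: bytes, b: bytes) -> bool:
    """Octet-by-octet comparison, as servers are expected to do for now."""
    return a == b


# The same visible name, typed two ways (hypothetical newsgroup).
composed = "misc.caf\u00e9"       # e-acute as a single code point
decomposed = "misc.cafe\u0301"    # "e" followed by a combining acute accent

wire_a = composed.encode("utf-8")
wire_b = decomposed.encode("utf-8")

# Different octets on the wire, so the server sees two distinct groups.
print(same_group_octetwise(wire_a, wire_b))

# After an (assumed) NFC canonicalisation step the two names coincide.
canon_a = unicodedata.normalize("NFC", composed).encode("utf-8")
canon_b = unicodedata.normalize("NFC", decomposed).encode("utf-8")
print(same_group_octetwise(canon_a, canon_b))
```

An implementation that creates groups under one spelling today could find them unreachable if a future specification mandates a different canonical form, which is the risk the caution above describes.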
Since the primary use of NNTP is with Netnews, and since newsgroup
descriptions are normally distributed through specially formatted
articles, it is recommended that the internationalisation issues
related to them be addressed in any successor to RFC 1036.
--
Clive D.W. Feather | Work: <clive at demon.net> | Tel: +44 20 8495 6138
Internet Expert | Home: <clive at davros.org> | Fax: +44 870 051 9937
Demon Internet | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc | |