[NNTP] Internationalisation, attempt 2

Clive D.W. Feather clive at demon.net
Wed May 4 01:07:23 PDT 2005


Matthias Andree said:
>>    RFC 977 [RFC977] was written at a time when internationalisation was
>>    not seen as a significant issue.
> That's a pretty friendly wording of US American i18n ignorance. Nevermind.

Not ignorance, disinterest.

>>    As such, it was written on the
>>    assumption that all communication would be in ASCII and use only a
>>    7-bit transport layer, although in practice all known implementations
>>    are 8-bit clean.
> 
> Suggest to replace "all known implementations" by "most known
> implementations" -- unless you can produce proof you've known all at the
> time of writing.

All the ones known to me are.

>>    (such as
>>    ISO8859-1 in Western Europe or KOI-8 in Russia), others have used
> 
> This should be spaced "ISO 8859-1" or even better use the official IANA
> spelling "ISO-8859-1". I haven't checked on KOI-8.

ISO themselves call it "ISO/IEC 8859-1", so that's what I'll use.

The terms appear to be "KOI-8" but "KOI8-R" (or other suffixes).

>>    This specification extends NNTP from US-ASCII [ANSI1986] to UTF-8
>>    [RFC3629].  Except in the two areas discussed below, UTF-8 (which is
>>    a superset of ASCII)
> This should probably be US-ASCII not just ASCII.

True.
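
As a rough illustration of the superset point (a sketch, not part of the
draft): a pure US-ASCII string produces the same octets whether it is encoded
as US-ASCII or as UTF-8, while any other character produces octets with the
high bit set, which is why an 8-bit clean transport matters. The helper names
below are made up for the example.

```python
def high_bit_bytes(data: bytes) -> int:
    """Count the octets that a strictly 7-bit transport could not carry."""
    return sum(1 for octet in data if octet >= 0x80)


def same_in_ascii_and_utf8(text: str) -> bool:
    """True if the text is pure US-ASCII, in which case both encodings agree."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # Not representable in US-ASCII at all, so the encodings cannot agree.
        return False


command = "GROUP misc.test"   # a typical all-ASCII command line
greeting = "Grüße"            # contains two non-ASCII characters

print(same_in_ascii_and_utf8(command), high_bit_bytes(command.encode("utf-8")))
print(same_in_ascii_and_utf8(greeting), high_bit_bytes(greeting.encode("utf-8")))
```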

>>    o  Header values SHOULD use US-ASCII or an encoding based on it such
>>       as RFC 2047 [RFC2047] until such time as another approach has been
>>       standardised. 8-bit encodings (including UTF-8) MAY be used but
>>       are likely to cause interoperability problems.
> 
> I don't think this is strong enough. 8-bit encoded Subjects are almost
> guaranteed to cause trouble, I'd think the paragraph should only allow
> UTF-8 and RFC-2047.

However, other encodings happen in real life and UTF-8 is as likely to
cause problems as any other.

> Suggestion:
> 
> " Header values SHOULD use US-ASCII or an encoding based on it such
>   as RFC 2047 [RFC2047] until such time as another approach has been
>   standardised. 8-bit encodings other than UTF-8 SHOULD NOT be used,
>   because they are likely to cause interoperability problems."

The SHOULD NOT is probably better wording, but I'm not willing to call out
UTF-8 as special in this situation.
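
For readers unfamiliar with the "encoding based on US-ASCII" option, here is
a minimal sketch of RFC 2047 encoded-words using Python's standard
email.header module. The function names are invented for the example, and
choosing UTF-8 as the underlying charset is only an assumption made for
illustration; RFC 2047 allows other charsets too.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(value: str) -> str:
    """Return a header value that contains only US-ASCII octets."""
    if value.isascii():
        # Already plain US-ASCII: no encoded-word is needed.
        return value
    # Assumption for this sketch: UTF-8 as the charset inside the encoded-word.
    return Header(value, charset="utf-8").encode()


def decode_header_value(raw: str) -> str:
    """Turn any RFC 2047 encoded-words in a header value back into text."""
    return str(make_header(decode_header(raw)))


wire = encode_header_value("Grüße")
print(wire)                       # an encoded-word, safe on a 7-bit path
print(decode_header_value(wire))  # the original text again
```

Note that Header.encode() may fold long values across several lines, so a
real implementation would need to handle that when writing the article.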

>>    o  The character set of article bodies SHOULD be indicated in the
>>       article headers, and this SHOULD be done in accordance with
>>       MIME.
> Please consider "MUST" for the second "SHOULD". I see no sane solution
> other than MIME at this time, and MIME is the widest spread.

But it's not universal. We don't know enough about usage in (say) Japan to
say whether that MUST will be another thing for users to ignore.
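
To make the SHOULD-versus-MUST question concrete, this is roughly what a
reader has to do today (a sketch using Python's standard email parser; the
function name and the sample articles are invented). When the article carries
a MIME Content-Type with a charset parameter the answer is easy; when it does
not, the function returns None and the reader is left guessing.

```python
import email
from typing import Optional


def body_charset(article: bytes) -> Optional[str]:
    """Return the charset declared via MIME, or None if none is declared."""
    message = email.message_from_bytes(article)
    # get_content_charset() lower-cases the value and returns None when the
    # Content-Type header or its charset parameter is missing.
    return message.get_content_charset()


with_mime = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=KOI8-R\r\n"
    b"\r\n"
    b"body\r\n"
)
without_mime = b"Subject: test\r\n\r\nbody\r\n"

print(body_charset(with_mime))
print(body_charset(without_mime))
```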

-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive at davros.org>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc            |                            |


