ietf-nntp Dealing with internationalization in NNTP

Martin J. Dürst mduerst at ifi.unizh.ch
Wed Oct 22 12:22:43 PDT 1997


On Wed, 22 Oct 1997, Brian Hernacki wrote:

> Martin J. Dürst wrote:
> > > A couple of issues I'd like to start with are:
> > >
> > > o the charset
> > >
> > > The current 977bis draft includes the CHARSET command to allow the
> > > client and server to negotiate a charset. A couple folks (including me)
> > > have posted as to why we think this is a bad idea. I would much rather
> > > see us define something like UTF-8 as the default charset used. While I
> > > heard from people who agreed with me on this, I didn't hear any
> > > objections. Is this OK? Do people think this would be a bad thing? If
> > > not should we change the draft?
> > 
> > As far as I understand, this only refers to protocol elements
> > (parameters) in the NNTP protocol itself, not to the news articles
> > themselves? I don't have much of an idea of what protocol elements
> > are used and which of these need or may benefit from internationalization
> > or localization (by the way, these words can be shortened to i18n
> > and l10n). Also, I think there is some overlap with usenet, e.g.
> > in respect to newsgroup names.
> 
> It affects things like error strings and search tokens which are
> strictly protocol. I'll check the usenet-format stuff to see how they
> are handling NG names.

There is a tendency to go towards UTF-8, too, so this fits well.
But there are also local experiments with iso-8859-1. That's
one of the reasons why I would advocate: make it as clearly UTF-8-only
as possible, but don't make software break if it's something
else. Similar things were done in FTP, but it is true that
there was much more of a legacy problem there than with NG names.
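
To make "don't make software break" a bit more concrete, here is a minimal sketch in Python of a server-side lookup that treats newsgroup names as opaque octets. The table `GROUPS`, the function `lookup_group`, and the group names are all hypothetical and invented for illustration; nothing here is taken from the 977bis draft.

```python
# Hypothetical in-memory group table, keyed by raw octets rather than text.
GROUPS = {
    b"comp.lang.c": 120,
    "fr.soci\u00e9t\u00e9".encode("utf-8"): 7,
    "de.caf\u00e9".encode("iso-8859-1"): 3,  # a local, non-UTF-8 name
}


def lookup_group(name_octets: bytes):
    """Return (article_count, label), or None if the group is unknown.

    The lookup never rejects a name merely because its octets are not
    valid UTF-8; UTF-8 is only used to produce a readable label.
    """
    count = GROUPS.get(name_octets)
    if count is None:
        return None
    try:
        label = name_octets.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: keep working, show the odd octets as escapes.
        label = name_octets.decode("ascii", "backslashreplace")
    return count, label
```

The point of the design is that matching is done on octets, so a legacy name still works between parties that agree on it, while anything that is valid UTF-8 is displayed as such.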


> > I would definitely prefer to use UTF-8 only for these things, but
> > UTF-8 should be prescribed in a way that doesn't completely
> > forbid local existing customs. For example, an NNTP server
> > should not refuse a command with a newsgroup name or something
> > else just because it does not meet the syntactic constraints
> > that a UTF-8 octet sequence does.
> 
> The current standard is ASCII only so I don't see a backwards
> compatibility problem.

It's not only what the standard says. The FTP standard also was
ASCII only, but people used all kinds of other things for filenames.
The solution there was basically to say that it's okay to use
other things among "private parties", but for the Internet as a
whole, it's UTF-8. Such a policy was facilitated by the fact that
it's fairly easy and safe to identify UTF-8 (for some details,
see my report at
	http://www.ifi.unizh.ch/mml/mduerst/papers.html#IUC11-UTF-8).
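
As a rough illustration of why identification is easy, the following Python sketch sorts an octet sequence into "ascii", "utf-8", or "other" simply by trying to decode it. The function name `classify_octets` and the sample names are my own assumptions for the example; the report above discusses the actual detection criteria.

```python
def classify_octets(data: bytes) -> str:
    """Classify raw octets as 'ascii', 'utf-8', or 'other'.

    Relies on Python's strict decoders: any octet >= 0x80 fails the
    ASCII check, and any malformed multi-octet sequence fails UTF-8.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


if __name__ == "__main__":
    name = "fr.soci\u00e9t\u00e9"
    print(classify_octets(b"comp.lang.c"))
    print(classify_octets(name.encode("utf-8")))
    print(classify_octets(name.encode("iso-8859-1")))
```

In iso-8859-1 text, an accented letter is usually followed by a plain ASCII letter, which can never be a valid UTF-8 continuation octet, so such text is only rarely mistaken for UTF-8.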


> Are you suggesting a future revision might want to
> use a different local encoding (like SJIS)?

It's more iso-8859-1 than SJIS; in Japan, EUC and JIS are in use
alongside SJIS, which keeps people there pretty much to ASCII for
such things.

Explicitly allowing it would make things much more complicated
(CHARSET,...). But to some extent, it might be tolerated.


> I'd be inclined to have
> 977bis forbid that. It would just generate the kind of interoperability
> problems we're trying to solve.

Of course, interoperability is a core point. Interoperability in
NNTP and for NG names clearly means something different (long-term,
very wide-reaching) than for FTP (short-term, point-to-point).
Also, there might be fewer faits accomplis in NNTP than in FTP.
You and others are probably in a much better position to judge
that than me.


> > > o a language specification/negotiation extension
> > The IMAP model can definitely be used. For more information, please
> > also see Harald Alvestrand's draft about the IETF charset policy,
> > which contains quite a few recommendations about this topic (e.g.
> > default language stuff which can be highly political,...).
> 
> Yeah. I kept up on that. I don't really have too much of a problem with
> it. The biggest constraint is that any new IETF standard must "deal"
> with charset in a definitive way. If we make the requirement that NNTP
> is UTF-8 we cover that.

Well, I think it really shouldn't be seen as "we have to do something,
so that it looks as if we did something". If, after serious consideration,
we came to the conclusion that for NNTP nothing at all is needed in that
direction, then I think that should be fine. But at present, I don't
see the arguments for such a procedure.


Regards,	Martin.



