ietf-nntp Comments on draft-15.pdf

Thu Jan 3 04:33:44 PST 2002

In <20020102113110.H72355 at demon.net> "Clive D.W. Feather" <clive at demon.net> writes:

>  UTF-8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4 / UTF8-5 / UTF8-6 
>  UTF8-1 = %x80-BF
>  UTF8-2 = %xC2-DF UTF8-1 
>  UTF8-3 = %xE0 %xA0-BF  UTF8-1 / %xE1-EF 2UTF8-1
>  UTF8-4 = %xF0 %x90-BF 2UTF8-1 / %xF1-F7 3UTF8-1
>  UTF8-5 = %xF8 %x88-BF 3UTF8-1 / %xF9-FB 4UTF8-1
>  UTF8-6 = %xFC %x84-BF 4UTF8-1 / %xFD 5UTF8-1

>* You can eliminate "surrogates" by changing one line of that:

>  UTF8-3 = %xE0 %xA0-BF UTF8-1 / %xE1-EC 2UTF8-1 /
>           %xED %x80-9F UTF8-1 / %xEE-EF 2UTF8-1

>The choice is ours !

The main advantage of giving a full syntax, such as the last one, is that
it strongly suggests how to construct a filter for injecting and reading
agents that will detect incorrect usage of UTF-8 (for example, if someone
tries to use 8859-1 in headers).
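
To illustrate how directly the syntax maps onto such a filter, here is a
minimal sketch in Python. It is only one possible reading of the grammar
quoted above (with the surrogate-excluding UTF8-3 line), not code from
either draft. The names RULES and matches_grammar are hypothetical, and
passing plain ASCII octets through unchanged is an assumption, since the
quoted rules cover only the non-ASCII sequences.

```python
# Each entry: (lead_lo, lead_hi, second_lo, second_hi, tail), where "tail"
# is the number of further UTF8-1 octets (%x80-BF) after the second octet.
# The table transcribes the quoted grammar, surrogate-free UTF8-3 variant.
RULES = [
    (0xC2, 0xDF, 0x80, 0xBF, 0),  # UTF8-2
    (0xE0, 0xE0, 0xA0, 0xBF, 1),  # UTF8-3
    (0xE1, 0xEC, 0x80, 0xBF, 1),
    (0xED, 0xED, 0x80, 0x9F, 1),  # excludes the surrogates
    (0xEE, 0xEF, 0x80, 0xBF, 1),
    (0xF0, 0xF0, 0x90, 0xBF, 2),  # UTF8-4
    (0xF1, 0xF7, 0x80, 0xBF, 2),
    (0xF8, 0xF8, 0x88, 0xBF, 3),  # UTF8-5
    (0xF9, 0xFB, 0x80, 0xBF, 3),
    (0xFC, 0xFC, 0x84, 0xBF, 4),  # UTF8-6
    (0xFD, 0xFD, 0x80, 0xBF, 4),
]


def matches_grammar(data: bytes) -> bool:
    """Return True if every non-ASCII sequence in data fits the grammar."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            # Assumption: ASCII octets are handled elsewhere and pass here.
            i += 1
            continue
        for lead_lo, lead_hi, sec_lo, sec_hi, tail in RULES:
            if lead_lo <= lead <= lead_hi:
                break
        else:
            return False  # stray UTF8-1 octet, or %xC0, %xC1, %xFE, %xFF
        end = i + 2 + tail
        if end > n:
            return False  # sequence is cut short
        if not sec_lo <= data[i + 1] <= sec_hi:
            return False  # overlong form or surrogate
        if any(not 0x80 <= octet <= 0xBF for octet in data[i + 2:end]):
            return False
        i = end
    return True
```

With this sketch, an 8859-1 header such as b"caf\xe9" is rejected because
%xE9 announces a three-octet sequence that never arrives, while the same
word encoded as UTF-8 (b"caf\xc3\xa9") is accepted. Note that the table
follows the quoted grammar literally, including its five- and six-octet
forms.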

That is indeed more of an issue for USEFOR, which is why we went to the
trouble of doing it there (and we learned quite a lot about UTF-8 in the
process, which was a Good Thing).

However, now that the work has been done, it would make some sense to use
it again in the NNTP draft. Even though it matters less here, I think
people are going to notice the difference between the two documents and
try to read some sinister meaning into it, where none was intended.

That said, I might be persuaded that eliminating the surrogates was a bridge
too far.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5