[ietf-nntp] :bytes metadata

Andrew - Supernews andrew at supernews.net
Thu Dec 18 17:16:10 PST 2003


>>>>> "Ken" == Ken Murchison <ken at oceana.com> writes:

 >> (1) In a cluster system, the number of octets in the article may
 >> vary between cluster members (e.g. because folding happened
 >> differently or because they have different Path: headers).
 >> (2) Should dot-stuffing be counted?
 >> (3) Should the final '.' CRLF be counted?

 >> In other words, do we see :bytes as indicating the size of the
 >> article in "canonical" form, or the number of octets that will
 >> come down the line in response to ARTICLE?

 >> My inclination is to say that the answers to (2) and (3) are "no";
 >> the value is a storage octet count rather than a wire octet count.

 Ken> Exactly.  Dot stuffing and the final '.' are related to how the
 Ken> content gets transferred, NOT the content itself.  If an
 Ken> implementation decides to store articles in wire-format, they
 Ken> MUST NOT include the extra '.'s in the calculation.

That's a substantial implementation overhead - an implementation that
stores articles in wire format never has any need to compute or store
the "canonical" size of the article.

There are three possible sizes of an article: the size as stored
locally, the canonical size, and the wire-format size. There are two
common cases:

1) The server stores data in a local format, e.g. a "traditional" spool.
The local format may be the canonical format, or it may be possible to
derive the canonical size (or a close approximation thereof) from the
local size (e.g. with a traditional spool in Unix text format, adding
the linecount+7 to the local size will result in a bytecount which is
off by only the number of optional header lines). There is likely to
be no easy way to derive the wire-format size without additional
information.

2) The server stores data in wire format. In this case, the canonical
size differs from the stored size by a value equal to 3 + the number
of lines requiring dot-stuffing. This is additional information that
the server has no other reason to store.
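
To make the arithmetic in the two cases concrete, here is a minimal
sketch. It assumes the canonical form is the article with CRLF line
endings, no dot-stuffing and no terminating '.' CRLF, and that the
local format in case 1 differs only in using bare LF line endings.
The function names are invented for illustration and are not taken
from any server.

```python
CRLF = b"\r\n"

def to_wire(canonical: bytes) -> bytes:
    """Dot-stuff a CRLF-terminated article and append the final '.' CRLF."""
    stuffed = canonical.replace(CRLF + b".", CRLF + b"..")
    if stuffed.startswith(b"."):
        stuffed = b"." + stuffed  # the first line has no preceding CRLF
    return stuffed + b"." + CRLF

def canonical_size_from_wire(wire: bytes) -> int:
    """Case 2: stored size minus 3 minus the number of dot-stuffed lines."""
    if not wire.endswith(b"." + CRLF):
        raise ValueError("article is not terminated by '.' CRLF")
    payload = wire[:-3]
    # In wire format, every line that begins with '.' has been stuffed.
    stuffed_lines = payload.count(CRLF + b".") + payload.startswith(b".")
    return len(wire) - 3 - stuffed_lines

def canonical_size_from_unix(local: bytes) -> int:
    """Case 1 (LF-only spool): each line gains one octet as LF becomes CRLF."""
    return len(local) + local.count(b"\n")

# Hypothetical article with two body lines that need dot-stuffing.
canonical = b"Subject: x\r\n\r\n.hidden\r\nplain\r\n..two\r\n"
wire = to_wire(canonical)
local = canonical.replace(CRLF, b"\n")

print(len(canonical), len(wire), len(local))
print(canonical_size_from_wire(wire), canonical_size_from_unix(local))
```

Note that canonical_size_from_wire has to scan the whole article to
count the stuffed lines, which is exactly the extra work (or extra
stored value) that a wire-format server would otherwise never need.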

So from the server's point of view, the easiest solution is to say
that the :bytes header should be _either_ the canonical size or the
wire-format size (but should not be any other local size - if the
server stores articles in some other format, it should correct :bytes
to be as close as possible to either the canonical or wire sizes).
Since the client cannot rely on the precise accuracy in either case,
there seems to be no obvious reason why one of these formats should be
mandated and the other forbidden.

 Ken> I'd say (A).  A cluster is a specific implementation and we
 Ken> shouldn't munge the protocol to account for idiosyncrasies
 Ken> specific to them.

 Ken> IMO all members of a cluster should be configured and perform
 Ken> identically, whether it be article canonicalization/storage or
 Ken> article expiration, which would make this a non-issue.  All
 Ken> members of a cluster should present a unified view of the
 Ken> newsspool, a client shouldn't have to deal with cluster specific
 Ken> junk.  You couldn't get away with this in an IMAP cluster.

IMAP clusters don't normally have to handle multi-gigabit loads.

But the issue isn't really about clusters; it's about the fact that
things like the variation in Path headers mean that there can be
protocol-equivalent, but non-identical, copies of articles (which is
not normally true of mail). The assumption that a server has or can
have access to only one copy of a given article is a false one in the
general case, though it's a natural assumption for people who have
never had to consider the mechanics of large-scale systems.
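
As a small illustration of that point, the sketch below builds two
copies of the same article that differ only in their Path header, as
they might after arriving via different routes. The host names, the
Message-ID and the helper function are all made up for the example.

```python
def build_article(path: str) -> bytes:
    """Assemble a tiny CRLF-terminated article with the given Path header."""
    headers = [
        "Path: " + path,
        "Message-ID: <example@news.example>",
        "Subject: test",
    ]
    body = ["hello"]
    # The trailing empty string makes the last line end in CRLF.
    return "\r\n".join(headers + [""] + body + [""]).encode("ascii")

# Same Message-ID and body, different propagation paths.
copy_a = build_article("a.example!not-for-mail")
copy_b = build_article("b.example!a.example!not-for-mail")

print(len(copy_a), len(copy_b))
```

Both copies are the same article as far as the protocol is concerned,
yet any octet count taken from them differs by the length of the
extra Path entry.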

-- 
Andrew, Supernews
http://www.supernews.com



