[ietf-nntp] :bytes metadata

Mon Dec 22 02:34:28 PST 2003

Andrew - Supernews said:
> That's a substantial implementation overhead - an implementation that
> stores articles in wire format never has any need to compute or store
> the "canonical" size of the article.

I accept the "never has any need" part, but I don't see that it's a
substantial overhead. To calculate the :lines metadata, the server has to
parse the entire article anyway; checking each line for a leading dot while
doing so adds very little.
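
A minimal sketch of such a single pass, assuming the article is held in
wire format (CRLF line endings, dot-stuffed, without the terminating
dot-CRLF). The name scan_wire_article is made up for illustration and is
not taken from any server:

```python
def scan_wire_article(wire: bytes):
    """Return (body_lines, stuffed_lines) for a wire-format article.

    Assumption: `wire` is dot-stuffed, CRLF-terminated, and does not
    include the terminating ".\r\n".
    """
    lines = wire.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final CRLF
    # In wire format every line starting with a dot carries exactly one
    # extra stuffing dot, so counting them is a per-line prefix check.
    stuffed = sum(1 for line in lines if line.startswith(b"."))
    # The body begins after the first empty line (end of the headers).
    try:
        body_start = lines.index(b"") + 1
    except ValueError:
        body_start = len(lines)  # headers only, no body
    return len(lines) - body_start, stuffed
```

The dot check rides along with the line split that the line count needs
in any case.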

> 1) The server stores data in a local format. e.g. "traditional" spool.
> The local format may be the canonical format, or it may be possible to
> derive the canonical size (or a close approximation thereof) from the
> local size (e.g. with a traditional spool in Unix text format, adding
> the linecount+7 to the local size will result in a bytecount which is
> off by only the number of optional header lines).

And this last number can be found just by scanning the article up to the
first empty line.
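
As a rough sketch, assuming a traditional spool with bare-LF line endings
(count_header_lines is a hypothetical helper, not anything from an actual
server):

```python
def count_header_lines(local: bytes) -> int:
    """Count the lines before the first empty line of an LF-terminated
    article, i.e. the header lines (folded continuation lines included)."""
    count = 0
    for line in local.split(b"\n"):
        if line == b"":
            break  # the empty line separates headers from body
        count += 1
    return count
```

The scan stops at the separator, so the body is never read.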

> 2) The server stores data in wire format. In this case, the canonical
> size differs from the stored size by a value equal to 3 + the number
> of lines requiring dot-stuffing. This is additional information that
> the server has no other reason to store.

Um, whence that 3? Or are you saying that the terminating dot-CRLF is also
stored?

Our text takes the view that that dot-CRLF is *not* part of the article,
and so it MUST NOT be included in any counts. Take this article:

====
Header: the only header

Body line 1.
. Body line 2.
====

As far as I'm concerned, :lines MUST be 2 and :bytes SHOULD be 57 (every
line ending counted as CRLF) but MAY be 53 (every line ending counted as a
single octet). Even if we count dot-stuffing, that only changes 57 to 58.
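
Those figures can be checked mechanically. A small sketch, in which the
size helper and the variable names are purely illustrative:

```python
# The four lines of the example article, without line endings.
article_lines = [
    "Header: the only header",
    "",
    "Body line 1.",
    ". Body line 2.",
]

def size(lines, eol, dot_stuff=False):
    """Total octets for `lines`, each followed by `eol`; optionally add
    one octet for every line that would need dot-stuffing."""
    total = 0
    for line in lines:
        if dot_stuff and line.startswith("."):
            total += 1  # the extra stuffing dot
        total += len(line.encode("ascii")) + len(eol)
    return total

crlf_size = size(article_lines, "\r\n")           # 57
lf_size = size(article_lines, "\n")               # 53
stuffed_size = size(article_lines, "\r\n", True)  # 58
# Body lines are those after the empty separator line.
body_lines = len(article_lines) - article_lines.index("") - 1  # 2
```

None of these counts includes a terminating dot-CRLF.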

> Since the client cannot rely on the precise accuracy in either case,

I don't see why not, particularly if limited to one session.

> But the issue isn't really about clusters, it's about the fact that
> things like the variation in Path headers means that there can be
> protocol-equivalent, but non-identical, copies of articles (which is
> not normally true of mail). The assumption that a server has or can
> have access to only one copy of a given article is a false one in the
> general case, though it's a natural assumption for people who have
> never had to consider the mechanics of large-scale systems.

It's not totally clear to me why these have to be non-identical. But, even
so, it seems reasonable to require only one copy to be accessed during any
one session.

-- 
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive at davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646