[NNTP] NNTP Compression

Sabahattin Gucukoglu mail at sabahattin-gucukoglu.com
Sun Jan 10 06:18:25 PST 2010


On 10 Jan 2010, at 12:16, Ade Lovett wrote:
> On Jan 10, 2010, at 02:41 , Sabahattin Gucukoglu wrote:
>> From where I'm sitting, which is at this very moment trying to find a portable way to implement a version of the GigaNews Accelerator (I'll call it "GigaNoise"), it would be most welcome!  As it stands, I have to track what state the client and server are in, filter out capabilities, and whatnot, and all because the compression isn't clearly and obviously negotiated, and at what point I'll be getting compressed versus regular output (ARTICLE, for example, is still sent uncompressed).
> 
> That would be an issue you would need to take up with your NSP of choice, and not relevant to protocol discussion.

The details of Giganews' implementation are indeed an issue for Giganews, but the point I made, that simple negotiation is better than non-simple negotiation, does belong in protocol discussion.  It simply takes needless "upper layer" logic out of the picture (though if compression ever happens, servers will probably have to restrict how it's used, so there will have to be some way to tell the client what is and isn't acceptable).  Even then there are flip sides to this, as we know from SASL, mostly relating to post-negotiation logic.  Usual trials, usual tribulations.  But in any case it was just an illustration of how keeping state out of it is a good thing, and nothing's been set in stone yet.  I just think that when multiple NSPs do it differently, we had better pay attention.

>> And I'm not sure precisely what it is about toggling compression that's so terrible, because you haven't explained it.
> 
> Simplicity.  Either a stream is negotiated compressed, or it isn't.  You mention having to track state as being an issue, now view that from the point of view of a server farm, dealing with tens of thousands of simultaneous connections, potentially flipping in and out of compression.  Why?  You have, essentially, unlimited connections, so use a couple for compression, and the rest normal.  Trivial state tracking on both ends.  If it is needed ...
> 
I can disable compression as easily as I enable it.  So can servers, which already manage countless clients' states.  This is an issue of implementation, not protocol design.  You can choose not to support the features if it hurts to do so.  I merely insist that the state tracking be straightforward, not requiring advance knowledge of when to expect compression and when not to and other such unspecified nonsense.  Perhaps it could be as simple as returning separate response codes to indicate whether the data following is compressed or not.
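
As a rough illustration of the response-code idea, here is a minimal Python sketch of the client side.  It is only a sketch under stated assumptions: 224 is the usual "overview information follows" code, but 294 is an invented code (picked from the x9x private-use range) standing for "the same data, zlib-deflated", and decode_body is a hypothetical helper.  Framing of the compressed payload and dot-stuffing are left out.

```python
import zlib

PLAIN_OVERVIEW = 224        # standard "overview information follows"
COMPRESSED_OVERVIEW = 294   # ASSUMPTION: invented code, not defined by any NNTP spec


def decode_body(status_line: bytes, payload: bytes) -> list:
    """Return the body lines, decompressing only when the response code says so.

    The client needs no advance knowledge of which commands are compressed:
    the first three digits of the status line decide how to read the payload.
    """
    code = int(status_line[:3])
    if code == COMPRESSED_OVERVIEW:
        payload = zlib.decompress(payload)
    elif code != PLAIN_OVERVIEW:
        raise ValueError(f"unexpected response code {code}")
    return payload.splitlines()


raw = b"1\tSubject A\r\n2\tSubject B\r\n"
print(decode_body(b"224 Overview information follows", raw))
print(decode_body(b"294 Compressed overview follows", zlib.compress(raw)))
```

Both calls yield the same list of lines; the only state the client tracks is the code it has just read.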

>> Look, I just spent half an hour downloading the headers of alt.binaries.cd.image.  I just reassured myself that the 400+ MB of raw XOVER information could, in fact, be reduced to the about 55MB that zlib deflate offers, and that it cuts the download time to about a tenth.  I'm sorry, mate, but you aren't telling me, now, that I can't find a use for this feature.
> 
> Sure you can.  In the degenerate case of exceptionally large binary groups, use one of the many indexer sites out there, pull the NZB file, and pull down the articles via message ID that you actually want.  Or autogenerate the DMCA takedown notice from the NZB file (code available on request for the right price).
> 
You're right, of course; that was just a convenient example -- but you're still passing the buck by suggesting it only applies to binary groups.  It doesn't, in fact, since there are often as many retained articles on heavy text groups, retained for longer because their size allows for it.  In any event, it's a failure case that can be addressed using compression.
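
For anyone wanting to check the order of magnitude themselves, a minimal Python sketch along these lines would do.  Note the assumptions: the overview lines are entirely made up (overview_line and its field values are hypothetical, not real XOVER output), so the ratio it prints says nothing about any particular group, only that repetitive tab-separated header data deflates well.

```python
import zlib


def overview_line(n: int) -> bytes:
    """Build one made-up, tab-separated overview line (all field values invented)."""
    fields = [
        str(n),                                                          # article number
        f'Example posting [{n}/5000] - "disc.part{n:04d}.rar" yEnc (1/50)',  # subject
        "poster@example.invalid (Poster)",                               # from
        "Sun, 10 Jan 2010 06:18:25 -0800",                               # date
        f"<part{n}of5000.abc@example.invalid>",                          # message-id
        "",                                                              # references
        "398000",                                                        # bytes
        "3060",                                                          # lines
    ]
    return "\t".join(fields).encode("ascii") + b"\r\n"


def deflate_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by raw size, using zlib at the given level."""
    return len(zlib.compress(data, level)) / len(data)


sample = b"".join(overview_line(n) for n in range(1, 5001))
print(f"{len(sample)} bytes raw, deflate ratio {deflate_ratio(sample):.2f}")
```

Real overview data is less regular than this, so a real measurement should be run against an actual XOVER dump.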

>> Ironically, the only time I don't consider compression is when I'm on my 400MHz Pentium-II box; then the overhead of decompression fails utterly to overtake the raw 20mbit connection I have here (of which probably only a fraction is used, both because of my ISP's utter failure to deliver as promised and because of basic properties of TCP).
> 
> And this is where we go back to the age old "server vs client author wars" :)  Whilst I'm sure your home system has more than enough capability to hold terabytes of data, with on-the-fly compression, for _home_ needs, scale that up to what you're talking to on the other end, and it becomes non-trivial very very quickly.
> 
I have minimal storage requirements and fairly low-end hardware, because most of my subscriptions are text groups, and until recently I wasn't in the market for large, modern disks.  I'm not exactly on a generous income, and I subscribe to the lowest tier my NSP offers.  My NSP, on the other hand, is still looking for customers (many of them ISPs, or ex-ISP customers whose original Usenet service forsook them), and all the binaries it carries, plus its support for compressing headers, apparently don't bother it in the least.  It keeps getting bigger and hungrier and more aggressive, it consistently maintains its top spot on top1000.org, it's always a favourite among the binary downloader darlings - in short, it defies anybody (and I know how much of an advertisement this is turning into) to challenge its storage and CPU requirements by suggesting that it is incapable of selling what it markets at premium prices. :-)

>> I will paste the Tcl script that did the tests.
> 
> Sadly, irrelevant.
> 
> 1.  Show a _real_world_ case, of relevance to the protocol in its entirety, and not a subset, where compression is useful at all, given the major burden it will put on the server side.
> 
> 2.  Show a case why compression (in whatever form) should be able to be toggled within a single stream.

I think these are answered by the above, except to note that Giganews only compresses headers, not article bodies.  The toggling is thus implicit, and probably more efficient than across-the-board compression.  However, we need a simple, uniform way to control compression, and a simple way to get the client and server to agree on what and when, perhaps under the server's control using return codes (as above).

Cheers,
Sabahattin


