ietf-nntp Overview database
Charles Lindsey
chl at clw.cs.man.ac.uk
Tue Mar 25 06:11:11 PST 2003
In <yl7kaoidrc.fsf at windlord.stanford.edu> Russ Allbery <rra at stanford.edu> writes:
>It looks like I wasn't remembering correctly. Charles, do you know if C
>News comes with anything?
All I find in CNews is the following:
The Design of a Common Newsgroup Overview Database
for Newsreaders
Geoff Collyer
Software Tool & Die
ABSTRACT
Every new newsreader seems to come with a
requirement for a private database of tens of
megabytes of article headers. Some of these database
maintenance programs are inordinately costly
to run. We present the design and rationale of a
newsreader database that is sharable by multiple
newsreaders and relatively cheap to maintain.
Background
Two of the most popular newsreaders, nn and trn, have
been around for several years and each has its own private
binary database of article headers (and other information),
which is intended to avoid the considerable expense of
opening all the articles in a newsgroup in order to present
a menu of choices to the user. nn's database maintainer,
nnmaster, has been getting faster over the years and appears
not to be much of a load on the system. trn's database
maintainer, mthreads, manages to consume vast quantities of
both CPU and disk bandwidth. Neither database format is
documented, even by comments in the source code of its
maintenance program (and trn's is truly bizarre). Since the
maintenance programs tended to be slow and to run
asynchronously with the rest of the news transport, new
articles tended to be unavailable to users of these
newsreaders for some time after arrival. The binary nature
of the database files means that access over networks
involves byte-swapping and dealing with the differing sizes
of various data types across machine architectures.
To add to this discouraging scenario, new newsreaders,
such as tin and tass, have been appearing more recently,
requiring their own private databases. Clearly something
had to be done before private databases dwarfed the actual
news spool and their maintenance programs consumed most of
the resources of their host systems.
March 25, 2003
The New Scheme
relaynews (the C News analogue of B News's rnews) has
the article headers in hand during processing of that
article, so having it simply write a stream of all the
article headers onto the end of a file is sufficient to make
that information available cheaply and without making policy
decisions about which headers to omit. Writing a little
program that massages that stream into a more compact
format, and updates the common database, completes the
updating of the database. A nightly expire that deletes
obsolete database entries completes database maintenance.
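The "massage" and "expire" steps above can be sketched roughly as
follows. This is only an illustration in Python, not the C News
code: the function names are hypothetical, and the field order
(article number, Subject, From, Date, Message-ID, References, byte
count, line count) is an assumption taken from the overview format
as it was later standardized for NNTP, since the paper does not
list the fields.

```python
# Hypothetical sketch, NOT the C News implementation.
# Assumed field order; the paper itself does not enumerate the fields.
OVERVIEW_FIELDS = ("Subject", "From", "Date", "Message-ID", "References")


def clean(value):
    """Collapse every run of whitespace (tabs, folded-header newlines,
    repeated spaces) into one space so a field cannot contain a tab."""
    return " ".join(value.split())


def overview_line(number, headers, byte_count, line_count):
    """Build one tab-separated database line for one article.

    `headers` is assumed to be a dict keyed by exact header name.
    """
    fields = [str(number)]
    fields += [clean(headers.get(name, "")) for name in OVERVIEW_FIELDS]
    fields += [str(byte_count), str(line_count)]
    return "\t".join(fields)


def expire(lines, is_expired):
    """Keep only the lines whose article number is still live.

    `is_expired` is a caller-supplied predicate on the article number.
    """
    return [ln for ln in lines if not is_expired(int(ln.split("\t", 1)[0]))]
```

Because every field is flattened to a single line without tabs, an
update is just an append and expiry is a line-by-line filter.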
The database itself consists of a text file in each
news spool directory with a fixed name (.overview). Each
line of such a file consists of commonly-needed header
fields, separated by tabs. There is provision for
extensions beyond the commonly-needed set, and these require only
cook-book changes to the program that massages the header
stream. Experimentation suggests that on-the-fly threading
by References: headers is cheap enough that there is no
benefit to storing threading information in the database,
thereby avoiding the costly updating of said threading
information.
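To make the line format concrete, here is a minimal reader sketch in
Python. It is not the library shipped with C News. The names
OverviewEntry and parse_overview_line are hypothetical, and the
field order is again an assumption (the one later standardized for
NNTP overview data); anything past the eighth field is simply kept
as an extension.

```python
from typing import NamedTuple


class OverviewEntry(NamedTuple):
    """One line of a .overview file (assumed field order)."""
    number: int
    subject: str
    author: str
    date: str
    message_id: str
    references: list   # Message-IDs, oldest ancestor first
    byte_count: int
    line_count: int
    extra: list        # extension fields, left uninterpreted


def parse_overview_line(line):
    """Split one tab-separated line into an OverviewEntry."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 8:
        raise ValueError("expected at least 8 tab-separated fields")
    return OverviewEntry(
        number=int(parts[0]),
        subject=parts[1],
        author=parts[2],
        date=parts[3],
        message_id=parts[4],
        references=parts[5].split(),
        byte_count=int(parts[6] or 0),   # tolerate an empty count
        line_count=int(parts[7] or 0),
        extra=parts[8:],
    )
```

Since the file is plain text, a reader on any architecture can do
this with nothing more than a split on tabs, which is the point of
the design.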
A library to read and thread the overview files is provided.
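As a rough idea of what on-the-fly threading by References: can look
like, here is a small Python sketch. It is not the algorithm used by
the provided library, which the paper does not describe; the function
name thread and the rule "attach each article to its nearest ancestor
still present in the group" are assumptions made for illustration.

```python
from collections import defaultdict


def thread(articles):
    """Group articles into threads using their References lists.

    `articles` is a list of (message_id, references) pairs, where
    `references` lists ancestor Message-IDs, oldest first.
    Returns (roots, children): thread roots in input order, and a
    dict mapping a parent Message-ID to its direct replies.
    """
    known = {mid for mid, _ in articles}
    children = defaultdict(list)
    roots = []
    for mid, refs in articles:
        # Scan from the newest reference back, so an expired parent
        # falls through to the next ancestor that is still present.
        parent = next(
            (r for r in reversed(refs) if r in known and r != mid), None
        )
        if parent is None:
            roots.append(mid)
        else:
            children[parent].append(mid)
    return roots, dict(children)
```

This is a single pass over data already in the database, so nothing
about the threads needs to be stored or updated when articles arrive
or expire.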
Comparison of the Old and New Schemes
nn and trn have successfully had their database-reading
routines replaced by calls on the common database reader
library; most of the work here was emulating the peculiar
assumptions each reader made about the work done by its
database maintainer. Somewhat less work was understanding
the interface presented by the database-reading routines to
the rest of the reader. The new versions of these readers
appear to consume somewhat more memory, but this may be
curable by someone more knowledgeable about the readers'
internals, and seems a small price to pay for the ability to
use a common database. vnews has had thread-following added; the
work here was primarily understanding the workings of vnews.
It seems that the new database and its access library
are sufficient to meet the needs of modern newsreaders. The
new database has no byte-ordering problems and should be
easier to access over networks than the old databases. The
new database is updated after each relaynews invocation in
newsrun, so articles are available to users quickly, and the
incremental database updates seem to be cheap. The new
database format is extensible, so future demands should not
strain the database format.
The old private databases were generally not accessible
via NNTP (unless one added the trn XTHREAD modifications to
one's NNTP server), so the databases were generally exported
via NFS. This seems like a sensible way to proceed and
simply exporting /usr/spool/news via NFS (read-only if you
like) will export the new database too. However, it is
possible that the NNTP v2 or NNRP committees may make the new
database accessible via NNTP or NNRP.
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl at clw.cs.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5