ietf-nntp Overview database

Charles Lindsey chl at clw.cs.man.ac.uk
Tue Mar 25 06:11:11 PST 2003


In <yl7kaoidrc.fsf at windlord.stanford.edu> Russ Allbery <rra at stanford.edu> writes:

>It looks like I wasn't remembering correctly.  Charles, do you know if C
>News comes with anything?

All I find in CNews is the following:


     The Design of a Common Newsgroup Overview Database
                      for Newsreaders


                       Geoff Collyer

                    Software Tool & Die


                          ABSTRACT

          Every new newsreader seems to come with a
     requirement for a private database of tens of
     megabytes of article headers.  Some of these database
     maintenance programs are inordinately costly to run.
     We present the design and rationale of a newsreader
     database that is sharable by multiple newsreaders and
     relatively cheap to maintain.



Background

     Two of the most popular newsreaders, nn and trn, have
been around for several years and each has its own private
binary database of article headers (and other information),
which are intended to avoid the considerable expense of
opening all the articles in a newsgroup in order to present
a menu of choices to the user.  nn's database maintainer,
nnmaster, has been getting faster over the years and appears
not to be much of a load on the system.  trn's database
maintainer, mthreads, manages to consume vast quantities of
both CPU and disk bandwidth.  Neither database format is
documented, even by comments in the source code of its
maintenance program (and trn's is truly bizarre).  Since the
maintenance programs tended to be slow and to run
asynchronously with the rest of the news transport, new
articles tended to be unavailable to users of these
newsreaders for some time after arrival.  The binary nature
of the database files means that access over networks
involves byte-swapping and dealing with the differing sizes
of various data types across machine architectures.

     To add to this discouraging scenario, new newsreaders,
such as tin and tass, have been appearing more recently,
requiring their own private databases.  Clearly something
had to be done before private databases dwarfed the actual
news spool and their maintenance programs consumed most of
the resources of their host systems.





		       March 25, 2003





The New Scheme

     relaynews (the C News analogue of B News's rnews) has
the article headers in hand during processing of that
article, so having it simply write a stream of all the
article headers onto the end of a file is sufficient to make
that information available cheaply and without making policy
decisions about which headers to omit.  Writing a little
program that massages that stream into a more compact
format, and updates the common database, completes the
updating of the database.  A nightly expire that deletes
obsolete database entries completes database maintenance.
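
To make the "little program" concrete, here is a minimal sketch in
Python of turning one article's header block into a single compact
tab-separated line.  It is not the C News code.  The paper does not
list which fields are kept or in what order; the order below (article
number, Subject, From, Date, Message-ID, References, byte count,
Lines) is an assumption taken from the overview format as NNTP
servers later exposed it, and parse_headers and overview_line are
hypothetical names.

```python
# Assumed field order; the paper itself does not specify one.
HEADER_FIELDS = ["Subject", "From", "Date", "Message-ID", "References"]


def parse_headers(block):
    """Parse one article's header block into a dict keyed by the
    lowercased header name.  Continuation lines are folded in."""
    headers = {}
    name = None
    for line in block.splitlines():
        if line[:1] in (" ", "\t") and name is not None:
            # Continuation line: append it to the previous header's value.
            headers[name] += " " + line.strip()
        elif ":" in line:
            name, value = line.split(":", 1)
            name = name.strip().lower()
            headers[name] = value.strip()
    return headers


def overview_line(number, headers, nbytes):
    """Build one tab-separated line for article `number`.
    `nbytes` is the article size, supplied by the caller."""
    values = [str(number)]
    for field in HEADER_FIELDS:
        value = headers.get(field.lower(), "")
        # Collapse whitespace runs (including tabs) to single spaces,
        # since a literal tab would break the field separation.
        values.append(" ".join(value.split()))
    values.append(str(nbytes))
    values.append(headers.get("lines", ""))
    return "\t".join(values)
```

Appending the result of overview_line to the group's database file
after each batch of articles would correspond to the incremental
update described above; missing headers simply yield empty fields.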

     The database itself consists of a text file in each
news spool directory with a fixed name (.overview).  Each
line of such a file consists of commonly-needed header
fields, separated by tabs.  There is provision for
extensions beyond the commonly-needed set, and these require
only cook-book changes to the program that massages the
header stream.  Experimentation suggests that on-the-fly
threading by References: headers is cheap enough that there
is no benefit to storing threading information in the
database, thereby avoiding the costly updating of said
threading information.
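
As an illustration of how little is needed to read such a file, here
is a sketch in Python that splits one line into named fields.  The
field order is an assumption (article number, Subject, From, Date,
Message-ID, References, byte count, line count, then any extension
fields), following the overview format as later used by NNTP servers;
the paper does not spell it out.  Overview and parse_overview_line
are hypothetical names.

```python
from typing import List, NamedTuple


class Overview(NamedTuple):
    number: int
    subject: str
    sender: str
    date: str
    message_id: str
    references: List[str]
    nbytes: int
    lines: int
    extra: List[str]   # extension fields beyond the assumed common set


def parse_overview_line(line):
    """Split one tab-separated overview line into an Overview record."""
    parts = line.rstrip("\n").split("\t")
    # Pad short lines so that missing trailing fields read as empty.
    parts += [""] * (8 - len(parts))
    number, subject, sender, date, msgid, refs, nbytes, nlines = parts[:8]
    return Overview(
        number=int(number),
        subject=subject,
        sender=sender,
        date=date,
        message_id=msgid,
        references=refs.split(),   # message-ids separated by whitespace
        nbytes=int(nbytes or 0),
        lines=int(nlines or 0),
        extra=parts[8:],
    )
```

Because every field is plain text, the same code works unchanged on
any architecture; unknown extension fields are kept as-is in `extra`
rather than rejected.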

     A library to read and thread the overview files is
provided.
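
The following Python sketch shows one way on-the-fly threading by
References: can be done; it is not the algorithm of the C News
library, whose interface is not described here.  It assumes each
article is given as a (message-id, list of referenced message-ids)
pair, with the references ordered oldest first, and it attaches each
article to the nearest ancestor that is actually present in the
group.  The function name thread is hypothetical.

```python
from collections import defaultdict


def thread(articles):
    """Group articles into threads.

    `articles` is an iterable of (message_id, references) pairs.
    Returns (roots, children): `roots` lists the message-ids that
    start a thread, and `children` maps a message-id to its replies.
    """
    articles = list(articles)
    known = {mid for mid, _ in articles}
    children = defaultdict(list)
    roots = []
    for mid, refs in articles:
        # Walk the references newest first and stop at the first one
        # that is present in this group; that is the parent.
        parent = next(
            (r for r in reversed(refs) if r in known and r != mid), None
        )
        if parent is None:
            roots.append(mid)          # no ancestor here: thread root
        else:
            children[parent].append(mid)
    return roots, dict(children)
```

This is one set construction plus one pass over the articles, which
is consistent with the observation that threading is cheap enough to
do at read time.  An article whose ancestors have all expired simply
becomes a root.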

Comparison of the Old and New Schemes

     nn and trn have successfully had their database-reading
routines replaced by calls on the common database reader
library; most of the work here was emulating the peculiar
assumptions each reader made about the work done by its
database maintainer.  Somewhat less work was understanding
the interface presented by the database-reading routines to
the rest of the reader.  The new versions of these readers
appear to consume somewhat more memory, but this may be
curable by someone more knowledgeable about the readers'
internals, and seems a small price to pay for the ability to
use a common database.  vnews has had thread-following
added; the work here was primarily understanding the
workings of vnews.

     It seems that the new database and its access library
are sufficient to meet the needs of modern newsreaders.  The
new database has no byte-ordering problems and should be
easier to access over networks than the old databases.  The
new database is updated after each relaynews invocation in
newsrun, so articles are available to users quickly, and the
incremental database updates seem to be cheap.  The new
database format is extensible, so future demands should not
strain the database format.

     The old private databases were generally not accessible
via NNTP (unless one added the trn XTHREAD modifications to
one's NNTP server), so the databases were generally exported
via NFS.  This seems like a sensible way to proceed and
simply exporting /usr/spool/news via NFS (read-only if you
like) will export the new database too.  However, it is
possible that the NNTP v2 or NNRP committees may make the
new database accessible via NNTP or NNRP.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl at clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5


