format: v1

name: control-archive
maintainer: Russ Allbery <eagle@eyrie.org>
version: 1.8.0
synopsis: processing and archiving of Netnews control messages

license:
  name: Expat
  notices: |
    This product includes software developed by UUNET Technologies, Inc.
copyrights:
  - holder: Russ Allbery <eagle@eyrie.org>
    years: 2002-2004, 2007-2014, 2016-2018
  - holder: Marco d'Itri
    years: '2001'
  - holder: UUNET Technologies, Inc.
    years: '1996'

build:
  install: false
  type: make
distribution:
  section: usenet
  tarname: control-archive
  version: control-archive
support:
  email: eagle@eyrie.org
  extra: |
    Configuration updates should be sent to usenet-config@isc.org.
  github: rra/control-archive
  web: https://www.eyrie.org/~eagle/software/control-archive/
vcs:
  browse: https://git.eyrie.org/?p=usenet/control-archive.git
  github: rra/control-archive
  type: Git
  url: https://git.eyrie.org/git/usenet/control-archive.git

quote:
  author: Gene Spafford
  text: |
    Usenet is like a herd of performing elephants with diarrhea — massive,
    difficult to redirect, awe-inspiring, entertaining, and a source of
    mind-boggling amounts of excrement when you least expect it.
docs:
  user:
    - name: control-summary
      title: control-summary manual page
    - name: export-control
      title: export-control manual page
    - name: generate-files
      title: generate-files manual page
    - name: process-control
      title: process-control manual page
    - name: update-control
      title: update-control manual page

blurb: |
  This software generates an INN control.ctl configuration file from
  hierarchy configuration fragments, verifies control messages using GnuPG
  where possible, processes new control messages to update a newsgroup list,
  archives new control messages, and exports the list of newsgroups in a
  format suitable for synchronizing the newsgroup list of a Netnews news
  server.  It is the software that maintains the control message and
  newsgroup lists available from ftp.isc.org.

description: |
  This package contains three major components:

  * All of the configuration used to generate a `control.ctl` file for INN
    and the `PGPKEYS` and `README.html` files distributed with pgpcontrol,
    along with the script to generate those files.

  * Software to process control messages, verify them against that
    authorization information, and maintain a control message archive and
    list of active newsgroups.  Software is also included to generate
    reports of recent changes to the list of active newsgroups.

  * The documentation files included in the control message archive and
    newsgroup lists on ftp.isc.org.

  Manual changes to the canonical newsgroup list are supported in a way that
  generates the same log messages and uses the same locking structure so
  that they can co-exist with automated changes and be included in the same
  reports.

  This is the software that generates the [active newsgroup
  lists](ftp://ftp.isc.org/pub/usenet/CONFIG/) and [control message
  archive](ftp://ftp.isc.org/pub/usenet/control/) hosted on ftp.isc.org, and
  the source of the `control.ctl` file provided with INN.

  For a web presentation of the information recorded here, as well as other
  useful information about Usenet hierarchies, please see the [list of
  Usenet managed hierarchies](http://usenet.trigofacile.com/hierarchies/).
    
requirements: |
  Perl 5.6 or later plus the following additional Perl modules are required:

  * Compress::Zlib (included in Perl 5.10 and later)
  * Date::Parse (part of TimeDate)
  * Net::NNTP (included in Perl 5.8 and later)
  * Text::Template

  [gzip](https://www.gnu.org/software/gzip/) and
  [bzip2](http://www.bzip.org/) are required.  Both are generally available
  with current operating systems, possibly as supplemental packages.

  process-control expects to be fed file names and message IDs of control
  messages on standard input and therefore needs to be run from a news
  server or some other source of control messages.  A minimalist news server
  like tinyleaf is suitable for this (I wrote tinyleaf, available as part of
  [INN](https://www.eyrie.org/~eagle/software/inn/), for this purpose).
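
  As a rough preflight check, the following sketch reports which of the
  Perl modules listed above cannot be loaded.  The `check_modules` function
  and the `PERL` override variable are names made up for this example and
  are not part of this package.

  ```sh
      # Hypothetical helper: print each required module that fails to load.
      # PERL may be set to use a different Perl interpreter.
      check_modules() {
          for module in Compress::Zlib Date::Parse Net::NNTP Text::Template
          do
              if ! "${PERL:-perl}" -M"$module" -e 1 >/dev/null 2>&1; then
                  echo "missing: $module"
              fi
          done
      }

      check_modules
  ```

  No output means that every module loaded successfully.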

sections:
  - title: Versioning
    body: |
      This package uses a three-part version number.  The first number
      will be incremented for major changes, major new functionality,
      incompatible changes to the configuration format (more than just
      adding new keys), or similar disruptive changes.  For lesser
      changes, the second number will be incremented for any change to the
      code or functioning of the software.  A change to the third part of
      the version number indicates a release with changes only to the
      configuration, PGP keys, and documentation files.
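
      As a worked example of this scheme, the sketch below compares two
      version numbers and reports what kind of release the newer one is.
      The `release_kind` function is a name invented for this illustration
      and is not part of this package.

      ```sh
          # Hypothetical helper: classify a release from old and new versions.
          release_kind() {
              old_major=${1%%.*}
              new_major=${2%%.*}
              old_rest=${1#*.}
              new_rest=${2#*.}
              if [ "$old_major" != "$new_major" ]; then
                  echo "major or incompatible change"
              elif [ "${old_rest%%.*}" != "${new_rest%%.*}" ]; then
                  echo "code change"
              else
                  echo "configuration, keys, or documentation only"
              fi
          }

          release_kind 1.7.1 1.8.0    # prints "code change"
      ```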
  - title: Layout
    body: |
      The configuration data is in one file per hierarchy in the `config`
      directory.  Each file has the format specified in FORMAT and is
      designed to be readable by INN's new configuration parser in case
      this can be further automated down the road.  The `config/special`
      directory contains overrides, raw `control.ctl` fragments that
      should be used for particular hierarchies instead of
      automatically-generated entries (usually for special comments).
      Eventually, the format should be extended to handle as many of these
      cases as possible.

      The `keys` directory contains the PGP public keys for every
      hierarchy that has one.  The user IDs on these keys must match the
      signer expected by the configuration data for the corresponding
      hierarchy.

      The `forms` directory contains the basic file structure for the
      three generated files.

      The `scripts` directory contains all the software that generates the
      configuration and documentation files, processes control messages,
      updates the database, creates the newsgroup lists, and generates
      reports.  Most scripts in that directory have POD documentation
      included at the end of the script, viewable by running perldoc on
      the script.

      The `templates` directory contains templates for the
      `control-summary` script.  These are the templates I use myself.
      Other installations should customize them.

      The `docs` directory contains the extra documentation files that are
      distributed from ftp.isc.org in the control message archive and
      newsgroup list directories, plus the DocKnot metadata for this
      package.
  - title: Installation
    body: |
      This software is set up to run from `/srv/control`.  To use a
      different location, edit the paths at the beginning of each of the
      scripts in the `scripts` directory.  By default, copying all the
      files from the distribution into a `/srv/control` directory is
      almost all that's needed.  An install rule is provided to do this.
      To install the software, run:

      ```sh
          make install
      ```

      You will need write access to `/srv/control` or permission to create
      it.

      `process-control` and `generate-files` need a GnuPG keyring
      containing all of the honored hierarchy keys.  To generate this
      keyring, run `make install` or:

      ```sh
          mkdir keyring
          gpg --homedir=keyring --allow-non-selfsigned-uid --import keys/*
      ```

      from the top level of this distribution.  `process-control` also
      expects a `control.ctl` file in `/srv/control/control.ctl`, which
      can be generated from the files included here (after creating the
      keyring as described above) by running `make install` or:

      ```sh
          scripts/generate-files
      ```

      Both of these are done automatically as part of `make install`.
      `process-control` expects `/srv/control/archive` to exist and
      archives control messages there.  It expects `/srv/control/tmp` to
      exist and uses it for temporary files for GnuPG control message
      verification.

      To process incoming control messages, you need to run
      `process-control` on each message.  `process-control` expects to
      receive, on standard input, lines consisting of a path to a file, a
      space, and a message ID.  This input format is designed to work with
      the tinyleaf server that comes with INN 2.5 and later, but it should
      also work as a channel feed from pre-storage-API versions of INN
      (1.x).  It will not work without modification via a channel feed
      from a current version of INN, since it doesn't understand the
      storage API and doesn't know how to retrieve articles by tokens.
      This could be easily added; I just haven't needed it.
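
      For testing outside of tinyleaf, input lines in that format can be
      built by hand.  The sketch below is a minimal example that assumes
      each control message is stored as a separate file with standard news
      headers.  The `control_line` function is a hypothetical helper, not
      something provided by this package.

      ```sh
          # Hypothetical helper: print "<path> <message-id>" for one article.
          # Only the headers (everything before the first blank line) are
          # read, and a Message-ID header folded across lines is not handled.
          control_line() {
              pattern='s/^[Mm]essage-[Ii][Dd]:[[:space:]]*//p'
              id=$(sed -n -e '/^$/q' -e "$pattern" "$1" | head -n 1)
              if [ -n "$id" ]; then
                  printf '%s %s\n' "$1" "$id"
              fi
          }
      ```

      The lines printed for a set of files could then be piped to
      `scripts/process-control` on standard input.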

      If you're using tinyleaf, here is the setup process:

      1. Create a directory that tinyleaf will use to store incoming
         articles temporarily, the archive directory, and the logs
         directory and install the software:

         ```sh
             make install
         ```

      2. Run tinyleaf on some port, configuring it to use that directory
         and to run process-control.  A typical tinyleaf command line
         would be:

         ```sh
             tinyleaf /srv/control/spool /srv/control/scripts/process-control
         ```

         I run tinyleaf using systemd, but any inetd implementation should
         work equally well.

      3. Set up a news feed to the system running tinyleaf that sends
         control messages of interest.  You should be careful not to send
         cancel control messages or you'll get a ton of junk in your logs.
         The INN newsfeeds entry I use is:

         ```
             isc-control:control,control.*,!control.cancel:Tf,Wnm:
         ```

         combined with nntpsend to send the articles.

      That should be all there is to it.  Watch the logs directory to see
      what happens for incoming messages.

      `scripts/process-control` just maintains a database file.  To export
      that data in a format that's useful for other software, run
      `scripts/export-control`.  This expects a `/srv/control/export`
      directory into which it stores active and newsgroups files, a copy
      of the `control.ctl` file, and all of the logs in a `LOGS`
      subdirectory.  This export directory can then be made available on
      the web, copied to another system, or whatever else is appropriate.
      Generally, `scripts/export-control` should be run periodically from
      cron.

      Reports can be generated using `scripts/control-summary`.  This
      script needs configuration before running; see the top of the script
      and its included POD documentation.  There is a sample template in
      the `templates` directory, and `scripts/weekly-report` shows a
      sample cron job for sending out a regular report.
  - title: Bootstrapping
    body: |
      This package is intended to provide all of the tools, configuration,
      and information required to duplicate the ftp.isc.org control
      message archive and newsgroup list service if you so desire.  To set
      up a similar service based on that service, however, you will also
      want to bootstrap from the existing data.  Here is the procedure for
      that:

      1. Be sure that you're starting from the latest software and set of
         configuration files.  I will generally try to make a new release
         after committing a batch of changes, but I may not make a new
         release after every change.  See the sections below for
         information about the Git repository in which this package is
         maintained.  You can always clone that repository to get the
         latest configuration (and then merge or cherry-pick changes from
         my repository into your repository as you desire).

      2. Download the current newsgroup list from:

             ftp://ftp.isc.org/pub/usenet/CONFIG/newsgroups.bz2

         and then bootstrap the database from it:

         ```sh
             bzip2 -dc newsgroups.bz2 | scripts/update-control bulkload
         ```

      3. If you want the log information so that your reports will include
         changes made in the ftp.isc.org archive before you created your
         own, copy the contents of
         ftp://ftp.isc.org/pub/usenet/CONFIG/LOGS/ into
         `/srv/control/logs`.

      4. If you want to start with the existing control message
         repository, download the contents of
         ftp://ftp.isc.org/pub/usenet/control/ into
         `/srv/control/archive`.  You can do this using a recursive
         download tool that understands FTP, such as wget, but please use
         the options that add delays and don't hammer the server to death.
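
      For step 4 above, a polite recursive download might look like the
      following sketch.  The delay and rate limit are arbitrary values
      chosen for illustration, and the `fetch_archive` function and `WGET`
      variable are hypothetical names.  Check the options against your
      version of wget, which must support FTP.

      ```sh
          # Hypothetical helper: mirror the control message archive slowly.
          # The optional argument is the target directory.  WGET may be set
          # to "echo" to print the arguments instead of downloading.
          fetch_archive() {
              "${WGET:-wget}" --recursive --no-parent --no-host-directories \
                  --cut-dirs=3 --wait=2 --random-wait --limit-rate=200k \
                  --directory-prefix="${1:-/srv/control/archive}" \
                  ftp://ftp.isc.org/pub/usenet/control/
          }
      ```

      The `--cut-dirs=3` option drops the leading `pub/usenet/control`
      path components so that files land directly under the target
      directory.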

      After finishing those steps, you will have a copy of the ftp.isc.org
      archive and can start processing control messages, possibly with
      different configuration choices.  You can generate the files that
      are found in ftp://ftp.isc.org/pub/usenet/CONFIG/ by running
      `scripts/export-control` as described above.
  - title: Maintenance
    body: |
      To add a new hierarchy, add a configuration fragment in the `config`
      directory named after the hierarchy, following the format of the
      existing files, and run `scripts/generate-files` to create a new
      `control.ctl` file.  See the documentation in
      `scripts/generate-files` for details about the supported
      configuration keys.

      If the hierarchy uses PGP-signed control messages, also put the PGP
      key into the `keys` directory in a file named after the hierarchy.
      Then, run:

      ```sh
          gpg --homedir=keyring --import keys/<hierarchy>
      ```

      to add the new key to the working keyring.

      The first user ID on the key must match the signer expected by the
      configuration data for the corresponding hierarchy.  If a hierarchy
      administrator sets that up wrong (usually by putting additional user
      IDs on the key), this can be corrected by importing the key into a
      keyring with GnuPG, using `gpg --edit-key` to remove the offending
      user ID, and exporting the key again with `gpg --export --armor`.
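
      To check which user ID GnuPG lists first for a key, one option is to
      parse its machine-readable `--with-colons` listing.  The following is
      a sketch only: `first_uid` is a hypothetical helper, and it assumes
      that the listing on standard input contains a single key.

      ```sh
          # Hypothetical helper: print the first user ID in a colon listing.
          # GnuPG 1.x puts the primary user ID on the pub record, while GnuPG
          # 2.x uses separate uid records, so accept either.  Colons inside
          # a user ID appear escaped as \x3a and are not decoded here.
          first_uid() {
              awk -F: '($1 == "pub" || $1 == "uid") && $10 != "" {
                  print $10
                  exit
              }'
          }

          # Example use, with <key-id> replaced by the ID of the new key:
          #   gpg --homedir=keyring --with-colons --list-keys <key-id> \
          #       | first_uid
      ```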

      When adding a new hierarchy, it's often useful to bootstrap the
      newsgroup list by importing the current checkgroups.  To do this,
      obtain the checkgroups as a text file (containing only the groups
      without any news headers) and run:

      ```sh
          scripts/update-control checkgroups <hierarchy> < <checkgroups>
      ```

      where `<hierarchy>` is the hierarchy the checkgroups is for and
      `<checkgroups>` is the path to the checkgroups file.
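
      If the checkgroups is only available as a saved control message, the
      news headers can be stripped first.  This sketch assumes the usual
      article layout, in which a blank line separates the headers from the
      body.  `strip_headers` is a hypothetical helper name.

      ```sh
          # Hypothetical helper: print only the body of an article read from
          # standard input by deleting everything through the first blank
          # line.
          strip_headers() {
              sed '1,/^$/d'
          }
      ```

      The output of `strip_headers` can be saved to a file and then passed
      to `scripts/update-control checkgroups` as shown above.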