Bif Design
This document attempts to describe why and how the Bif project management tool does what it does.
0.1.0_10 (2014-04-17)
The development of Distributed Version Control System (DVCS) software enabled developers to discover the joys of working with a fully-featured, always available local repository. A number of benefits related to productivity (working when not connected) and efficiency (fast executing commands) appeared. Shortly afterwards developers re-discovered the pain of still having to use centralised, web-based bug/issue tracking systems, which muted the joy somewhat.
The first open attempts at a Distributed Project Management System (DPMS) such as Bugs Everywhere, ScmBug, DisTract, DITrack, ticgit and ditz, were implemented on top of a DVCS. With a bit of hindsight, one can theorize that their failure to gain real traction was in part due to not understanding that DVCS and DPMS systems do not have the same information models. With time, most projects have also realized that there are users other than developers who need to interact with such a tool.
Later, DPMS systems were built on different models, but offered a non-UNIXy implementation (Fossil) or suffered documentation and implementation issues (Simple Defects - SD). As of late 2011 the Debian BTS (debbugs), SD and Launchpad (Canonical) appear to be the only systems that provide some kind of interproject cooperation (one issue with a different status in each project), but neither debbugs nor Launchpad is really distributed.
It was in this context that Bif was started with the vague aim of doing something better. While I can say I was trying to learn from earlier efforts, I had nothing like the clarity of mind at the beginning that the above paragraphs imply. Like several others I actually started out building Bif on top of Git. This is no surprise because like everyone else I found the easy replication and plumbing toolset attractive. Experience finally taught me, too, that the Git model was the wrong one for this application, so after briefly playing with SQLite + Git I settled on SQLite alone with a purpose-built schema and a completely new synchronisation protocol.
Design Goals
The user manual says that Bif aims to be:
"... a distributed communication system that carries both
conversations and structured meta-data."
To put it another way, the goal of Bif is to provide users with a fully functional local issue tracker that interacts with remote instances as needed.
Note that some of the distributed requirements for a *project* tracking system are actually quite different to the distributed requirements for a *software* tracking system. A DVCS focuses on managing a multi-tentacled, pick-from-anywhere, many-versions-at-a-time set of changes to files. A DPMS focuses on tracking items of work at the organisational or personal level.
- Command-line Interface (At Least)
Context switching from the shell to the browser is costly. With good engineering the CLI is in any case a thin layer over the database, so adding other interfaces later is relatively easy.
The CLI should be consistent and broadly similar to other CLI programs. Oh yeah, I almost forgot. The CLI should be responsive enough to be almost instantaneous. There will be no Moose in this CLI tool. Even though I upgraded my laptop recently with an SSD I still can't believe how much it affects the startup time in things like App::TimeTracker. If I release something like that the thoughts of the first (non-Perl-aware) user will be "Do we need to rewrite this in Go?"
- Powerful Querying
For management-style reports, for custom queries, for dealing with the whole interproject cooperation requirement. Users need to quickly see summaries of the current status as well as the change history.
- Distributed/Offline Operation
As much as possible, the tool should work everywhere that you can. In effect that means data replication.
- Fast Delta Synchronisation
There is no way that a sequential scan and check for matching rows in databases should be done each time a user wants to synchronise.
A RESTful object API just doesn't seem suitable either for working with large collections of objects like bugs or projects, and how does one not lose all the benefits associated with database transactions?
A project history is not a hierarchical tree à la Git trees. Updates can be merged without needing to reparent anything.
- Universally Unique Identifiers
Necessary for exchanging updates between systems that have their own requirements for locally unique identifiers.
- Locally Unique Integers
We cannot subject users to identifiers that look like abb382f3c.
- Interproject Cooperation
If we are going to distribute things, we should do it properly. That means an issue can be tracked by multiple projects, and that each project could consider it to have a different status, and that each project can see the status in the other projects.
- No Universal Status
No reason that every project in the whole Bif ecosystem must use the same status types.
- Extreme Documentation
Aside from functionality, for this to be successful it has to be useable, approachable, and understandable. In many ways this comes down to the quality of the documentation.
Application Architecture
Bif is a Perl wrapper around an SQLite database. Commands are dispatched to Perl modules under the App::bif::* namespace by OptArgs. Execution happens like this:
1. The shell runs the script file, which due to the #! hashbang line results in perl being executed on that file.
2. The script loads the Perl module OptArgs and calls the OptArgs::dispatch function against the App::bif namespace.
3. App::bif defines all of the subcommands, their arguments and options, which the dispatch function uses to dispatch to the appropriate App::bif::* module.
4. The App::bif::sub::command module method is called.
5. Sub-command classes use functions from App::bif::Context to discover the location of the repository, the user configuration, access the database, render output, generate errors and so on.
6. The program ends.
Database access
Each command uses either Bif::DB or Bif::DB::RW (based on DBIx::ThinSQL, DBI, and DBD::SQLite) to access the database.
As much as possible is done inside the database, with the goal of ensuring data consistency regardless of what application is entering the data. This prepares for the day when some tool other than Bif makes modifications.
Fake SQLite function calls
SQLite does not have a built-in procedural programming language with a function calling interface. We cheat by defining BEFORE INSERT triggers on normal tables that do their required work and then cancel the insert with a SELECT RAISE(IGNORE).
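The trick can be tried outside of Bif. The following is a minimal sketch in Python (not Bif's Perl) using the standard sqlite3 module. The counters and func_add_to_counter tables are hypothetical names invented for this illustration and are not part of the Bif schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL);

-- Hypothetical "function" table: its columns are the call arguments.
CREATE TABLE func_add_to_counter (name TEXT, amount INTEGER);

CREATE TRIGGER func_add_to_counter_bi
BEFORE INSERT ON func_add_to_counter
FOR EACH ROW
BEGIN
    -- The body of the "function": make sure the counter exists, then add to it.
    INSERT OR IGNORE INTO counters (name, value) VALUES (NEW.name, 0);
    UPDATE counters SET value = value + NEW.amount WHERE name = NEW.name;
    -- Cancel the insert itself; the work done above is kept.
    SELECT RAISE(IGNORE);
END;
""")

# "Calling" the function is just an INSERT into the function table.
call = "INSERT INTO func_add_to_counter (name, amount) VALUES (?, ?)"
db.execute(call, ("issues", 2))
db.execute(call, ("issues", 3))

counter_value = db.execute(
    "SELECT value FROM counters WHERE name = 'issues'"
).fetchone()[0]
func_rows = db.execute("SELECT COUNT(*) FROM func_add_to_counter").fetchone()[0]
print(counter_value, func_rows)
```

The function table never holds any rows, so it costs nothing in storage and any client that can run an INSERT can call the function.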
Transport & Synchronisation
Bif uses ssh to run the bifsync program on remote hosts when exchanging updates with a hub, or else calls bifsync directly when exchanging updates with a local repository. Regardless, it ends up being Bif::Client that is talking to Bif::Server, although most of the functionality is in the parent Bif::Role::Sync class.
Client/Server is a bit of a misnomer, as the protocol is actually about two peers exchanging updates as equals, and not about a user requesting a resource in the way that HTTP verbs imply.
Data Model
The schema has tables for the current state of topics, tables for the updates to topics, and tables that track meta-data (Merkle trees).
Bif is not implementing a distributed database, or at least not in the classical sense where all nodes need to agree on what the "current" or "latest" values for objects are, based on some kind of consensus achieved in real time. What Bif does is simply distribute *updates*. The state of a particular node is the result of the updates it has, and it doesn't care what the other nodes are doing, or when it will get missing updates. I.e. there is no consensus. This works because the users do not need a real-time global view of projects, in the same way they don't need real-time emails.
Updates, or Changesets
A Bif update can actually be composed of many operations in the database, but everything relates to a single row in the updates table. The updates table has an integer primary key which is used for local operations and foreign key targets. It also has a 40 character Universally Unique ID (UUID).
The UUIDs of updates (same for UUIDs of topics) are SHA1 hashes calculated from the content of the update (or topic). This provides a built-in checksum mechanism that is useful during synchronisation to indicate a full and accurate transfer, and potentially simplifies signing updates in the future. The main purpose of the UUID, however, is for looking up local IDs when inserting updates with foreign key requirements.
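As an illustration of the content-hash idea, here is a minimal Python sketch. The field names and the JSON serialisation are assumptions made for this example only; this document does not specify how Bif actually serialises an update before hashing it.

```python
import hashlib
import json


def content_uuid(fields):
    """Return the 40-character SHA1 hex digest of a record's content.

    Assumption: the record is serialised as JSON with sorted keys, so that
    the same content always yields the same bytes.
    """
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Hypothetical update content.
update = {
    "author": "alice",
    "time": 1397692800,
    "timezone": "+0200",
    "message": "Fix typo in docs",
}

update_uuid = content_uuid(update)

# The receiving side recomputes the hash from what it was sent.
recomputed = content_uuid(dict(update))

# Any change in transit, however small, produces a different UUID.
tampered = content_uuid({**update, "message": "Fix typo in doc"})

print(len(update_uuid), update_uuid == recomputed, update_uuid == tampered)
```

Because the identifier is derived from the content, two repositories that hold the same update agree on its UUID without ever having to coordinate.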
Operations happen like this:
1. Create a row in the updates table that identifies the author, time, timezone and message.
2. Add the changes to the *_updates tables for each topic.
3. Insert a row into func_merge_updates, which calculates the hashes of everything.
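A rough Python sketch of that sequence follows. It rests on several assumptions: the column names and the issue_updates table are invented for this example, and the hashing step is done in Python by a hypothetical record_update helper, whereas Bif does it inside the database via func_merge_updates. The point is only to show how the local integer ID and the content-derived UUID relate.

```python
import hashlib
import json
import sqlite3

# Hypothetical, heavily simplified schema.
SCHEMA = """
CREATE TABLE updates (
    id INTEGER PRIMARY KEY,   -- local integer ID, differs per repository
    uuid TEXT UNIQUE,         -- 40-character SHA1, filled in at step 3
    author TEXT, mtime INTEGER, mtimetz TEXT, message TEXT
);
CREATE TABLE issue_updates (  -- stands in for one of the *_updates tables
    update_id INTEGER NOT NULL REFERENCES updates(id),
    title TEXT, status TEXT
);
"""


def new_repo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def record_update(db, meta, issue_change):
    """Run the three steps in one transaction; return (local id, uuid)."""
    with db:
        # Step 1: the row that identifies author, time, timezone, message.
        cur = db.execute(
            "INSERT INTO updates (author, mtime, mtimetz, message) "
            "VALUES (?, ?, ?, ?)",
            (meta["author"], meta["mtime"], meta["mtimetz"], meta["message"]),
        )
        update_id = cur.lastrowid
        # Step 2: the topic-specific changes, linked by the local ID.
        db.execute(
            "INSERT INTO issue_updates (update_id, title, status) "
            "VALUES (?, ?, ?)",
            (update_id, issue_change["title"], issue_change["status"]),
        )
        # Step 3: hash the content (never the local ID) to get the UUID.
        content = json.dumps({"meta": meta, "issue": issue_change}, sort_keys=True)
        uuid = hashlib.sha1(content.encode("utf-8")).hexdigest()
        db.execute("UPDATE updates SET uuid = ? WHERE id = ?", (uuid, update_id))
    return update_id, uuid


repo_a, repo_b = new_repo(), new_repo()

# repo_b already holds an unrelated update, so its local IDs are offset.
record_update(
    repo_b,
    {"author": "bob", "mtime": 1397600000, "mtimetz": "+0000", "message": "Earlier work"},
    {"title": "Docs missing", "status": "open"},
)

meta = {"author": "alice", "mtime": 1397692800, "mtimetz": "+0200", "message": "Close it"}
change = {"title": "Crash on start", "status": "closed"}
id_a, uuid_a = record_update(repo_a, meta, change)
id_b, uuid_b = record_update(repo_b, meta, change)

# Looking up the local ID in repo_b from the UUID computed in repo_a.
local_id = repo_b.execute(
    "SELECT id FROM updates WHERE uuid = ?", (uuid_a,)
).fetchone()[0]
print(id_a, id_b, uuid_a == uuid_b, local_id)
```

The same update ends up with a different local ID in each repository but the same UUID in both, which is what allows foreign key references to be resolved on arrival.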
Updates are immutable, and they cannot easily be deleted from everywhere, at least for the moment. Updates to an update are a possibility under consideration.
Network operations
A network operation basically just copies everything relating to a project from one repository to another.
Merkle tree synchronisation
There is a Merkle tree associated with every project, representing all of the changes contained therein.
A sync operation compares the tree from two repositories top-down, saving the updates missing from each one. The updates are then replayed in the correct order.
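The top-down comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not Bif's actual tree layout or wire protocol: updates are grouped by the leading hex digits of their UUID, the tree is two levels deep (LEAF_DEPTH), and the build_tree and diff functions are hypothetical names. Replaying the collected updates in the correct order is not shown.

```python
import hashlib

HEX = "0123456789abcdef"
LEAF_DEPTH = 2  # assumption: compare leaf sets directly below two-digit prefixes


def sha1_hex(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_tree(uuids):
    """Map each non-empty UUID prefix (root is "") to a summary hash."""
    tree = {}

    def visit(prefix):
        members = sorted(u for u in uuids if u.startswith(prefix))
        if not members:
            return None
        if len(prefix) == LEAF_DEPTH:
            digest = sha1_hex("".join(members))
        else:
            # An inner node hashes the hashes of its children.
            children = [(d, visit(prefix + d)) for d in HEX]
            digest = sha1_hex("".join(d + h for d, h in children if h))
        tree[prefix] = digest
        return digest

    visit("")
    return tree


def diff(tree_a, uuids_a, tree_b, uuids_b, prefix=""):
    """Return (updates missing from a, updates missing from b) below prefix."""
    if tree_a.get(prefix) == tree_b.get(prefix):
        return set(), set()  # identical subtrees: nothing to exchange
    if len(prefix) == LEAF_DEPTH or prefix not in tree_a or prefix not in tree_b:
        in_a = {u for u in uuids_a if u.startswith(prefix)}
        in_b = {u for u in uuids_b if u.startswith(prefix)}
        return in_b - in_a, in_a - in_b
    need_a, need_b = set(), set()
    for digit in HEX:  # descend only because the hashes at this level differ
        more_a, more_b = diff(tree_a, uuids_a, tree_b, uuids_b, prefix + digit)
        need_a |= more_a
        need_b |= more_b
    return need_a, need_b


# Two repositories that share most updates but each have some of their own.
shared = {sha1_hex(f"update {n}") for n in range(20)}
only_a = {sha1_hex("written offline on A")}
only_b = {sha1_hex("written offline on B"), sha1_hex("another from B")}
repo_a = shared | only_a
repo_b = shared | only_b

need_a, need_b = diff(build_tree(repo_a), repo_a, build_tree(repo_b), repo_b)
repo_a |= need_a
repo_b |= need_b
print(len(need_a), len(need_b), build_tree(repo_a)[""] == build_tree(repo_b)[""])
```

Subtrees whose hashes match are skipped entirely, so the cost of a sync depends on how much differs rather than on the total size of the project history.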
Mark Lawrence <>
Copyright & License
Copyright 2013-2014 Mark Lawrence <>
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.