NAME
Git::FastExport::Stitch - Stitch together multiple git fast-export streams
VERSION
version 0.09
SYNOPSIS
use Git;
use Git::FastExport;
use Git::FastExport::Stitch;

# create a new stitch object
my $export = Git::FastExport::Stitch->new();

# stitch in several git fast-export streams
# a git directory
$export->stitch( A => 'A' );
# a Git repository object
$export->stitch( Git->repository( Directory => 'B' ) => 'B' );
# a Git::FastExport object
$export->stitch( Git::FastExport->new('C') => 'C' );

# output the stitched stream
while ( my $block = $export->next_block() ) {
    print $block->as_string();
}
DESCRIPTION
Git::FastExport::Stitch is a module that "stitches" together several git fast-export streams. This module is the core of the git-stitch-repo utility.
Git::FastExport::Stitch objects can be used as Git::FastExport objects, since they support the same interface for the next_block() method.
METHODS
Git::FastExport::Stitch supports the following methods:
- new( \%options, [ ... ] )
Create a new Git::FastExport::Stitch object.
The options hash defines options that will be used during the creation of the stitched repository.
The select option defines the selection algorithm to be used when the last alien child algorithm reaches a branch point. Valid values are: first, last and random. The default value is last. See "STITCHING ALGORITHM" for details about what these options really mean.
The remaining parameters (if any) are taken to be parameters (passed as pairs) to the stitch() method.
- stitch( $repo, $dir )
Add the given $repo to the list of repositories to stitch in. $repo can be either a directory, a Git object (both will be used to instantiate a Git::FastExport object), or directly a Git::FastExport object.
The optional $dir parameter will be used as the relative directory under which the trees of the source repository will be stored in the stitched repository.
- next_block()
Return the next block of the stitched repository, as a Git::FastExport::Block object.
Return nothing at the end of the stream.
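The $dir parameter can be pictured as a path rewrite applied to the file commands of each commit. The sketch below is only an illustration under stated assumptions: prefix_path() is a hypothetical helper, it handles only the "M" (filemodify) and "D" (filedelete) commands of the git fast-import format, and it works on plain strings rather than on Git::FastExport::Block objects.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Git::FastExport::Stitch): put the path of a
# fast-export "M" (filemodify) or "D" (filedelete) line under $dir.
# Assumes one command per string, without the trailing newline, and a $dir
# that needs no C-style quoting itself.
sub prefix_path {
    my ( $line, $dir ) = @_;
    return $line unless defined $dir && length $dir;

    if ( my ( $head, $path ) = $line =~ /^(M \S+ \S+ |D )(.*)$/ ) {
        # quoted paths keep their quotes, with the prefix inside them
        $path = $path =~ /^"(.*)"$/ ? qq{"$dir/$1"} : "$dir/$path";
        return $head . $path;
    }
    return $line;    # every other line is left untouched
}

print prefix_path( 'M 100644 :1 lib/Foo.pm',   'A' ), "\n";    # M 100644 :1 A/lib/Foo.pm
print prefix_path( 'D "caf\303\251.txt"',      'B' ), "\n";    # D "B/caf\303\251.txt"
print prefix_path( 'commit refs/heads/master', 'A' ), "\n";    # unchanged
```

Prefixing inside the quotes keeps a C-style quoted path valid, which is why the quoted and unquoted cases are handled separately.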
STITCHING ALGORITHM
Commit attachment
Git::FastExport::Stitch processes the input commits in --date-order fashion, and builds the new graph by attaching each new commit to another commit of the graph being constructed. It starts from the "original" parents of the node, and tries to follow the graph as far as possible.
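One way to picture "--date-order" processing over several input streams is a k-way merge: each stream is already in order, and the oldest pending commit among all of them is processed next. The following sketch assumes commits are reduced to [ name, timestamp ] pairs; date_order() and the timestamps are hypothetical, the latter chosen only to reproduce the commit order used in the example below.

```perl
use strict;
use warnings;

# Hypothetical sketch: merge several commit lists (each already oldest first)
# into one list ordered by timestamp; ties go to the first repository name.
sub date_order {
    my (%streams) = @_;    # repo => [ [ name, timestamp ], ... ]
    my @repos = sort keys %streams;
    my %pos   = map { ( $_ => 0 ) } @repos;
    my @order;
    while (1) {
        my $best;
        for my $repo (@repos) {
            my $next = $streams{$repo}[ $pos{$repo} ] or next;
            $best = $repo
              if !defined $best
              || $next->[1] < $streams{$best}[ $pos{$best} ][1];
        }
        last unless defined $best;    # every stream is exhausted
        push @order, $streams{$best}[ $pos{$best}++ ][0];
    }
    return @order;
}

# Timestamps are made up, chosen to give the commit order of the example.
my %streams = (
    A => [ [ A1 => 1 ], [ A2 => 3 ], [ A3 => 5 ], [ A4 => 6 ], [ A5 => 9 ], [ A6 => 14 ] ],
    B => [
        [ B1 => 2 ],  [ B2 => 4 ],  [ B3 => 7 ],  [ B4 => 8 ],
        [ B5 => 10 ], [ B6 => 11 ], [ B7 => 12 ], [ B8 => 13 ],
    ],
);
my @order = date_order(%streams);
print "@order\n";    # A1 B1 A2 B2 A3 A4 B3 B4 A5 B5 B6 B7 B8 A6
```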
When a commit has several suitable child commits, it needs to make a selection. There are currently three selection algorithms:
- last
Pick the last child commit, i.e. the most recent one. This is the default.
- first
Pick the first child commit, i.e. the oldest one.
- random
Pick a random child.
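The three selection rules, together with the select option's default and valid values, can be sketched as follows. select_child() is a hypothetical name, not the module's internal API, and the children are assumed to be passed oldest first.

```perl
use strict;
use warnings;

# Hypothetical sketch of the selection rules; @children oldest first.
my %SELECT = (
    first  => sub { $_[0] },            # oldest child
    last   => sub { $_[-1] },           # most recent child
    random => sub { $_[ rand @_ ] },    # any child
);

sub select_child {
    my ( $how, @children ) = @_;
    $how = 'last' unless defined $how;    # the documented default
    my $rule = $SELECT{$how}
      or die "Invalid select value '$how' (valid values: first, last, random)\n";
    return unless @children;
    return $rule->(@children);
}

for my $how (qw(first last random)) {
    printf "%-6s -> %s\n", $how, select_child( $how, qw(A3 A4) );
}
```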
Example
Imagine we have two repositories A and B that we want to stitch into a repository C so that all the files from A are in subdirectory A and all the files from B are in subdirectory B.
Note: in the following ASCII art graphs, horizontal order is chronological.
Repository A:
              ,topic      ,master
          ,-A3------A5--A6
         /         /
    A1--A2------A4'
Branch master points to A6 and branch topic points to A3.
Repository B:
                      ,topic      ,master
          ,-B3------B5------B7--B8
         /                 /
    B1--B2------B4------B6'
Branch master points to B8 and branch topic points to B5.
The resulting repository should preserve chronology, commit relationships and branches as much as possible, while giving the impression that the directories A/ and B/ had lived side by side all along.
Assuming additional timestamps not shown on the above graphs (the commit order is A1, B1, A2, B2, A3, A4, B3, B4, A5, B5, B6, B7, B8, A6), Git::FastExport::Stitch will produce a git-fast-import stream that will create the following history, depending on the value of --select:
- last (default)
                                          ,topic-B
                           ,-B3---------B5----.
                          /                    \       ,master-B
    A1--B1--A2--B2------A4------B4--A5------B6--B7---B8--A6
                  \                /                       `master-A
                   `-A3-----------'
                       `topic-A
- first
                       ,--------B4---------B6-.
                      /                        \       ,master-B
    A1--B1--A2--B2--A3------B3------A5--B5------B7---B8--A6
                  \   `topic-A     /     `topic-B         `master-A
                   `-----A4-------'
- random
In this example, there are only two places where the selection process is triggered, and there are only two items to choose from each time. Therefore the random selection algorithm can produce four different results.
In addition to the results shown above (last+last and first+first), we can also obtain the following two graphs:
first+last:
                      ,topic-A                         ,master-B
    A1--B1--A2--B2--A3--------------A5------B6--B7---B8--A6
                  \                /           /           `master-A
                   `----A4------B4'     B5----'
                          \            /  `topic-B
                           `-B3-------'
last+first:
                                                       ,master-B
    A1--B1--A2--B2------A4------B4----------B6--B7---B8--A6
                  \       \                    /           `master-A
                   \       `-B3-----A5--B5----'
                    \              /     `topic-B
                     A3-----------'
                       `topic-A
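To make the attachment walk concrete, here is a toy model in Perl that replays the example. It is not the module's code: the data structures, the closures and the seeding of B1 on A1 are assumptions. Root commits are not modelled, and rule 4 of the constraints listed below is not enforced explicitly (it happens to hold in this example). Under those assumptions, the model yields the parent lists of the last and first graphs above.

```perl
use strict;
use warnings;

# Toy model of the attachment walk (not the module's own code), replaying
# the example. Assumptions: one letter per repository in commit names,
# B1 seeded on A1 (roots are not modelled), rule 4 not enforced explicitly.
sub stitch_example {
    my ($select) = @_;    # 'first' or 'last'
    my %node;             # name => { repo, parents, children, anc }

    # add a commit with the given stitched parents, and record its
    # ancestors per source repository (itself included)
    my $add = sub {
        my ( $name, @parents ) = @_;
        my $repo = substr $name, 0, 1;
        my %anc = ( $repo => { $name => 1 } );
        for my $p (@parents) {
            push @{ $node{$p}{children} }, $name;
            for my $r ( keys %{ $node{$p}{anc} } ) {
                $anc{$r}{$_} = 1 for keys %{ $node{$p}{anc}{$r} };
            }
        }
        $node{$name} = {
            repo     => $repo,
            parents  => [@parents],
            children => [],
            anc      => \%anc,
        };
    };

    # ancestors of commit $n in repository $r, as a comparable string
    my $anc_of = sub {
        my ( $n, $r ) = @_;
        return join ',', sort keys %{ $node{$n}{anc}{$r} || {} };
    };

    # from the original parent, follow alien children that have exactly the
    # same ancestors in $repo as that parent; select one at each branch point
    my $attach = sub {
        my ( $repo, $parent ) = @_;
        my $want = $anc_of->( $parent, $repo );
        my $at   = $parent;
        while (1) {
            my @ok = grep {
                $node{$_}{repo} ne $repo && $anc_of->( $_, $repo ) eq $want
            } @{ $node{$at}{children} };
            last unless @ok;
            $at = $select eq 'first' ? $ok[0] : $ok[-1];
        }
        return $at;
    };

    $add->('A1');
    $add->( 'B1', 'A1' );

    # remaining commits in the documented order, with their original parents
    for my $c (
        [qw(A2 A1)], [qw(B2 B1)], [qw(A3 A2)],    [qw(A4 A2)],
        [qw(B3 B2)], [qw(B4 B2)], [qw(A5 A3 A4)], [qw(B5 B3)],
        [qw(B6 B4)], [qw(B7 B5 B6)], [qw(B8 B7)], [qw(A6 A5)],
      )
    {
        my ( $name, @orig ) = @$c;
        my $repo = substr $name, 0, 1;
        $add->( $name, map { $attach->( $repo, $_ ) } @orig );
    }

    my %parents = map { ( $_ => join '+', @{ $node{$_}{parents} } ) } keys %node;
    return \%parents;
}

for my $select (qw(last first)) {
    my $p = stitch_example($select);
    print "$select: ",
      join( ' ', map { sprintf '%s<-%s', $_, $p->{$_} } grep { $p->{$_} ne '' } sort keys %$p ),
      "\n";
}
```

Each printed entry reads "commit<-parents in the stitched graph". Comparing the ancestor sets as sorted strings keeps the sketch short; a real implementation would likely use a cheaper lineage summary.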
Constraints of the stitching algorithm
Any mathematician will tell you there are many, many ways to stitch two DAGs together. This program tries very hard not to create history that is inconsistent with any of the input repositories.
The algorithm used by Git::FastExport::Stitch enforces the following rules when building the resulting repository:
- a commit is attached as far as possible in the DAG, starting from the original parent
- a commit is only attached to another commit in the resulting repository that has exactly the same list of ancestors as the original parent commits
- when there are several valid branches to follow when trying to find a commit to attach to, use the selection process (last or first commit (at the time of attachment), or a random commit)
- branches starting from the same commit in a source repository will start from the same commit in the resulting repository (this particular rule can be lifted: adding an option for this is on the TODO list)
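Rule 2 can be illustrated in isolation. In this hypothetical sketch, each commit of the stitched graph lists its ancestors per source repository, and a candidate is accepted only if its ancestors from the new commit's repository are exactly those of the original parent. The commits and the same_ancestors() helper are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical illustration of rule 2. %anc maps a stitched commit to its
# ancestors (itself included) in each source repository.
sub same_ancestors {
    my ( $anc, $repo, $candidate, $parent ) = @_;
    my $key = sub { join ',', sort @{ $anc->{ $_[0] }{$repo} || [] } };
    return $key->($candidate) eq $key->($parent);
}

my %anc = (
    A1 => { A => ['A1'] },
    B1 => { A => ['A1'],         B => ['B1'] },           # B1 attached to A1
    B2 => { A => [ 'A1', 'A2' ], B => [ 'B1', 'B2' ] },   # B2 can also reach A2
);

# A new A commit whose original parent is A1 may be attached to B1,
# but not to B2, which would give it A2 as an extra ancestor.
for my $candidate (qw(B1 B2)) {
    printf "%s: %s\n", $candidate,
      same_ancestors( \%anc, 'A', $candidate, 'A1' ) ? 'accepted' : 'rejected';
}
```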
BUGS & IMPROVEMENTS
The current implementation can probably be improved, and more options added. I'm very interested in test repositories that do not give the expected results.
INTERNAL METHODS
To run the stitching algorithm, Git::FastExport::Stitch makes use of several internal methods. These are not part of the public interface of the module, and are detailed below for those interested in the algorithm itself.
- _translate_block( $repo )
Given a repo key in the internal structure listing all the repositories to stitch together, this method "translates" the current block using the references (marks) of the resulting repository.
To ease debugging, the translated mark count starts at 1_000_000.
- _add_parents( $node, @parents )
Add the given parents to the node, and update the internal structure containing the node lineage.
- _last_alien_child( $node, $branch, $parents )
Given a node, its "branch" name (actually, the reference given on the commit line of the fast-export stream) and a structure describing its lineage over the various source repositories, find a suitable commit to which to attach it. This method is the heart of the stitching algorithm.
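The mark translation performed by _translate_block() can be pictured with a per-repository lookup table. This is a sketch under assumptions, not the method's code: translate_line() and %mark_for are hypothetical, only "mark", "from", "merge" and "M" lines are handled, and deciding the final parents of a commit is left to a separate step.

```perl
use strict;
use warnings;

# Hypothetical sketch: each (repository, original mark) pair is given a new
# mark, counting from 1_000_000 as in the documentation. Only "mark", "from",
# "merge" and "M <mode> :<mark> <path>" lines are rewritten.
my $next_mark = 1_000_000;
my %mark_for;    # repo => { original mark => translated mark }

sub translate_line {
    my ( $repo, $line ) = @_;
    my $map = $mark_for{$repo} ||= {};
    my $new = sub { $map->{ $_[0] } //= $next_mark++ };

    if ( my ( $cmd, $mark ) = $line =~ /^(mark|from|merge) :(\d+)$/ ) {
        return "$cmd :" . $new->($mark);
    }
    if ( my ( $head, $mark, $path ) = $line =~ /^(M \S+ ):(\d+) (.*)$/ ) {
        return $head . ':' . $new->($mark) . " $path";
    }
    return $line;    # anything else is copied as-is
}

print translate_line( 'A', 'mark :1' ),            "\n";    # mark :1000000
print translate_line( 'B', 'mark :1' ),            "\n";    # mark :1000001
print translate_line( 'A', 'M 100644 :1 README' ), "\n";    # M 100644 :1000000 README
print translate_line( 'A', 'mark :2' ),            "\n";    # mark :1000002
```

Keeping one table per repository is what lets two input streams both use mark :1 without colliding in the stitched stream.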
SEE ALSO
git-stitch-repo
BUGS
Please report any bugs or feature requests on the bugtracker website http://rt.cpan.org/NoAuth/Bugs.html?Dist=Git-FastExport or by email to bug-git-fastexport@rt.cpan.org.
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
AUTHOR
Philippe Bruhat (BooK) <book@cpan.org>
COPYRIGHT
Copyright 2008-2013 Philippe Bruhat (BooK), All Rights Reserved.
LICENSE
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.