NAME
Algorithm::Diff::Apply -- apply one or more Algorithm::Diff diffs
SYNOPSIS
## Single-diff form:
use Algorithm::Diff::Apply qw{apply_diff};
my @ary = ...;
my @diff = ...; # some call to Algorithm::Diff::diff()
my @changed_ary = apply_diff(\@ary, \@diff);
my $changed_ary = apply_diff(\@ary, \@diff);
## To apply >1 diff at once, use the plural form:
use Algorithm::Diff::Apply qw{apply_diffs};
@newary = apply_diffs(\@ary,
diff1_name => \@diff1,
diff2_name => \@diff2,
...
diffN_name => \@diffN);
# Alternatively:
@newary = apply_diffs(\@ary, %named_diffs);
# Scalar context:
$newary = apply_diffs(\@ary, %named_diffs);
# Extension argument syntax:
$newary = apply_diffs(\@ary, {
resolver => \&some_sub,
optimisers => [\&foo, \&bar],
opt1 => ...,
opt2 => ...,
}, %named_diffs);
DESCRIPTION
This module contains subroutines for applying diffs generated by Algorithm::Diff
to a target array in the hope of regenerating a new array incorporating all the changes described in the diffs into a new merged array.
If two hunks from different diffs happen to affect the same line, conflicts are detected and can be optionally handed off to helper subroutines for resolution.
- apply_diff ARRAY,DIFF
-
Applies the changes described by a diff to a copy of ARRAY, which is then returned. DIFF is a diff generated by
Algorithm::Diff::diff()
, and ARRAY must be an array of an appropriate length. Both parameters are passed in as references. Neither argument is modified.In a scalar context,
apply_diff()
returns a reference to the permuted copy that's generated. In an array context, the permuted copy is returned as an array value.This version of the algorithm is simpler and quicker than the full-blown plural form, and should be used if you're only ever going to be applying one diff at once.
- apply_diffs ARRAY,HASH
- apply_diffs ARRAY,OPTIONS,HASH
-
Applies more than one diff to a copy of an array at once, manages conflicts, and returns the permuted copy as either a reference or an array depending on context.
ARRAY must be a reference to an array value of an appropriate length. The array behind the passed reference is not permuted.
The HASH parameter contains diffs from different sources, as generated by
Algorithm::Diff::diff()
. The diffs are keyed by arbitrary strings which should reflect their source. See "DIFF LABELS" for details of what these strings might reflect.OPTIONS, if specified, must be a hash reference of option keywords and the corresponding parameters. The following options are recognised:
- Option "optimisers" (a.k.a. "optimizers")
-
Reference to an array of conflict optimiser subroutines. Normally
apply_diffs()
performs all the optimisations documented in this module; this option can be used to change that. Pass in an empty array to turn off optimisations altogether. See "Conflict Optimiser Callbacks" for details of how these subs are called. - Option "resolver"
-
This option can be used to supply a subroutine which will be called when a conflict is detected. See "Conflict Resolver Callbacks" details of how resolver routines are called.
- mark_conflicts HASH
-
This is the default resolver callback; see "Conflict Optimiser Callbacks" for details of its interface. It causes
apply_diffs()
to return arrays looking a bit like:[ @part_before_conflict, ">>>>>> diff1_name\n", @lines_1, # Lines permuted by diff1 (only) ">>>>>> diff2_name\n" @lines_2, # The same lines, as permuted by diff2 "<<<<<<\n", @part_after_conflict, ]
Which is probably the right thing to do if your array is going to be printed out one item per line.
- optimise_remove_duplicates HASH
-
Conflict optimiser: detects identical diff hunks in the conflict block, and removes all but one of each duplicated hunk.
CALLBACK INTERFACES
A lot of the dirty work in apply_diffs()
is done by callback subroutines, and users of Algorithm::Diff::Apply
can override its default behaviour if they wish by passing appropriate options to apply_diffs()
(see above).
Conflict Optimiser Callbacks
Optimiser callbacks are a way of making conflict cases easier for humans to deal with. This can be done by combining diff hunks from different sources that do the same thing, factoring out common changes, and other devious bits of monkeywork. Doing this can even factor away a conflict situation entirely.
These callbacks are called by apply_diffs()
when it detects a group of hunks from different diff sequences that can't all be applied at once without conflicting with each other.
For each block of conflicting diffs, the callback will be called with every diff in the block in the following format:
@ret = optimiser_callback (
"conflict_block" => {
"name_1" => [$hunk_1_1, $hunk_1_2, ..., $hunk_1_N],
"name_2" => [$hunk_2_1, $hunk_2_2, ..., $hunk_2_N],
:
"name_M" => [$hunk_M_1, $hunk_M_2, ..., $hunk_M_N],
},
);
The tag names are whatever you labelled the diff sequences you passed to apply_diffs()
. By definition, a conflict block contains a subset of at least two separate diffs.
Each of the $hunk_X_X
scalars in the arguments above is a hash reference with the following structure:
{
"start" => N,
"changes" => [[OP1, DATA1], ..., [OPn, DATAn]],
}
Where "start" is a line number in the target array, indicating where this hunk is intended to be applied, and "changes" contains the changes to apply.
Optimiser callbacks should return a permuted copy of what they were passed. Empty diffs will be discarded automatically. If only one diff remains after processing, the conflict will have been optimised away completely.
Conflict Resolver Callbacks
Resolver callbacks are invoked when conflicts have been detected, and the optimisers weren't able to completely factor away the conflict block.
The job of a resolver callback is to return some kind of resolution of the conflicting sub-arrays. These subs are called a little like this:
@ret = resolver_callback (
"alt_txts" => {
"diff1_name" => ['m', 'n', 'o'],
"diff3_name" => [],
}
);
The alt_txts
parameter is a hash ref keyed by (some of the) names of the diffs being applied in the main apply_diffs()
call, whose values are arrays containing alternative generated subsequences. Each of these subsequences is the result of applying a set of hunks from the corresponding diff to a copy of the slice of the source array where the conflict happened.
A resolver callback should an array which will be spliced into the array that apply_diffs()
will return.
(Other options are passed to resolver subs too, but these are as yet undocumented because they're still liable to change)
DIFF LABELS
apply_diffs()
makes you uniquely label the diffs you're applying when merging and applying them together, in the hope that when a conflict occurs, a meaningful name can be associated with each source of changes.
A diff label should reflect where the change came from, and can be anything you like: a revision number, a URL, a filename, or just a unique number. Algorithm::Diff::Apply
doesn't care about the exact syntax of the tag names you use - any unique string will do.
This convention is also used by the callback routines whose interfaces are described in "CALLBACK INTERFACES" above.
TO-DO
Data comparison will need to be considered in the same way as Algorithm::Diff
by the library's conflict optimisers: use a hashing scheme to allow array elements to be refs. See "KEY GENERATION FUNCTIONS" in Algorithm::Diff for details.
The calling convention for options is ugly and confusing, and defeats prototyping. It's done like that to correspond with other modules in the Algorithm::{Diff,Patch} family, but alternative syntaxes could be permitted.
Add more conflict optimisers. If anyone out there has a good routine, and would like to see it added to the library, then I'd be happy to incorporate it.
AUTHOR
Andrew Chadwick, andrewc-algodiffaply@piffle.org.
LICENCE
Copyright (c) 2003 Andrew Chadwick. This program is free software; you may copy it, redistribute it, or modify it under the same terms as Perl itself.