NAME

XML::DifferenceMarkup

SYNOPSIS

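use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff);

# $fname1 and $fname2 hold the paths of the two XML files to compare
my $parser = XML::LibXML->new();
$parser->keep_blanks(0);    # drop whitespace-only text nodes
my $d1 = $parser->parse_file($fname1);
my $d2 = $parser->parse_file($fname2);

my $dom = make_diff($d1, $d2);
print $dom->toString(1);    # 1 = indented output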

DESCRIPTION

This module implements an XML diff producing XML output. Both input and output are DOM documents, as implemented by XML::LibXML.

The diff format used by XML::DifferenceMarkup is meant to be human-readable (i.e. simple, as opposed to short). Basically, the diff is a subset of the input trees, annotated with instruction element nodes specifying how to convert the source tree to the target by inserting and deleting nodes. To prevent name collisions with input trees, all added elements are in the namespace http://www.locus.cz/XML/DifferenceMarkup (the diff will fail on input trees which already use that namespace).

The top-level node of the diff is always <diff/> (or rather <dm:diff xmlns:dm="http://www.locus.cz/XML/DifferenceMarkup"> ... </dm:diff>; this description elides the namespace specification from now on). Under it are fragments of the input trees and instruction nodes: <insert/>, <delete/> and <copy/>. <copy/> is used in places where the input subtrees are the same. In the limit, the diff of two identical documents is

<?xml version="1.0"?>
<dm:diff xmlns:dm="http://www.locus.cz/XML/DifferenceMarkup">
  <dm:copy count="1"/>
</dm:diff>

(<copy/> always has the count attribute and no other content). <insert/> and <delete/> have the obvious meaning. In the limit, a diff of two documents which have nothing in common is something like

<?xml version="1.0"?>
<dm:diff xmlns:dm="http://www.locus.cz/XML/DifferenceMarkup">
  <dm:delete>
    <old/>
  </dm:delete>
  <dm:insert>
    <new>
      <tree>with the whole subtree, of course</tree>
    </new>
  </dm:insert>
</dm:diff>

Actually, the above is a typical output even for documents which have plenty in common - if (for example) the names of top-level elements in the two input documents differ, XML::DifferenceMarkup will produce a maximal diff, even if their subtrees are exactly the same.

Note that <delete/> contains just one level of nested nodes - their subtrees are not included in the diff (but the element nodes which are included always come with all their attributes). <insert/> and <delete/> don't have any attributes and always contain some subtree.

Instruction nodes are never nested; all nodes above an instruction node (except the top-level <diff/>) come from the input trees. A node from the input tree is included in the output diff to provide context for instruction nodes when it satisfies the following conditions:

  • it's an element node

  • it has the same name in both input trees

  • it has the same attributes (names and values) in the same order

  • its subtree is not the same

The last condition guarantees that the "contextual" nodes always contain at least one <insert/> or <delete/>.
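As an illustration of these conditions, the following minimal sketch diffs two small documents whose root elements share a name and attributes but whose children differ, then lists the instruction nodes together with their parents. The input documents and the variable names ($DM_NS, @instructions) are assumptions made up for this example; the exact shape of the printed diff is not guaranteed here.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff);

# Namespace of all elements added by the diff (from the description above)
my $DM_NS = 'http://www.locus.cz/XML/DifferenceMarkup';

my $parser = XML::LibXML->new();
$parser->keep_blanks(0);

# Illustrative inputs: same root name and attributes, different children
my $d1 = $parser->parse_string('<list id="1"><a/><b/></list>');
my $d2 = $parser->parse_string('<list id="1"><a/><c/></list>');

my $diff = make_diff($d1, $d2);
print $diff->toString(1);

# Select every element in the diff namespace, then skip the top-level <diff/>
my @instructions = grep { $_->localname ne 'diff' }
    $diff->findnodes("//*[namespace-uri()='$DM_NS']");

for my $node (@instructions) {
    printf "%s under <%s>\n", $node->localname, $node->parentNode->nodeName;
}
```

Any parent printed here that is not the top-level diff element should be a contextual node copied from the input trees.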

FUNCTIONS

Note that XML::DifferenceMarkup functions must be explicitly imported (i.e. with use XML::DifferenceMarkup qw(make_diff merge_diff);) before they can be called.

make_diff

make_diff takes 2 parameters (the input documents) and produces their diff. Note that the diff is asymmetric - make_diff($a, $b) is different from make_diff($b, $a).
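The asymmetry can be checked with a short sketch like the one below, which diffs the same pair of documents in both directions and compares the serialized results. The input documents and the names $from, $to, $forward and $backward are illustrative assumptions.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff);

my $parser = XML::LibXML->new();
$parser->keep_blanks(0);

# Illustrative inputs that differ in one child element
my $from = $parser->parse_string('<r><a/></r>');
my $to   = $parser->parse_string('<r><b/></r>');

# Serialize the diff in each direction
my $forward  = make_diff($from, $to)->toString(1);
my $backward = make_diff($to, $from)->toString(1);

print $forward eq $backward ? "diffs are equal\n" : "diffs differ\n";
```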

merge_diff

merge_diff takes the first document passed to make_diff and the diff that make_diff returned, and produces the second document. (More or less: the document isn't canonicalized, so opinions on its "equality" may differ.)
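A round trip through both functions might look like the following sketch. The input documents are invented for illustration, and because the result is not canonicalized, the example inspects one value in the merged document rather than comparing serialized output.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff merge_diff);

my $parser = XML::LibXML->new();
$parser->keep_blanks(0);

# Illustrative inputs: the second <item> changes from "two" to "three"
my $d1 = $parser->parse_string('<doc><item>one</item><item>two</item></doc>');
my $d2 = $parser->parse_string('<doc><item>one</item><item>three</item></doc>');

# Diff from $d1 to $d2, then apply that diff back onto $d1
my $diff   = make_diff($d1, $d2);
my $merged = merge_diff($d1, $diff);

print $merged->toString(1);

# Spot-check the changed value instead of relying on byte-for-byte equality
my $second = $merged->findvalue('/doc/item[2]');
print "second item: $second\n";
```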

Error Handling

Both make_diff and merge_diff throw exceptions on invalid input: their own exceptions as well as exceptions thrown by XML::LibXML. These exceptions can usually be caught by calling the functions from an eval block (not always, though: it is possible to construct an input which will crash the calling process).
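One way to wrap the call in an eval block is sketched below. The helper safe_diff is hypothetical (it is not part of XML::DifferenceMarkup), and the inputs are valid documents chosen only to show the calling pattern; an eval block cannot protect against inputs that crash the process.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff);

# Hypothetical helper: returns ($diff, undef) on success, (undef, $error) on failure
sub safe_diff {
    my ($from, $to) = @_;
    my $diff = eval { make_diff($from, $to) };
    return (undef, $@ || 'unknown error') unless defined $diff;
    return ($diff, undef);
}

my $parser = XML::LibXML->new();
$parser->keep_blanks(0);

my $d1 = $parser->parse_string('<r><a/></r>');
my $d2 = $parser->parse_string('<r><a/><b/></r>');

my ($diff, $err) = safe_diff($d1, $d2);
if (defined $diff) {
    print $diff->toString(1);
} else {
    warn "make_diff failed: $err";
}
```

Returning the error alongside the result keeps the exception text available to the caller after eval has finished.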

BUGS

  • attribute order is significant

  • diff needs just one namespace declaration but usually has more

  • information outside the document element is not processed

AUTHOR

Vaclav Barta <vbar@comp.cz>

SEE ALSO

XML::LibXML