
Data::Edit::Xml::To::Dita - Convert multiple Xml documents in parallel to Dita.


A framework for converting multiple Xml documents in parallel to Dita:

use Data::Edit::Xml::To::Dita;

sub convertDocument($$)
 {my ($project, $x) = @_;                   # use sumAbsRel to get default home

   {my ($c) = @_;
    if ($c->at_conbody)
<p>Hello world!</p>


Evaluate the results of the conversion by reading the summary file in the reports/ folder:

use Data::Table::Text qw(fpe readFile);

if (lint) # Lint report if available
 {my $s = readFile(&summaryFile);
  $s =~ s(\s+on.*) ()ig;
  my $S = <<END;

Summary of passing and failing projects

100 % success. Projects: 0+1=1.  Files: 0+1=1. Errors: 0,0

CompressedErrorMessagesByCount (at the end of this file):        0

FailingFiles   :         0
PassingFiles   :         1

FailingProjects:         0
PassingProjects:         1

FailingProjects:         0
   #  Percent   Pass  Fail  Total  Project

PassingProjects:         1
   #   Files  Project
   1       1  1

DocumentTypes: 1

Document  Count
concept       1

100 % success. Projects: 0+1=1.  Files: 0+1=1. Errors: 0,0
END

  ok $s eq $S;
 }

See the converted files in the out/ folder:

if (1) # Converted file
 {my $s = nwsc(readFile(fpe(&out, qw(hello_world dita))));
  my $S = nwsc(<<END);

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd" []>
<concept id="c1">
  <title id="title">Hello World</title>
  <conbody>
    <p>Hello world!</p>
  </conbody>
</concept>
END

  ok $S eq $s;
 }


Convert multiple Xml documents in parallel to Dita.

Version "2018-10-04+111".

The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.

Convert Xml to the Dita standard.


Methods defined in this package.


Log messages including the project name if available

   Parameter  Description
1  @m         Messages


sub lll(@)
 {my (@m) = @_;                                                                 # Messages
  my $m = join '', dateTimeStamp, " ", @_;                                      # Time stamp each message
     $m =~ s(\s+) ( )gs;                                                        # Collapse white space
  if ($project)                                                                 # Add project name if available
   {$m .= " in project $project";
   }
  my ($p, $f, $l) = caller();
  $m .= " at $f line $l\n";

  say STDERR $m;
 }

You can provide your own implementation of this method in your calling package via:

sub lll {...}

if you wish to override the default processing supplied by this method.
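
As a minimal sketch of what such an override might look like, the following self-contained example builds a message in the same shape as the default: time stamp first, white space collapsed, project name appended when one is known. The helper name formatLogMessage and the strftime format are assumptions made for illustration; the module itself takes its time stamp from dateTimeStamp.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not part of the module): build the log text without printing it.
sub formatLogMessage
 {my ($project, @m) = @_;                                   # Project name or undef, messages
  my $stamp = strftime("%Y-%m-%d at %H:%M:%S", localtime);  # Assumed time stamp format
  my $m = join '', $stamp, " ", @m;
  $m =~ s(\s+) ( )gs;                                       # Collapse runs of white space
  $m .= " in project $project" if $project;                 # Project name if available
  $m
 }

# An override with the same name as the default method: log to STDERR.
sub lll
 {print STDERR formatLogMessage(undef, @_), "\n";
 }
```

Keeping the formatting separate from the printing makes the override easy to check without capturing STDERR.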


Download documents from S3 to the downloads folder.


sub downloadFromS3
 {if (download)                                                                 # Download if requested
   {lll "Download from S3";
    clearFolder(downloads, clearCount);
    my $b = s3Bucket;
    my $d = downloads;
    my $f = s3FolderIn;
    my $p = s3Parms;
    my $c = qq(aws s3 sync s3://$b/$f $d $p);
    xxx $c;
   }
  else
   {lll "Download from S3 not requested";
   }
 }

You can provide your own implementation of this method in your calling package via:

sub downloadFromS3 {...}

if you wish to override the default processing supplied by this method.


Convert the encoding of documents in downloads to utf8 equivalents in folder in.


sub convertToUTF8
 {my $n = 0;                                                                    # Number of files converted
  if (unicode)
   {lll "Unicode conversion";
    my $d = downloads;
    my $i = in;
    clearFolder(in, clearCount);
    for my $source(searchDirectoryTreesForMatchingFiles(downloads, inputExt))
     {my $target = swapFilePrefix($source, downloads, in);
      my $type = trim(qx(enca -iL none "$source"));
      my $c    = qx(iconv -f $type -t UTF8 -o "$target" "$source");
      if (1)                                                                    # Change encoding
       {my $s = readFile($target);
        my $S = $s =~ s(encoding="[^"]{3,16}") (encoding="UTF-8")r;
        owf($target, $S) unless $S eq $s;
       }
      ++$n;                                                                     # Count each file converted
     }
    lll "Unicode conversion applied to $n files";
   }
  else
   {lll "Unicode conversion not requested";
   }
 }


You can provide your own implementation of this method in your calling package via:

sub convertToUTF8 {...}

if you wish to override the default processing supplied by this method.
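
The last step of the conversion rewrites the encoding named in the Xml declaration so that it agrees with the converted content. A minimal sketch of that substitution on an in-memory string, using the same regular expression as the listing above; the function name fixEncodingDeclaration and the sample document are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the declared encoding with UTF-8.
sub fixEncodingDeclaration
 {my ($s) = @_;                                             # Xml source as a string
  $s =~ s(encoding="[^"]{3,16}") (encoding="UTF-8")r        # /r returns the modified copy
 }

my $s = qq(<?xml version="1.0" encoding="ISO-8859-1"?>\n<concept/>\n);
my $S = fixEncodingDeclaration($s);
print $S;
```

The /r modifier requires Perl 5.14 or later and leaves the original string untouched, which is what allows the default method to compare the two and rewrite the file only when something changed.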


Number of projects to process.


sub projectCount()
 {scalar keys %$projects
 }

You can provide your own implementation of this method in your calling package via:

sub projectCount {...}

if you wish to override the default processing supplied by this method.


Project details including at a minimum the name of the project and its source file.


sub Project
 {my ($name, $source) = @_;                                                     # Project name, source file

  confess "No name for project\n"          unless $name;
  confess "No source for project: $name\n" unless $source;
  if (my $q = $$projects{$name})
   {my $Q = $q->source;
    confess "Duplicate project: $name
  confess "Source file does not exist:
" unless -e $source;

  my $p = genHash(q(Project),                                                   # Project definition
    id         => undef,                                                        # Id attribute value from outermost tag
    isMap      => undef,                                                        # Map
    name       => $name,                                                        # Name of project
    number     => projectCount + 1,                                             # Number of project
    outputFile => undef,                                                        # Output file
    source     => $source,                                                      # Input file
    title      => undef,                                                        # Title for project
    topicId    => undef,                                                        # Topic id for project - collected during gather
   );

  $projects->{$p->name} = $p;                                                   # Save project definition
 }

You can provide your own implementation of this method in your calling package via:

sub Project {...}

if you wish to override the default processing supplied by this method.


Locate documents to convert from folder in.


sub loadProjects
 {my @p = searchDirectoryTreesForMatchingFiles(in,    inputExt);                # Production documents
  my @t = searchDirectoryTreesForMatchingFiles(tests, inputExt);                # Test documents
  if (my %t = map {$_=>1} testDocuments)                                        # Locate documents to be tested
   {for my $file(@p, @t)                                                        # Favor production over test because test is easier to run in bulk
     {my $name = fn $file;
      next unless $t{$name};                                                    # Skip unless name matches
      next if $projects->{$name};                                               # Skip if we already have a document to test
      Project($name, $file);
     }
   }
  else                                                                          # Choose documents in bulk
   {for my $file(develop ? @t : @p)
     {my $name  = fn $file;
      Project($name, $file);
     }
   }
 }

You can provide your own implementation of this method in your calling package via:

sub loadProjects {...}

if you wish to override the default processing supplied by this method.


Process parse tree with checks to confirm features

   Parameter  Description
1  $project   Project
2  $x         Node
3  $sub       Sub


Output file for a document

   Parameter  Description
1  $project   Project
2  $x         Parse tree


Convert a title string to a file name

   Parameter  Description
1  $string    String


sub stringToFileName($)
 {my ($string) = @_;                                                            # String
  $string =~ s(\.\.\.)        (_)gs;
  $string =~ s([%@#*?“'"|,.]) (_)gs;
  $string =~ s([&\+~\/\\:=])  (-)gs;
  $string =~ s([<\[])         (28)gs;
  $string =~ s([>\]])         (29)gs;
  $string =~ s(\s+)           (_)gs;

  my $r = lc firstNChars $string, maximumFileFromTitleLength;
 }

You can provide your own implementation of this method in your calling package via:

sub stringToFileName {...}

if you wish to override the default processing supplied by this method.
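
To see what the substitutions above do to a title, here is a minimal self-contained sketch. It applies the same substitutions in the same order, but replaces firstNChars with substr and fixes the maximum length at 50; both of these, and the name titleToFileName, are assumptions made so that the example runs without the rest of the module.

```perl
use strict;
use warnings;
use utf8;                                                   # The character class below contains a curly quote

my $maximumLength = 50;                                     # Assumed stand-in for maximumFileFromTitleLength

# Hypothetical stand-alone version of the title to file name conversion.
sub titleToFileName
 {my ($string) = @_;                                        # Title
  $string =~ s(\.\.\.)        (_)gs;                        # Ellipsis becomes a single underscore
  $string =~ s([%@#*?“'"|,.]) (_)gs;                        # Punctuation becomes underscore
  $string =~ s([&\+~\/\\:=])  (-)gs;                        # Separators become hyphen
  $string =~ s([<\[])         (28)gs;                       # Opening brackets
  $string =~ s([>\]])         (29)gs;                       # Closing brackets
  $string =~ s(\s+)           (_)gs;                        # White space becomes underscore
  lc substr($string, 0, $maximumLength)                     # Lower case, limited length
 }

print titleToFileName("Hello World: An Intro..."), "\n";
```

Because different titles can map to the same file name, for example titles that differ only in punctuation, the framework deduplicates output file names in a later step.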


Save file for parse tree after initial parse and gather

   Parameter  Description
1  $project   Project == document to convert


sub gatheredFile($)
 {my ($project) = @_;                                                           # Project == document to convert
  fpe(gathered, $project->number, q(data))
 }

You can provide your own implementation of this method in your calling package via:

sub gatheredFile {...}

if you wish to override the default processing supplied by this method.


Gather some information from each project

   Parameter  Description
1  $project   Project == document to convert


sub gatherProject($)
 {my ($project) = @_;                                                           # Project == document to convert
  my $projectName = $project->name;
  lll "Gather";                                                                 # Title of each conversion

  my $x = $project->parse;                                                      # Parse file
  $project->isMap   = $x->tag =~ m(map\Z)is;                                    # Map file
  $project->topicId = $x->id;                                                   # Topic Id

  $x->by(sub                                                                    # Locate title and hence output file
   {my ($t) = @_;
    if ($t->at_title)
     {my $T = $project->title = $t->stringContent;
      $project->outputFile = stringToFileName($T);
     }
   });

  storeFile gatheredFile($project), $x;                                         # Save parse tree - separately because it is large.

  $project                                                                      # Gathered information about a project
 }

You can provide your own implementation of this method in your calling package via:

sub gatherProject {...}

if you wish to override the default processing supplied by this method.


Add deduplicating numbers to output file names that would otherwise be the same.


sub numberOutputFiles
 {my %o;
  for my $P(sort keys %$projects)
   {my $p = $projects->{$P};
    if (my $o = $p->outputFile)
     {if (my $n = $o{$o}++)
       {$p->outputFile .= q(_).$n;
       }
     }
    else
     {confess "No output file for project source:\n", $p->source, "\n";
     }
   }
 }

You can provide your own implementation of this method in your calling package via:

sub numberOutputFiles {...}

if you wish to override the default processing supplied by this method.
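
The deduplication relies on a post-increment counter per file name: the first project to claim a name sees 0 and keeps the name unchanged, and each later project gets the running count appended. A minimal sketch with plain hashes standing in for the module's project objects (the function name deduplicate and the sample data are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical stand-alone version working on name => {outputFile => ...}.
sub deduplicate
 {my ($projects) = @_;                                      # Hash of projects by name
  my %seen;                                                 # Times each output file name has been seen
  for my $name(sort keys %$projects)                        # Sorted so the numbering is repeatable
   {my $p = $projects->{$name};
    my $o = $p->{outputFile};
    if (my $n = $seen{$o}++)                                # 0 on first sight, so no suffix
     {$p->{outputFile} .= q(_).$n;
     }
   }
  $projects
 }

my $projects = deduplicate(
 {a => {outputFile => "intro"},
  b => {outputFile => "intro"},
  c => {outputFile => "usage"},
  d => {outputFile => "intro"},
 });

print "$_ => $projects->{$_}{outputFile}\n" for sort keys %$projects;
```

Processing the projects in sorted order means the same input always receives the same suffixes, which keeps expected test results stable between runs.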


Convert one document.

   Parameter  Description
1  $project   Project == document to convert
2  $x         Parse tree.


sub convertDocument($$)
 {my ($project, $x) = @_;                                                       # Project == document to convert, parse tree.

You can provide your own implementation of this method in your calling package via:

sub convertDocument {...}

if you wish to override the default processing supplied by this method.


Convert one document held in folder in into topic files held in out.

   Parameter  Description
1  $project   Project == document to convert


sub convertProject($)
 {my ($project) = @_;                                                           # Project == document to convert
  my $projectName = $project->name;

  lll "Convert";                                                                # Title of each conversion

  my $x = retrieveFile(gatheredFile $project);                                  # Reload parse into this process

  convertDocument($project, $x);                                                # Convert document

  my $o = fpe(out, $project->outputFile, q(dita));                              # File to write to

  if (lint)                                                                     # Lint
   {my $l = Data::Edit::Xml::Lint::new();                                       # Write and lint topic
    $l->project = $project->name;                                               # Project name
    $l->catalog = catalog;                                                      # Catalog
    $l->file    = $o;                                                           # File to write to
    $l->source  = $project->formatXml($x);                                      # Format source and add headers
    $l->lintNOP;                                                                # Lint
   }
  else                                                                          # Write without lint
   {my $f = $project->outputFile;
    writeFile($o, $project->formatXml($x));
   }

  $project                                                                      # Conversion succeeded for project
 }

You can provide your own implementation of this method in your calling package via:

sub convertProject {...}

if you wish to override the default processing supplied by this method.


Lint results held in folder out and write reports to folder reports.


sub lintResults
 {if (lint)                                                                     # Only if lint requested
   {lll "Lint results";
    clearFolder(reports, clearCount);                                           # Clear prior run

    my $xref = Data::Edit::Xml::Xref::xref(inputFolder=>out, reports=>reports); # Check any cross references
    if (my $report = Data::Edit::Xml::Lint::report(out, qr(dita|ditamap|xml)))
     {my $r = $report->print;
      my $d = dateTimeStamp;
      my $h = home;
      my $b = s3Bucket;
      my $B = $b && upload ?
        qq(\n\nPlease see: aws s3 sync s3://$b ?\n\n) : qq();
      my $x = sub                                                               # Include xref results
       {my $s = $xref->statusLine;
        return "\n\n$s" if $s;
        q()
       }->();

      my $s = <<END;                                                            # rrrr
Summary of passing and failing projects on $d.\t\tVersion: $VERSION$B

      say STDERR $s;
      writeFile(summaryFile, $s);
     {lll "No Lint report available";
   {lll "Lint report not requested";

You can provide your own implementation of this method in your calling package via:

sub lintResults {...}

if you wish to override the default processing supplied by this method.


Send results to S3 from folder out.


sub uploadToS3
 {if (upload)
   {lll "Upload to S3";
    my $h = home;
    my $b = s3Bucket;
    my $f = s3FolderUp;
    my $p = s3Parms;
    my $c = qq(aws s3 sync $h s3://$b/$f $p);
    say STDERR $c;
    print STDERR $_ for qx($c);
    say STDERR qq(Please see:  aws s3 sync s3://$b/$f ?);
    Flip::Flop::uploadToS3();                                                   # Reset upload flip flop
   }
  else
   {lll "Upload to S3 not requested";
   }
 }

You can provide your own implementation of this method in your calling package via:

sub uploadToS3 {...}

if you wish to override the default processing supplied by this method.


Run tests by comparing files in folder out with corresponding files in testResults.


sub runTests
 {if (develop)                                                                  # Run tests if developing
   {checkResults;                                                               # Compare files in out with files in testResults
    my $F = join " ", @failedTests;
    my $f = @failedTests;
    my $p = @passedTests;
    my $a = @availableTests;
    say STDERR "Failed tests: $F" if @failedTests;
    $p + $f == $a or warn "Passing plus failing tests".
     " not equal to tests available: $p + $f != $a";
    say STDERR "Tests: $p+$f == $a pass+fail==avail";
   }
 }

You can provide your own implementation of this method in your calling package via:

sub runTests {...}

if you wish to override the default processing supplied by this method.


Normalize white space and remove comments

   Parameter  Description
1  $string    Text to normalize


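sub nwsc($)
 {my ($string) = @_;                                                            # Text to normalize
  $string =~ s(<\?.*?\?>)  ()gs;                                                # Remove processing instructions
  $string =~ s(<!--.*?-->) ()gs;                                                # Remove comments
  $string =~ s(<!DOCTYPE.+?>)  ()gs;                                            # Remove document type declarations
  $string =~ s( (props|id)="[^"]*") ()gs;                                       # Remove props and id attributes
  nws($string)                                                                  # Normalize white space
 }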

You can provide your own implementation of this method in your calling package via:

sub nwsc {...}

if you wish to override the default processing supplied by this method.
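
A minimal self-contained sketch of this kind of normalization follows. The first four substitutions are the ones in the listing above; the final white space collapse and trim are assumptions standing in for whatever the module does after them, and the name normalizeXml is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-alone normalizer for comparing Xml documents as text.
sub normalizeXml
 {my ($string) = @_;                                        # Text to normalize
  $string =~ s(<\?.*?\?>)  ()gs;                            # Processing instructions
  $string =~ s(<!--.*?-->) ()gs;                            # Comments
  $string =~ s(<!DOCTYPE.+?>)  ()gs;                        # Document type declaration
  $string =~ s( (props|id)="[^"]*") ()gs;                   # props and id attributes
  $string =~ s(\s+) ( )gs;                                  # Assumed: collapse white space
  $string =~ s(\A\s+|\s+\z) ()gs;                           # Assumed: trim both ends
  $string
 }

my $a = qq(<?xml version="1.0"?>\n<!-- c -->\n<concept id="c1">\n  <title>Hi</title>\n</concept>\n);
my $b = qq(<concept id="c2"><title>Hi</title>\n\n</concept>);

print normalizeXml($a), "\n";
```

Removing id attributes before comparing means that two conversions which differ only in generated ids are still treated as equal. Note that white space between adjacent tags is significant to this comparison: $a and $b above normalize to different strings because $b has no white space between the concept and title tags.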


Evaluate the results of a test

   Parameter  Description
1  $file      File
2  $got       What we got
3  $expected  What we expected


sub testResult($$$)
 {my ($file, $got, $expected) = @_;                                             # File, what we got, what we expected
  my $f = fpe(tests, $file, q(dita));                                           # Actual result
  my $g = nwsc($got);
  my $e = nwsc($expected);

  if ($e !~ m(\S)s)                                                             # Blank test file
   {confess "Test $file is all blank";
   }

  if ($g eq $e)                                                                 # Compare got with expected and pass
   {push @passedTests, $file;
    return 1;
   }
  else                                                                          # Not as expected
   {push @failedTests, $file;
    my @g = grep {!/\A\s*(<!|<\?)/} split /\n/, readFile($f);
    my @e = grep {!/\A\s*(<!|<\?)/} split /\n/, $expected;
    shift @g, shift @e while @g and @e and nwsc($g[0]) eq nwsc($e[0]);          # Skip matching leading lines
    cluck "Got/expected in test $file:\n". $g[0]. "\n". $e[0]. "\n";            # First lines that differ
    return 0;
   }
 }

You can provide your own implementation of this method in your calling package via:

sub testResult {...}

if you wish to override the default processing supplied by this method.
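
When a test fails, the useful information is the first pair of lines that disagree. A minimal sketch of that search on two in-memory strings, under the assumption that lines are compared after collapsing white space; the names firstDifference and squeeze are hypothetical, and squeeze is a simplified stand-in for nwsc:

```perl
use strict;
use warnings;

# Simplified stand-in for nwsc: collapse and trim white space only.
sub squeeze
 {my ($s) = @_;
  $s =~ s(\s+) ( )gs;
  $s =~ s(\A\s+|\s+\z) ()gs;
  $s
 }

# Hypothetical helper: first differing (got, expected) lines, or () if none.
sub firstDifference
 {my ($got, $expected) = @_;
  my @g = grep {!/\A\s*(<!|<\?)/} split /\n/, $got;         # Ignore declarations and comments
  my @e = grep {!/\A\s*(<!|<\?)/} split /\n/, $expected;
  while (@g and @e and squeeze($g[0]) eq squeeze($e[0]))    # Skip matching leading lines
   {shift @g; shift @e;
   }
  return () unless @g or @e;                                # No difference found
  ($g[0], $e[0])                                            # Either may be undef if one text is shorter
 }

my ($g, $e) = firstDifference("<a>\n <b>1</b>\n</a>", "<a>\n<b>2</b>\n</a>");
print "Got:      $g\nExpected: $e\n";
```

Reporting only the first difference keeps the failure message short even when a single early change shifts every following line.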


Check the results of a test run by comparing each file in folder testResults with the corresponding file in folder out.


sub checkResults
 {for my $expected(searchDirectoryTreesForMatchingFiles(testResults))
   {my $got  = swapFilePrefix($expected, testResults, out);
    my $test = fn $expected;
    push @availableTests, $test;
    if (-e $got)
     {testResult($test, readFile($got), readFile($expected));
     }
   }
 }

You can provide your own implementation of this method in your calling package via:

sub checkResults {...}

if you wish to override the default processing supplied by this method.


Gather information from the selected projects by reading their source files held in folder in.


sub gatherSelectedProjects
 {lll "Gather selected projects";
  my $ps = newProcessStarter(maximimumNumberOfProcesses, process);              # Process starter

  for(sort keys %$projects)                                                     # Gather information from each project
   {$ps->start(sub{gatherProject($projects->{$_})});                            # Gather each project in a separate process
   }

  if (my @results = $ps->finish)                                                # Consolidate results
   {reloadHashes(\@results);                                                    # Recreate attribute methods
    my %togather = %$projects;
    for my $project(@results)                                                   # Each result
     {my $projectName = $project->name;                                         # Project name
      if (my $p = $$projects{$projectName})                                     # Find project
       {$$projects{$projectName} = $project;                                    # Consolidate information gathered
        delete $togather{$projectName};                                         # Mark project as gathered
       }
      else                                                                      # Confess to invalid project
       {confess "Unknown gathered project $projectName";
       }
     }
    if (my @f = sort keys %togather)                                            # Confess to projects that failed to gather
     {confess "The following projects failed to gather:\n", join (' ', @f);
     }
   }
 }

You can provide your own implementation of this method in your calling package via:

sub gatherSelectedProjects {...}

if you wish to override the default processing supplied by this method.


Convert the selected documents by reading their source in folder in, converting them and writing the resulting topics to folder out.


sub convertSelectedProjects
 {lll "Converted selected projects";
  my $ps = newProcessStarter(maximimumNumberOfProcesses, process);              # Process starter

  for $project(sort keys %$projects)                                            # Convert projects
   {$ps->start(sub{convertProject($projects->{$project})});                     # Convert each project in a separate process
   }

  if (my @results = $ps->finish)                                                # Consolidate results
   {reloadHashes(\@results);                                                    # Recreate attribute methods
    my %toConvert = %$projects;
    for my $project(@results)                                                   # Each result
     {my $projectName = $project->name;                                         # Converted project name
      if (my $p = $$projects{$projectName})                                     # Find project
       {$$projects{$projectName} = $project;                                    # Consolidate information gathered
        delete $toConvert{$projectName};                                        # Mark project as converted
       }
      else                                                                      # Confess to invalid project
       {confess "Unknown converted project $projectName";
       }
     }
    if (my @f = sort keys %toConvert)                                           # Confess to projects that failed to convert
     {confess "The following projects failed to convert:\n", join (' ', @f);
     }
   }
 }

You can provide your own implementation of this method in your calling package via:

sub convertSelectedProjects {...}

if you wish to override the default processing supplied by this method.
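
Both the gather and the convert steps finish with the same bookkeeping: every result returned by a child process must match a known project, and any project for which no result came back is reported as a failure. A minimal sketch of that consolidation, with plain hashes standing in for project objects and without any real child processes; the function name unreturnedProjects and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: merge results into projects, return names with no result.
sub unreturnedProjects
 {my ($projects, @results) = @_;                            # Projects by name, results from child processes
  my %pending = %$projects;                                 # Projects still waiting for a result
  for my $r(@results)
   {my $name = $r->{name};
    die "Unknown converted project $name\n" unless $projects->{$name};
    $projects->{$name} = $r;                                # Replace with the child's updated copy
    delete $pending{$name};                                 # Mark as done
   }
  sort keys %pending                                        # Call in list context
 }

my %projects = map {$_ => {name => $_}} qw(a b c);
my @failed   = unreturnedProjects(\%projects,
  {name => "a", outputFile => "a_out"},
  {name => "c", outputFile => "c_out"});

print "Failed: @failed\n";
```

Replacing the stored project with the returned copy matters because each child works on its own copy of the data, so anything it records is otherwise lost when the child exits.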


Convert the selected documents.


sub convertProjects
 {if (convert)                                                                  # Convert the documents if requested.
   {lll "Convert documents";
    clearFolder($_, clearCount) for out, process;                               # Clear output folders
    loadProjects;                                                               # Projects to run
    gatherSelectedProjects;                                                     # Gather information about each project
    numberOutputFiles;                                                          # Deduplicate output file names
    my @r = convertSelectedProjects;                                            # Convert selected projects
    Flip::Flop::convert();                                                      # Reset conversion flip flop
    return @r;                                                                  # Return results of conversions
   }
  else
   {lll "Convert documents not requested";
   }
 }


You can provide your own implementation of this method in your calling package via:

sub convertProjects {...}

if you wish to override the default processing supplied by this method.


Perform all the conversion projects.


sub convertXmlToDita
 {my ($package) = caller;


  for my $phase(qw(downloadFromS3 convertToUTF8 convertProjects
                   lintResults runTests uploadToS3))
   {no strict;
#   lll "Phase: ", $phase;

  $endTime = time;                                                              # Run time statistics
  $runTime = $endTime - $startTime;

You can provide your own implementation of this method in your calling package via:

sub convertXmlToDita {...}

if you wish to override the default processing supplied by this method.
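
The body of the phase loop is not shown above, so the following is only a sketch of one way such a dispatch could work, not the module's actual code: for each phase name, prefer a sub of that name in the calling package and fall back to the framework's default. The package names Framework and Mine and the phase names are hypothetical.

```perl
use strict;
use warnings;

package Framework;                                          # Hypothetical framework package

sub phaseA {"default A"}                                    # Default processing
sub phaseB {"default B"}

sub runPhases
 {my ($package) = @_;                                       # Name of the calling package
  my @r;
  for my $phase(qw(phaseA phaseB))
   {my $sub = $package->can($phase) || Framework->can($phase);  # Caller's version wins
    push @r, $sub->();
   }
  @r
 }

package Mine;                                               # Hypothetical calling package

sub phaseB {"my B"}                                         # Overrides the default

package main;

my @r = Framework::runPhases(q(Mine));
print join(", ", @r), "\n";
```

Looking the sub up by name at run time is what lets a calling package replace a single phase while still using the defaults for all the others.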

Hash Definitions

Project Definition

Project definition

id - Id attribute value from outermost tag

isMap - Map

name - Name of project

number - Number of project

outputFile - Output file

source - Input file

title - Title for project

topicId - Topic id for project - collected during gather


The following is a list of all the attributes in this package. A method coded with the same name in your package will override the method of the same name in this package and thus provide your value for the attribute in place of the default value supplied for this attribute by this package.

Replaceable Attribute List

catalog clearCount convert devShm devShmOrHome develop download downloads endTime gathered home in inputExt lint maximimumNumberOfProcesses maximumFileFromTitleLength out parseCache process reports runTime s3Bucket s3FolderIn s3FolderUp s3Parms startTime summaryFile testDocuments testResults tests unicode upload


Dita catalog to be used for linting.


Limit on number of files to clear from each output folder.


Convert documents to dita if true.


Shared memory folder for output files.


Shared memory folder or home folder.


Production run if this folder is detected, otherwise development.


Download from S3 if true.


Downloads folder.


End time of run in seconds since the epoch.


Folder containing saved parse trees after initial parse and information gathering.


Home folder containing all the other folders


Input documents folder.


Extension of input files.


Lint output xml if true or write directly if false.


Maximum number of processes to run in parallel.


Maximum amount of title to use in constructing output file names.


Converted documents output folder.


Cached parse trees


Process data folder used to communicate results between processes.


Reports folder.


Elapsed run time in seconds.


Bucket on S3 holding documents to convert and the converted results.


Folder on S3 containing original documents.


Folder on S3 containing results of conversion.


Additional S3 parameters for uploads and downloads.


Start time of run in seconds since the epoch.


Summary report file.


List of production documents to test in development or () for normal testing locally or normal production if on Aws.


Folder containing test results expected.


Folder containing test files.


Convert to utf8 if true.


Upload to S3 if true.

Optional Replace Methods

The following is a list of all the optionally replaceable methods in this package. A method coded with the same name in your package will override the method of the same name in this package, providing your preferred processing for the replaced method in place of the default processing supplied by this package. If you do not supply such an overriding method, the existing method in this package will be used instead.

Private Methods


Compute home directory once


Md5 sum for a file

   Parameter  Description
1  $file      File


Name of the file in which to cache parse trees

   Parameter  Description
1  $project   Project


Parse a project.

   Parameter  Description
1  $project   Project


Replaceable methods


Attribute methods


Merge packages

   Parameter  Description
1  $package   Name of package to be merged defaulting to that of the caller.


Create sample input files for testing. The attribute inputFolder supplies the name of the folder in which to create the sample files.


This module is written in 100% Pure Perl and, thus, it is easy to read, comprehend, use, modify and install via cpan:

sudo cpan install Data::Edit::Xml::To::Dita



Copyright (c) 2016-2018 Philip R Brenan.

This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.