Name
Data::Edit::Xml::Lint - lint xml files in parallel using xmllint and report the failure rate
Synopsis
Linting and reporting
Create some sample xml files, some with errors, lint them in parallel and retrieve the number of errors and failing files:
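use Data::Edit::Xml::Lint;
use feature 'say';
# $N, $outDir, catalogName, projectName, fileName and addError are assumed to be defined elsewhere by the caller
for my $n (1..$N)                                 # Some projects
 {my $x = Data::Edit::Xml::Lint::new();           # New xml file linter
  my $catalog = $x->catalog = catalogName;        # Use catalog if possible
  my $project = $x->project = projectName($n);    # Project name
  my $file    = $x->file    = fileName($n);       # Target file
  $x->source = <<END;                             # Sample source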
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//HPE//DTD HPE DITA Concept//EN" "concept.dtd" []>
<concept id="$project">
<title>Project $project</title>
<conbody>
<p>Body of $project</p>
</conbody>
</concept>
END
  $x->source =~ s/id="\w+?"//gs if addError($n);  # Introduce an error into some projects
  $x->lint(foo=>1);                               # Write the source to the target file and lint it with xmllint; the attributes supplied (here foo=>1) are recorded as comments at the end of the target file
 }
Data::Edit::Xml::Lint::wait;                      # Wait for lints to complete
say STDERR Data::Edit::Xml::Lint::report($outDir, "xml")->print;  # Report total pass fail rate
Produces:
50 % success converting 3 projects containing 10 xml files on 2017-07-13 at 17:43:24
ProjectStatistics
   #  Percent  Pass  Fail  Total  Project
   1  33.3333     1     2      3  aaa
   2  50.0000     2     2      4  bbb
   3  66.6667     2     1      3  ccc
FailingFiles
   #  Errors  Project  File
   1       1  ccc      out/ccc5.xml
   2       1  aaa      out/aaa9.xml
   3       1  bbb      out/bbb1.xml
   4       1  bbb      out/bbb7.xml
   5       1  aaa      out/aaa3.xml
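The percentages in the report follow directly from the pass and fail counts. A minimal sketch of that arithmetic in plain Perl, using the counts from the sample output above. The %projects layout and the passRate helper are illustrative assumptions, not the module's internals:

```perl
use strict;
use warnings;

# Pass and fail counts per project, copied from the sample report above.
my %projects = (aaa => [1, 2], bbb => [2, 2], ccc => [2, 1]);

# Hypothetical helper: passes as a percentage of passes plus fails.
sub passRate
 {my ($pass, $fail) = @_;
  my $total = $pass + $fail;
  return $total ? 100 * $pass / $total : 0;       # Avoid dividing by zero for an empty project
 }

my ($allPass, $allFail) = (0, 0);
for my $project (sort keys %projects)
 {my ($pass, $fail) = @{$projects{$project}};
  $allPass += $pass;
  $allFail += $fail;
  printf "%-3s %8.4f %d %d %d\n",
    $project, passRate($pass, $fail), $pass, $fail, $pass + $fail;
 }

# Overall rate across all files in all projects.
printf "%.0f %% success over %d files\n",
  passRate($allPass, $allFail), $allPass + $allFail;
```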
Rereading
Once a file has been linted, it can be reread with read to obtain details about the xml, including the ids defined (see: idDefs below) and any labels that refer to these ids (see: labelDefs below). Such labels provide additional names for a node which cannot be stored in the xml itself. For example, rereading out/bbb1.xml yields a structure such as:
{catalog => "/home/phil/hp/dtd/Dtd_2016_07_12/catalog-hpe.xml",
definition => "bbb",
docType => "<!DOCTYPE concept PUBLIC \"-//HPE//DTD HPE DITA Concept//EN\" \"concept.dtd\" []>",
errors => 1,
file => "out/bbb1.xml",
foo => 1,
header => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
idDefs => { bbb => 1, c1 => 1 },
labelDefs => {
bbb => "bbb",
c1 => "c1",
conbody1 => "c1",
conbody2 => "c1",
concept1 => "bbb",
concept2 => "bbb",
},
labels => "bbb concept1 concept2",
project => "bbb",
sha256 => "b00cdebf2e1837fa15140d25315e5558ed59eb735b5fad4bade23969babf9531",
source => "..."
}
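Because labelDefs maps every label or id to the id of the node that carries it, the map can be inverted to list all the names available for each node. A small sketch in plain Perl, using only the labelDefs values shown above; the %namesById variable is introduced here for illustration:

```perl
use strict;
use warnings;

# labelDefs copied from the sample structure above: {label or id} => id
my %labelDefs = (
  bbb      => "bbb",
  c1       => "c1",
  conbody1 => "c1",
  conbody2 => "c1",
  concept1 => "bbb",
  concept2 => "bbb",
);

# Invert the map: for each id, collect every name that resolves to it.
my %namesById;
push @{$namesById{$labelDefs{$_}}}, $_ for sort keys %labelDefs;

for my $id (sort keys %namesById)
 {print "$id: @{$namesById{$id}}\n";
 }
```

For the id bbb this gives the names bbb, concept1 and concept2, which matches the labels entry in the sample structure.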
ReLinting
In order to fix references between files, a list of files can be relinted:
a map is constructed to locate all the ids and labels defined in the specified files
the resulting parse tree and id map are handed to a caller-provided sub that can then traverse the parse tree, fixing attributes which make references between the files
the modified parse trees are written back to the originating files, thus making the changes permanent
Description
Constructor
Construct a new linter
new
Create a new xml linter - call this method statically as in Data::Edit::Xml::Lint::new()
Attributes
Attributes describing a lint
file :lvalue
File that the xml will be written to and read from by lint or read
catalog :lvalue
Optional catalog file containing the locations of the DTDs used to validate the xml
docType :lvalue
The second line: the document type extracted from the source
dtds :lvalue
Optional directory containing the DTDs used to validate the xml
errors :lvalue
Number of lint errors detected by xmllint
header :lvalue
The first line: the xml header extracted from source
labels :lvalue
Optional parse tree to supply labels for the current source - needed because labels are held in the parse tree and are not present in the string representation of the parse tree
linted :lvalue
Date the lint was performed by lint
idDefs :lvalue
{id} = count - the number of times this id is defined in the xml contained in this file
labelDefs :lvalue
{label or id} = id - the id of the node containing a label defined on the xml
project :lvalue
Optional project name to allow error counts to be aggregated by project and to allow id and labels to be scoped to the files contained in each project
processes :lvalue
Maximum number of xmllint processes to run in parallel - 8 by default
sha256 :lvalue
Sha256 hash of the string containing the xml processed by lint or read
source :lvalue
The source Xml to be linted
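The sha256 attribute is a digest of the xml string. The sketch below shows how such a digest can be computed with the core Digest::SHA module. Exactly which bytes the module hashes is not stated here, so the UTF-8 encoding step and the sample string are assumptions made for illustration:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use Encode qw(encode_utf8);

# Illustrative xml string; any source text would do.
my $xml = qq(<concept id="c1"><title>Example</title></concept>\n);

# Digest::SHA works on bytes, so encode the characters as UTF-8 first.
my $digest = sha256_hex(encode_utf8($xml));

print "$digest\n";                                # 64 hexadecimal characters
```

Comparing a stored digest with a freshly computed one is a cheap way to detect that a file's xml has changed since it was linted.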
Lint
Lint xml files in parallel
lint
Store some xml in a file and apply xmllint in parallel
Parameter Description
1 $lint Linter
2 %attributes Attributes to be recorded as xml comments
read
Reread a linted xml file and extract the attributes associated with the lint
Parameter Description
1 $file File containing xml
wait()
Wait for all lints to finish - this is a static method, call as Data::Edit::Xml::Lint::wait
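The combination of a process limit (see processes above) and a final wait is a common pattern for throttled parallel work. The following is a generic sketch of that pattern using fork and waitpid, not the module's actual implementation; runInParallel and the jobs passed to it are hypothetical, and each job simply returns the exit code its child process should report:

```perl
use strict;
use warnings;

$| = 1;                                           # Unbuffered output so children do not repeat pending output

# Hypothetical helper: run each job in a child process, at most $limit at a time.
# Returns {job index => child exit code}.
sub runInParallel
 {my ($limit, @jobs) = @_;
  my (%running, %status);

  my $reap = sub                                  # Wait for one child to finish and record its exit code
   {my $pid = waitpid(-1, 0);
    die "No child process to wait for" if $pid <= 0;
    return unless exists $running{$pid};          # Ignore children started elsewhere
    $status{delete $running{$pid}} = $? >> 8;
   };

  for my $i (0..$#jobs)
   {$reap->() while keys(%running) >= $limit;     # Throttle: wait for a free slot
    my $pid = fork;
    die "Cannot fork: $!" unless defined $pid;
    exit $jobs[$i]->() if $pid == 0;              # Child: run the job and exit with its result
    $running{$pid} = $i;                          # Parent: remember which job this child runs
   }

  $reap->() while keys %running;                  # The "wait" step: drain the remaining children
  return \%status;
 }

# Five trivial jobs, two at a time; job $n exits with code $n.
my $status = runInParallel(2, map {my $n = $_; sub {$n}} 0..4);
print "job $_ exited with $status->{$_}\n" for sort {$a <=> $b} keys %$status;
```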
searchDirectoryTreeForMatchingFiles
Search a directory tree for files that match the specified extensions
Parameter Description
1 $folder Directory to start search in
2 @extensions Extensions of files to find
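A directory tree search of this kind can be sketched with the core File::Find module. The helper below, findFilesWithExtensions, is a hypothetical stand-in written for illustration and is not the module's own code; it assumes extensions are given without the leading dot and are matched case-sensitively:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical stand-in: return the sorted paths of all files under $folder
# whose extension is one of @extensions (given without the leading dot).
sub findFilesWithExtensions
 {my ($folder, @extensions) = @_;
  my %wanted = map {$_ => 1} @extensions;
  my @files;
  find({no_chdir => 1, wanted => sub              # With no_chdir, $_ holds the full path
   {return unless -f $_;
    my ($ext) = $_ =~ /\.([^.\/]+)\z/;            # Text after the last dot in the file name
    push @files, $_ if defined $ext and $wanted{$ext};
   }}, $folder);
  @files = sort @files;
  return @files;
 }

# List the xml files under out/ if that directory exists.
if (-d "out")
 {print "$_\n" for findFilesWithExtensions("out", "xml");
 }
```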
clear
Clear the results of a prior run
Parameter Description
1 $outputDirectory Directory to clear
2 @fileExtensions Extensions of files to remove
relint
Locate all the labels or ids in the specified files, analyze the map of labels and ids with analysisSub, parse each file, process each parse with processSub, then use lint to write the reprocessed xml back to the original file. This allows you to reprocess the contents of each file with knowledge of where labels or ids are located in the other files associated with a project. The analysisSub(linkmap = {project}{label or id}=[file, id]) should return true if the processing of each file is to be performed subsequently. The processSub(parse tree representation of a file, id and label mapping, reloaded linter) should return true if a lint is required to save the results after each file has been processed, else false.
Parameter Description
1 $analysisSub Analysis sub
2 $processSub Process sub
3 $folder Folder containing files to process (recursively)
4 @extensions Extensions of files to process
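The control flow described above (build the link map, consult analysisSub, then let processSub decide per file whether a lint is needed) can be sketched as follows. This is an illustration only: relintSketch is hypothetical, parse trees are replaced by plain hashes, the sample file contents are made up, and no files are read or written:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the relint flow.
# Each entry of @files is a hash: {file, project, labelDefs}.
# Returns the names of the files that processSub asked to have relinted.
sub relintSketch
 {my ($analysisSub, $processSub, @files) = @_;

  my %linkMap;                                    # {project}{label or id} = [file, id]
  for my $f (@files)
   {for my $label (sort keys %{$f->{labelDefs}})
     {$linkMap{$f->{project}}{$label} = [$f->{file}, $f->{labelDefs}{$label}];
     }
   }

  return [] unless $analysisSub->(\%linkMap);     # Analysis can veto further processing

  my @relint = grep {$processSub->($_, \%linkMap)} @files;
  return [map {$_->{file}} @relint];
 }

# Made-up sample data for two files in one project.
my @files = (
  {file => "out/bbb1.xml", project => "bbb", labelDefs => {bbb => "bbb", concept1 => "bbb"}},
  {file => "out/bbb7.xml", project => "bbb", labelDefs => {c1  => "c1",  conbody1 => "c1"}},
);

my $changed = relintSketch(
  sub {my ($linkMap) = @_; scalar keys %$linkMap},                 # Proceed if anything was found
  sub {my ($file, $linkMap) = @_; exists $file->{labelDefs}{c1}},  # Relint only files defining c1
  @files);

print "Files to relint: @$changed\n";
```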
resolveUniqueLink
Return the unique definition of the specified link in the link map or undef if no such definition exists
Parameter Description
1 $linkMap Link map
2 $link Label
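To make the idea of a unique definition concrete, here is a small sketch. It assumes a link map laid out as {project}{label or id} => [[file, id], ...] with one entry per definition; that layout, the sample data and the resolveUniqueSketch helper are all assumptions made for illustration and may differ from the module's own structures:

```perl
use strict;
use warnings;

# Assumed layout: {project}{label or id} => [[file, id], ...], one entry per definition.
# The file names and labels below are made up for the example.
my %linkMap = (
  aaa => {intro => [["out/aaa3.xml", "c1"]],
          top   => [["out/aaa3.xml", "aaa"], ["out/aaa9.xml", "aaa"]]},
  bbb => {body  => [["out/bbb1.xml", "c1"]]},
);

# Hypothetical helper: return the single [file, id] defining $link, or undef
# if the link has no definition or more than one.
sub resolveUniqueSketch
 {my ($linkMap, $link) = @_;
  my @definitions = map {@{$linkMap->{$_}{$link} // []}} sort keys %$linkMap;
  return @definitions == 1 ? $definitions[0] : undef;
 }

for my $link (qw(intro top missing))
 {my $definition = resolveUniqueSketch(\%linkMap, $link);
  print "$link: ", ($definition ? "@$definition" : "no unique definition"), "\n";
 }
```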
multipleLabelDefs
Return ([project; source label or id; targets count]*) of all labels or id that have multiple definitions
Parameter Description
1 $labelDefs Label and Id definitions
multipleLabelDefsReport
Return a report showing labels and id with multiple definitions in each project ordered by most defined
Parameter Description
1 $labelDefs Label and Id definitions
singleLabelDefs
Return ([project; label or id]*) of all labels or ids that have a single definition
Parameter Description
1 $labelDefs Label and Id definitions
singleLabelDefsReport
Return a report showing labels or ids with just one definition, ordered by project, label name
Parameter Description
1 $labelDefs Label and Id definitions
Report
Methods for reporting the results of linting several files
report
Analyse the results of prior lints and return a hash reporting various statistics and a printable report
Parameter Description
1 $outputDirectory Directory containing the linted files to analyze
2 @fileExtensions Types of files to analyze
Attributes
passRatePercent :lvalue
Total number of passes as a percentage of all input files
timestamp :lvalue
Timestamp of report
numberOfProjects :lvalue
Number of projects defined - each project can contain zero or more files
numberOfFiles :lvalue
Number of files encountered
failingFiles :lvalue
Array of [number of errors, project, files] ordered from least to most errors
projects :lvalue
Hash of "project name"=>[project name, pass, fail, total, percent pass]
print :lvalue
A printable report of the above
Index
searchDirectoryTreeForMatchingFiles
Installation
This module is written in 100% Pure Perl and is thus easy to read, use, modify and install.
Standard Module::Build process for building and installing modules:
perl Build.PL
./Build
./Build test
./Build install
Author
philiprbrenan@gmail.com
http://www.appaapps.com
Copyright
Copyright (c) 2016-2017 Philip R Brenan.
This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.