Fix a file by moving its hrefs and conrefs to the xtrf attribute unless deguidization is in effect and the guid can be converted into a valid Dita reference accessing a file in the input corpus.

If fixRelocatedRefs is in effect, such references are fixed by assuming that the files mentioned in broken links have been relocated elsewhere in the folder structure and can be located by base file name alone.

If fixXrefsByTitle is in effect, apply the Gearhart Title Method: fix broken xrefs by looking for topics whose title text matches the content of the xref.

Only files that contain something that needs fixing are parsed and fixed, which saves the time that would otherwise be spent processing files that need no work.

Report on hrefs that have been guidized and mark them for fixing. The reasons we do not fix them here are:

- we do not have access to a parse tree in which to fix them
- the caller might not want them fixed
- the caller might want to choose the fixing strategy.

Thus this report merely identifies hrefs with guids in them, in line with Xref's initial goal of reporting the state of play, while the question of actually improving the situation is deferred until later.

References might need fixing either because they are invalid or because we are deguidizing.

Name

Data::Edit::Xml::Xref - Cross reference Dita XML, match topics and ameliorate missing references.

Synopsis

Check the references in a large corpus of Dita XML documents held in folder inputFolder, running processes in parallel wherever possible to take advantage of multi-cpu computers:

use Data::Edit::Xml::Xref;

my $x = xref(inputFolder              => q(in),
             maximumNumberOfProcesses => 512,
             relativePath             => q(out),
             fixBadRefs               => 1,
             flattenFolder            => q(out2),
             matchTopics              => 0.9,
            );

The cross reference analysis can be requested as a status line:

ok nws($x->statusLine) eq nws(<<END);
Xref: 108 references fixed, 50 bad xrefs, 16 missing image files, 16 missing image references, 13 bad first lines, 13 bad second lines, 9 bad conrefs, 9 duplicate topic ids, 9 files with bad conrefs, 9 files with bad xrefs, 8 duplicate ids, 6 bad topicrefs, 6 files not referenced, 4 invalid guid hrefs, 2 bad book maps, 2 bad tables, 1 External xrefs with no format=html, 1 External xrefs with no scope=external, 1 file failed to parse, 1 href missing
END

Or as a tabular report:

ok nws($x->statusTable) eq nws(<<END);
Xref:
    Count  Condition
 1    108  references fixed
 2     50  bad xrefs
 3     16  missing image files
 4     16  missing image references
 5     13  bad first lines
 6     13  bad second lines
 7      9  files with bad conrefs
 8      9  bad conrefs
 9      9  files with bad xrefs
10      9  duplicate topic ids
11      8  duplicate ids
12      6  bad topicrefs
13      6  files not referenced
14      4  invalid guid hrefs
15      2  bad book maps
16      2  bad tables
17      1  href missing
18      1  file failed to parse
19      1  External xrefs with no format=html
20      1  External xrefs with no scope=external
END

More detailed reports are produced in the reports folder:

$x->reports

and indexed by the reports report:

reports/reports.txt

which contains a list of all the reports generated:

    Rows  Title                                                           File
 1     5  Attributes                                                      reports/count/attributes.txt
 2    13  Bad Xml line 1                                                  reports/bad/xmlLine1.txt
 3    13  Bad Xml line 2                                                  reports/bad/xmlLine2.txt
 4     9  Bad conRefs                                                     reports/bad/ConRefs.txt
 5     2  Bad external xrefs                                              reports/bad/externalXrefs.txt
 6    16  Bad image references                                            reports/bad/imageRefs.txt
 7     9  Bad topicrefs                                                   reports/bad/topicRefs.txt
 8    50  Bad xRefs                                                       reports/bad/XRefs.txt
 9     2  Bookmaps with errors                                            reports/bad/bookMap.txt
10     2  Document types                                                  reports/count/docTypes.txt
11     8  Duplicate id definitions within files                           reports/bad/idDefinitionsDuplicated.txt
12     3  Duplicate topic id definitions                                  reports/bad/topicIdDefinitionsDuplicated.txt
13     3  File extensions                                                 reports/count/fileExtensions.txt
14     1  Files failed to parse                                           reports/bad/parseFailed.txt
15     0  Files types                                                     reports/count/fileTypes.txt
16    16  Files whose short names are bi-jective with their md5 sums      reports/good/shortNameToMd5Sum.txt
17     0  Files whose short names are not bi-jective with their md5 sums  reports/bad/shortNameToMd5Sum.txt
18   108  Fixes Applied To Failing References                             reports/lists/referencesFixed.txt
19     0  Good bookmaps                                                   reports/good/bookMap.txt
20     9  Good conRefs                                                    reports/good/ConRefs.txt
21     5  Good topicrefs                                                  reports/good/topicRefs.txt
22     8  Good xRefs                                                      reports/good/XRefs.txt
23     1  Guid topic definitions                                          reports/lists/guidsToFiles.txt
24     2  Image files                                                     reports/good/imagesFound.txt
25     1  Missing hrefs                                                   reports/bad/missingHrefAttributes.txt
26    16  Missing image references                                        reports/bad/imagesMissing.txt
27     4  Possible improvements                                           reports/improvements.txt
28     2  Resolved GUID hrefs                                             reports/good/guidHrefs.txt
29     2  Tables with errors                                              reports/bad/tables.txt
30    23  Tags                                                            reports/count/tags.txt
31    11  Topic Reuses                                                    reports/lists/topicReuse.txt
32     0  Topic Reuses                                                    reports/lists/similar/byTitle.txt
33    16  Topics                                                          reports/lists/topics.txt
34    15  Topics with similar vocabulary                                  reports/lists/similar/byVocabulary.txt
35     0  Topics with validation errors                                   reports/bad/validationErrors.txt
36     0  Topics without ids                                              reports/bad/topicIdDefinitionsMissing.txt
37     6  Unreferenced files                                              reports/bad/notReferenced.txt
38    11  Unresolved GUID hrefs                                           reports/bad/guidHrefs.txt

File names in reports can be made relative to a specified directory named on the:

relativePath => q(out)

attribute.

Add navigation titles to topic references

If requested, Xref will create or update the navigation titles (navtitles) of topic references (appendix, chapter, topicref) in maps, for references made by file name and for references made by GUID:

addNavTitles => 1

Reports of successful updates will be written to:

reports/good/navTitles.txt

Reports of unsuccessful updates will be written to:

reports/bad/navTitles.txt

Fix bad references

It is often desirable to ameliorate unresolved Dita href attributes so that incomplete content can be loaded into a content management system. The:

fixBadRefs => 1

attribute requests that the:

conref and href

attributes be renamed to:

xtrf

if the conref or href attribute specification cannot be resolved in the current corpus.
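
The renaming described above can be illustrated with a minimal sketch. This is not Xref's implementation: the function name moveBadRefsToXtrf and the caller-supplied resolver are hypothetical, and a regular expression is used here only for brevity where Xref works on parsed XML.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Xref: rename each href or conref
# attribute whose target cannot be resolved to xtrf.
# $resolves is a caller-supplied code ref that returns true for a
# reference that is valid in the corpus.
sub moveBadRefsToXtrf {
  my ($xml, $resolves) = @_;
  $xml =~ s{\b(href|conref)="([^"]*)"}{
    my ($name, $value) = ($1, $2);          # copy before calling out
    ($resolves->($value) ? $name : 'xtrf') . qq(="$value")
  }ge;
  return $xml;
}

# Assumed corpus: only a.dita exists.
my %corpus = (q(a.dita) => 1);
print moveBadRefsToXtrf(
  q(<p><xref href="a.dita"/><xref href="missing.dita"/></p>),
  sub { $corpus{$_[0]} }), "\n";
```

An element whose href and conref both fail would end up with two xtrf attributes under this sketch, so a real implementation has to handle that case separately.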

If the fixedFolder attribute is set, the fixed files are written into this folder, else they are written back into the inputFolder. Two reports are generated by this action:

reports/bad/fixedRefs.txt

reports/bad/fixedRefsNoAction.txt

This feature was designed by mim@cpan.org.

Deguidize

Some content management systems identify content by guid, others by file name. When moving from a guid-based to a file-name-based content management system it might be necessary to replace the guids representing file names with the actual underlying file names. If the

deguidize => 1

parameter is set, Xref will replace any such file guids with the underlying file name, provided that the file is present in the content being cross referenced.
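
As a rough sketch of the idea, assuming references of the form g1#g2/id where g2 is the topic id of the target (the form given in the description of the deguidize attribute), the file guid g1 can be replaced by looking up g2 in a map from topic ids to files. The function name deguidizeRef and the sample data are hypothetical; leaving references without a topic id unchanged is a simplification made here, not a statement about Xref's behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: convert g1#g2/id to file#g2/id using a hash
# of {topic id which is a guid} = file defining that topic id.
sub deguidizeRef {
  my ($href, $guidToFile) = @_;
  my ($g1, $rest) = split /#/, $href, 2;
  return $href unless defined $rest and length $rest;  # no topic id
  my ($g2) = split m{/}, $rest;                         # topic id
  my $file = $guidToFile->{$g2};
  return defined $file ? "$file#$rest" : $href;         # unchanged if unknown
}

# Assumed sample data.
my %guidToFile = (q(GUID-2) => q(c_intro.dita));
print deguidizeRef(q(GUID-1#GUID-2/p1), \%guidToFile), "\n";
```

The lookup is only safe when each topic id maps to exactly one file, which is why guid uniqueness matters.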

File flattening

It is often desirable to flatten the topic files so that they can coexist in a single folder of a content management system without colliding with each other.

The presence of the input attribute:

flattenFolder=> folder-to-flatten-files-into

causes topic files to be flattened into the named folder.

Xref uses the GBStandard to generate flattened file names.

Locating relocated files

File references in conref/hrefs that have a valid base file name and an invalid path can be fixed by setting the input attribute:

fixRelocatedRefs=>1

to a true value to request Xref to replace the incorrect path to the specified base file with the correct path.

If coded in conjunction with the fixBadRefs input attribute, Xref will first try to fix any missing xrefs by relocation; any that still fail to resolve are then ameliorated by moving them to the xtrf attribute.
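
A minimal sketch of relocation by base file name follows. It assumes, as the fixRelocatedRefs attribute does, that base file names are unique in the corpus; the function name relocateRef and the sample paths are hypothetical and this is not Xref's own code.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use File::Spec;

# Hypothetical helper: given a broken href, the file containing it and
# the list of files in the corpus, return the href with its path
# corrected, or undef if the base file name is missing or ambiguous.
sub relocateRef {
  my ($href, $sourceFile, $corpus) = @_;
  my ($path, $fragment) = split /#/, $href, 2;
  my @found = grep { basename($_) eq basename($path) } @$corpus;
  return undef unless @found == 1;             # none, or not unique
  my $fixed = File::Spec->abs2rel($found[0], dirname($sourceFile));
  return defined $fragment ? "$fixed#$fragment" : $fixed;
}

# Assumed sample corpus.
my @corpus = qw(in/a.dita in/sub/b.dita);
print relocateRef(q(old/place/b.dita#b/p1), q(in/a.dita), \@corpus)
      // q(unresolved), "\n";
```

Returning undef for ambiguous names leaves such references to be handled by another strategy, for example moving them to xtrf.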

File names generated to the GB Standard, as used for file flattening, are composed of:

- the first letter of the root tag of the topic
- the title of the topic with all runs of characters not in the ranges a-z, A-Z, 0-9 reduced to a single underscore
- the MD5 sum in hexadecimal of the content of the topic.

This has the effect of sorting files by their root tags and titles while guaranteeing a unique name for the topic that depends only on its content.

If the content of two such files is identical then they will have identical file names, because the generated file name depends only on the content of the topic. Conversely, if two topic files have the same name under this naming system then they have identical content and only one file is needed to hold the topic in a content management system.
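
The three name components described above can be combined as in the following sketch. The function name gbStyleName is hypothetical, and the underscore separator is an assumption based on the sample names shown in the title matching report (for example c_Notices_ followed by an MD5 sum). Dita::GB::Standard is the authoritative implementation; this sketch uses regular expressions rather than a parse tree.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: build a content-derived name from the first
# letter of the root tag, the squeezed title and the MD5 sum.
sub gbStyleName {
  my ($xml) = @_;
  my ($letter) = $xml =~ m{<([A-Za-z])};        # skips <?xml and <!DOCTYPE
  my ($title)  = $xml =~ m{<title\b[^>]*>(.*?)</title>}s;
  return undef unless defined $letter and defined $title;
  $title =~ s{<[^>]+>}{}g;                      # drop inline markup
  $title =~ s/[^a-zA-Z0-9]+/_/g;                # runs of other characters
  return join '_', $letter, $title, md5_hex(encode_utf8($xml));
}

print gbStyleName(q(<concept id="c1"><title>Notices</title></concept>)), "\n";
```

Because the MD5 sum is taken over the whole topic, two topics receive the same name only if their content is identical.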

Fix Xrefs by Title

Xrefs with broken or missing hrefs can sometimes be fixed by matching the text content of the xref with the titles of topics. If:

fixXrefsByTitle => 1

is specified, Xref will locate possible targets for a broken href by matching the white space normalized text content of the xref (see Data::Table::Text::nws) with the similarly normalized title of each topic. If exactly one matching candidate is located then it will be used to update the href attribute of the xref.
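
The matching rule can be sketched as follows. The helper normalizeWhiteSpace is a local stand-in written for this example rather than a call to Data::Table::Text::nws, and targetByTitle is a hypothetical name; the {title}{file} shape of the lookup follows the titleToFile output field.

```perl
use strict;
use warnings;

# Local stand-in for white space normalization: trim, then squeeze.
sub normalizeWhiteSpace {
  my ($s) = @_;
  $s =~ s/\A\s+|\s+\z//g;
  $s =~ s/\s+/ /g;
  return $s;
}

# Hypothetical helper: return the single file whose title matches the
# text of the xref, or undef if there is no match or several matches.
sub targetByTitle {
  my ($xrefText, $titleToFile) = @_;             # {title}{file}
  my $files = $titleToFile->{normalizeWhiteSpace($xrefText)};
  return undef unless $files;
  my @files = sort keys %$files;
  return @files == 1 ? $files[0] : undef;
}

# Assumed sample data keyed by normalized title.
my %titleToFile = (q(Installing the product) => {q(in/install.dita) => 1});
print targetByTitle(qq(Installing   the\n product ), \%titleToFile)
      // q(no unique match), "\n";
```

Requiring a single candidate avoids guessing between topics that share a title.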

Topic Matching

Topics can be matched on title and vocabulary to assist authors in finding similar topics by specifying the:

matchTopics => 0.9

attribute where the value of this attribute is the confidence level between 0 and 1.

Topic matching might take some time for large input folders.

Title matching

Title matching sorts topics by their titles so that topics with similar titles can be easily located:

    Similar  Prefix        Source
 1       14  c_Notices__   c_Notices_5614e96c7a3eaf3dfefc4a455398361b
 2           c_Notices__   c_Notices_14a9f467215dea879d417de884c21e6d
 3           c_Notices__   c_Notices_19011759a2f768d76581dc3bba170a44
 4           c_Notices__   c_Notices_aa741e6223e6cf8bc1a5ebdcf0ba867c
 5           c_Notices__   c_Notices_f0009b28c3c273094efded5fac32b83f
 6           c_Notices__   c_Notices_b1480ac1af812da3945239271c579bb1
 7           c_Notices__   c_Notices_5f3aa15d024f0b6068bd8072d4942f6d
 8           c_Notices__   c_Notices_17c1f39e8d70c765e1fbb6c495bedb03
 9           c_Notices__   c_Notices_7ea35477554f979b3045feb369b69359
10           c_Notices__   c_Notices_4f200259663703065d247b35d5500e0e
11           c_Notices__   c_Notices_e3f2eb03c23491c5e96b08424322e423
12           c_Notices__   c_Notices_06b7e9b0329740fc2b50fedfecbc5a94
13           c_Notices__   c_Notices_550a0d84dfc94982343f58f84d1c11c2
14           c_Notices__   c_Notices_fa7e563d8153668db9ed098d0fe6357b
15        3  c_Overview__  c_Overview_f9e554ee9be499368841260344815f58
16           c_Overview__  c_Overview_f234dc10ea3f4229d0e1ab4ad5e8f5fe
17           c_Overview__  c_Overview_96121d7bcd41cf8be318b96da0049e73
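
One way to produce such groups, shown here only as a sketch, is to strip the trailing MD5 sum from each flattened file name and collect the names that share the remaining prefix. The function name groupByTitlePrefix is hypothetical and the exact prefix format used in Xref's report is not reproduced.

```perl
use strict;
use warnings;

# Hypothetical helper: group flattened file names by the part of the
# name that precedes the 32 hexadecimal digit MD5 sum and return the
# groups with more than one member as [prefix, [names]].
sub groupByTitlePrefix {
  my (@names) = @_;
  my %groups;
  for my $name (@names) {
    (my $prefix = $name) =~ s/_[0-9a-f]{32}\z//;
    push @{$groups{$prefix}}, $name;
  }
  return map  { [$_, $groups{$_}] }
         grep { @{$groups{$_}} > 1 }
         sort keys %groups;
}

for my $group (groupByTitlePrefix(
    q(c_Notices_5614e96c7a3eaf3dfefc4a455398361b),
    q(c_Notices_14a9f467215dea879d417de884c21e6d),
    q(c_Overview_f9e554ee9be499368841260344815f58))) {
  my ($prefix, $names) = @$group;
  print scalar(@$names), " $prefix\n";
}
```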

Vocabulary matching

Vocabulary matching compares the vocabulary of pairs of topics: topics with similar vocabularies within the confidence level specified are reported together:

    Similar  Topic
 1        8  in/1.dita
 2           in/2.dita
 3           in/3.dita
 4           in/4.dita
 5           in/5.dita
 6           in/6.dita
 7           in/7.dita
 8           in/8.dita
 9
10        2  in/map/bookmap.ditamap
11           in/map/bookmap2.ditamap
12
13        2  in/act4.dita
14           in/act5.dita
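
The documentation does not state how vocabulary similarity is measured. Purely as an illustration, the sketch below uses Jaccard similarity (shared words divided by total distinct words) and compares the result with a threshold such as the matchTopics value. Both function names are hypothetical and the metric is an assumption, not necessarily the one Xref uses.

```perl
use strict;
use warnings;

# Hypothetical helper: the set of distinct lower-cased words in a text.
sub vocabularyOf {
  my ($text) = @_;
  my %words = map { lc($_) => 1 } $text =~ m/(\w+)/g;
  return \%words;
}

# Jaccard similarity of two word sets: 0 is disjoint, 1 is identical.
sub vocabularySimilarity {
  my ($x, $y) = @_;
  my $common = grep { $y->{$_} } keys %$x;
  my $total  = keys(%$x) + keys(%$y) - $common;
  return $total ? $common / $total : 1;
}

my $confidence = 0.9;                           # as in matchTopics => 0.9
my $s = vocabularySimilarity(vocabularyOf(q(Install the product now)),
                             vocabularyOf(q(install the product NOW)));
print $s >= $confidence ? "similar\n" : "different\n";
```

Comparing every pair of topics grows quadratically with the number of topics, which is consistent with the warning that topic matching might take some time on large input folders.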

Description

Cross reference Dita XML, match topics and ameliorate missing references.

Version 20190524.

The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.

Cross reference

Check the cross references in a set of Dita files and report the results.

xref(%)

Check the cross references in a set of Dita files held in inputFolder and report the results in the reports folder. The possible attributes are defined in Data::Edit::Xml::Xref.

   Parameter    Description
1  %attributes  Attributes

Example:

if (1) {
  clearFolder($_, 420) for qw(in out reports);
  createSampleInputFiles(8);
  my $x = xref(inputFolder              => q(in),
               deguidize                => 1,
               fixBadRefs               => 1,
               fixRelocatedRefs         => 1,
               maximumNumberOfProcesses => 2,
               matchTopics              => 0.9,
               flattenFolder            => q(out),
               relativePath             => q(in));

  ok nws($x->statusLine) eq nws(<<'END');
Xref: 103 xtfr, 50 bad xrefs, 18 missing image files, 18 missing image references, 14 bad first lines, 14 bad second lines, 12 duplicate topic ids, 11 bad conrefs, 9 files with bad conrefs, 9 files with bad xrefs, 8 duplicate ids, 6 bad topicrefs, 4 invalid guid hrefs, 3 bad book maps, 3 files not referenced, 2 bad tables, 2 href url encoding, 1 External xrefs with no format=html, 1 External xrefs with no scope=external, 1 file failed to parse, 1 href missing
END

  say STDERR $x->statusTable;

  is_deeply $x->relocatedReferencesFixed,
[["map/bookmap.ditamap",
  "bookmap.ditamap",
  "/home/phil/perl/cpan/DataEditXmlXref/lib/Data/Edit/Xml/in/act2.dita",
 ],
 ["map/bookmap2.ditamap",
  "bookmap2.ditamap",
  "/home/phil/perl/cpan/DataEditXmlXref/lib/Data/Edit/Xml/in/act2.dita",
 ],
];
 }

Data::Edit::Xml::Xref Definition

Attributes used by the Xref cross referencer.

Input fields

debugTimes - Write timing information if true

deguidize - Set true to replace guids in dita references with file names. Given reference g1#g2/id, convert g1 to a file name by locating the topic with topicId g2. This requires the guids to be genuinely unique. SDL guids are thought to be unique within a language code, but the same topic translated to a different language might well have the same guid as the original topic with a different language code: =(de|en|es|fr). If the source is in just one language then guid uniqueness is a reasonable assumption. If the conversion can be done in phases by language then the uniqueness of guids is again reasonably assured. Data::Edit::Xml::Lint provides an alternative solution to deguidizing by using labels to record the dita reference in the input corpus for each id encountered; these references can then be resolved in the usual manner by Data::Edit::Xml::Lint::relint.

fixBadRefs - Try to fix bad references in these files where possible, either by changing a guid to a file name (assuming the right file is present in the corpus being scanned and deguidize has been set true) or, failing that, by moving the failing reference to the "xtrf" attribute.

fixRelocatedRefs - Fix references to topics that have been moved around in the out folder structure assuming that all file names are unique.

fixXrefsByTitle - Try to fix invalid xrefs by the Gearhart Title Method if true

flattenFolder - Files are renamed to the Gearhart standard and placed in this folder if set. References to the unflattened files are updated to references to the flattened files. This option will eventually be deprecated as the Dita::GB::Standard is now fully available allowing files to be easily flattened before being processed by Xref.

inputFolder - A folder containing the dita and ditamap files to be cross referenced.

matchTopics - Match topics by title and by vocabulary to the specified confidence level between 0 and 1. This operation might take some time to complete on a large corpus.

maxZoomIn - Optional hash of names to regular expressions to look for in each file

maximumNumberOfProcesses - Maximum number of processes to run in parallel at any one time.

relativePath - Report files relative to this path or absolutely if undefined.

reports - Reports folder: the cross referencer will write reports to files in this folder.

summary - Print the summary line.

Output fields

addNavTitles - If true, add navtitle to topicrefs to show the title of the target

attributeCount - {file}{attribute name} == count of the different xml attributes found in the xml files.

attributeNamesAndValuesCount - {file}{attribute name}{value} = count

author - {file} = author of this file.

badBookMaps - Bad book maps.

badConRefs - {sourceFile} = [file, href] indicating the file has at least one bad conref.

badConRefsList - Bad conrefs - by file.

badGuidHrefs - Bad conrefs - all.

badImageRefs - Consolidated images missing.

badNavTitles - Details of nav titles that were not resolved

badTables - Array of tables that need fixing.

badTopicRefs - [file, href] Invalid href attributes found on topicref tags.

badXRefs - Bad Xrefs - by file

badXRefsList - Bad Xrefs - all

badXml1 - [Files] with a bad xml encoding header on the first line.

badXml2 - [Files] with a bad xml doc type on the second line.

baseTag - Base Tag for each file

conRefs - {file}{href} Count of conref definitions in each file.

docType - {file} == docType: the docType for each xml file.

duplicateIds - [file, id] Duplicate id definitions within each file.

duplicateTopicIds - [topicId, [files]] Files with duplicate topic ids - the id on the outermost tag.

fileExtensions - Default file extensions to load

fixRefs - {file}{ref} where the href or conref target is not valid.

fixedFolder - Fixed files are placed in this folder if fixBadRefs has been specified.

fixedRefs - [] hrefs and conrefs from fixRefs

fixedRefsFailed - [] hrefs and conrefs from fixRefs

fixedRefsGB - [] files fixed to the Gearhart-Brenan file naming standard

fixedRefsNoAction - [] hrefs and conrefs from fixRefs

flattenFiles - {old full file name} = file renamed to Gearhart-Brenan file naming standard

goodBookMaps - Good book maps.

goodConRefs - Good con refs - by file.

goodConRefsList - Good con refs - all.

goodGuidHrefs - {file}{href}{location}++ where a href that starts with GUID- has been correctly resolved.

goodImageRefs - Consolidated images found.

goodNavTitles - Details of nav titles that were resolved

goodTopicRefs - Good topic refs.

goodXRefs - Good xrefs - by file.

goodXRefsList - Good xrefs - all.

guidHrefs - {file}{href} = location where href starts with GUID- and is thus probably a guid.

guidToFile - {topic id which is a guid} = file defining topic id.

hrefUrlEncoding - Hrefs that need url encoding because they contain white space

ids - {file}{id} Id definitions across all files.

images - {file}{href} Count of image references in each file.

imagesReferencedFromBookMaps - {bookmap full file name}{full name of image referenced from topic referenced from bookmap}++

imagesReferencedFromTopics - {topic full file name}{full name of image referenced from topic}++

improvements - Suggested improvements - a list of improvements that might be made.

inputFiles - Input files from inputFolder.

inputFolderImages - {full image file name} for all files in input folder thus including any images present

ltgt - {text between &lt; and &gt;}{filename} = count giving the count of text items found between &lt; and &gt;

maxZoomOut - Results from maxZoomIn where {file name}{regular expression key name in maxZoomIn}++

md5Sum - MD5 sum for each input file.

missingImageFiles - [file, href] == Missing images in each file.

missingTopicIds - Missing topic ids.

noHref - Tags that should have an href but do not have one.

notReferenced - {file name} Files in input area that are not referenced by a conref, image, topicref or xref tag and are not a bookmap.

olBody - The number of ol under body by file

parseFailed - {file} files that failed to parse.

relocatedReferencesFailed - Failing references that were not fixed by relocation

relocatedReferencesFixed - Relocated references fixed

results - Summary of results table.

sourceFile - The source file from which this structure was generated.

statusLine - Status line summarizing the cross reference.

statusTable - Status table summarizing the cross reference.

tagCount - {file}{tags} == count of the different tag names found in the xml files.

title - {file} = title of file.

titleToFile - {title}{file}++ if fixXrefsByTitle is in effect

topicIds - {file} = topic id - the id on the outermost tag.

topicRefs - {bookmap full file name}{href}{navTitle}++ References from bookmaps to topics via appendix, chapter, topicref.

topicsReferencedFromBookMaps - {bookmap file, file name}{topic full file name}++

validationErrors - True means that Lint detected errors in the xml contained in the file.

vocabulary - The text of each topic shorn of attributes for vocabulary comparison.

xRefs - {file}{href}++ Xrefs references.

xrefBadFormat - External xrefs with no format=html.

xrefBadScope - External xrefs with no scope=external.

Attributes

The following is a list of all the attributes in this package. A method coded with the same name in your package will override the method of the same name in this package and thus provide your value for the attribute in place of the default value supplied for this attribute by this package.

Replaceable Attribute List

improvementLength

improvementLength

Improvement length

Private Methods

countLevels($$)

Count hash elements to the specified number of levels

   Parameter  Description
1  $l         Levels
2  $h         Hash

loadInputFiles($)

Load the names of the files to be processed

   Parameter  Description
1  $xref      Cross referencer

analyzeOneFile($$)

Analyze one input file

   Parameter  Description
1  $Xref      Xref request
2  $iFile     File to analyze

reportGuidsToFiles($)

Map and report guids to files

   Parameter  Description
1  $xref      Xref results

editXml($$$)

Edit an xml file

   Parameter  Description
1  $in        Input file
2  $out       Output file
3  $x         Parse tree

fixOneFile($$)

Fix one file by moving unresolved references to the xtrf attribute

   Parameter  Description
1  $xref      Xref results
2  $file      File to fix

fixFiles($)

Fix files by moving unresolved references to the xtrf attribute if no other solution is available

   Parameter  Description
1  $xref      Xref results

fixOneFileGB($$)

Fix one file to the Gearhart-Brenan standard

   Parameter  Description
1  $xref      Xref results
2  $file      File to fix

fixFilesGB($)

Rename files to the Gearhart-Brenan standard

   Parameter  Description
1  $xref      Xref results

analyze($)

Analyze the input files

   Parameter  Description
1  $xref      Cross referencer

reportDuplicateIds($)

Report duplicate ids

   Parameter  Description
1  $xref      Cross referencer

reportDuplicateTopicIds($)

Report duplicate topic ids

   Parameter  Description
1  $xref      Cross referencer

reportNoHrefs($)

Report locations where an href was expected but not found

   Parameter  Description
1  $xref      Cross referencer

reportRefs($$)

Report bad references found in xrefs or conrefs as they have the same structure

   Parameter  Description
1  $xref      Cross referencer
2  $type      Type of reference to be processed

reportGuidHrefs($)

Report on guid hrefs

   Parameter  Description
1  $xref      Cross referencer

reportXrefs($)

Report bad xrefs

   Parameter  Description
1  $xref      Cross referencer

reportTopicRefs($)

Report topic refs

   Parameter  Description
1  $xref      Cross referencer

reportConrefs($)

Report bad conrefs

   Parameter  Description
1  $xref      Cross referencer

reportImages($)

Reports on images and references to images

   Parameter  Description
1  $xref      Cross referencer

reportParseFailed($)

Report failed parses

   Parameter  Description
1  $xref      Cross referencer

reportXml1($)

Report bad xml on line 1

   Parameter  Description
1  $xref      Cross referencer

reportXml2($)

Report bad xml on line 2

   Parameter  Description
1  $xref      Cross referencer

reportDocTypeCount($)

Report doc type count

   Parameter  Description
1  $xref      Cross referencer

reportTagCount($)

Report tag counts

   Parameter  Description
1  $xref      Cross referencer

reportLtGt($)

Report items found between &lt; and &gt;

   Parameter  Description
1  $xref      Cross referencer

reportAttributeCount($)

Report attribute counts

   Parameter  Description
1  $xref      Cross referencer

reportAttributeNamesAndValuesCount($)

Report attribute value counts

   Parameter  Description
1  $xref      Cross referencer

reportValidationErrors($)

Report the files known to have validation errors

   Parameter  Description
1  $xref      Cross referencer

checkBookMap($$)

Check whether a bookmap is valid or not

   Parameter  Description
1  $xref      Cross referencer
2  $bookMap   Bookmap

reportBookMaps($)

Report on whether each bookmap is good or bad

   Parameter  Description
1  $xref      Cross referencer

reportTables($)

Report on tables that have problems

   Parameter  Description
1  $xref      Cross referencer

reportFileExtensionCount($)

Report file extension counts

   Parameter  Description
1  $xref      Cross referencer

reportFileTypes($)

Report file type counts - takes too long in series

   Parameter  Description
1  $xref      Cross referencer

reportNotReferenced($)

Report files not referenced by any of conref, image, topicref, xref and are not bookmaps.

   Parameter  Description
1  $xref      Cross referencer

reportExternalXrefs($)

Report external xrefs missing other attributes

   Parameter  Description
1  $xref      Cross referencer

reportPossibleImprovements($)

Report improvements possible

   Parameter  Description
1  $xref      Cross referencer

reportMaxZoomOut($)

Text located via Max Zoom In

   Parameter  Description
1  $xref      Cross referencer

reportTopicDetails($)

Things that occur once in each file

   Parameter  Description
1  $xref      Cross referencer

reportTopicReuse($)

Count how frequently each topic is reused

   Parameter  Description
1  $xref      Cross referencer

reportFixRefs($)

Report of hrefs that need to be fixed

   Parameter  Description
1  $xref      Cross referencer

reportReferencesFromBookMaps($)

Topics and images referenced from bookmaps

   Parameter  Description
1  $xref      Cross referencer

reportSimilarTopicsByTitle($)

Report topics likely to be similar on the basis of their titles as expressed in the non Guid part of their file names

   Parameter  Description
1  $xref      Cross referencer

reportSimilarTopicsByVocabulary($)

Report topics likely to be similar on the basis of their vocabulary

   Parameter  Description
1  $xref      Cross referencer

reportMd5Sum($)

Good files have short names which uniquely represent their content and thus can be used instead of their md5sum to generate unique names

   Parameter  Description
1  $xref      Cross referencer

reportOlBody($)

ol under body - indicative of a task

   Parameter  Description
1  $xref      Cross referencer

reportHrefUrlEncoding($)

href needs url encoding

   Parameter  Description
1  $xref      Cross referencer

addNavTitlesToOneMap($$)

Fix navtitles in one map

   Parameter  Description
1  $xref      Xref results
2  $file      File to fix

addNavTitlesToMaps($)

Add nav titles to files containing maps.

   Parameter  Description
1  $xref      Xref results

createSampleInputFiles($)

Create sample input files for testing. The attribute inputFolder supplies the name of the folder in which to create the sample files.

   Parameter  Description
1  $N         Number of sample files

createSampleInputFilesFixFolder($)

Create sample input files for testing fixFolder

   Parameter  Description
1  $in        Folder to create the files in

createSampleInputFilesLtGt($)

Create sample input files for testing items between &lt; and &gt;

   Parameter  Description
1  $in        Folder to create the files in

Index

1 addNavTitlesToMaps - Add nav titles to files containing maps.

2 addNavTitlesToOneMap - Fix navtitles in one map

3 analyze - Analyze the input files

4 analyzeOneFile - Analyze one input file

5 checkBookMap - Check whether a bookmap is valid or not

6 countLevels - Count hash elements to the specified number of levels

7 createSampleInputFiles - Create sample input files for testing.

8 createSampleInputFilesFixFolder - Create sample input files for testing fixFolder

9 createSampleInputFilesLtGt - Create sample input files for testing items between &lt; and &gt;

10 editXml - Edit an xml file

11 fixFiles - Fix files by moving unresolved references to the xtrf attribute if no other solution is available

12 fixFilesGB - Rename files to the Gearhart-Brenan standard

13 fixOneFile - Fix one file by moving unresolved references to the xtrf attribute

14 fixOneFileGB - Fix one file to the Gearhart-Brenan standard

15 loadInputFiles - Load the names of the files to be processed

16 reportAttributeCount - Report attribute counts

17 reportAttributeNamesAndValuesCount - Report attribute value counts

18 reportBookMaps - Report on whether each bookmap is good or bad

19 reportConrefs - Report bad conrefs

20 reportDocTypeCount - Report doc type count

21 reportDuplicateIds - Report duplicate ids

22 reportDuplicateTopicIds - Report duplicate topic ids

23 reportExternalXrefs - Report external xrefs missing other attributes

24 reportFileExtensionCount - Report file extension counts

25 reportFileTypes - Report file type counts - takes too long in series

26 reportFixRefs - Report of hrefs that need to be fixed

27 reportGuidHrefs - Report on guid hrefs

28 reportGuidsToFiles - Map and report guids to files

29 reportHrefUrlEncoding - href needs url encoding

30 reportImages - Reports on images and references to images

31 reportLtGt - Report items found between &lt; and &gt;

32 reportMaxZoomOut - Text located via Max Zoom In

33 reportMd5Sum - Good files have short names which uniquely represent their content and thus can be used instead of their md5sum to generate unique names

34 reportNoHrefs - Report locations where an href was expected but not found

35 reportNotReferenced - Report files not referenced by any of conref, image, topicref, xref and are not bookmaps.

36 reportOlBody - ol under body - indicative of a task

37 reportParseFailed - Report failed parses

38 reportPossibleImprovements - Report improvements possible

39 reportReferencesFromBookMaps - Topics and images referenced from bookmaps

40 reportRefs - Report bad references found in xrefs or conrefs as they have the same structure

41 reportSimilarTopicsByTitle - Report topics likely to be similar on the basis of their titles as expressed in the non Guid part of their file names

42 reportSimilarTopicsByVocabulary - Report topics likely to be similar on the basis of their vocabulary

43 reportTables - Report on tables that have problems

44 reportTagCount - Report tag counts

45 reportTopicDetails - Things that occur once in each file

46 reportTopicRefs - Report topic refs

47 reportTopicReuse - Count how frequently each topic is reused

48 reportValidationErrors - Report the files known to have validation errors

49 reportXml1 - Report bad xml on line 1

50 reportXml2 - Report bad xml on line 2

51 reportXrefs - Report bad xrefs

52 xref - Check the cross references in a set of Dita files held in inputFolder and report the results in the reports folder.

Installation

This module is written in 100% Pure Perl and, thus, it is easy to read, comprehend, use, modify and install via cpan:

sudo cpan install Data::Edit::Xml::Xref

Author

philiprbrenan@gmail.com

http://www.appaapps.com

Copyright

Copyright (c) 2016-2019 Philip R Brenan.

This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.
