NAME
TAP::DOM - TAP as Document Object Model.
SYNOPSIS
# Create a DOM from TAP
use TAP::DOM;
use Data::Dumper;
my $tapdom = TAP::DOM->new( tap => $tap ); # same options as TAP::Parser
print Dumper($tapdom);
# Recreate TAP from DOM
my $tap2 = $tapdom->to_tap;
DESCRIPTION
The purpose of this module is A) to define a reliable data structure (a DOM), B) to create a DOM from TAP, and C) to recreate TAP from a DOM.
That is useful when you want to analyze the TAP in detail with "data exploration tools", like Data::DPath.
"Reliable" means that this structure is a kind of API that will not change, so your data tools can, well, rely on it.
METHODS
new
Constructor which immediately triggers parsing the TAP via TAP::Parser and returns a big data structure containing the extracted results.
Synopsis
use TAP::DOM;
my $tap;
{
    local $/;    # slurp mode: read the whole file at once
    open (my $fh, '<', 't/some_tap.txt') or die "Cannot open t/some_tap.txt: $!";
    $tap = <$fh>;
    close $fh;
}
my $tapdata = TAP::DOM->new (
    tap => $tap,
    disable_global_kv_data => 1,
    put_dangling_kv_data_under_lazy_plan => 1,
    ignorelines => '(## |# Test-mymeta_)',
    dontignorelines => '# Test-mymeta_(tool1|tool2)_',
    ignoreunknown => 1,
    preprocess_ignorelines => 1,
    preprocess_tap => 1,
    usebitsets => 0,
    ignore => ['as_string'], # keep 'raw' which is the unmodified variant
    document_data_prefix => '(MyApp|Test)-',
    lowercase_fieldnames => 1,
    trim_fieldvalues => 1,
);
Arguments
- ignore
Arrayref of field names to leave out of the generated TAP::DOM. For example, you can skip the as_string field, which is often a redundant variant of raw.
- ignorelines
A regular expression describing lines to ignore.
Be careful not to drop semantically relevant lines, like indented YAML data.
The regex is internally prepended with a start-of-line ^ anchor.
- dontignorelines (EXPERIMENTAL!)
This is a whitelist of lines that should not be skipped when using the ignorelines blacklist.
The dontignorelines feature is HIGHLY EXPERIMENTAL, in particular in combination with preprocess_ignorelines.
Background: the preprocessing is done in a single regex operation for speed reasons. To do that, the dontignorelines regex is turned into a zero-width negative-lookahead condition and prepended before the ignorelines condition, forming a combined regex.
Without preprocess_ignorelines it is a relatively harmless additional condition during TAP line processing.
Survival tips:
  - have unit tests for your setup
  - do not use ^ anchors in either ignorelines or dontignorelines, but rely on the implicitly prepended anchors
  - write both ignorelines and dontignorelines so that they completely describe the line from its beginning (yet without the ^ anchor)
  - alternatively, do not use it at all, and instead define ignorelines with your own zero-width negative look-around conditions
  - know the zero-width negative look-around conditions of the Perl version you use
- ignoreunknown
By default, non-TAP lines are still part of the TAP::DOM (with is_unknown=1 and most other entry fields set to undef). If you mix a lot of non-TAP lines with actual TAP lines, this can lead to a huge TAP::DOM data structure.
With this option set to 1, the unknown lines are skipped.
- usebitsets
Instead of having a lot of long boolean fields like
has_skip => 1
has_todo => 0
you can encode all of them into a compact bitset
is_has => $SOME_NUMERIC_REPRESENTATION
This field must be evaluated later with bit-comparison operators.
Originally meant as a memory-saving mechanism, it turned out not to be worth the hassle.
- disable_global_kv_data
Early TAP::DOM versions put all lines like
# Test-foo: bar
into a global hash. Later versions place these fields as children under their parent ok/not ok line, but still keep them globally for backwards compatibility. With this flag you can drop the redundant global hash.
But see also put_dangling_kv_data_under_lazy_plan.
- put_dangling_kv_data_under_lazy_plan
This addresses the question of what to do when a key/value field from a line
# Test-foo: bar
appears without a parent ok/not ok line and the global kv_data hash is disabled. When this option is set, the field is placed under the plan as its parent.
- document_data_prefix
To interpret lines like
# Test-foo: bar
the document_data_prefix is set to Test- by default, so that a key/value field foo => 'bar' is generated. However, you can provide a regular expression to accept other or multiple different values as prefixes.
- document_data_ignore
This is another regex-based way to avoid generating particular fields. The regex is matched against the already extracted keys and stops processing of a matching field for document_data and kv_data.
- lowercase_fieldnames
If set to a true value, all recognized field names are lowercased.
- lowercase_fieldvalues
If set to a true value, all recognized field values are lowercased.
- trim_fieldvalues
If set to a true value, all field values are trimmed of trailing whitespace. Note that field values have no leading whitespace, as it is already consumed after the field name separator colon (:).
All other provided parameters are passed through to TAP::Parser (for more on the TAP::DOM-specific options see also the sections "HOW TO STRIP DETAILS" and "USING BITSETS"). Usually the TAP::Parser options are just one of these:
tap => $some_tap_string
or
source => $test_file
But there are more, see TAP::Parser.
to_tap
Called on a TAP::DOM object, it returns the TAP as a string.
STRUCTURE
The data structure is basically a nested hash/array structure with keys named after the functions of TAP::Parser that you normally would use to extract results.
See the TAP example file t/some_tap.txt and its corresponding result structure in t/some_tap.dom.
Here is a slightly commented and beautified excerpt of t/some_tap.dom. Because it was manually cleaned up for readability, there might be errors in it, so for a definitive reference, dump a DOM yourself.
bless( {
# general TAP stats:
'version' => 13,
'plan' => '1..6',
'tests_planned' => 6,
'tests_run' => 8,
'is_good_plan' => 0,
'has_problems' => 2,
'skip_all' => undef,
'parse_errors' => 1,
'parse_errors_msgs' => [
'Bad plan. You planned 6 tests but ran 8.'
],
'pragmas' => [
'strict'
],
'exit' => 0,
'start_time' => '1236463400.25151',
'end_time' => '1236463400.25468',
# the used TAP::DOM specific options to TAP::DOM->new():
'tapdom_config' => {
'ignorelines' => qr/(?-xism:^## )/,
'usebitsets' => undef,
'ignore' => {}
},
# summary according to TAP::Parser::Aggregator:
'summary' => {
'status' => 'FAIL',
'total' => 8,
'passed' => 6,
'failed' => 2,
'all_passed' => 0,
'skipped' => 1,
'todo' => 4,
'todo_passed' => 2,
'parse_errors' => 1,
'has_errors' => 1,
'has_problems' => 1,
'exit' => 0,
'wait' => 0,
'elapsed' => bless( [
0,
'0',
0,
0,
0,
0
], 'Benchmark' ),
'elapsed_timestr' => ' 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)',
},
# all recognized TAP lines:
'lines' => [
{
'is_actual_ok' => 0,
'is_bailout' => 0,
'is_comment' => 0,
'is_plan' => 0,
'is_pragma' => 0,
'is_test' => 0,
'is_unknown' => 0,
'is_version' => 1, # <---
'is_yaml' => 0,
'has_skip' => 0,
'has_todo' => 0,
'severity' => 0,
'raw' => 'TAP version 13',
'as_string' => 'TAP version 13',
},
{
'is_actual_ok' => 0,
'is_bailout' => 0,
'is_comment' => 0,
'is_plan' => 1, # <---
'is_pragma' => 0,
'is_test' => 0,
'is_unknown' => 0,
'is_version' => 0,
'is_yaml' => 0,
'has_skip' => 0,
'has_todo' => 0,
'severity' => 0,
'raw' => '1..6',
'as_string' => '1..6',
},
{
'is_actual_ok' => 0,
'is_bailout' => 0,
'is_comment' => 0,
'is_ok' => 1, # <---
'is_plan' => 0,
'is_pragma' => 0,
'is_test' => 1, # <---
'is_unknown' => 0,
'is_unplanned' => 0,
'is_version' => 0,
'is_yaml' => 0,
'has_skip' => 0,
'has_todo' => 0,
'number' => '1', # <---
'severity' => 1,
'type' => 'test',
'raw' => 'ok 1 - use Data::DPath;',
'as_string' => 'ok 1 - use Data::DPath;',
'description' => '- use Data::DPath;',
'directive' => '',
'explanation' => '',
'_children' => [
# ----- children are the subsequent comment/yaml lines -----
{
'is_actual_ok' => 0,
'is_unknown' => 0,
'has_todo' => 0,
'is_bailout' => 0,
'is_pragma' => 0,
'is_version' => 0,
'is_comment' => 0,
'has_skip' => 0,
'is_test' => 0,
'is_yaml' => 1, # <---
'is_plan' => 0,
'raw' => ' ---
- name: \'Hash one\'
value: 1
- name: \'Hash two\'
value: 2
...',
'as_string' => ' ---
- name: \'Hash one\'
value: 1
- name: \'Hash two\'
value: 2
...',
'data' => [
{
'value' => '1',
'name' => 'Hash one'
},
{
'value' => '2',
'name' => 'Hash two'
}
],
}
],
},
{
'is_actual_ok' => 0,
'is_bailout' => 0,
'is_comment' => 0,
'is_ok' => 1, # <---
'is_plan' => 0,
'is_pragma' => 0,
'is_test' => 1, # <---
'is_unknown' => 0,
'is_unplanned' => 0,
'is_version' => 0,
'is_yaml' => 0,
'has_skip' => 0,
'has_todo' => 0,
'explanation' => '',
'number' => '2', # <---
'type' => 'test',
'description' => '- KEYs + PARENT',
'directive' => '',
'severity' => 1,
'raw' => 'ok 2 - KEYs + PARENT',
'as_string' => 'ok 2 - KEYs + PARENT',
},
# etc., see the rest in t/some_tap.dom ...
],
}, 'TAP::DOM') # blessed
NESTED LINES
As you can see above, diagnostic lines (comment or YAML) are nested into the preceding line under the key _children, which simply contains an array of those comment/YAML line elements.
This lets you recognize where the diagnostic lines semantically belong.
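The following sketch illustrates how _children can be used to pair each test line with its diagnostics. It walks a small hand-built hash that only imitates the shape of the excerpt above (a real TAP::DOM has many more fields, and the failing second test is invented for the example) and accesses the hash directly instead of using TAP::DOM methods.

```perl
use strict;
use warnings;

# Hand-built stand-in for a heavily reduced TAP::DOM structure;
# a real DOM carries many more fields per line.
my $dom = {
    lines => [
        { is_plan => 1, raw => '1..2' },
        {
            is_test   => 1,
            is_ok     => 1,
            number    => 1,
            raw       => 'ok 1 - use Data::DPath;',
            _children => [
                { is_yaml => 1, data => [ { name => 'Hash one', value => 1 } ] },
            ],
        },
        {
            is_test   => 1,
            is_ok     => 0,
            number    => 2,
            raw       => 'not ok 2 - hypothetical failure',
            _children => [ { is_comment => 1, raw => '#   Failed test' } ],
        },
    ],
};

my @report;
for my $line (@{ $dom->{lines} }) {
    next unless $line->{is_test};
    # _children is absent when a line has no nested diagnostics
    my @diag      = @{ $line->{_children} || [] };
    my $n_yaml    = grep { $_->{is_yaml} } @diag;       # count in scalar context
    my $n_comment = grep { $_->{is_comment} } @diag;
    push @report, sprintf 'test %d: %s (%d yaml, %d comment)',
        $line->{number}, ($line->{is_ok} ? 'ok' : 'not ok'), $n_yaml, $n_comment;
}
print "$_\n" for @report;
```

Because the diagnostics are already attached to their test line, no extra bookkeeping is needed to know which test a YAML block or comment belongs to.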
HOW TO STRIP DETAILS
You can make the DOM a bit more terse (i.e., less blown up) if you do not need every detail.
Strip unnecessary TAP-DOM fields
For this, provide the ignore option to new(). It is an array ref specifying keys that should not be contained in the TAP-DOM. Currently supported are:
has_todo
has_skip
directive
as_string
explanation
description
is_unplanned
is_actual_ok
is_bailout
is_unknown
is_version
is_comment
is_pragma
is_plan
is_test
is_yaml
is_ok
number
type
raw
Use it like this:
$tapdom = TAP::DOM->new (tap => $tap,
ignore => [ qw( raw as_string ) ],
);
Strip unnecessary lines
You can ignore complete lines from the input TAP, as if they did not exist, by setting a regular expression in ignorelines. Of course, you can break the TAP with this, so usually you only apply it to non-TAP lines or diagnostics you are not interested in.
My primary use case is TAP with large parts of log files included, prefixed with "## ", so that the TAP doubles as an archive of the log. When evaluating the TAP later, I leave those log lines out because they only blow up the memory of the TAP-DOM:
$tapdom = TAP::DOM->new (tap => $tap,
ignorelines => qr/^## /,
);
See t/some_tap_ignore_lines.t for an example.
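Conceptually, ignorelines acts like a grep over the input lines. The sketch below is not TAP::DOM's code; it only mimics the documented behaviour of prepending a ^ anchor to the given pattern and discarding every matching line, using a made-up TAP document with archived "## " log lines.

```perl
use strict;
use warnings;

my @input = (
    '1..2',
    '## 2024-01-01 12:00:00 starting service',    # archived log line
    'ok 1 - service started',
    '# diagnostics are kept',
    '## 2024-01-01 12:00:05 service log continues',
    'ok 2 - service stopped',
);
my $tap = join("\n", @input) . "\n";

my $ignorelines = '## ';               # user-supplied pattern, without ^
my $re          = qr/^$ignorelines/;   # start-of-line anchor prepended

# Keep only the lines that do not match; they never reach the DOM.
my @kept = grep { $_ !~ $re } split /\n/, $tap;
print "$_\n" for @kept;
```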
Pre-process TAP
WARNING, experimental features!
preprocess_ignorelines
By setting this option, ignorelines is applied to the input TAP text before it is parsed. This can speed up TAP parsing when there is a huge number of non-TAP lines that the regex engine can throw away faster than TAP::Parser would parse them line by line.
There is a risk: without this option, only lines that the TAP parser has already split into lines are filtered. If applied before parsing, the regex could mis-match in non-trivial situations.
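To make the single-regex preprocessing more concrete, here is a sketch of how an ignorelines blacklist and a dontignorelines whitelist can be merged into one regex, with the whitelist turned into a zero-width negative lookahead, and applied to the whole TAP text in one substitution. The option values are taken from the constructor example; the exact regex TAP::DOM builds internally may differ.

```perl
use strict;
use warnings;

# Option values as in the constructor example (without ^ anchors).
my $ignorelines     = '(## |# Test-mymeta_)';
my $dontignorelines = '# Test-mymeta_(tool1|tool2)_';

# Whitelist as zero-width negative lookahead, placed before the blacklist,
# with the start-of-line anchor prepended; /m lets ^ match at every line start.
my $combined = qr/^(?!$dontignorelines)$ignorelines/m;

my @input = (
    '1..1',
    '## some archived log line',
    '# Test-mymeta_tool1_version: 1.2',
    '# Test-mymeta_other: x',
    'ok 1',
);
my $tap = join("\n", @input) . "\n";

# Remove each matching line including its newline in one pass over the text.
(my $filtered = $tap) =~ s/$combined.*\n?//g;
print $filtered;
```

Only the whitelisted tool1 line survives among the "# Test-mymeta_" lines, while the log line and the other meta line are removed.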
preprocess_tap
With this option, any lines that don't obviously look like TAP are stripped away.
There is a substantial risk, though: the purely line-based regex processing could screw up when it mis-matches lines. Parsing TAP is not as obvious as it first seems. Just think of unindented YAML, indented YAML with strange multi-line values reaching the start of a line, or the (non-standardized and unsupported) nested indented TAP. So be careful!
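The following sketch shows how such line-based filtering can go wrong. It uses a deliberately naive, hypothetical "looks like TAP" regex (not the one TAP::DOM uses): the build noise is removed as intended, but an unindented YAML block is thrown away as well.

```perl
use strict;
use warnings;

# Hypothetical, deliberately naive "looks like TAP" test; NOT TAP::DOM's regex.
my $looks_like_tap =
    qr/^(?:TAP version \d+|\d+\.\.\d+|(?:not )?ok\b|#|pragma [+-]|Bail out!|\s)/;

my @lines = (
    'TAP version 13',
    '1..1',
    'make: Entering directory /tmp/build',    # noise that should go
    'ok 1 - config parsed',
    '---',                                    # unindented YAML start
    'message: oops',                          # unindented YAML body
    '...',                                    # unindented YAML end
);

my @kept    = grep { $_ =~ $looks_like_tap } @lines;
my @dropped = grep { $_ !~ $looks_like_tap } @lines;
print "dropped: $_\n" for @dropped;
```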
noempty_tap
When a document is empty (which can also happen after preprocessing), setting this option to 1 puts in a replacement document:
pragma +tapdom_error
# document was empty
which in turn assigns it an error severity, so that these situations are no longer invisible.
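As a sketch of the idea, assuming that "empty" means "contains nothing but whitespace" and that the replacement consists of the pragma line followed by the comment line (the helper name is hypothetical; TAP::DOM's own check may differ):

```perl
use strict;
use warnings;

# Hypothetical helper mimicking noempty_tap: replace an empty document
# with a pragma line that TAP::DOM maps to an error severity.
sub fill_empty_tap {
    my ($tap) = @_;
    return $tap if defined $tap && $tap =~ /\S/;    # non-empty: unchanged
    return "pragma +tapdom_error\n# document was empty\n";
}

print fill_empty_tap("  \n");
```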
utf8
Declare that a document is UTF-8 encoded Unicode.
This triggers decoding the document accordingly, including filtering out illegal Unicode characters.
In particular, it converts illegal characters into the Unicode REPLACEMENT CHARACTER (\N{U+FFFD}, i.e., a diamond with a question mark in it). For more info see:
https://stackoverflow.com/a/2656433/1342345
https://metacpan.org/pod/Encode#FB_DEFAULT
https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character
Additionally, \0 is converted, as it is not covered by Encode::decode() but is still illegal for some tools.
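The decoding step can be approximated with the core Encode module. This is a sketch, not TAP::DOM's utf8_tap: Encode::decode() with the Encode::FB_DEFAULT check value substitutes U+FFFD for malformed byte sequences, and \0, which is valid UTF-8 and therefore survives decoding, is then replaced explicitly. Mapping \0 to U+FFFD as well is an assumption of this sketch.

```perl
use strict;
use warnings;
use Encode ();

# Decode raw octets as UTF-8, replacing malformed input with U+FFFD.
sub decode_tap_octets {
    my ($octets) = @_;
    my $text = Encode::decode('UTF-8', $octets, Encode::FB_DEFAULT());
    # NUL is valid UTF-8 and survives decoding; replace it explicitly
    # (the replacement character chosen here is an assumption).
    $text =~ s/\0/\x{FFFD}/g;
    return $text;
}

# "\xC3\xA9" is a valid UTF-8 e-acute; a lone "\xE9" (Latin-1 style) is malformed.
my $text = decode_tap_octets("ok 1 - caf\xC3\xA9 and caf\xE9 \0done\n");
binmode STDOUT, ':encoding(UTF-8)';
print $text;
```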
USING BITSETS
Option "usebitsets"
You can make the DOM even smaller by using the option usebitsets:
$tapdom = TAP::DOM->new (tap => $tap, usebitsets => 1 );
In this case all the 'has_*' and 'is_*' attributes are stored in a common bitset entry 'is_has' with their respective bits set.
This reduces the memory footprint of a TAP::DOM remarkably (for large TAP-DOMs ~40%) and is meant as an optimization option for memory constrained problems.
Access bitset attributes via methods
You can get the actual values of 'is_*' and 'has_*' attributes regardless of their storage as hash entries or bitsets by using the respective methods on single entries:
if ($tapdom->{lines}[4]->is_test) {...}
if ($tapdom->{lines}[4]->is_ok) {...}
...
or with even less direct hash access
if ($tapdom->lines->[4]->is_test) {...}
if ($tapdom->lines->[4]->is_ok) {...}
...
Access bitset attributes via bit comparisons
You can also use constants that represent the respective bits in expressions like this:
if ($tapdom->{lines}[4]{is_has} & $TAP::DOM::IS_TEST) {...}
And the constants can be imported into your namespace:
use TAP::DOM ':constants';
if ($tapdom->{lines}[4]{is_has} & $IS_TEST ) {...}
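The bit arithmetic behind is_has can be tried out without TAP::DOM. The bit assignments below are hypothetical and only serve the illustration; in real code use the constants exported by TAP::DOM, whose actual values are not documented here.

```perl
use strict;
use warnings;

# Hypothetical bit assignments, for illustration only.
my %BIT = (
    is_test  => 1 << 0,
    is_ok    => 1 << 1,
    has_todo => 1 << 2,
    has_skip => 1 << 3,
);

# Pack a set of boolean flags into one integer.
sub pack_bits {
    my (%flags) = @_;
    my $is_has = 0;
    for my $name (keys %flags) {
        $is_has |= $BIT{$name} if $flags{$name};
    }
    return $is_has;
}

my $is_has = pack_bits(is_test => 1, is_ok => 1, has_todo => 0, has_skip => 1);

# Test single flags with bitwise AND; with "|" the condition would always
# be true, because the constant itself is non-zero.
print "is a test\n"   if $is_has & $BIT{is_test};
print "has a TODO\n"  if $is_has & $BIT{has_todo};
print "was skipped\n" if $is_has & $BIT{has_skip};
```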
Tweak the resulting DOM
Lowercase all key:value fieldnames
By setting the option lowercase_fieldnames, all field names (hash keys) in document_data and kv_data are set to lowercase. This is especially helpful to normalize different casing like
# Test-Strange-Key: some value
# Test-strange-key: some value
# Test-STRANGE-KEY: some value
etc. all into
"strange-key" => "some value"
Lowercase all key:value values
By setting the option lowercase_fieldvalues, all field values in document_data and kv_data are set to lowercase. This is especially helpful to normalize different casing like
# Test-strange-key: Some Value
# Test-strange-key: Some value
# Test-strange-key: SOME VALUE
etc. all into
"strange-key" => "some value"
Warning: while the sister option lowercase_fieldnames above is obviously helpful for keeping related information together, the lowercase_fieldvalues option should be used with care. You lose much more information here, and values are usually better searched via the case-insensitive options of whatever mechanism you use (regular expressions, Elasticsearch, etc.).
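In plain Perl, the three normalization options roughly correspond to the following transformation of already extracted key/value pairs. The helper is hypothetical, not TAP::DOM's code, and it also makes the information loss visible: two differently cased variants end up as a single entry.

```perl
use strict;
use warnings;

# Hypothetical helper applying the three options as described above.
sub normalize_kv {
    my ($pairs, %opt) = @_;    # $pairs: arrayref of [key, value]
    my %kv;
    for my $pair (@$pairs) {
        my ($key, $value) = @$pair;
        $key   = lc $key   if $opt{lowercase_fieldnames};
        $value = lc $value if $opt{lowercase_fieldvalues};
        $value =~ s/\s+\z// if $opt{trim_fieldvalues};    # trailing whitespace only
        $kv{$key} = $value;    # in this sketch, later duplicates overwrite earlier ones
    }
    return \%kv;
}

my $kv = normalize_kv(
    [ [ 'Strange-Key', 'Some Value  ' ], [ 'strange-key', 'SOME VALUE' ] ],
    lowercase_fieldnames  => 1,
    lowercase_fieldvalues => 1,
    trim_fieldvalues      => 1,
);
print "$_ => $kv->{$_}\n" for sort keys %$kv;
```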
Placing key:value pairs
Normally a key:value pair {foo => 'bar'} from a line like
# Test-foo: bar
ends up as an entry in a hash kv_data under the entry before that line, which ideally is either a normal ok/not_ok line or a plan line.
If that's not the case, it is not clear where the pair belongs. Early TAP::DOM versions put such pairs under a global entry document_data.
However, this makes these entries appear inconsistently at different levels of the DOM, so you can suppress that old behaviour by setting disable_global_kv_data to 1.
But with that option set, there can be key:value lines at the very start with no preceding parent line, in case the plan comes at the end of the document. To not lose those key values, they can be saved up until the plan appears later and put there. As this reorders data inside the DOM relative to the original document, you must explicitly request that behaviour by setting put_dangling_kv_data_under_lazy_plan to 1.
Summary: for consistency it is suggested to set both options:
disable_global_kv_data => 1,
put_dangling_kv_data_under_lazy_plan => 1
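The placement rules can be simulated on a simplified list of lines. This is a hypothetical model, not TAP::DOM's implementation, assuming both recommended options are set: each # Test-key: value line goes to the closest preceding test or plan line, and pairs that appear before any such line are buffered until a plan at the end of the document adopts them.

```perl
use strict;
use warnings;

my @tap = (
    '# Test-suite: smoke',    # dangling: no parent line yet
    'ok 1 - first',
    '# Test-duration: 3',     # attached to "ok 1"
    'ok 2 - second',
    '1..2',                   # lazy plan at the end
);

my @lines;       # resulting entries (kv lines themselves are not kept here)
my $parent;      # most recent test or plan entry
my %dangling;    # pairs seen before any parent existed

for my $raw (@tap) {
    if ($raw =~ /^# Test-([^:]+):\s*(.*)$/) {
        if ($parent) { $parent->{kv_data}{$1} = $2 }
        else         { $dangling{$1} = $2 }
        next;
    }
    my $entry   = { raw => $raw };
    my $is_plan = $raw =~ /^\d+\.\.\d+/;
    my $is_test = $raw =~ /^(?:not )?ok\b/;
    if ($is_plan || $is_test) {
        $parent = $entry;
        if ($is_plan && %dangling) {
            $entry->{kv_data} = { %dangling };    # the lazy plan adopts buffered pairs
            %dangling = ();
        }
    }
    push @lines, $entry;
}

for my $entry (@lines) {
    my $kv = $entry->{kv_data} || {};
    print "$entry->{raw}: ", join(', ', map { "$_=$kv->{$_}" } sort keys %$kv), "\n";
}
```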
ACCESSORS AND UTILITY METHODS
end_time
exit
has_problems
is_good_plan
parse_errors
parse_errors_msgs
plan
pragmas
skip_all
start_time
summary
tapdom_config
utf8_tap
The actual worker function behind the utf8 option.
document_data
A document can contain comment lines which actually contain key/value data, like this:
# Test-vendor-id: GenuineIntel
# Test-cpu-model: Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz
# Test-cpu-family: 6
# Test-flags.fpu: 1
Those lines are converted into a hash by splitting them at the : delimiter and stripping the # Test- prefix. The resulting data structure looks like this:
# ... inside TAP::DOM ...
document_data => {
'vendor-id' => 'GenuineIntel',
'cpu-model' => 'Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz',
'cpu-family' => 6,
'flags.fpu' => 1,
},
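The splitting can be sketched with one regex per line. The helper name is hypothetical, and the prefix argument plays the role of document_data_prefix (here the (MyApp|Test)- alternation from the constructor example); named captures keep the key and value independent of any groups inside the prefix. TAP::DOM's actual parser may handle more edge cases.

```perl
use strict;
use warnings;

# Hypothetical helper: extract key/value pairs from "# <prefix>key: value" lines.
sub parse_document_data {
    my ($tap, $prefix) = @_;
    my %data;
    for my $line (split /\n/, $tap) {
        next unless $line =~ /^#\s+(?:$prefix)(?<key>[^:]+):\s*(?<value>.*)$/;
        $data{ $+{key} } = $+{value};
    }
    return \%data;
}

my $tap = <<'TAP';
# Test-vendor-id: GenuineIntel
# Test-cpu-model: Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz
# MyApp-cpu-family: 6
# Other-flags.fpu: 1
ok 1 - cpu info
TAP

my $data = parse_document_data($tap, '(MyApp|Test)-');
print "$_ => $data->{$_}\n" for sort keys %$data;
```

The "Other-" line is skipped because its prefix is not allowed by the pattern.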
tests_planned
tests_run
version
ADDITIONAL ATTRIBUTES
TAP::DOM creates attributes beyond those from TAP::Parser, usually to simplify later processing.
severity
The severity describes the combination of ok/not ok and todo/skip directives as one single numeric value.
This allows the otherwise nominal values to be handled as ordinal values, i.e., it gives them a particular order.
The order is as follows:
0 - represents the 'missing' severity.
It is used for all things that are not a test, or as a fallback when the other attributes appear in illegal combinations (like saying both SKIP and TODO).
1 - straight ok.
2 - ok with a #TODO directive.
That's slightly worse than a straight ok because of the directive.
3 - ok with a #SKIP directive.
That's one step worse because the ok does not come from actual test execution, as that's what skip means.
4 - not_ok with a #TODO directive.
That's worse as it represents a fail, but it's a known issue.
5 - straight not_ok.
A straight fail is the worst real-world value.
6 - forbidden combination of a not_ok with a #SKIP directive.
How can it fail when it was skipped? That's why it's even worse than worst.
A severity value is set for lines of type test and plan.
Additionally, it is set on the TAP::DOM-specific pragma +tapdom_error with a severity value of 5 (i.e., not_ok). Because a pragma doesn't interfere with test/plan lines, you can use this to express an out-of-band error situation which would otherwise be lost. Read below for more.
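The ordering can be written as a small function. This sketch covers test lines only and assumes that "ok" means the test actually passed, independent of a TODO directive; function and argument names are hypothetical, and TAP::DOM's code may differ in detail.

```perl
use strict;
use warnings;

# Map a test result plus TODO/SKIP directives to the severity scale above.
# Plan lines and the +tapdom_error pragma are not covered by this sketch.
sub test_severity {
    my (%t) = @_;    # passed, todo, skip: booleans
    return 0 if $t{todo} && $t{skip};    # illegal combination
    if ($t{passed}) {
        return 2 if $t{todo};
        return 3 if $t{skip};
        return 1;
    }
    return 4 if $t{todo};
    return 6 if $t{skip};                # failed although skipped
    return 5;
}

my @cases = (
    [ 'ok'            => { passed => 1 } ],
    [ 'ok # TODO'     => { passed => 1, todo => 1 } ],
    [ 'ok # SKIP'     => { passed => 1, skip => 1 } ],
    [ 'not ok # TODO' => { passed => 0, todo => 1 } ],
    [ 'not ok'        => { passed => 0 } ],
    [ 'not ok # SKIP' => { passed => 0, skip => 1 } ],
);
printf "%-14s => %d\n", $_->[0], test_severity(%{ $_->[1] }) for @cases;
```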
TAP::DOM-SPECIFIC PRAGMAS
Pragmas in TAP are meant to influence the behaviour of the TAP parser.
TAP::DOM recognizes special pragmas. They are all prefixed with tapdom_.
So far there is:
+tapdom_error - assign this line a severity of 5 (not ok)
You can for instance append this pragma to the TAP document during post-processing to express an out-of-band error situation without interfering with the existing test lines and plan.
Typical situations are an error from prove or another TAP processor, where you want to ensure that the problem does not get lost when storing the document in a database.
Pragmas allow kv_data just like test and plan lines, so you can transport additional error details like this:
pragma +tapdom_error
# Test-tapdom-error-type: prove
# Test-tapdom-prove-exit: 1
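For example, a post-processing step could append such a block whenever prove failed. The sketch only manipulates the TAP string; the helper name and the way the exit code is obtained are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: append an out-of-band error pragma with kv_data details.
sub append_tapdom_error {
    my ($tap, %detail) = @_;
    $tap .= "\n" if length $tap && $tap !~ /\n\z/;    # ensure a line break first
    $tap .= "pragma +tapdom_error\n";
    $tap .= "# Test-$_: $detail{$_}\n" for sort keys %detail;
    return $tap;
}

my $prove_exit = 1;    # e.g. derived from $? >> 8 after running prove
my $tap = "1..1\nok 1 - all fine\n";
$tap = append_tapdom_error(
    $tap,
    'tapdom-error-type' => 'prove',
    'tapdom-prove-exit' => $prove_exit,
) if $prove_exit;
print $tap;
```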
AUTHOR
Steffen Schwigon <ss5@renormalist.net>
COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by Steffen Schwigon.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.