NAME

Parse::Taxonomy::Cookbook - cookbook for Parse-Taxonomy

DESCRIPTION

This file is a cookbook holding usage examples -- recipes -- for various Parse::Taxonomy subclasses.

The documentation herein presumes that you have already studied the documentation in Parse::Taxonomy, Parse::Taxonomy::MaterializedPath, Parse::Taxonomy::AdjacentList, etc.

RECIPES

Validate a taxonomy-by-materialized-path

Problem

You have a CSV file which you have been told is a taxonomy-by-materialized-path. You want to confirm its validity.

Let's say the file holds these records:

$> cat ./proposed_taxonomy.csv
"path","vertical","is_actionable"
"|Alpha","Auto",,"0"
"|Alpha|Epsilon|Kappa","Auto","0"
"|Alpha|Epsilon|Kappa","Auto","1"
"|Alpha|Zeta","Auto","0"
"|Alpha|Zeta|Lambda","Auto","1"
"|Alpha|Zeta|Mu","Auto","0"

Solution

Try to create a Parse::Taxonomy::MaterializedPath object using the file interface.

local $@;
eval {
    $self = Parse::Taxonomy::MaterializedPath->new( {
        file => './proposed_taxonomy.csv',
    } );
};
print STDERR "$@\n";

If $self is created successfully, the taxonomy meets the requirements described in Parse::Taxonomy. This particular file, however, will throw an exception. Examination of the content of $@ will show that two records have the same materialized path, i.e., the same value in the path column.

Validate a taxonomy-by-adjacent-list

Problem

You have a CSV file which you have been told is a taxonomy-by-adjacent-list. You want to confirm its validity.

Let's say the file holds these records:

$> cat ./proposed_taxonomy_by_index.csv
"id","parent_id","name","vertical","is_actionable"
"1","","Alpha","Auto","0"
"2","1","Epsilon","Auto","0"
"3","2","Kappa","Auto","1"
"4","2","Kappa","Auto","1"
"5","1","Zeta","Auto","0"
"6","5","Lambda","Auto","1"
"7","5","Mu","Auto","1"

Solution

Try to create a Parse::Taxonomy::AdjacentList object using the file interface.

local $@;
eval {
    $self = Parse::Taxonomy::AdjacentList->new( {
        file => './proposed_taxonomy_by_index.csv',
    } );
};
print STDERR "$@\n";

If $self is created successfully, the taxonomy meets the requirements described in Parse::Taxonomy. This particular file, however, will throw an exception. Examination of the content of $@ will show that two records with the same parent_id have the same name.

Apply extra validations to a taxonomy

Problem

You have a taxonomy file from which you have successfully created a Parse::Taxonomy::MaterializedPath object. From that you know that it is valid with respect to the requirements for a taxonomy imposed by this library. But you have additional business requirements which a taxonomy must fulfill before you can use the taxonomy in production.

Suppose that you have a taxonomy file with this data:

$> cat local_requirement.csv
"path","is_actionable"
"|Alpha","0"
"|Beta","0"
"|Alpha|Epsilon","0"
"|Alpha|Epsilon|Kappa","1"
"|Alpha|Zeta","0"
"|Alpha|Zeta|Lambda","1"
"|Alpha|Zeta|Mu","0"
"|Beta|Eta","1"
"|Beta|Theta","1"
"|Beta|Iota","0"

Suppose further that you have a business requirement that all nodes which are "pure" leaf nodes -- all nodes which have no children of their own -- have a true value for is_actionable.

Solution

Use Parse::Taxonomy:::Path accessor methods to get at the data in the taxonomy and then write your own functions to conduct local validations.

In this case:

$self = Parse::Taxonomy::MaterializedPath->new( {
    file    => 'local_requirement.csv',
} );
$hashified       = $self->hashify();
$child_counts    = $self->child_counts();

Use hashify() to turn the taxonomy into a hash. Use child_counts() to get the number of children each node has. Then iterate over the hash checking whether an element has no children and, if so, whether the node's is_actionable setting is true.

@non_actionable_leaf_nodes = ();
for my $node (keys %{$hashified}) {
    if (
        ($child_counts->{$node} == 0) &&
        (! $hashified->{$node}->{is_actionable})
    ) {
        push @non_actionable_leaf_nodes, $node;
    }
}
warn "leaf node '$_' is non-actionable"
    for @non_actionable_leaf_nodes;

Output will resemble:

leaf node '|Alpha|Zeta|Mu' is non-actionable at ...
leaf node '|Beta|Iota' is non-actionable at ...

You can then decide how to handle this per your business requirements.

Convert a taxonomy-by-materialized-path to a taxonomy-by-adjacent-list

Problem

You have a file which holds a validated taxonomy-by-materialized-path and you want to create a file which holds the equivalent taxonomy-by-adjacent-list.

Suppose you have a file with this data:

"path","is_actionable"
"|Alpha","0"
"|Beta","0"
"|Alpha|Epsilon","0"
"|Alpha|Epsilon|Kappa","1"
"|Alpha|Zeta","0"
"|Alpha|Zeta|Lambda","1"
"|Alpha|Zeta|Mu","1"
"|Beta|Eta","1"
"|Beta|Theta","1"
"|Beta|Iota","1"

Solution

Use the adjacentify() and write_adjacentified_to_csv() methods.

$adjacentified = $self->adjacentify();
$file_taxonomy_by_index = $self->write_adjacentified_to_csv($adjacentified);

The file whose path is stored in $file_taxonomy_by_index will look like this:

id,parent_id,name,is_actionable
1,,Alpha,0
2,,Beta,0
3,1,Epsilon,0
4,1,Zeta,0
5,2,Eta,1
6,2,Theta,1
7,2,Iota,1
8,3,Kappa,1
9,4,Lambda,1
10,4,Mu,1

Convert a taxonomy-by-adjacent-list to a taxonomy-by-materialized-path

Problem

In a relational database (RDB), you have hierarchical data stored in a flat table by way of id, parent_id and name columns. You need to communicate the current status of that taxonomy to someone who is familiar with CSV-formatted data and who would like to see the structure in that taxonomy expressed in a single column.

Solution

First you need to get the data out of the RDB and into a text file. For that you might use a command-line language appropriate for that in RDB. For example, in psql, the command-line language associated with PostgreSQL, you would say:

$> \copy (SELECT id, parent_id, name, is actionable FROM my_table) TO /path/to/taxonomy.csv WITH CSV HEADERS

The CSV file would then contain data like this:

"id","parent_id","name","is_actionable"
"1","","Alpha","0"
"2","","Beta","0"
"3","1","Epsilon","0"
"4","3","Kappa","1"
"5","1","Zeta","0"
"6","5","Lambda","1"
"7","5","Mu","0"
"8","2","Eta","1"
"9","2","Theta","1"

First, create a Parse::Taxonomy::AdjacentList object from this source file, and then apply the pathify() method to it with the as_string option set to a true value.

$source = "/path/to/taxonomy.csv";
$self = Parse::Taxonomy::AdjacentList->new( {
    file    => $source,
} );

$rv = $self->pathify( { as_string => 1 } );

That returns a Perl reference to an array of array references:

[
  ["path", "is_actionable"],
  ["|Alpha", 0],
  ["|Beta", 0],
  ["|Alpha|Epsilon", 0],
  ["|Alpha|Epsilon|Kappa", 1],
  ["|Alpha|Zeta", 0],
  ["|Alpha|Zeta|Lambda", 1],
  ["|Alpha|Zeta|Mu", 0],
  ["|Beta|Eta", 1],
  ["|Beta|Theta", 1],
]

Confirm that two taxonomies are equivalent

Problem

You have received a plain-text file which holds CSV-formatted records which constitute a taxonomy-by-materialized-path. You passed the file off to the staff member with the "Big Data" t-shirt. Big Data got the records inserted into an RDBMS where the materialized path is represented by "pointers" to records higher up in the taxonomy, i.e., by means of a parent_id column. You want to demonstrate that the taxonomic structure present in the incoming file has been correctly represented in the RDBMS.

Let's suppose that the data in the incoming taxonomy file looked like this:

$> cat mu.csv

"path","is_actionable"
"--Alpha","0"
"--Beta","0"
"--Alpha--Epsilon","0"
"--Alpha--Epsilon--Kappa","1"
"--Alpha--Zeta","0"
"--Alpha--Zeta--Lambda","1"
"--Alpha--Zeta--Mu","0"
"--Beta--Eta","1"
"--Beta--Theta","1"

Let's further suppose that the data was inserted into the greeks table in the taxonomy database:

taxonomy=# SELECT * FROM greeks;
 id | parent_id |  name   | is_actionable
----+-----------+---------+---------------
  1 |           | Alpha   | f
  2 |           | Beta    | f
  3 |         1 | Epsilon | f
  4 |         3 | Kappa   | t
  5 |         1 | Zeta    | f
  6 |         5 | Lambda  | t
  7 |         5 | Mu      | f
  8 |         2 | Eta     | t
  9 |         2 | Theta   | t
(9 rows)

How can we demonstrate that the data and the hierarchical structure implicit in the table matches that of the original file?

Solution

Copy the database table (or the relevant rows) to a plain-text file in CSV format. In the case of PostgreSQL, that would look like this:

\copy (SELECT * FROM greeks) TO '/path/to/taxonomy/greeks.csv' WITH CSV HEADER

Yielding:

id,parent_id,name,is_actionable
1,,Alpha,f
2,,Beta,f
3,1,Epsilon,f
4,3,Kappa,t
5,1,Zeta,f
6,5,Lambda,t
7,5,Mu,f
8,2,Eta,t
9,2,Theta,t

Create a Parse::Taxonomy::AdjacentList object using greeks.csv as the source:

$self = Parse::Taxonomy::AdjacentList->new( {
    file    => '/path/to/taxonomy/greeks.csv',
} );

Call the pathify method on the object.

$pathified = $self->pathify;

If we were to dump $pathified, we would see:

$Data::Dump::pp($pathified);

[
  ["path", "is_actionable"],
  [["", "Alpha"], "f"],
  [["", "Beta"], "f"],
  [["", "Alpha", "Epsilon"], "f"],
  [["", "Alpha", "Epsilon", "Kappa"], "t"],
  [["", "Alpha", "Zeta"], "f"],
  [["", "Alpha", "Zeta", "Lambda"], "t"],
  [["", "Alpha", "Zeta", "Mu"], "f"],
  [["", "Beta", "Eta"], "t"],
  [["", "Beta", "Theta"], "t"],
]

Now, there's one little tweak needed here. In the original taxonomy file, the is_actionable column held Boolean data which represented TRUE and FALSE by "1" and "0", respectively. When this data was inserted into the database (PostgreSQL, in this case), that Boolean data was stored as t or f, which was also the way it was stored in the CSV file created by copying data from the greeks table. We will need to rewrite the data in the second (is_actionable) column of the elements of $pathified as 1 or 0 to proceed with our test for equivalence.

$fields = $pathified->[0];
@data_records = ();
for $rec (@{$pathified}[1..$#{$pathified}]) {
    $bool = ($rec->[1] eq 't') ? 1 : 0;
    push @data_records, [
        join('|' => @{$rec->[0]}),
        ($rec->[1] eq 't') ? 1 : 0,
    ];
}

We're now in a position to create a new Parse::Taxonomy::MaterializedPath object based on the data extracted from the database via the CSV file. We'll use the components interface to Parse::Taxonomy::MaterializedPath::new() because our taxonomy now lives in the form of Perl data structures suitable for the components interface.

$ptpobj = Parse::Taxonomy::MaterializedPath->new( {
    components  => {
        fields          => $fields,
        data_records    => \@data_records,
    },
} );

We'll now call the fields_and_data_records_path_components() method on the object. This method returns an array of array references in which the first element is an array reference holding the column names and in which, in all subsequent elements, the first element is itself a reference to an array holding the components of the materialized path.

$fdr1 = $ptpobj->fields_and_data_records_path_components;

Which, when dumped, gives:

[
  ["path", "is_actionable"],
  [["", "Alpha"], 0],
  [["", "Beta"], 0],
  [["", "Alpha", "Epsilon"], 0],
  [["", "Alpha", "Epsilon", "Kappa"], 1],
  [["", "Alpha", "Zeta"], 0],
  [["", "Alpha", "Zeta", "Lambda"], 1],
  [["", "Alpha", "Zeta", "Mu"], 0],
  [["", "Beta", "Eta"], 1],
  [["", "Beta", "Theta"], 1],
]

We now return to the original incoming taxonomy file, which we use as the source for a second Parse::Taxonomy::MaterializedPath object. Remember that -- was used to separate the parts of the materialized path in that file.

$tax = Parse::Taxonomy::MaterializedPath->new( {
    file => '/path/to/mu.csv,
    path_col_sep => '--',
} );

We now call the fields_and_data_records_path_components() method on this object as well.

$fdr2 = $tax->fields_and_data_records_path_components;

Finally, we feed $fdr1 and $fdr2 to Test::More::is_deeply().

is_deeply($fdr1, $fdr2, "QED");

If this function returns true -- which it does -- then the taxonomy implicit in the greeks table in the database is equivalent to that in the original incoming taxonomy file.