NAME

Parse::Taxonomy::MaterializedPath - Validate a file for use as a path-based taxonomy

SYNOPSIS

```
use Parse::Taxonomy::MaterializedPath;

# 'file' interface: reads a CSV file for you

$source = "./t/data/alpha.csv";
$self = Parse::Taxonomy::MaterializedPath->new( {
    file    => $source,
} );

# 'components' interface:  as if you've already read a
# CSV file and now have Perl array references to header and data rows

$self = Parse::Taxonomy::MaterializedPath->new( {
    components  => {
        fields          => $fields,
        data_records    => $data_records,
    }
} );
```

METHODS

`new()`

Purpose

Parse::Taxonomy::MaterializedPath constructor.
Arguments

Single hash reference. There are two possible interfaces: file and components.
1. file interface
```
$source = "./t/data/alpha.csv";
$self = Parse::Taxonomy::MaterializedPath->new( {
    file    => $source,
    path_col_idx    => 0,
    path_col_sep    => '|',
    %TextCSVoptions,
} );
```
Elements in the hash reference are keyed on:
- file
  
  Absolute or relative path to the incoming taxonomy file. Required for this interface.
- path_col_idx
  
  If the column to be used as the "path" column in the incoming taxonomy file is not the first column, this option must be set to the integer representing the "path" column's index position (count starts at 0). Optional; defaults to 0.
- path_col_sep
  
  If the string used to distinguish components of the path in the path column in the incoming taxonomy file is not a pipe (|), this option must be set. Optional; defaults to |.
- Text::CSV_XS options
  
  Any other options which could normally be passed to Text::CSV_XS->new() will be passed through to that module's constructor. On the recommendation of the Text::CSV documentation, binary is always set to a true value.
2. components interface
```
$self = Parse::Taxonomy::MaterializedPath->new( {
    components  => {
        fields          => $fields,
        data_records    => $data_records,
    }
} );
```
Elements in this hash are keyed on:
- components
  
  This element is required for the components interface. The value of this element is a hash reference with two keys, fields and data_records. fields is a reference to an array holding the field or column names for the data set. data_records is a reference to an array of array references, each of the latter arrayrefs holding one record or row from the data set.
- path_col_idx
  
  Same as in file interface above.
- path_col_sep
  
  Same as in file interface above.
Return Value

Parse::Taxonomy::MaterializedPath object.
Comment

new() will throw an exception under any of the following conditions:
- Argument to new() is not a reference.
- Argument to new() is not a hash reference.
- In the file interface, unable to locate the file which is the value of the file element.
- Argument to path_col_idx element is not an integer.
- Argument to path_col_idx is greater than the index number of the last element in the header row of the incoming taxonomy file, i.e., the path_col_idx is wrong.
- The same field is found more than once in the header row of the incoming taxonomy file.
- Unable to open or close the incoming taxonomy file for reading.
- In the column designated as the "path" column, the same value is observed more than once.
- One or more columns is named with a reserved term. The reserved terms are id, parent_id, name, lft and rgh.
- A node below the top level has a parent node which cannot be located in the incoming taxonomy file.
- A data row has a number of fields different from the number of fields in the header row.
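
Several of the conditions above can be illustrated with a minimal sketch in plain Perl. This is not the module's implementation: `check_taxonomy` is a hypothetical helper, it collects messages instead of throwing exceptions, and it assumes every path begins with the separator.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Taxonomy: returns a list of
# human-readable problems (an empty list means these checks passed).
# Assumes every path starts with the separator, e.g. '|Alpha|Beta'.
sub check_taxonomy {
    my ($fields, $data_records, $path_col_idx, $path_col_sep) = @_;
    $path_col_idx = 0   unless defined $path_col_idx;
    $path_col_sep = '|' unless defined $path_col_sep;

    return ("path_col_idx '$path_col_idx' is not an integer")
        unless $path_col_idx =~ /\A[0-9]+\z/;
    return ("path_col_idx $path_col_idx is beyond the last header column")
        if $path_col_idx > $#{$fields};

    my (@problems, %seen_field, %seen_path);
    my %reserved = map { $_ => 1 } qw(id parent_id name lft rgh);
    for my $field (@{$fields}) {
        push @problems, "duplicate field '$field'" if $seen_field{$field}++;
        push @problems, "reserved term '$field' used as column name"
            if $reserved{$field};
    }
    for my $row (@{$data_records}) {
        if (@{$row} != @{$fields}) {
            push @problems, 'row with wrong number of fields';
            next;
        }
        my $path = $row->[$path_col_idx];
        push @problems, "duplicate path '$path'" if $seen_path{$path}++;
    }
    # Any path with more than one named component needs its parent present.
    for my $path (sort keys %seen_path) {
        my @parts = split /\Q$path_col_sep\E/, $path;
        next if @parts <= 2;    # ('', 'Alpha') is a top-level node
        pop @parts;
        my $parent = join $path_col_sep, @parts;
        push @problems, "parent of '$path' not found"
            unless exists $seen_path{$parent};
    }
    return @problems;
}

# A header that uses a reserved term, a repeated path, and an orphaned node.
my @problems = check_taxonomy(
    [ 'path', 'name' ],
    [ [ '|Alpha', 0 ], [ '|Alpha', 0 ], [ '|Beta|Eta', 1 ] ],
);
print "$_\n" for @problems;
```

With the real constructor, each of these situations would surface as an exception, so a call to new() on untrusted input is usually wrapped in `eval { ... }` or a similar guard.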

`fields()`

Purpose

Identify the names of the columns in the taxonomy.
Arguments
```
my $fields = $self->fields();
```
No arguments; the information is already inside the object.
Return Value

Reference to an array holding a list of the columns as they appear in the header row of the incoming taxonomy file.
Comment

Read-only.

`path_col_idx()`

Purpose

Identify the index position (count starts at 0) of the column in the incoming taxonomy file which serves as the path column.
Arguments
```
my $path_col_idx = $self->path_col_idx;
```
No arguments; the information is already inside the object.
Return Value

Integer in the range from 0 to 1 less than the number of columns in the header row.
Comment

Read-only.

`path_col()`

Purpose

Identify the name of the column in the incoming taxonomy which serves as the path column.
Arguments
```
my $path_col = $self->path_col;
```
No arguments; the information is already inside the object.
Return Value

String.
Comment

Read-only.

`path_col_sep()`

Purpose

Identify the string used to separate path components once the taxonomy has been created. This is just a "getter" and is logically distinct from the option to new() which is, in effect, a "setter."
Arguments
```
my $path_col_sep = $self->path_col_sep;
```
No arguments; the information is already inside the object.
Return Value

String.
Comment

Read-only.

`data_records()`

Purpose

Once the taxonomy has been validated, get a list of its data rows as a Perl data structure.
Arguments
```
$data_records = $self->data_records;
```
None.
Return Value

Reference to array of array references. The array will hold the data records found in the incoming taxonomy file in their order in that file.
Comment

Does not contain any information about the fields in the taxonomy, so you should probably either (a) use it in conjunction with the fields() method above; or (b) use fields_and_data_records() instead.

`fields_and_data_records()`

Purpose

Once the taxonomy has been validated, get a list of its header and data rows as a Perl data structure.

Arguments

```
$data_records = $self->fields_and_data_records;
```

None.

Return Value

Reference to array of array references. The first element in the array will hold the header row (same as output of fields()). The remaining elements will hold the data records found in the incoming taxonomy file in their order in that file.

`data_records_path_components()`

Purpose

Once the taxonomy has been validated, get a list of its data rows as a Perl data structure. In each element of this list, the path is now represented as an array reference rather than a string.

Arguments

```
$data_records_path_components = $self->data_records_path_components;
```

None.

Return Value

Reference to array of array references. The array will hold the data records found in the incoming taxonomy file in their order in that file.
Comment

Does not contain any information about the fields in the taxonomy, so you may wish either (a) to use this method in conjunction with the fields() method above; or (b) to use fields_and_data_records_path_components() instead.
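
The shape of such a record can be sketched in plain Perl. `split_paths` below is a hypothetical helper rather than the module's code, and whether the module keeps the leading empty component is not stated in this documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: copy each row, replacing the path string in the
# path column with a reference to an array of its components.
sub split_paths {
    my ($data_records, $path_col_idx, $path_col_sep) = @_;
    my @out;
    for my $row (@{$data_records}) {
        my @copy = @{$row};    # leave the caller's rows untouched
        $copy[$path_col_idx] =
            [ split /\Q$path_col_sep\E/, $copy[$path_col_idx] ];
        push @out, \@copy;
    }
    return \@out;
}

my $rows = split_paths( [ [ '|Alpha|Epsilon', 0 ] ], 0, '|' );
# $rows->[0][0] is now [ '', 'Alpha', 'Epsilon' ]. In this sketch the
# leading empty string stands for the unnamed root before the first '|'.
print join( ',', map { "'$_'" } @{ $rows->[0][0] } ), "\n";
```

The separator is wrapped in `\Q...\E` because the default separator, the pipe, is a regex metacharacter.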

`fields_and_data_records_path_components()`

Purpose

Once the taxonomy has been validated, get a list of its header and data rows as a Perl data structure. The first element in this list is an array reference holding the header row. In each data element of this list, the path is now represented as an array reference rather than a string.

Arguments

```
$fields_and_data_records_path_components = $self->fields_and_data_records_path_components;
```

None.

Return Value

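Reference to array of array references. The first element in the array will hold the header row. The remaining elements will hold the data records found in the incoming taxonomy file in their order in that file, each with its path represented as an array reference.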

`get_field_position()`

Purpose

Identify the index position of a given field within the header row.
Arguments
```
$index = $self->get_field_position('income');
```
Takes a single string holding the name of one of the fields (column names).
Return Value

Integer representing the index position (counting from 0) of the field provided as argument. Throws exception if the argument is not actually a field.

`descendant_counts()`

Purpose

Display the number of descendant (multi-generational) nodes each node in the taxonomy has.
Arguments
```
$descendant_counts = $self->descendant_counts();

$descendant_counts = $self->descendant_counts( { generations => 1 } );
```
None required; one optional hash reference. Currently, the only element honored in that hashref is generations, whose value must be a non-negative integer. If, instead of getting the count of all descendants of a node, you only want the count of its first generation, i.e., its immediate children, you provide a value of 1. Want the count of only the first and second generations? Provide a value of 2 -- and so on.
Return Value

Reference to a hash in which each element is keyed on the value of the path column in the incoming taxonomy file; each value is the count of that node's descendants.
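
The effect of the generations option can be sketched in plain Perl. `count_descendants` is a hypothetical helper, not the module's implementation, and it assumes every path begins with the separator.

```perl
use strict;
use warnings;

# Hypothetical helper: count descendants per path, optionally limited
# to a number of generations below each node.
sub count_descendants {
    my ($paths, $sep, $generations) = @_;
    my %counts = map { $_ => 0 } @{$paths};
    for my $path (@{$paths}) {
        my @parts = split /\Q$sep\E/, $path;
        my $gap = 0;    # generations between this node and the ancestor
        # Walk upward: each shorter prefix that is itself a node is an ancestor.
        while (@parts > 2) {
            pop @parts;
            $gap++;
            last if defined $generations && $gap > $generations;
            my $ancestor = join $sep, @parts;
            $counts{$ancestor}++ if exists $counts{$ancestor};
        }
    }
    return \%counts;
}

my @paths = (
    '|Alpha', '|Beta', '|Alpha|Epsilon', '|Alpha|Epsilon|Kappa',
    '|Alpha|Zeta', '|Alpha|Zeta|Lambda', '|Alpha|Zeta|Mu',
    '|Beta|Eta', '|Beta|Theta',
);
my $all      = count_descendants( \@paths, '|' );
my $children = count_descendants( \@paths, '|', 1 );
printf "%s: %d descendants, %d children\n",
    $_, $all->{$_}, $children->{$_} for '|Alpha', '|Beta';
```

For this data, `|Alpha` has five descendants in total but only two immediate children, while `|Beta` has two of each.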

`get_descendant_count()`

Purpose

Get the total number of descendant nodes for one specific node in a validated taxonomy.
Arguments
```
$descendant_count = $self->get_descendant_count('|Path|To|Node');

$descendant_count = $self->get_descendant_count('|Path|To|Node', { generations => 1 } );
```
One required: string containing node's path as spelled in the taxonomy.

One optional hash reference. Currently, the only element honored in that hashref is generations, whose value must be a non-negative integer. If, instead of getting the count of all descendants of a node, you only want the count of its first generation, i.e., its immediate children, you provide a value of 1. Want the count of only first and second generations? Provide a value of 2 -- and so on.
Return Value

Non-negative integer. Any node whose descendant count is 0 is by definition a leaf node.
Comment

Will throw an exception if the node does not exist or is misspelled.

If get_descendant_count() is called with no second (hashref) argument following an invocation of descendant_counts(), it will return a value from an internal cache created during that earlier method call. Otherwise, it will re-create the cache from scratch. (This, of course, assumes that you have not manipulated the object's internal data subsequent to its creation.)

`hashify()`

Purpose

Turn a validated taxonomy into a Perl hash keyed on the column designated as the path column.
Arguments
```
$hashref = $self->hashify();
```
Takes an optional hashref holding a list of any of the following elements:
- remove_leading_path_col_sep
  
  Boolean, defaulting to 0. By default, hashify() will spell the key of the hash exactly as the value of the path column is spelled in the taxonomy -- which in turn is the way it was spelled in the incoming file. That is, a path in the taxonomy spelled |Alpha|Beta|Gamma will be spelled as a key in exactly the same way.
  
  However, since in many cases (including the example above) the root node of the taxonomy will be empty, the user may wish to remove the first instance of path_col_sep. The user would do so by setting remove_leading_path_col_sep to a true value.
```
$hashref = $self->hashify( {
    remove_leading_path_col_sep => 1,
} );
```
  In that case the key would now be spelled: Alpha|Beta|Gamma.
  
  Note further that if the root_str switch is set to a true value, any setting to remove_leading_path_col_sep will be ignored.
- key_delim
  
  A string which will be used in composing the key of the hashref returned by this method. The user may set this option if she does not want to use the separator found in the incoming CSV file (which by default will be the pipe character (|) and which may be overridden with the path_col_sep argument to new()).
```
$hashref = $self->hashify( {
    key_delim   => q{ - },
} );
```
  In the above variant, a path that in the incoming taxonomy file was represented by |Alpha|Beta|Gamma will in $hashref be represented by - Alpha - Beta - Gamma.
- root_str
  
  A string which will be used in composing the key of the hashref returned by this method. The user will set this switch if she wishes to have the root node explicitly represented. Using this switch will automatically cause remove_leading_path_col_sep to be ignored.
  
  Suppose the user wished to have All Suppliers be the text for the root node. Suppose further that the user wanted to use the string - as the delimiter within the key.
```
$hashref = $self->hashify( {
    root_str    => q{All Suppliers},
    key_delim   => q{ - },
} );
```
  Then incoming path |Alpha|Beta|Gamma would be keyed as:
```
All Suppliers - Alpha - Beta - Gamma
```
Return Value

Hash reference. The number of elements in this hash should be equal to the number of non-header records in the taxonomy.
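
How the three options interact when a key is composed can be sketched in plain Perl. `compose_key` is a hypothetical helper written for illustration; it mirrors the behavior described above but is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash key from a materialized path.
sub compose_key {
    my ($path, $path_col_sep, $opts) = @_;
    $opts ||= {};
    my @parts = split /\Q$path_col_sep\E/, $path;
    my $delim = defined $opts->{key_delim}
        ? $opts->{key_delim}
        : $path_col_sep;

    if ( $opts->{root_str} ) {
        # root_str wins: remove_leading_path_col_sep is ignored.
        if ( @parts && $parts[0] eq '' ) { $parts[0] = $opts->{root_str} }
        else                             { unshift @parts, $opts->{root_str} }
    }
    elsif ( $opts->{remove_leading_path_col_sep} ) {
        shift @parts if @parts && $parts[0] eq '';
    }
    return join $delim, @parts;
}

my $path = '|Alpha|Beta|Gamma';
print compose_key( $path, '|' ), "\n";
print compose_key( $path, '|', { remove_leading_path_col_sep => 1 } ), "\n";
print compose_key( $path, '|',
    { root_str => 'All Suppliers', key_delim => ' - ' } ), "\n";
```

In this sketch a key_delim of `q{ - }` with no root_str leaves the delimiter, including its leading space, at the front of the key, because the empty root component is still joined in.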

`adjacentify()`

Purpose

Transform a taxonomy-by-materialized-path into a taxonomy-by-adjacent-list.
Arguments
```
$adjacentified = $self->adjacentify();

$adjacentified = $self->adjacentify( { serial => 500 } );
$adjacentified = $self->adjacentify( { floor  => 500 } );  # same as serial
```
Optional single hash reference.

For that hashref, adjacentify() supports the key serial, which defaults to 0. serial must be a non-negative integer and sets the "floor" above which new unique IDs will be assigned to the id column. Hence, if serial is set to 500, the value assigned to the id column of the first record to be processed will be 501.

Starting with version 0.19, floor will serve as an alternative way of providing the same information to adjacentify(). If, however, by mistake you provide both serial and floor elements in the hash, serial will take precedence.
Return Value

Reference to an array of hash references. Each element represents one node in the taxonomy. Each element will have key-value pairs for id, parent_id and name which will hold the adjacentification of the materialized path in the original taxonomy-by-materialized-path. Each element will also have key-value pairs for the non-materialized-path fields in the records in the original taxonomy-by-materialized-path.
Comment

See the documentation for write_adjacentified_to_csv() for an example.

Note that the order in which adjacentify() will assign id and parent_id values to records in the taxonomy-by-adjacent-list will almost certainly not match the order in which elements appear in a CSV file or in the data structure returned by a method such as data_records().

`write_adjacentified_to_csv()`

Purpose

Create a CSV-formatted file holding the data returned by adjacentify().
Arguments
```
$csv_file = $self->write_adjacentified_to_csv( {
   adjacentified => $adjacentified,                   # output of adjacentify()
   csvfile => './t/data/taxonomy_out3.csv',
} );
```
Single hash reference. That hash is keyed on:
- adjacentified
  
  Required: Its value must be the arrayref of hash references returned by the adjacentify() method.
- csvfile
  
  Optional. Path to location where a CSV-formatted text file holding the taxonomy-by-adjacent-list will be written. Defaults to a file called taxonomy_out.csv in the current working directory.
- Text::CSV_XS options
  
  You can also pass through any key-value pairs normally accepted by Text::CSV_XS.
Return Value

Returns path to CSV-formatted text file just created.
Example

Suppose we have a CSV-formatted file holding the following taxonomy-by-materialized-path:
```
"path","is_actionable"
"|Alpha","0"
"|Beta","0"
"|Alpha|Epsilon","0"
"|Alpha|Epsilon|Kappa","1"
"|Alpha|Zeta","0"
"|Alpha|Zeta|Lambda","1"
"|Alpha|Zeta|Mu","0"
"|Beta|Eta","1"
"|Beta|Theta","1"
```
After running this file through new(), adjacentify() and write_adjacentified_to_csv() we will have a new CSV-formatted file holding this taxonomy-by-adjacent-list:
```
id,parent_id,name,is_actionable
1,,Alpha,0
2,,Beta,0
3,1,Epsilon,0
4,1,Zeta,0
5,2,Eta,1
6,2,Theta,1
7,3,Kappa,1
8,4,Lambda,1
9,4,Mu,0
```
Note that the path column has been replaced by the id, parent_id and name columns.
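
The transformation in this example can be sketched in plain Perl. `adjacentify_sketch` is a hypothetical helper, not the module's code. Its ordering rule (shallowest nodes first, then input order) is an assumption that happens to reproduce the ids shown above; the module documents no particular order.

```perl
use strict;
use warnings;

# Hypothetical helper: turn materialized-path rows into id/parent_id/name
# records. Assumes every path begins with the separator.
sub adjacentify_sketch {
    my ($fields, $data_records, $path_col_idx, $sep, $serial) = @_;
    $serial ||= 0;    # ids are assigned starting at $serial + 1

    my @nodes;
    for my $i ( 0 .. $#{$data_records} ) {
        my @parts = split /\Q$sep\E/, $data_records->[$i][$path_col_idx];
        push @nodes,
            { row => $data_records->[$i], parts => \@parts, pos => $i };
    }
    # Shallowest nodes first, so a parent always gets its id before its
    # children; ties keep their input order.
    @nodes = sort {
        @{ $a->{parts} } <=> @{ $b->{parts} } or $a->{pos} <=> $b->{pos}
    } @nodes;

    my ( %id_for, @out );
    for my $node (@nodes) {
        my @parts       = @{ $node->{parts} };
        my $path        = join $sep, @parts;
        my $name        = pop @parts;
        my $parent_path = join $sep, @parts;
        $id_for{$path} = ++$serial;
        my %rec = (
            id        => $id_for{$path},
            parent_id => $id_for{$parent_path},  # undef for top-level nodes
            name      => $name,
        );
        for my $c ( 0 .. $#{$fields} ) {
            next if $c == $path_col_idx;
            $rec{ $fields->[$c] } = $node->{row}[$c];
        }
        push @out, \%rec;
    }
    return \@out;
}

my @fields = ( 'path', 'is_actionable' );
my @rows   = (
    [ '|Alpha', 0 ], [ '|Beta', 0 ], [ '|Alpha|Epsilon', 0 ],
    [ '|Alpha|Epsilon|Kappa', 1 ], [ '|Alpha|Zeta', 0 ],
    [ '|Alpha|Zeta|Lambda', 1 ], [ '|Alpha|Zeta|Mu', 0 ],
    [ '|Beta|Eta', 1 ], [ '|Beta|Theta', 1 ],
);
my $adjacentified = adjacentify_sketch( \@fields, \@rows, 0, '|' );
for my $rec ( @{$adjacentified} ) {
    print join( ',',
        $rec->{id},
        defined $rec->{parent_id} ? $rec->{parent_id} : '',
        $rec->{name}, $rec->{is_actionable} ), "\n";
}
```

Passing 500 as the final argument would start numbering at 501, which matches the description of serial under adjacentify().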

To install Parse::Taxonomy, copy and paste the appropriate command into your terminal.

cpanm

cpanm Parse::Taxonomy

CPAN shell

perl -MCPAN -e shell
install Parse::Taxonomy

For more information on module installation, please visit the detailed CPAN module installation guide.
