NAME

Data::Frame - data frame implementation

VERSION

version 0.001

SYNOPSIS

use feature 'say';
use Data::Frame;
use PDL;

my $df = Data::Frame->new( columns => [
    z => pdl(1, 2, 3, 4),
    y => ( sequence(4) >= 2 ) ,
    x => [ qw/foo bar baz quux/ ],
] );

say $df;
# ---------------
#     z  y  x
# ---------------
#  0  1  0  foo
#  1  2  0  bar
#  2  3  1  baz
#  3  4  1  quux
# ---------------

say $df->nth_column(0);
# [1 2 3 4]

say $df->select_rows( 3,1 );
# ---------------
#     z  y  x
# ---------------
#  3  4  1  quux
#  1  2  0  bar
# ---------------

DESCRIPTION

This implements a data frame container that uses PDL for individual columns. As such, it supports marking missing values (BAD values).

The API is currently experimental and is designed to work with Statistics::NiceR, so be aware that it may change.

METHODS

new

new( Hash %options ) # returns Data::Frame

Creates a new Data::Frame when passed the following options as a specification of the columns to add:

  • columns => ArrayRef $columns_array

    When columns is passed an ArrayRef of pairs of the form

    $columns_array = [
        column_name_z => $column_01_data, # first column data
        column_name_y => $column_02_data, # second column data
        column_name_x => $column_03_data, # third column data
    ]

    then the column data is added to the data frame in the order that the pairs appear in the ArrayRef.

  • columns => HashRef $columns_hash

    When columns is passed a HashRef of pairs of the form

    $columns_hash = {
        column_name_z => $column_03_data, # third column data
        column_name_y => $column_02_data, # second column data
        column_name_x => $column_01_data, # first column data
    }

    then the column data is added to the data frame in the order of the HashRef's keys, sorted with a stringwise cmp.
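To make the HashRef ordering concrete, the following sketch reproduces a stringwise sort of the keys in plain Perl, without loading Data::Frame. The column names and values are made up for illustration. It shows that a stringwise cmp places col10 before col2, so the ArrayRef form is likely the safer choice when column order matters.

```perl
use strict;
use warnings;

# Hypothetical column specification; names and values are illustrative only.
my %columns_hash = (
    col2  => [ 4, 5, 6 ],
    col10 => [ 7, 8, 9 ],
    col1  => [ 1, 2, 3 ],
);

# A stringwise sort of the keys, as described for the HashRef form of columns.
my @order = sort { $a cmp $b } keys %columns_hash;

print join( ', ', @order ), "\n";
# col1, col10, col2
```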

string

string() # returns Str

Returns a string representation of the Data::Frame.

number_of_columns

number_of_columns() # returns Int

Returns the number of columns in the Data::Frame.

number_of_rows

number_of_rows() # returns Int

Returns the number of rows in the Data::Frame.

nth_column

nth_column(Int $n) # returns a column

Returns column number $n. Supports negative indices (e.g., $n = -1 returns the last column).
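As a minimal sketch of positional access, the example below builds a small data frame (the column names a, b, c are made up) and fetches columns with nth_column using both a non-negative and a negative index. It uses only methods described in this document.

```perl
use strict;
use warnings;
use Data::Frame;
use PDL;

# Illustrative three-column frame; the names and values are assumptions.
my $df = Data::Frame->new( columns => [
    a => pdl(1, 2, 3),
    b => pdl(4, 5, 6),
    c => pdl(7, 8, 9),
] );

my $first = $df->nth_column(0);     # column a
my $last  = $df->nth_column(-1);    # column c, counted from the end

# -1 should refer to the same column as number_of_columns - 1.
my $same = $df->nth_column( $df->number_of_columns - 1 );

print "first: $first\n";
print "last:  $last\n";
print "same:  $same\n";
```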

column_names

column_names() # returns an ArrayRef

column_names( @new_column_names ) # returns an ArrayRef

Returns an ArrayRef of the names of the columns.

If passed a list of arguments @new_column_names, then the columns will be renamed to the elements of @new_column_names. The number of names must match the number of columns in the Data::Frame.
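The getter and setter forms of column_names can be sketched as follows. The original and replacement names are made up for illustration.

```perl
use strict;
use warnings;
use Data::Frame;
use PDL;

# Illustrative two-column frame.
my $df = Data::Frame->new( columns => [
    z => pdl(1, 2, 3),
    y => pdl(4, 5, 6),
] );

# With no arguments, column_names returns an ArrayRef of the current names.
my @before = @{ $df->column_names };

# With one name per column, the columns are renamed in order.
$df->column_names( 'height', 'weight' );
my @after = @{ $df->column_names };

print "before: @before\n";
print "after:  @after\n";
```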

row_names

row_names() # returns a PDL

row_names( Array @new_row_names ) # returns a PDL

row_names( ArrayRef $new_row_names ) # returns a PDL

row_names( PDL $new_row_names ) # returns a PDL

Returns a PDL of the names of the rows.

If passed an argument, then the rows will be renamed. The length of the argument must match the number of rows in the Data::Frame.
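A short sketch of renaming rows with the ArrayRef form is shown below. The row labels are made up, and the example assumes they should be distinct, which this document does not state.

```perl
use strict;
use warnings;
use Data::Frame;
use PDL;

# Illustrative frame with three rows.
my $df = Data::Frame->new( columns => [
    z => pdl(1, 2, 3),
    x => [ qw/foo bar baz/ ],
] );

# One label per row; the labels here are hypothetical and kept distinct.
$df->row_names( [ qw/first second third/ ] );

# The labels take the place of the default row numbers when printing.
print $df, "\n";
```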

column

column( Str $column_name )

Returns the column with the name $column_name.

add_columns

add_columns( Array @column_pairlist )

Adds all the columns in @column_pairlist to the Data::Frame.

add_column

add_column(Str $name, $data)

Adds a single column to the Data::Frame with the name $name and data $data.
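The two methods can be combined as in the sketch below. The column names and values are made up, and the example assumes that each new column has the same number of rows as the existing ones.

```perl
use strict;
use warnings;
use Data::Frame;
use PDL;

# Start with a single illustrative column.
my $df = Data::Frame->new( columns => [ id => pdl(1, 2, 3) ] );

# Add one column by name and data.
$df->add_column( score => pdl(0.5, 0.7, 0.9) );

# Add several columns at once as a list of name => data pairs.
$df->add_columns(
    label => [ qw/low mid high/ ],
    flag  => pdl(0, 1, 1),
);

print "columns: @{ $df->column_names }\n";
print "size: ", $df->number_of_rows, " x ", $df->number_of_columns, "\n";
```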

select_rows

select_rows( Array @which )

select_rows( ArrayRef $which )

select_rows( PDL $which )

The argument $which is a vector of row indices. select_rows returns a new Data::Frame that contains the rows at the indices in $which, in the order given.

The returned Data::Frame supports PDL's data flow, meaning that changes to the values in the child data frame's columns will appear in the parent data frame.
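The data flow behaviour can be sketched as follows, assuming numeric PDL columns and PDL's in-place assignment operator (.=). The column values are made up. Writing into a column of the selected frame should change the corresponding rows of the parent.

```perl
use strict;
use warnings;
use Data::Frame;
use PDL;

# Illustrative parent frame.
my $df = Data::Frame->new( columns => [
    z => pdl(1, 2, 3, 4),
    y => pdl(10, 20, 30, 40),
] );

# Child frame holding rows 3 and 1 of the parent, in that order.
my $child = $df->select_rows( 3, 1 );

# Assign in place through the child's column; .= writes into the
# existing data rather than rebinding the variable.
my $child_z = $child->column('z');
$child_z .= 0;

# Rows 1 and 3 of the parent's z column are expected to be 0 now.
print $df->column('z'), "\n";
```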

SEE ALSO

AUTHOR

Zakariyya Mughal <zmughal@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Zakariyya Mughal.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.