Name

SPVM::R::DataFrame - Data Frame

Description

The R::DataFrame class in SPVM represents a data frame.

Usage

  use R::DataFrame;
  use R::OP::DataFrame as DFOP;
  use R::OP::Int as IOP;
  use R::OP::Double as DOP;
  use R::OP::String as STROP;
  use R::OP::Time::Piece as TPOP;
  
  # Create a R::DataFrame object
  my $data_frame = R::DataFrame->new;
  
  $data_frame->set_col("name", STROP->c(["Ken", "Yuki", "Mike"]));
  $data_frame->set_col("age", IOP->c([19, 43, 50]));
  $data_frame->set_col("weight", DOP->c([(double)50.6, 60.3, 80.5]));
  $data_frame->set_col("birth", TPOP->c(["1980-10-10", "1985-12-10", "1970-02-16"]));
  
  my $data_frame_string = $data_frame->to_string;
  
  my $slice_data_frame = $data_frame->slice(["name", "age"], [IOP->c([0, 1])]);
  
  $data_frame->sort(["name", "age desc"]);
  
  my $nrow = $data_frame->nrow;
  
  my $ncol = $data_frame->ncol;
  
  my $dim = $data_frame->first_col->dim;

See also Data Frame Examples.

Details

This class is a port of the data frame features of the R language.

Complexity of Getting, Setting, Inserting and Removing a Column

The complexity of getting a column is O(1).

The complexity of setting a column is O(1).

The complexity of inserting a new column is O(n), where n is the number of columns, but the complexity of inserting a new column at the end of the columns is O(1).

The complexity of removing a column is O(n), where n is the number of columns, but the complexity of removing the last column is O(1).
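
These bounds are what one would expect from a layout like the one described under Fields: an ordered list of columns plus a hash from column name to column index. The following Python model only illustrates that reasoning and is not the SPVM implementation; the class name ColumnStore and its method names are hypothetical.

```python
class ColumnStore:
    """Hypothetical model: an ordered column list plus a name-to-index hash."""

    def __init__(self):
        self.columns = []   # ordered (name, data) pairs
        self.index_of = {}  # column name -> position in self.columns

    def get(self, name):
        # One hash lookup and one list access: O(1).
        return self.columns[self.index_of[name]][1]

    def insert(self, name, data, before=None):
        if name in self.index_of:
            raise KeyError(f"column {name!r} already exists")
        if before is None:
            # Appending touches no existing index: O(1) amortized.
            self.index_of[name] = len(self.columns)
            self.columns.append((name, data))
            return
        pos = self.index_of[before]
        self.columns.insert(pos, (name, data))
        # Every column at or after pos shifts by one,
        # so its stored index must be rewritten: O(n).
        for i in range(pos, len(self.columns)):
            self.index_of[self.columns[i][0]] = i

    def remove(self, name):
        pos = self.index_of.pop(name)
        del self.columns[pos]
        # Removing the last column leaves this loop empty: O(1).
        # Removing any other column rewrites the later indexes: O(n).
        for i in range(pos, len(self.columns)):
            self.index_of[self.columns[i][0]] = i


store = ColumnStore()
store.insert("name", ["Ken", "Yuki", "Mike"])
store.insert("weight", [50.6, 60.3, 80.5])
store.insert("age", [19, 43, 50], before="weight")
print([name for name, _ in store.columns])  # ['name', 'age', 'weight']
```

In this model the cost of a middle insert or removal comes from rewriting the stored indexes of the columns that follow, which is why only operations at the end stay constant-time.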

Data Frame Operations

See R::OP::DataFrame for data frame operations.

Fields

colobjs_list

has colobjs_list : List of R::DataFrame::Column;

A list of R::DataFrame::Column objects that represents columns.

colobjs_indexes_h

has colobjs_indexes_h : Hash of Int;

A hash that stores the column index for each column name.

Class Methods

new

static method new : R::DataFrame ();

Creates a new R::DataFrame object and returns it.

Instance Methods

colnames

method colnames : string[] ();

Returns column names.

exists_col

method exists_col : int ($colname : string);

If the column named $colname exists, returns 1, otherwise returns 0.

colname

method colname : string ($col : int);

Returns the column name at column index $col.

Exceptions:

If the column at index $col does not exist, an exception is thrown.

colindex

method colindex : int ($colname : string);

Returns the column index of the column named $colname.

Exceptions:

If the column named $colname does not exist, an exception is thrown.

col_by_index

method col_by_index : R::NDArray ($col : int);

Returns the n-dimensional array at column index $col.

Exceptions:

If the column at index $col does not exist, an exception is thrown.

first_col

method first_col : R::NDArray ();

Returns the n-dimensional array of the first column.

This method calls "col" method.

Exceptions:

Exceptions thrown by "col" method could be thrown.

col

method col : R::NDArray ($colname : string);

Returns the n-dimensional array of the column named $colname.

Exceptions:

If the column named $colname does not exist, an exception is thrown.

set_col

method set_col : void ($colname : string, $ndarray : R::NDArray);

Sets the n-dimensional array of the column named $colname to the n-dimensional array $ndarray.

$ndarray becomes read-only by calling R::NDArray#make_dim_read_only method.

If the n-dimensional array of the column named $colname exists, it is replaced with $ndarray.

If not, a new column is inserted by "insert_col" method.

insert_col

method insert_col : void ($colname : string, $ndarray : R::NDArray, $before_colname : string = undef);

Inserts the column named $colname with the n-dimensional array $ndarray before the column named $before_colname.

If $before_colname is not defined, the new column is inserted at the end of the columns.

The column name $colname must be a non-empty string. Otherwise, an exception is thrown.

The n-dimensional array $ndarray must be defined. Otherwise, an exception is thrown.

If the column named $colname already exists, an exception is thrown.

The dimensions of the n-dimensional array $ndarray must be equal to the dimensions of the n-dimensional array of the first column of this data frame. Otherwise, an exception is thrown.

ncol

method ncol : int ();

Returns the number of columns.

nrow

method nrow : int ();

Returns the number of rows.

If this data frame has no columns, returns 0.

Exceptions:

The n-dimensional array of the first column of this data frame must be a vector. Otherwise, an exception is thrown.

remove_col

method remove_col : void ($colname : string);

Removes the column named $colname.

This method calls "colindex" method.

Exceptions:

Exceptions thrown by "colindex" method could be thrown.

clone

method clone : R::DataFrame ($shallow : int = 0);

Clones this data frame and returns it.

This method calls R::NDArray#clone method for the n-dimensional array of each column in this data frame.

to_string

method to_string : string ();

Stringifies this data frame and returns the resulting string.

slice

method slice : R::DataFrame ($colnames : string[], $axis_indexes_product : R::NDArray::Int[]);

Slices this data frame given the column names $colnames and the product of axis indexes $axis_indexes_product, and returns the sliced data frame.

This method calls R::NDArray#slice method for the n-dimensional array of each column in this data frame.
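
As a rough illustration of the call shown in Usage, slice(["name", "age"], [IOP->c([0, 1])]), the following Python model treats each column as a plain vector and reads the single entry of $axis_indexes_product as a list of row indexes. That reading is an assumption, the function name slice_frame is hypothetical, and the model does not cover columns with more than one axis.

```python
def slice_frame(columns, colnames, row_indexes):
    """Hypothetical model: keep the named columns and the given rows, in the given order."""
    return {name: [columns[name][i] for i in row_indexes] for name in colnames}


frame = {
    "name": ["Ken", "Yuki", "Mike"],
    "age": [19, 43, 50],
    "weight": [50.6, 60.3, 80.5],
}
sliced = slice_frame(frame, ["name", "age"], [0, 1])
print(sliced)  # {'name': ['Ken', 'Yuki'], 'age': [19, 43]}
```

The original frame is left untouched in this model; the result is a new mapping built from the selected columns and rows.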

set_order

method set_order : void ($data_indexes_ndarray : R::NDArray::Int);

Calls R::NDArray#set_order method for the n-dimensional array of each column in this data frame.

sort

method sort : void ($colnames_with_sort_order : string[]);

Sorts the data of the n-dimensional array of each column in this data frame given the column names with the sort orders $colnames_with_sort_order.

Implementation:

This method calls "order" method given $colnames_with_sort_order and calls "set_order" method given the return value of "order" method.
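
The two-step structure (compute an ordering of row indexes, then apply it to every column) can be sketched in Python as follows. This is a language-neutral model under stated assumptions, not the SPVM code: columns are plain lists, the sort keys are already parsed into hypothetical (column name, descending) pairs, the first key is taken to be the most significant, and ties keep their original row order.

```python
def order(columns, keys):
    """Return row indexes in sorted order.

    keys is a list of (column name, descending) pairs, most significant first.
    Assumes the frame has at least one column.
    """
    nrow = len(next(iter(columns.values())))
    indexes = list(range(nrow))
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-key ordering.
    for name, descending in reversed(keys):
        indexes.sort(key=lambda i: columns[name][i], reverse=descending)
    return indexes


def set_order(columns, indexes):
    """Reorder every column in place using the same row indexes."""
    for name in columns:
        columns[name] = [columns[name][i] for i in indexes]


frame = {
    "name": ["Ken", "Yuki", "Mike", "Ken"],
    "age": [19, 43, 50, 30],
}
indexes = order(frame, [("name", False), ("age", True)])
set_order(frame, indexes)
print(indexes)  # [3, 0, 2, 1]
print(frame)    # {'name': ['Ken', 'Ken', 'Mike', 'Yuki'], 'age': [30, 19, 50, 43]}
```

Keeping the ordering separate from its application means the same index list can be reused, for example to reorder another structure with the same rows.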

order

method order : R::NDArray::Int ($colnames_with_sort_order : string[]);

Gets order data given the column names with the sort orders $colnames_with_sort_order, and returns the order data.

The returned order data can be used for the argument of "set_order" method.

Format of a Column Name with the Sort Order:

  COLUMN_NAME
  COLUMN_NAME SORT_ORDER

COLUMN_NAME is a column name.

SORT_ORDER is asc or desc.

Examples are

  age
  age asc
  age desc

Exceptions:

The column names $colnames_with_sort_order must be defined. Otherwise an exception is thrown.

The number of columns in this data frame must be greater than 0. Otherwise an exception is thrown.

If the column named $colname does not exist, an exception is thrown.

($colname is the column name part of an element of $colnames_with_sort_order.)

If a column name with the sort order has an invalid format, an exception is thrown.
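
The format and the format-related exception can be modeled with a small Python parser. This is a sketch only; the function name parse_sort_spec is hypothetical. It assumes that the two parts are separated by whitespace, that column names contain no whitespace, and that the order is ascending when SORT_ORDER is omitted, none of which is stated explicitly above.

```python
def parse_sort_spec(spec):
    """Split 'COLUMN_NAME' or 'COLUMN_NAME SORT_ORDER' into (name, descending)."""
    parts = spec.split()
    if len(parts) == 1:
        # Assumption: ascending when SORT_ORDER is omitted.
        return parts[0], False
    if len(parts) == 2 and parts[1] in ("asc", "desc"):
        return parts[0], parts[1] == "desc"
    raise ValueError(f"invalid column name with sort order: {spec!r}")


for spec in ["age", "age asc", "age desc"]:
    print(spec, "->", parse_sort_spec(spec))
```

Anything else, such as an empty string or an unknown order keyword like "age down", raises ValueError in this model.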

See Also

Copyright & License

Copyright (c) 2024 Yuki Kimoto

MIT License