NAME
Statistics::R::IO::Parser - Functions for parsing R data files
VERSION
version 1.0002
SYNOPSIS
use feature 'say';
use Statistics::R::IO::ParserState;
use Statistics::R::IO::Parser;
my $state = Statistics::R::IO::ParserState->new(
    data => 'file.rds'
);
say $state->at;          # element at the current position
say $state->next->at;    # element one position further on
DESCRIPTION
You shouldn't create instances of this class; it exists mainly to handle deserialization of R data files by the IO classes.
FUNCTIONS
This library is inspired by monadic parser frameworks from the Haskell world, like Packrat or Parsec. In this style, complex parsers are constructed by combining simpler ones.
The library offers a selection of basic parsers and combinators. Each of these is a function (think of it as a factory) that returns another function (the actual parser) which receives the current parsing state (Statistics::R::IO::ParserState) as the argument and returns a two-element array reference (called for brevity "a pair" in the following text) with the result of the parser in the first element and the new parser state in the second element. If the parser fails, say if the current state is "a" where a number is expected, it returns undef to signal failure.
The descriptions of individual functions below use a shorthand because the above mechanism is implied. Thus, when any_char is described as "parses any character", it really means that calling any_char will return a function that when called with the current state will return "a pair of the character...", etc.
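To make the calling convention concrete, here is a minimal sketch of a parser factory in plain Perl. It does not load the module: the state is a hypothetical hashref with data and pos keys standing in for a Statistics::R::IO::ParserState object, and the helpers make_state, st_eof, st_at, and st_next are illustrative names only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Statistics::R::IO::ParserState: a hashref holding
# the data (an array of characters) and the current position. States are never
# modified; advancing builds a new hashref.
sub make_state { my ($str) = @_; return { data => [ split //, $str ], pos => 0 } }
sub st_eof     { my ($s) = @_; return $s->{pos} >= @{ $s->{data} } }
sub st_at      { my ($s) = @_; return $s->{data}[ $s->{pos} ] }
sub st_next    { my ($s) = @_; return { data => $s->{data}, pos => $s->{pos} + 1 } }

# The factory: calling any_char() builds and returns the actual parser.
sub any_char {
    return sub {
        my ($state) = @_;
        return undef if st_eof($state);              # failure is signalled by undef
        return [ st_at($state), st_next($state) ];   # success: [result, new state]
    };
}

my $parser = any_char();                                   # build the parser once...
my ($char, $rest) = @{ $parser->(make_state('ab')) };      # ...then apply it to a state
my $fail = $parser->(make_state(''));                      # undef: nothing left to parse
```

Because the factory returns a closure, a parser can be built once and applied to many states, and a caller detects failure by checking for undef rather than catching an exception.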
CHARACTER PARSERS
- any_char
  Parses any character, returning a pair of the character at the current state's position and the new state, advanced by one from the starting state. If the state is at the end ($state->eof is true), returns undef to signal failure.
- char $c
  Parses the given character $c, returning a pair of the character at the current state's position, if it is equal to $c, and the new state, advanced by one from the starting state. If the state is at the end ($state->eof is true) or the character at the current position is not $c, returns undef to signal failure.
- string $s
  Parses the given string $s, returning a pair of the sequence of characters starting at the current state's position, if it is equal to $s, and the new state, advanced by length($s) from the starting state. If the state is at the end ($state->eof is true) or the string starting at the current position is not $s, returns undef to signal failure.
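The behaviour of char and string can be sketched in plain Perl. This is an illustration of the description above, not the module's source: the state is a hypothetical hashref (data as an array of characters, plus pos) rather than a ParserState object, and make_state is an illustrative helper.

```perl
use strict;
use warnings;

# Hypothetical hashref state standing in for ParserState.
sub make_state { my ($str) = @_; return { data => [ split //, $str ], pos => 0 } }

# char $c: succeed only when the current character equals $c
sub char {
    my ($c) = @_;
    return sub {
        my ($state) = @_;
        my $got = $state->{data}[ $state->{pos} ];
        return undef unless defined $got && $got eq $c;   # end of data or mismatch
        return [ $c, { data => $state->{data}, pos => $state->{pos} + 1 } ];
    };
}

# string $s: succeed only when the next length($s) characters spell $s
sub string {
    my ($s) = @_;
    return sub {
        my ($state) = @_;
        my ($data, $pos) = @{$state}{qw(data pos)};
        my $end = $pos + length($s) - 1;                  # last index to compare
        return undef if $end > $#$data;                   # not enough characters left
        return undef if join('', @{$data}[ $pos .. $end ]) ne $s;
        return [ $s, { data => $data, pos => $end + 1 } ];
    };
}

my $st   = make_state('RDX2');
my $ok   = string('RDX')->($st);   # [ 'RDX', state at position 3 ]
my $miss = char('X')->($st);       # undef: the current character is 'R'
```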
NUMBER PARSERS
- endianness [$end]
  When the $end argument is given, this function sets the byte order used by parsers in the module to be little-endian (when $end is "<") or big-endian (when $end is ">"). This function changes the module's state, and the setting remains in effect until the next change.
  When called with no arguments, endianness returns the current byte order in effect. The starting byte order is big-endian.
- any_uint8, any_uint16, any_uint24, any_uint32
  Parses an 8-, 16-, 24-, or 32-bit unsigned integer, returning a pair of the integer starting at the current state's position and the new state, advanced by 1, 2, 3, or 4 bytes from the starting state, depending on the parser. The integer value is determined by the current value of endianness. If there are not enough elements left in the data from the current position, returns undef to signal failure.
- uint8 $n, uint16 $n, uint24 $n, uint32 $n
  Parses the specified 8-, 16-, 24-, or 32-bit unsigned integer $n, returning a pair of the integer at the current state's position, if it is equal to $n, and the new state. The new state is advanced by 1, 2, 3, or 4 bytes from the starting state, depending on the parser. The integer value is determined by the current value of endianness. If there are not enough elements left in the data from the current position or the integer at the current position is not $n, returns undef to signal failure.
- any_int8, any_int16, any_int24, any_int32
  Parses an 8-, 16-, 24-, or 32-bit signed integer, returning a pair of the integer starting at the current state's position and the new state, advanced by 1, 2, 3, or 4 bytes from the starting state, depending on the parser. The integer value is determined by the current value of endianness. If there are not enough elements left in the data from the current position, returns undef to signal failure.
- int8 $n, int16 $n, int24 $n, int32 $n
  Parses the specified 8-, 16-, 24-, or 32-bit signed integer $n, returning a pair of the integer at the current state's position, if it is equal to $n, and the new state. The new state is advanced by 1, 2, 3, or 4 bytes from the starting state, depending on the parser. The integer value is determined by the current value of endianness. If there are not enough elements left in the data from the current position or the integer at the current position is not $n, returns undef to signal failure.
- any_real32, any_real64
  Parses a 32- or 64-bit real number, returning a pair of the number starting at the current state's position and the new state, advanced by 4 or 8 bytes from the starting state, depending on the parser. The real value is determined by the current value of endianness. If there are not enough elements left in the data from the current position, returns undef to signal failure.
- any_int32_na, any_real64_na
  Parses a 32-bit signed integer or 64-bit real number, respectively, but recognizes R-style missing values (NAs): INT_MIN for integers and a special NaN bit pattern for reals. Returns a pair of the number value (undef if an NA) and the new state, advanced by 4 or 8 bytes from the starting state, depending on the parser. If there are not enough elements left in the data from the current position, returns undef to signal failure.
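The number parsers amount to reading a fixed number of bytes and decoding them according to the current byte order. The sketch below is an illustration under several assumptions: a hypothetical hashref state holding single bytes, Perl's pack/unpack templates with the "<" and ">" byte-order modifiers (Perl 5.10 or later), IEEE 754 doubles, and only a few representative parsers. fixed_width and make_state are illustrative helpers, not module functions. For NA detection it uses R's conventions: NA_integer_ is INT_MIN, and NA_real_ is a NaN whose low-order 32-bit word is 1954.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ParserState: data as an array of single bytes.
sub make_state { my ($bytes) = @_; return { data => [ split //, $bytes ], pos => 0 } }

# Module-wide byte order: '>' big-endian (the starting value) or '<' little-endian.
my $ENDIAN = '>';
sub endianness {
    $ENDIAN = shift if @_;
    return $ENDIAN;
}

# Consume $n bytes and decode them with unpack template $fmt plus the byte order
# in effect when the parser is applied.
sub fixed_width {
    my ($n, $fmt) = @_;
    return sub {
        my ($state) = @_;
        my ($data, $pos) = @{$state}{qw(data pos)};
        return undef if $pos + $n > @$data;              # not enough bytes left
        my $raw = join '', @{$data}[ $pos .. $pos + $n - 1 ];
        my ($value) = unpack($fmt . $ENDIAN, $raw);
        return [ $value, { data => $data, pos => $pos + $n } ];
    };
}

sub any_uint16 { return fixed_width(2, 'S') }
sub any_int32  { return fixed_width(4, 'l') }
sub any_real64 { return fixed_width(8, 'd') }

# NA-aware variants: map R's missing-value encodings to undef.
sub any_int32_na {
    my $p = any_int32();
    return sub {
        my $r = $p->(@_) or return undef;
        return [ ($r->[0] == -2147483648 ? undef : $r->[0]), $r->[1] ];
    };
}

sub any_real64_na {
    my $p = any_real64();
    return sub {
        my ($state) = @_;
        my $r = $p->($state) or return undef;
        my $raw = join '', @{ $state->{data} }[ $state->{pos} .. $state->{pos} + 7 ];
        # Low-order 32-bit word: last 4 bytes if big-endian, first 4 if little-endian.
        my ($low) = $ENDIAN eq '>' ? unpack('N', substr($raw, 4)) : unpack('V', $raw);
        my $is_na = $r->[0] != $r->[0] && $low == 1954;  # a NaN carrying R's marker
        return [ ($is_na ? undef : $r->[0]), $r->[1] ];
    };
}

my $st = make_state(pack('n l> d>', 258, -5, 1.5));
my ($u16, $s1) = @{ any_uint16()->($st) };    # 258
my ($i32, $s2) = @{ any_int32()->($s1) };     # -5
my ($r64, $s3) = @{ any_real64()->($s2) };    # 1.5
my $na_int  = any_int32_na()->(make_state(pack('l>', -2147483648)));
my $na_real = any_real64_na()->(make_state(pack('H*', '7ff00000000007a2')));
```

Here the byte order is read when a parser is applied, so a call to endianness affects every later parse, matching the module-wide setting described above.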
SEQUENCING
- seq $p1, ...
  This combinator applies parsers $p1, ... in sequence, using the parse state returned by $p1 as the input parse state to $p2, etc. Returns a pair of the concatenation of all the parsers' results and the parsing state returned by the final parser. If any of the parsers returns undef, seq will return it immediately without attempting to apply any further parsers.
- many_till $p, $end
  This combinator applies a parser $p until parser $end succeeds. It does this by alternating applications of $end and $p; once $end succeeds, the function returns the concatenation of results of preceding applications of $p. (Thus, if $end succeeds immediately, the 'result' is an empty list.) Otherwise, $p is applied and must succeed, and the procedure repeats. Returns a pair of the concatenation of all of $p's results and the parsing state returned by the final parser. If any application of $p returns undef, many_till will return it immediately.
- count $n, $p
  This combinator applies the parser $p exactly $n times in sequence, threading the parse state through each call. Returns a pair of the concatenation of all the parsers' results and the parsing state returned by the final application. If any application of $p returns undef, count will return it immediately without attempting any more applications.
- with_count [$num_p = any_uint32], $p
  This combinator first applies parser $num_p to get the number of times that $p should be applied in sequence. If only one argument is given, any_uint32 is used as the default value of $num_p. (So with_count works by getting a number $n by applying $num_p and then calling count $n, $p.) Returns a pair of the concatenation of all the parsers' results and the parsing state returned by the final application. If the initial application of $num_p or any application of $p returns undef, with_count will return it immediately without attempting any more applications.
- choose $p1, ...
  This combinator tries parsers $p1, ... in order on the same input state until one of them succeeds, and immediately returns that parser's result. If all of the parsers fail, choose fails and returns undef.
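The sequencing combinators can be sketched as ordinary closures that thread the state from one parser to the next. This sketch is illustrative, not the module's code: the state is a hypothetical hashref, char and digit are hypothetical primitives, and digit() stands in for the default any_uint32 so the example can use readable text. Following the description above, many_till here returns the state produced by $end.

```perl
use strict;
use warnings;

# Hypothetical hashref state standing in for ParserState.
sub make_state { my ($str) = @_; return { data => [ split //, $str ], pos => 0 } }

# Hypothetical primitive: match exactly one character $c.
sub char {
    my ($c) = @_;
    return sub {
        my ($s) = @_;
        my $got = $s->{data}[ $s->{pos} ];
        return undef unless defined $got && $got eq $c;
        return [ $c, { data => $s->{data}, pos => $s->{pos} + 1 } ];
    };
}

# Hypothetical count parser: one decimal digit, returned as a number.
sub digit {
    return sub {
        my ($s) = @_;
        my $got = $s->{data}[ $s->{pos} ];
        return undef unless defined $got && $got =~ /\A[0-9]\z/;
        return [ 0 + $got, { data => $s->{data}, pos => $s->{pos} + 1 } ];
    };
}

# seq: thread the state through each parser, stopping at the first failure.
sub seq {
    my @parsers = @_;
    return sub {
        my ($state) = @_;
        my @results;
        for my $p (@parsers) {
            my $r = $p->($state) or return undef;
            push @results, $r->[0];
            $state = $r->[1];
        }
        return [ \@results, $state ];
    };
}

# count: $p applied exactly $n times is seq over $n copies of $p.
sub count { my ($n, $p) = @_; return seq(($p) x $n) }

# with_count: read the count first (digit() stands in for any_uint32 here).
sub with_count {
    my ($num_p, $p) = @_ == 2 ? @_ : (digit(), @_);
    return sub {
        my $r = $num_p->(@_) or return undef;
        return count($r->[0], $p)->($r->[1]);
    };
}

# many_till: try $end first; if it fails, $p must succeed, then repeat.
sub many_till {
    my ($p, $end) = @_;
    return sub {
        my ($state) = @_;
        my @results;
        while (1) {
            my $done = $end->($state);
            return [ \@results, $done->[1] ] if $done;   # state returned by $end
            my $r = $p->($state) or return undef;
            push @results, $r->[0];
            $state = $r->[1];
        }
    };
}

# choose: try each alternative on the same input state; the first success wins.
sub choose {
    my @parsers = @_;
    return sub {
        my ($state) = @_;
        for my $p (@parsers) {
            my $r = $p->($state);
            return $r if $r;
        }
        return undef;
    };
}

my $st = make_state('3aaab;');
my $r1 = seq(char('3'), char('a'))->($st);   # [ ['3', 'a'], state at 2 ]
my $r2 = with_count(char('a'))->($st);       # [ ['a', 'a', 'a'], state at 4 ]
my $r3 = many_till(choose(char('a'), char('b')), char(';'))->(make_state('abba;'));
```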
COMBINATORS
- bind $p1, $f
  This combinator applies parser $p1 and, if it succeeds, calls function $f using the first element of $p1's result as the argument. The call to $f needs to return a parser, which bind applies to the parsing state after $p1's application.
  The bind combinator is an essential building block for most combinators described so far. For instance, with_count can be written as:
      bind($num_p, sub { my $n = shift; count $n, $p; })
- mreturn $value
  Returns a parser that, when applied, returns $value without changing the parsing state.
- error $message
  Returns a parser that, when applied, croaks with the $message and the current parsing state.
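The three combinators are small enough to sketch in a few lines each, and together they can express the with_count example above. This is an illustration under assumptions, not the module's implementation: the state is a hypothetical hashref, count and any_char are re-implemented locally, and the position in error's message is this sketch's own format. Because bind is also a Perl built-in (for sockets), the sketch uses the subs pragma so that calls resolve to the local bind.

```perl
use strict;
use warnings;
use subs qw(bind);    # let the sketch's bind override Perl's built-in socket bind
use Carp qw(croak);

# Hypothetical hashref state standing in for ParserState.
sub make_state { my ($str) = @_; return { data => [ split //, $str ], pos => 0 } }

sub any_char {
    return sub {
        my ($s) = @_;
        return undef if $s->{pos} >= @{ $s->{data} };
        return [ $s->{data}[ $s->{pos} ], { data => $s->{data}, pos => $s->{pos} + 1 } ];
    };
}

# bind: run $p1; on success pass its result to $f and run the parser $f returns.
sub bind {
    my ($p1, $f) = @_;
    return sub {
        my ($state) = @_;
        my $r = $p1->($state) or return undef;
        return $f->($r->[0])->($r->[1]);
    };
}

# mreturn: succeed with $value and leave the state untouched.
sub mreturn {
    my ($value) = @_;
    return sub { my ($state) = @_; return [ $value, $state ] };
}

# error: always croak; the position in the message is a hypothetical format.
sub error {
    my ($message) = @_;
    return sub { my ($state) = @_; croak "$message (at position $state->{pos})" };
}

# count: apply $p exactly $n times, threading the state.
sub count {
    my ($n, $p) = @_;
    return sub {
        my ($state) = @_;
        my @results;
        for (1 .. $n) {
            my $r = $p->($state) or return undef;
            push @results, $r->[0];
            $state = $r->[1];
        }
        return [ \@results, $state ];
    };
}

# A digit parser: read a character, then return its value or fail loudly.
my $digit = bind(any_char(), sub {
    my $c = shift;
    return $c =~ /\A[0-9]\z/ ? mreturn(0 + $c) : error("expected a digit, got '$c'");
});

# with_count written with bind, following the description above.
my $with_count = bind($digit, sub { my $n = shift; count($n, any_char()) });

my $r   = $with_count->(make_state('3xyz!'));          # [ ['x', 'y', 'z'], state at 4 ]
my $err = eval { $digit->(make_state('q')); 1 } ? '' : $@;
```

Note the difference between the two kinds of failure: returning undef lets combinators such as choose try an alternative, while error aborts the whole parse.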
SINGLETONS
These functions are an interface to ParserState's singleton-related functions, "add_singleton" in ParserState and "get_singleton" in ParserState. They exist because certain types of objects in R data files, for instance environments, have to exist as unique instances, and any subsequent objects that include them refer to them by a "reference id".
- add_singleton $singleton
  Adds $singleton to the current parsing state. Returns a pair of $singleton and the new parsing state.
- get_singleton $ref_id
  Retrieves from the current parse state the singleton identified by $ref_id, returning a pair of the singleton and the (unchanged) state.
- reserve_singleton $p
  Preallocates space for a singleton before running the given parser $p, and then assigns the parser's value to the singleton. Returns a pair of the singleton and the new parse state.
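The point of reserve_singleton is that an object such as an environment gets its slot before its contents are parsed, so anything inside it can already refer to it. The sketch below illustrates that idea with a hypothetical hashref state whose singletons array is copied on every change. Numbering singletons from 0 in the order they are added is an assumption of this sketch; how the module maps R's reference ids onto its singleton list is not shown here.

```perl
use strict;
use warnings;

# Hypothetical state: a position plus an array of singletons. Every change
# builds a new hashref and a new array, so older states stay valid.
sub make_state { return { pos => 0, singletons => [] } }

# add_singleton: append $singleton; in this sketch its id is its array index.
sub add_singleton {
    my ($singleton) = @_;
    return sub {
        my ($s) = @_;
        return [ $singleton, { %$s, singletons => [ @{ $s->{singletons} }, $singleton ] } ];
    };
}

# get_singleton: look up by id without changing the state.
sub get_singleton {
    my ($ref_id) = @_;
    return sub { my ($s) = @_; return [ $s->{singletons}[$ref_id], $s ] };
}

# reserve_singleton: claim a slot first, run $p, then store its value in the slot.
sub reserve_singleton {
    my ($p) = @_;
    return sub {
        my ($s) = @_;
        my $id = @{ $s->{singletons} };                      # index of the new slot
        my $reserved = { %$s, singletons => [ @{ $s->{singletons} }, undef ] };
        my $r = $p->($reserved) or return undef;
        my @filled = @{ $r->[1]{singletons} };
        $filled[$id] = $r->[0];
        return [ $r->[0], { %{ $r->[1] }, singletons => \@filled } ];
    };
}

# A stand-in environment parser: it can already see the slot reserved for it.
my $env_p = sub {
    my ($s) = @_;
    my $env = { kind => 'environment', slot => $#{ $s->{singletons} } };
    return [ $env, { %$s, pos => $s->{pos} + 1 } ];
};

my ($env, $st) = @{ reserve_singleton($env_p)->(make_state()) };
my ($same)     = @{ get_singleton(0)->($st) };   # the same hashref as $env
```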
 
BUGS AND LIMITATIONS
Instances of this class are intended to be immutable. Please do not try to change their value or attributes.
There are no known bugs in this module. Please see Statistics::R::IO for bug reporting.
SUPPORT
See Statistics::R::IO for support and contact information.
AUTHOR
Davor Cubranic <cubranic@stat.ubc.ca>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2017 by University of British Columbia.
This is free software, licensed under:
The GNU General Public License, Version 3, June 2007