NAME

DataExtract::FixedWidth - The one stop shop for parsing static column width text tables!

SYNOPSIS

SAMPLE FILE
HEADER:  'COL1NAME  COL2NAME       COL3NAMEEEEE'
DATA1:   'FOOBARBAZ THIS IS TEXT   ANOTHER COL'
DATA2:   'FOOBAR FOOBAR IS TEXT    ANOTHER COL'

In the above example, this module can discern the column names from the header. It will then parse DATA1 and DATA2 appropriately. If a column's data bleeds into the next column, as the first value of DATA2 does, you can use the option ->fix_overlay(1).

my $de = DataExtract::FixedWidth->new({
	header_row => 'COL1NAME  COL2NAME       COL3NAMEEEEE',
	## You can optionally be explicit about the column names
	## This is required if your column names have spaces
	cols       => [qw/COL1NAME COL2NAME COL3NAMEEEEE/]
});

After construction, you can call ->parse, which returns an ArrayRef:

$de->parse('FOOBARBAZ THIS IS TEXT   ANOTHER COL');

Or you can use ->parse_hash(), which returns a HashRef of the data keyed by column header.
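
As a rough illustration of the header-driven parsing described in this synopsis, the same idea can be sketched in core Perl with unpack(). This is not the module's actual implementation; the variable names are invented for the example, and the first data row is written with its third column aligned under the header.

```perl
use strict;
use warnings;

my $header = 'COL1NAME  COL2NAME       COL3NAMEEEEE';

# Record each whitespace-free header word and the offset where it starts.
my (@names, @starts);
while ($header =~ /(\S+)/g) {
    push @names,  $1;
    push @starts, $-[1];
}

# Each column is as wide as the distance to the next column start;
# the last column takes the rest of the line (A*).
my @widths   = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;
my $template = join ' ', (map { "A$_" } @widths), 'A*';

# A row that lines up with the header parses cleanly.
my @aligned = unpack $template, 'FOOBARBAZ THIS IS TEXT   ANOTHER COL';

# DATA2's first value is wider than its column, so it spills into the second.
my @bleeding = unpack $template, 'FOOBAR FOOBAR IS TEXT    ANOTHER COL';
```

Here @aligned holds ('FOOBARBAZ', 'THIS IS TEXT', 'ANOTHER COL'), while @bleeding holds ('FOOBAR FOO', 'BAR IS TEXT', 'ANOTHER COL'). The second result is the kind of overlap that ->fix_overlay(1) is meant to repair.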

DESCRIPTION

This module parses any type of fixed-width table. Tables of this kind are often output by ghostscript, by printf() displays with string padding (e.g. %-20s %20s), and by most screen capture mechanisms.
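
To make the printf() case concrete, here is a small core-Perl sketch that produces such a table and reads it back with unpack(). The sample rows are hypothetical, and the snippet does not use this module; it only shows the shape of the input the module is aimed at.

```perl
use strict;
use warnings;

# Hypothetical sample data; the format mirrors the "%-20s %20s" padding
# mentioned above (left-aligned name, right-aligned amount).
my @rows  = ( [ 'Widget', '4.50' ], [ 'Left-handed hammer', '112.00' ] );
my @lines = map { sprintf '%-20s %20s', @$_ } @rows;

# Each line is 41 characters wide: 20 + 1 separator + 20.
# "A20" reads a field and drops trailing spaces, "x" skips the separator.
my @parsed;
for my $line (@lines) {
    my ($name, $amount) = unpack 'A20 x A20', $line;
    $amount =~ s/^\s+//;    # right-aligned values carry leading padding
    push @parsed, [ $name, $amount ];
}
```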

Constructor

The class constructor, ->new, provides numerous features. Some of its options are:

heuristics => \@lines: This will deduce the unpack format string from the data. If you opt to use this method, parse_hash will be unavailable to you.
cols => \@cols: This permits you to explicitly list the columns in the header row. This is especially handy if you have spaces in the column header. This option makes header_row mandatory.
header_row => $string: If a cols option is not provided, the assumption is that there are no spaces in the column headers, and the module can take care of the rest. The only way this option can be avoided is if the header is deduced from heuristics, or if you explicitly supply the unpack string and only use ->parse($line).
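
One way to picture deducing an unpack string from data alone is sketched below. This is an assumption-laden illustration, not the algorithm behind the heuristics option: guess_template is a hypothetical helper that treats any character position blank in every sample line as a gap between columns. Because no header is involved, there are no column names to key a hash on, which fits the note that parse_hash is unavailable in this mode.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: guess an unpack template from data lines alone.
sub guess_template {
    my @lines = @_;
    my $width = max map { length } @lines;

    # $blank[$i] is true when position $i is a space (or past the end) in every line.
    my @blank = map {
        my $i = $_;
        !grep { $i < length($_) && substr($_, $i, 1) ne ' ' } @lines;
    } 0 .. $width - 1;

    # A column starts at position 0 and wherever a non-blank position follows a blank one.
    my @starts = grep { $_ == 0 || ($blank[$_ - 1] && !$blank[$_]) } 0 .. $width - 1;

    my @widths = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;
    return join ' ', (map { "A$_" } @widths), 'A*';
}

my @sample   = ('alice   42  NYC', 'bob     7   LA', 'carol   130 SF');
my $template = guess_template(@sample);
my @rows     = map { [ unpack($template, $_) ] } @sample;
```

A guess like this breaks down when values contain spaces that happen to line up across all sample lines, which is one reason an explicit header or cols list is the safer choice.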

Methods

->parse( $data_line )

Parses the data and returns an ArrayRef

->parse_hash( $data_line )

Parses the data and returns a HashRef
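
The difference between the two return shapes can be sketched in core Perl. The template below is an assumption that matches the SYNOPSIS header widths, and the snippet only mimics the results; it does not call the module.

```perl
use strict;
use warnings;

my @cols     = qw/COL1NAME COL2NAME COL3NAMEEEEE/;
my $template = 'A10 A15 A*';    # assumed widths, matching the SYNOPSIS header

# Positional values, the shape an ArrayRef result would have.
my $values = [ unpack $template, 'FOOBARBAZ THIS IS TEXT   ANOTHER COL' ];

# A hash slice pairs each column name with the value at the same index,
# the shape a HashRef result would have.
my %row;
@row{@cols} = @$values;
my $row = \%row;
```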

->first_col_zero(1/0)

On by default, this option forces the unpack string to make the first column absorb any characters to the left of the first header column. So, in the example below, the first column also includes the first character of the row, even though the word STOCK begins at the second character.

CHAR NUMBERS: |1|2|3|4|5|6|7|8|9|10
HEADER ROW  : | |S|T|O|C|K| |V|I|N
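
The effect can be sketched with two unpack templates built from the header above. The helper template_for and the sample row are hypothetical, and the "strict" template is only one plausible reading of what turning the option off would mean.

```perl
use strict;
use warnings;

my $header = ' STOCK VIN';      # STOCK starts at offset 1, VIN at offset 7
my $line   = 'X1234  JH4KA';    # hypothetical row whose first value starts at offset 0

# Column start offsets taken from the header words.
my @starts;
push @starts, $-[0] while $header =~ /\S+/g;

# Build an unpack template from start offsets; a leading "xN" skips N characters.
sub template_for {
    my @s    = @_;
    my @part = $s[0] ? ("x$s[0]") : ();
    push @part, 'A' . ($s[$_ + 1] - $s[$_]) for 0 .. $#s - 1;
    return join ' ', @part, 'A*';
}

my $strict   = template_for(@starts);                      # first column starts under the S
my $col_zero = template_for(0, @starts[1 .. $#starts]);    # first column starts at offset 0

my @strict_fields   = unpack $strict,   $line;    # first field loses the leading 'X'
my @col_zero_fields = unpack $col_zero, $line;    # first field keeps it
```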

->trim_whitespace(1/0)

On by default, this option trims the whitespace from the elements that ->parse() outputs.
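
For a sense of what trimming involves at the unpack() level, the sketch below compares the raw field, the 'A' format, and a full trim. It is a core-Perl illustration with a made-up line, not a statement of how the option is implemented.

```perl
use strict;
use warnings;

# Two 8-character fields: one left-aligned, one right-aligned.
my $line = sprintf '%-8s%8s', 'left', 'right';

# 'a' keeps every character of the field, padding included.
my @raw = unpack 'a8 a8', $line;

# 'A' drops trailing spaces but leaves leading ones alone.
my @half = unpack 'A8 A8', $line;

# Trimming both ends of each raw value.
my @trimmed = map { my $v = $_; $v =~ s/\A\s+|\s+\z//g; $v } @raw;
```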

->fix_overlay(1/0)

Off by default, this option fixes columns that bleed into other columns by moving all non-whitespace characters that precede the first whitespace of the next column back into the previous column.

So if ColumnA is 'foob' and ColumnB is 'ar Hello world':

* ColumnA becomes 'foobar', and ColumnB becomes 'Hello world'
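
The rule above can be sketched as a small function. This fix_overlay sub is a hypothetical stand-in written for illustration, not the module's code, and it assumes a bleed exists only when the left value fills its column to the last character.

```perl
use strict;
use warnings;

# Hypothetical helper working on raw (untrimmed) column values, left to right.
sub fix_overlay {
    my @cols = @_;
    for my $i (0 .. $#cols - 1) {
        # Assumption: a bleed is present only when the left value fills its
        # column to the last character and the right value starts mid-word.
        next unless $cols[$i] =~ /\S\z/ && $cols[$i + 1] =~ /\A\S/;

        # Move the leading run of non-whitespace back to the left column
        # and drop the whitespace that followed it.
        $cols[$i] .= $1 if $cols[$i + 1] =~ s/\A(\S+)\s*//;
    }
    return @cols;
}

my @doc_example = fix_overlay('foob', 'ar Hello world');

# 'a' (not 'A') keeps trailing padding, so a short value is not mistaken for a bleed.
my @raw   = unpack 'a10 a15 a*', 'FOOBAR FOOBAR IS TEXT    ANOTHER COL';
my @fixed = fix_overlay(@raw);
```

Applied to the DATA2 row from the SYNOPSIS, the first column comes back as 'FOOBAR FOOBAR' and the second as 'IS TEXT' plus its padding. Two values that legitimately touch at a column boundary would be merged by this sketch, so the ambiguity is inherent to the technique.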

->null_as_undef(1/0)

Undefs all elements for which length(element) == 0, so empty fields come back as undef instead of empty strings.
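
In core-Perl terms the transformation looks roughly like the following sketch. The sample line is hypothetical and the snippet does not call the module.

```perl
use strict;
use warnings;

# The middle field of this hypothetical row is nothing but padding.
my $line   = sprintf '%-5s%-5s%s', 'ab', '', 'cd';
my @fields = unpack 'A5 A5 A*', $line;    # 'A' leaves '' for the blank field

# Replace zero-length strings with undef; everything else passes through.
my @clean = map { length($_) ? $_ : undef } @fields;
```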

->colchar_map

Returns a HashRef that displays each column header together with the character position the column starts at.

->unpack_string

Returns the CORE::unpack() template string that will be used internally by ->parse()

AVAILABILITY

CPAN.org

COPYRIGHT & LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Evan Carroll <me at evancarroll.com>
System Lord of the Internets

BUGS

Please report any bugs or feature requests to bug-dataextract-fixedwidth at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DataExtract-FixedWidth. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

To install DataExtract::FixedWidth, copy and paste the appropriate command into your terminal.

cpanm

cpanm DataExtract::FixedWidth

CPAN shell

perl -MCPAN -e shell
install DataExtract::FixedWidth

