NAME

Spreadsheet::Edit - Slice and dice spreadsheets, optionally using tied variables.

NON-OO SYNOPSIS

use Spreadsheet::Edit qw(:all);

# Examples assume a spreadsheet with these titles in the first row:
# "Account Number"  "Customer's Name"  "Email"  "Home-phone"  "Income"

read_spreadsheet "mailing_list.xls!Sheet1"; 

# alias an identifier to a long or complicated title
alias Name => qr/customer/i;  # matches "Customer's Name"

# ------------ without tied column variables -----------

# Print the data 
printf "%20s %8s %8s %-13s %s\n", "Name","A/N","Income","Phone","Email";
apply {
  printf "%20s %8d %8.2f %-13s %s\n",
         $crow{Name},              # this key is an explicit alias
         $crow{"Account Number"},  #   ...actual title
         $crow{Income},            #   ...actual title
         $crow{Home_phone},        #   ...auto-generated alias
         $crow{Email} ;            #   ...actual title
};

# Randomly access rows.  
print "Row 42: Column 'C' is ",      $rows[41]{C},    "\n";
print "Row 42: Customer's Name is ", $rows[41]{Name}, "\n";
print "Row 42: 3rd column is ",      $rows[41][2],    "\n";

# Split the "Customer's Name" into separate FName and LName columns
insert_cols '>Name', "FName", "LName";
apply { 
  ($crow{FName}, $crow{LName}) = ($crow{Name} =~ /(.*) (.*)/) 
    or die logmsg "Could not parse Name"; # logmsg adds current row number
};
delete_cols "Name";

# Sort by last name
sort_rows { $a->{LName} cmp $b->{LName} };

# ------------ using tied column variables -----------

our $Name;            # 'Name' is the explicit alias created above
our $Account_Number;  # Auto-generated alias for "Account Number"
our $Home_phone;      #   ditto
our $Income;          # 'Income' is an actual title 
our $Email;           #   ditto
our ($FName, $LName); # These columns do not yet exist

tie_column_vars "Name", "Account_Number", qr/phone/, qr/^inc/i, "FName", "LName";

# Print the data
printf "%20s %8s %8s %-13s %s\n", "Name","A/N","Income","Phone","Email";
apply {
  printf "%20s %8d %8.2f %-13s%s\n",
         $Name, $Account_Number, $Income, $Home_phone, $Email;
};

# Split the "Customer's Name" into separate FName and LName columns
insert_cols '>Name', "FName", "LName";
apply { 
  ($FName, $LName) = ($Name =~ /^(\S+) (\S+)$/) 
    or die logmsg "Could not parse Name";
};
delete_cols "Name";

# Simple mail-merge
use POSIX qw(strftime);
apply {
  return 
    if $Income < 100000;  # not in our audience
  open SENDMAIL, "|sendmail -t -oi" || die "pipe:$!";
  print SENDMAIL "To: $FName $LName <$Email>\n";
  print SENDMAIL strftime("Date: %a, %d %b %y %T %z\n", localtime(time));
  print SENDMAIL <<EOF ;
From: sales\@example.com
Subject: Help for the 1%

Dear $FName,
  If you have disposable income, we can help with that.
Sincerely,
Your investment advisor.
EOF
  close SENDMAIL || die "sendmail failed ($?)\n";
};

# ------------ multiple sheets --------------

our ($Foo, $Bar, $Income);

read_spreadsheet "file1.csv";
tie_column_vars;              # tie all vars that ever become valid

my $s1 = sheet undef ;        # Save ref to current sheet & forget it

read_spreadsheet "file2.csv"; # Auto-creates sheet bc current is undef
tie_column_vars;              

my $s2 = sheet ;              # Save ref to second sheet

print "$Foo $Bar $Income\n";  # these refer to $s2, the current sheet

sheet $s1 ;                     
print "$FName $LName $Income\n"; # these now refer to the original sheet

# ------------ create sheet from memory --------------

my $s3 = new_sheet
    data_source => "my own data",
    rows => [ 
      ["This is a row before the title row"                       ],
      ["Full Name",  "Address",         "City",   "State", "Zip"  ],
      ["Joe Smith",  "123 Main St",     "Boston", "CA",    "12345"],
      ["Mary Jones", "999 Olive Drive", "Fenton", "OH",    "67890"],
    ] ;
title_rx 1;
...

OO SYNOPSIS

use Spreadsheet::Edit ();

my $sheet = Spreadsheet::Edit->new();
$sheet->read_spreadsheet("mailing_list.xls!sheet name");
$sheet->alias( Name => qr/customer/i );  # matches "Customer's Name" 

# Randomly access rows.  
# Sheet objects, when used as an ARRAYref, act like \@rows
print "Row 42: Name is ",     $sheet->[41]{Name}, "\n";
print "Row 42, Column 3 is ", $sheet->[41][2],    "\n";

# Print the data.
# Sheet objects, when used as an HASHref, act like \%crow 
printf "%20s %8s %8s %-13s %s\n", "Name","A/N","Income","Phone","Email";
$sheet->apply( sub{
    printf "%20s %8d %8.2f %-13s%s\n",
           $sheet->{Name},
           $sheet->{"Account Number"},
           $sheet->{Income},
           $sheet->{Home_phone},
           $sheet->{Email} ;
});

# Another way:
$sheet->apply( sub{
    my $r = $sheet->crow();
    printf "%20s %8d %8.2f %-13s%s\n",
           $r->{Name}, $r->{"Account Number"}, $r->{Income},
           $r->{Home_phone}, $r->{Email} ;
});

# Another way:
$sheet->apply( sub{
    my $r = $sheet->crow();
    printf "%20s %8d %8.2f %-13s%s\n",
           $r->[0], $r->[1], $r->[4], $r->[3], $r->[2] ;
});

# Split the "Customer's Name" into separate FName and LName columns
$sheet->insert_cols('>Name', "FName", "LName");
$sheet->apply( sub {
    my $r = $sheet->crow();
    ($r->{FName}, $r->{LName}) = ($r->{Name} =~ /(.*) (.*)/) 
      or die Spreadsheet::Edit::logmsg("Could not parse Name");
});
$sheet->delete_cols( "Name" );

DESCRIPTION

This package allows easy manipulation of CSV data (or data from a spreadsheet) referring to columns by title or absolute position. Rows and columns may be inserted, deleted, or moved.

The usual paradigm is to iterate over rows applying a function to each, vaguely inspired by 'sed' and 'awk' (see apply below). Random access is also supported.

There is both a procedural and object-oriented API, which can work together.

Optionally, tied variables can be used with the procedural API.

Note: Only cell values are handled; there is no provision for processing formatting information from spreadsheets. The author has a notion to add format support, perhaps integrating with Spreadsheet::Read and Spreadsheet::Write or the packages they use. Please contact the author if you want to help.

HOW TO IMPORT

By default only functions are imported, but most people will

use Spreadsheet::Edit ':all';

to import both functions and helper variables (see STANDARD SHEET VARIABLES and VARIABLES USED DURING APPLY).

You can rename imported variables using the '-as' feature shown in Exporter::Tiny::Manual::QuickStart.

THE 'CURRENT SHEET'

Functions and helper variables (the procedural API) implicitly operate on a 'current sheet' object. Each package has its own 'current sheet'.

A new sheet is created by any operation if there no 'current sheet'.

The 'current sheet' may be saved, changed or forgotten (i.e. unset).

Except where noted, each function has a corresponding OO method which operates on the specified object instead of the 'current sheet'.

See "OO DESCRIPTION" for a summary of all methods.

TIED COLUMN VARIABLES

Package variables can refer directly to columns in the 'current sheet' during apply. For example $Email and $FName in the SYNOPSIS above.

tie is used to bind these variables to the corresponding cell in the current row of the 'current sheet' during execution of apply;

See tie_column_vars for details.

THE FUNCTIONS

In the following descriptions, {OPTIONS} refers to an optional first argument which, if present, is a hashref giving additional parameters. For example in

read_spreadsheet {sheetname => 'Sheet1'}, '/path/to/file.xlsx';

the {...} hash is optional and specifies the sheet name.

$curr_sheet = sheet ;

$prev_sheet = sheet $another_sheet ;

$prev_sheet = sheet undef ;

[Procedural API only] Retrieve, change, or forget the 'current sheet' object used by the procedural API.

Changing the current sheet immediately changes what is referenced by tied column variables and STANDARD SHEET VARIABLES (described later).

{OPTIONS} may specify package => 'pkgname' to operate on the specified package instead of the caller's package (useful for library packages).

read_spreadsheet CSVFILEPATH

read_spreadsheet SPREADSHEETPATH

read_spreadsheet "SPREADSHEETPATH!SHEETNAME"

Replace any existing data with content from the given file. The file may be a .csv or any format supported by Libre Office or gnumeric.

{OPTIONS} may include:

sheetname => SHEETNAME

Specify which sheet in a workbook (i.e. spreadsheet file) to read. Alternatively, the sheet name may be appended to the input path after '!' as shown in the example.

If no SHEETNAME is given, the "last used" is read, i.e. the "active" sheet when the spreadsheet was saved.

(other options as in read_workbook)

$href = read_workbook SPREADSHEETPATH

**NOT YET IMPLEMENTED**

[Function only, not callable as a method] All sheets in the specified document are read into memory without changing the 'current sheet'. A hashref is returned:

{ 
  "sheet name" => (Spreadsheet::Edit object),
  ...for each sheet in the workbook... 
}

To access one of the workbook sheets, execute

sheet $href->{"sheet name"};  # or call OO methods on it

If SPREADSHEETPATH was a .csv file then the resulting hash will have only one member with an indeterminate key.

{OPTIONS} may include:

silent => bool
verbose => bool
debug => bool
Other key => value pairs override details of CSV parsing.

See Text::CSV. UTF-8 encoding is assumed by default.

Due to bugs in Libre/Open Office, spreadsheet files can not be read if LO/OO is currently running, even for unrelated purposes (might be fixed in the future, see "BUGS"). This problem does not occur with .csv files

new_sheet

[procedural API only] Create a new empty sheet and make it the 'current sheet', returning the sheet object.

Rarely used because a new sheet is automatically created by any operation if the package has no current sheet.

{OPTIONS} may contain any of the OPTIONS which may be passed to the OO new method.

alias IDENT => COLSPEC, ... ;

alias IDENT => qr/regexp/, ... ;

Create alternate identifiers for specified columns.

Each IDENT, which must be a valid Perl identifier, will henceforth refer to the specified column even if the identifier is the same as the title or letter code of a different column.

$row{IDENT}, $colx{IDENT} etc., and a tied variable $IDENT will refer to the specified column.

Once created, aliases automatically track the column if it's position changes.

Regular Expressions are matched against titles only, and must match exactly one column or else an exception is thrown. Other kinds of COLSPECs may be titles, existing alias names, column letters, etc. (see "COLUMN SPECIFIERS" for details).

The COLSPEC is evaluated before the alias is created, so

alias B => "B";

would make "B" henceforth refer to the current second column (or a different column which has title "B" if such exists) even if that column later moves.

RETURNS: In array context, the 0-based column indices of the aliased columns; in scalar context the column index of the first alias.

unalias IDENT, ... ;

Forget alias(es). Any masked COLSPECs become usable again.

spectocx COLSPEC or qr/regexp/, ... ;

Returns the 0-based indicies of the specified colomn(s). Throws an exception if there is no such column. A regexp may match multiple columns. See also %colx.

tie_column_vars VARNAME, ...

Create tied package variables (scalars) for use during apply.

Each variable is a scalar corresponding to a column, and reading or writing it accesses the corresponding cell in the row being visited during apply. The variable name itself implies which column it refers to. The '$' may be omitted in the VARNAME arguments to tie_column_vars;

Normally you must separately declare these variables with our $NAME. However not if imported or if tie_column_vars is called in a BEGIN block as explained below).

Variable names may be:

  • User-defined alias names (see "alias")

  • Titles which happen to be valid Perl identifiers

  • Identifiers derived from titles by replacing offending characters with underscrores (see "AUTOMATIC ALIASES"),

  • Spreadsheet column letters like "A", "B" etc.

See "CONFLICT RESOLUTION" about name clashes.

Multiple calls accumulate, including with different sheets.

Variable bindings are dynamically evaluated during each access by using the variable's identifier as a COLSPEC with the 'current sheet' in your package. This means that it does not matter which sheet was 'current' when tie_column_vars was called with a particular name; it only matters that the name of a tied variable is a valid COLSPEC in the 'current sheet' when that variable is referenced (otherwise a read returns undef and a write throws an exception). [*Need further clarification*]

{OPTIONS} may specify:

package => "pkgname"

Tie variables in the specified package instead of the caller's package.

verbose => bool
debug => bool

Print trace messages.

tie_column_vars ':all'

With the :all token all possible variables are tied, corresponding to the aliases, titles, non-conflicting column letters etc. which exist for the current sheet.

In addition, variables will be tied in the future whenever new identifiers become valid (for example when a new alias is created, column added, or another file is read into the same sheet).

Although convenient this is insecure because malicious titles could clobber unintended globals.

If VARNAMES are also specified, those variables will be tied immediately even if not yet usable; an exception occurs if a tied variable is referenced before the corresponding alias or title exists.

Use in BEGIN{} or module import methods

tie_column_vars imports the tied variables into your module, or the module specified with package => "pkgname" in {OPTIONS}.

It is unnecessary to declare tied variables if the import occurs before code is compiled which references the variables. This can be the case if tie_column_vars is called in a BEGIN{} block or in the import method of a module loaded with use.

Spreadsheet::Edit::Preload makes use of this.

title_rx {AUTODETECT_OPTIONS} ;

title_rx ROWINDEX ;

$rowindex = title_rx ;

Set or auto-detect the row containing titles. Titles are used to generate column-selection keys (COLSPECs).

It is not necessary to call title_rx unless you want to change auto-detect options from the defaults (including to disable auto-detect), or to specify a title row explicitly. By default, the title row is auto-detected the first time any operation needs it.

If title_rx is called, it should be done immediately after calling read_spreadsheet or directly modifying title cells.

An optional initial {AUTODETECT_OPTIONS} argument may contain:

    enable     => BOOL, # False to disable auto-detect 
    min_rx     => NUM,  # first rx which may contain the title row.
    max_rx     => NUM,  # maximum rx which may contain the title row.
    required   => COLSPEC or [COLSPEC,...]  # any required title(s)
    first_cx   => NUM,  # first column ix which must contain required titles
    last_cx    => NUM,  # last column ix which must contain required titles

The default options are {enable => 1, max_rx => 4}.

The first row is used which includes the required title(s), if any, and has non-empty titles in all positions. If first_cx and/or last_cx are specified then columns outside that range are ignored and may contain anything.

An exception is thrown when auto-detect is attempted if a plausible title row can not be found.

If you specify {AUTODETECT_OPTIONS} they will also be saved for later re-use, for example after reading a different spreadsheet. Specifying {} restores the default options.

apply {code} ;

apply_all {code} ;

apply_torx {code} RX-OR-RXLIST ;

apply_exceptrx {code} RX-OR-RXLIST ;

Execute the specified code block (or referenced sub) once for each row.

While executing the code block, tied column variables and the sheet variables @crow, %crow, $rx and $linenum and corresponding OO methods will refer to the row being visited.

apply normally visits all rows which follow the title row, or all rows if there is no title row. If first_data_rx and last_data_rx are defined, then they further limit the range visited.

apply_all unconditionally visits every row, including any title row.

apply_torx or apply_exceptrx visit exactly the indicated rows. RX-OR-RXLIST may be either a single row index or a [list of rx];

Rows may be safely inserted or deleted during 'apply'; rows inserted after the currently-being-visited row will be visited at the proper time.

An 'apply' sub may change the 'current sheet', after which global variables will refer to the other sheet and any apply active for that sheet. It should take care to restore the original sheet before returning (perhaps using Guard::scope_guard). Nested and recursive applys are allowed.

MAGIC VARIABLES USED DURING APPLY

These variables refer to the row currently being visited:

    $rx is the 0-based index of the current row.

    @crow is an array aliased to the current row's cells.

    %crow is a hash aliased to the same cells, indexed by alias, title, letter code, etc. (any COLSPEC).

    $linenum is the starting line number of the current row if the data came from a .csv file.

    For example, the "FIRST NAME" column may be accessed many ways:

    alias Name => "FIRST NAME";
    apply {
    
      $crow{"FIRST NAME"}            # %crow indexed by title
      $crow{Name}                    #   using an explicit alias
      $crow{FIRST_NAME}              #   using the AUTOMATIC ALIAS
    
      $crow[ $colx{"FIRST NAME"} ];  # @crow indexed by a 0-based index
      $crow[ $colx{"Name"} ];        #  ...obtained from %colx
      $crow[ $colx{"FIRST_NAME"} ];  # 
    
      $rows[$rx]->[ $colx{FIRST_NAME} ] # Directly accessing @rows
    
      # See "TIED COLUMN VARIABLES" for a sweeter alternative
    };

delete_col COLSPEC ;

delete_cols COLSPEC+ ;

The indicated columns are removed. Remaining title bindings are adjusted to track shifted columns.

only_cols COLSPEC+ ;

All columns except the specified columns are deleted.

move_col POSITION, SOURCE ;

move_cols POSITION, SOURCES... ;

Relocate the indicated column(s) (SOURCES) so they are adjacent, in the order specified, starting at the position POSITION.

POSITION may be ">COLSPEC" to place moved column(s) immediately after the indicated column (">$" to place at the end), or POSITION may directly specify the destination column using an unadorned COLSPEC.

A non-absolute COLSPEC indicates the initial position of the referenced column.

insert_col POSITION, newtitle ;

insert_cols POSITION, newtitles... ;

One or more columns are created starting at a position specified the same way as in move_cols (later columns are moved rightward).

POSITION may be ">$" to place new column(s) at the far right.

A new title must be specified for each new column. If there is no title row, specify undef for each position.

Returns the new column index or indices.

split_col {code} COLSPEC, POSITION, newtitles... ;

New columns are created starting at POSITION as with insert_cols, and populated with data from column COLSPEC.

{code} is called for each row with $_ bound to the cell at COLSPEC and @_ bound to cell(s) in the new column(s). It is up to your code to read the old column ($_) and write into the new columns (@_).

The old column is left as-is (not deleted).

If there is no title row, specify undef for each new title.

sort_rows {rx cmp function}

sort_rows {rx cmp function} $first_rx, $last_rx

If no range is specified, then the range is the same as for apply (namely: All rows after the title row unless limited by first_data_rx .. last_data_rx).

In the comparison function globals $a and $b will contain row objects, which are dual-typed to act as either an array or hash ref to the cells in their row. The corresponding original row indicies are also passed as parameters in @_.

Rows are not actually moved until after all comparisons have finished.

Returns a list of the previous row indicies of all rows in the sheet.

# Sort on the "LName" column using row indicies
# (contrast with the example in SYNOPSIS which uses $a and $b)
sort_rows { my ($rxa, $rxb) = @_; 
            $rows[$rxa]{LName} cmp $rows[$rxb]{LName}
          };

rename_cols COLSPEC, "new title", ... ;

Multiple pairs may be given. Title cell(s) are updated as indicated.

Existing user-defined aliases are not affected, i.e., they continue to refer to the same columns as before.

join_cols_sep STRING COLSPEC+ ;

join_cols {code} COLSPEC+ ;

The specified columns are combined into the first-specified column and the other columns are deleted.

The first argument of join_cols_sep should be a fixed separator. The first argument of join_cols may be a {code} block or subref;

If a separator string is specified it is used to join column content together.

If a {code} block or sub ref is specified, it is executed once for each row following the title row, with $_ bound to the first-named column, i.e. the surviving column, and @_ bound to all named columns in the order given.

It is up to your code to combine the data by reading @_ and writing $_ (or, equivalently, by writing $_[0]).

first_data_rx and last_data_rx are ignored, and the title is not modified.

reverse_cols

The order of the columns is reversed.

insert_row

insert_row 'END' [,$count]

insert_rows $rowx [,$count]

Insert one or more empty rows at the indicated position (default: at end). $rowx, if specified, is either a 0-based offset for the new row or 'END' to add the new row(s) at the end. Returns the index of the first new row.

delete_rows $rowx,... ;

delete_rows 'LAST',... ;

The indicated data rows are deleted. $rowx is a zero-based row index or the special token "LAST" to indicate the last row (same as $#rows). Any number of rows may be deleted in a single command, listed individually.

transpose

Invert the relation, i.e. rotate and flip the table. Cells A1,B1,C1 etc. become A1,A2,A3 etc. Any title_rx is forgotten.

logmsg [FOCUSARG,] string, string, ...

(not exported by default)

Concatenate strings, prefixed by a description of the 'current sheet' and row during apply, if any (or with FOCUSARG, the specified sheet and/or row).

The resulting string is returned, with "\n" appended if it was not already terminated by a newline.

The first argument is used as FOCUSARG if it is a sheet object, [sheet_object], or [sheet_object, rowindex], and specifies the sheet and/or row to describe in the message prefix. Otherwise the first argument is not special and is simply the first message string.

The details of formatting the sheet may be customized with a call-back given by a {logmsg_pfx_gen} attribute. See comments in the source for how this works.

write_csv CSVFILEPATH

write_csv *FILEHANDLE

write_csv $filehandle

Write the current data to the indicated path or open file handle as a CSV text file. The default encoding is UTF-8 or, if read_spreadsheet was most-recently used to read a csv file, the encoding used then.

{OPTIONS} may include

options for Text::CSV

Usually none need be specified because we supply sane defaults.

silent => bool
verbose => bool
debug => bool

write_spreadsheet OUTPUTPATH

Write the current data to a spreadsheet (.ods, .xlsx, etc.) by first writing to a temporary CSV file and then importing that file into a new spreadsheet.

{OPTIONS} may include

col_formats => [ LIST ]

EXPERIMENTAL, likely to change when Spreadsheet::Read is integrated!

Elements of LIST may be "" (Standard), "Text", "MM/DD/YY", "DD/MM/YY", or "YY/MM/DD" to indicate the format of the corresponding column. The meaning of "Standard" is not well documented but appears to mean "General Number" in most cases. For details, see "Format Codes" in this old Open Office documentation.

silent => bool
verbose => bool
debug => bool

options NAME => EXPR, ... ;

options NAME ;

Set or retrieve miscellaneous sheet-global options. When setting, the previous value of the last option specified is returned. The only options currently defined are silent, verbose and debug.

$hash = attributes ;

Returns a reference to a hash in which you may store arbitrary data in the sheet object in memory.

STANDARD SHEET VARIABLES

These variables magically access the 'current sheet' in your package.

@rows

The spreadsheet data as an array of row objects.

Each row object is "dual-typed" (overloaded) to act as either an ARRAY or HASH reference to the cells in that row.

When used as a HASH ref, the key may be a alias, column title, letter-code etc. (any COLSPEC). When used as an ARRAY ref, the 0-based index specifies the column.

@linenums

The first line numbers of corresponding rows (a row can contain multiple lines if cells contain embedded newlines). Valid only if the data came from a CSV file.

$num_cols

The number of columns in the widest input row. Shorter rows are padded with empty cells when read so that all rows have the same number of columns in memory.

$title_rx and $title_row

$title_rx contains the 0-based row index of the title row and $title_row is the same as $rows[ $title_rx ].

The title row is auto-detected by default. See title_rx for how to control this.

If a column title is modified, set $title_rx = undef; to force re-detection.

$first_data_rx and $last_data_rx

Optional limits on the range of rows visited by apply() or sorted by sort_rows(). By default $first_data_rx is the first row following the title row (or 0 if no title row).

%colx (column key => column index)

%colx maps aliases, titles, etc. (all currently-valid COLSPECs) to the corresponding zero-based column indicies. See "COLSPECS" .

%colx_desc (column key => "debugging info")

COLSPECs (COLUMN SPECIFIERS)

Arguments which specify columns may be:

(1) a user-defined alias identifier
(2) an actual "column title" **
(3) a title with any leading & trailing spaces removed *
(4) an AUTOMATIC ALIAS identifier *
(5) a spreadsheet letter code (A,B,...Z, AA etc.) *
(6) a Regexp (qr/.../) which matches an actual title
(7) a numeric column index (0-based)
(8) '^' or '$' (means first or last column, respectively)

*These may only be used if they do not conflict an item listed higher up.

**Titles may be used directly if they can not be confused with a user-defined alias, the special names '^' or '$' or a numeric column index. See "CONFLICT RESOLUTION".

AUTOMATIC ALIASES are Perl identifiers derived from column titles by first removing leading or trailing spaces, and then replacing non-word characters with underscores and prepending an underscore if necessary. For example:

Title             Automatic Alias

"Address"         Address (no change needed)
"  First Name  "  First_Name
"First & Last"    First___Last
"+sizes"          _sizes
"1000s"           _1000s  (underscore avoids leading digit)

Aliases (both automatic and user-defined) are valid identifiers, so can be used as the names of tied variables and as bareword keys to %colx, %crow and related OO interfaces,

CONFLICT RESOLUTION

A conflict occurs when a column key potentially refers to multiple columns. For example, "A", "B" etc. are standard column names, but they might also be titles of other columns in which case those names refer to the other columns. Warnings are printed about conflicts unless the silent option is true (see options).

    User alias identifiers (defined using alias) are always valid.

    '^' and '$' always refer to the first and last column.

    Numeric "names" 0, 1, etc. always give a 0-based column index if the value is between 0 and num_cols (i.e. one past the end).

    Actual Titles refer to to their columns, except if they:

      are the same as a user-defined alias

      are '^' or '$'

      consist only of digits (without leading 0s) corresponding to a valid column index.

    Automatic Aliases and Standard column names ("A", "B", etc.) are available as column keys unless they conflict with a user-defined alias or an actual title.

Note: To unconditionally refer to numeric titles or titles which look like '^' or '$', use a Regexp qr/.../. Automatic Aliases can also refer to such titles if there are no conflicts.

Column positions always refer to the data before a command is executed. This is relevant for commands which re-number or delete columns.

OO DESCRIPTION (OBJECT-ORIENTED INTERFACE)

Spreadsheet::Edit->new(OPTIONS...)

Spreadsheet::Edit->new(clone => $existing_sheet)

Spreadsheet::Edit->new(rows => [rowref,rowref,...], linenums => [...], data_source => "where this came from");

Spreadsheet::Edit->new(num_cols => $number) # no initial content

Creates a new "sheet" object.

METHODS

Sheet objects have methods named identically to all the functions described previously (except for sheet, new_sheet, read_workbook and logmsg). Note that Perl always requires parenthesis around method arguments.

Besides all those, the following methods are available:

$sheet->rows() ; # Analogous to to \@rows

$sheet->linenums() ; # Analogous to \@linenums

$sheet->num_cols() ; # Analogous to $num_cols

$sheet->colx() ; # Analogous to \%colx

$sheet->colx_desc() ; # Analogous to \%colx_desc

$sheet->first_data_rx() ; # Analogous to $first_data_rx

$sheet->last_data_rx() ; # Analogous to $last_data_rx

$sheet->title_row() ; # Analogous to $title_row

$sheet->rx() ; # Current rx in apply, analogous to to $rx

$sheet->crow(); # Current row in apply (a dual-typed row object)

$sheet->linenum() ; # Analogous to to $linenum

$sheet->title_rx() ; # Analogous to to $title_rx

$sheet->title_rx(rxvalue) ; # (Re-)set the title row index

$sheet->get(rx,ident) ; # Analogous to to $rows[rx]{ident}

$sheet->set(rx,ident,value); # Analogous to to $rows[rx]{ident} = value

$sheet->data_source(); # "description of sheet" (e.g. path read)

$sheet->sheetname(); # valid if input was a spreadsheet, else undef

SEE ALSO

Spreadsheet::Edit::Preload

BUGS

Some vestigial support for formats remains from an earlier implementation, but this support is likely to be entirely replaced at some point.

Spreadsheets are currently read using the external programs gnumeric or unoconv to convert to a temporary CSV file. unoconv has a bug where reading will fail if Open/Libre Office is currently running, even on an unrelated document. This may be fixed in the future by using Spreadsheet::Read instead, although it uses modules based on Twig which are quite slow.

THREAD SAFETY

Unknown, and probably not worth the trouble to find out. The author wonders whether tied variables are compatible with the implementation of threads::shared. Even the OO API uses tied variables (for the magical row objects which behave as either an array or hash reference).

FUTURE IDEAS

Add format-processing support.
Add "column-major" views, to access a whole column as an array. Perhaps @cols and %cols would be sets of column arrays (@cols indexed by column index, %cols indexed by any COLSPEC). And tie_column_vars '@NAME' would tie user array variables to columns.

AUTHOR / LICENSE

Jim Avera (jim.avera at gmail). Public Domain or CC0.