NAME

Spreadsheet::Edit - Slice and dice spreadsheets, optionally using tied variables.

NON-OO SYNOPSIS

use Spreadsheet::Edit qw(:all);

# Examples assume a spreadsheet with these titles in the first row:
# "Account Number"  "Customer's Name"  "Email"  "Home-phone"  "Income"

read_spreadsheet "mailing_list.xls!Sheet1";

# alias an identifier to a long or complicated title
alias Name => qr/customer/i;  # matches "Customer's Name"

# ------------ without tied column variables -----------

# Print the data
printf "%20s %8s %8s %-13s %s\n", "Name","A/N","Income","Phone","Email";
apply {
  printf "%20s %8d %8.2f %-13s %s\n",
         $crow{Name},              # this key is an explicit alias
         $crow{"Account Number"},  #   ...actual title
         $crow{Income},            #   ...actual title
         $crow{Home_phone},        #   ...auto-generated alias
         $crow{Email} ;            #   ...actual title
};

# Randomly access rows.
print "Row 42: Column 'C' is ",      $rows[41]{C},    "\n";
print "Row 42: Customer's Name is ", $rows[41]{Name}, "\n";
print "Row 42: 3rd column is ",      $rows[41][2],    "\n";

# Split the "Customer's Name" into separate FName and LName columns
insert_cols '>Name', "FName", "LName";
apply {
  ($crow{FName}, $crow{LName}) = ($crow{Name} =~ /(.*) (.*)/)
    or die logmsg "Could not parse Name"; # logmsg adds current row number
};
delete_cols "Name";

# Sort by last name
sort_rows { $a->{LName} cmp $b->{LName} };

# ------------ using tied column variables -----------

our $Name;            # 'Name' is the explicit alias created above
our $Account_Number;  # Auto-generated alias for "Account Number"
our $Home_phone;      #   ditto
our $Income;          # 'Income' is an actual title
our $Email;           #   ditto
our ($FName, $LName); # These columns do not yet exist

tie_column_vars "Name", "Account_Number", qr/phone/, qr/^inc/i, "FName", "LName";

# Print the data
printf "%20s %8s %8s %-13s %s\n", "Name","A/N","Income","Phone","Email";
apply {
  printf "%20s %8d %8.2f %-13s%s\n",
         $Name, $Account_Number, $Income, $Home_phone, $Email;
};

# Split the "Customer's Name" into separate FName and LName columns
insert_cols '>Name', "FName", "LName";
apply {
  ($FName, $LName) = ($Name =~ /^(\S+) (\S+)$/)
    or die logmsg "Could not parse Name";
};
delete_cols "Name";

# Simple mail-merge
use POSIX qw(strftime);
apply {
  return
    if $Income < 100000;  # not in our audience
  open SENDMAIL, "|sendmail -t -oi" || die "pipe:$!";
  print SENDMAIL "To: $FName $LName <$Email>\n";
  print SENDMAIL strftime("Date: %a, %d %b %y %T %z\n", localtime(time));
  print SENDMAIL <<EOF ;
From: sales\@example.com
Subject: Help for the 1%

Dear $FName,
  If you have disposable income, we can help with that.
Sincerely,
Your investment advisor.
EOF
  close SENDMAIL || die "sendmail failed ($?)\n";
};

# ------------ multiple sheets --------------

our ($Foo, $Bar, $Income);

read_spreadsheet "file1.csv";
tie_column_vars;              # tie all vars that ever become valid

my $s1 = sheet undef ;        # Save ref to current sheet & forget it

read_spreadsheet "file2.csv"; # Auto-creates sheet bc current is undef
tie_column_vars;

my $s2 = sheet ;              # Save ref to second sheet

print "$Foo $Bar $Income\n";  # these refer to $s2, the current sheet

sheet $s1 ;
print "$FName $LName $Income\n"; # these now refer to the original sheet

# ------------ create sheet from memory --------------

my $s3 = new_sheet
    data_source => "my own data",
    rows => [
      ["This is a row before the title row"                       ],
      ["Full Name",  "Address",         "City",   "State", "Zip"  ],
      ["Joe Smith",  "123 Main St",     "Boston", "CA",    "12345"],
      ["Mary Jones", "999 Olive Drive", "Fenton", "OH",    "67890"],
    ],
    ;
$s3=>title_rx(1);
...

OO SYNOPSIS

use Spreadsheet::Edit ();

my $sheet = Spreadsheet::Edit->new();
$sheet->read_spreadsheet("mailing_list.xls!sheet name");
$sheet->alias( Name => qr/customer/i );  # matches "Customer's Name"

# Randomly access rows.
# Sheet objects, when used as an ARRAYref, act like \@rows
print "Row 42: Name is ",     $sheet->[41]{Name}, "\n";
print "Row 42, Column 3 is ", $sheet->[41][2],    "\n";

# Print the data.
# Sheet objects, when used as an HASHref, act like \%crow
printf "%20s %8s %8s %-13s %s\n", "Name","A/N","Income","Phone","Email";
$sheet->apply( sub{
    printf "%20s %8d %8.2f %-13s%s\n",
           $sheet->{Name},
           $sheet->{"Account Number"},
           $sheet->{Income},
           $sheet->{Home_phone},
           $sheet->{Email} ;
});

# Another way:
$sheet->apply( sub{
    my $r = $sheet->crow();
    printf "%20s %8d %8.2f %-13s%s\n",
           $r->{Name}, $r->{"Account Number"}, $r->{Income},
           $r->{Home_phone}, $r->{Email} ;
});

# Another way:
$sheet->apply( sub{
    my $r = $sheet->crow();
    printf "%20s %8d %8.2f %-13s%s\n",
           $r->[0], $r->[1], $r->[4], $r->[3], $r->[2] ;
});

# Split the "Customer's Name" into separate FName and LName columns
$sheet->insert_cols('>Name', "FName", "LName");
$sheet->apply( sub {
    my $r = $sheet->crow();
    ($r->{FName}, $r->{LName}) = ($r->{Name} =~ /(.*) (.*)/)
      or die Spreadsheet::Edit::logmsg("Could not parse Name");
});
$sheet->delete_cols( "Name" );

INTRODUCTION

    Skip ahead to "FUNCTIONS" for a list of operations (OO methods are named the same).

Columns may be referenced by title without knowing their absolute positions. Optionally global (package) variables may be tied to columns. Tabular data can come from Spreadsheets, CSV files, or your code, and be written similarly. Both Functional and Object-Oriented (OO) APIs are provided.

A table in memory is stored in a sheet object, which contains an array of rows, each of which is an array of cell values.

Cells within a row may be accessed by name (e.g. column titles) or by absolute position (letter code like "A", "B" etc. or zero-based index).

The usual paradigm is to iterate over rows applying a function to each, vaguely inspired by 'sed' and 'awk' (see apply below). Random access is also supported.

Tied variables can be used with the functional API. These variables refer to columns in the current row during a apply operation.

Note: Only cell values are handled; there is no provision for processing formatting information from spreadsheets. The author has a notion to add support for formats, perhaps integrating with Spreadsheet::Read and Spreadsheet::Write or the packages they use. Please contact the author if you want to help.

HOW TO IMPORT

By default only functions are imported, but most people will

use Spreadsheet::Edit ':all';

to import functions and helper variables (see STANDARD SHEET VARIABLES and VARIABLES USED DURING APPLY).

You can rename imported items using the '-as' notation shown in Exporter::Tiny::Manual::QuickStart.

Purely-OO applications can "use Spreadsheet::Edit ();".

THE 'CURRENT SHEET'

The Functions and helper variables implicitly operate on a package-global "current sheet" object, which can be switched at will. OO Methods operate on the sheet object they are called on.

Functions which operates on the "current sheet" have corresponding OO methods with the same names and arguments (note that method args must be enclosed by parenthesis).

TIED COLUMN VARIABLES

Package variables can refer directly to columns in the 'current sheet' during apply. For example $Email and $FName in the SYNOPSIS above.

See tie_column_vars for details.

THE FUNCTIONS

In the following, {OPTIONS} refers to an optional first argument which, if present, is a hashref giving additional parameters. For example in

read_spreadsheet {sheetname => 'Sheet1'}, '/path/to/file.xlsx';

the {...} hash is optional and specifies the sheet name.

$curr_sheet = sheet ;

$prev_sheet = sheet $another_sheet ;

$prev_sheet = sheet undef ;

[Functional API only] Retrieve, change, or forget the 'current sheet' object used by the functional API.

Changing the current sheet immediately changes what is referenced by tied column variables and STANDARD SHEET VARIABLES (described later).

{OPTIONS} may specify package => 'pkgname' to operate on the specified package instead of the caller's package (useful in library code).

read_spreadsheet CSVFILEPATH

read_spreadsheet SPREADSHEETPATH

read_spreadsheet "SPREADSHEETPATH!SHEETNAME"

Replace any existing data with content from the given file.

The Functional API will create a new sheet object if there is no "current sheet".

The file may be a .csv or any format supported by Libre Office or gnumeric.

By default column titles are auto-detected and an exception is thrown if a plausible title row can not be found.

{OPTIONS} may include:

Auto-detection options:

    required   => COLSPEC or [COLSPEC,...]  # any required title(s)
    min_rx     => NUM,  # first rx which may contain the title row.
    max_rx     => NUM,  # maximum rx which may contain the title row.
    first_cx   => NUM,  # first column ix which must contain required titles
    last_cx    => NUM,  # last column ix which must contain required titles

The first row is used which includes the required title(s), if any, and has non-empty titles in all columns, or columns first_cx through last_cx.

    title_rx => rx     # specify title row (first row is 0)
    title_rx => undef  # no title row

These disable auto-detection and explicitly specify the title row index, or with undef, that there are no titles. See also the title_rx function/method.

Other options:

sheetname => SHEETNAME

Specify which sheet in a multi-sheet workbook (i.e. spreadsheet file) to read. Alternatively, the sheet name may be appended to the input path after '!' as shown in the example.

If no SHEETNAME is given then the sheet which was "active" when the workbook was saved will be retrieved.

silent => bool
verbose => bool
debug => bool

Probably what you expect.

Other key => value pairs override details of CSV parsing.

See Text::CSV. UTF-8 encoding is assumed by default.

Due to bugs in Libre/Open Office, spreadsheet files can not be read if LO/OO is currently running, even for unrelated purposes (see "BUGS"). This problem does not occur with .csv files

alias IDENT => COLSPEC, ... ;

alias IDENT => qr/regexp/, ... ;

Create alternate identifiers for specified columns.

Each IDENT, which must be a valid Perl identifier, will henceforth refer to the specified column even if the identifier is the same as the title or letter code of a different column.

$row{IDENT} and a tied variable $IDENT will refer to the specified column.

Aliases automatically track the column if it's position changes.

Regular Expressions are matched against titles only, and an exception is thrown if more than one title matches.

Otherwise, COLSPECs may be titles, existing alias names, column letters, etc. (see "COLUMN SPECIFIERS" for details).

The COLSPEC is evaluated before the alias is created, so

alias B => "B";

would make "B" henceforth refer to the column with title "B", if one exists, or the second column (treating COLSPEC as a letter-code), even if the column is later moved.

RETURNS: The 0-based column indices of the aliased column(s).

unalias IDENT, ... ;

Forget alias(es). Any masked COLSPECs become usable again.

tie_column_vars VARNAME, ...

Create tied package variables (scalars) for use during apply.

Each variable is a scalar corresponding to a column, and reading or writing it accesses the corresponding cell in the row being visited during apply.

The '$' may be omitted in the VARNAME arguments to tie_column_vars;

You must separately declare these variables with our $NAME, except in the special case described in "Use in BEGIN() or module import methods" later.

The variable name itself implies the column it refers to.

Variable names may be:

  • User-defined alias names (see "alias")

  • Titles which happen to be valid Perl identifiers

  • Identifiers derived from titles by replacing offending characters with underscrores (see "AUTOMATIC ALIASES"),

  • Spreadsheet column letters like "A", "B" etc.

See "CONFLICT RESOLUTION" for how ambiguity is resolved.

Multiple calls accumulate, including with different sheets.

Variable bindings are dynamically evaluated during each access by using the variable's identifier as a COLSPEC with the 'current sheet' in your package. This means that it does not matter which sheet was 'current' when tie_column_vars was called with a particular name; it only matters that the name of a tied variable is a valid COLSPEC in the 'current sheet' when that variable is referenced (otherwise a read returns undef and a write throws an exception). [*Need clarification*]

{OPTIONS} may specify:

package => "pkgname"

Tie variables in the specified package instead of the caller's package.

verbose => bool
debug => bool

Print trace messages.

tie_column_vars ':all'

With the :all token all possible variables are tied, corresponding to the aliases, titles, non-conflicting column letters etc. which exist for the current sheet.

In addition, variables will be tied in the future whenever new identifiers become valid (for example when a new alias is created, column added, or another file is read into the same sheet).

Although convenient this is insecure because malicious titles could clobber unintended globals.

If VARNAMES are also specified, those variables will be tied immediately even if not yet usable; an exception occurs if a tied variable is referenced before the corresponding alias or title exists. [*Need clarification* -- exception even for reads??]

Use in BEGIN{} or module import methods

tie_column_vars imports the tied variables into your module, or the module specified with package => "pkgname" in {OPTIONS}.

It is unnecessary to declare tied variables if the import occurs before code is compiled which references the variables. This can be the case if tie_column_vars is called in a BEGIN{} block or in the import method of a module loaded with use.

Spreadsheet::Edit::Preload makes use of this.

$rowindex = title_rx ;

Retrieve the current title row rx, or undef if there are no titles

title_rx ROWINDEX ;

Make the specified row be the title row. Titles in that row are immediately (re-)examined and the corresponding COLSPECs become valid, e.g. you can reference a column by it's title or a derived identifier.

Note: Setting title_rx this way is rarely needed because by default read_spreadsheet sets the title row.

title_rx undef ;

Titles are disabled and any existing COLSPECs derived from titles are invalidated.

title_rx {AUTODETECT_OPTIONS} 'auto';

Immediately perform auto-detection of the title row using {AUTODETECT_OPTIONS} to modify any existing auto-detect options (see read_spreadsheet for a description of the options).

apply {code} [COLSPEC*] ;

apply_all {code} [COLSPEC*] ;

apply_torx {code} RX-OR-RXLIST [,COLSPEC*] ;

apply_exceptrx {code} RX-OR-RXLIST [,COLSPEC*] ;

Execute the specified code block (or referenced sub) once for each row.

Note that there is no comma after a bare {code} block.

While executing your code, tied column variables and the sheet variables @crow, %crow, $rx and $linenum and corresponding OO methods will refer to the row being visited.

If a list of COLSPECs is specified, then

    @_ is bound to the columns in the order specified
    $_ is bound to the first such column

apply normally visits all rows which follow the title row, or all rows if there is no title row. first_data_rx and last_data_rx, if defined, further limit the range visited.

apply_all unconditionally visits every row, including any title row.

apply_torx or apply_exceptrx visit exactly the indicated rows. RX-OR-RXLIST may be either a single row index or a [list of rx];

Rows may be safely inserted or deleted during 'apply'; rows inserted after the currently-being-visited row will be visited at the proper time.

Nested and recursive applys are allowed. When an 'apply' changes the 'current sheet', tied variables then refer to the other sheet and any apply active for that sheet. With nested 'apply's, take care restore the original sheet before returning (perhaps using Guard::scope_guard).

MAGIC VARIABLES USED DURING APPLY

These variables refer to the row currently being visited:

    @crow is an array aliased to the current row's cells.

    %crow is a hash aliased to the same cells, indexed by alias, title, letter code, etc. (any COLSPEC).

    $rx is the 0-based index of the current row.

    $linenum is the starting line number of the current row if the data came from a .csv file.

    For example, the "Account Number" column in the SYNOPSIS may be accessed many ways:

    alias AcNum => "Account Number";
    apply {
    
      $crow{"Account Number"}           # %crow indexed by title
      $crow{AcNum}                      #   using an explicit alias
      $crow{Account_Number}             #   using the AUTOMATIC ALIAS
    
      $crow[ $colx{"Account Number"} ]; # @crow indexed by a 0-based index
      $crow[ $colx{"AcNum"} ];          #  ...obtained from %colx
      $crow[ $colx{"Account_Number"} ]; #
    
      $rows[$rx]->[ $colx{Account_Number} ] # Directly accessing @rows
    
      # See "TIED COLUMN VARIABLES" for a sweeter alternative
    };

delete_col COLSPEC ;

delete_cols COLSPEC+ ;

The indicated columns are removed. Remaining title bindings are adjusted to track shifted columns.

only_cols COLSPEC+ ;

All columns except the specified columns are deleted.

move_col POSITION, SOURCE ;

move_cols POSITION, SOURCES... ;

Relocate the indicated column(s) (SOURCES) so they are adjacent, in the order specified, starting at the position POSITION.

POSITION may be ">COLSPEC" to place moved column(s) immediately after the indicated column (">$" to place at the end), or POSITION may directly specify the destination column using an unadorned COLSPEC.

A non-absolute COLSPEC indicates the initial position of the referenced column.

insert_col POSITION, newtitle ;

insert_cols POSITION, newtitles... ;

One or more columns are created starting at a position specified the same way as in move_cols (later columns are moved rightward).

POSITION may be ">$" to place new column(s) at the far right.

A new title must be specified for each new column. If there is no title row, specify undef for each position.

Returns the new column index or indices.

split_col {code} COLSPEC, POSITION, newtitles... ;

New columns are created starting at POSITION as with insert_cols, and populated with data from column COLSPEC.

{code} is called for each row with $_ bound to the cell at COLSPEC and @_ bound to cell(s) in the new column(s). It is up to your code to read the old column ($_) and write into the new columns (@_).

The old column is left as-is (not deleted).

If there is no title row, specify undef for each new title.

sort_rows {rx cmp function}

sort_rows {rx cmp function} $first_rx, $last_rx

If no range is specified, then the range is the same as for apply (namely: All rows after the title row unless limited by first_data_rx .. last_data_rx).

In the comparison function, globals $a and $b will contain row objects, which are dual-typed to act as either an array or hash ref to the cells in their row. The corresponding original row indicies are also passed as parameters in @_.

Rows are not actually moved until after all comparisons have finished.

RETURNS: A list of the previous row indicies of all rows in the sheet.

# Sort on the "LName" column using row indicies
# (contrast with the example in SYNOPSIS which uses $a and $b)
sort_rows { my ($rxa, $rxb) = @_;
            $rows[$rxa]{LName} cmp $rows[$rxb]{LName}
          };

rename_cols COLSPEC, "new title", ... ;

Multiple pairs may be given. Title cell(s) are updated as indicated.

Existing user-defined aliases are not affected, i.e., they continue to refer to the same columns as before even if the titles changed.

join_cols_sep STRING COLSPEC+ ;

join_cols {code} COLSPEC+ ;

The specified columns are combined into the first-specified column and the other columns are deleted.

The first argument of join_cols_sep should be a fixed separator. The first argument of join_cols may be a {code} block or subref;

If a separator string is specified it is used to join column content together.

If a {code} block or sub ref is specified, it is executed once for each row following the title row, with $_ bound to the first-named column, i.e. the surviving column, and @_ bound to all named columns in the order given.

It is up to your code to combine the data by reading @_ and writing $_ (or, equivalently, by writing $_[0]).

first_data_rx and last_data_rx are ignored, and the title is not modified.

reverse_cols

The order of the columns is reversed.

insert_row

insert_row 'END' [,$count]

insert_rows $rowx [,$count]

Insert one or more empty rows at the indicated position (default: at end). $rowx, if specified, is either a 0-based offset for the new row or 'END' to add the new row(s) at the end. Returns the index of the first new row.

delete_rows $rowx,... ;

delete_rows 'LAST',... ;

The indicated data rows are deleted. $rowx is a zero-based row index or the special token "LAST" to indicate the last row (same as $#rows). Any number of rows may be deleted in a single command, listed individually.

transpose

Invert the relation, i.e. rotate and flip the table. Cells A1,B1,C1 etc. become A1,A2,A3 etc. Any title_rx is forgotten.

$href = read_workbook SPREADSHEETPATH

**NOT YET IMPLEMENTED**

[Function only, not callable as a method] All sheets in the specified document are read into memory without changing the 'current sheet'. A hashref is returned:

{
  "sheet name" => (Spreadsheet::Edit object),
  ...for each sheet in the workbook...
}

To access one of the workbook sheets, execute

sheet $href->{"sheet name"};  # or call OO methods on it

If SPREADSHEETPATH was a .csv file then the resulting hash will have only one member with an indeterminate key.

new_sheet

[functional API only] Create a new empty sheet and make it the 'current sheet', returning the sheet object.

Rarely used because a new sheet is automatically created by read_spreadsheet if your package has no current sheet.

{OPTIONS} may include:

data_source => "text..."

This string will be returned by the data_source method, overriding any default.

rows => [[A1_value,B1_value,...], [A2_value,B2_value,...], ...],
linenums => [...] #optional

This makes the sheet object hold data already in memory. The data should not be modified directly while the sheet object exists.

clone => $existing_sheet

A deep copy of an existing sheet is made.

num_cols => $number # with no initial content

An empty sheet is created but with a fixed number of columns. When rows are later created they will be immediately padded with empty cells if necessary to this width.

write_csv *FILEHANDLE

write_csv $path

Write the current data to the indicated path or open file handle as a CSV text file. The default encoding is UTF-8 or, if read_spreadsheet was most-recently used to read a csv file, the encoding used then.

{OPTIONS} may include

options for Text::CSV

Usually none need be specified because we supply sane defaults.

silent => bool
verbose => bool
debug => bool

write_spreadsheet OUTPUTPATH

Write the current data to a spreadsheet (.ods, .xlsx, etc.) by first writing to a temporary CSV file and then importing that file into a new spreadsheet.

{OPTIONS} may include

col_formats => [ LIST ]

EXPERIMENTAL, likely to change when Spreadsheet::Read is integrated!

Elements of LIST may be "" (Standard), "Text", "MM/DD/YY", "DD/MM/YY", or "YY/MM/DD" to indicate the format of the corresponding column. The meaning of "Standard" is not well documented but appears to mean "General Number" in most cases. For details, see "Format Codes" in this old Open Office documentation.

silent => bool
verbose => bool
debug => bool

options NAME => EXPR, ... ;

options NAME ;

Set or retrieve miscellaneous sheet-specific options. When setting, the previous value of the last option specified is returned. The only options currently defined are silent, verbose and debug.

$hash = attributes ;

Returns a reference to a hash in which you may store arbitrary data associated with the sheet object.

spectocx COLSPEC or qr/regexp/, ... ;

Returns the 0-based indicies of the specified colomn(s). Throws an exception if there is no such column. A regexp may match multiple columns. See also %colx.

logmsg [FOCUSARG,] string, string, ...

(must be explicitly imported)

Concatenate strings, prefixed by a description of the 'current sheet' and row during apply, if any (or with the sheet and/or row given by FOCUSARG).

The resulting string is returned, with "\n" appended if it was not already terminated by a newline.

The first argument is used as FOCUSARG if it is a sheet object, [sheet_object], or [sheet_object, rowindex], and specifies the sheet and/or row to describe in the message prefix. Otherwise the first argument is not special and is simply the first message string.

The details of formatting the sheet may be customized with a call-back given by a {logmsg_pfx_gen} attribute. See comments in the source for how this works.

STANDARD SHEET VARIABLES

These variables magically access the 'current sheet' in your package.

@rows

The spreadsheet data as an array of row objects.

Each row object is "dual-typed" (overloaded) to act as either an ARRAY or HASH reference to the cells in that row.

When used as a HASH ref, the key may be a alias, column title, letter-code etc. (any COLSPEC). When used as an ARRAY ref, the 0-based index specifies the column.

@linenums

The first line numbers of corresponding rows (a row can contain multiple lines if cells contain embedded newlines). Valid only if the data came from a CSV file.

$num_cols

The number of columns in the widest input row. Shorter rows are padded with empty cells when read so that all rows have the same number of columns in memory.

$title_rx and $title_row

$title_rx contains the 0-based row index of the title row and $title_row is an alias for $rows[ $title_rx ].

The title row is auto-detected by default. See read_spreadsheet and title_rx for how to control this.

If a column title is modified, set $title_rx = undef; to force re-detection.

$first_data_rx and $last_data_rx

Optional limits on the range of rows visited by apply() or sorted by sort_rows(). By default $first_data_rx is the first row following the title row (or 0 if no title row).

%colx (column key => column index)

%colx maps aliases, titles, etc. (all currently-valid COLSPECs) to the corresponding zero-based column indicies. See "COLSPECS" .

%colx_desc (column key => "debugging info")

COLSPECs (COLUMN SPECIFIERS)

Arguments which specify columns may be:

(1) a user-defined alias identifier
(2) an actual "column title" **
(3) an actual title with any leading & trailing spaces removed *
(4) an AUTOMATIC ALIAS identifier *
(6) a Regexp (qr/.../) which matches an actual title
(7) a numeric column index (0-based)
(8) '^' or '$' (means first or last column, respectively)

*These may only be used if they do not conflict with an item listed higher up.

**Titles may be used directly if they can not be confused with a user-defined alias, the special names '^' or '$' or a numeric column index. See "CONFLICT RESOLUTION".

AUTOMATIC ALIASES are Perl identifiers derived from column titles by first removing leading or trailing spaces, and then replacing non-word characters with underscores and prepending an underscore if necessary. For example:

Title             Automatic Alias

"Address"         Address (no change needed)
"  First Name  "  First_Name
"First & Last"    First___Last
"+sizes"          _sizes
"1000s"           _1000s  (underscore avoids leading digit)

Aliases (both automatic and user-defined) are valid identifiers, so can be used as the names of tied variables, bareword keys to %colx and %crow, and related OO interfaces,

CONFLICT RESOLUTION

A conflict occurs when a column key potentially refers to multiple columns. For example, "A", "B" etc. are standard column names, but they might also be the actual titles of other columns. Warnings are printed about conflicts unless the silent option is true (see options).

    User alias identifiers (defined using alias) are always valid.

    '^' and '$' always refer to the first and last column.

    Numeric "names" 0, 1, etc. always give a 0-based column index if the value is between 0 and num_cols (i.e. one past the end).

    Actual Titles refer to to their columns, except if they:

      are the same as a user-defined alias

      are '^' or '$'

      consist only of digits (without leading 0s) corresponding to a valid column index.

    Automatic Aliases and Standard column names ("A", "B", etc.) are available as column keys unless they conflict with a user-defined alias or an actual title.

Note: To unconditionally refer to numeric titles or titles which look like '^' or '$', use a Regexp qr/.../. Automatic Aliases can also refer to such titles if there are no conflicts.

Column positions always refer to the data before a command is executed. This is relevant for commands which re-number or delete columns.

OO DESCRIPTION (OBJECT-ORIENTED INTERFACE)

All the Functions listed above (except for new_sheet) have corresponding methods with the same arguments.

However method arguments must be enclosed in parenthesis and bare {code} blocks may not be used; a sub{...} ref should be passed to apply etc.

OO-SPECIFIC METHODS

Spreadsheet::Edit->new(OPTIONS...)

Creates a new "sheet" object.

OPTIONS are the same as described for the new_sheet Function above, except that they may be specified as key => value pairs of arguments instead of (or in addition to) an {OPTIONS} hashref.

$sheet->rows() ; # Analogous to to \@rows

$sheet->linenums() ; # Analogous to \@linenums

$sheet->num_cols() ; # Analogous to $num_cols

$sheet->colx() ; # Analogous to \%colx

$sheet->colx_desc() ; # Analogous to \%colx_desc

$sheet->first_data_rx() ; # Analogous to $first_data_rx

$sheet->last_data_rx() ; # Analogous to $last_data_rx

$sheet->title_rx() ; # Analogous to to $title_rx

$sheet->title_row() ; # Analogous to $title_row

$sheet->rx() ; # Current rx in apply, analogous to to $rx

$sheet->title_rx(rxvalue) ; # Analogous to assigning to $title_rx (changes title row)

$sheet->crow(); # Current row in apply (a dual-typed row object)

$sheet->linenum() ; # Analogous to to $linenum

$sheet->get(rx,ident) ; # Analogous to to $rows[rx]{ident}

$sheet->set(rx,ident,value); # Analogous to to $rows[rx]{ident} = value

$sheet->data_source(); # Returns "description of sheet" (e.g. path read)

$sheet->sheetname(); # valid if input was a spreadsheet, else undef

SEE ALSO

Spreadsheet::Edit::Preload

BUGS

Some vestigial support for formats remains from an earlier implementation, but this support is likely to be entirely replaced at some point.

Reading a spreadsheet (but not csv) may fail if Libre Office or Open Office are currently running for any purpose; this seems to be a bug or limitation where batch-mode operations share the same profile as interactive sessions. In any case, ssconvert (gnumeric) will be used if it is installed, and does not have this limitation. In the future Spreadsheet::Read might be used instead of external programs, although it uses Twig and is quite a bit slower.

THREAD SAFETY

Unknown, and probably not worth the trouble to find out.

FUTURE IDEAS

Add format-processing support.
Add "column-major" views.

This would allow accessing a whole column as an array. Perhaps @cols and %cols would be sets of column arrays (@cols indexed by column index, %cols indexed by any COLSPEC). And tie_column_vars '@NAME' would tie user array variables to columns.

AUTHOR

Jim Avera (jim.avera at gmail)

LICENSE

Public Domain or CC0.