NAME
Spreadsheet::Edit - Slice and dice spreadsheets, optionally using tied variables.
NON-OO SYNOPSIS
use Spreadsheet::Edit qw(:all);
# Examples assume a spreadsheet with these titles in the first row:
# "Account Number" "Customer's Name" "Email" "Home-phone" "Income"
read_spreadsheet "mailing_list.xls!Sheet1";
# alias an identifier to a long or complicated title
alias Name => qr/customer/i; # matches "Customer's Name"
# ------------ without tied column variables -----------
# Print the data
printf "%20s %8s %8s %-13s %s\n", "Name","A/N","Income","Phone","Email";
apply {
printf "%20s %8d %8.2f %-13s %s\n",
$crow{Name}, # this key is an explicit alias
$crow{"Account Number"}, # ... actual title
$crow{Income}, # ... actual title
$crow{Home_phone}, # ... auto-generated alias
$crow{Email} ; # ... actual title
};
# Randomly access rows.
print "Row 42: Column 'C' is ", $rows[41]{C}, "\n";
print "Row 42: 3rd column is ", $rows[41][2], "\n";
print "Row 42: Customer's Name is ", $rows[41]{Name}, "\n";
# Split the "Customer's Name" into separate FName and LName columns
insert_cols '>Name', "FName", "LName";
apply {
($crow{FName}, $crow{LName}) = ($crow{Name} =~ /(.*) (.*)/)
or die logmsg "Could not parse Name"; # logmsg adds current row number
};
delete_cols "Name";
# Sort by last name
sort_rows { $a->{LName} cmp $b->{LName} };
# ------------ using tied column variables -----------
our $Name; # 'Name' is the explicit alias created above
our $Account_Number; # Auto-generated alias for "Account Number"
our $Home_phone; # ditto for "Home-phone"
our $Income; # 'Income' is an actual title
our $Email; # and so is 'Email'
our ($FName, $LName); # These columns do not yet exist
tie_column_vars "Name", "Account_Number",
qr/phone/, qr/^inc/i, "FName", "LName";
# Print the data
printf "%20s %8s %8s %-13s %s\n", "Name","A/N","Income","Phone","Email";
apply {
printf "%20s %8d %8.2f %-13s%s\n",
$Name, $Account_Number, $Income, $Home_phone, $Email;
};
# Split the "Customer's Name" into separate FName and LName columns
insert_cols '>Name', "FName", "LName";
apply {
($FName, $LName) = ($Name =~ /^(\S+) (\S+)$/)
or die logmsg "Could not parse Name";
};
delete_cols "Name";
# Simple mail-merge
use POSIX qw(strftime);
apply {
return
if $Income < 100000; # not in our audience
open SENDMAIL, "|sendmail -t -oi" || die "pipe:$!";
print SENDMAIL "To: $FName $LName <$Email>\n";
print SENDMAIL strftime("Date: %a, %d %b %y %T %z\n", localtime(time));
print SENDMAIL <<EOF ;
From: sales\@example.com
Subject: Help for the 1%
Dear $FName,
If you have disposable income, we can help with that.
Sincerely,
Your stock broker.
EOF
close SENDMAIL || die "sendmail failed ($?)\n";
};
# ------------ multiple sheets --------------
our ($Foo, $Bar, $Income);
read_spreadsheet "file1.csv";
tie_column_vars ':all'; # tie all vars that ever become valid
my $s1 = sheet undef ; # Save ref to current sheet & forget it
read_spreadsheet "file2.csv"; # Auto-creates sheet bc current is undef
tie_column_vars ':all';
my $s2 = sheet ; # Save ref to second sheet
print "$Foo $Bar $Income\n"; # these refer to $s2, the current sheet
sheet $s1 ;
print "$FName $LName $Income\n"; # these now refer to the original sheet
# ------------ create sheet from memory --------------
my $s3 = new_sheet
data_source => "my own data",
rows => [
["This is a row before the title row" ],
["Full Name", "Address", "City", "State", "Zip" ],
["Joe Smith", "123 Main St", "Boston", "CA", "12345"],
["Mary Jones", "999 Olive Drive", "Fenton", "OH", "67890"],
],
;
$s3->title_rx(1);
...
OO SYNOPSIS
use Spreadsheet::Edit ();
my $sheet = Spreadsheet::Edit->new();
$sheet->read_spreadsheet("mailing_list.xls!sheet name");
$sheet->alias( Name => qr/customer/i ); # matches "Customer's Name"
# Randomly access rows.
# Sheet objects, when used as an ARRAYref, act like \@rows
print "Row 42: Name is ", $sheet->[41]{Name}, "\n";
print "Row 42, Column 3 is ", $sheet->[41][2], "\n";
# Print the data.
# Sheet objects, when used as an HASHref, act like \%crow
printf "%20s %8s %8s %-13s %s\n", "Name","A/N","Income","Phone","Email";
$sheet->apply( sub{
printf "%20s %8d %8.2f %-13s%s\n",
$sheet->{Name},
$sheet->{"Account Number"},
$sheet->{Income},
$sheet->{Home_phone},
$sheet->{Email} ;
});
# Another way:
$sheet->apply( sub{
my $r = $sheet->crow();
printf "%20s %8d %8.2f %-13s%s\n",
$r->{Name}, $r->{"Account Number"}, $r->{Income},
$r->{Home_phone}, $r->{Email} ;
});
# Another way:
$sheet->apply( sub{
my $r = $sheet->crow();
printf "%20s %8d %8.2f %-13s%s\n",
$r->[0], $r->[1], $r->[4], $r->[3], $r->[2] ;
});
# Split the "Customer's Name" into separate FName and LName columns
$sheet->insert_cols('>Name', "FName", "LName");
$sheet->apply( sub {
my $r = $sheet->crow();
($r->{FName}, $r->{LName}) = ($r->{Name} =~ /(.*) (.*)/)
or die Spreadsheet::Edit::logmsg("Could not parse Name");
});
$sheet->delete_cols( "Name" );
INTRODUCTION
You may want to skip ahead to "LIST OF FUNCTIONS (and OO methods)".
Columns may be referenced by title without knowing their positions. Optionally, global (i.e. package) variables may be tied to columns and used during apply()
.
Data tables can come from Spreadsheets, CSV files, or your code.
A table in memory (that is, a sheet
object) contains an array of rows, each of which is an array of cell values.
Rows are overloaded to act as either arrays or hashes; when used as a hash, cells are accessed by name (e.g. column titles), or letter code ("A", "B" etc.)
The usual paradigm is to iterate over rows applying a function to each, vaguely inspired by 'sed' and 'awk' (see apply
below). Random access is also supported.
Note: Only cell values are handled; there is no provision for processing formatting information from spreadsheets. The author has a notion to add support for formats, perhaps integrating with Spreadsheet::Read and Spreadsheet::Write or the packages they use. Please contact the author if you want to help.
HOW TO IMPORT
By default only functions are imported, but to fully use the functional API:
use Spreadsheet::Edit ':all';
which imports functions and helper variables (see STANDARD SHEET VARIABLES and VARIABLES USED DURING APPLY).
You can rename imported items using the '-as' notation shown in Exporter::Tiny::Manual::QuickStart.
Purely-OO applications can "use Spreadsheet::Edit ();".
THE 'CURRENT SHEET'
Functions and helper variables implicitly operate on a package-global "current sheet" object, which can be switched at will. OO Methods operate on the sheet
object they are called on.
Functions which operate on the "current sheet" have corresponding OO methods with the same names and arguments (except that method arguments must be enclosed by parenthesis).
TIED COLUMN VARIABLES
Package variables can refer directly to columns in the 'current sheet' during apply
. For example $Email
and $FName
in the SYNOPSIS above.
See tie_column_vars
for details.
LIST OF FUNCTIONS (and OO methods)
In the following, {OPTIONS} refers to an optional first argument which, if present, is a hashref giving additional parameters. For example in
read_spreadsheet {sheetname => 'Sheet1'}, '/path/to/file.xlsx';
the {...} hash is optional and specifies the sheet name.
read_spreadsheet CSVFILEPATH
read_spreadsheet SPREADSHEETPATH
read_spreadsheet "SPREADSHEETPATH!SHEETNAME"
Replace any existing data with content from the given file.
The Functional API will create a new sheet object if there is no "current sheet" (see also sheet
and new_sheet
);
The file may be a .csv or any format supported by Libre Office or gnumeric.
By default column titles are auto-detected and an exception is thrown if a plausible title row can not be found.
{OPTIONS} may include:
Auto-detection options:
required => COLSPEC or [COLSPEC,...] # any required title(s)
min_rx => NUM, # first rx which may contain the title row.
max_rx => NUM, # maximum rx which may contain the title row.
first_cx => NUM, # first column ix which must contain required titles
last_cx => NUM, # last column ix which must contain required titles
The first row is used which includes the required
title(s), if any, and has non-empty titles in all columns, or columns first_cx
through last_cx
.
Or to specify the title row explicitly:
title_rx => rx # title row index (first is 0)
title_rx => undef # specify that there are no titles
See also the title_rx
function/method.
Other options:
- sheetname => SHEETNAME
-
Specify which sheet in a multi-sheet workbook (i.e. spreadsheet file) to read. Alternatively, the sheet name may be appended to the input path after '!' as shown in the example.
If no SHEETNAME is given then the sheet which was "active" when the workbook was saved will be retrieved.
- silent => bool
- verbose => bool
- debug => bool
-
Probably what you expect.
- Other
key => value
pairs override details of CSV parsing. -
See Text::CSV. UTF-8 encoding is assumed by default.
Due to bugs in Libre/Open Office, spreadsheet files can not be read if LO/OO is currently running, even for unrelated purposes (see "BUGS"). This problem does not occur with .csv files
alias IDENT => COLSPEC, ... ;
alias IDENT => qr/regexp/, ... ;
Create alternate identifiers for specified columns.
Each IDENT, which must be a valid Perl identifier, will henceforth refer to the specified column even if the identifier is the same as the title or letter code of a different column.
$row{IDENT}
and a tied variable $IDENT
will refer to the specified column.
Aliases automatically track the column if it's position changes.
Regular Expressions are matched against titles only, and an exception is thrown if more than one title matches.
Otherwise, COLSPECs may be titles, existing alias names, column letters, etc. (see "COLUMN SPECIFIERS" for details).
The COLSPEC is evaluated before the alias is created, so
alias B => "B";
would make "B" henceforth refer to the column with title "B", if one exists, or the second column (treating COLSPEC as a letter-code), even if the column is later moved.
RETURNS: The 0-based column indices of the aliased column(s).
unalias IDENT, ... ;
Forget alias(es). Any masked COLSPECs become usable again.
tie_column_vars VARNAME, ...
Create tied package global variables (scalars) for use during apply
.
Each variable corresponds to a column, and reading or writing it accesses the corresponding cell in the row being visited during apply
.
The '$' may be omitted in the VARNAME arguments to tie_column_vars
;
You must separately declare these variables with our $NAME
, except in the special case described at "Use in BEGIN() or module import methods".
The variable name itself implies the column it refers to.
Variable names may be:
User-defined alias names (see "alias")
Titles which happen to be valid Perl identifiers
Identifiers derived from titles by replacing offending characters with underscrores (see "AUTOMATIC ALIASES"),
Spreadsheet column letters like "A", "B" etc.
See "CONFLICT RESOLUTION" for how ambiguity is resolved.
Multiple calls accumulate, including with different sheets.
Variable bindings are dynamically evaluated during each access by using the variable's identifier as a COLSPEC with the 'current sheet' in your package. This means that it does not matter which sheet was 'current' when tie_column_vars
was called with a particular name; it only matters that the name of a tied variable is a valid COLSPEC in the 'current sheet' when that variable is referenced (otherwise a read returns undef and a write throws an exception). [*Need clarification*]
{OPTIONS} may specify:
- package => "pkgname"
-
Tie variables in the specified package instead of the caller's package.
- verbose => bool
- debug => bool
-
Print trace messages.
tie_column_vars ':all'
With the :all token all possible variables are tied, corresponding to the aliases, titles, non-conflicting column letters etc. which exist for the current sheet.
In addition, variables will be tied in the future whenever new identifiers become valid (for example when a new alias
is created, column added, or another file is read into the same sheet).
Although convenient this is insecure if package (e.g "our") variables are used for any other purpose because they could be clobbered by malicious spreadsheet titles (lexical "my" variables are always safe). Perl's built-in punctuation variables and $ARGV can not be clobbered because those names are always excluded as column identifiers (see COLSEPCs). However names from use English;
are not similarly protected!
If VARNAMES are also specified, those variables will be tied immediately even if not yet usable; an exception occurs if a tied variable is referenced before the corresponding alias or title exists. [*Need clarification* -- exception even for reads??]
Use in BEGIN{} or module import methods
tie_column_vars
also imports the tied variables into your package (or as specified with package => "pkgname").
It is unnecessary to declare variables if they are imported before any code which references them is compiled. This can be the case when tie_column_vars
is called in a BEGIN{} block or in the import
method of a module loaded via use
.
Spreadsheet::Edit::Preload makes use of this.
$rowindex = title_rx ;
title_rx ROWINDEX ;
title_rx undef ;
Get or set the title row index. When setting, titles in that row are immediately (re-)examined and the corresponding COLSPECs become valid, e.g. you can reference a column by it's title or a derived identifier.
Note: Setting title_rx
this way is rarely needed because by default read_spreadsheet
sets the title row.
Setting to undef
disables titles; any existing COLSPECs derived from titles are invalidated.
title_rx {AUTODETECT_OPTIONS} 'auto';
Immediately perform (or repeat) auto-detection of the title row. See read_spreadsheet
for a description of the options.
apply {code} [COLSPEC*] ;
apply_all {code} [COLSPEC*] ;
apply_torx {code} RX-OR-RXLIST [,COLSPEC*] ;
apply_exceptrx {code} RX-OR-RXLIST [,COLSPEC*] ;
Execute the specified code block (or referenced sub) once for each row.
Note that there is no comma after a bare {code} block.
While executing your code, tied column variables and the sheet variables @crow
, %crow
, $rx
and $linenum
and corresponding OO methods will refer to the row being visited.
If a list of COLSPECs is specified, then
@_ is bound to the columns in the order specified
$_ is bound to the first such column
apply
normally visits all rows which follow the title row, or all rows if there is no title row. first_data_rx
and last_data_rx
, if defined, further limit the range visited.
apply_all
unconditionally visits every row, including any title row.
apply_torx
or apply_exceptrx
visit exactly the indicated rows. RX-OR-RXLIST may be either a single row index or a [list of rx];
Rows may be safely inserted or deleted during 'apply'; rows inserted after the currently-being-visited row will be visited at the proper time.
Nested and recursive apply
s are allowed. When an 'apply' changes the 'current sheet', tied variables then refer to the other sheet and any apply
active for that sheet. If using nested applys, take care to restore the original sheet before returning (Guard::scope_guard
is useful).
MAGIC VARIABLES USED DURING APPLY
These variables refer to the row currently being visited:
@crow is an array aliased to the current row's cells.
%crow is a hash aliased to the same cells, indexed by alias, title, letter code, etc. (any COLSPEC).
$rx is the 0-based index of the current row.
$linenum is the starting line number of the current row if the data came from a .csv file.
For example, the "Account Number" column in the SYNOPSIS may be accessed many ways:
alias AcNum => "Account Number";
apply {
$crow{"Account Number"} # %crow indexed by title
$crow{AcNum} # using an explicit alias
$crow{Account_Number} # using the AUTOMATIC ALIAS
$crow[ $colx{"Account Number"} ]; # @crow indexed by a 0-based index
$crow[ $colx{"AcNum"} ]; # ...obtained from %colx
$crow[ $colx{"Account_Number"} ]; #
$rows[$rx]->[ $colx{Account_Number} ] # Directly accessing @rows
# See "TIED COLUMN VARIABLES" for a sweeter alternative
};
delete_col COLSPEC ;
delete_cols COLSPEC+ ;
The indicated columns are removed. Remaining title bindings are adjusted to track shifted columns.
only_cols COLSPEC+ ;
All columns except the specified columns are deleted.
move_col POSITION, SOURCE ;
move_cols POSITION, SOURCES... ;
Relocate the indicated column(s) (SOURCES
) so they are adjacent, in the order specified, starting at the position POSITION
.
POSITION may be ">COLSPEC" to place moved column(s) immediately after the indicated column (">$" to place at the end), or POSITION may directly specify the destination column using an unadorned COLSPEC.
A non-absolute COLSPEC indicates the initial position of the referenced column.
insert_col POSITION, newtitle ;
insert_cols POSITION, newtitles... ;
One or more columns are created starting at a position specified the same way as in move_cols
(later columns are moved rightward).
POSITION may be ">$" to place new column(s) at the far right.
A new title must be specified for each new column. If there is no title row, specify undef
for each position.
Returns the new column index or indices.
split_col {code} COLSPEC, POSITION, newtitles... ;
New columns are created starting at POSITION as with insert_cols
, and populated with data from column COLSPEC.
{code}
is called for each row with $_ bound to the cell at COLSPEC and @_ bound to cell(s) in the new column(s). It is up to your code to read the old column ($_) and write into the new columns (@_).
The old column is left as-is (not deleted).
If there is no title row, specify undef
for each new title.
sort_rows {rx cmp function}
sort_rows {rx cmp function} $first_rx, $last_rx
If no range is specified, then the range is the same as for apply
(namely: All rows after the title row unless limited by first_data_rx .. last_data_rx).
In the comparison function, globals $a and $b will contain row objects, which are dual-typed to act as either an array or hash ref to the cells in their row. The corresponding original row indicies are also passed as parameters in @_
.
Rows are not actually moved until after all comparisons have finished.
RETURNS: A list of the previous row indicies of all rows in the sheet.
# Sort on the "LName" column using row indicies
# (contrast with the example in SYNOPSIS which uses $a and $b)
sort_rows { my ($rxa, $rxb) = @_;
$rows[$rxa]{LName} cmp $rows[$rxb]{LName}
};
sort_indicies {rx cmp function}
sort_indicies {rx cmp function} $first_rx, $last_rx
Like sort_rows
but returns [ref to array of rx values] indicating the sorted row order without actually chaning the sheet (rows are not moved).
rename_cols COLSPEC, "new title", ... ;
Multiple pairs may be given. Title cell(s) are updated as indicated.
Existing user-defined aliases are not affected, i.e., they continue to refer to the same columns as before even if the titles changed.
join_cols_sep STRING COLSPEC+ ;
join_cols {code} COLSPEC+ ;
The specified columns are combined into the first-specified column and the other columns are deleted.
The first argument of join_cols_sep
should be a fixed separator. The first argument of join_cols
may be a {code} block or subref;
If a separator string is specified it is used to join column content together.
If a {code} block or sub ref is specified, it is executed once for each row following the title row, with $_ bound to the first-named column, i.e. the surviving column, and @_ bound to all named columns in the order given.
It is up to your code to combine the data by reading @_ and writing $_ (or, equivalently, by writing $_[0]).
first_data_rx
and last_data_rx
are ignored, and the title is not modified.
reverse_cols
The order of the columns is reversed.
insert_row
insert_row 'END' [,$count]
insert_rows $rowx [,$count]
Insert one or more empty rows at the indicated position (default: at end). $rowx
, if specified, is either a 0-based offset for the new row or 'END' to add the new row(s) at the end. Returns the index of the first new row.
delete_rows $rowx,... ;
delete_rows 'LAST',... ;
The indicated data rows are deleted. $rowx
is a zero-based row index or the special token "LAST" to indicate the last row (same as $#rows
). Any number of rows may be deleted in a single command, listed individually.
transpose
Invert the relation, i.e. rotate and flip the table. Cells A1,B1,C1 etc. become A1,A2,A3 etc. Any title_rx is forgotten.
write_csv *FILEHANDLE
write_csv $path
Write the current data to the indicated path or open file handle as a CSV text file. The default encoding is UTF-8 or, if read_spreadsheet
was most-recently used to read a csv file, the encoding used then.
{OPTIONS} may include
- options for Text::CSV
-
Usually none need be specified because we supply sane defaults.
- silent => bool
- verbose => bool
- debug => bool
write_spreadsheet OUTPUTPATH
Write the current data to a spreadsheet (.ods, .xlsx, etc.) by first writing to a temporary CSV file and then importing that file into a new spreadsheet.
{OPTIONS} may include
- col_formats => [ LIST ]
-
REQUIRED OPTION. EXPERIMENTAL, likely to change when Spreadsheet::Read is integrated!
Elements of LIST may be "" (Standard), "Text", "MM/DD/YY", "DD/MM/YY", or "YY/MM/DD" to indicate the format of the corresponding column. The meaning of "Standard" is not well documented but appears to mean "General Number" in most cases. For details, see "Format Codes" in this old Open Office documentation.
- silent => bool
- verbose => bool
- debug => bool
options NAME => EXPR, ... ;
options NAME ;
Set or retrieve miscellaneous sheet-specific options. When setting, the previous value of the last option specified is returned. The only options currently defined are silent, verbose and debug.
$hash = attributes ;
Returns a reference to a hash in which you may store arbitrary data associated with the sheet object.
spectocx COLSPEC or qr/regexp/, ... ;
Returns the 0-based indicies of the specified colomn(s), or throws an exception if there is no such column. A regexp may match multiple columns. This allows probing column titles with fuzzy matching.
See also %colx
(or the corespnding method) which may be easier to use when testing exact column titles.
new_sheet
[functional API only] Create a new empty sheet and make it the 'current sheet', returning the sheet object.
Rarely used because a new sheet is automatically created by read_spreadsheet
if your package has no current sheet.
{OPTIONS} may include:
- data_source => "text..."
-
This string will be returned by the
data_source
method, overriding any default. - rows => [[A1_value,B1_value,...], [A2_value,B2_value,...], ...],
- linenums => [...] #optional
-
This makes the
sheet
object hold data already in memory. The data should not be modified directly while the sheetobject
exists. - clone => $existing_sheet
-
A deep copy of an existing sheet is made.
- num_cols => $number # with no initial content
-
An empty sheet is created but with a fixed number of columns. When rows are later created they will be immediately padded with empty cells if necessary to this width.
Note: See Spreasheet::Edit->new
if using the OO API.
$curr_sheet = sheet ;
$prev_sheet = sheet $another_sheet ;
$prev_sheet = sheet undef ;
[Functional API only] Retrieve, change, or forget the 'current sheet' object used by the functional API.
Changing the current sheet immediately changes what is referenced by tied column variables and STANDARD SHEET VARIABLES (described later).
{OPTIONS} may specify package => 'pkgname'
to operate on the specified package instead of the caller's package.
STANDARD SHEET VARIABLES
These variables magically access the 'current sheet' in your package.
- @rows
-
The spreadsheet data as an array of row objects.
Each row object is "dual-typed" (overloaded) to act as either an ARRAY or HASH reference to the cells in that row.
When used as a HASH ref, the key may be a alias, column title, letter-code etc. (any COLSPEC). When used as an ARRAY ref, the 0-based index specifies the column.
- @linenums
-
The first line numbers of corresponding rows (a row can contain multiple lines if cells contain embedded newlines). Valid only if the data came from a CSV file.
- $num_cols
-
The number of columns in the widest input row. Shorter rows are padded with empty cells when read so that all rows have the same number of columns in memory.
- $title_rx and $title_row
-
$title_rx
contains the 0-based row index of the title row and$title_row
is an alias for$rows[ $title_rx ]
.The title row is auto-detected by default. See
read_spreadsheet
andtitle_rx
for how to control this.If a column title is modified, set
$title_rx = undef;
to force re-detection. - $first_data_rx and $last_data_rx
-
Optional limits on the range of rows visited by
apply()
or sorted bysort_rows()
. By default $first_data_rx is the first row following the title row (or 0 if no title row). - %colx (column key => column index)
-
%colx
maps aliases, titles, etc. (all currently-valid COLSPECs) to the corresponding zero-based column indicies. See "COLSPECS" . - %colx_desc (column key => "debugging info")
COLSPECs (COLUMN SPECIFIERS)
Arguments which specify columns may be:
- (1) a user-defined alias identifier
- (2) an actual "column title" **
- (3) an actual title with any leading & trailing spaces removed *
- (4) an AUTOMATIC ALIAS identifier *
- (6) a Regexp (qr/.../) which matches an actual title
- (7) a numeric column index (0-based)
- (8) '^' or '$' (means first or last column, respectively)
*These may only be used if they do not conflict with an item listed higher up.
**Titles may be used directly if they can not be confused with a user-defined alias, Perl's built-in punctuation variables or ARGV, the special names '^' or '$', or a numeric column index (to access columns with such titles, create an alias using a regex). See "CONFLICT RESOLUTION".
AUTOMATIC ALIASES are Perl identifiers derived from column titles by first removing leading or trailing spaces, and then replacing non-word characters with underscores and prepending an underscore if necessary. For example:
Title Automatic Alias
"Address" Address (no change needed)
" First Name " First_Name
"First & Last" First___Last
"+sizes" _sizes
"1000s" _1000s (underscore avoids leading digit)
Aliases (both automatic and user-defined) are valid identifiers, so can be used as the names of tied variables, bareword keys to %colx
and %crow
, and related OO interfaces,
CONFLICT RESOLUTION
A conflict occurs when a column key potentially refers to multiple columns. For example, "A", "B" etc. are standard column names, but they might also be the actual titles of other columns. Warnings are printed about conflicts unless the silent option is true (see options
).
User alias identifiers (defined using alias
) are always valid.
'^' and '$' always refer to the first and last column.
Numeric "names" 0, 1, etc. always give a 0-based column index if the value is between 0 and num_cols (i.e. one past the end).
Actual Titles refer to to their columns, except if they:
are the same as a user-defined alias
are '^' or '$' or conflict with a Perl built-in variable
consist only of digits (without leading 0s) corresponding to a valid column index.
Automatic Aliases and Standard column names ("A", "B", etc.) are available as column keys unless they conflict with a user-defined alias or an actual title.
Note: To refer to titles which are excluded by these rules, use a Regexp qr/.../. Automatic Aliases can also refer to such titles if there are no conflicts.
Column positions always refer to the data before a command is executed. This is relevant for commands which re-number or delete columns.
OO DESCRIPTION (OBJECT-ORIENTED INTERFACE)
All the Functions listed above (except for new_sheet
and sheet
) have corresponding methods with the same arguments.
However method arguments must be enclosed in parenthesis; Bare {code} blocks may not be used, so a sub{...} ref should be passed to apply
, etc.
OO-SPECIFIC METHODS
Spreadsheet::Edit->new(OPTIONS...)
Creates a new "sheet" object.
OPTIONS are the same as described for the new_sheet
Function above, except that they may be specified as key => value pairs of arguments instead of (or in addition to) an {OPTIONS} hashref.
$sheet->rows() ; # Analogous to to \@rows
$sheet->linenums() ; # Analogous to \@linenums
$sheet->num_cols() ; # Analogous to $num_cols
$sheet->colx() ; # Analogous to \%colx
$sheet->colx_desc() ; # Analogous to \%colx_desc
$sheet->first_data_rx() ; # Analogous to $first_data_rx
$sheet->last_data_rx() ; # Analogous to $last_data_rx
$sheet->title_rx() ; # Analogous to to $title_rx
$sheet->title_row() ; # Analogous to $title_row
$sheet->rx() ; # Current rx in apply, analogous to to $rx
$sheet->title_rx(rxvalue) ; # Analogous to assigning to $title_rx (changes title row)
$sheet->crow(); # Current row in apply (a dual-typed row object)
$sheet->linenum() ; # Analogous to to $linenum
$sheet->get(rx,ident) ; # Analogous to to $rows[rx]{ident}
$sheet->set(rx,ident,value); # Analogous to to $rows[rx]{ident} = value
$sheet->data_source(); # Returns "description of sheet" (e.g. path read)
$sheet->sheetname(); # valid if input was a spreadsheet, else undef
UTILITY
logmsg [FOCUSARG,] string, string, ...
(must be explicitly imported)
Concatenate strings, prefixed by a description of the 'current sheet' and row during apply
, if any (or with the sheet and/or row given by FOCUSARG).
The resulting string is returned, with "\n" appended if it was not already terminated by a newline.
The first argument is used as FOCUSARG if it is a sheet object, [sheet_object], or [sheet_object, rowindex], and specifies the sheet and/or row to describe in the message prefix. Otherwise the first argument is not special and is simply the first message string.
The details of formatting the sheet may be customized with a call-back given by a {logmsg_pfx_gen}
attribute. See comments in the source for how this works.
SEE ALSO
BUGS
Some vestigial support for formats remains from an earlier implementation, but this support is likely to be entirely replaced at some point.
Reading a spreadsheet (but not csv) may fail if Libre Office or Open Office are currently running for any purpose; this seems to be a bug or limitation where batch-mode operations share the same profile as interactive sessions. In any case, ssconvert (gnumeric) will be used if it is installed, and does not have this limitation. In the future Spreadsheet::Read might be used instead of external programs, although it uses Twig and is quite a bit slower.
THREAD SAFETY
Unknown, and probably not worth the trouble to find out.
FUTURE IDEAS
- Add format-processing support.
- Add "column-major" views.
-
This would allow accessing a whole column as an array. Perhaps
@cols
and%cols
would be sets of column arrays (@cols indexed by column index, %cols indexed by any COLSPEC). Andtie_column_vars '@NAME'
would tie user array variables to columns.
AUTHOR
Jim Avera (jim.avera at gmail)
LICENSE
Public Domain or CC0.