NAME
Spreadsheet::ParseExcel - Get information from Excel file
SYNOPSIS
new interface
use strict;
use Spreadsheet::ParseExcel;
my $excel = Spreadsheet::ParseExcel::Workbook->Parse($file);
foreach my $sheet (@{$excel->{Worksheet}}) {
printf("Sheet: %s\n", $sheet->{Name});
$sheet->{MaxRow} ||= $sheet->{MinRow};
foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
$sheet->{MaxCol} ||= $sheet->{MinCol};
foreach my $col ($sheet->{MinCol} .. $sheet->{MaxCol}) {
my $cell = $sheet->{Cells}[$row][$col];
if ($cell) {
printf("( %s , %s ) => %s\n", $row, $col, $cell->{Val});
}
}
}
}
old interface use strict; use Spreadsheet::ParseExcel; my $oExcel = Spreadsheet::ParseExcel->new;
#1.1 Normal Excel97
my $oBook = $oExcel->Parse('Excel/Test97.xls');
my($iR, $iC, $oWkS, $oWkC);
print "FILE :", $oBook->{File} , "\n";
print "COUNT :", $oBook->{SheetCount} , "\n";
print "AUTHOR:", $oBook->{Author} , "\n";
for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) {
$oWkS = $oBook->{Worksheet}[$iSheet];
print "--------- SHEET:", $oWkS->{Name}, "\n";
for(my $iR = $oWkS->{MinRow} ;
defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {
for(my $iC = $oWkS->{MinCol} ;
defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ; $iC++) {
$oWkC = $oWkS->{Cells}[$iR][$iC];
print "( $iR , $iC ) =>", $oWkC->Value, "\n" if($oWkC); # Formatted Value
print "( $iR , $iC ) =>", $oWkC->{Val}, "\n" if($oWkC); # Original Value
}
}
}
DESCRIPTION
Spreadsheet::ParseExcel makes you to get information from Excel95, Excel97, Excel2000 file.
Functions
- new
-
$oExcel = Spreadsheet::ParseExcel->new( [ CellHandler => \&subCellHandler, NotSetCell => undef | 1, ]);
Constructor.
- CellHandler (experimental)
-
specify callback function when a cell is detected.
subCellHandler gets arguments like below:
sub subCellHandler ($oBook, $iSheet, $iRow, $iCol, $oCell);
CAUTION : The atributes of Workbook may not be complete. This function will be called almost order by rows and columns. Take care almost, not perfectly.
- NotSetCell (experimental)
-
specify set or not cell values to Workbook object.
- Parse
-
$oWorkbook = $oParse->Parse($sFileName [, $oFmt]);
return "Workbook" object. if error occurs, returns undef.
- $sFileName
-
name of the file to parse
From 0.12 (with OLE::Storage_Lite v.0.06), scalar reference of file contents (ex. \$sBuff) or IO::Handle object (including IO::File etc.) are also available.
- $oFmt
-
"Formatter Class" to format the value of cells.
- ColorIdxToRGB
-
$sRGB = $oParse->ColorIdxToRGB($iColorIdx);
ColorIdxToRGB returns RGB string corresponding to specified color index. RGB string has 6 characters, representing RGB hex value. (ex. red = 'FF0000')
Workbook
Spreadsheet::ParseExcel::Workbook
Workbook class has these methods :
- Parse
-
(class method) : same as Spreadsheet::ParseExcel
- Worksheet
-
$oWorksheet = $oBook->Worksheet($sName);
Worksheet returns a Worksheet object with $sName or undef. If there is no worksheet with $sName and $sName contains only digits, it returns a Worksheet object at that position.
Workbook class has these properties :
- File
-
Name of the file
- Author
-
Author of the file
- Flg1904
-
If this flag is on, date of the file count from 1904.
- Version
-
Version of the file
- SheetCount
-
Numbers of "Worksheet" s in that Workbook
- Worksheet[SheetNo]
-
Array of "Worksheet"s class
- PrintArea[SheetNo]
-
Array of PrintArea array refs.
Each PrintArea is : [ StartRow, StartColumn, EndRow, EndColumn]
- PrintTitle[SheetNo]
-
Array of PrintTitle hash refs.
Each PrintTitle is : { Row => [StartRow, EndRow], Column => [StartColumn, EndColumn]}
Worksheet
Spreadsheet::ParseExcel::Worksheet
Worksheet class has these methods:
- Cell ( ROW, COL )
-
Return the Cell object at row ROW and column COL if it is defined. Otherwise return undef.
- RowRange ()
-
Return a two-element list (MIN, MAX) containing the minimum and maximum of defined rows in the worksheet If there is no row defined MAX is smaller than MIN.
- ColRange ()
-
Return a two-element list (MIN, MAX) containing the minimum and maximum of defined columns in the worksheet If there is no row defined MAX is smaller than MIN.
Worksheet class has these properties:
- Name
-
Name of that Worksheet
- DefRowHeight
-
Default height of rows
- DefColWidth
-
Default width of columns
- RowHeight[Row]
-
Array of row height
- ColWidth[Col]
-
Array of column width (undef means DefColWidth)
- Cells[Row][Col]
-
Array of "Cell"s information in the worksheet
- Landscape
-
Print in horizontal(0) or vertical (1).
- Scale
-
Print scale.
- FitWidth
-
Number of pages with fit in width.
- FitHeight
-
Number of pages with fit in height.
- PageFit
-
Print with fit (or not).
- PaperSize
-
Paper size. The value is like below:
Letter 1, LetterSmall 2, Tabloid 3 , Ledger 4, Legal 5, Statement 6 , Executive 7, A3 8, A4 9 , A4Small 10, A5 11, B4 12 , B5 13, Folio 14, Quarto 15 , 10x14 16, 11x17 17, Note 18 , Envelope9 19, Envelope10 20, Envelope11 21 , Envelope12 22, Envelope14 23, Csheet 24 , Dsheet 25, Esheet 26, EnvelopeDL 27 , EnvelopeC5 28, EnvelopeC3 29, EnvelopeC4 30 , EnvelopeC6 31, EnvelopeC65 32, EnvelopeB4 33 , EnvelopeB5 34, EnvelopeB6 35, EnvelopeItaly 36 , EnvelopeMonarch 37, EnvelopePersonal 38, FanfoldUS 39 , FanfoldStdGerman 40, FanfoldLegalGerman 41, User 256
- PageStart
-
Start page number.
- UsePage
-
Use own start page number (or not).
-
Margins for left, right, top, bottom, header and footer.
- HCenter
-
Print in horizontal center (or not)
- VCenter
-
Print in vertical center (or not)
- Header
-
Content of print header. Please refer Excel Help.
-
Content of print footer. Please refer Excel Help.
- PrintGrid
-
Print with Gridlines (or not)
- PrintHeaders
-
Print with headings (or not)
- NoColor
-
Print in black-white (or not).
- Draft
-
Print in draft mode (or not).
- Notes
-
Print with notes (or not).
- LeftToRight
-
Print left to right(0) or top to down(1).
- HPageBreak
-
Array ref of horizontal page breaks.
- VPageBreak
-
Array ref of vertical page breaks.
- MergedArea
-
Array ref of merged areas. Each merged area is : [ StartRow, StartColumn, EndRow, EndColumn]
Cell
Spreadsheet::ParseExcel::Cell
Cell class has these properties:
- Value
-
Method Formatted value of that cell
- Val
-
Original Value of that cell
- Type
-
Kind of that cell ('Text', 'Numeric', 'Date')
If the Type was detected as Numeric, and the field format is defined and matches m{^[dmy][-\\/dmy]*$}, it will be set to Date.
- Code
-
Character code of that cell (undef, 'ucs2', '_native_') undef tells that cell seems to be ascii. '_native_' tells that cell seems to be 'sjis' or something like that.
- Format
-
"Format" for that cell.
- Merged
-
That cells is merged (or not).
- Rich
-
Array ref of font informations about each characters.
Each entry has : [ Start Position, Font Object]
For more information please refer sample/dmpExR.pl
Format
Spreadsheet::ParseExcel::Format
Format class has these properties:
- Font
-
"Font" object for that Format.
- AlignH
-
Horizontal Alignment.
0: (standard), 1: left, 2: center, 3: right, 4: fill , 5: justify, 7:equal_space
Notice: 6 may be merge but it seems not to work.
- AlignV
-
Vertical Alignment.
0: top, 1: vcenter, 2: bottom, 3: vjustify, 4: vequal_space
- Indent
-
Number of indent
- Wrap
-
Wrap (or not).
- Shrink
-
Display in shrinking (or not)
- Rotate
-
In Excel97, 2000 : degrees of string rotation. In Excel95 or earlier : 0: No rotation, 1: Top down, 2: 90 degrees anti-clockwise, 3: 90 clockwise
- JustLast
-
JustLast (or not). I have never seen this attribute.
- ReadDir
-
Direction for read.
- BdrStyle
-
Array ref of border styles : [Left, Right, Top, Bottom]
- BdrColor
-
Array ref of border color indexes : [Left, Right, Top, Bottom]
- BdrDiag
-
Array ref of diag border kind, style and color index : [Kind, Style, Color] Kind : 0: None, 1: Right-Down, 2:Right-Up, 3:Both
- Fill
-
Array ref of fill pattern and color indexes : [Pattern, Front Color, Back Color]
- Lock
-
Locked (or not).
- Hidden
-
Hidden (or not).
- Style
-
Style format (or Cell format)
Font
Spreadsheet::ParseExcel::Font
Format class has these properties:
- Name
-
Name of that font.
- Bold
-
Bold (or not).
- Italic
-
Italic (or not).
- Height
-
Size (height) of that font.
- Underline
-
Underline (or not).
- UnderlineStyle
-
0: None, 1: Single, 2: Double, 0x21: Single(Account), 0x22: Double(Account)
- Color
-
Color index for that font.
- Strikeout
-
Strikeout (or not).
- Super
-
0: None, 1: Upper, 2: Lower
Formatter class
Spreadsheet::ParseExcel::Fmt*
Formatter class will convert cell data.
Spreadsheet::ParseExcel includes 2 formatter classes: FmtDefault and FmtJapanese. You can create your own FmtClass as you like.
Formatter class(Spreadsheet::ParseExcel::Fmt*) should provide these functions:
- ChkType($oSelf, $iNumeric, $iFmtIdx)
-
tells type of the cell that has specified value.
- TextFmt($oSelf, $sText, $sCode)
-
converts original text into applicatable for Value.
- ValFmt($oSelf, $oCell, $oBook)
-
converts original value into applicatable for Value.
- FmtString($oSelf, $oCell, $oBook)
-
get format string for the $oCell.
KNOWN PROBLEMS
This module can not get the values of formulas in Excel files made with Spreadsheet::WriteExcel. Normaly (ie. By Excel application), formula has the result with it, but Spreadsheet::WriteExcel writes formula with no result. If you set your Excel application "Auto Calculation" off, (maybe [Tool]-[Option]-[Calculation] or something) You will see the same result.
If Excel has date fields where the specified format is equal to what happens to be the system-default for the short-date locale setting for the current user, Excel does not store the format, but uses the internal format number 14, which defaults to "m-d-yy" on every other system.
If Date fields have a format specified, and the date-separation character is the same as what happens to be the current date separation character for the user's regional setting, that character will change when the user changes the character from '/' to '-' or vice-versa. All date formats in the sheet will change. This means that if the author of the sheet used the date format 'dd-mm-yyyy', and his separation character was '-', the fields might be formatted like 'dd/mm/yyyy' on a system where the user set his separation character to '/', causing inconsistent date displays.
AUTHOR
Current maintainer: John McNamara jmcnamara@cpan.org
Maintainer 0.27-0.33: Gabor Szabo szabgab@cpan.org
http://www.szabgab.com/
Original author: Kawai Takanori (Hippo2000) kwitknr@cpan.org
http://member.nifty.ne.jp/hippo2000/ (Japanese)
http://member.nifty.ne.jp/hippo2000/index_e.htm (English)
SEE ALSO
XLHTML, OLE::Storage, Spreadsheet::WriteExcel, OLE::Storage_Lite
This module is based on herbert within OLE::Storage and XLHTML.
XLSTools: http://perl.jonallen.info/projects/xlstools
TODO
- Add tests, and more tests
- Spreadsheet::ParseExcel : Password protected data, Formulas support, HyperLink support, Named Range support
- Spreadsheet::ParseExcel::SaveParser : Catch up Spreadsheet::WriteExce feature, Create new Excel fle
See also:
L<http://www.cpanforum.com/dist/Spreadsheet-ParseExcel>
and
http://www.perlmonks.org/index.pl?node_id=490656
http://www.perlmonks.org/index.pl?node_id=379743
http://www.perlmonks.org/index.pl?node_id=433192
http://www.perlmonks.org/index.pl?node_id=422147
COPYRIGHT
Copyright (c) 2006-2008 Gabor Szabo Copyright (c) 2000-2006 Kawai Takanori All rights reserved.
You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.
ACKNOWLEDGEMENTS
First of all, I would like to acknowledge valuable program and modules: XHTML, OLE::Storage and Spreadsheet::WriteExcel.
In no particular order: Yamaji Haruna, Simamoto Takesi, Noguchi Harumi, Ikezawa Kazuhiro, Suwazono Shugo, Hirofumi Morisada, Michael Edwards, Kim Namusk, Slaven Rezić, Grant Stevens, H.Merijn Brand and many many people + Kawai Mikako.
1 POD Error
The following errors were encountered while parsing the POD:
- Around line 2508:
Non-ASCII character seen before =encoding in 'Rezić,'. Assuming UTF-8