NAME
OpenOffice::OODoc::File - I/O operations with OpenDocument files
DESCRIPTION
The explicit use of this module is generally required only in programs which need to do some raw data import or export operations in OpenDocument files. This module can be used as a special wrapper of Archve::Zip.
However, a look at the constructor and the save() methods is recommended, in order to get a better knowledge of the file interface which is used by the document-oriented modules (i.e. OODoc::XPath and its derivatives).
If the program is only concerned with a single XML element from the file, it is unnecessary to create an OODoc::File explicitly. Just build an OODoc::XPath object with a filename as parameter. The XPath will create a "private" File itself and use it to access the data invisibly.
Please note that OODoc::File is able to handle all standard zip archives and not just OpenOffice.org files. Filenames do not have to have a ".sx?" extension. It allows access by other modules, if required, to XML/OpenOffice data in compressed archives which are not necessarily in OpenOffice.org format.
Methods
Constructor : OpenOffice::OODoc::File->new(filename|handle)
Short Forms (equivalent):
odfFile(), odfContainer(), or odfPackage()
Returns an instance of OODoc::File
if
the argument corresponds to a
valid and accessible zip archive. The argument may be either a
file path/name or an
open
(in
binmode
) and seekable IO::File.
The file or file handle used here is used in order to initialize the
ODF Connector
for
subsequent processing through the other submodules
of the OpenOffice::OODoc API. It becomes the
default
target file
if
the data are updated and the changes committed using save().
If the OODoc::File object is loaded
for
update, it's strongly
write
. However,
if
the source is a file name, the same file name may
be safely used as the source and the target (of course and as usual,
any sensitive document must be backed up
before
processing).
If the
"create"
option is set (see below), a file path/name may be
provided, but a file handle must be avoided.
See the explanations about the save() method below.
Examples:
my
$container1
= odfContainer(
"doc1.odt"
);
my
$fh
= new IO::File
"< doc2.odt"
;
binmode
$fh
;
my
$container2
= odfContainer(
$fh
);
An optional argument can be passed in hash
format
(
key
=> value),
after
the filename. Like this:
work_dir
=>
"path"
which designates the path to the XML working files also generated
during a save
for
this object (
each
OpenOffice::OODoc::File object
can have its own working directory); without this option, the
working directory is set according to the content of the class
variable
$OpenOffice::OODoc::File::WORKING_DIRECTORY
.
Note:
no
content checking is carried out. The archive can be opened
whether or not it is an OpenOffice.org document.
It's possible to create an OpenOffice::OODoc::File object without
providing an existing OpenOffice.org file. To
do
so, there is a
special option:
create
=>
"class"
where
"class"
is the document class according to the OpenOffice.org
terminology, so it is one of the following
values
:
"text"
,
spreadsheet
", "
presentation
", "
drawing". These
values
are the same
as the legal parameters of contentClass() in OpenOffice::OODoc::XPath.
If
"create"
is provided, the first argument must not be a file handle,
knowing that the target (
if
the data is later saved) can't be an
open
file, and that there is by definition
no
source file. Note that,
for
a creation, the value of the file argument is ignored so it may be
undef
or an empty string; however a valid path/name may be provided in
order to provide a
default
target
for
a subsequent save().
For a very advanced
use
, it's possible to combine
"create"
with
an
additional option
template_path
=>
"path"
to generate the new file from special, user-provided ODF templates
instead of those included in the installation. If this option is
not provided, the general template path (possibly changed
with
the templatePath() function) is used. The template path must
indicate a directory which contains a set of ODF files whose
names are
"template.???"
, where
"???"
represents the conventional
ODF and OOo suffixes, knowing that the supported suffixes are
presently odt, ods, odp, odg, sxw, sxc, sxi, sxd. Of course, some
applications don't need all the templates;
for
example, the
"template.sx?"
files should be omitted
if
the user don't want to
When the
"create"
option is used, it's possible to provide an
"opendocument"
option in order to
override
the installation-level
default
file
format
for
new documents. If this option is set to
"1"
,
"on"
or
"true"
, the new document will comply to the OASIS
OpenDocument
format
;
if
it's set to
"0"
,
"off"
or
"false"
, the new
document will be created according to the OpenOffice.org 1.0
format
.
The
"opendocument"
option is ignored without the
"create"
one (this
tool is not a
format
converter
for
existing documents). Remember that
the
default
format
is set to OpenDocument in the CPAN distribution
and the OpenOffice.org 1.0
format
is deprecated.
The returned object of new(),
if
successful, is a valid File object,
whose methods listed below become available.
If unsuccessful (generally due to non-existent file or invalid zip
archive or even a corrupt zip archive), the constructor returns a
null value (
undef
), and an error message is sent to the standard
output.
extract(<member>)
Returns the decompressed content of the requested member,
if
contained in the archive and corresponds to an XML element of the
currently active OpenOffice.org file instance. The <member>
parameter must therefore correspond to one of the members of the
file (see Introduction). If the application uses any of the words
"content"
,
"meta"
,
"styles"
or
"settings"
, in upper or lower case,
the .xml extension is automatically added but any other names are
accepted without change
if
they are indeed existing members of the
archive.
The following statements are equivalent:
my
$content
=
$archive
->extract(
'META'
);
my
$content
=
$archive
->extract(
'meta.xml'
);
my
$content
=
$archive
->extract('meta);
After the above calls, the variable
$content
contains the XML
document which represents the metadata of an OpenOffice.org file.
This content can be used,
for
example, to instance a Meta object.
Note: in most
"normal"
cases, this method does not have to be called
explicitly as it is called silently by
each
occurrence of XPath
(therefore by Text and Meta which are derivatives of it), but only
if
XPath is constructed referencing an OODoc::File object as a
parameter (see OODoc::XPath). An extract call is only useful
when
exporting the XML or handling it outside of OODoc::XPath.
On error (e.g. unknown archive member), a null value is returned and
an error message is produced.
link(<XPath_object>)
Connects a File object to an XPath object
given
as an argument. This
connection
has
two output products:
- immediately calls the extract method using the corresponding
"specialist"
component of the XPath object (metadata
if
OODoc::Meta, content
if
OODoc::Text, etc).
- stores the
link
for
later updates to all OpenOffice.org file
members which may have been modified by XPath objects (in case a
save is called, see below).
Note: This method is used by OODoc::XPath to
connect
as
"clients"
to
OODoc::File objects. It does not have to be called directly by
raw_delete(member)
Orders the deletion of any OpenOffice.org file member.
Example:
$archive
->raw_delete(
"Pictures/100000AEFGH.jpg"
);
deletes the physical content of an image loaded in the file.
It is entirely up to the application to ensure that such a deletion
does not compromise the integrity of the file as
no
dependency
checking is carried out here. In the above example, the
delete
operation could be particularly justified
if
the
"image"
member
which referenced this content had been (or was going to be)
otherwise removed, or
if
it had been replaced by an external
reference.
This method can be used to remove any XML or non-XML member. It can
be combined
with
raw_import() in order to effect a raw replacement
of content without interpretation. Caution: this method should not
be used
for
an XML member (content, style, meta, etc.) which is
currently
"active"
(i.e. linked to an active OODoc::XPath instance),
unless
the member
has
been loaded as
"read only"
(search in the
OpenOffice::OODoc::XPath
for
the
"read_only"
option).
Note: calls to this method only prepare the deletion, which is
actually carried out by the save() method
if
it occurs
before
the
end of the program. If save() is called
with
a filename which is
different from the source filename, the source file remains
unchanged and the deleted member is simply not transferred to the
target file.
raw_export(member [, destination])
Decompresses and exports the physical content of a
given
member (XML
or non-XML) of an archive. If the second argument is used, it passes
the destination filename (perhaps
with
access path). If not, the
file is exported using its internal archive name. Examples:
$archive
->raw_export(
"styles.xml"
);
exports the
"styles.xml"
member into a file of the same name in the
current directory.
$archive
->raw_export(
"styles.xml"
,
"/tmp/my_style.xml"
);
exports the same XML member to a
given
path.
raw_export executes immediately (and is not deferred like
raw_import).
If successful, the returned value is the filename of the exported
file.
raw_import(member [, source])
Creates or replaces the indicated member by importing an external
source file. If the second argument is omitted, the source file is
taken to have the same access path as the internal member.
Example:
$arch1
->raw_export(
"styles.xml"
,
"/tmp/styles.xml"
);
$arch2
->raw_import(
"styles.xml"
,
"/tmp/styles.xml"
);
or, in more compact form:
$arch2
->raw_import
(
"styles.xml"
,
$arch1
->raw_export(
"styles.xml"
)
);
The above sequence requests the
import
of the member
"styles.xml"
from an archive called
$arch1
into
$arch2
(a direct means of using
the styles and page layout of one document as a template
for
another).
The imported files can be any type and have any content. This
"raw"
method treats an OpenOffice.org file as any other zip archive. It
notably allows the
import
of non-XML members (images, sounds,
programs, etc) which the application deals
with
(and which can be
ignored by the office application).
Caution: the
import
is only completed
when
a save() method is called
by the importing object. It can only succeed
if
the source file is
available at that very moment. A raw_import method can be called
before
the imported file is available (
no
check of availability is
made). An error will be caused
if
the file is absent at the
time
of
the save() call. If several raw_import statements are run against the
same filename, there will actually be a corresponding number of
copies of the file in its final state which are imported at the
moment of the save() call, even
if
it had perhaps been modified in the
meantime (probably not a very useful outcome).
save([filename|handle])
Commits all the changes made in the members of the archive and makes
the resulting content persistent.
In addition, the external, non-XML resources (i.e. image files) that
have previously been targeted by
import
methods,
if
any, are
physically imported
when
save() is called (and not
before
). If these
resources are not available at this
time
, they are ignored and a
warning is issued
for
each
missing file.
Example:
$archive
->save(
"target.odt"
);
The target may be a file name (
with
optional path) or an IO::File
object (that must be
open
, in
binmode
and writable).
Without argument, save(), attempts to
write
the on the file or handle
that was specified as the source file at the creation
time
,
if
any.
the same IO::File
for
writing the output is presently not recommended;
in some situations, bi-directional I/Os against an application-provided
open
IO::File may result in unpredictable results
with
the current
implementation. Of course,
if
the OODoc::File object
has
been
initialized
with
the
"create"
option and
with
an
undef
value or an
empty string as file name, save() requires an explicit target (a valid
file/path or an
open
and writable IO::File).
Please note that OODoc::File does not check the content, and the save()
method can be used to force through any data which may produce a
file not compliant
with
the ODF packaging specification.
The filename argument is optional. If it is omitted, the source file
previously supplied by the constructor call (
if
any) is updated.
Without a filename argument, save()
if
the source document was got
through a IO::Handle.
Even though the life of an OODoc::File object does not necessarily
end
with
a save, it is recommended that you avoid repeated
alternation between save and extract (the object's behaviour in this
situation
has
not been tested). Normally it is preferable to call a
save once and
for
all at the end of a series of updates.
Only a call to OODoc::File's save() method saves content, metadate and
presentation changes made by other OODoc components to the
OpenOffice.org file, including raw imports of external data
(raw_import). However, the XML members currently associated
with
"read only"
OODoc::XPath objects are not changed in the file.
No file is created or modified
before
this method is
called,
with
the exception of external files created by raw_export.
Nevertheless File's save can be called automatically and silently by
an OODoc::XPath object but only where it
has
been called as a
parameter explicitly
for
this purpose (see the chapter on
OODoc::XPath).
All XPath objects which are
"connected"
to a File object by
link
must be present at the
time
of the save call. If one of these
objects
has
meanwhile been deleted, the consequences are
unpredictable and, in any case, any document updates it could have
made are lost.
templatePath([path])
Class function (not to be used as a method).
Can be replaced by odfTemplatePath(), see OODoc man page.
Accessor to get/set the path
for
a user-
defined
set of ODF templates,
to be used in case of new document creation. This path is empty by
default
. Without an explicit template path, the
default
XML templates
provided
with
the OpenOffice::OODoc distribution are automatically
selected.
The template path must designate a directory containing 4 regular
ODF files,
each
one corresponding to a supported ODF document class,
i.e.
"template.odt"
,
"template.ods"
,
"template.odp"
,
"template.odg"
.
However, the user don't need to provide templates that will not be
used.
Note that it's always possible to
override
the
default
template path
when
a new document is created, thanks to the
'template_path'
option.
Properties
No class variables are exported.
The class variable
$OpenOffice::OODoc::File::WORKING_DIRECTORY
indicates the directory to be used
for
temporary files (used but the
save() method)
when
no
object-specific path is provided through the
'work_dir'
option. By
default
, the working directory is the current
directory (
'.'
).
The
$OpenOffice::OODoc::File::TEMPLATE_PATH
variable, empty by
default
, can contain an alternative path
for
document generation
template files; it can be set
with
the templatePath() function.
The
$OpenOffice::OODoc::File::
$DEFAULT_OFFICE_FORMAT
variable,
whose
default
is 2, controls the
default
format
for
newly created
files (
when
the
format
is not explicitly selected by the application).
Allowed
values
are
"1"
for
OpenOffice.org 1.0 and
"2"
for
OASIS OpenDocument. In a regular installation, this variable is
automatically set according to the <File-DEFAULT_OFFICE_FORMAT>
element of the config.xml file (see INSTALL).
Instance hash variables are:
'linked'
=> list of connected OODoc::XPath instances
'members'
=> list of file member (*.xml and others)
'raw_members'
=> list of
import
files
'temporary_files'
=> created temporary files.
Where
$f
is a
given
instance of OODoc::File, the table
@{
$f
->{
'linked'
}}
is a list of OODoc::XPath objects which were connected to
$f
by the
link
method and
@{
$f
->{
'members'
}}
is a list of members found in the archive
when
new is called.
These variables can be
read
at any
time
even though they were
normally designed to be used internally by OODoc::File. Unless you
are just finding out exactly what they
do
, it is dangerous to modify
them. Applications
do
not normally need to access them.
AUTHOR/COPYRIGHT
Developer/Maintainer: Jean-Marie Gouarne http://jean.marie.gouarne.online.fr
Contact: jmgdoc@cpan.org
Copyright 2004-2008 by Genicorp, S.A. http://www.genicorp.com
Initial English version of the reference manual by Graeme A. Hunter (graeme.hunter@zen.co.uk).
License: GNU Lesser General Public License v2.1