$Id: Changes,v 1.31 2002/09/17 15:13:19 mrodrigu Exp $
CHANGES
Changes in 3.07
Fixed the way weaken is imported from Scalar::Util
Changes in 3.06
Added XML::Twig::Elt trimmed_text and related methods (trimmed_field,
first_child_trimmed_text, last_child_trimmed_text...)
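As a rough illustration of what "trimmed" text means, the sketch below assumes that trimming removes leading and trailing whitespace and collapses every internal run of whitespace into a single space. `trim_text` is a hypothetical helper that works on a plain string. It is not an XML::Twig method.

```perl
use strict;
use warnings;

# Hypothetical helper: trims a plain string the way trimmed_text is
# assumed to trim an element's text.
sub trim_text {
    my ($text) = @_;
    $text =~ s/^\s+//;     # drop leading whitespace
    $text =~ s/\s+$//;     # drop trailing whitespace
    $text =~ s/\s+/ /g;    # collapse internal whitespace runs to one space
    return $text;
}

print trim_text("  Un   homme\n  soupconne  "), "\n";   # prints "Un homme soupconne"
```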
Added XML::Twig::Elt replace_with method
Added XML::Twig::Elt cut_children method
Added XML::Twig contains_only method
Added *[att=~ /regexp/] condition type (suggested by Nikola Janceski)
Fixed a bug in the way handlers for gi, path and subpath were chained
(Thanks to Tommy Wareing)
Fixed a bug where entities caused an error on other handlers (Thanks
to Tommy Wareing)
Fixed a bug with string(sub_elt)=~ /regexp/ (thanks to Tommy Wareing)
Fixed a bug with output_filter used with expand_external_entities
(thanks to Tommy Wareing)
Fixed (yet another!) bug with whitespace handling (whitespace followed by
an entity caused the whitespace to be moved after the entity) (spotted by
the usual Tommy Wareing)
Added an error message when pasting on an undef reference (suggestion
of Tommy Wareing)
Fixed a bug in in_context (found by Tommy Wareing)
Fixed a bug when loading the DTD (local undef $/ did not stay local,
bug found and patch sent by Steve Pomeroy and Henry Cipolla)
Fixed a bug in setting output filter
Fixed a bug in using a filehandle with twig_print_outside_roots
Added safe_encode_hex filter
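Below is a minimal sketch of a hex-encoding filter. It assumes that safe_encode_hex replaces each non-ASCII character with a hexadecimal character reference (&#xHH;). The helper name `hex_char_refs` and the use of upper-case hex digits are also assumptions; XML accepts either case.

```perl
use strict;
use warnings;

# Hypothetical filter in the spirit of safe_encode_hex: every character
# outside the ASCII range becomes a hexadecimal character reference.
sub hex_char_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord($1))/ge;
    return $text;
}

print hex_char_refs("soup\x{e7}onn\x{e9}"), "\n";   # prints "soup&#xE7;onn&#xE9;"
```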
Fixed a bug in set_indent: $INDENT was not set properly (thanks
to Eric Jain)
Fixed dependencies (no check with 5.8.0, added Scalar::Util
as a possible source for weaken)
Added no_prolog option to XML::Twig::new
Tested build on Windows (thanks to Cory Trese and Josh Hawkins)
Changed in 3.05
Added _ALPHA_ SAX export methods:
XML::Twig toSAX1, toSAX2, flush_toSAX1, flush_toSAX2
XML::Twig::Elt toSAX1, toSAX2
The following gotchas apply:
- these methods work only for documents that are completely
loaded by XML::Twig (i.e. if you use twig_roots the data
outside of the roots will not be output as SAX).
- SAX1 support is a bit dodgy: the encoding is not preserved
(it is always set to 'UTF-8').
- locator is not supported (and probably will not be: what would
be the location of a newly created element?)
Also when exporting SAX you should consider setting Twig to a
mode where all aspects of the XML are treated as nodes by XML::Twig,
by setting the following options when you create the twig:
comments => 'process', pi => 'process', keep_spaces => 1
twig_print_outside_roots now supports a file handle ref as argument:
the untouched part of the tree will be output to the filehandle.
Added the 'indented_c' style that gives a slightly more compact pretty
print than 'indented': the end tags are on the same line as the
preceding text (suggestion of Hugh Myers)
Added option in get_xpath (aka find_nodes) to apply the query to
a list of elements
Added processing of conditions on the current node in get_xpath:
my @result= $elt->get_xpath( q{.[@att="val"]});
This is of course mostly useful with the previous option.
The idea stemmed from a post from Liam Quin to the perl-xml list
Added XML::Twig xml_version, set_xml_version, standalone, set_standalone
methods on the XML declaration
Fixed a bug in change_gi (which simply did not work at all), found
by Ron Hayden.
Fixed bug in space handling with CDATA (spaces before the CDATA section
were moved to within the section), comments and PI's
Fixed bug in parse_url (exit was not called at the end of the child),
found by David Kulp
Cleaned up the code that parses xpath expressions a bit (still some work
to be done on this though), fixed a bug with last, found by Roel de Cock
Fixed the SYNOPSIS (parsefile is used to parse files, spotted by
e.sammer)
Fixed a bug in pretty printing (reported by Zhu Zhou)
Fixed a bug in the install: the Makefile now uses the same perl that
was used to run Makefile.PL to run speedup and check_optional_modules
(reported by Ralf Santos)
Fixed bugs in pretty printing when using flush, trying to figure out
as well as possible if an element contains other elements or text
(there is still a gotcha, see the BUGS section in the docs)
Fixed a bug that caused the XML declaration and the DTD not to be reset
between parses
Improved the conversion functions (errors are now reported when the
function is created and not when it is first used)
Added the output_encoding option to XML::Twig->new, which allows
specifying an encoding for the output: the conversion filter is
created using Encode (perl 5.8.0), Text::Iconv or Unicode::*. The
XML declaration is also updated
#CDATA and #ENT can now be used in handler expressions
added XML::Twig::Elt remove_cdata method, which turns CDATA sections
into regular PCDATA elements
set_asis can now be used to output CDATA sections un-escaped (and without
the CDATA section markers)
Changed in 3.04
Fixed handlers for XML::Parser 2.27 so the module can pass the tests
Changed in 3.03
fixed bugs in entity handling in twig_roots mode
added the ignore_elts option, to completely skip elements
enhanced the XPath-like syntax in navigation and get_xpath
methods: added operators (>, < ...)
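The sketch below shows one way to evaluate conditions that use these operators (and the `=~` regexp form) against an element's attributes. Everything in it is hypothetical and is not XML::Twig's parser. This includes the `check_condition` helper and the plain hash standing in for the attributes. It also assumes that <, >, <= and >= compare numerically, while = and != compare strings.

```perl
use strict;
use warnings;

# Assumed semantics: = and != compare strings, the others compare numbers.
my %op = (
    '='  => sub { $_[0] eq $_[1] },
    '!=' => sub { $_[0] ne $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
);

# Hypothetical evaluator for one condition such as @id="d1", @n>3
# or @lang=~/^fr/, checked against a hash of attributes.
sub check_condition {
    my ($cond, $atts) = @_;
    # regexp match: @att=~/regexp/
    if ($cond =~ m{^\@(\w+)\s*=~\s*/(.*)/$}) {
        my ($att, $re) = ($1, $2);
        return (defined $atts->{$att} && $atts->{$att} =~ /$re/) ? 1 : 0;
    }
    # comparison: @att OP value, with a quoted or bare value
    if ($cond =~ m{^\@(\w+)\s*(!=|>=|<=|=|>|<)\s*(?:"([^"]*)"|'([^']*)'|(\S+))$}) {
        my ($att, $o) = ($1, $2);
        my $val = defined $3 ? $3 : defined $4 ? $4 : $5;
        return 0 unless defined $atts->{$att};   # missing attribute never matches
        return $op{$o}->($atts->{$att}, $val) ? 1 : 0;
    }
    die "unsupported condition: $cond\n";
}

print check_condition('@n>3', { n => 5 }), "\n";   # prints 1
```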
fixed [RT 168]: setTwigHandler failed when no handler was already set
(thanks to Jerry)
turned %valid_option into a package global so AnyData can access it
fixed a bug in sprint that prevented it from working with filters
fixed a bug in erase when erasing an empty element that was the
last child of its parent ([RT390], thanks to Julian Arnold)
copy now correctly copies the asis status of elements
fixed typos in the docs (thanks to Shlomo Yona)
added tests (for erase and entities in twig_roots mode)
Changed in 3.02
Tweaked speedup to replace constructs that did not work in
perl 5.005003
Changed in 3.01
Fixed the directory name in the tar file
Changed in 3.00
WARNING: THIS CHANGE IS NOT BACKWARD COMPATIBLE
But it is The Right Thing To Do
In normal mode (when KeepEncoding is not used) the XML data is
now stored as parsed by XML::Parser, i.e. the base entities are
expanded. The "print" methods (print, sprint and flush, plus the
new xml_string, pcdata_xml_string and att_xml_string) return the
data in XML-escaped form: & and < are escaped in PCDATA, and in
attribute values &, < and the quote (" by default) are turned into
&amp;, &lt; and &quot; (or &apos; if the quote is '). The "text"
methods (text, att and pcdata) return the stored text as is.
So if you want to output XML you should use the "print" methods
and if you want to output text you should use the "text" methods.
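Here are the escaping rules above, restated as two small Perl functions. `escape_pcdata` and `escape_att` are hypothetical names used only for illustration. In both functions, & is replaced first, so the entities produced by the later substitutions are not escaped a second time.

```perl
use strict;
use warnings;

# PCDATA: only & and < need escaping.
sub escape_pcdata {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # first, so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    return $text;
}

# Attribute values: & and <, plus whichever quote delimits the value.
sub escape_att {
    my ($value, $quote) = @_;
    $quote = '"' unless defined $quote;    # double quote by default
    $value =~ s/&/&amp;/g;
    $value =~ s/</&lt;/g;
    if ($quote eq '"') { $value =~ s/"/&quot;/g }
    else               { $value =~ s/'/&apos;/g }
    return $value;
}

print escape_pcdata(q{a < b & c}), "\n";        # prints "a &lt; b &amp; c"
print escape_att(q{say "hi" & 'bye'}), "\n";    # prints "say &quot;hi&quot; &amp; 'bye'"
```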
Note that this breaks the trick of adding tags to the
content of an element: $elt->prefix( "<b>") no longer adds a <b>
tag before an element, as $elt->print will now output "&lt;b>..."
(but you can still use it by marking those elements as 'asis').
It also fixes the annoying &apos; thingie that used to replace '
in the data.
When the KeepEncoding option is used this is not true: the data
is stored asis and base entities are kept un-escaped.
Note that KeepEncoding is a global setting: if you use several twigs,
some with KeepEncoding and some without, then you will have to manually
set the option using the set_keep_encoding method, otherwise the last
XML::Twig::new call will have set it.
In addition when the KeepEncoding option is used the start tag is
parsed using a custom function parse_start_tag, which works only
for 1-byte encodings (it is regexp-based). This method can be
overridden using the ParseStartTag (or parse_start_tag) option
when creating the twig. This function takes the original string as
input and returns the gi and the attributes (in a hash).
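Here is a sketch of a regexp-based start tag parser. It assumes the replacement function receives the raw start tag string and returns the gi followed by attribute name/value pairs, so the caller can assign them to a hash. `my_parse_start_tag` is a hypothetical name. Attribute values are returned exactly as written, with no entity decoding.

```perl
use strict;
use warnings;

# Hypothetical start-tag parser: returns the gi followed by attribute
# name/value pairs, or an empty list if the string is not a start tag.
sub my_parse_start_tag {
    my ($string) = @_;
    # tag name, then everything up to the closing > (or />)
    my ($gi, $rest) = $string =~ m{^<\s*([^\s/>]+)(.*?)\s*/?>$}s
        or return;
    my %atts;
    # name="value" or name='value', values kept exactly as written
    while ($rest =~ m{([^\s=]+)\s*=\s*(?:"([^"]*)"|'([^']*)')}g) {
        $atts{$1} = defined $2 ? $2 : $3;
    }
    return ($gi, %atts);
}

my ($gi, %atts) = my_parse_start_tag(q{<elt id="e1" class='x'>});
print "$gi: $atts{id} $atts{class}\n";   # prints "elt: e1 x"
```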
If you write a function that works for multi-byte encodings I would
very much appreciate it if you could send it back to me so I can add it
to the module and other users can benefit from it.
An additional option ExpandExternalEnts will expand external entity
references to their text (in the output, the text stored is &ent;).
Added
When handlers (twig_handlers or start_tag_handlers) are called
$_ is set to the element node, so quick hacks look better:
my $t= new XML::Twig( twig_handlers =>
{ elt => sub { print $_->att( 'id'), ": ", $_->text, "\n"; } }
);
XML::Twig dispose method which properly reclaims all the memory
used by the object (useful if you don't have WeakRef installed)
XML::Twig and XML::Twig::Elt ignore methods, which can be called
from a start_tag_handlers handler and cause the element (or the
current element if called on a twig) to be ignored by the
parsing
XML::Twig parse_start_tag option that overrides the default function
used to parse start tags when KeepEncoding is used
XML::Twig::Elt xml_string, pcdata_xml_string and att_xml_string
all return an XML-escaped string for an element (including
sub-elements and their tags but not the enclosing tags for the
element), a #PCDATA element and an attribute
XML::Twig::Elt methods tag and set_tag, equivalent respectively
to gi and set_gi
XML::Twig and XML::Twig::Elt set_keep_encoding methods can be used
to set the keep_encoding value if you use several twigs with
different keep_encoding options
Option names for XML::Twig::new are now checked (a warning is output
if the option is not a valid one)
when using pretty_print 'nice' or 'indented', keep_spaces_in is now
checked, so the elements within an element listed in keep_spaces_in
are not indented
XML::Twig::Elt insert_new_elt method that does a new and a paste
XML::Twig::Elt split_at method splits a #PCDATA element in 2
XML::Twig::Elt split method splits all the text descendants of an
element, on a regexp, wrapping text captured in brackets in the
regexp in a specified element, all elements are returned
XML::Twig::Elt mark method is similar to the split method, except
that only newly created elements (matched by the regexp) are
returned
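Below is a string-level model of the difference between split and mark. It assumes the regexp has exactly one pair of capturing brackets. The real methods work on #PCDATA descendants and return element objects. The hypothetical `split_text` below instead works on a single string and returns strings, with captured text wrapped in tags. Setting `$only_matches` models mark, which returns only the newly wrapped pieces.

```perl
use strict;
use warnings;

# Hypothetical model of split/mark on a plain string. Captured text is
# wrapped in <$wrap_gi>...</$wrap_gi>; with $only_matches set, only the
# wrapped pieces are returned.
sub split_text {
    my ($text, $re, $wrap_gi, $only_matches) = @_;
    my @pieces = split $re, $text;    # captures land at odd indices
    my @result;
    for my $i (0 .. $#pieces) {
        my $piece = $pieces[$i];
        next unless defined $piece && length $piece;    # skip empty fields
        if ($i % 2) {
            push @result, "<$wrap_gi>$piece</$wrap_gi>";  # captured text
        }
        elsif (!$only_matches) {
            push @result, $piece;                          # plain text
        }
    }
    return @result;
}

print join('|', split_text('call 555 or 666', qr/(\d+)/, 'num')), "\n";
# prints "call |<num>555</num>| or |<num>666</num>"
```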
XML::Twig::Elt get_type method returns #ELT for elements and the gi
(#PCDATA, #CDATA...) otherwise
XML::Twig::Elt is_elt returns the gi if the element is a real element
and 0 if it is #PCDATA, #CDATA...
XML::Twig::Elt contains_only_text returns 1 if the element contains no
"real" element (is_field is another name for it)
First implementation of the output_filter option which filters the
text before it is output by the print, sprint, flush and text methods
(only works for print at the moment, and still under test with various
versions of XML::Parser). Standard filters are also available
Example:
#!/bin/perl -w
use strict;
use XML::Twig;
my $t = new XML::Twig(output_filter => 'latin1');
$t->parse( \*DATA);
$t->print;
__DATA__
<?xml version="1.0" encoding="ISO-8859-1"?>
<docé té="valué">Un homme soupçonné d'être impliqué dans la mort
d'un motard de la police, renversé</docé>
The 'latin1', 'html' and 'safe' filters are predefined, you can also
build additional filters using Iconv (requires Text::Iconv) and
Unicode::String (requires Unicode::String and Unicode::Map8):
my $conv = XML::Twig::iconv_convert( 'latin1');
my $t = new XML::Twig(output_filter => $conv);
my $conv = XML::Twig::unicode_convert( 'latin1');
my $t = new XML::Twig(output_filter => $conv);
warning: conversions work fine with XML::Parser 2.27 but sometimes fail
with XML::Parser 2.30 (on Perl 5.6.1, Linux 2.4 on a PC) when using
'latin1' without Text::Iconv or Unicode::String and Unicode::Map8
installed.
The input_filter option works the same way, except the text is
converted before it is stored in the twig (so you can use regexp in
your native encoding for example)
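Here is a sketch of a conversion filter built with Encode. It assumes that a filter is simply a code reference that receives a string and returns the converted string. The iconv_convert and unicode_convert examples above return filters that are passed the same way, but the exact calling convention is an assumption here. `$to_latin1` is a hypothetical name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical filter: converts a character string to ISO-8859-1 bytes.
# Characters missing from Latin-1 are replaced by '?', Encode's default
# substitution when no CHECK argument is given.
my $to_latin1 = sub {
    my ($text) = @_;
    return encode('iso-8859-1', $text);
};

my $bytes = $to_latin1->("caf\x{e9} 10\x{20ac}");
print length($bytes), "\n";   # prints 8: one byte per character, the euro sign becomes '?'
```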
the XML::Twig::Elt set_asis method sets a property of an element that
causes it to be output asis (without XML-escaping < " and &) so you
can still create tagged text
the XML::Twig::Elt prefix and suffix methods accept an optional
'asis' argument that causes the prefix or suffix to get the asis
property (so you can do $elt->prefix( '<b>foo</b>', 'asis') for
example)
the XML::Twig and XML::Twig::Elt find_nodes methods are aliases
to the get_xpath method (this is the name used in XML::XPath)
the XML::Twig parseurl and safe_parseurl methods parse a document
whose url is given
XML::Twig::Elt extra_data, set_extra_data and append_extra_data to
access the... extra data (PI's and comments) attached to an element
XML::Twig method parser returns the XML::Parser::Expat object used
by the twig
Most XML::Parser::Expat methods are now inherited by XML::Twig
objects
XML::Twig::Elt descendant_or_self method that returns the element
and its descendants
Fixed
element (and attribute) names can now include '.'
get_xpath now works for root based XPath expressions ('/doc/elt')
get_xpath now works for regexps (including regexps on attribute values)
you can now properly restore pretty_print and empty_tag_style values
speedup (at install) now checks the Perl version and uses qr or ""
so XML::Twig works in 5.004
XML::Twig::Elt wrap_in now allows wrapping the root element
various bugs in the DOCTYPE and DTD output with XML::Parser 2.30
the tests to fix a bug when working with XML::Parser 2.27
the tests to fix a bug preventing test2 from passing under windows
_default_ handlers now work (thanks Zoogie)
the text method now returns the XML base entities (<>&'") un-escaped
(thanks to Hakan Kallberg's persistence to ask for it ;--)
pretty_print works better for elements without content
end_tag_handlers now work properly (thanks to Phil Glanville for the
patch).
Enhanced
Attributes whose names start with # are not output by the print
methods, and thus can be used to store private data on elements
WeakRef is used if installed, so no more memory leaks
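Weak references stop the leaks because of how links between elements work. When an element points to its parent and the parent points back to its children, the two form a reference cycle that Perl's reference counting never frees. The sketch below applies `weaken` from Scalar::Util to a hypothetical two-node `Node` class (not XML::Twig::Elt). It counts how many nodes are destroyed when they go out of scope.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $freed = 0;    # incremented by each destroyed node

{
    package Node;    # hypothetical node class, for illustration only
    sub new     { my ($class, %args) = @_; return bless { %args }, $class }
    sub DESTROY { $freed++ }
}

{
    my $parent = Node->new(gi => 'doc');
    my $child  = Node->new(gi => 'elt', parent => $parent);
    $parent->{first_child} = $child;    # parent -> child: strong reference
    weaken($child->{parent});           # child -> parent: weak, breaks the cycle
}

print "nodes freed: $freed\n";   # prints "nodes freed: 2"
```

Without the `weaken` call, neither node would be freed when the block ends.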
Sped-up print and flush by creating the _print and _flush methods
which do not check for file handle and pretty print options
The doc has been enhanced and somewhat restructured. All options are
now written as this_is_an_option although the legacy form thisIsAnOption
can still be used. Links now display properly in the text form (thanks to
Dominic Mitchell for spotting this and sending a patch)
Navigation functions (including descendants) now allow not only a gi
to be used as filter, but also the '#ELT' token, to filter only "real"
elements (as opposed to #PCDATA, #CDATA, #PI, #COMMENT, #ENT), the
'#TEXT' token, to filter only text (PCDATA and CDATA elements),
regular expressions (built with qr//) applied to the elements' gi's,
code references, the code is passed the element as argument, and a
subset of XPath.
Functions that can use this token are: children, first_child, last_child,
prev_sibling, last_sibling, next_elt, last_elt, descendants, get_xpath,
child, sibling, sibling_text, prev_siblings, next_siblings, field,
first_child_text
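Here is a sketch of how the filter kinds listed above could be dispatched. Nodes are modelled as plain hashes with a `gi` key, and `node_matches` is a hypothetical helper. The XPath subset is not handled.

```perl
use strict;
use warnings;

# Hypothetical filter dispatch: no filter, code ref, qr// regexp,
# '#ELT', '#TEXT' or a plain gi.
sub node_matches {
    my ($node, $filter) = @_;
    return 1 unless defined $filter;                              # no filter: all nodes match
    my $gi = $node->{gi};
    return $filter->($node) ? 1 : 0 if ref $filter eq 'CODE';     # code ref gets the node
    return $gi =~ $filter ? 1 : 0   if ref $filter eq 'Regexp';   # qr// applied to the gi
    return $gi !~ /^#/ ? 1 : 0      if $filter eq '#ELT';         # "real" elements only
    return ($gi eq '#PCDATA' || $gi eq '#CDATA') ? 1 : 0
        if $filter eq '#TEXT';                                    # text nodes only
    return $gi eq $filter ? 1 : 0;                                # plain gi
}

my @nodes = map { +{ gi => $_ } } qw(title #PCDATA para #CDATA #COMMENT);
print join(',', map { $_->{gi} } grep { node_matches($_, '#ELT') } @nodes), "\n";
# prints "title,para"
```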
The paste method now accepts a 'within' position, which inserts the
element at the $offset argument (a 3rd, required, argument) in the
reference element or in its first text child
The XML::Twig::Elt insert method now accepts attributes (hashrefs)
applied to the element(s) being inserted:
$elt->insert( e1 => { a => 'v'}, e2 => e3 => { a1 =>'v1', a2 => 'v2'});
The XML::Twig::erase method now outputs a meaningful error message if
applied to the root (or a cut element)
Optimizations for better performance (in the end performance is about
the same as or a little worse than XML::Twig 2.02, but the module is much
more powerful)
Known bugs:
The DTD interface is completely broken, and I have little hope of
fixing it considering I have to deal with 2 incompatible versions of
XML::Parser. Plus no one seems to be using it...
Some XPath/Navigation expressions using " or ' in the text()="" part
of the expression will cause a fatal error
Note that this version works better with (but doesn't require)
WeakRef (Perl version 5.6.0 and above), and with Text::Iconv for all
its encoding conversions.