$Id: Changes,v 1.25 2004/01/31 09:01:51 mrodrigu Exp $
CHANGES
Changes in 3.13
Maintenance release to get the tests to pass on various platforms
updated the README
fixed a problem with encoding conversions (using safe_encode and
safe_encode_hex) under perl 5.8.0, see RT ticket #5111
fixed tests to pass when trying to use an unsupported iconv filter
Changes in 3.12
New features and greatly increased test coverage
added lots of tests (>900), thanks to David Rigaudiere, Forrest
Cahoon, Sebastien Aperghis-Tramoni, Henrik Tougaard and Sam Tregar
for testing this release on various OSs, Perl, XML::Parser and
expat versions.
added XML::Twig::XPath that uses XML::XPath as the XPath engine
for findnodes, findnodes_as_string, findvalue, exists, find and
matches. Just use XML::Twig::XPath instead of use XML::Twig;
(see the tests in t/xmlxpath_*).
Added special case to output some HTML tags ('script' to start with)
as not empty.
XML::Twig::Elt->new now properly flags empty elements (spotted by
Dave Roe)
added XML::Twig::Elt contains_a_single method
added #ENT twig_handlers (not necessarily complete, so not yet
documented, needs more tests)
added doc for XML::Twig and XML::Twig::Elt subs_text methods
tags starting with # are now "invisible" (they are not output),
useful for example for pretty_printing
added new options --wrap '' and --date to xml_grep
improved XPath support (added [nb] support)
added xpath method, which generates a unique XPath for an element
added has_child and has_children as synonyms of first_child
added XML::Twig::set_id_seed to control how generated id's are
created
when using ignore on an element, end_tag_handlers are now tested
at the end of the element (so you can for exemple get the byte
offset in the document), suggestion of Philippe Verdret
added XML::Twig::Elt change_att_name
XML::Twig::Elt new now properly works when called as an object
(and not a class) method
fixed namespace processing somewhat
fixed SAX output methods
fixed bug when keep_atts_order on and using set_att on an element
with no existing attribute (spotted by scharloi)
WARNING - potentially incompatible changes -
when using finish_print, the document used to be flushed. This is no
longer the case, you will have to do it before calling finish_print.
This way you have the choice of doing it or not.
Removed XML::Twig::Elt::unescape function (was no longer used)
Changes in 3.11
added --text_only option to xml_grep (outputs the text of the
result, without tags)
fixed bug where "Comments [was] always dropped after a twig
object set 'comments' to 'drop'" (RT#3711), bug report and first
patch by Simon Flack
by popular demand, added option "keep_atts_order" that keeps the
original attribute order in the output. This option needs the
Tie::IxHash module to work.
Changes in 3.10
added xml_pp xml_grep and xml_spellcheck to the distribution
the print method now calls 'print $elt->sprint' instead of printing
content as it converts them to text, in order to reduce the number
of calls to Perl's print (which should increase performance)
changed XML::Twig::Elt erase to allow erasing the root element
if it has only 1 child
added findvalue method to XML::Twig and XML::Twig::Elt
aliased findnodes to get_xpath in XML::Twig and XML::Twig::Elt
added the elt_class option to XML::Twig::new
added the do_not_chain_handlers option to XML::Twig::new
added the XML::Twig::Elt is_first_child and is_last_child methods
set_gi,set_text, prefix, suffix, set_att, set_atts, del_atts, del_att
now return the element for easier chains
Fixed a bug in pretty printing comments before elements (RT #2315)
Added the XML::Twig::Elt children_copy method which returns a list
of elements that are copies of the children of the element
Fixed a bug in wrap_in when the element wrapped is not attached to a tree
Fixed a bug with get_xpath: regexp modifiers were not taken into account
spotted by Eric Safern (RT #2284)
Fixed a bug in methods inherited from XML::Parser::Expat (arguments
were not properly passed)
Installed local empty SIG handlers to trap error messages triggered
by require for optional modules, so that user signal handlers would
not have to deal with them (suggestion from Philippe Verdret)
Fixed a bug in the navigation XPath engine: text() was used instead of
string(). Both are now allowed.
Added XML::Twig::Elt sort_children, sort_children_on_value,
sort_children_on_att and sort_children_on_field methods that sort the
children of an element in place
Added XML::Twig::Elt field_to_att and att_to_field methods
Fixed a memory leak due to ids not being weak references
Added the XML::Twig::Elt wrap_children method that wraps children
of an element that satisfy a regexp in a new element
Added the XML::Twig::Elt add_id method that adds an id to an element
Added the XML::Twig::Elt strip_add method that deletes an attribute
from an element and its descendants
Fixed a quasi-bug in set_att where the hash passed in reference was
used directly, which makes it a problem when the same reference is
passed several times: all the elements share the same attributes.
This is a potentially incompatible change for code that relied on
this feature. Please report problems to the author.
fixed a bug in set_id
fixed a bug spotted by Bill Gunter: allowed _ as the initial character
for XML names. Also now allow ':' as the first element
added the simplify methods, which load a twig into an XML::Simple like
data structure
fixed a bug in get_type and is_elt, spotted and fixed by Paul Stodghill
added the XML::Twig::Elt ancestors_or_self method
fixed bug when doc root is also a twig_root (twig was not built)
updated the README (fleshed out examples, added OS X to the list of
tested platforms)
fixed a bug when using the no_dtd_output option
added doc for the XML::Twig::Elt children_count method
added the XML::Twig::Elt children_text method
updated the doc so it can be properly formated by my custom pod2html,
the generated doc (with a bigger ToC and better links) is available
from the XML::Twig page at http://xmltwig.com/xmltwig/
Changes in 3.09
added XML::Twig::Elt xml_text method
fixed several bugs in the split method under 5.8.0 when matching a utf8
character (thanks to Dominic Mitchell who spotted them)
cleaned-up the pod (still in progress)
added the XML::Twig::Elt pos method that gives the position of
an element in its parent's child list
re-introduced parseurl (thanks to Denis Kolokol for spotting its
absence in this version)
Fixed a bug: ent_tag_handlers were not called on the root (thanks
to Philippe Verdret
Improved #PI (also declared as '?') and #COMMENT handler support
added check on reference type (must be XML::Twig::Elt) in
XML::Twig::Elt::paste (patch by Forrest Cahoon)
Changes in 3.08
The previous fix wasn't enough :--(
Changes in 3.07
Fixed the way weaken is imported from Scalar::Util
Changes in 3.06
Added XML::Twig::Elt trimmed_text and related methods (trimmed_field,
first_child_trimmed_text, last_child_trimmed_text...)
Added XML::Twig::Elt replace_with method
Added XML::Twig::Elt cut_children method
Added XML::Twig contains_only method
Added *[att=~ /regexp/] condition type (suggested by Nikola Janceski)
Fixed a bug in the way handlers for gi, path and subpath were chained
(Thanks to Tommy Wareing)
Fixed a bug where entities caused an error on other handlers (Thanks
to Tommy Wareing)
Fixed a bug with string(sub_elt)=~ /regexp/ (thanks to Tommy Wareing)
Fixed a bug with output_filter used with expand_external_entities
(thanks to Tommy Wareing)
Fixed (yet another!) bug with whitespace handling (whitespace, then an
entity made the whitespace move after the entity) (spotted by the usual
Tommy Wareing)
Added an error message when pasting on an undef reference (suggestion
of Tommy Wareing)
Fixed a bug in in_context (found by Tommy Wareing)
Fixed a bug when loading the DTD (local undef $/ did not stay local,
bug found and patch sent by Steve Pomeroy and Henry Cipolla)
Fixed a bug in setting output filter
Fixed a bug in using a filehandle with twig_print_outside_roots
Added safe_encode_hex filter
fixed bug in set_indent, $INDENT not set properly (thanks
to Eric Jain)
fixed dependencies (no check with 5.8.0, added Scalar::Util
as a possible source for weaken)
Added no_prolog option to XML:Twig::new
Tested build on Windows (thanks to Cory Trese and Josh Hawkins)
Changed in 3.05
Added _ALPHA_ SAX export methods:
XML::Twig toSAX1, toSAX2, flush_toSAX1, flush_toSAX2
XML::Twig::Elt toSAX1, toSAX2
The following gotchas apply:
- these methods work only for documents that are completely
loaded by XML::Twig (ie if you use twig_roots the data
outside of the roots will not be output as SAX).
- SAX1 support is a bit dodgy: the encoding is not preserved
(it is always set to 'UTF-8'),
- locator is not supported (and probably will not, what's the
location of a newly created element?)
Also when exporting SAX you should consider setting Twig to a
mode where all aspects of the XML are treated as nodes by XML::Twig,
by setting the following options when you create the twig:
comments => 'process', pi => 'process', keep_spaces => 1
twig_print_outside_roots now supports a file handle ref as argument:
the untouched part of the tree will be output to the filehandle:
Added the 'indented_c' style that gives a slightly more compact pretty
print than 'indented': the end tags are on the same line as the
preceeding text (suggestion of Hugh Myers)
Added option in get_xpath (aka find_nodes) to apply the query to
a list of elements
Added processing of conditions on the current node in get_xpath:
my @result= get_xpath( q{.[@att="val"]});
This is of course mostly useful with the previous option.
The idea stemmed from a post from Liam Quin to the perl-xml list
Added XML::Twig xml_version, set_xml_version, standalone, set_standalone
methods on the XML declaration
Fixed a bug in change_gi (which simply did not work at all), found
by Ron Hayden.
Fixed bug in space handling with CDATA (spaces before the CDATA section
were moved to within the section), comments and PI's
Fixed bug in parse_url (exit was not called at the end of the child),
found by David Kulp
Cleanup a bit the code that parses xpath expressions (still some work
to be done on this though), fixed a bug with last, found by Roel de Cock
Fixed the SYNOPSIS (parsefile is used to parse files, spotted by
e.sammer)
Fixed a bug in pretty printing (reported by Zhu Zhou)
Fixed a bugin the install: the Makefile now uses the same perl used
to perl Makefile.PL to run speedup and check_optional_modules
(reported by Ralf Santos)
Fixed bugs in pretty printing when using flush, trying to figure out
as well as possible if an element contains other elements or text
(there is still a gotcha, see the BUGS section in the docs)
Fixed a bug that caused the XML declaration and the DTD not to be reset
between parses
Improved the conversion functions (errors are now reported when the
function is created and not when it is first used)
Added the output_encoding option to XML::Twig->new, which allows
specifying an encoding for the output: the conversion filter is
created using Encode (perl 5.8.0) Text::Iconv or Unicode::* The
XML declaration is also updated
#CDATA and #ENT can now be used in handler expressions
added XML::Twig::Elt remove_cdata method, which turns CDATA sections
into regular PCDATA elements
set_asis can now be used to output CDATA sections un-escaped (and without
the CDATA section markers)
Changed in 3.04
Fixed handlers for XML::Parser 2.27 so the module can pass the tests
Changed in 3.03
fixed bugs in entity handling in twig_roots mode
added the ignore_elts option, to skip completely elements
enhanced the XPath-like syntax in navigation and get_xpath
methods: added operators (>, < ...)
fixed [RT 168]: setTwigHandler failed when no handler was already set
(thanks to Jerry)
turned %valid_option into a package global so AnyData can access it
fixed a bug in sprint that prevented it from working with filters
fixed a bug in erase when erasing an empty element that was the
last child of its parent ([RT390], thanks to Julian Arnold)
copy now correctly copies the asis status of elements
fixed typos on the docs (thanks to Shlomo Yona)
added tests (for erase and entities in twig_roots mode)
Changed in 3.02
Tweaked speedup to replace constructs that did not work in
perl 5.005003
Changed in 3.01
Fixed the directory name in the tar file
Changed in 3.00
WARNING: THIS CHANGE IS NOT BACKWARD COMPATIBLE
But it is The Right Thing To Do
In normal mode (when KeepEncoding is not used) the XML data is
now stored as parsed by XML::Parser, ie the base entities are
expanded. The "print" methods (print, sprint and flush, plus the
new xml_string, pcdata_xml_string and att_xml_string) return the
data in XML-escaped form: & and < are escaped in PCDATA and
&, < and the quote (" by default) are turned to & < and
" (or ' if the quote is '). The "text" methods (text,
att and pcdata) return the stored text as is.
So if you want to output XML you should use the "print" methods
and if you want to output text you should use the "text" methods.
Note that this breaks the trick consisting in adding tags to the
content of an element: $elt->prefix( "<b>") no longer adds a <b>
tag before an element. $elt->print will now output "<b>...".
(but you can still use it by marking those elements as 'asis').
It also fixes the annoying ' thingie that used to replace '
in the data.
When the KeepEncoding option is used this is not true, the data
is stored asis, base entities are kept un-escaped.
Note that KeepEncoding is a global setting, if you use several twigs,
some with KeepEncoding and some without then you will have to manually
set the option using the set_keep_encoding method, otherwise the last
XML::Twig::new call will have set it
In addition when the KeepEncoding option is used the start tag is
parsed using a custom function parse_start_tag, which works only
for 1-byte encodings (it is regexp-based). This method can be
overridden using the ParseStartTag (or parse_start_tag) option
when creating the twig. This function takes the original string as
input and returns the gi and the attributes (in a hash).
If you write a function that works for multi-byte encodings I would
very much appreciate if you could send it back to me so I can add it
to the module, so other users can benefit from it.
An additional option ExpansExternalEnts will expand external entity
references to their text (in the output, the text stored is &ent;).
Added
When handlers (twig_handlers or start_tag_handlers) are called
$_ is set to the element node, so quick hacks look better:
my $t= new XML::Twig( twig_handlers =>
{ elt => sub { print $_->att( 'id'), ": ", $_->text, "\n"; } }
);
XML::Twig dispose method which properly reclaims all the memory
used by the object (useful if you don't have WeakRef installed)
XML::Twig and XML::Twig::Elt ignore methods, which can be called
from a start_tag_handlers handler and cause the element (or the
current element if called on a twig) to be ignored by the
parsing
XML::Twig parse_start_tag option that overrides the default function
used to parse start tags when KeepEncoding is used
XML::Twig::Elt xml_string, pcdata_xml_string and att_xml_string
all return an XML-escaped string for an element (including
sub-elements and their tags but not the enclosing tags for the
element), a #PCDATA element and an attribute
XML::Twig::Elt methods tag and set_tag, equivalent respectively
to gi and set_gi
XML::Twig and XML::Twig::Elt set_keep_encoding methods can be used
to set the keep_encoding value if you use several twigs with
different keep_encoding options
Option names for XML::Twig::new are now checked (a warning is output
if the option is not a valid one);
when using pretty_print nice or indented keep_spaces_in is now checked
so the elements within an element listed in keep_spaces_in are not
indented
XML::Twig::Elt insert_new_elt method that does a new and a paste
XML::Twig::Elt split_at method splits a #PCDATA element in 2
XML::Twig::Elt split method splits all the text descendants of an
element, on a regep, wrapping text captured in brackets in the
regexp in a specified element, all elements are returned
XML::Twig::Elt mark method is similar to the split method, except
that only newly created elements (matched by the regexp) are
returned
XML::Twig::Elt get_type method returns #ELT for elements and the gi
(#PCDATA, #CDATA...) otherwise
XML::Twig::Elt is_elt returns the gi if the element is a real element
and 0 if it is #PCDATA, #CDATA...
XML::Twig::Elt contains_only_text returns 1 if the element contains no
"real" element (is_field is another name for it)
First implementation of the output_filter option which filters the
text before it is output by the print, sprint, flush and text methods
(only works for print at the moment, and still under test with various
versions of XML::Parser). Standard filters are also available
Example:
#!/bin/perl -w
use strict;
use XML::Twig;
my $t = new XML::Twig(output_filter => 'latin1');
$t->parse( \*DATA);
$t->print;
__DATA__
<?xml version="1.0" encoding="ISO-8859-1"?>
<docé té="valué">Un homme soupçonné d'être impliqué dans la mort
d'un motard de la police, renversé</docé>
The 'latin1', 'html' and 'safe' filters are predefined, you can also
build additional filters using Iconv (requires text::Iconv) and
Unicode::String (requires Unicode::String and Unicode::Map8):
my $conv = XML::Twig::iconv_convert( 'latin1');
my $t = new XML::Twig(output_filter => $conv);
my $conv = XML::Twig::unicode_convert( 'latin1');
my $t = new XML::Twig(output_filter => $conv);
warning: conversions work fine with XML::Parser 2.27 but sometimes fail
with XML::Parser 2.30 (on Perl 5.6.1, Linux 2.4 on a PC) when using
'latin1' without Text::Iconv or Unicode::String and Unicode::Map8
installed.
The input_filter option works the same way, except the text is
converted before it is stored in the twig (so you can use regexp in
your native encoding for example)
the XML::Twig::Elt set_asis method sets a property of an element that
causes it to be output asis (without XML-escaping < " and &) so you
can still create tagged text
the XML::Twig::Elt prefix and suffix methods accept an optional
'asis' argument that causes the prefix or suffix to get the asis
property (so you can do $elt->prefix( '<b>foo</b>', 'asis') for
example)
the XML::Twig and XML::Twig::Elt find_nodes methods are aliases
to the get_xpath method (this is the name used in XML::XPath)
the XML::Twig parseurl and safe_parseurl methods parse a document
whose url is given
XML::Twig::Elt extra_data, set_extra_data and append_extra_data to
access the... extra data (PI's and comments) attached to an element
XML::Twig method parser returns the XML::Parser::Expat object used
by the twig
Most XML::Parser::Expat methods are now inherited by XML::Twig
objects
XML::Twig::Elt descendant_or_self method that returns the element
and its descendants
Fixed
element (and attribute) names can now include '.'
get_xpath now works for root based XPath expressions ('/doc/elt')
get_xpath now works for regexps (including regexps on attribute values)
you can now properly restore pretty_print and empty_tag_style values
speedup (at install) now checks the Perl version and uses qr or ""
so XML::Twig works in 5.004
XML::Twig::Elt wrap_in now allows wrapping the root element
various bugs in the DOCTYPE and DTD output with XML::Parser 2.30
the tests to fix a bug when working with XML::Parser 2.27
the tests to fix a bug preventing test2 to pass under windows
_default_ handlers now work (thanks Zoogie)
the text method now returns the XML base entities (<>&'") un-escaped
(thanks to Hakan Kallberg's persistence to ask for it ;--)
pretty_print works better for elements without content
end_tag_handlers now work properly (thanks to Phil Glanville for the
patch).
Enhanced
Attributes which name starts with # are not output by the print
methods, and thus can be used to store private data on elements
WeakRef is used if installed, so no more memory leaks
Sped-up print and flush by creating the _print and _flush methods
which do not check for file handle and pretty print options
The doc has been enhanced and somewhat restructured. All options are
now written as this_is_an_option although the legacy form thisIsAnOption
can still be used. Links now display properly in the text form (thanks to
Dominic Mitchell for spotting this and sending a patch)
Navigation functions (including descendants) now allow not only a gi
to be used as filter, but also the '#ELT' token, to filter only "real"
elements (as opposed to #PCDATA, #CDATA, #PI, #COMMENT, #ENT), the
'#TEXT' token, to filter only text (PCDATA and CDATA elements),
regular expressions (built with qr//) applied on the elements gi's,
code references, the code is passed the element as argument, and a
subset of XPath.
Functions that can use this token are: children, first_child, last_child,
prev_sibling, last_sibling, next_elt, last_elt, descendants, get_xpath,
child, sibling, sibling_text, prev_siblings, next_siblings field,
first_child_text
The paste method now accepts a 'within' position, which inserts the
element at the $offset argument (a 3rd, required, argument) in the
reference element or in its first text child
The XML::Twig::Elt insert method now accepts attributes (hashrefs)
applied to the element(s) being inserted:
$elt->insert( e1 => { a => 'v'}, e2 => e3 => { a1 =>'v1', a2 => 'v2'});
The XML::Twig::erase method now outputs a meaningful error message if
applied to the root (or a cut element)
Optimizations for better performances (in the end performances are about
the same or a little worse than XML::Twig 2.02 but the module is much
more powerful)
Known bugs:
The DTD interface is completely broken, and I have little hope of
fixing it considering I have to deal with 2 incompatible versions of
XML::Parser. Plus no one seems to be using it...
Some XPath/Navigation expressions using " or ' in the text()="" part
of the expression will cause a fatal error
Note that this version works better (but doesn't necessarily require)
with WeakRef (Perl version 5.6.0 and above) and Text::Iconv for all
its encoding conversions.