NAME
XML::Fast - Simple and very fast XML - hash conversion
SYNOPSIS
use XML::Fast;
my $hash = xml2hash $xml;
my $hash2 = xml2hash $xml, attr => '.', text => '~';
DESCRIPTION
This module implements a simple, state-machine-based XML parser written in C.
It can parse, and recover from, some kinds of broken XML. If you need an XML validator, use XML::LibXML.
RATIONALE
Another similar module is XML::Bare. I used it for some time, but it has some shortcomings:
If your XML has a node containing a TextNode, then a CDATANode, then another TextNode, you get a broken value
It doesn't support charsets
It doesn't support any kind of entities.
So, after a number of attempts to fix XML::Bare, I decided to write a parser from scratch.
Here are some features and principles:
It uses a minimal number of memory allocations.
All XML is parsed in a single scan.
All values are copied from source XML only once (to destination keys/values)
If some types of nodes (e.g. comments) are ignored, no memory is allocated or copied for them.
I've removed the benchmark results, since they vary widely between XML documents. Sometimes XML::Bare is faster, sometimes not. So XML::Fast should mainly be considered not faster-than-bare, but format-other-than-bare.
EXPORT
xml2hash $xml, [ %options ]
hash2xml $hash, [ %options ]
OPTIONS
- order [ = 0 ]
-
Not implemented yet. Strictly keeps the output order. When enabled, structures become more complex, but the original XML can be reconstructed exactly.
- attr [ = '-' ]
-
Attribute prefix
<node attr="test" /> => { node => { -attr => "test" } }
- text [ = '#text' ]
-
Key name for storing text
When undef, text nodes will be ignored
<node>text<sub /></node> => { node => { sub => '', '#text' => "text" } }
- join [ = '' ]
-
Join separator for text nodes split by subnodes
Ignored when order is in effect
# default:
xml2hash( '<item>Test1<sub />Test2</item>' ) : { item => { sub => '', '#text' => 'Test1Test2' } };
# join => '+':
xml2hash( '<item>Test1<sub />Test2</item>', join => '+' ) : { item => { sub => '', '#text' => 'Test1+Test2' } };
- trim [ = 1 ]
-
Trim leading and trailing whitespace from text nodes
- cdata [ = undef ]
-
When defined, CDATA sections will be stored under this key
# cdata = undef
<node><![CDATA[ test ]]></node> => { node => 'test' }
# cdata = '#'
<node><![CDATA[ test ]]></node> => { node => { '#' => 'test' } }
- comm [ = undef ]
-
When defined, comments will be stored under this key
When undef, comments will be ignored
# comm = undef
<node><!-- comm --><sub/></node> => { node => { sub => '' } }
# comm = '/'
<node><!-- comm --><sub/></node> => { node => { sub => '', '/' => 'comm' } }
- array => 1
-
Force all nodes to be kept as arrays.
# no array
<node><sub/></node> => { node => { sub => '' } }
# array = 1
<node><sub/></node> => { node => [ { sub => [ '' ] } ] }
- array => [ 'node', 'names']
-
Force nodes with the listed names to be stored as arrays
# no array
<node><sub/></node> => { node => { sub => '' } }
# array => ['sub']
<node><sub/></node> => { node => { sub => [ '' ] } }
- utf8decode => 1
-
Force decoding of utf8 sequences, instead of just upgrading them (may be useful for broken xml)
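The options above can be combined in a single call. The following is a minimal sketch rather than a definitive reference: it assumes XML::Fast is installed, and the variable names ($h1, $h2, $h3) are only illustrative. The structures noted in the comments are the ones given in the option descriptions above.

```perl
use strict;
use warnings;
use XML::Fast;    # exports xml2hash

# attr and text: replace the default attribute prefix ('-') and text key ('#text').
# Per the descriptions above, the attribute is stored under '.attr' and the text under '~'.
my $h1 = xml2hash( '<node attr="test">text<sub /></node>', attr => '.', text => '~' );

# join: text nodes split by a subnode are concatenated with the separator.
# The text key is set explicitly here so the lookup below does not depend on the default.
my $h2 = xml2hash( '<item>Test1<sub />Test2</item>', text => '~', join => '+' );

# array: only the listed node names are forced into array references.
my $h3 = xml2hash( '<node><sub/></node>', array => ['sub'] );

print "attribute: $h1->{node}{'.attr'}\n";
print "text:      $h1->{node}{'~'}\n";
print "joined:    $h2->{item}{'~'}\n";
print "sub is a ", ref( $h3->{node}{sub} ), " reference\n";
```

Setting the text key explicitly keeps the hash lookups unambiguous when several options are used together.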
SEE ALSO
- XML::Bare
-
Another fast parser
- XML::LibXML
-
The most powerful XML parser for Perl. If you don't need to parse gigabytes of XML ;)
- XML::Hash::LX
-
An XML parser that uses XML::LibXML for parsing and then constructs a hash structure identical to the one generated by this module. (At least, it should ;)). But of course it is much slower than XML::Fast
LIMITATIONS
Does not support wide charsets (UTF-16/32) (see RT71534)
TODO
Ordered mode (as implemented in XML::Hash::LX)
Create hash2xml, identical to one in XML::Hash::LX
Partial content event-based parsing (I need this for reading XML streams)
Patches, suggestions and bug reports are welcome ;)
AUTHOR
Mons Anderson, <mons@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2010 Mons Anderson
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.