NAME

XML::Fast - Simple and very fast XML - hash conversion

SYNOPSIS

use XML::Fast;

my $hash = xml2hash $xml;
my $hash2 = xml2hash $xml, attr => '.', text => '~';

DESCRIPTION

This module implements a simple, state-machine-based XML parser written in C.

It can parse, and recover from, some kinds of broken XML. If you need an XML validator, use XML::LibXML.

RATIONALE

Another similar module is XML::Bare. I used it for some time, but it has some shortcomings:

  • If your XML has a node containing a text node, then a CDATA node, then another text node, you'll get a broken value

  • It doesn't support charsets

  • It doesn't support any kind of entities.

So, after a number of attempts to fix XML::Bare, I decided to write a parser from scratch.

Here are some of its features and principles:

  • It uses a minimal number of memory allocations.

  • All XML is parsed in 1 scan.

  • All values are copied from source XML only once (to destination keys/values)

  • If some types of nodes (for example, comments) are ignored, no memory is allocated or copied for them.

I've removed the benchmark results, since they vary widely from one XML document to another. Sometimes XML::Bare is faster, sometimes not. So XML::Fast should mainly be considered not as faster-than-Bare, but as format-other-than-Bare.

EXPORT

xml2hash $xml, [ %options ]

hash2xml $hash, [ %options ]
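
A minimal sketch of calling xml2hash with the default options described under OPTIONS below. The sample document and variable names are made up for illustration. It assumes that a node carrying both an attribute and text is returned as a hash holding the '-'-prefixed attribute and a '#text' key.

```perl
use strict;
use warnings;
use XML::Fast;

# Hypothetical sample document, used only for illustration.
my $xml = '<root><item id="1">first</item></root>';

# No options given: attr defaults to '-', text to '#text'.
my $hash = xml2hash $xml;

# Assumption: the attribute is stored as '-id' and the text as '#text'.
my $item = $hash->{root}{item};
print 'id:   ', $item->{'-id'},   "\n";
print 'text: ', $item->{'#text'}, "\n";
```

The result is a plain Perl data structure, so it can be inspected with Data::Dumper when the exact shape of a document is in doubt.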

OPTIONS

order [ = 0 ]

Not implemented yet. Strictly keep the output order. When enabled, the resulting structures become more complex, but the original XML can be reconstructed completely.

attr [ = '-' ]

Attribute prefix

<node attr="test" />  =>  { node => { -attr => "test" } }

text [ = '#text' ]

Key name for storing text

When undef, text nodes will be ignored

<node>text<sub /></node>  =>  { node => { sub => '', '#text' => "text" } }

join [ = '' ]

Join separator for text nodes that are split by subnodes

Ignored when order is in effect

# default:
xml2hash( '<item>Test1<sub />Test2</item>' )
: { item => { sub => '', '#text' => 'Test1Test2' } };

xml2hash( '<item>Test1<sub />Test2</item>', join => '+' )
: { item => { sub => '', '#text' => 'Test1+Test2' } };

trim [ = 1 ]

Trim leading and trailing whitespace from text nodes

cdata [ = undef ]

When defined, CDATA sections will be stored under this key

# cdata = undef
<node><![CDATA[ test ]]></node>  =>  { node => 'test' }

# cdata = '#'
<node><![CDATA[ test ]]></node>  =>  { node => { '#' => 'test' } }

comm [ = undef ]

When defined, comments will be stored under this key

When undef, comments will be ignored

# comm = undef
<node><!-- comm --><sub/></node>  =>  { node => { sub => '' } }

# comm = '/'
<node><!-- comm --><sub/></node>  =>  { node => { sub => '', '/' => 'comm' } }

array => 1

Force all nodes to be kept as arrays.

# no array
<node><sub/></node>  =>  { node => { sub => '' } }

# array = 1
<node><sub/></node>  =>  { node => [ { sub => [ '' ] } ] }

array => [ 'node', 'names']

Force nodes with the listed names to be stored as arrays

# no array
<node><sub/></node>  =>  { node => { sub => '' } }

# array => ['sub']
<node><sub/></node>  =>  { node => { sub => [ '' ] } }

utf8decode => 1

Force decoding of utf8 sequences, instead of just upgrading them (may be useful for broken xml)
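
Several of the options above can be combined in one call. The following is a sketch only: the feed document and the chosen key names ('@' and '_') are hypothetical. It assumes that a node listed in array is returned as an array reference even when it occurs once, as the array examples above suggest.

```perl
use strict;
use warnings;
use XML::Fast;

# Hypothetical feed, used only for illustration.
my $xml = '<feed><entry id="7">one</entry></feed>';

my $hash = xml2hash $xml,
    attr  => '@',          # attributes become '@id' instead of '-id'
    text  => '_',          # text is stored under '_' instead of '#text'
    array => ['entry'];    # 'entry' is always an array reference

# Because 'entry' is forced to be an array, the same loop works
# whether the feed holds one entry or many.
for my $entry ( @{ $hash->{feed}{entry} } ) {
    print $entry->{'@id'}, ': ', $entry->{'_'}, "\n";
}
```

Forcing array for nodes that may repeat avoids checking ref() on every access, at the cost of one extra level of indirection for nodes that occur only once.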

SEE ALSO

  • XML::Bare

    Another fast parser

  • XML::LibXML

    The most powerful XML parser for Perl. A good choice if you don't need to parse gigabytes of XML ;)

  • XML::Hash::LX

    An XML parser that uses XML::LibXML for parsing and then constructs a hash structure identical to the one generated by this module. (At least, it should ;)). But of course it is much slower than XML::Fast

LIMITATIONS

  • Does not support wide charsets (UTF-16/32) (see RT71534)

TODO

  • Ordered mode (as implemented in XML::Hash::LX)

  • Create hash2xml, identical to the one in XML::Hash::LX

  • Partial content event-based parsing (I need this for reading XML streams)

Patches, suggestions and bug reports are welcome ;)

AUTHOR

Mons Anderson, <mons@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010 Mons Anderson

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.