NAME

DTA::CAB::Format::XmlNative - Datum parser|formatter: XML (native)

SYNOPSIS

use DTA::CAB::Format::XmlNative;

##========================================================================
## Methods

$fmt = DTA::CAB::Format::XmlNative->new(%args);
$obj = $fmt->parseNode($nod);
$doc = $fmt->parseDocument();
$fmt = $fmt->putDocument($doc);

##========================================================================
## Utilities

$nod = $fmt->xmlNode($thingy,$name);
$val = PACKAGE::_pushValue(\%hash,  $key, $val); ##-- $hash{$key}=$val;

DESCRIPTION

DTA::CAB::Format::XmlNative is a DTA::CAB::Format subclass for document I/O using a native XML dialect. It inherits from DTA::CAB::Format::XmlCommon.

Methods

new
$fmt = CLASS_OR_OBJ->new(%args);

%$fmt, %args:

##-- input: inherited
xdoc => $xdoc,                          ##-- XML::LibXML::Document
xprs => $xprs,                          ##-- XML::LibXML parser
##
##-- input: new
parseXmlData => $bool,                  ##-- if specified and true, _xmldata key will be populated by parseNode() (default=unspecified:true)
##
##-- input+output: new
xml2key => \%xml2key,                   ##-- maps xml keys to internal keys
ignoreKeys => \%key2undef,              ##-- keys to ignore for i/o
##
##-- output: new
arrayEltKeys => \%akey2ekey,            ##-- maps array keys to element keys for output
arrayImplicitKeys => \%akey2undef,      ##-- pseudo-hash of array keys NOT mapped to explicit elements
key2xml => \%key2xml,                   ##-- maps keys to XML-safe names
##
##-- output: inherited
encoding => $inputEncoding,             ##-- default: UTF-8; applies to output only!
level => $level,                        ##-- output formatting level (default=0)

parseDocument
$doc = $fmt->parseDocument();

Parses the buffered XML::LibXML::Document into a buffered DTA::CAB::Document.

shortName

Returns the "official" short name for this format, here just 'xml'.

putDocument
$fmt = $fmt->putDocument($doc);

Formats the DTA::CAB::Document $doc as XML to the in-memory buffer $fmt->{xdoc}.

Utilities

parseNode
$obj = $fmt->parseNode($nod);

Returns the Perl object represented by the XML::LibXML::Node $nod, attempting to map the XML to a Perl structure "sensibly".

DTA::CAB::Datum nodes (document, sentence, token) get some additional baggage:

_xmldata  => $data,    ##-- unparsed content (raw string)

xmlNode
$nod = $fmt->xmlNode($thingy,$name);

Returns an XML node for the Perl scalar $thingy, using $name as its key. Used when constructing XML output documents.
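
The general idea of turning a Perl structure into an XML node can be sketched as follows. This is only an illustration under stated assumptions, not the module's actual xmlNode() code: the helper name thingy2node() is hypothetical, plain scalar values are assumed to become attributes, nested hashes are assumed to become child elements, and array values (which the real module handles via arrayEltKeys and arrayImplicitKeys) are left out entirely.

```perl
use strict;
use warnings;
use XML::LibXML;

## thingy2node(): hypothetical helper, NOT the module's xmlNode()
sub thingy2node {
  my ($thingy, $name) = @_;
  my $nod = XML::LibXML::Element->new($name);
  if (ref($thingy) eq 'HASH') {
    foreach my $key (sort keys %$thingy) {
      my $val = $thingy->{$key};
      if (ref($val) eq 'HASH') {
        ##-- assumption: nested hash -> child element named after its key
        $nod->appendChild(thingy2node($val, $key));
      } elsif (!ref($val)) {
        ##-- assumption: plain scalar -> attribute
        $nod->setAttribute($key, $val);
      }
      ##-- other reference types (e.g. arrays) are skipped in this sketch
    }
  } elsif (defined($thingy) && !ref($thingy)) {
    ##-- bare scalar -> text content
    $nod->appendText($thingy);
  }
  return $nod;
}

my $w = {
  t     => 'wie',
  msafe => 1,
  moot  => { word => 'wie', lemma => 'wie', tag => 'PWAV' },
};
my $nod = thingy2node($w, 'w');
print $nod->toString(), "\n";
```

Sorting the keys only makes the output order reproducible; it is not a claim about the ordering the module itself uses.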

_pushValue
$val = PACKAGE::_pushValue(\%hash,  $key, $val); ##-- $hash{$key}=$val;
$val = PACKAGE::_pushValue(\@array, $key, $val); ##-- push(@array,$val)

Convenience routine used by parseNode() when constructing perl data structures from XML input.
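
The two call forms above suggest a dispatch on the container type. A minimal sketch of that behaviour, assuming nothing beyond what the signature comments state, might look like this. The name push_value() is a hypothetical stand-in, not the module's own routine, and the use of reftype() so that blessed containers are also handled is an assumption of this sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

## push_value(): hypothetical stand-in for _pushValue(), not the module's code
sub push_value {
  my ($container, $key, $val) = @_;
  if ((reftype($container) // '') eq 'ARRAY') {
    push(@$container, $val);      ##-- array: append; $key is not used
  } else {
    $container->{$key} = $val;    ##-- hash: store under $key
  }
  return $val;
}

my (%hash, @array);
push_value(\%hash,  'lemma', 'wie');          ##-- $hash{lemma}='wie'
push_value(\@array, 'w', { t => 'wie' });     ##-- push(@array, {t=>'wie'})
push_value(\@array, 'w', { t => '!' });       ##-- push(@array, {t=>'!'})
```

A single entry point of this kind lets a recursive parser add a child value to its parent without checking at every call site whether the parent is a hash or an array.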

EXAMPLE

An example file in the format accepted/generated by this module is:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <s lang="de">
    <w exlex="wie" hasmorph="1" msafe="1" errid="ec" t="wie" lang="de">
      <moot word="wie" lemma="wie" tag="PWAV"/>
      <xlit latin1Text="wie" isLatin1="1" isLatinExt="1"/>
    </w>
    <w msafe="0" t="oede">
      <moot tag="ADJD" lemma="öde" word="öde"/>
      <xlit isLatinExt="1" isLatin1="1" latin1Text="oede"/>
    </w>
    <w msafe="1" errid="ec" t="!" exlex="!">
      <moot lemma="!" word="!" tag="$."/>
      <xlit isLatinExt="1" isLatin1="1" latin1Text="!"/>
    </w>
  </s>
</doc>
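
To show how such a file relates to Perl data, the following sketch reads an abbreviated, ASCII-only copy of the example with XML::LibXML directly and maps each <w> element to a nested hash. It does not use DTA::CAB at all: the helper node2hash() is hypothetical, and the mapping it performs (attributes become hash keys, child elements become nested hashes) is an assumption made for illustration, so the resulting structure is not guaranteed to match what parseNode() returns.

```perl
use strict;
use warnings;
use XML::LibXML;

## Abbreviated copy of the example document (first and last token only)
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <s lang="de">
    <w exlex="wie" hasmorph="1" msafe="1" errid="ec" t="wie" lang="de">
      <moot word="wie" lemma="wie" tag="PWAV"/>
      <xlit latin1Text="wie" isLatin1="1" isLatinExt="1"/>
    </w>
    <w msafe="1" errid="ec" t="!" exlex="!">
      <moot lemma="!" word="!" tag="$."/>
      <xlit isLatinExt="1" isLatin1="1" latin1Text="!"/>
    </w>
  </s>
</doc>
XML

## node2hash(): hypothetical helper, NOT part of DTA::CAB
sub node2hash {
  my $nod = shift;
  ##-- attributes -> hash keys
  my %h = map { ($_->nodeName => $_->value) } $nod->findnodes('@*');
  ##-- child elements -> nested hashes (assumes child names are unique)
  $h{$_->nodeName} = node2hash($_) foreach $nod->findnodes('*');
  return \%h;
}

my $xdoc   = XML::LibXML->load_xml(string => $xml);
my @tokens = map { node2hash($_) } $xdoc->findnodes('/doc/s/w');

##-- print surface text, tag and lemma for each token, tab-separated
foreach my $w (@tokens) {
  print join("\t", $w->{t}, $w->{moot}{tag}, $w->{moot}{lemma}), "\n";
}
```

The heredoc is single-quoted so that the tag value `$.` is not interpolated as a Perl variable.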

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-convert.perl(1), DTA::CAB::Format::XmlCommon(3pm), DTA::CAB::Format::Builtin(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...
