NAME

news_xml2alvis.pl - news XML to Alvis XML converter

SYNOPSIS

  news_xml2alvis.pl [options] [source directory ...]

Options:

  --xml-ext            filename extension identifying XML files
  --meta-ext           filename extension identifying meta files
  --out-dir            output directory
  --N-per-out-dir      number of records per output directory
  --meta-encoding      the encoding of the meta files
  --help               brief help message
  --man                full documentation
  --[no]warnings       warnings output flag
  

OPTIONS

--xml-ext
Sets the filename extension that identifies XML files.
Default value: 'xml'.
--meta-ext
Sets the filename extension that identifies meta files.
Default value: 'meta'.
--out-dir
Sets the output directory. Default value: '.'.
--N-per-out-dir
Sets the number of records per output directory. Default value: 1000.
--meta-encoding
Specifies the encoding of the meta files. Default value: 'iso-8859-1'.
--help
Prints a brief help message and exits.
--man
Prints the manual page and exits.
--[no]warnings
Outputs (or suppresses) warnings. Default value: yes.

DESCRIPTION

 Goes recursively through the files under the source directory
 and converts them to Alvis XML files. Meta information (such
 as the URL, the detected character set, or the title of the
 document) can be given in a separate meta file, one per document,
 matched to its document by the shared basename. For example, if
 the XML document is called foo.news and its meta information is
 in foo.meta, news_xml2alvis.pl should be called like this:

       news_xml2alvis.pl --xml-ext news --meta-ext meta
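
 The basename matching described above can be illustrated with a
 short Python sketch. This is not the script's own code: the
 function name pair_documents and its parameters are hypothetical,
 and the sketch assumes that a document and its meta file sit in
 the same directory.

```python
import os


def pair_documents(source_dir, xml_ext="xml", meta_ext="meta"):
    """Yield (xml_path, meta_path) pairs found under source_dir.

    meta_path is None when no meta file shares the document's basename.
    Assumption: the meta file lives next to its XML document.
    """
    for dirpath, dirnames, filenames in os.walk(source_dir):
        dirnames.sort()  # visit subdirectories in a stable order
        names = set(filenames)
        for name in sorted(filenames):
            base, ext = os.path.splitext(name)
            if ext != "." + xml_ext:
                continue  # not a document file
            meta_name = base + "." + meta_ext
            meta_path = None
            if meta_name in names:
                meta_path = os.path.join(dirpath, meta_name)
            yield os.path.join(dirpath, name), meta_path
```

 With xml_ext="news" and meta_ext="meta", a file foo.news would be
 paired with foo.meta from the same directory, if that file exists.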
 
 The news XML files are expected to be of the format

 <DOCUMENT>
   <article>
     <date></date>
     <iso-date></iso-date>
     <title></title>
     <content></content>
     <links>
         <link type="a">
             <location></location>
         </link>
     </links>
   </article>
 </DOCUMENT>
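
 As an illustration of the layout above, the following Python
 sketch reads the listed elements with the standard library's
 ElementTree parser. The function name read_articles, the returned
 dictionary keys, and all sample values are hypothetical; how the
 converter itself parses its input is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document following the layout shown above.
SAMPLE = """<DOCUMENT>
  <article>
    <date>1 January 2006</date>
    <iso-date>2006-01-01</iso-date>
    <title>Example title</title>
    <content>Example body text.</content>
    <links>
      <link type="a">
        <location>http://example.org/</location>
      </link>
    </links>
  </article>
</DOCUMENT>"""


def read_articles(xml_text):
    """Return one dict per <article> element in a news XML string."""
    root = ET.fromstring(xml_text)
    articles = []
    for art in root.iter("article"):
        articles.append({
            "date": art.findtext("date", default=""),
            "iso_date": art.findtext("iso-date", default=""),
            "title": art.findtext("title", default=""),
            "content": art.findtext("content", default=""),
            # Each link becomes a (type attribute, location text) pair.
            "links": [
                (link.get("type"), link.findtext("location", default=""))
                for link in art.findall("links/link")
            ],
        })
    return articles
```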

 and meta files of the format 

       <feature name>\t<feature value>\n

 Special features are url, title, date, and detectedCharSet.
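
 The meta file format above can be read with a few lines of
 Python. This is a minimal sketch: the function name parse_meta is
 hypothetical, and skipping lines without a tab or letting a
 repeated feature name overwrite the earlier value are assumptions,
 not documented behaviour. The default encoding mirrors the
 --meta-encoding default.

```python
def parse_meta(path, encoding="iso-8859-1"):
    """Read '<feature name>\\t<feature value>\\n' lines into a dict."""
    features = {}
    with open(path, encoding=encoding) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore empty lines
            name, sep, value = line.partition("\t")
            if not sep:
                continue  # assumption: lines without a tab are skipped
            features[name] = value  # assumption: last occurrence wins
    return features
```

 The special features can then be looked up by name, for example
 parse_meta("foo.meta").get("url").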