NAME

html2alvis - HTML to Alvis XML converter

SYNOPSIS

  html2alvis [options] [source directory ...]

Options:

  --html-ext                 HTML file identifying filename extension
  --meta-ext                 meta file identifying filename extension
  --out-dir                  output directory
  --N-per-out-dir            # of records per output directory
  --meta-encoding            the encoding of the meta files
  --html-encoding            the encoding of all HTML files
  --html-encoding-from-meta  take the encoding of the HTML files from
                             the meta files (attribute 'detected-charset')
  --[no]original             include original document?
  --help                     brief help message
  --man                      full documentation
  --[no]warnings             warnings output flag
  

OPTIONS

--html-ext
Sets the filename extension that identifies HTML files.
Default value: 'html'.
--meta-ext
Sets the filename extension that identifies meta files.
The meta file syntax is:

      <feature name>\t<feature value>\n

Special features are url, title, date, detectedCharSet.
Default value: 'meta'.
--out-dir
Sets the output directory. Default value: '.'.
--N-per-out-dir
Sets the number of records per output directory. Default value: 1000.
--meta-encoding
Specifies the encoding of all meta files. Default value: 'iso-8859-1'.
--html-encoding
Specifies the encoding of all HTML files.
Default: undef (meaning 'guess').
--html-encoding-from-meta
Specifies whether the encoding of an HTML file should be read from
the corresponding meta file. If no information is given there,
--html-encoding is used; if that is not given either, the encoding
is guessed.
Default: no.
--[no]original
Specifies whether the original document is included in the output.
Default value: yes.
--help
Prints a brief help message and exits.
--man
Prints the manual page and exits.
--[no]warnings
Output (or suppress) warnings. Default value: yes.
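
As an illustration of the meta file syntax and of the encoding precedence described under --html-encoding-from-meta, the following Python sketch parses tab-separated feature lines and selects an HTML encoding. It is not part of html2alvis: the function names parse_meta and pick_html_encoding are hypothetical, the handling of blank or tab-less lines is an assumption, and the feature name 'detectedCharSet' is taken from the list of special features above.

```python
def parse_meta(text):
    """Parse 'name<TAB>value' lines into a dict of features."""
    features = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no meaning
        name, sep, value = line.partition("\t")
        if not sep:
            continue  # assumption: lines without a tab are ignored
        features[name] = value
    return features


def pick_html_encoding(features, html_encoding=None, from_meta=False):
    """Return the encoding to use, or None meaning 'guess'."""
    # 1. The meta file wins when reading from meta is enabled and it
    #    actually names a character set.
    if from_meta and features.get("detectedCharSet"):
        return features["detectedCharSet"]
    # 2. Otherwise fall back to the explicitly given encoding;
    #    None here means the encoding has to be guessed.
    return html_encoding


meta = parse_meta(
    "url\thttp://example.org/foo\ntitle\tFoo\ndetectedCharSet\tutf-8\n"
)
print(pick_html_encoding(meta, html_encoding="iso-8859-1", from_meta=True))
```

With from_meta set to False the same call would return the value passed as html_encoding, and with neither source available it returns None, which corresponds to the 'guess' case.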

DESCRIPTION

 Recursively processes the files under each source directory
 and converts them to Alvis XML files. Meta information (such
 as the URL, the detected character set or the title of the
 document) can be given in a separate meta file, one per document,
 matched to its document by the shared basename. For example, if
 the HTML document is called foo.original and the meta information
 is in foo.meta, html2alvis should be called like this:

       html2alvis --html-ext original --meta-ext meta
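
The pairing of documents and meta files by shared basename can be sketched in Python as follows. This only illustrates the idea and is not how html2alvis itself is implemented; the function name pair_documents is hypothetical, and treating a missing meta file as None is an assumption.

```python
import os


def pair_documents(root, html_ext="html", meta_ext="meta"):
    """Walk root recursively and pair each HTML file with its meta file.

    Returns a list of (html_path, meta_path) tuples; meta_path is None
    when no file with the same basename and the meta extension exists.
    """
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in sorted(filenames):
            base, ext = os.path.splitext(name)
            if ext != "." + html_ext:
                continue  # not an HTML document by extension
            meta_name = base + "." + meta_ext
            meta_path = (
                os.path.join(dirpath, meta_name) if meta_name in names else None
            )
            pairs.append((os.path.join(dirpath, name), meta_path))
    return pairs
```

For the example above, pair_documents(source_dir, "original", "meta") would pair foo.original with foo.meta.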