NAME
html2alvis - HTML to Alvis XML converter
SYNOPSIS
html2alvis [options] [source directory ...]
Options:
--html-ext                  HTML file identifying filename extension
--meta-ext                  meta file identifying filename extension
--out-dir                   output directory
--N-per-out-dir             # of records per output directory
--meta-encoding             the encoding of the meta files
--html-encoding             the encoding of all HTML files
--html-encoding-from-meta   take the encoding of the HTML files from
                            the meta files (attribute 'detected-charset')
--[no]original              include original document?
--help                      brief help message
--man                       full documentation
--[no]warnings              warnings output flag
OPTIONS
- --html-ext
-
Sets the HTML file identifying filename extension. Default value: 'html'.
- --meta-ext
-
Sets the meta file identifying filename extension. The meta file syntax is <feature name>\t<feature value>\n, one feature per line. Special features are url, title, date and detectedCharSet. Default value: 'meta'.
- --out-dir
-
Sets the output directory. Default value: '.'.
- --N-per-out-dir
-
Sets the number of records per output directory. Default value: 1000.
- --meta-encoding
-
Specifies the encoding of all meta files. Default value: 'iso-8859-1'.
- --html-encoding
-
Specifies the encoding of all HTML files. Default: undef (meaning 'guess').
- --html-encoding-from-meta
-
Specifies whether the encoding of an HTML file should be read from the corresponding meta file. If no information is given there, --html-encoding is used; if that is not given either, the encoding is guessed. Default: no.
- --[no]original
-
Shall the original document be included in the output? Default value: yes.
- --help
-
Prints a brief help message and exits.
- --man
-
Prints the manual page and exits.
- --[no]warnings
-
Output (or suppress) warnings. Default value: yes.
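The meta file syntax given under --meta-ext and the fallback order given under --html-encoding-from-meta can be illustrated with a short Python sketch. This is not html2alvis's own code. The function names parse_meta and choose_html_encoding are hypothetical, skipping lines without a tab and letting a repeated feature overwrite the earlier one are assumptions, and the key 'detectedCharSet' is taken from the --meta-ext description (the synopsis spells the attribute 'detected-charset', so the exact key is an assumption as well).

```python
from typing import Dict, Optional


def parse_meta(text: str) -> Dict[str, str]:
    """Parse lines of the form name<TAB>value into a dict."""
    features: Dict[str, str] = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # assumption: ignore blank or malformed lines
        name, value = line.split("\t", 1)
        features[name] = value  # assumption: a repeated feature overwrites
    return features


def choose_html_encoding(
    meta: Dict[str, str],
    html_encoding: Optional[str] = None,   # value of --html-encoding, if given
    from_meta: bool = False,               # --html-encoding-from-meta
    charset_key: str = "detectedCharSet",  # assumed feature name
) -> Optional[str]:
    """Return the encoding to use, or None meaning 'guess'."""
    if from_meta and meta.get(charset_key):
        return meta[charset_key]
    return html_encoding  # None when --html-encoding was not given


if __name__ == "__main__":
    meta = parse_meta("url\thttp://example.org/foo\ndetectedCharSet\tutf-8\n")
    print(choose_html_encoding(meta, "iso-8859-1", from_meta=True))
```

Returning None for "guess" mirrors the undef default described for --html-encoding; the guessing step itself is left out of the sketch.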
DESCRIPTION
Recursively goes through the files under each source directory
and converts them to Alvis XML files. Meta information (such
as the URL, the detected character set or the title of the
document) can be given in a separate meta file, one per document,
which is recognized by the basename it shares with the HTML file.
For example, if the HTML document is called foo.original and the
meta information is in foo.meta, html2alvis should be called like this:
html2alvis --html-ext original --meta-ext meta
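
The pairing of HTML and meta files by shared basename could look roughly like the following Python sketch. It only illustrates the idea and is not html2alvis's implementation. The function name pair_documents is hypothetical, and reporting a missing meta file as None is an assumption.

```python
import os
from typing import Iterator, Optional, Tuple


def pair_documents(
    source_dir: str, html_ext: str = "html", meta_ext: str = "meta"
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (html_path, meta_path or None) for every HTML file found
    recursively under source_dir, matching files by shared basename."""
    suffix = "." + html_ext
    for root, _dirs, files in os.walk(source_dir):
        present = set(files)
        for name in sorted(files):
            if not name.endswith(suffix):
                continue  # not an HTML file according to its extension
            base = name[: -len(suffix)]
            meta_name = base + "." + meta_ext
            if meta_name in present:
                meta_path = os.path.join(root, meta_name)
            else:
                meta_path = None  # assumption: the meta file is optional
            yield os.path.join(root, name), meta_path
```

With the extensions from the example above, pair_documents(".", "original", "meta") would yield foo.original together with foo.meta when both sit in the same directory.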