NAME
HTML::ToDocBook - Converts an XHTML file into DocBook.
VERSION
This describes version 0.03 of HTML::ToDocBook.
SYNOPSIS
use HTML::ToDocBook;
my $obj = HTML::ToDocBook->new(%args);
$obj->convert(infile=>$filename);
# convert HTML file
$obj->convert(infile=>$filename, html=>1);
DESCRIPTION
This module converts an XHTML file into DocBook format using both heuristics and XSLT processing. By default it expects the input file to be correct (well-formed) XHTML. It does not fix broken markup itself; other programs such as HTML Tidy (http://tidy.sourceforge.net/) can correct files for you.
Note also that the conversion is very simple. It does not deal with elements such as <div> or <span>, because it has no way of guessing their meaning (although those whose class names match DocBook tags are turned into those tags). It also does not merge multiple XHTML files into a single document: each XHTML file is converted into a <chapter>, with each header becoming a section (sect1 to sect5), and the <title> tag is used for the chapter title.
There will likely be validity errors, depending on how good the original HTML was: there may be broken links, <xref> elements that should be <link>s, and overuse of <emphasis> and <emphasis role="bold">.
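The XSLT half of that approach can be sketched with XML::LibXML and XML::LibXSLT, both listed under REQUIRES. The toy stylesheet below is an assumption for illustration only, not the module's built-in stylesheet: it maps the XHTML <title> to the chapter title and each <p> to a <para>, and ignores headers, lists, links and everything else.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Toy stylesheet (hypothetical, NOT the module's built-in one):
# <html> becomes <chapter>, the <title> becomes the chapter <title>,
# and each body <p> becomes a <para>.
my $xsl_text = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">
  <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>
  <xsl:template match="h:html">
    <chapter>
      <title><xsl:value-of select="h:head/h:title"/></title>
      <xsl:apply-templates select="h:body/h:p"/>
    </chapter>
  </xsl:template>
  <xsl:template match="h:p">
    <para><xsl:value-of select="."/></para>
  </xsl:template>
</xsl:stylesheet>
XSL

my $xhtml = <<'HTML';
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>My Page</title></head>
  <body><p>Hello, world.</p></body>
</html>
HTML

my $parser     = XML::LibXML->new();
my $xslt       = XML::LibXSLT->new();
my $stylesheet = $xslt->parse_stylesheet($parser->parse_string($xsl_text));
my $result     = $stylesheet->transform($parser->parse_string($xhtml));
my $docbook    = $stylesheet->output_string($result);
print $docbook, "\n";
```

XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so an XSLT 1.0 stylesheet has to match them through a prefix (h: here); a plain pattern such as match="p" would silently match nothing.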
METHODS
new
my $conv = HTML::ToDocBook->new();
my $conv = HTML::ToDocBook->new(stylesheet=>$stylesheet);
Arguments:
- stylesheet
A replacement XSLT stylesheet to use for conversions instead of the built-in one. This can be either a file name or a string containing the entire stylesheet.
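The documentation does not say how a file name is told apart from inline stylesheet text. One plausible heuristic, sketched below as a hypothetical helper (resolve_stylesheet is not part of the module's API), is to treat any value containing a "<" as stylesheet text and anything else as a path to read.

```perl
use strict;
use warnings;

# Hypothetical helper: NOT part of HTML::ToDocBook's documented API.
# Returns stylesheet text, whether given inline XSLT or a file name.
sub resolve_stylesheet {
    my ($stylesheet) = @_;

    # Inline XSLT always contains markup; a file name normally does not.
    return $stylesheet if $stylesheet =~ /</;

    open my $fh, '<', $stylesheet
        or die "Cannot open stylesheet '$stylesheet': $!";
    local $/;              # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

my $inline = '<xsl:stylesheet version="1.0" '
           . 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>';
print resolve_stylesheet($inline) eq $inline ? "inline ok\n" : "mismatch\n";
```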
convert
$obj->convert(infile=>$filename,
html=>1);
Arguments:
- infile
The name of the file to convert.
- html
If true, parse the input as HTML rather than XML.
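The practical difference the html flag makes can be seen with XML::LibXML directly: a strict XML parse rejects typical tag-soup HTML, while the HTML parser recovers from it. This sketch is an assumption about what the flag amounts to, since the module's internals are not documented here; parse_input is a hypothetical name.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse a string either as strict XML or as
# forgiving HTML, mirroring the idea behind convert()'s html flag.
sub parse_input {
    my ($text, $html) = @_;
    my $parser = XML::LibXML->new();
    return $html
        ? $parser->parse_html_string($text)   # tolerant HTML parser
        : $parser->parse_string($text);       # strict XML parser; dies on errors
}

my $tag_soup = '<html><body><p>Unclosed paragraph<br></body></html>';

# Strict XML parsing fails: <p> and <br> are never closed.
my $xml_doc = eval { parse_input($tag_soup, 0) };
print defined $xml_doc ? "XML parse succeeded\n" : "XML parse failed\n";

# HTML parsing recovers and produces a DOM that can be queried.
my $html_doc = parse_input($tag_soup, 1);
my ($p) = $html_doc->findnodes('//p');
print "HTML parse found: ", $p->textContent, "\n";
```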
Private Methods
These are not guaranteed to be stable.
insert_sections
my $str = $obj->insert_sections($string);
This inserts <div class="sectN"> tags so that each header, at every level, is enclosed in its own section division. The XSLT stylesheet then picks these up and converts them into the corresponding DocBook section tags.
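The idea behind insert_sections can be sketched in pure Perl. This is not the module's actual code: the hypothetical wrap_sections below handles only plain <h1> to <h5> tags, assumes hN maps to <div class="sectN">, and closes every open section of the same or deeper level before opening a new one, so each division holds its header and everything up to the next header of the same or higher rank.

```perl
use strict;
use warnings;

# Hypothetical sketch of the insert_sections idea; NOT the module's code.
# Assumes headers are plain <h1>..<h5> tags and that hN maps to sectN.
sub wrap_sections {
    my ($html) = @_;
    my @open;      # stack of currently open section levels
    my $out = '';

    # Split the input, keeping each opening header tag as its own piece.
    for my $piece (split /(<h[1-5]\b[^>]*>)/i, $html) {
        if ($piece =~ /^<h([1-5])\b/i) {
            my $level = $1;
            # Close any open sections at this level or deeper.
            while (@open && $open[-1] >= $level) {
                pop @open;
                $out .= '</div>';
            }
            $out .= qq{<div class="sect$level">};
            push @open, $level;
        }
        $out .= $piece;
    }
    $out .= '</div>' x @open;    # close anything still open at the end
    return $out;
}

# prints: <div class="sect1"><h1>A</h1><p>x</p><div class="sect2"><h2>B</h2><p>y</p></div></div><div class="sect1"><h1>C</h1></div>
print wrap_sections('<h1>A</h1><p>x</p><h2>B</h2><p>y</p><h1>C</h1>'), "\n";
```

A regex split is enough for a sketch, but it will misfire on headers inside comments or CDATA, which is one reason a real implementation would use a proper parser.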
REQUIRES
Cwd
File::Basename
File::Spec
XML::LibXML
XML::LibXSLT
HTML::SimpleParse
Test::More
INSTALLATION
To install this module, run the following commands:
perl Build.PL
./Build
./Build test
./Build install
Or, if you're on a platform (like DOS or Windows) that doesn't like the "./" notation, you can do this:
perl Build.PL
perl Build
perl Build test
perl Build install
To install somewhere other than the default, such as a directory under your home directory like "/home/fred/perl", run
perl Build.PL --install_base /home/fred/perl
as the first step instead.
This will install the files underneath /home/fred/perl.
You will then need to update the PERL5LIB variable so that Perl can find the modules, and the PATH variable so that your shell can find the script:
- add /home/fred/perl/script (where the script will be) to your PATH:
export PATH=/home/fred/perl/script:${PATH}
- add /home/fred/perl/lib to PERL5LIB:
export PERL5LIB=/home/fred/perl/lib:${PERL5LIB}
SEE ALSO
perl(1).
BUGS
Please report any bugs or feature requests to the author.
AUTHOR
Kathryn Andersen (RUBYKAT)
perlkat AT katspace dot com
http://www.katspace.org/tools
COPYRIGHT AND LICENCE
XSLT stylesheet based on the one at http://wiki.docbook.org/topic/Html2DocBook by Jeff Beal
Copyright (c) 2006 by Kathryn Andersen
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.