NAME
HTML::HTML5::ToText - convert HTML to plain text
SYNOPSIS
  use HTML::HTML5::Parser;
  use HTML::HTML5::ToText;

  my $dom = HTML::HTML5::Parser->load_html(IO => \*STDIN);
  print HTML::HTML5::ToText
    ->with_traits(qw/ShowLinks ShowImages RenderTables/)
    ->new()
    ->process($dom);
DESCRIPTION
The HTML::HTML5::ToText module itself produces a pretty boring conversion of HTML to text, but thanks to Moose and MooseX::Traits it can be composed with traits that enhance the output.
Compositor
with_traits(@traits)
This class method creates a new class that composes HTML::HTML5::ToText with each trait given, returning the name of that class. That class will be a subclass of HTML::HTML5::ToText.
Traits are taken to be in the "HTML::HTML5::ToText::Trait" namespace unless overridden by prefixing the trait with "+".
Constructors
new(%attrs)
Creates a new instance of the class.
new_with_traits(traits => \@traits, %attrs)
Shortcut for:
  HTML::HTML5::ToText->with_traits(@traits)->new(%attrs)
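The two construction routes described above can be sketched as follows. This is a minimal illustration that assumes the ShowLinks and ShowImages traits shipped with the distribution are installed; the variable names are arbitrary.

```perl
use strict;
use warnings;
use HTML::HTML5::ToText;

# with_traits returns the name of a newly composed class (a string),
# not an object.
my $class = HTML::HTML5::ToText->with_traits(qw/ShowLinks ShowImages/);
print "composed class is a ToText subclass\n"
    if $class->isa('HTML::HTML5::ToText');

# Route 1: compose first, then construct.
my $converter_a = $class->new;

# Route 2: compose and construct in one call.
my $converter_b = HTML::HTML5::ToText->new_with_traits(
    traits => [qw/ShowLinks ShowImages/],
);
```

Keeping the class name from with_traits around is handy when several converter objects with the same trait set are needed.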
Attributes
As per usual for Moose classes, accessor methods are provided for each attribute, and attributes may be set in the constructor.
HTML::HTML5::ToText does not actually provide any attributes, but some traits may.
Methods
process($node)
Processes an XML::LibXML::Node and returns a string. May be called as a class or object method.
Because process likes to perform some alterations to the DOM tree, as a first stage it makes a clone of the DOM tree (so that it can leave the original intact). If you don't care about any changes to the tree, and want to save a bit of CPU, then you can suppress the cloning by passing a true value as a second argument to process.
  HTML::HTML5::ToText->process($node, 'no_clone')
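As a rough sketch of the cloning behaviour, the snippet below converts the same document twice: once with the default clone and once without. The sample markup and variable names are illustrative only, and the exact whitespace of the returned text is not shown because it depends on the module's rendering.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use HTML::HTML5::ToText;

my $html = '<p>Hello <strong>world</strong></p>';
my $dom  = HTML::HTML5::Parser->new->parse_string($html);

# Default: process works on a clone, so $dom is left intact.
my $text = HTML::HTML5::ToText->process($dom);

# Any true second argument suppresses the clone. After this call
# $dom may have been altered, so do not rely on it being pristine.
my $text_no_clone = HTML::HTML5::ToText->process($dom, 'no_clone');

print $text;
```

Skipping the clone only makes sense when the DOM is discarded after conversion.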
process_string($string)
As per process, but first parses the string with HTML::HTML5::Parser. The second argument (for cloning) does not exist as cloning is not needed in this case.
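For input that is already a string, a minimal sketch looks like this. The sample markup is made up for illustration, and no particular output layout is assumed.

```perl
use strict;
use warnings;
use HTML::HTML5::ToText;

my $html = '<h1>Title</h1><p>First paragraph.</p>';

# process_string parses the markup itself, so no DOM needs to be built
# beforehand and there is no cloning argument.
my $text = HTML::HTML5::ToText->new->process_string($html);

print $text;
```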
There are also methods named (in upper-case) after every element defined in HTML5: STRONG($node), DL($node), IMG($node) and so on, which process($node) delegates to; and a textnode($node) method which is the equivalent for text nodes. These are the methods which traits tend to modify.
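To illustrate how a trait might modify one of these element methods, here is a sketch of a custom trait. The package name My::ToText::Trait::LoudStrong is hypothetical and not part of the distribution; it is written as an ordinary Moose role and loaded with the "+" prefix so that the default trait namespace is bypassed. The choice of asterisks as decoration is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical trait, defined inline purely for illustration.
{
    package My::ToText::Trait::LoudStrong;
    use Moose::Role;

    # Wrap whatever the composed class renders for <strong> in asterisks.
    around STRONG => sub {
        my ($orig, $self, @args) = @_;
        return '**' . $self->$orig(@args) . '**';
    };
}

use HTML::HTML5::ToText;

# The leading "+" marks a fully qualified trait name.
my $class = HTML::HTML5::ToText->with_traits('+My::ToText::Trait::LoudStrong');
my $text  = $class->new->process_string('<p>Hello <strong>world</strong></p>');

print $text;
```

Using an around modifier keeps the original rendering and only decorates its result, which means the trait can still be combined with other traits that touch the same element.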
BUGS
Please report any bugs to http://rt.cpan.org/Dist/Display.html?Queue=HTML-HTML5-ToText.
SEE ALSO
HTML::HTML5::Parser, HTML::HTML5::Table.
HTML::HTML5::ToText::Trait::RenderTables, HTML::HTML5::ToText::Trait::ShowImages, HTML::HTML5::ToText::Trait::ShowLinks, HTML::HTML5::ToText::Trait::TextFormatting.
AUTHOR
Toby Inkster <tobyink@cpan.org>.
THANKS
Everyone behind Moose. No way I could have done all this in a few hours without Moose's strange brand of meta-programming!
COPYRIGHT AND LICENCE
This software is copyright (c) 2012 by Toby Inkster.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
DISCLAIMER OF WARRANTIES
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.