NAME
HTML::HTML5::Outline - implementation of the HTML5 Outline algorithm
SYNOPSIS
use
JSON;
use
HTML::HTML5::Outline;
my
$html
=
<<'HTML';
<!doctype html>
<h1>Hello</h1>
<h2>World</h2>
<h1>Good Morning</h1>
<h2>Vietnam</h2>
HTML
my
$outline
= HTML::HTML5::Outline->new(
$html
);
to_json(
$outline
->to_hashref, {
pretty
=>1,
canonical
=>1});
DESCRIPTION
This is an implementation of the HTML5 Outline algorithm, as per http://www.w3.org/TR/html5/sections.html#outlines.
The module can output a JSON-friendly hashref, or an RDF model.
Constructor
HTML::HTML5::Outline->new($html, %options)
Construct a new outline.
$html
is the HTML to generate an outline from, either as an HTML or XHTML string, or as an XML::LibXML::Document object.Options:
default_language - default language to assume text is in when no lang/xml:lang attribute is available. e.g. 'en-gb'.
element_subjects - rather advanced feature that doesn't bear explaining. See USE WITH RDF::RDFA::PARSER for an example.
microformats - support
<ul class="xoxo">
,<ol class="xoxo">
and<whatever class="figure">
as sectioning elements (like<section>
,<figure>
, etc). Boolean, defaults to false.parser - 'html' (default) or 'xml' - choose the parser to use for XHTML/HTML. If the constructor is passed an XML::LibXML::Document, this is ignored.
suppress_collections - allows rdf:List stuff to be suppressed from RDF output. RDF output - especially in Turtle format - looks somewhat nicer without them, but if you care about the order of headings and sections, then you'll want them. Boolean, defaults to false.
uri - the document URI for resolving relative URI references. Only really used by the RDF output.
Object Methods
to_hashref
Returns data as a nested hashref/arrayref structure. Dump it as JSON and you'll figure out the format pretty easily.
to_rdf
Returns data as a n RDF::Trine::Model. Requires RDF::Trine to be installed. Otherwise this method won't exist.
primary_outlinee
Returns a HTML::HTML5::Outline::Outlinee element representing the outline for the page.
Class Methods
has_rdf
Indicates whether the
to_rdf
object method exists.
USE WITH RDF::RDFA::PARSER
This module produces RDF data where many of the resources described are HTML elements. RDFa data typically does not, but RDF::RDFa::Parser does also support some extensions to RDFa which do (e.g. support for the cite
and role
attributes). It's useful to combine the RDF data from each, and RDF::RDFa::Parser 1.093 and upwards contains a few shims to make this possible.
Without further ado...
use
HTML::HTML5::Outline;
use
RDF::RDFa::Parser 1.093;
use
RDF::TrineShortcuts;
my
$rdfa
= RDF::RDFa::Parser->new(
$html_source
,
$base_url
,
RDF::RDFa::Parser::Config->new(
'html5'
,
'1.1'
,
role_attr
=> 1,
cite_attr
=> 1,
longdesc_attr
=> 1,
),
)->consume;
my
$outline
= HTML::HTML5::Outline->new(
$rdfa
->dom,
uri
=>
$rdfa
->uri,
element_subjects
=>
$rdfa
->element_subjects,
);
# Merging two graphs is pretty complicated in RDF::Trine
# but a little easier with RDF::TrineShortcuts...
my
$combined
= rdf_parse();
rdf_parse(
$rdfa
->graph,
model
=>
$combined
);
rdf_parse(
$outline
->to_rdf,
model
=>
$combined
);
my
$NS
= {
};
rdf_string(
$combined
=>
'Turtle'
,
namespaces
=>
$NS
);
SEE ALSO
HTML::HTML5::Outline::RDF, HTML::HTML5::Outline::Outlinee, HTML::HTML5::Outline::Section.
HTML::HTML5::Parser, HTML::HTML5::Sanity.
AUTHOR
Toby Inkster, <tobyink@cpan.org>
ACKNOWLEDGEMENTS
This module is a fork of the document structure parser from Swignition <http://buzzword.org.uk/swignition/>.
That in turn includes the following credits: thanks to Ryan King and Geoffrey Sneddon for pointing me towards [the HTML5] algorithm. I also used Geoffrey's python implementation as a crib sheet to help me figure out what was supposed to happen when the HTML5 spec was ambiguous.
COPYRIGHT AND LICENCE
Copyright (C) 2008-2011 by Toby Inkster
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.