NAME
HTML::Embedded::Turtle - embedding RDF in HTML the crazy way
VERSION
0.01
SYNOPSIS
use HTML::Embedded::Turtle;
my $het = HTML::Embedded::Turtle->new($html, $base_uri);
foreach my $graph ($het->endorsements)
{
    my $model = $het->graph($graph);
    # $model is an RDF::Trine::Model. Do something with it.
}
DESCRIPTION
RDF can be embedded in (X)HTML using simple <script> tags. This is described at http://esw.w3.org/N3inHTML. This gives you a file format that can contain multiple (optionally named) graphs. The document as a whole can "endorse" a graph by including:
<link rel="meta" href="#foo" />
Here "#foo" is a fragment identifier pointing to a graph:
<script type="text/turtle" id="foo"> ... </script>
The rel="meta" stuff is parsed using an RDFa parser, so equivalent RDFa works too.
This module parses HTML files containing graphs like these, and lets you access each graph individually, as a union of all graphs on the page, or as a union of just the endorsed graphs.
Despite its name, this module supports a variety of <script type> values: text/turtle, application/turtle and application/x-turtle (Turtle); text/plain (N-Triples); application/x-rdf+json and application/json (RDF/JSON); and application/rdf+xml (RDF/XML). It does not support full N3, but it also recognises text/n3 and text/rdf+n3 and treats them as Turtle.
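As a rough illustration of the format described above, the following sketch parses a page with one endorsed and one unendorsed graph. The sample markup, the example.com URIs and the variable names are assumptions made up for this example; only new, endorsements and graph come from this documentation, and size is a method of RDF::Trine::Model.

```perl
use strict;
use warnings;
use HTML::Embedded::Turtle;

# Hypothetical sample page: graph "foo" is endorsed by the <link>,
# graph "bar" is embedded but not endorsed.
my $html = <<'HTML';
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
<link rel="meta" href="#foo" />
<script type="text/turtle" id="foo">
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.com/doc> dc:title "Endorsed title" .
</script>
<script type="text/turtle" id="bar">
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.com/doc> dc:title "Unendorsed title" .
</script>
</head>
<body><p>Hello</p></body>
</html>
HTML

# Assumed base URI; relative references such as "#foo" resolve against it.
my $base_uri = 'http://example.com/doc';
my $het      = HTML::Embedded::Turtle->new($html, $base_uri);

# Walk the endorsed graph names and report how many statements each holds.
foreach my $name ($het->endorsements)
{
    my $model = $het->graph($name);
    next unless defined $model;
    printf "%s: %d statement(s)\n", $name, $model->size;
}
```

The defined check is there because an endorsement is only a pointer; the page may not actually contain the graph it points to.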
Constructor
$het = HTML::Embedded::Turtle->new($markup, $base_uri, \%opts)
Create a new object. $markup is the HTML or XHTML markup to parse; $base_uri is the base URI to use for relative references.
Options include:
markup
Choose which parser to use: 'html' or 'xml'. The former chooses HTML::HTML5::Parser, which can handle tag soup; the latter chooses XML::LibXML, which cannot. Defaults to 'html'.
rdfa_options
A set of options to be passed to RDF::RDFa::Parser when looking for endorsements. See RDF::RDFa::Parser::Config. The default is probably sensible.
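A minimal sketch of passing options to the constructor, assuming well-formed XHTML input. The sample document and URIs are invented for illustration. Because the 'xml' parser cannot handle tag soup, the Turtle is wrapped in a CDATA section here so that its angle brackets do not break XML well-formedness.

```perl
use strict;
use warnings;
use HTML::Embedded::Turtle;

# Hypothetical well-formed XHTML document with one endorsed graph.
my $xhtml = <<'XHTML';
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example</title>
<link rel="meta" href="#foo" />
<script type="text/turtle" id="foo"><![CDATA[
<http://example.com/a> <http://example.com/b> "c" .
]]></script>
</head>
<body><p>Hello</p></body>
</html>
XHTML

# The third argument is a hashref of options; 'markup' selects the parser.
my $het = HTML::Embedded::Turtle->new(
    $xhtml,
    'http://example.com/doc',    # assumed base URI
    { markup => 'xml' },
);
```

For pages of unknown quality, leaving markup at its default of 'html' is likely the safer choice, since that parser tolerates tag soup.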
Public Methods
$het->union_graph
A union graph of all graphs found in the document, as an RDF::Trine::Model. Note that the returned model contains quads.
$het->endorsed_union_graph
A union graph of only the endorsed graphs, as an RDF::Trine::Model. Note that the returned model contains quads.
$het->graph($name)
Returns the single graph named $name from the page, as an RDF::Trine::Model.
$het->all_graphs
A hashref where the keys are graph names and the values are RDF::Trine::Models. Some graph names will be URIs, and others may be blank nodes (e.g. "_:foobar").
$het->endorsed_graphs
Like all_graphs, but only returns endorsed graphs. Note that all endorsed graphs will have graph names that are URIs.
$het->endorsements
Returns a list of URIs which are the names of endorsed graphs. Note that the presence of a URI $x in this list does not imply that $het->graph($x) will be defined.
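The methods above might be combined as in the following sketch. The sample page is hypothetical: it endorses "#foo", which exists, and "#missing", which does not, and it also contains a script element with no id, which is assumed here to be given a blank node name. The calls size, get_statements, next and as_string belong to RDF::Trine, not to this module.

```perl
use strict;
use warnings;
use HTML::Embedded::Turtle;

# Hypothetical page: two endorsements, but only one of them resolves.
my $html = <<'HTML';
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
<link rel="meta" href="#foo" />
<link rel="meta" href="#missing" />
<script type="text/turtle" id="foo">
<http://example.com/a> <http://example.com/b> "endorsed" .
</script>
<script type="text/turtle">
<http://example.com/a> <http://example.com/b> "anonymous" .
</script>
</head>
<body><p>Hello</p></body>
</html>
HTML

my $het = HTML::Embedded::Turtle->new($html, 'http://example.com/doc');

# all_graphs returns a hashref keyed by graph name (URI or blank node label).
my $all = $het->all_graphs;
foreach my $name (sort keys %$all)
{
    printf "%s has %d statement(s)\n", $name, $all->{$name}->size;
}

# The union model contains quads; passing four arguments to
# get_statements asks RDF::Trine for quads rather than triples.
my $iter = $het->union_graph->get_statements(undef, undef, undef, undef);
while (my $quad = $iter->next)
{
    print $quad->as_string, "\n";
}

# An endorsed name may not correspond to any graph on the page,
# so check the result of graph() before using it.
foreach my $uri ($het->endorsements)
{
    my $model = $het->graph($uri);
    if (defined $model)
    {
        printf "endorsed and present: %s\n", $uri;
    }
    else
    {
        printf "endorsed but not found: %s\n", $uri;
    }
}
```

When only trusted data is wanted, endorsed_union_graph or endorsed_graphs can be used in place of union_graph or all_graphs in the same way.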
BUGS
Please report any bugs to http://rt.cpan.org/.
Please forgive me in advance for inflicting this module upon you.
SEE ALSO
RDF::RDFa::Parser, RDF::Trine.
AUTHOR
Toby Inkster <tobyink@cpan.org>.
COPYRIGHT AND LICENSE
Copyright (C) 2010 by Toby Inkster
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8 or, at your option, any later version of Perl 5 you may have available.