NAME
HTML::Embedded::Turtle - embedding RDF in HTML the crazy way
SYNOPSIS
use HTML::Embedded::Turtle;
my $het = HTML::Embedded::Turtle->new($html, $base_uri);
foreach my $graph ($het->endorsements)
{
    my $model = $het->graph($graph);
    # $model is an RDF::Trine::Model. Do something with it.
}
DESCRIPTION
RDF can be embedded in (X)HTML using simple <script> tags. This is described at http://esw.w3.org/N3inHTML. This gives you a file format that can contain multiple (optionally named) graphs. The document as a whole can "endorse" a graph by including:
<link rel="meta" href="#foo" />
where "#foo" is a fragment identifier pointing to a graph:
<script type="text/turtle" id="foo"> ... </script>
The rel="meta" stuff is parsed using an RDFa parser, so equivalent RDFa works too.
This module parses HTML files containing graphs like these, and lets you access each graph individually, as a union of all graphs on the page, or as a union of just the endorsed graphs.
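As a rough illustration of the layout described above, the sketch below holds a small page with one endorsed graph and one unendorsed graph. The page content, the example.com URIs and the load_page helper are all made up for this example; the module is loaded lazily so the snippet compiles even where it is not installed.

```perl
use strict;
use warnings;

# A hypothetical page: graph "foo" is endorsed via rel="meta", graph "bar" is not.
my $html = <<'HTML';
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
<link rel="meta" href="#foo" />
<script type="text/turtle" id="foo">
<http://example.com/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
</script>
<script type="text/turtle" id="bar">
<http://example.com/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
</script>
</head>
<body><p>Hello</p></body>
</html>
HTML

# Assumed base URI for resolving "#foo" and "#bar".
my $base_uri = 'http://example.com/page';

# Hypothetical convenience wrapper around the documented constructor.
sub load_page {
    my ($markup, $uri) = @_;
    require HTML::Embedded::Turtle;   # loaded only when actually called
    return HTML::Embedded::Turtle->new($markup, $uri);
}

# With the module installed:
#   my $het = load_page($html, $base_uri);
#   print "$_\n" for $het->endorsements;
```

With the module installed, the endorsements list should contain one URI for the "foo" graph, while "bar" would be reachable only through the unfiltered accessors such as all_graphs and union_graph.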
Despite the module name, this module supports a variety of <script type>s: text/turtle, application/turtle, application/x-turtle (Turtle), text/plain (N-Triples), text/n3 (Notation 3), application/x-rdf+json (RDF/JSON), application/json (RDF/JSON), and application/rdf+xml (RDF/XML).
The deprecated attribute "language" is also supported:
<script language="Turtle" id="foo"> ... </script>
Languages supported are (case insensitive): "Turtle", "NTriples", "RDFJSON", "RDFXML" and "Notation3".
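The type and language values listed above can be pictured as two lookup tables. The following is only a sketch of that mapping, not the module's actual implementation: syntax_for_script is a hypothetical helper, the labels on the right are illustrative, and consulting "type" before "language" (and lower-casing the type) are assumptions.

```perl
use strict;
use warnings;

# Media types from the list above, mapped to an illustrative syntax label.
my %SYNTAX_FOR_TYPE = (
    'text/turtle'            => 'Turtle',
    'application/turtle'     => 'Turtle',
    'application/x-turtle'   => 'Turtle',
    'text/plain'             => 'NTriples',
    'text/n3'                => 'Notation3',
    'application/x-rdf+json' => 'RDFJSON',
    'application/json'       => 'RDFJSON',
    'application/rdf+xml'    => 'RDFXML',
);

# Values of the deprecated "language" attribute, keyed in lower case
# because they are matched case-insensitively.
my %SYNTAX_FOR_LANGUAGE = map { ( lc($_) => $_ ) }
    qw(Turtle NTriples RDFJSON RDFXML Notation3);

# Hypothetical helper: returns a syntax label, or nothing if unrecognised.
# Assumption: "type" is consulted before "language".
sub syntax_for_script {
    my (%attr) = @_;
    if (defined $attr{type}) {
        my $syntax = $SYNTAX_FOR_TYPE{ lc $attr{type} };
        return $syntax if defined $syntax;
    }
    if (defined $attr{language}) {
        return $SYNTAX_FOR_LANGUAGE{ lc $attr{language} };
    }
    return;
}
```

For instance, syntax_for_script(language => 'turtle') and syntax_for_script(type => 'application/x-turtle') both yield the same label in this sketch.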
Constructor
HTML::Embedded::Turtle->new($markup, $base_uri, \%opts)
-
Create a new object. $markup is the HTML or XHTML markup to parse; $base_uri is the base URI to use for relative references.
Options include:
markup
Choose which parser to use: 'html' or 'xml'. The former chooses HTML::HTML5::Parser, which can handle tag soup; the latter chooses XML::LibXML, which cannot. Defaults to 'html'.
rdfa_options
A set of options to be passed to RDF::RDFa::Parser when looking for endorsements. See RDF::RDFa::Parser::Config. The default is probably sensible.
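A minimal sketch of passing the third constructor argument, assuming the caller already knows whether the markup is well-formed XML. The helper names parser_options and parse_page are hypothetical; only the 'markup' key and its two values come from the option list above, and rdfa_options is left at its default.

```perl
use strict;
use warnings;

# Hypothetical helper: build the \%opts hashref for the constructor.
# 'xml' selects XML::LibXML (strict); 'html' selects HTML::HTML5::Parser.
sub parser_options {
    my ($is_well_formed_xml) = @_;
    return { markup => ($is_well_formed_xml ? 'xml' : 'html') };
}

# Hypothetical wrapper around HTML::Embedded::Turtle->new.
sub parse_page {
    my ($markup, $base_uri, $is_well_formed_xml) = @_;
    require HTML::Embedded::Turtle;   # loaded only when actually called
    return HTML::Embedded::Turtle->new(
        $markup, $base_uri, parser_options($is_well_formed_xml),
    );
}
```

Since 'html' is the documented default, the options hashref only matters in this sketch when the stricter XML parser is wanted.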
Public Methods
union_graph
-
A union graph of all graphs found in the document, as an RDF::Trine::Model. Note that the returned model contains quads.
endorsed_union_graph
-
A union graph of only the endorsed graphs, as an RDF::Trine::Model. Note that the returned model contains quads.
graph($name)
-
Returns a single graph from the page, identified by $name, as an RDF::Trine::Model.
graphs
all_graphs
-
A hashref where the keys are graph names and the values are RDF::Trine::Models. Some graph names will be URIs, and others may be blank nodes (e.g. "_:foobar"). graphs and all_graphs are aliases for each other.
endorsed_graphs
-
Like all_graphs, but only returns endorsed graphs. Note that all endorsed graphs will have graph names that are URIs.
endorsements
-
Returns a list of URIs which are the names of endorsed graphs. Note that the presence of a URI $x in this list does not imply that $het->graph($x) will be defined.
dom
-
Returns the page DOM.
uri
-
Returns the page URI.
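The methods above can be combined in small helpers. The sketch below is illustrative only: union_sizes and dangling_endorsements are hypothetical names, and $het is assumed to be an HTML::Embedded::Turtle object built elsewhere. The first helper compares the two union graphs using the size method of RDF::Trine::Model; the second follows the caveat on endorsements by listing endorsed URIs for which graph returns nothing.

```perl
use strict;
use warnings;

# $het is assumed to be an HTML::Embedded::Turtle object built elsewhere,
# e.g. HTML::Embedded::Turtle->new($html, $base_uri).

# Hypothetical helper: statement counts for the two union graphs.
sub union_sizes {
    my ($het) = @_;
    return (
        all      => $het->union_graph->size,           # every graph on the page
        endorsed => $het->endorsed_union_graph->size,  # endorsed graphs only
    );
}

# Hypothetical helper: endorsed URIs for which graph() is not defined.
sub dangling_endorsements {
    my ($het) = @_;
    return grep { !defined( $het->graph($_) ) } $het->endorsements;
}
```

A caller might warn about each URI returned by dangling_endorsements before iterating over the endorsed graphs, rather than assuming every endorsement resolves to a model.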
BUGS
Please report any bugs to http://rt.cpan.org/.
Please forgive me in advance for inflicting this module upon you.
SEE ALSO
RDF::RDFa::Parser, RDF::Trine, RDF::TriN3.
AUTHOR
Toby Inkster <tobyink@cpan.org>.
COPYRIGHT AND LICENSE
Copyright (C) 2010-2011 by Toby Inkster.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
DISCLAIMER OF WARRANTIES
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.