NAME
HTML::Microformats - parse microformats in HTML
SYNOPSIS
use HTML::Microformats;
my $doc = HTML::Microformats
    ->new_document($html, $uri)
    ->assume_profile(qw(hCard hCalendar));
print $doc->json(pretty => 1);
use RDF::TrineShortcuts qw(rdf_query);
my $results = rdf_query($sparql, $doc->model);
VERSION
0.00_01
DESCRIPTION
The HTML::Microformats module is a wrapper for parser and handler
modules of various individual microformats (each of those modules has a
name like HTML::Microformats::Foo).
The general pattern of usage is to create an HTML::Microformats object
(which corresponds to an HTML document) using the "new_document" method;
then ask for the data, as a Perl hashref, a JSON string, or an
RDF::Trine model.
Constructor
"$doc = HTML::Microformats->new_document($html, $uri, %opts)"
Constructs a document object.
$html is the HTML or XHTML source (string) or an
XML::LibXML::Document.
$uri is the document URI, important for resolving relative URL
references.
%opts are additional parameters; currently only one option is
defined: $opts{'type'} may be set to 'text/html' or
'application/xhtml+xml' to control how $html is parsed.
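As a rough sketch of the constructor call, the helpers below choose a value for the 'type' option and pass it to "new_document". The names guess_type and load_microformats_doc, and the idea of guessing the type from a file extension, are illustrative assumptions and not part of the module; only "new_document" and the 'type' option come from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Microformats): guess the media
# type from a file name so that new_document knows how to parse $html.
sub guess_type {
    my ($filename) = @_;
    return $filename =~ /\.(?:xhtml|xht)\z/i
        ? 'application/xhtml+xml'
        : 'text/html';
}

# Hypothetical wrapper: build a document object with an explicit 'type'.
# The module is loaded lazily, so it is only required when this is called.
sub load_microformats_doc {
    my ($html, $uri, $filename) = @_;
    require HTML::Microformats;
    return HTML::Microformats->new_document(
        $html, $uri,
        type => guess_type($filename),
    );
}
```

A caller would then write something like load_microformats_doc($html, $uri, 'page.xhtml') and continue with the returned document object as usual.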
Profile Management
HTML::Microformats uses HTML profiles (i.e. the profile attribute on the
HTML <head> element) to detect which microformats are used on a page.
A microformat whose profile URI has not been declared will not be
parsed.
Because many pages fail to properly declare which profiles they use,
there are various profile management methods to tell HTML::Microformats
to assume the presence of particular profile URIs, even if they're
actually missing.
"$doc->profiles"
This method returns a list of profile URIs declared by the document.
"$doc->has_profile(@profiles)"
This method returns true if and only if one or more of the profile
URIs in @profiles is declared by the document.
"$doc->add_profile(@profiles)"
Using "add_profile" you can add one or more profile URIs, and they
are treated as if they were found on the document.
For example:
$doc->add_profile('http://microformats.org/profile/rel-tag')
This is useful for adding profile URIs declared outside the document
itself (e.g. in HTTP headers).
"$doc->assume_profile(@microformats)"
This method acts similarly to "add_profile" but allows you to use
names of microformats rather than URIs.
For example:
$doc->assume_profile(qw(hCard adr geo))
Microformat names are case sensitive, and must match
HTML::Microformats::Foo module names.
"$doc->assume_all_profiles"
This method is equivalent to calling "assume_profile" for all known
microformats.
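The profile methods can be combined. The following is a minimal sketch, assuming a document object that behaves as described above: it checks each wanted profile URI with "has_profile" and adds only the missing ones with "add_profile". The helper name declare_missing_profiles is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: add any of @wanted_uris that the document does not
# already declare, and report which ones had to be added.
sub declare_missing_profiles {
    my ($doc, @wanted_uris) = @_;

    # has_profile is called with one URI at a time, so it answers
    # "is this particular profile declared?"
    my @missing = grep { !$doc->has_profile($_) } @wanted_uris;

    # Treat the missing profiles as if they were found on the document.
    $doc->add_profile(@missing) if @missing;

    return @missing;
}
```

For example, declare_missing_profiles($doc, 'http://microformats.org/profile/rel-tag') returns an empty list when the page already declares the rel-tag profile, and otherwise adds it and returns it.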
Parsing Microformats
Generally speaking, you can skip this step: the "data", "json" and
"model" methods will automatically parse the document for you.
"$doc->parse_microformats"
Scans through the document, finding microformat objects.
On subsequent calls, does nothing (as everything is already parsed).
"$doc->clear_microformats"
Forgets information gleaned by "parse_microformats" and thus allows
"parse_microformats" to be run again. This is useful if you've
added some profiles between runs of "parse_microformats".
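The clear-and-reparse cycle might look like the sketch below. It assumes a document that has already been parsed once and now needs further microformats to be recognised; the helper name reparse_with is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: assume extra microformats on an already-parsed
# document, then force a fresh parse so that they are picked up.
sub reparse_with {
    my ($doc, @formats) = @_;
    $doc->assume_profile(@formats);    # widen the set of assumed profiles
    $doc->clear_microformats;          # forget the results of the earlier parse
    $doc->parse_microformats;          # scan the document again
    return $doc;
}
```

Without the call to "clear_microformats", the second "parse_microformats" would do nothing, because the document counts as already parsed.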
Retrieving Data
These methods allow you to retrieve the document's data, and do things
with it.
"$doc->objects($format)"
$format is, for example, 'hCard', 'adr' or 'RelTag'.
Returns a list of objects of that type. (If called in scalar
context, returns an arrayref.)
Each object is, for example, an HTML::Microformats::hCard object, or
an HTML::Microformats::RelTag object, etc. See the relevant
documentation for details.
"$doc->all_objects"
Returns a hashref of data. Each hashref key is the name of a
microformat (e.g. 'hCard', 'RelTag', etc), and the values are
arrayrefs of objects.
Each object is, for example, an HTML::Microformats::hCard object, or
an HTML::Microformats::RelTag object, etc. See the relevant
documentation for details.
"$doc->json(%opts)"
Returns data roughly equivalent to the "all_objects" method, but as
a JSON string.
%opts is a hash of options, suitable for passing to the JSON
module's to_json function.
"$doc->model"
Returns data as an RDF::Trine::Model, suitable for serialising as
RDF or running SPARQL queries.
"$doc->add_to_model($model)"
Adds data to an existing RDF::Trine::Model.
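As an illustration of the retrieval methods, here is a small sketch built on the return shapes described above. Both helper names (count_objects and merge_into_model) are hypothetical and not part of the module, and the model passed to merge_into_model is assumed to be an RDF::Trine::Model created by the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise how many objects of each microformat
# were found, using the hashref shape described for all_objects.
sub count_objects {
    my ($doc) = @_;
    my $all = $doc->all_objects;    # { 'hCard' => [ ... ], 'RelTag' => [ ... ] }
    my %count = map { ($_ => scalar @{ $all->{$_} }) } keys %$all;
    return \%count;
}

# Hypothetical helper: pool the data from several documents into one
# existing model (supplied by the caller) by calling add_to_model on each.
sub merge_into_model {
    my ($model, @docs) = @_;
    $_->add_to_model($model) for @docs;
    return $model;
}
```

The first helper gives a quick overview of what a page contains. The second is one way to query several pages together, since every document writes into the same model.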
WHY ANOTHER MICROFORMATS MODULE?
There already exist two microformats packages on CPAN (see
Text::Microformat and Data::Microformat), so why create another?
Firstly, HTML::Microformats isn't being created from scratch. It's
actually a fork/clean-up of a non-CPAN application (Swignition), and in
that sense predates Text::Microformat (though not Data::Microformat).
It has a number of other features that distinguish it from the existing
packages:
* It supports more formats.
Swignition (and eventually HTML::Microformats) supports hCard,
hCalendar, rel-tag, geo, adr, rel-enclosure, rel-license, hReview,
hResume, hRecipe, xFolk, XFN and more.
* It supports more patterns.
HTML::Microformats supports the include pattern, abbr pattern, table
cell header pattern, value excerpting and other intricacies of
microformat parsing better than the other modules on CPAN.
* It offers RDF support.
One of the key features of HTML::Microformats is that it makes data
available as RDF::Trine models. This allows your application to
benefit from a rich, feature-laden Semantic Web toolkit. Data
gleaned from microformats can be stored in a triple store; output in
RDF/XML or Turtle; queried using the SPARQL or RDQL query languages;
and more.
If you're not comfortable using RDF, HTML::Microformats also makes
all its data available as native Perl objects.
BUGS
Please report any bugs to <http://rt.cpan.org/>.
SEE ALSO
RDF::RDFa::Parser, HTML::HTML5::Microdata::Parser.
<http://www.perlrdf.org/>.
Individual microformat modules:
* HTML::Microformats::adr
* HTML::Microformats::geo
* HTML::Microformats::hCard
* HTML::Microformats::RelTag
* HTML::Microformats::XFN
AUTHOR
Toby Inkster <tobyink@cpan.org>.
COPYRIGHT
Copyright 2008-2010 Toby Inkster
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.