NAME
Text::Microformat - A Microformat parser
VERSION
Version 0.02
SYNOPSIS
use Text::Microformat;
use LWP::Simple;
# Parse a document
my $doc = Text::Microformat->new(
get('http://phil.windley.org/hcard.html')
);
# Extract all known Microformats
my @formats = $doc->find;
my $hcard = shift @formats;
# Easiest way to get a value (returns the first one found, else undef)
my $full_name = $hcard->Get('fn');
my $family_name = $hcard->Get('n.family-name');
my $city = $hcard->Get('adr.locality');
# Get the human-readable version specifically
my $family_name_h = $hcard->GetH('n.family-name');
# Get the machine-readable version specifically
my $family_name_m = $hcard->GetM('n.family-name');
# The more powerful interface (access multiple properties)
my $first_family_name = $hcard->n->[0]->family_name->[0]->Value;
# Dump to a hash
my $hash = $hcard->AsHash;
# Dump to YAML
print $hcard->ToYAML, "\n";
# Free the document and all the formats
$doc->delete;
DESCRIPTION
Text::Microformat is a Microformat parser for Perl.
Text::Microformat has a pluggable API. New kinds of Microformats can be added, and the parser itself can be extended to support new parsing metaphors and source document encodings.
FEATURES
Extracting Microformats from HTML, XHTML and XML
Extracting Microformats from entity-encoded or CDATA sections in RSS feeds
The include pattern
Microformats built from other Microformats
SUPPORTED MICROFORMATS
hCard
OTHER SUPPORTED SEMANTIC MARKUP
hGrant
METHODS
new($content, %opts)
Parses the string $content and creates a new Text::Microformat object.
Recognized options:
content_type => 'text/html'
Specify the content type. Any content type containing 'html' invokes the HTML parser, and any content type containing 'xml' invokes the XML parser. Defaults to 'text/html'. (See HTML::TreeBuilder and XML::TreeBuilder.)
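The content-type rule can be sketched in plain Perl. This only illustrates the substring matching described above; it is not the module's actual code. pick_parser is a hypothetical helper, the match is assumed to be case-insensitive, and testing for 'html' before 'xml' (which decides how a type such as application/xhtml+xml is handled) is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Microformat: choose a parser
# name from a content type, following the rule described above.
sub pick_parser {
    my ($content_type) = @_;
    $content_type = 'text/html' unless defined $content_type;  # documented default
    return 'HTML::TreeBuilder' if $content_type =~ /html/i;     # assumed: checked first
    return 'XML::TreeBuilder'  if $content_type =~ /xml/i;
    return;    # no parser claims this content type
}

for my $type ('text/html', 'application/rss+xml', 'application/xhtml+xml', 'text/plain') {
    my $parser = pick_parser($type);
    printf "%-24s => %s\n", $type, defined $parser ? $parser : '(none)';
}
```

With the real module, the option is passed as new($content, content_type => 'application/rss+xml').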
find()
Returns a list of all known Microformats found in the document.
delete()
Deletes the underlying parse tree, which HTML::TreeBuilder requires in order to free its memory. The behavior of Text::Microformat::Element::* objects is undefined after this method is called.
EXTENDING Text::Microformat
CREATING A NEW FORMAT
This is as easy as creating a new module in the Text::Microformat::Element::* namespace with Text::Microformat::Element as its superclass. It will be auto-loaded by Text::Microformat.
Every Microformat element has its own auto-generated namespace, for example:
Text::Microformat::Element::hCard::n::family_name
So it's easy to override the default behavior of Text::Microformat::Element via inheritance.
See existing formats for hints.
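The namespace shape can be illustrated with a small helper. element_namespace is a hypothetical function written for this sketch, and the hyphen-to-underscore mapping is inferred from the property name 'family-name' and the namespace component family_name shown in this document. The module's real name generation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Microformat: build the
# per-element namespace for a format name and a dotted property path.
sub element_namespace {
    my ($format, $path) = @_;
    my @parts = map { my $p = $_; $p =~ tr/-/_/; $p } split /\./, $path;
    return join '::', 'Text::Microformat::Element', $format, @parts;
}

print element_namespace('hCard', 'n.family-name'), "\n";
# prints Text::Microformat::Element::hCard::n::family_name
```

A package declared under such a name can then inherit from Text::Microformat::Element and override only the methods it needs.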
CREATING A PLUGIN
This is as easy as creating a new module in the Text::Microformat::Plugin::* namespace. It will be auto-loaded by Text::Microformat. Text::Microformat has several processing phases, and uses NEXT to traverse the plugin chain.
Current processing phases are, in order of execution:
defaults
Set default options in $c->opts
pre_parse
Pre-parsing activities (Operations on the document source, perhaps)
parse
Parsing - at least one plugin must parse $c->content into $c->tree
post_parse
Post-parsing activities (E.g. the include pattern happens here)
pre_find_formats
Before looking for Microformats
find_formats
Populate the $c->formats array with Text::Microformat::Element objects
post_find_formats
After looking for Microformats
A plugin may add handlers to one or more phases.
See existing plugins for hints.
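The way NEXT passes a phase along a chain of plugins can be sketched as follows. This is a minimal, self-contained illustration, not Text::Microformat's internals: the Demo::* packages, the hash fields and the run method are all hypothetical stand-ins, and only two of the phases are exercised.

```perl
use strict;
use warnings;
use NEXT;

# Two stand-in plugins (hypothetical names). Each handles a phase and
# then passes control down the chain with NEXT.
package Demo::Plugin::Trim;
sub pre_parse {
    my $c = shift;
    $c->{content} =~ s/^\s+|\s+$//g;    # operate on the document source
    push @{ $c->{trace} }, 'Trim::pre_parse';
    return $c->NEXT::pre_parse(@_);
}

package Demo::Plugin::Count;
sub pre_parse {
    my $c = shift;
    push @{ $c->{trace} }, 'Count::pre_parse';
    return $c->NEXT::pre_parse(@_);
}
sub post_parse {
    my $c = shift;
    push @{ $c->{trace} }, 'Count::post_parse';
    return $c->NEXT::post_parse(@_);
}

# Stand-in for the object that owns the phases. The plugins sit in @ISA
# so that NEXT can walk them from left to right.
package Demo::Context;
our @ISA = ('Demo::Plugin::Trim', 'Demo::Plugin::Count');
sub new {
    my ($class, $content) = @_;
    return bless { content => $content, trace => [] }, $class;
}
sub run {
    my $c = shift;
    for my $phase (qw(pre_parse post_parse)) {
        $c->$phase() if $c->can($phase);    # skip phases with no handler
    }
    return $c;
}

package main;
my $c = Demo::Context->new("  <html/>  ")->run;
print join(', ', @{ $c->{trace} }), "\n";
```

Both plugins see pre_parse in @ISA order, while post_parse reaches only the plugin that defines it. When no further handler exists, the NEXT call returns quietly, so a handler does not need to know whether it is last in the chain.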
TODO
Documentation!
Add more formats
Add filtering options to the find() method
Parsing and format-finding performance could definitely be improved
SEE ALSO
HTML::TreeBuilder, XML::TreeBuilder, http://microformats.org
AUTHOR
Keith Grennan, <kgrennan at cpan.org>
BUGS
Log bugs and feature requests here: http://code.google.com/p/ufperl/issues/list
SUPPORT
Project homepage: http://code.google.com/p/ufperl/
COPYRIGHT & LICENSE
Copyright 2007 Keith Grennan, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.