NAME
HTML::DublinCore - Extract Dublin Core metadata from HTML
SYNOPSIS
use HTML::DublinCore;
## pass HTML to constructor
my $dc = HTML::DublinCore->new( $html );
## get the title element and print its content
my $title = $dc->title();
print "title: ",$title->content(),"\n";
## list context will retrieve all of a particular element
foreach my $element ( $dc->creator() ) {
print "creator: ",$element->content(),"\n";
}
DESCRIPTION
HTML::DublinCore is a module for easily extracting Dublin Core metadata from within HTML documents. The Dublin Core is a small set of metadata elements for describing information resources. Dublin Core is typically embedded in the <HEAD> of an HTML document using the <META> tag. For more information see RFC 2731 http://www.ietf.org/rfc/rfc2731
HTML::DublinCore allows you to extract and work with the Dublin Core metadata found in a particular HTML document. For definitions of the individual Dublin Core elements, please see http://www.dublincore.org/documents/dces/
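As a rough illustration of the embedding described above, the sketch below feeds a small page with RFC 2731 style <META> tags to the module. The sample page and its values are invented for this example, and it assumes the module matches names of the form "DC.Element"; only calls documented in this manual are used.

```perl
use strict;
use warnings;
use HTML::DublinCore;

# Hypothetical sample page; the title, names, and date are made up.
my $html = <<'HTML';
<html>
<head>
<title>Example Page</title>
<meta name="DC.Title" content="An Example Page">
<meta name="DC.Creator" content="Doe, Jane">
<meta name="DC.Creator" content="Roe, Richard">
<meta name="DC.Date" content="2003-01-15">
</head>
<body><p>Hello.</p></body>
</html>
HTML

my $dc = HTML::DublinCore->new( $html );

# Scalar context: a single element object.
my $title = $dc->title();
print "title: ", $title->content(), "\n";

# List context: every creator element found in the page.
my @creators = $dc->creator();
foreach my $creator ( @creators ) {
    print "creator: ", $creator->content(), "\n";
}
```

Ordinary tags such as <TITLE> are not Dublin Core metadata, so only the <META> entries are expected to show up in the results.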
METHODS
new()
Constructor. Pass it a string of HTML content.
$dc = HTML::DublinCore->new( $html );
title()
Returns an HTML::DublinCore::Element object for the title element. You can then retrieve the content, qualifier, scheme, and language attributes like so.
my $dc = HTML::DublinCore->new( $html );
my $title = $dc->title();
print "content: ",$title->content(),"\n";
print "qualifier: ",$title->qualifier(),"\n";
print "scheme: ",$title->scheme(),"\n";
print "language: ",$title->language(),"\n";
Since there can be multiple instances of a particular element type (title, creator, subject, etc.), you can retrieve all of the title elements by calling title() in list context.
my @titles = $dc->title();
foreach my $title ( @titles ) {
print "title: ",$title->content(),"\n";
}
creator()
Retrieve creator information in the same manner as title().
subject()
Retrieve subject information in the same manner as title().
description()
Retrieve description information in the same manner as title().
publisher()
Retrieve publisher information in the same manner as title().
contributor()
Retrieve contributor information in the same manner as title().
date()
Retrieve date information in the same manner as title().
type()
Retrieve type information in the same manner as title().
format()
Retrieve format information in the same manner as title().
identifier()
Retrieve identifier information in the same manner as title().
source()
Retrieve source information in the same manner as title().
language()
Retrieve language information in the same manner as title().
relation()
Retrieve relation information in the same manner as title().
coverage()
Retrieve coverage information in the same manner as title().
rights()
Retrieve rights information in the same manner as title().
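Because every accessor above behaves like title(), one way to dump all the metadata in a document is to loop over the accessor names and call each in list context. This is a minimal sketch, not part of the module: the sample HTML, the @names list, and the %found hash are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::DublinCore;

# Hypothetical input; the values are invented for illustration.
my $html = <<'HTML';
<html><head>
<meta name="DC.Title" content="An Example Page">
<meta name="DC.Subject" content="metadata">
<meta name="DC.Subject" content="perl">
</head><body></body></html>
HTML

my $dc = HTML::DublinCore->new( $html );

# The fifteen element accessors documented in this manual.
my @names = qw(
    title creator subject description publisher contributor date type
    format identifier source language relation coverage rights
);

my %found;
foreach my $name ( @names ) {
    # foreach imposes list context, so every instance is returned.
    foreach my $element ( $dc->$name() ) {
        next unless defined $element;    # defensive guard for absent elements
        push @{ $found{$name} }, $element->content();
    }
}

foreach my $name ( sort keys %found ) {
    print "$name: $_\n" for @{ $found{$name} };
}
```

Calling the method through a variable ($dc->$name()) keeps the loop short; writing out each accessor call by hand would work equally well.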
asHtml()
Serialize your Dublin Core metadata as HTML <META> tags.
print $dc->asHtml();
TODO
More comprehensive tests.
Handle HTML entities properly.
Collect error messages so they can be reported out of the object.
SEE ALSO
Dublin Core http://www.dublincore.org/
RFC 2731 http://www.ietf.org/rfc/rfc2731
HTML::Parser
perl4lib http://www.rice.edu/perl4lib
AUTHOR
Ed Summers <ehs@pobox.com>
COPYRIGHT AND LICENSE
Copyright 2003 by Ed Summers
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.