NAME

HTML::DublinCore - Extract Dublin Core metadata from HTML

SYNOPSIS

use HTML::DublinCore;

## pass HTML to constructor
my $dc = HTML::DublinCore->new( $html );

## get the title element and print its content
my $title = $dc->title();
print "title: ",$title->content(),"\n";

## list context will retrieve all of a particular element
foreach my $element ( $dc->creator() ) {
    print "creator: ",$element->content(),"\n";
}

DESCRIPTION

HTML::DublinCore is a module for extracting Dublin Core metadata from within HTML documents. The Dublin Core is a small set of metadata elements for describing information resources. Dublin Core is typically embedded in the <HEAD> of an HTML document using the <META> tag. For more information see RFC 2731: http://www.ietf.org/rfc/rfc2731

HTML::DublinCore lets you extract and work with the Dublin Core metadata found in a particular HTML document. For definitions of the various Dublin Core elements, see http://www.dublincore.org/documents/dces/
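As a rough illustration of the embedding described above, the sketch below builds a small HTML document whose <HEAD> carries Dublin Core <META> tags and hands it to the module. The sample values (the page title, the two creator names, the W3CDTF scheme label) are made up for the example, and the sketch assumes HTML::DublinCore is installed and recognizes names of the form DC.element.

```perl
use strict;
use warnings;
use HTML::DublinCore;

# Hypothetical sample document; every value here is illustrative only.
my $html = <<'END_HTML';
<html>
<head>
<title>Example</title>
<meta name="DC.title" content="A Sample Page">
<meta name="DC.creator" content="Jane Doe">
<meta name="DC.creator" content="John Roe">
<meta name="DC.date" scheme="W3CDTF" content="2003-01-15">
</head>
<body></body>
</html>
END_HTML

my $dc = HTML::DublinCore->new( $html );

# Scalar context: a single title element.
my $title = $dc->title();
print "title: ", $title->content(), "\n";

# List context: every creator element in the document.
my @creators = $dc->creator();
print "creator: ", $_->content(), "\n" foreach @creators;
```

Repeating a <META> tag with the same name, as done for DC.creator here, is how a document supplies several values for one element.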

METHODS

new()

Constructor. Pass it the HTML content to extract metadata from.

$dc = HTML::DublinCore->new( $html );
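Since the constructor takes the HTML content itself, a document stored on disk has to be read into a string first. The following is a minimal sketch using only core Perl for the file handling; slurp_html() is a hypothetical helper and not part of the module, and the path is taken from the command line.

```perl
use strict;
use warnings;
use HTML::DublinCore;

# Hypothetical helper (not part of HTML::DublinCore):
# read an entire file into a single string.
sub slurp_html {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "cannot open $path: $!";
    local $/;    # no record separator, so <$fh> returns the whole file
    my $html = <$fh>;
    close($fh);
    return $html;
}

# Only runs when a file name is supplied, e.g.: perl script.pl page.html
if ( @ARGV ) {
    my $dc    = HTML::DublinCore->new( slurp_html( $ARGV[0] ) );
    my $title = $dc->title();
    print "title: ", $title->content(), "\n" if defined $title;
}
```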

title()

Returns an HTML::DublinCore::Element object for the title element. You can then retrieve its content, qualifier, scheme, and language attributes like so:

my $dc = HTML::DublinCore->new( $html );
my $title = $dc->title();
print "content: ",$title->content(),"\n";
print "qualifier: ",$title->qualifier(),"\n";
print "scheme: ",$title->scheme(),"\n";
print "language: ",$title->language(),"\n";

Since there can be multiple instances of a particular element type (title, creator, subject, etc.), you can retrieve all of the title elements by calling title() in list context.

    my @titles = $dc->title();
    foreach my $title ( @titles ) {
        print "title: ",$title->content(),"\n";
    }

creator()

Retrieve creator information in the same manner as title().

subject()

Retrieve subject information in the same manner as title().

description()

Retrieve description information in the same manner as title().

publisher()

Retrieve publisher information in the same manner as title().

contributor()

Retrieve contributor information in the same manner as title().

date()

Retrieve date information in the same manner as title().

type()

Retrieve type information in the same manner as title().

format()

Retrieve format information in the same manner as title().

identifier()

Retrieve identifier information in the same manner as title().

source()

Retrieve source information in the same manner as title().

language()

Retrieve language information in the same manner as title().

relation()

Retrieve relation information in the same manner as title().

coverage()

Retrieve coverage information in the same manner as title().

rights()

Retrieve rights information in the same manner as title().
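Because all fifteen accessors behave like title(), they can be driven from a list of names. The sketch below is one way to dump every element found in a document; dc_lines() is a hypothetical helper, the sample HTML is illustrative, and the defined() check is a defensive assumption about what an accessor returns when an element is absent.

```perl
use strict;
use warnings;
use HTML::DublinCore;

# The fifteen element accessors documented above.
my @ELEMENTS = qw(
    title creator subject description publisher contributor date type
    format identifier source language relation coverage rights
);

# Hypothetical helper (not part of the module): return one
# "name: content" string for every element instance found.
sub dc_lines {
    my ($dc) = @_;
    my @lines;
    foreach my $name (@ELEMENTS) {
        # Calling the accessor in list context asks for every instance.
        foreach my $element ( $dc->$name() ) {
            next unless defined $element;    # defensive: element not present
            push @lines, "$name: " . $element->content();
        }
    }
    return @lines;
}

# Illustrative input only.
my $dc = HTML::DublinCore->new(
    '<html><head>'
  . '<meta name="DC.title" content="A Sample Page">'
  . '<meta name="DC.subject" content="metadata">'
  . '<meta name="DC.subject" content="perl">'
  . '</head><body></body></html>'
);

print "$_\n" foreach dc_lines($dc);
```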

asHtml()

Serialize your Dublin Core metadata as HTML <META> tags.

print $dc->asHtml();

TODO

  • More comprehensive tests.

  • Handle HTML entities properly.

  • Collect error messages so they can be reported out of the object.

SEE ALSO

AUTHOR

Ed Summers <ehs@pobox.com>

COPYRIGHT AND LICENSE

Copyright 2003 by Ed Summers

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.