NAME

MediaWiki::FastDump - Fast and easy access to the pages and titles from a MediaWiki XML dump file.

VERSION

Version 0.01

SYNOPSIS

use MediaWiki::FastDump;

# Construct from any one of: a local file name, a URL, or an open file handle.
my $p = MediaWiki::FastDump->new($filename);
# my $p = MediaWiki::FastDump->new($url);
# my $p = MediaWiki::FastDump->new(\*FILEHANDLE);

while(my ($title, $article) = $p->next) {
	print "Title: $title\n";
	print "$article\n";
}

FUNCTIONS

new

This is the constructor for this module. It is called with a single parameter: either the location of a MediaWiki XML page dump file or a reference to an already open file handle. The location may be either a path on the local filesystem or a URL.

next

This method returns a two-item list where the first item is the page title and the second item is the page text. When there are no more pages left, it returns an empty list.
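The contract described above (a title and text pair per call, then an empty list) can be sketched as follows. This is only an illustration under stated assumptions: My::StubDump is a hypothetical stand-in that mimics the documented behavior of next so the example runs without a dump file, and titles_matching is a hypothetical helper, not part of this module. Any object with a compatible next method, including one returned by MediaWiki::FastDump->new, should be usable in place of the stub.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a MediaWiki::FastDump object. It only mimics the
# documented contract of next(); it does not parse XML.
package My::StubDump;

sub new {
    my ($class, @pages) = @_;
    return bless { pages => [@pages] }, $class;
}

sub next {
    my ($self) = @_;
    my $page = shift @{ $self->{pages} };
    return unless $page;    # bare return: empty list in list context
    return @$page;          # two-item list: (title, text)
}

package main;

# Hypothetical helper: collect the titles of pages whose text matches a pattern.
sub titles_matching {
    my ($dump, $pattern) = @_;
    my @titles;

    # A list assignment in boolean context yields the number of items on the
    # right-hand side, so the loop ends only when next() returns an empty
    # list. A title such as "0" or an empty page text does not end it early.
    while (my ($title, $text) = $dump->next) {
        push @titles, $title if $text =~ $pattern;
    }
    return @titles;
}

my $dump = My::StubDump->new(
    ['Perl',  'Perl is a programming language.'],
    ['Camel', 'A camel is an even-toed ungulate.'],
    ['0',     'A page about programming whose title is a false value.'],
);

print "$_\n" for titles_matching($dump, qr/programming/);
```

Run as written, this prints the titles Perl and 0, one per line. Testing the result of the list assignment, rather than the truth of $title, is what keeps a page titled 0 from being mistaken for the end of the dump.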

HISTORY

This software started life as a benchmark for comparing various XML parsers for Perl. When I discovered just how fast this implementation was, I realized it was worth releasing on its own: by my estimate, 80% of the people who access a MediaWiki dump file are going to be accessing the article titles and text of the English Wikipedia, so the XML parsing needs to be really fast. This package is twice as fast as the fastest SAX parser and five times faster than Parse::MediaWikiDump (as of Dec 2, 2009).

LIMITATIONS

This software is fairly fragile and is really a hack. If things go awry, it might not even be able to detect the problem. If the XML format changes, the behavior is completely undefined.

AUTHOR

Tyler Riddle, <triddle at gmail.com>

BUGS

Please report any bugs or feature requests to bug-mediawiki-fastdump at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=MediaWiki-FastDump. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc MediaWiki::FastDump

COPYRIGHT & LICENSE

Copyright 2009 Tyler Riddle.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.