NAME
MediaWiki::FastDump - Fast and easy access to the page titles and text in a MediaWiki XML dump file.
VERSION
Version 0.01
SYNOPSIS
use MediaWiki::FastDump;
# The source may be a local file, a URL, or an already open filehandle:
my $p = MediaWiki::FastDump->new($filename);
# my $p = MediaWiki::FastDump->new($url);
# my $p = MediaWiki::FastDump->new(\*FILEHANDLE);
while (my ($title, $article) = $p->next) {
    print "Title: $title\n";
    print "$article\n";
}
FUNCTIONS
new
This is the constructor. It takes a single argument: either the location of a MediaWiki XML page dump or a reference to an already open filehandle. A location may be a path on the local filesystem or a URL.
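This documentation does not say how new() tells its three argument forms apart. The following is a minimal sketch of one plausible approach, using a hypothetical helper named source_kind that is not part of MediaWiki::FastDump. It assumes that a GLOB reference means a filehandle, that a string starting with a scheme and "://" means a URL, and that anything else is a local path. The file name and URL in the example are likewise hypothetical. Note that a lexical handle returned by open() is also a GLOB reference, so the sketch classifies it as a filehandle, but whether the module itself accepts such a handle is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of MediaWiki::FastDump: it only sketches
# one way to distinguish the three argument forms that new() documents.
sub source_kind {
    my ($source) = @_;
    # \*FILEHANDLE and lexical handles from open() are both GLOB references.
    return 'filehandle' if ref($source) eq 'GLOB';
    # A scheme followed by "://" (http://, ftp://, ...) is treated as a URL.
    return 'url' if $source =~ m{^[a-z][a-z0-9+.-]*://}i;
    # Anything else is assumed to be a path on the local filesystem.
    return 'file';
}

my $xml = '<mediawiki></mediawiki>';
open(my $fh, '<', \$xml) or die "cannot open in-memory handle: $!";

my @kinds = map { source_kind($_) } (
    'enwiki-pages-articles.xml',      # hypothetical local file name
    'http://example.org/dump.xml',    # hypothetical URL
    \*STDIN,
    $fh,
);

print join(',', @kinds), "\n";    # prints "file,url,filehandle,filehandle"
```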
next
This method returns a two-item list: the page title followed by the page text. When no pages remain, it returns an empty list.
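Because next marks the end of the dump with an empty list rather than a false value, the while loop in the SYNOPSIS does not stop early on a page whose text happens to be empty or "0". A list assignment in scalar context evaluates to the number of values on its right-hand side, so the loop condition is false only when next returns nothing at all. The sketch below demonstrates this with a hypothetical stand-in class, FakeDump, that follows only the documented contract of next. It does not load or reproduce MediaWiki::FastDump.

```perl
use strict;
use warnings;

# Hypothetical stand-in (not MediaWiki::FastDump) that follows the documented
# contract of next(): a (title, text) pair per page, then an empty list.
package FakeDump;

sub new {
    my ($class, @pages) = @_;
    return bless { pages => [@pages] }, $class;
}

sub next {
    my ($self) = @_;
    my $page = shift @{ $self->{pages} };
    return () unless defined $page;    # no more pages: empty list
    return @$page;                     # (title, text)
}

package main;

my $p = FakeDump->new(
    [ 'Perl',  'Perl is a programming language.' ],
    [ 'Empty', '' ],     # empty text is still a page
    [ 'Zero',  '0' ],    # so is text that is false in boolean context
);

my @seen;
# List assignment in scalar context yields the number of values returned
# by next(), so the loop ends only on the empty list, not on false text.
while (my ($title, $article) = $p->next) {
    push @seen, $title;
}

print join(',', @seen), "\n";    # prints "Perl,Empty,Zero"
```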
HISTORY
This software started life as a benchmark for comparing various XML parsers for Perl. When I discovered just how fast this implementation was, I realized that 80% of the people who access a MediaWiki dump file are going to want the article titles and text of the English Wikipedia. That dump is very large, so the XML parsing needs to be really fast. This package is twice as fast as the fastest SAX parser and five times as fast as Parse::MediaWikiDump (as of Dec 2, 2009).
LIMITATIONS
This software is fairly fragile and is really a hack. If something goes wrong while parsing, it may not even detect the problem. If the dump's XML format changes, the behavior is completely undefined.
AUTHOR
Tyler Riddle, <triddle at gmail.com>
BUGS
Please report any bugs or feature requests to bug-mediawiki-fastdump at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=MediaWiki-FastDump. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc MediaWiki::FastDump
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
COPYRIGHT & LICENSE
Copyright 2009 Tyler Riddle.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.