NAME
MediaWiki::DumpFile::Pages - Process an XML dump file of pages from a MediaWiki instance
SYNOPSIS
use MediaWiki::DumpFile::Pages;
$pages = MediaWiki::DumpFile::Pages->new($file);
$pages = MediaWiki::DumpFile::Pages->new(\*FH);
$version = $pages->version;
#version 0.3 and later dump files only
$sitename = $pages->sitename;
$base = $pages->base;
$generator = $pages->generator;
$case = $pages->case;
%namespaces = $pages->namespaces;
#all versions
while(defined($page = $pages->next)) {
print 'Title: ', $page->title, "\n";
}
$title = $page->title;
$id = $page->id;
$revision = $page->revision;
@revisions = $page->revision;
$text = $revision->text;
$id = $revision->id;
$timestamp = $revision->timestamp;
$comment = $revision->comment;
$contributor = $revision->contributor;
#version 0.4 and later dump files only
$bool = $revision->redirect;
$username = $contributor->username;
$id = $contributor->id;
$ip = $contributor->ip;
$username_or_ip = $contributor->astext;
$username_or_ip = "$contributor";
METHODS
new
This is the constructor for this package. It is called with a single parameter: the location of a MediaWiki pages dump file or a reference to an already open file handle.
version
Returns the version of the dump file.
sitename
Returns the sitename from the MediaWiki instance. Requires a dump file of at least version 0.3.
base
Returns the URL used to access the MediaWiki instance. Requires a dump file of at least version 0.3.
generator
Returns the version of MediaWiki that generated the dump file. Requires a dump file of at least version 0.3.
case
Returns the case sensitivity configuration of the MediaWiki instance. Requires a dump file of at least version 0.3.
namespaces
Returns a hash where the key is the numerical namespace id and the value is the plain text namespace name. The main namespace has an id of 0 and an empty string value. Requires a dump file of at least version 0.3.
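As a rough illustration of how the hash returned by namespaces might be used, the sketch below splits a page title into a namespace id and a local title. The helper split_title is hypothetical and not part of this module. It assumes that titles carry the namespace name as a "Name:" prefix and that the prefix matches the namespace name exactly; the sample hash is stand-in data.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MediaWiki::DumpFile::Pages): split a title
# into (namespace id, local title) using a hash shaped like the one returned
# by $pages->namespaces (numerical id => namespace name).
sub split_title {
    my ($title, %namespaces) = @_;

    if ($title =~ /^([^:]+):(.+)$/s) {
        my ($prefix, $rest) = ($1, $2);
        for my $id (keys %namespaces) {
            return ($id, $rest) if $namespaces{$id} eq $prefix;
        }
    }

    # No known prefix: assume the main namespace (id 0, empty name).
    return (0, $title);
}

# Stand-in data; with a real dump this would be: my %ns = $pages->namespaces;
my %ns = (0 => '', 1 => 'Talk', 2 => 'User');

my ($ns_id, $local) = split_title('Talk:Perl', %ns);
print "namespace $ns_id, title $local\n";
```

A title whose prefix is not a known namespace name (for example "Foo: bar") is treated as belonging to the main namespace.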
next
Returns an instance of MediaWiki::DumpFile::Pages::Page or undef if there are no more pages available.
size
Returns the size of the input file in bytes, or undef if the input was given as a reference to a file handle.
current_byte
Returns the number of bytes of XML that have been successfully parsed.
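The size and current_byte methods can be combined to report progress through a large dump. The following is a minimal sketch: progress_percent is a hypothetical helper, not part of this module, and the reporting interval of 1000 pages is an arbitrary choice. The part that touches a real dump file only runs when a path is given on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of the input consumed so far, or undef when
# the total size is unknown (for example when a file handle was passed to new).
sub progress_percent {
    my ($pages) = @_;

    my $size = $pages->size;
    return unless defined $size && $size > 0;

    return 100 * $pages->current_byte / $size;
}

# Runs only when a dump file path is given on the command line.
if (@ARGV) {
    require MediaWiki::DumpFile::Pages;
    my $pages = MediaWiki::DumpFile::Pages->new($ARGV[0]);
    my $count = 0;

    while (defined(my $page = $pages->next)) {
        next if ++$count % 1000;    # report on every 1000th page only
        my $pct = progress_percent($pages);
        printf STDERR "%d pages, %s\n", $count,
            defined $pct ? sprintf('%.1f%%', $pct) : 'size unknown';
    }
}
```

Because size returns undef for file handle input, the helper returns undef in that case rather than guessing at a total.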
MediaWiki::DumpFile::Pages::Page
This object represents a distinct MediaWiki page and is used to access the page data and metadata. The following methods are available:
- title
Returns the page title as a string.
- id
Returns the numerical id of the page.
- revision
In scalar context returns the last revision in the dump for this page; in list context returns all revisions available for the page, in the same order as the dump file. Each returned value is an instance of MediaWiki::DumpFile::Pages::Page::Revision.
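The context-sensitive behaviour of revision is easy to get wrong, so here is a small sketch that calls it both ways. The helper revision_summary is hypothetical and not part of this module; it relies only on the title, revision and id methods described in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise a page object using both calling contexts of
# its revision method.
sub revision_summary {
    my ($page) = @_;

    my @all  = $page->revision;    # list context: every revision, dump order
    my $last = $page->revision;    # scalar context: the last revision only

    return sprintf '%s: %d revision(s), last revision id %s',
        $page->title, scalar @all, $last->id;
}

# Runs only when a dump file path is given on the command line.
if (@ARGV) {
    require MediaWiki::DumpFile::Pages;
    my $pages = MediaWiki::DumpFile::Pages->new($ARGV[0]);

    while (defined(my $page = $pages->next)) {
        print revision_summary($page), "\n";
    }
}
```

Assigning to an array or a list forces list context; assigning to a single scalar forces scalar context.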
MediaWiki::DumpFile::Pages::Page::Revision
This object represents a distinct revision of a page from the MediaWiki dump file. The standard dump files contain only the most recent revision of each page, while the comprehensive dump files contain all revisions for each page. The following methods are available:
- text
Returns the page text for this specific revision of the page.
- id
Returns the numerical revision id for this specific revision; this is independent of the page id.
- timestamp
Returns a string value representing the time the revision was created. The string is in the format of "2008-07-09T18:41:10Z".
- comment
Returns the comment made about the revision when it was created.
- contributor
Returns an instance of MediaWiki::DumpFile::Pages::Page::Revision::Contributor.
- minor
Returns true if the edit was marked as being minor or false otherwise.
- redirect
Returns true if the page is a redirect to another page or false otherwise. Requires a dump file of at least version 0.4.
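Since timestamp returns a plain string, it usually has to be parsed before revisions can be compared by date. One possible approach, sketched below, uses the core Time::Piece module. The helper parse_timestamp is hypothetical and not part of this module, and the format string assumes the timestamp always has the shape shown above.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: convert a revision timestamp string such as
# "2008-07-09T18:41:10Z" into a Time::Piece object (UTC).
sub parse_timestamp {
    my ($timestamp) = @_;
    return Time::Piece->strptime($timestamp, '%Y-%m-%dT%H:%M:%SZ');
}

# The sample value is the one quoted in the timestamp description.
my $parsed = parse_timestamp('2008-07-09T18:41:10Z');
print $parsed->ymd, ' ', $parsed->hms, "\n";
```

With a real revision object the call would be parse_timestamp($revision->timestamp).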
MediaWiki::DumpFile::Pages::Page::Revision::Contributor
This object provides access to the contributor of a specific revision of a page. When used in a string context (for example, interpolated into a double-quoted string) it returns the username of the editor if the editor was logged in or the IP address of the editor if the edit was anonymous. The following methods are available:
- username
Returns the username of the editor if the editor was logged in when the edit was made or undef otherwise.
- id
Returns the numerical id of the editor if the editor was logged in or undef otherwise.
- ip
Returns the IP address of the editor if the editor was anonymous or undef otherwise.
- astext
Returns the username of the editor if they were logged in or the IP address if the editor was anonymous.
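Because username and ip each return undef when they do not apply, they can be used to tell registered edits from anonymous ones. The sketch below shows one way to do that. The helper describe_contributor and its "unknown contributor" fallback are hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: describe a contributor object as registered or
# anonymous, based on which of username and ip is defined.
sub describe_contributor {
    my ($contributor) = @_;

    if (defined $contributor->username) {
        return sprintf 'registered user %s (id %s)',
            $contributor->username, $contributor->id;
    }
    if (defined $contributor->ip) {
        return 'anonymous edit from ' . $contributor->ip;
    }

    # Assumed fallback for a contributor with neither field set.
    return 'unknown contributor';
}

# Runs only when a dump file path is given on the command line.
if (@ARGV) {
    require MediaWiki::DumpFile::Pages;
    my $pages = MediaWiki::DumpFile::Pages->new($ARGV[0]);

    while (defined(my $page = $pages->next)) {
        my $revision    = $page->revision;    # scalar context: last revision
        my $contributor = $revision->contributor;
        print $page->title, ': ', describe_contributor($contributor), "\n";
    }
}
```

When only a display name is needed, astext or plain string interpolation ("$contributor") is shorter.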
AUTHOR
Tyler Riddle, <triddle at gmail.com>
BUGS
Please see MediaWiki::DumpFile for information on how to report bugs in this software.
COPYRIGHT & LICENSE
Copyright 2009 "Tyler Riddle".
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.