
NAME

HTML::WikiConverter::Dialect::MediaWiki - Dialect for conversion of HTML to MediaWiki markup

SYNOPSIS

  use strict;
  use warnings;
  use HTML::WikiConverter;

  my $html = '<p><b>Hello</b>, world!</p>';    # the HTML to convert

  my $wc = HTML::WikiConverter->new(
    html          => $html,
    dialect       => 'MediaWiki',
    pretty_tables => 1,
  );

  print $wc->output;

DESCRIPTION

This module is the HTML::WikiConverter dialect for producing MediaWiki markup from HTML source. MediaWiki is a wiki engine best known as the software behind Wikipedia, the free encyclopedia.

OPTIONS

This module accepts a few options. Pass them to the HTML::WikiConverter constructor along with the usual arguments:

  my $wc = HTML::WikiConverter->new(
    html     => $html,
    dialect  => 'MediaWiki',
    base_url => 'http://en.wikipedia.org',

    default_wplang  => 'en',
    convert_wplinks => 1,
    pretty_tables   => 1,
  );

In addition to the standard parameters that can be passed to any wiki dialect (including html and base_url), this module also accepts:

convert_wplinks

Specifies whether links to Wikipedia (http://www.wikipedia.org) should be converted into their [[wikilink]] equivalents. For example, with convert_wplinks enabled, the HTML

  <A HREF="http://en.wikipedia.org/wiki/Comedy_film">Comedy film</A>

will be automatically converted to

  [[Comedy film]]

Wikipedia allows you to specify alternate titles for links. This module uses the content of the A tag as the alternate title. So

  <A HREF="http://en.wikipedia.org/wiki/Comedy_film">comedy</A>

becomes

  [[Comedy film|comedy]]

Capitalization is also considered when producing wiki links. If the page title and alternate title differ only in the capitalization of the first character of the title, then a simpler link is produced. So rather than converting

  <A HREF="http://en.wikipedia.org/wiki/Comedy_film">comedy film</A>

to

  [[Comedy film|comedy film]]

this module produces

  [[comedy film]]

because MediaWiki ignores the case of a page title's first character; the link still points to the "Comedy film" article.
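The link rules above can be sketched in Perl as follows. This is a minimal illustration, not the module's implementation: the helper name `wplink` is hypothetical, and it assumes English Wikipedia article URLs of the form http://en.wikipedia.org/wiki/Title.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::WikiConverter): convert an
# English Wikipedia article URL and its link text into a [[wikilink]].
sub wplink {
    my ( $href, $text ) = @_;

    # Assumption: only URLs like http://en.wikipedia.org/wiki/Title are handled
    my ($title) = $href =~ m{^https?://en\.wikipedia\.org/wiki/([^?#]+)$}
      or return;    # not a Wikipedia article link; leave it alone

    $title =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # decode %XX escapes
    $title =~ tr/_/ /;                                # underscores mean spaces

    # MediaWiki ignores the case of a title's first character, so a plain
    # link suffices when the text differs from the title only there.
    return "[[$text]]" if ucfirst($text) eq ucfirst($title);

    # Otherwise use the link text as the alternate title.
    return "[[$title|$text]]";
}

my $url = 'http://en.wikipedia.org/wiki/Comedy_film';
print wplink( $url, $_ ), "\n" for 'Comedy film', 'comedy', 'comedy film';
# prints [[Comedy film]], [[Comedy film|comedy]] and [[comedy film]]
```

Comparing the titles with ucfirst on both sides covers the first-character case rule in one test, whichever case the URL happens to use.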

Note: this module does not yet use MediaWiki's "pipe trick". If it did, it would convert

  <A HREF="http://en.wikipedia.org/wiki/User:Diberri">Diberri</A>

into

  [[User:Diberri|]]

(Note the trailing pipe character, which tells MediaWiki to derive the link text "Diberri" from the page title by dropping the "User:" namespace prefix.)
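Since the pipe trick is not implemented, the following is only a hypothetical sketch of how a converter might decide to use it. The helper names `pipe_trick_text` and `piped_link` are invented. MediaWiki's pipe trick strips a namespace prefix and a trailing parenthetical from the title to form the link text; this simplified sketch treats any text before the first colon as a namespace.

```perl
use strict;
use warnings;

# Hypothetical: the link text MediaWiki's pipe trick would produce.
# Simplified: any prefix before the first colon is treated as a namespace.
sub pipe_trick_text {
    my ($title) = @_;
    $title =~ s/^[^:]+://;              # "User:Diberri"  -> "Diberri"
    $title =~ s/\s*\([^()]*\)\s*$//;    # "Boston (band)" -> "Boston"
    return $title;
}

# Hypothetical: choose between a plain link, a pipe-trick link, and a
# link with an explicit alternate title.
sub piped_link {
    my ( $title, $text ) = @_;
    return "[[$title]]"  if $text eq $title;
    return "[[$title|]]" if $text eq pipe_trick_text($title);
    return "[[$title|$text]]";
}

print piped_link( 'User:Diberri', 'Diberri' ), "\n";    # [[User:Diberri|]]
```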

default_wplang

Specifies the two-character language code to use as the default language when converting links to Wikipedia articles. If the language code in a Wikipedia URL differs from this default, an interlanguage link is created instead:

  [[:xx:Article]]

where "xx" is the language code in the URL, and "Article" is the name of the linked article. The leading colon is not a typo: it makes MediaWiki treat this as an inline link to the article rather than as an indication that a translation of the current page is available.
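A minimal sketch of the default_wplang rule, assuming article URLs of the form http://xx.wikipedia.org/wiki/Article with a two-letter code. The helper `wplang_link` is hypothetical and, for brevity, ignores alternate link text.

```perl
use strict;
use warnings;

# Hypothetical helper: link to a Wikipedia article, using the
# [[:xx:Article]] form when the URL's language differs from the default.
sub wplang_link {
    my ( $href, $default_wplang ) = @_;

    # Assumption: article URLs look like http://xx.wikipedia.org/wiki/Article
    my ( $lang, $title ) =
      $href =~ m{^https?://([a-z]{2})\.wikipedia\.org/wiki/([^?#]+)$}
      or return;

    $title =~ tr/_/ /;    # underscores in URLs stand for spaces

    # Same language: an ordinary wikilink. Different language: an
    # interlanguage link with a leading colon so it renders inline.
    return $lang eq $default_wplang ? "[[$title]]" : "[[:$lang:$title]]";
}

print wplang_link( 'http://fr.wikipedia.org/wiki/Paris', 'en' ), "\n";    # [[:fr:Paris]]
print wplang_link( 'http://en.wikipedia.org/wiki/Paris', 'en' ), "\n";    # [[Paris]]
```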

pretty_tables

Boolean specifying whether to stylize tables with shading and thin borders. A "pretty table" looks like this:

  {| cellpadding="3" cellspacing="0" border="1" style="border-collapse: collapse"
  |- bgcolor="#cccccc"
  | ... etc
  |}
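The following sketch shows how rows of cell text could be rendered in this style. It is illustrative only: `pretty_table` is a hypothetical helper, and it assumes the first row holds header cells written with MediaWiki's `!`/`!!` syntax, so the module's exact output may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: render rows as a "pretty" MediaWiki table.
# Assumption: the first row is the header; it is shaded and uses "!" cells.
sub pretty_table {
    my ( $header, @body ) = @_;    # each row is an array ref of cell strings

    my @out = (
        '{| cellpadding="3" cellspacing="0" border="1" style="border-collapse: collapse"',
        '|- bgcolor="#cccccc"',
        '! ' . join( ' !! ', @$header ),
    );
    for my $row (@body) {
        push @out, '|-', '| ' . join( ' || ', @$row );    # one line per row
    }
    push @out, '|}';
    return join( "\n", @out ) . "\n";
}

print pretty_table( [ 'Name', 'Value' ], [ 'a', 1 ], [ 'b', 2 ] );
```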

FEATURES

The MediaWiki dialect converts most HTML tags into their MediaWiki equivalents.

Simple markup

Tags such as B, STRONG, EM, and I are converted to their MediaWiki equivalents.
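MediaWiki writes bold as '''text''' and italic as ''text''. A minimal sketch of that tag mapping follows; `inline_markup` and `%QUOTES` are hypothetical names, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical mapping from inline HTML tags to MediaWiki quote markup:
# three apostrophes for bold, two for italic.
my %QUOTES = (
    b      => "'''",
    strong => "'''",
    i      => "''",
    em     => "''",
);

sub inline_markup {
    my ( $tag, $text ) = @_;
    my $q = $QUOTES{ lc $tag };
    return defined $q ? "$q$text$q" : $text;    # unknown tags pass through
}

print inline_markup( 'STRONG', 'bold' ), ' and ', inline_markup( 'em', 'italic' ), "\n";
# prints '''bold''' and ''italic''
```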

Tables (nested tables not supported)

TABLE tags and associated TR, TH, and TD tags are converted into "{|...|}" blocks. Nested tables are not currently supported.

Lists (nested lists are supported)

Both unordered and ordered lists (UL and OL, respectively) are converted into their MediaWiki counterparts using an asterisk (*) to indicate a bulleted (unordered) list, and a pound sign (#) to represent a numbered (ordered) list.
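In MediaWiki, each nesting level adds its own marker after the parent's, so an ordered list inside a bulleted list uses "*#". The recursive sketch below illustrates this; `list_markup` and its hash-based list structure are hypothetical stand-ins for a parsed HTML tree.

```perl
use strict;
use warnings;

# Hypothetical sketch: a list is a hash ref { type => 'ul' | 'ol',
# items => [...] }, where an item is either text or a nested list.
sub list_markup {
    my ( $list, $prefix ) = @_;
    $prefix = '' unless defined $prefix;

    # Each nesting level appends its own marker to the parent's prefix.
    my $marker = $prefix . ( $list->{type} eq 'ol' ? '#' : '*' );

    my @lines;
    for my $item ( @{ $list->{items} } ) {
        push @lines, ref($item) ? list_markup( $item, $marker ) : "$marker $item";
    }
    return @lines;
}

my $list = {
    type  => 'ul',
    items => [ 'Fruit', { type => 'ol', items => [ 'Apples', 'Pears' ] }, 'Vegetables' ],
};
print "$_\n" for list_markup($list);
# * Fruit
# *# Apples
# *# Pears
# * Vegetables
```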

Indentation (and multiple-indentation)

In the HTML source, indentation is accomplished with DL and DD tags. Indented blocks are prefixed with a colon (or multiple colons, for multiply-indented blocks) in the MediaWiki markup.

Converts SPAN to FONT

Where possible, SPAN tags are converted into their FONT equivalents. Some properties in the SPAN's STYLE attribute are converted to FONT attributes: "font-family" becomes the "face" attribute, and "color" becomes the "color" attribute.

The "class" and "id" SPAN attributes are copied to the FONT tag.
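A minimal sketch of the SPAN-to-FONT conversion described above, assuming simple "property: value" declarations in the STYLE attribute. The helper `span_to_font` is hypothetical, and it sorts attributes alphabetically only to make the output predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: given a SPAN's attributes, build the equivalent
# opening FONT tag, or return nothing if no FONT attributes result.
sub span_to_font {
    my (%span) = @_;
    my %font;

    # Parse "prop: value; prop: value" declarations from the STYLE attribute.
    for my $decl ( split /;/, $span{style} // '' ) {
        my ( $prop, $value ) = $decl =~ /^\s*([\w-]+)\s*:\s*(.*?)\s*$/
          or next;
        $font{face}  = $value if lc $prop eq 'font-family';
        $font{color} = $value if lc $prop eq 'color';
    }

    # CLASS and ID are copied across unchanged.
    for my $name (qw(class id)) {
        $font{$name} = $span{$name} if defined $span{$name};
    }

    return unless %font;
    my $attrs = join ' ', map { qq{$_="$font{$_}"} } sort keys %font;
    return "<font $attrs>";
}

print span_to_font( style => 'font-family: Arial; color: red', class => 'note' ), "\n";
# <font class="note" color="red" face="Arial">
```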

Headings (H1-H6)

Heading tags (H1-H6) are replaced with symmetrical sequences of equal signs, with one equal sign per heading level (e.g. H1 gets a single equal sign, H6 gets six of them).
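A short sketch of the heading rule. The helper `heading_markup` is hypothetical, and the spaces around the heading text are a stylistic assumption; MediaWiki accepts headings with or without them.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap heading text in one "=" per heading level.
sub heading_markup {
    my ( $tag, $text ) = @_;
    my ($level) = $tag =~ /^h([1-6])$/i or return;    # only H1-H6
    my $equals = '=' x $level;
    return "$equals $text $equals";
}

print heading_markup( 'H2', 'History' ), "\n";    # == History ==
```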

Images (including thumbnails and their placement)

IMG tags are converted to the appropriate [[Image:...]] markup, and the context of the IMG tag is used to add attributes to the resulting MediaWiki image markup. For example, if the IMG tag is enclosed in a DIV that specifies "float:right" for the STYLE attribute, then the "right" keyword is appended to the list of attributes in the image markup (e.g. "[[Image:thing.png|right]]").

Additionally, thumbnail markup is generated if the IMG tag's "width" attribute differs from the image's actual width, which is determined by fetching the image over the network.
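The image rules above can be sketched as follows. This is a hypothetical illustration: `image_markup` and its named arguments are invented, and the caller supplies the image's real width rather than the sketch fetching it over the network.

```perl
use strict;
use warnings;

# Hypothetical helper: build [[Image:...]] markup from an IMG tag's context.
#   src          - IMG src attribute
#   width        - IMG width attribute, if any
#   actual_width - the image's real width (supplied by the caller here)
#   div_style    - STYLE attribute of an enclosing DIV, if any
sub image_markup {
    my (%img) = @_;
    my ($file) = $img{src} =~ m{([^/]+)$} or return;    # file name only

    my @options;

    # A width that differs from the real width calls for a thumbnail.
    if (   defined $img{width}
        && defined $img{actual_width}
        && $img{width} != $img{actual_width} )
    {
        push @options, 'thumb', "$img{width}px";
    }

    # "float: left|right" on the enclosing DIV becomes a placement keyword.
    if ( defined $img{div_style} && $img{div_style} =~ /float\s*:\s*(left|right)/i ) {
        push @options, lc $1;
    }

    return '[[' . join( '|', "Image:$file", @options ) . ']]';
}

print image_markup( src => 'http://example.com/thing.png', div_style => 'float:right' ), "\n";
# [[Image:thing.png|right]]
```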

Line breaks

HTML line breaks (BR tags) are converted to the XHTML-compatible "<br />".

KNOWN BUGS

 Nested tables are not handled properly (or at all, really)

 DIVs used to align images are not always properly recognized

 Whether to pull an image off the network should be a configurable option

COPYRIGHT

Copyright (c) 2004 David J. Iberri

This library is free software; you may redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

David J. Iberri <diberri@yahoo.com>
