NAME

MKDoc::Text::Structured - Another Text to HTML module

SYNOPSIS

use MKDoc::Text::Structured;

my $text = some_structured_text();
my $html = MKDoc::Text::Structured::process ($text);

SUMMARY

MKDoc::Text::Structured is a library that turns simple syntactic text constructs into HTML. These are the constructs you would use when writing a plain-text email or newsgroup message.

MKDoc::Text::Structured follows the KISS philosophy. Compared with similar modules, which try to implement as many HTML constructs as possible, this module is very conservative.

Block Level Elements

P

Paragraphs are defined by blocks of text separated by one or more empty lines.

The text:

This is a paragraph,
until it meets an empty line.

This is another paragraph.

Would become:

<p>This is a paragraph,
until it meets an empty line.</p>
<p>This is another paragraph.</p>
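As a rough illustration of the paragraph rule, here is a minimal Perl sketch. The helper paragraphs_to_html is hypothetical and not part of the MKDoc::Text::Structured API; it assumes the input contains plain paragraphs only (no headlines, lists, quotes or entity encoding).

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: every block of text separated
# by one or more empty lines becomes one <p>...</p> element.
sub paragraphs_to_html {
    my ($text) = @_;

    # Split on runs of empty (or whitespace-only) lines; drop empty fields.
    my @blocks = grep { /\S/ } split /\n\s*\n/, $text;

    # Trim surrounding whitespace, keeping line breaks inside a paragraph.
    s/^\s+|\s+$//g for @blocks;

    return join "\n", map { "<p>$_</p>" } @blocks;
}
```

Called on the sample text above, it returns the same two <p> elements shown in the expected output, with the line break inside the first paragraph preserved.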

H1, H2, H3

Headlines are just like paragraphs, except that they use the following syntax:

The text:

==========
Headline 1
==========

Headline 2
==========

Headline 3
----------

Would become:

<h1>Headline 1</h1>
<h2>Headline 2</h2>
<h3>Headline 3</h3>

The advantage of treating headlines just like paragraphs is that multi-line headlines are no problem. It also means you can use *strong* and _emphasized_ text within a headline (see the STRONG and EM sections).
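The underline conventions can be sketched in Perl as a check on a single block. This is a hypothetical helper (headline_to_html is not part of the module's API), and it assumes the block has already been separated from its neighbours by empty lines, as for paragraphs.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code: decide whether one block is a
# headline according to its "=" or "-" underline.
sub headline_to_html {
    my ($block) = @_;
    my @lines = split /\n/, $block;
    return unless @lines >= 2;

    # "=====" above and below the text: <h1>
    if (@lines >= 3 && $lines[0] =~ /^=+\s*$/ && $lines[-1] =~ /^=+\s*$/) {
        return '<h1>' . join("\n", @lines[1 .. $#lines - 1]) . '</h1>';
    }

    # "=====" below the text only: <h2>
    if ($lines[-1] =~ /^=+\s*$/) {
        return '<h2>' . join("\n", @lines[0 .. $#lines - 1]) . '</h2>';
    }

    # "-----" below the text: <h3>
    if ($lines[-1] =~ /^-+\s*$/) {
        return '<h3>' . join("\n", @lines[0 .. $#lines - 1]) . '</h3>';
    }

    return;    # not a headline: treat the block as an ordinary paragraph
}
```

Because the whole block except the underline is kept as the headline text, a headline spanning several lines falls out naturally from treating headlines like paragraphs.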

PRE

Pre-formatted text looks like a paragraph, except that it must be indented with at least one space character.

The text:

This is a paragraph,
until it meets an empty line.

  But this is pre-formatted text.
  Hey  Hey Ho  Ho!

This is another paragraph.

Would become:

<p>This is a paragraph,
until it meets an empty line.</p>
<pre>But this is pre-formatted text.
Hey  Hey Ho  Ho!</pre>
<p>This is another paragraph.</p>

Again, you can use *strong* and _emphasized_ within pre-formatted text (see STRONG and EM sections).
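A minimal Perl sketch of the pre-formatted rule follows. The helper pre_to_html is hypothetical; it assumes that every line of the block must be indented, and that the smallest common indentation is what gets stripped, which matches the example above but is an assumption about the module's exact behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code: a block in which every line
# starts with at least one space becomes <pre>. The smallest common
# indentation is removed; everything else, including runs of spaces, is kept.
sub pre_to_html {
    my ($block) = @_;
    my @lines = split /\n/, $block;
    return unless @lines;
    return if grep { !/^ / } @lines;    # some line is not indented

    # Length of the shortest leading run of spaces.
    my ($min) = sort { $a <=> $b } map { my ($ws) = /^( *)/; length $ws } @lines;

    $_ = substr($_, $min) for @lines;
    return '<pre>' . join("\n", @lines) . '</pre>';
}
```

Stripping only the common indentation keeps relative indentation intact, so a line indented two spaces deeper than its neighbours is still two spaces deeper inside the <pre> element.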

Inline Elements

STRONG

The text:

This is *strong text*

Would become:

<p>This is <strong>strong text</strong></p>

Note 1: The star character will act as a 'strong' marker only when:

- The "opening" star is preceded by whitespace or a line break,

- The "closing" star is followed by whitespace or a line break, or by a punctuation character that is itself immediately followed by whitespace or a line break.

In other words, you can write 3*3*2 = 18 safely. The module tries to follow the DWIM ("Do What I Mean") philosophy as much as possible.

Note 2: This only works within a single block-level element. It will not work across paragraphs or list items (see the UL, LI and OL, LI sections).

Example 1:

* Hello, *I will not
* be bold*
* but
* *I will be*

Example 2:

This is a paragraph. *Nothing in this paragraph
is going to be bold.

Nor in this one*.
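The two conditions in Note 1 can be expressed as a single Perl substitution. This is a sketch under stated assumptions, not the module's implementation: the helper strong_to_html is hypothetical, it is applied to the text of one block at a time (which is what gives the behaviour described in Note 2), and it assumes the start and end of the block count like whitespace.

```perl
use strict;
use warnings;

# Hypothetical sketch of the STRONG rule, applied to the text of ONE block:
# - the opening star must not be preceded by a non-whitespace character
#   (assumption: the start of the block also qualifies);
# - the closing star must be followed by whitespace, the end of the block,
#   or one punctuation character that is itself followed by whitespace or
#   the end of the block.
sub strong_to_html {
    my ($text) = @_;
    $text =~ s{(?<!\S)\*(\S.*?)\*(?=[[:punct:]]?(?:\s|\z))}
              {<strong>$1</strong>}gs;
    return $text;
}
```

With this rule, 3*3*2 = 18 is left untouched because each star is preceded by a digit, while "say *this*, please" is rendered with the comma kept outside the <strong> element.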

EM

The text:

This is _emphasized text_

Would become:

<p>This is <em>emphasized text</em></p>

The same notes as for strong text also apply to emphasized text.

Entity Substitution

Characters that would otherwise be interpreted as XML markup are encoded: &, < and > become &amp;, &lt; and &gt; respectively.

Additionally, some common typed stand-ins for special characters are replaced with richer, better-looking HTML entities:

--   surrounded by whitespace becomes &mdash;
-    surrounded by whitespace becomes &ndash;
...  becomes &hellip;
(tm) becomes &trade;
(r)  becomes &reg;
(c)  becomes &copy;
x    between numbers becomes &times;
''   surrounding text becomes &lsquo; &rsquo;
""   surrounding text becomes &ldquo; &rdquo;
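The substitutions listed above can be approximated with a sequence of Perl regular expressions. This is a hypothetical sketch (entities_to_html is not the module's API) that assumes it runs on inline text only, so list markers such as a leading "- " are not involved; the quote handling in particular is a simplification. The order matters: & must be escaped first so the entities added later are not escaped again, and "--" must be handled before a single "-".

```perl
use strict;
use warnings;

# Hypothetical sketch of the entity rules above, applied to inline text.
sub entities_to_html {
    my ($text) = @_;

    # XML-significant characters first, "&" before everything else.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    $text =~ s/(?<=\s)--(?=\s)/&mdash;/g;    # "--" surrounded by whitespace
    $text =~ s/(?<=\s)-(?=\s)/&ndash;/g;     # "-" surrounded by whitespace
    $text =~ s/\.\.\./&hellip;/g;
    $text =~ s/\(tm\)/&trade;/g;
    $text =~ s/\(r\)/&reg;/g;
    $text =~ s/\(c\)/&copy;/g;
    $text =~ s/(?<=\d)(\s*)x(\s*)(?=\d)/$1&times;$2/g;    # 3x4 or 3 x 4

    # Simplified quote pairs: the opening mark must not follow a word
    # character, so apostrophes as in "it's" are left alone.
    $text =~ s/(?<!\w)'([^']+?)'(?!\w)/&lsquo;$1&rsquo;/g;
    $text =~ s/(?<!\w)"([^"]+?)"(?!\w)/&ldquo;$1&rdquo;/g;

    return $text;
}
```

For example, "wait -- what..." becomes "wait &mdash; what&hellip;", and "3 x 4" becomes "3 &times; 4" with the surrounding spaces preserved.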

Nested Structures

BLOCKQUOTE

Quoted text is text in which every line starts with a 'greater than' character followed by a space. Each additional '> ' adds one level of nesting.

The text:

> > Hey, that's pretty cool!

> Well, sort-of

I think it's pretty cool...

Would become:

<blockquote><blockquote><p>Hey, that's pretty cool!</p></blockquote>
<p>Well, sort-of</p></blockquote>
<p>I think it's pretty cool&hellip;</p>
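The nesting behaviour can be sketched recursively in Perl. In this hypothetical sketch (render_quotes is not the module's API), consecutive blocks whose every line starts with "> " are grouped into one blockquote, one level of "> " is stripped, and the inside is rendered again. It handles only quotes and plain paragraphs and performs no entity substitution, so "..." stays literal in its output.

```perl
use strict;
use warnings;

# Hypothetical sketch of BLOCKQUOTE nesting, not the module's code.
sub render_quotes {
    my ($text) = @_;
    my @blocks = grep { /\S/ } split /\n\s*\n/, $text;
    my (@html, @quoted);

    # Emit the pending run of quoted blocks as one <blockquote>.
    my $flush = sub {
        return unless @quoted;
        my $inner = join "\n\n", map { s/^> //mgr } @quoted;    # strip one level
        push @html, '<blockquote>' . render_quotes($inner) . '</blockquote>';
        @quoted = ();
    };

    for my $block (@blocks) {
        $block =~ s/^\s+|\s+$//g;
        if (!grep { !/^> / } split /\n/, $block) {
            push @quoted, $block;    # every line is quoted
        }
        else {
            $flush->();
            push @html, "<p>$block</p>";
        }
    }
    $flush->();
    return join "\n", @html;
}
```

Grouping consecutive quoted blocks is what keeps "> > Hey..." and "> Well, sort-of" inside the same outer blockquote even though an unquoted empty line separates them.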

UL, LI

Ordered lists and unordered lists can be constructed and nested:

The text:

* An item
* Another item

* Headlines work too
  ==================

  I can write *paragraphs within lists*.

    And even _pre-formatted text_!

  - Also, I can have sub-lists
  - That's no problem
  - Notice that '*' and '-' have the same meaning.
    It's just syntaxic sugar, really :-)

Would become:

<ul><li><p>An item</p></li>
<li><p>Another item</p></li>
<li><h2>Headlines work too</h2>
<p>I can write <strong>paragraphs within lists</strong>.</p>
<pre>And even <em>pre-formatted text</em>!</pre>
<ul><li><p>Also, I can have sub-lists</p></li>
<li><p>That's no problem</p></li>
<li><p>Notice that '*' and '-' have the same meaning.
It's just syntaxic sugar, really :-)</p></li></ul></li></ul>

OL, LI

Ordered lists and unordered lists can be constructed and nested:

The text:

1. An item
2. Another item

3. Headlines work too
   ==================

   * An un-ordered list
   * Can be nested
   * It should all work nicely.

Would become:

<ol><li><p>An item</p></li>
<li><p>Another item</p></li>
<li><h2>Headlines work too</h2>
<ul><li><p>An un-ordered list</p></li>
<li><p>Can be nested</p></li>
<li><p>It should all work nicely.</p></li></ul></li></ol>

Hyperlinks

This module uses URI::Find to locate URIs such as http://mkdoc.com/ and turn them into clickable links.

Additionally, once the XHTML fragment is produced, you could use MKDoc::XML::Tagger to hyperlink it against a glossary of hyperlinks.

AUTHOR

Copyright 2003 - MKDoc Holdings Ltd.

Author: Jean-Michel Hiver

This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.

SEE ALSO

MKDoc: http://www.mkdoc.com/

Help us open-source MKDoc. Join the mkdoc-modules mailing list:

mkdoc-modules@lists.webarch.co.uk