NAME

Regexp::Common::Markdown - Markdown Common Regular Expressions

SYNOPSIS

use Regexp::Common qw( Markdown );

while( <> )
{
    my $pos = pos( $_ );
    /\G$RE{Markdown}{Header}/gmc       and  print "Found a header at pos $pos\n";
    /\G$RE{Markdown}{Bold}/gmc       and  print "Found bold text at pos $pos\n";
}

VERSION

v0.1.1

DESCRIPTION

This module provides Markdown regular expressions as set out by its original author, John Gruber.

There are different types of patterns: vanilla and extended. To get the extended regular expressions, use the -extended switch.

You can use each regular expression by its name: Bold, Blockquote, CodeBlock, CodeLine, CodeSpan, Em, HtmlOpen, HtmlClose, HtmlEmpty, Header, HeaderLine, Image, ImageRef, Line, Link, LinkAuto, LinkDefinition, LinkRef, List.

Almost all of the regular expressions use named capture. See "%+" in perlvar for more information on named capture.

For example:

if( $text =~ /$RE{Markdown}{LinkAuto}/ )
{
    print( "Found https url \"$+{link_https}\"\n" ) if( $+{link_https} );
    print( "Found file url \"$+{link_file}\"\n" ) if( $+{link_file} );
    print( "Found ftp url \"$+{link_ftp}\"\n" ) if( $+{link_ftp} );
    print( "Found e-mail address \"$+{link_mailto}\"\n" ) if( $+{link_mailto} );
    print( "Found phone number \"$+{link_tel}\"\n" ) if( $+{link_tel} );
    my $url = URI->new( $+{link_https} );
}

As a general rule, Markdown requires that the text being parsed be de-tabbed, i.e. with its tabs converted into 4 spaces. These regular expressions reflect this principle.
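A minimal sketch of de-tabbing, for illustration only. The detab helper below is a hypothetical name, not part of this module, and it assumes tab stops every 4 columns, so each tab is padded up to the next multiple of 4 rather than blindly replaced by 4 spaces:

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Regexp::Common::Markdown):
# expand every tab up to the next 4-column tab stop.
sub detab
{
    my $text = shift;
    # $1 is the text between the previous tab stop (or line start) and the tab;
    # pad it with spaces so that its length becomes a multiple of 4.
    $text =~ s{(.*?)\t}{ $1 . ( ' ' x ( 4 - length( $1 ) % 4 ) ) }ge;
    return( $text );
}

my $sample = "a\tb\n\tcode\n";
print( detab( $sample ) );
```

Running the text through such a helper before matching means that an indented code line always starts with spaces, never with a tab.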

STANDARD MARKDOWN

$RE{Markdown}

This returns a pattern that recognises any of the supported vanilla Markdown formatting. If you pass the -extended parameter, some patterns will be added and some of the vanilla regular expressions will be replaced by their extended counterparts, such as ExtAbbr, ExtCodeBlock, ExtLink and ExtAttributes.

Blockquote

$RE{Markdown}{Blockquote}

For example:

> foo
>
> > bar
>
> foo

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/TdKq0K/1/tests

The capture names are:

See also Markdown::Parser::Blockquote

Bold

$RE{Markdown}{Bold}

For example:

**This is a text in bold.**

__And so is this.__

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/Jp2Kos/2/tests

The capture names are:

See also Markdown::Parser::Bold

Code Block

$RE{Markdown}{CodeBlock}

For example:

```
Some text

    Indented code block sample code
```

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/M6W99K/1/tests

The capture names are:

See also Markdown::Parser::Code

Code Line

$RE{Markdown}{CodeLine}

For example:

    the lines in this block  
    all contain trailing spaces  

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/toEboU/1/tests

The capture names are:

See also Markdown::Parser::Code

Code Span

$RE{Markdown}{CodeSpan}

For example:

This is some `inline code`

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/C2Vl9M/1/tests

The capture names are:

See also Markdown::Parser::Code

Emphasis

$RE{Markdown}{Em}

For example:

This routine parameter is _test_

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/eDb6RN/2/tests

See also Markdown::Parser::Emphasis

Header

$RE{Markdown}{Header}

For example:

### This is a H3 Header

### And so is this one ###

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/9uQwBk/2/tests

The capture names are:

See also Markdown::Parser::Header

Header Line

$RE{Markdown}{HeaderLine}

For example:

This is an H1 header
====================

And this is a H2
-----------

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/sQLEqz/2/tests

The capture names are:

See also Markdown::Parser::Header

HTML

$RE{Markdown}{Html}

For example:

<div>
    foo
</div>

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/SH8ki3/1/tests

The capture names are:

See also Markdown::Parser::HTML

Image

$RE{Markdown}{Image}

For example:

![Alt text](/path/to/img.jpg)

or

![Alt text](/path/to/img.jpg "Optional title")

or, with reference:

![alt text][foo]

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/z0yH2F/4/tests

The capture names are:

See also Markdown::Parser::Image

Line

$RE{Markdown}{Line}

For example:

---

or

- - -

or

***

or

* * *

or

___

or

_ _ _

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/Vlew4X/2

The capture names are:

See also the Markdown original author's reference on horizontal rules.

See also Markdown::Parser::Line

Line Break

$RE{Markdown}{LineBreak}

For example:

Mignonne, allons voir si la rose  
Qui ce matin avait déclose  
Sa robe de pourpre au soleil,  
A point perdu cette vesprée,  
Les plis de sa robe pourprée,  
Et son teint au vôtre pareil.

To ensure arbitrary line breaks, each line ends with 2 spaces and 1 line break. This should become:

Mignonne, allons voir si la rose<br />
Qui ce matin avait déclose<br />
Sa robe de pourpre au soleil,<br />
A point perdu cette vesprée,<br />
Les plis de sa robe pourprée,<br />
Et son teint au vôtre pareil.

P.S.: If you're wondering, this is an extract from Ronsard.

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/6VG46H/1/

There is no capture name. This is basically used like this:

if( $text =~ /\G$RE{Markdown}{LineBreak}/ )
{
    print( "Found a line break\n" );
}

Or

$text =~ s/$RE{Markdown}{LineBreak}/<br \/>\n/gs;
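To try this substitution without the module installed, here is a small sketch using a stand-in pattern. The $line_break variable is an assumption: a simplified approximation (two or more spaces followed by a newline), not the module's actual regular expression.

```perl
use strict;
use warnings;

# Assumption: a simplified stand-in for $RE{Markdown}{LineBreak},
# i.e. at least 2 trailing spaces followed by a newline.
my $line_break = qr/ {2,}\n/;

my $text = "first line  \nsecond line  \nthird line\n";

# Copy $text into $html, then replace each Markdown line break with <br />.
( my $html = $text ) =~ s/$line_break/<br \/>\n/g;
print( $html );
```

Only the first two lines end with two spaces, so only they receive a <br /> tag; the last line is left untouched.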

See also Markdown::Parser::NewLine

The capture name is:

Link

$RE{Markdown}{Link}

For example:

[Inline link](https://www.example.com "title")

or

[Inline link](/some/path "title")

or, without title

[Inline link](/some/path)

or with a reference id:

[reference link][refid]

[refid]: /path/to/something (Title)

or, using the link text as the id for the reference:

[My Example][]

[My Example]: https://example.com (Great Example)

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/sGsOIv/4/tests

The capture names are:

See also Markdown::Parser::Link

Link Auto

$RE{Markdown}{LinkAuto}

Supports http, https, ftp, newsgroup, local file, e-mail address and phone number links.

For example:

<https://www.example.com>

would become:

<a href="https://www.example.com">https://www.example.com</a>

An e-mail such as:

<!#$%&'*+-/=?^_`.{|}~@example.com>

would become:

<a href="mailto:!#$%&'*+-/=?^_`.{|}~@example.com">!#$%&'*+-/=?^_`.{|}~@example.com</a>

Other possible and valid e-mail addresses:

<"abc@def"@example.com>

<jsmith@[192.0.2.1]>

A file link:

<file:///Volume/User/john/Document/form.rtf>

A newsgroup link:

<news:alt.fr.perl>

An FTP URI:

<ftp://ftp.example.com/plop/>

Phone numbers:

<+81-90-1234-5678>

<tel:+81-90-1234-5678>
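The conversion of an automatic link into an anchor can be sketched as follows. This is an illustration under stated assumptions: $auto_link is a simplified stand-in that only handles http and https URLs, not the module's actual pattern, and it reuses the link_https capture name mentioned in the DESCRIPTION. No HTML escaping is performed.

```perl
use strict;
use warnings;

# Assumption: a simplified stand-in for $RE{Markdown}{LinkAuto},
# limited to http(s) URLs and exposing the link_https named capture.
my $auto_link = qr/<(?<link_https>https?:\/\/[^\s<>]+)>/;

my $text = 'Visit <https://www.example.com> for details.';

# Replace each automatic link with an anchor whose text is the URL itself.
( my $html = $text ) =~ s{$auto_link}{<a href="$+{link_https}">$+{link_https}</a>}g;
print( "$html\n" );
```

With the real pattern, the same approach would need one branch per capture name (link_file, link_ftp, link_mailto, link_tel), since each kind of link calls for a different href.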

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/bAUu1E/3/tests

The capture names are:

See also Markdown::Parser::Link

Link Definition

$RE{Markdown}{LinkDefinition}

For example:

[1]: /url/  "Title"

[refid]: /path/to/something (Title)

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/edg2F7/2/tests

The capture names are:

See also Markdown::Parser::LinkDefinition

Link Reference

$RE{Markdown}{LinkRef}

For example:

Foo [bar] [1].

Foo [bar][1].

Foo [bar]
[1].

[Foo][]

[1]: /url/  "Title"
[Foo]: https://www.example.com

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/QmyfnH/1/tests

The capture names are:

See also the reference on links by the original author of Markdown.

See also Markdown::Parser::Link

List

$RE{Markdown}{List}

For example, an unordered list:

*   asterisk 1

*   asterisk 2

*   asterisk 3

or, an ordered list:

1. One item

1. Second item

1. Third item

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/RfhRVg/4

The capture names are:

See also Markdown::Parser::List

List First Level

$RE{Markdown}{ListFirstLevel}

This regular expression is used for top-level lists, as opposed to the nth-level pattern, which is used for sub-lists. Both will match lists within lists, but Markdown processes a list differently depending on whether it is a top-level list or a sub-list.

See also Markdown::Parser::List

List Nth Level

$RE{Markdown}{ListNthLevel}

Regular expression to process a list within a list.

See also Markdown::Parser::List

List Item

$RE{Markdown}{ListItem}

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/bulBCP/1/tests

The capture names are:

See also Markdown::Parser::ListItem

Paragraph

$RE{Markdown}{Paragraph}

For example:

The quick brown fox
jumps over the lazy dog

Lorem Ipsum

> Why am I matching?
1. Nonononono!
* Aaaagh!
# Stahhhp!

This regular expression would capture the whole block up until "Lorem Ipsum", but is careful not to catch other Markdown elements after that. Thus, nothing after "Lorem Ipsum" would be caught, because what follows is a blockquote.

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/0B3gR4/2/

The capture names are:

See also Markdown::Parser::Paragraph

EXTENDED MARKDOWN

Abbreviation

$RE{Markdown}{ExtAbbr}

For example:

Some discussion about HTML, SGML and HTML4.

*[HTML4]: Hyper Text Markup Language version 4
*[HTML]: Hyper Text Markup Language
*[SGML]: Standard Generalized Markup Language

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/ztM2Pw/2/tests

The capture names are:

See also Markdown::Parser::Abbr

Attributes

$RE{Markdown}{ExtAttributes}

For example, a header with the attributes .cl.class#id7:

### Header  {.cl.class#id7 }

Code Block

$RE{Markdown}{ExtCodeBlock}

This is the same as conventional blocks with backticks, except the extended version uses tilde characters.

For example:

~~~
<div>
~~~

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/Y9lPAz/1/tests

The capture names are:

See also Markdown::Parser::Code

Footnotes

$RE{Markdown}{ExtFootnote}

This looks like this:

[^1]: Content for fifth footnote.
[^2]: Content for sixth footnote spanning on
    three lines, with some span-level markup like
    _emphasis_, a [link][].

A reference to those footnotes could be:

Some paragraph with a footnote[^1], and another[^2].

The footnote_id reference can be anything as long as it is unique.

See also Markdown::Parser::Footnote

Inline Footnotes

For consistency with links, footnotes can be added inline, like this:

I met Jack [^jack](Co-founder of Angels, Inc) at the meet-up.

Inline notes will work even without the identifier. For example:

I met Jack [^](Co-founder of Angels, Inc) at the meet-up.

However, in compliance with pandoc footnotes style, inline footnotes can also be added like this:

Here is an inline note.^[Inlines notes are easier to write, since
you don't have to pick an identifier and move down to type the
note.]

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/WuB1FR/2/

The capture names are:

See also Markdown::Parser::Footnote

Footnote Reference

$RE{Markdown}{ExtFootnoteReference}

This regular expression matches 3 types of footnote references:

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/3eO7rJ/1/

The capture names are:

See also Markdown::Parser::FootnoteReference

Header

$RE{Markdown}{ExtHeader}

This extends the regular header with attributes.

For example:

### Header  {.cl.class#id7 }

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/GyzbR2/1

The capture names are:

See also Markdown::Parser::Header

Header Line

$RE{Markdown}{ExtHeaderLine}

Same as header line, but with attributes.

For example:

Header  {#id5.cl.class}
======

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/berfAR/2/tests

The capture names are:

See also Markdown::Parser::Header

Image

$RE{Markdown}{ExtImage}

Same as regular image, but with attributes.

For example:

This is an ![inline image](/img "title"){.class #inline-img}.

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/xetHV1/2

The capture names are:

See also Markdown::Parser::Image

Link

$RE{Markdown}{ExtLink}

Same as regular links, but with attributes.

For example:

This is an [inline link](/url "title"){.class #inline-link}.

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/7mLssJ/2

The capture names are:

See also Markdown::Parser::Link

Link Definition

$RE{Markdown}{ExtLinkDefinition}

Same as the regular link definition, but with attributes.

For example:

[refid]: /path/to/something (Title) { .class #ref data-key=val }

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/hVfXCe/2/

The capture names are:

See also Markdown::Parser::LinkDefinition

Table

$RE{Markdown}{ExtTable}

For example:

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/01XCqB/9/tests

The capture names are:

The table format is taken from David E. Wheeler's RFC.

See also Markdown::Parser::Table

SEE ALSO

Regexp::Common for a general description of how to use this interface.

Markdown::Parser for a Markdown parser using this module.

CHANGES & CONTRIBUTIONS

Feel free to reach out to the author for possible corrections, improvements, or suggestions.

AUTHOR

Jacques Deguest <jack@deguest.jp>

CREDITS

Credits to Michel Fortin and John Gruber for their test units.

Credits to Firas Dib for his online regular expression test tool.

COPYRIGHT & LICENSE

Copyright (c) 2020 DEGUEST Pte. Ltd.

You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.