NAME

Regexp::Common::microsyntax - a collection of regular expressions for use with microblogging-style text (tweets, dents, microposts, etc.)

VERSION

Version 0.02

SYNOPSIS

use Regexp::Common qw(microsyntax);

# Available patterns: user, hashtag, grouptag, slashtag

# Get all users/hashtags/groups/slashtags mentioned in $post
@users     = $post =~ m/$RE{microsyntax}{user}/og;
@hashtags  = $post =~ m/$RE{microsyntax}{hashtag}/og;
@groups    = $post =~ m/$RE{microsyntax}{grouptag}/og;
@slashtags = $post =~ m/$RE{microsyntax}{slashtag}/og;

# Capture/extract individual elements (see Regexp::Common '-keep')
my @usernames;
while ($post =~ m/$RE{microsyntax}{user}{-keep}/go) {
  push @usernames, $3;
}

# Substitute/markup individual elements ('-keep' is needed for $1 to be set)
$post =~ s|$RE{microsyntax}{user}{-keep}|<span class="user">$1</span>|go;

DESCRIPTION

Please consult the manual of Regexp::Common for a general description of how this interface works.

Do not use this module directly, but load it via Regexp::Common.

This module provides regular expressions for matching microblogging-style text (tweets, dents, microposts, etc.). It is based on the Regex class of the Ruby twitter-text library, with extensions for features that Twitter doesn't support (such as status.net !group tags and microsyntax.org slashtags).

$RE{microsyntax}{user}

Returns a pattern that matches @username handles. For this pattern and the next two (hashtag and grouptag), using '-keep' (see Regexp::Common) allows access to the following individual components:

$1 captures the entire match
$2 captures the sigil used ('@' for usernames, '#' or '＃' for hashtags, etc.)
$3 captures the text after the sigil, i.e. the bare username, hashtag, etc.
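
As an illustration of the '-keep' components listed above, the following is a minimal sketch rather than part of the module itself. The helper name components() and the sample post are made up for this example; only the $RE{microsyntax}{...}{-keep} lookups and the $1/$2/$3 layout come from this documentation.

```perl
use strict;
use warnings;
use Regexp::Common qw(microsyntax);

# Hypothetical helper (not provided by the module): collect the documented
# -keep components of every match of $pattern in $post.
sub components {
    my ($post, $pattern) = @_;
    my $re = qr/$pattern/;    # compile the Regexp::Common pattern once
    my @found;
    while ($post =~ /$re/g) {
        # $1 = entire match, $2 = sigil, $3 = text after the sigil
        push @found, { match => $1, sigil => $2, name => $3 };
    }
    return @found;
}

# Made-up sample post, single-quoted so that '@' is not interpolated
my $post = 'Thanks @alice and @bob for the #perl tips';

for my $type (qw(user hashtag)) {
    for my $c (components($post, $RE{microsyntax}{$type}{-keep})) {
        print "$type: match=$c->{match} sigil=$c->{sigil} name=$c->{name}\n";
    }
}
```

Passing the pattern in as an argument keeps the helper usable for user, hashtag and grouptag alike, since all three share the same component layout.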

$RE{microsyntax}{hashtag}

Returns a pattern that matches #hashtags, with support for Unicode hashtags. Note that all-numeric hashtags (e.g. #2011) are specifically excluded.

$RE{microsyntax}{grouptag}

Returns a pattern that matches identi.ca/status.net !group tags.

$RE{microsyntax}{slashtag}

Returns a pattern that matches slashtags, as defined and documented at http://microsyntax.org/. These normally occur at the end of a post; the first is introduced by a slash, while subsequent ones typically are not, e.g.

Sample post /via @person1 by @person2 cc @person3 @person4

The following slashtags are recognised:

by
cc, for, and tip
thx
hat tip, ht, and via

For this pattern, using '-keep' allows access to the following individual components:

$1 captures the entire match
$2 captures the verbatim slashtag (e.g. '/via', 'cc', 'by')
$3 captures the (potentially multiple) @user handles with this slashtag
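
The slashtag components can be put to use as in the sketch below, which groups the @user handles of a post by slashtag. The helper name slashtag_users() is hypothetical, and the sketch assumes that the handles captured in $3 are separated by whitespace, as in the sample post above.

```perl
use strict;
use warnings;
use Regexp::Common qw(microsyntax);

# Hypothetical helper (not provided by the module): map each slashtag
# found in $post to the list of @user handles attached to it.
sub slashtag_users {
    my ($post) = @_;
    my $re = qr/$RE{microsyntax}{slashtag}{-keep}/;
    my %users_for;
    while ($post =~ /$re/g) {
        # $2 = verbatim slashtag, $3 = its @user handles
        my ($tag, $handles) = ($2, $3);
        # Assumption: multiple handles are whitespace-separated
        push @{ $users_for{$tag} }, split ' ', $handles;
    }
    return %users_for;
}

my %users_for = slashtag_users(
    'Sample post /via @person1 by @person2 cc @person3 @person4'
);
for my $tag (sort keys %users_for) {
    print "$tag => @{ $users_for{$tag} }\n";
}
```

The keys of the resulting hash are the verbatim slashtags, so a leading slash is kept wherever the post used one.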

AUTHOR

Gavin Carr <gavin@openfusion.com.au>

BUGS

Please report any bugs or feature requests to bug-regexp-common-microsyntax at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Regexp-Common-microsyntax.

ACKNOWLEDGEMENTS

The Ruby twitter-text-rb library, http://github.com/mzsanford/twitter-text-rb/.

SEE ALSO

Regexp::Common

LICENSE AND COPYRIGHT

Copyright 2011 Gavin Carr <gavin@openfusion.com.au>.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.