NAME
Regexp::Common::microsyntax - a collection of regular expressions for use with microblogging-style text (tweets, dents, microposts, etc.)
VERSION
Version 0.02
SYNOPSIS
use Regexp::Common qw(microsyntax);
# Available patterns: user, hashtag, grouptag, slashtag
# Get all users/hashtags/groups/slashtags mentioned in $post
@users = $post =~ m/$RE{microsyntax}{user}/og;
@hashtags = $post =~ m/$RE{microsyntax}{hashtag}/og;
@groups = $post =~ m/$RE{microsyntax}{grouptag}/og;
@slashtags = $post =~ m/$RE{microsyntax}{slashtag}/og;
# Capture/extract individual elements (see Regexp::Common '-keep')
my @usernames;
while ($post =~ m/$RE{microsyntax}{user}{-keep => 1 }/go) {
    push @usernames, $3;
}
# Substitute/markup individual elements
$post =~ s|$RE{microsyntax}{user}{-keep}|<span class="user">$1</span>|go;
DESCRIPTION
Please consult the manual of Regexp::Common for a general description of how this interface works.
Do not use this module directly, but load it via Regexp::Common.
This module provides regular expressions for matching microblogging-style text (tweets, dents, microposts, etc.). It is based on the Ruby twitter-text Regex class, with extensions for features that Twitter doesn't support (such as status.net !group tags and microsyntax.org slashtags).
$RE{microsyntax}{user}
Returns a pattern that matches @username handles. For this pattern and the next two (hashtag and grouptag), using '-keep' (see Regexp::Common) allows access to the following individual components:
- $1 captures the entire match
- $3 captures the text after the sigil, i.e. the bare username, hashtag or group name
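As an illustration of the '-keep' components listed above, the following is a minimal sketch that collects the bare names for each sigil-based pattern. The helper name bare_names and the sample post are assumptions made for this example and are not part of the module; only $RE{microsyntax}{...}{-keep} and the meaning of $3 come from this documentation.

```perl
use strict;
use warnings;
use Regexp::Common qw(microsyntax);

# Hypothetical helper (not provided by the module): return the bare names
# found in $post for one of the patterns 'user', 'hashtag' or 'grouptag'.
sub bare_names {
    my ($post, $kind) = @_;

    # -keep enables the capture groups documented above.
    my $pattern = $RE{microsyntax}{$kind}{-keep};
    my $re      = qr/$pattern/;

    my @names;
    while ($post =~ /$re/g) {
        push @names, $3;    # $3 is the text after the sigil
    }
    return @names;
}

# Single quotes keep Perl from interpolating @alice as an array.
my $post = 'Thanks @alice for the #perl tips and see you at !pdxpm';

my @users    = bare_names($post, 'user');
my @hashtags = bare_names($post, 'hashtag');
my @groups   = bare_names($post, 'grouptag');

print "users:    @users\n";
print "hashtags: @hashtags\n";
print "groups:   @groups\n";
```

Passing the pattern name as an argument keeps the loop in one place; if only one kind of element is needed, the while loop from the SYNOPSIS is sufficient.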
$RE{microsyntax}{hashtag}
Returns a pattern that matches #hashtags, with support for Unicode hashtags. Note that all-numeric hashtags (e.g. #123) are specifically excluded.
$RE{microsyntax}{grouptag}
Returns a pattern that matches identi.ca/status.net !group tags.
$RE{microsyntax}{slashtag}
Returns a pattern that matches slashtags, as defined and documented at http://microsyntax.org/. These normally occur at the end of a post, with the first (but typically not the others) introduced by a slash, for example:
Sample post /via @person1 by @person2 cc @person3 @person4
The recognised slashtags include 'via', 'by' and 'cc', as used in the sample post above.
For this pattern, using '-keep' allows access to the following individual components:
- $1 captures the entire match
- $2 captures the verbatim slashtag (e.g. '/via', 'cc', 'by')
- $3 captures the (potentially multiple) @user handles with this slashtag
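The slashtag components above can be put to use as in the following sketch, which groups the mentioned users by slashtag. The hash name %by_tag and the normalisation of the tag (trimming whitespace, dropping the leading slash, lower-casing) are choices made for this example rather than behaviour of the module, and the handles are pulled out of $3 with a deliberately simple /\@(\w+)/ match rather than the module's own user pattern.

```perl
use strict;
use warnings;
use Regexp::Common qw(microsyntax);

# Sample post taken from the slashtag description above.
my $post = 'Sample post /via @person1 by @person2 cc @person3 @person4';

my $pattern = $RE{microsyntax}{slashtag}{-keep};
my $re      = qr/$pattern/;

my %by_tag;    # normalised slashtag => array ref of bare usernames
while ($post =~ /$re/g) {
    my ($tag, $handles) = ($2, $3);

    # Normalise the verbatim tag: '/via' and 'via' both become 'via'.
    $tag =~ s{^\s*/?}{};
    $tag =~ s{\s+$}{};
    $tag = lc $tag;

    # $3 may hold several @user handles; keep just the bare names.
    push @{ $by_tag{$tag} }, $handles =~ /\@(\w+)/g;
}

for my $tag (sort keys %by_tag) {
    print "$tag: @{ $by_tag{$tag} }\n";
}
```

Keying on the normalised tag means a post using '/cc' and another using 'cc' end up in the same bucket.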
AUTHOR
Gavin Carr <gavin@openfusion.com.au>
BUGS
Please report any bugs or feature requests to bug-regexp-common-microsyntax at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Regexp-Common-microsyntax.
ACKNOWLEDGEMENTS
The Ruby twitter-text-rb library, http://github.com/mzsanford/twitter-text-rb/.
SEE ALSO
LICENSE AND COPYRIGHT
Copyright 2011 Gavin Carr <gavin@openfusion.com.au>.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.