NAME

Twitter::Text - Perl implementation of the twitter-text parsing library

SYNOPSIS

use Twitter::Text;

my $result = parse_tweet('Hello world こんにちは世界');
print $result->{valid} ? 'valid tweet' : 'invalid tweet';

DESCRIPTION

Twitter::Text is a Perl implementation of the twitter-text parsing library.

WARNING

This library does not implement auto-linking and hit highlighting.

Please refer to Implementation status for the latest status.

FUNCTIONS

All functions below are exported by default.

Extraction

extract_hashtags

my \@hashtags = extract_hashtags($text);

extract_hashtags_with_indices

my \@hashtags_with_indices = extract_hashtags_with_indices($text, [\%options]);

extract_mentioned_screen_names

my \@screen_names = extract_mentioned_screen_names($text);

extract_mentioned_screen_names_with_indices

my \@screen_names_with_indices = extract_mentioned_screen_names_with_indices($text);

extract_mentions_or_lists_with_indices

my \@mentions_or_lists_with_indices = extract_mentions_or_lists_with_indices($text);

extract_urls

my \@urls = extract_urls($text);

extract_urls_with_indices

my \@urls_with_indices = extract_urls_with_indices($text, [\%options]);

Validation

parse_tweet

my \%parse_result = parse_tweet($text, [\%options]);

The parse_tweet function takes a $text string and an optional \%options parameter, and returns a hash reference with the following keys:

weighted_length

The overall length of the tweet with code points weighted per the ranges defined in the configuration file.

permillage

Indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum.

valid

Indicates whether the length of the input text corresponds to a valid result.

display_range_start, display_range_end

Two Unicode code point indices, returned under separate keys, identifying the inclusive start and exclusive end of the displayable content of the Tweet.

valid_range_start, valid_range_end

Two Unicode code point indices, returned under separate keys, identifying the inclusive start and exclusive end of the valid content of the Tweet.
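
As a rough illustration of how a weighted length can be derived from per-range code point weights, here is a minimal sketch. It is not this library's implementation. The scale, default weight, and ranges are assumed to match the twitter-text v3 configuration file, which this document does not reproduce, and the helpers code_point_weight and simple_weighted_length are hypothetical names. The sketch ignores URL handling, emoji sequences, and Unicode normalization, so its results can differ from parse_tweet.

```perl
use strict;
use warnings;
use utf8;    # string literals below are decoded character strings

# Assumed values, taken from the twitter-text v3 configuration.
my $SCALE          = 100;
my $DEFAULT_WEIGHT = 200;
my @RANGES = (
    { start => 0,    end => 4351, weight => 100 },
    { start => 8192, end => 8205, weight => 100 },
    { start => 8208, end => 8223, weight => 100 },
    { start => 8242, end => 8247, weight => 100 },
);

# Hypothetical helper: scaled weight of a single code point.
sub code_point_weight {
    my ($cp) = @_;
    for my $range (@RANGES) {
        return $range->{weight}
            if $cp >= $range->{start} && $cp <= $range->{end};
    }
    return $DEFAULT_WEIGHT;
}

# Hypothetical helper: simplified weighted length of a decoded string.
# Sums the scaled weight of every code point, then divides by the scale.
sub simple_weighted_length {
    my ($text) = @_;
    my $scaled = 0;
    $scaled += code_point_weight(ord $_) for split //, $text;
    return int($scaled / $SCALE);
}

print simple_weighted_length('abc'), "\n";     # 3 code points at weight 100 -> 3
print simple_weighted_length('日本語'), "\n";   # 3 code points at weight 200 -> 6
```

Under these assumptions, code points in the listed ranges count as one unit each, while all other code points (such as CJK characters) count as two.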

EXAMPLES

use Data::Dumper;
use Twitter::Text;

my $result = parse_tweet('Hello world こんにちは世界');
print Dumper($result);
# $VAR1 = {
#       'weighted_length' => 33,
#       'permillage' => 117,
#       'valid' => 1,
#       'display_range_start' => 0,
#       'display_range_end' => 32,
#       'valid_range_start' => 0,
#       'valid_range_end' => 32,
#     };
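
The permillage in the output above can be checked by hand from the weighted length. The sketch below assumes a maximum weighted length of 280 (the twitter-text v3 default, not stated in this document) and rounding down to an integer. The helper permillage_for is a hypothetical name and not part of this library.

```perl
use strict;
use warnings;

# Assumption: maximum weighted tweet length from the twitter-text v3 config.
my $MAX_WEIGHTED_LENGTH = 280;

# Hypothetical helper: weighted length as a proportion of the maximum,
# expressed per thousand and rounded down.
sub permillage_for {
    my ($weighted_length) = @_;
    return int($weighted_length * 1000 / $MAX_WEIGHTED_LENGTH);
}

printf "%d\n", permillage_for(33);     # 117, as in the output above
printf "%d\n", permillage_for(280);    # 1000, exactly at the limit
printf "%d\n", permillage_for(281);    # 1003, over the limit
```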

is_valid_hashtag

my $valid = is_valid_hashtag($hashtag);

is_valid_list

my $valid = is_valid_list($username_list);

is_valid_url

my $valid = is_valid_url($url, [unicode_domains => 1, require_protocol => 1]);

is_valid_username

my $valid = is_valid_username($username);

SEE ALSO

twitter-text. The implementation of Twitter::Text (this library) is heavily based on the Ruby implementation of twitter-text.

https://developer.twitter.com/en/docs/counting-characters

COPYRIGHT & LICENSE

Copyright (C) Twitter, Inc and other contributors

Copyright (C) utgwkk.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

utgwkk <utagawakiki@gmail.com>