NAME

Finance::CompanyNames - Functions for finding company names in English free text

SYNOPSIS

use Finance::CompanyNames;

my $corps = {
    MSFT => 'Microsoft'
  , INTC => 'Intel'
  # etc...
};

Finance::CompanyNames::Init($corps);
my $hashref = Finance::CompanyNames::Match($freetext);

DESCRIPTION

Finance::CompanyNames finds company names in English text. The user provides a list of company names they wish to find, and the body of text to search. The module then uses natural language processing techniques to find those names or their variants in the text. For example, if a company is alternately referred to as "XYZ", "XYZ Corp.", "XYZ Corporation", and "The XYZ Corporation", Finance::CompanyNames will recognize all variants.

INTERFACE

Initialization

Call Finance::CompanyNames::Init() before using any other function. Its argument is a reference to a hash. The canonical use is stock tickers as the keys and company names as the values, but you are free to use anything for the keys.
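As a minimal sketch of preparing the argument for Init(), the hash reference can be built from a tab-separated list of tickers and names. The helper parse_corps() and the input format are assumptions made for illustration; they are not part of Finance::CompanyNames. The call to Init() is guarded so the example still runs when the module is not installed.

```perl
use strict;
use warnings;

# parse_corps() is a hypothetical helper, not part of the module.
# It turns "TICKER<TAB>Company name" lines into the hash reference
# that Init() expects (keys => company names).
sub parse_corps {
    my ($text) = @_;
    my %corps;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;          # skip comments and blank lines
        my ($key, $name) = split /\t/, $line, 2;
        next unless defined $name;               # skip lines without a tab
        $corps{$key} = $name;
    }
    return \%corps;
}

my $corps = parse_corps("MSFT\tMicrosoft\nINTC\tIntel\n");

# Only initialize if the module is actually available.
if (eval { require Finance::CompanyNames; 1 }) {
    Finance::CompanyNames::Init($corps);
}
```

Because the keys are arbitrary, the same helper works for internal company IDs or any other identifier in place of tickers.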

Searching

Finance::CompanyNames::Match() searches a body of text for company names. Its only argument is a scalar containing the text. The return value is a reference to a hash of hash references. The keys are the stock ticker symbols of the companies found in the text, or whatever other keys you passed to Init(). Each value is a hash reference with the keys "freq" and "contexts": "freq" is the number of times the company was seen in the text, and "contexts" is a reference to an array of the text snippets that mention the company.

For example:

$rv = {
    INTC => {
        freq     => 10,
        contexts => [
            "blah blah blah blah blah Intel blah blah blah blah",
            "blah Intel Corp. blah blah blah blah blah blah",
        ],
    },
};
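A result of this shape can be walked with ordinary hash and array dereferencing. The sketch below uses a hand-built stand-in for the return value (the data is made up) so that it runs without the module; summarize() is a hypothetical helper, not part of Finance::CompanyNames.

```perl
use strict;
use warnings;

# Stand-in for the value returned by Match(); the contents are invented.
my $rv = {
    INTC => { freq => 2, contexts => [ "... Intel ...", "... Intel Corp. ..." ] },
    MSFT => { freq => 1, contexts => [ "... Microsoft ..." ] },
};

# summarize() is a hypothetical helper. It lists each key with its
# frequency, most frequent first (ties broken by key), followed by the
# context snippets recorded for that key.
sub summarize {
    my ($matches) = @_;
    my @lines;
    my @keys = sort {
        $matches->{$b}{freq} <=> $matches->{$a}{freq} || $a cmp $b
    } keys %$matches;
    for my $key (@keys) {
        my $hit = $matches->{$key};
        push @lines, sprintf("%s: %d mention(s)", $key, $hit->{freq});
        push @lines, "    $_" for @{ $hit->{contexts} };
    }
    return @lines;
}

print "$_\n" for summarize($rv);
```

In real use, $rv would come from Finance::CompanyNames::Match($freetext) after Init() has been called.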

NOTE

Please note that Finance::CompanyNames allocates a large amount of memory. It loads a complete English wordlist as well as a list of English root words and their affixes, which requires approximately 20 MB of memory on the author's computer. A future version may behave differently. Please mail the author if you have an improvement.

Also note that this module works only with English text, because it depends on the included word and stem lists.

AUTHORS

Finance::CompanyNames is a product of Gilder, Gagnon, Howe, & Co. LLC. Mail GGHC Skunkworks <cpan@gghcwest.com> regarding this software.

LICENSE

Finance::CompanyNames is distributed under the Artistic License, the same terms under which Perl itself is distributed.