NAME

Algorithm::LDA

SYNOPSIS

use Algorithm::LDA;

my $lda = Algorithm::LDA->new("Data", 5, 100, 100, 0, 10, 0.1, 10, "stoplist.txt");

DESCRIPTION

Algorithm::LDA is a Perl implementation of Latent Dirichlet Allocation (LDA)

add

description:

Adds a document to the array of documents ($self->documents)

input:

%args -> hash containing data

output:

1

example:

while (my $line = <$fh2>) {
   my $obj = decode_json($line);
   add(%$obj);
}

init

description:

Initializes alpha, initializes beta, loads documents, starts main loop

input:

None

output:

1

example:

init();

printResults

description:

Writes the words in each topic, the topics in each document, the phi values,
and the theta values to text files in the 'Results/$data' directory

input:

None

output:

None

example:

printResults();

load

description:

Loads documents from text files (in "data/$data") or JSON file (in "Documents")

input:

None

output:

None

example:

load();

wordsPerTopic

description:

Creates an array of words in each topic

input:

%args -> hash containing topic

output:

@words -> Array containing words and probabilities (phi value) for $args{topic}

example:

my $words_on_topic = wordsPerTopic(topic => $topic);
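
The exact layout of the returned array is not spelled out here. As a rough sketch of the idea only, the snippet below pairs each word with its phi value and orders the pairs from most to least probable. The %phi hash and its values are made up for illustration and are not part of Algorithm::LDA; whether the module itself sorts its output is an assumption.

```perl
use strict;
use warnings;

# Made-up phi values for one topic; real values would come from the model.
my %phi = ( cat => 0.5, dog => 0.3, fish => 0.2 );

# Pair each word with its probability, highest probability first.
my @words = map  { [ $_, $phi{$_} ] }
            sort { $phi{$b} <=> $phi{$a} } keys %phi;

printf "%-5s %.2f\n", @$_ for @words;
```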

topicsPerDocument

description:

Creates an array of topics in each document

input:

%args -> hash containing document

output:

@topics -> Array containing topics and probabilities (theta value) for $args{document}

example:

my $topics_on_document = topicsPerDocument(document => $doc);

sample_topic

description:

Uses Gibbs Sampling to determine a topic given a document and word

input:

$document -> ID of document word is in
$word -> word that is to be evaluated

output:

$topic -> ID of the sampled topic
$k -> last topic, returned if no topic can be found

example:

my $topic = $self->sample_topic($document, $word);
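
The sampling step can be pictured as drawing an index from a list of unnormalized weights by walking their cumulative sum. The sketch below is a minimal illustration under that assumption. The helper name draw_topic and the example weights are hypothetical and not part of Algorithm::LDA; in the module, each weight would presumably be the product of the phi and theta values for that topic.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::LDA: draw an index from
# a list of non-negative, unnormalized weights.
sub draw_topic {
    my ($weights, $u) = @_;            # $u is a uniform draw in [0, 1)
    my $total = 0;
    $total += $_ for @$weights;

    my $threshold  = $u * $total;
    my $cumulative = 0;
    for my $topic (0 .. $#$weights) {
        $cumulative += $weights->[$topic];
        return $topic if $threshold < $cumulative;
    }
    return $#$weights;                 # fallback: last topic
}

my @weights = (0.1, 0.6, 0.3);         # made-up weights for three topics
my $topic   = draw_topic(\@weights, rand());
print "sampled topic: $topic\n";
```

Passing the uniform draw in as an argument keeps the helper deterministic, which makes it easy to check by hand.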

computePhi

description:

Computes the expected phi value for a word given a topic ID

input:

$topic -> ID of topic (iteration 0..$k)
$word -> word that is to be evaluated

output:

Phi value

example:

$dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
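
The formula used by computePhi is not given here. A common smoothed estimate in collapsed Gibbs sampling for LDA is (count of the word in the topic + beta) / (total words in the topic + vocabulary size * beta). The sketch below assumes that form; the count tables, the function name phi_estimate, and all values are hypothetical and not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical count tables (not the module's internal names).
my %topic_word  = ( 0 => { cat => 3, dog => 1 } );  # times each word was assigned to topic 0
my %topic_total = ( 0 => 4 );                       # total words assigned to topic 0
my $vocab_size  = 5;                                # assumed vocabulary size
my $beta        = 0.1;                              # assumed smoothing parameter

sub phi_estimate {
    my ($topic, $word) = @_;
    my $count = $topic_word{$topic}{$word} // 0;    # unseen words count as zero
    return ($count + $beta) / ($topic_total{$topic} + $vocab_size * $beta);
}

printf "%.4f\n", phi_estimate(0, 'cat');
```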

computeTheta

description:

Computes the expected theta value for a topic given a document ID

input:

$document -> ID of document
$topic -> ID of topic (iteration 0..$k)

output:

Theta value

example:

$dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
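
The formula used by computeTheta is not given here either. A common smoothed estimate is (count of the topic in the document + alpha) / (total words in the document + number of topics * alpha). The sketch below assumes that form; the count tables, the function name theta_estimate, and all values are hypothetical and not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical count tables (not the module's internal names).
my %doc_topic  = ( d1 => { 0 => 6, 1 => 2 } );  # words in d1 assigned to each topic
my %doc_total  = ( d1 => 8 );                   # total words in d1
my $num_topics = 2;                             # assumed number of topics
my $alpha      = 0.5;                           # assumed smoothing parameter

sub theta_estimate {
    my ($document, $topic) = @_;
    my $count = $doc_topic{$document}{$topic} // 0;
    return ($count + $alpha) / ($doc_total{$document} + $num_topics * $alpha);
}

printf "topic %d: %.4f\n", $_, theta_estimate('d1', $_) for 0 .. 1;
```

With this form, the estimates for a document sum to 1 across all topics.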

increaseMap

description:

Increments the values of all of the hashmaps for the given document, topic, and word

input:

$document -> ID of document
$topic -> ID of topic
$word -> word in document $document

output:

None

example:

$self->increaseMap($data->{document}, $data->{topic}, $data->{word});

decreaseMap

description:

Decrements the values of all of the hashmaps for the given document, topic, and word

input:

$document -> ID of document
$topic -> ID of topic
$word -> word in document $document

output:

None

example:

$self->decreaseMap($data->{document}, $data->{topic}, $data->{word});
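
In a Gibbs sampler, this pair of operations typically brackets each resampling step: the word's current assignment is removed from the counts, a new topic is drawn, and the counts are restored under the new topic. The sketch below illustrates that bookkeeping. Which hashmaps the module actually maintains is not stated here, so the three tables and the helper adjust_counts are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical count tables (not the module's internal names).
my (%doc_topic, %topic_word, %topic_total);

# Add $delta (+1 or -1) to every table for one word assignment.
sub adjust_counts {
    my ($document, $topic, $word, $delta) = @_;
    $doc_topic{$document}{$topic} += $delta;
    $topic_word{$topic}{$word}    += $delta;
    $topic_total{$topic}          += $delta;
    return;
}

adjust_counts('d1', 0, 'cat', +1);   # initial assignment, like increaseMap
adjust_counts('d1', 0, 'cat', -1);   # remove it before resampling, like decreaseMap
adjust_counts('d1', 1, 'cat', +1);   # record the newly drawn topic
```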

valid

description:

Returns whether or not $data is a valid array (able to be added to the dataset)

input:

$data -> data to be evaluated

output:

Boolean/Integer -> 1 (true) if $data is an array, 0 (false) otherwise

example:

return unless (valid($args{data}));
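
One plausible way to make this kind of check is with Perl's built-in ref function, as in the sketch below. The name is_array_ref is hypothetical, and the module's own test may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the check: true only for array references.
sub is_array_ref {
    my ($data) = @_;
    return ref($data) eq 'ARRAY' ? 1 : 0;
}

print is_array_ref([ 'a', 'b' ]), "\n";   # array reference
print is_array_ref({ a => 1 }), "\n";     # hash reference
print is_array_ref('plain string'), "\n"; # not a reference
```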

removeSpecialChars

description:

Removes special characters from a word (non-ASCII/non-letter characters)

input:

$word -> word to be cleaned

output:

$newWord -> $word without non-ASCII/non-letter characters

example:

@ws = map { removeSpecialChars($_) } @ws;
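
A minimal sketch of such a cleanup, assuming it keeps only the ASCII letters A-Z and a-z, is a single substitution. The name strip_non_letters is hypothetical, and the module's own character class may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in: drop everything except ASCII letters.
sub strip_non_letters {
    my ($word) = @_;
    (my $clean = $word) =~ s/[^A-Za-z]//g;   # copy first so the input is untouched
    return $clean;
}

my @ws = map { strip_non_letters($_) } ("can't", "topic-1", "LDA!");
print join(' ', @ws), "\n";
```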

beta

description:

Randomly initializes beta values

input:

None

output:

None

example:

beta();

stop

description:

Stopword subroutine. Generates a regex that matches the words in the stopword list so they can be removed

input:

None

output:

$stop_regex -> regex containing stopwords

example:

my $stop = stop();
my $regex = qr/($stop)/;
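
To illustrate how a stopword regex of this kind can be built and applied, the sketch below joins a word list into one alternation and filters tokens against it. The inline word list stands in for the contents of a stoplist file, whose format is not described here; the variable names and the case-insensitive, whole-word matching are assumptions.

```perl
use strict;
use warnings;

# Stand-in for words read from a stoplist file (one word per entry assumed).
my @stopwords = qw(the of and a);

# Escape each word, join with '|', and anchor so only whole tokens match.
my $alternation = join '|', map { quotemeta($_) } @stopwords;
my $stop_regex  = qr/^(?:$alternation)$/i;

my @tokens = grep { $_ !~ $stop_regex } qw(The cat and a dog);
print join(' ', @tokens), "\n";
```

Anchoring the pattern avoids removing words that merely contain a stopword, such as "another" containing "the".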

CONTACT US

If you have any trouble installing or using Algorithm::LDA,
please contact us via:

    Bridget T. McInnes: btmcinnes at vcu.edu

AUTHORS

Bridget McInnes <btmcinnes at vcu.edu>

COPYRIGHT AND LICENSE

Copyright 2016 by Bridget McInnes, Nicholas Jordan

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.