NAME
Algorithm::LDA
SYNOPSIS
use Algorithm::LDA;
my $lda = Algorithm::LDA->new("Data", 5, 100, 100, 0, 10, 0.1, 10, "stoplist.txt");
DESCRIPTION
Algorithm::LDA is a Perl implementation of Latent Dirichlet Allocation (LDA).
add
description:
Adds a document to the array of documents ($self->documents)
input:
%args -> hash containing data
output:
1
example:
while (my $line = <$fh2>) {
    my $obj = decode_json($line);
    add(%$obj);
}
init
description:
Initializes alpha, initializes beta, loads documents, starts main loop
input:
None
output:
1
example:
init();
printResults
description:
Prints words in each topic, topics in each document, phi values,
and theta values to text files in the 'Results/$data' directory
input:
None
output:
None
example:
printResults();
load
description:
Loads documents from text files (in "data/$data") or from a JSON file (in "Documents")
input:
None
output:
None
example:
load();
wordsPerTopic
description:
Creates an array of words in each topic
input:
%args -> hash containing topic
output:
@words -> Array containing words and probabilities (phi value) for $args{topic}
example:
my $words_on_topic = wordsPerTopic(topic => $topic);
topicsPerDocument
description:
Creates an array of topics in each document
input:
%args -> hash containing document
output:
@topics -> Array containing topics and probabilities (theta value) for $args{document}
example:
my $topics_on_document = topicsPerDocument(document => $doc);
sample_topic
description:
Uses Gibbs Sampling to determine a topic given a document and word
input:
$document -> ID of document word is in
$word -> word that is to be evaluated
output:
$topic -> topic ID
$k -> last topic if topic can't be found
example:
my $topic = $self->sample_topic($document, $word);
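The sampling step can be sketched as a draw from a list of unnormalized per-topic weights. This is a minimal illustration, not the module's code: sample_index is a hypothetical helper, and it assumes the weights are non-negative values such as phi * theta for each topic.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::LDA: draw an index from a list
# of non-negative, unnormalized weights (e.g. phi * theta for each topic).
sub sample_index {
    my @weights = @_;
    my $total = 0;
    $total += $_ for @weights;

    # Pick a point in [0, $total) and walk the cumulative sum until it is passed.
    my $threshold  = rand() * $total;
    my $cumulative = 0;
    for my $i (0 .. $#weights) {
        $cumulative += $weights[$i];
        return $i if $threshold < $cumulative;
    }

    # Fall back to the last index, mirroring the documented "last topic" fallback.
    return $#weights;
}

my $topic = sample_index(0.1, 0.6, 0.3);
print "sampled topic: $topic\n";
```

Topics with larger weights are drawn more often; a topic with weight 0 is never drawn unless every weight is 0, in which case the fallback returns the last index.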
computePhi
description:
Computes the expected phi value for a word given a topic ID
input:
$topic -> ID of topic (iteration 0..$k)
$word -> word that is to be evaluated
output:
Phi value
example:
$dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
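For orientation, the usual smoothed estimate of phi in collapsed Gibbs sampling is (count of word in topic + beta) / (total words in topic + vocabulary size * beta). The sketch below assumes that formula with a symmetric beta; the documentation does not state the module's exact computation, and phi_estimate and the count hashes are hypothetical names.

```perl
use strict;
use warnings;

# Assumed count structures (hypothetical names, not the module's internals):
#   $topic_word->{$topic}{$word} : times $word is assigned to $topic
#   $topic_total->{$topic}       : total words assigned to $topic
sub phi_estimate {
    my ($topic_word, $topic_total, $topic, $word, $beta, $vocab_size) = @_;
    my $n_kw = $topic_word->{$topic}{$word} // 0;   # unseen pairs count as 0
    my $n_k  = $topic_total->{$topic}       // 0;
    return ($n_kw + $beta) / ($n_k + $vocab_size * $beta);
}

my %topic_word  = (0 => { cat => 3, dog => 1 });
my %topic_total = (0 => 4);
printf "%.4f\n", phi_estimate(\%topic_word, \%topic_total, 0, 'cat', 0.1, 2);
```

With this smoothing, the estimates for one topic sum to 1 over the vocabulary, and a word never seen in the topic still receives a small non-zero probability.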
computeTheta
description:
Computes the expected theta value for a topic given a document ID
input:
$document -> ID of document
$topic -> ID of topic (iteration 0..$k)
output:
Theta value
example:
$dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
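For orientation, the usual smoothed estimate of theta in collapsed Gibbs sampling is (words in document assigned to topic + alpha) / (total words in document + number of topics * alpha). The sketch below assumes that formula with a symmetric alpha; the documentation does not state the module's exact computation, and theta_estimate and the count hashes are hypothetical names.

```perl
use strict;
use warnings;

# Assumed count structures (hypothetical names, not the module's internals):
#   $doc_topic->{$document}{$topic} : words in $document assigned to $topic
#   $doc_total->{$document}         : total words in $document
sub theta_estimate {
    my ($doc_topic, $doc_total, $document, $topic, $alpha, $num_topics) = @_;
    my $n_dk = $doc_topic->{$document}{$topic} // 0;   # unseen pairs count as 0
    my $n_d  = $doc_total->{$document}         // 0;
    return ($n_dk + $alpha) / ($n_d + $num_topics * $alpha);
}

my %doc_topic = (7 => { 0 => 6, 1 => 2 });
my %doc_total = (7 => 8);
printf "%.4f\n", theta_estimate(\%doc_topic, \%doc_total, 7, 0, 0.5, 2);
```

The estimates for one document sum to 1 over all topics, so they can be read as that document's topic distribution.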
increaseMap
description:
Increments the counts in all of the hashmaps for one (document, topic, word) assignment
input:
$document -> ID of document
$topic -> ID of topic
$word -> word in document $document
output:
None
example:
$self->increaseMap($data->{document}, $data->{topic}, $data->{word});
decreaseMap
description:
Decrements the counts in all of the hashmaps for one (document, topic, word) assignment
input:
$document -> ID of document
$topic -> ID of topic
$word -> word in document $document
output:
None
example:
$self->decreaseMap($data->{document}, $data->{topic}, $data->{word});
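The bookkeeping done by increaseMap and decreaseMap can be pictured as adding +1 or -1 to a few count tables. This is a sketch under assumptions: the three hashes and adjust_counts are hypothetical, and the module may keep different or additional maps.

```perl
use strict;
use warnings;

# Hypothetical count tables; the module's own hash names may differ.
my (%doc_topic, %topic_word, %topic_total);

# Add $delta (+1 or -1) to every count touched by one word assignment.
sub adjust_counts {
    my ($document, $topic, $word, $delta) = @_;
    $doc_topic{$document}{$topic} += $delta;
    $topic_word{$topic}{$word}    += $delta;
    $topic_total{$topic}          += $delta;
    return;
}

adjust_counts(0, 2, 'gene', +1);   # assign 'gene' in document 0 to topic 2
adjust_counts(0, 2, 'gene', -1);   # withdraw that assignment before resampling
```

In a Gibbs sampler the current assignment of a word is typically withdrawn, a new topic is sampled, and the counts are then increased for the new topic, so the two operations always come in pairs.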
valid
description:
Returns whether or not $data is a valid array (able to be added to the dataset)
input:
$data -> data to be evaluated
output:
Boolean/Integer -> true/1 if $data is an array, false/0 if it is not
example:
return unless (valid($args{data}));
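A check of this kind is commonly written with ref. The sketch below assumes that "valid array" means an array reference; is_array_ref is a hypothetical name and not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for valid(): true (1) only for array references.
sub is_array_ref {
    my ($data) = @_;
    return ref($data) eq 'ARRAY' ? 1 : 0;
}

print is_array_ref([ 'a', 'b' ]) ? "array\n" : "not an array\n";
print is_array_ref({ a => 1 })   ? "array\n" : "not an array\n";
```

Plain strings, hash references, and undef all yield 0 here, so the caller can bail out early with "return unless".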
removeSpecialChars
description:
Removes special characters from a word (non-ASCII/non-letter characters)
input:
$word -> word to be cleaned
output:
$newWord -> $word without non-ASCII/non-letter characters
example:
@ws = map { removeSpecialChars($_) } @ws;
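One way to perform this kind of cleaning is a single substitution that keeps only ASCII letters. The sketch assumes digits and punctuation count as special characters; strip_non_letters is a hypothetical name and the module's actual pattern may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for removeSpecialChars(): keep only A-Z and a-z.
sub strip_non_letters {
    my ($word) = @_;
    (my $clean = $word) =~ s/[^A-Za-z]//g;   # copy first so the input is untouched
    return $clean;
}

my @ws = map { strip_non_letters($_) } ("don't", "abc123", "p53-gene");
print join(' ', @ws), "\n";
```

Because the substitution works on a copy, the function is safe to use inside map without modifying the original list.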
beta
description:
Randomly initializes beta values
input:
None
output:
None
example:
beta();
stop
description:
Stopword subroutine. Generates a regex to remove words in a stopword list
input:
None
output:
$stop_regex -> regex containing stopwords
example:
my $stop = stop();
my $regex = qr/($stop)/;
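A stopword pattern of this kind can be built by joining the words into an alternation. The sketch below is an assumption about the general approach, not the module's code: build_stop_regex is a hypothetical name, it takes the words as a list rather than reading the stoplist file, and it adds word boundaries so that only whole words match.

```perl
use strict;
use warnings;

# Hypothetical stand-in for stop(): return a pattern string that matches any
# whole word from the given list.
sub build_stop_regex {
    my @stopwords = @_;
    # quotemeta guards against regex metacharacters in the list.
    my $alternation = join '|', map { quotemeta } @stopwords;
    return "\\b(?:$alternation)\\b";
}

my $stop  = build_stop_regex(qw(the of and));
my $regex = qr/($stop)/;

(my $text = "the role of genes and proteins") =~ s/$regex//g;
print "$text\n";
```

The word boundaries keep a stopword such as "the" from being removed inside a longer word such as "theory".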
CONTACT US
If you have any trouble installing or using Algorithm::LDA,
please contact us via:
Bridget T. McInnes: btmcinnes at vcu.edu
AUTHORS
Nick Jordan, Virginia Commonwealth University
Bridget McInnes, Virginia Commonwealth University
COPYRIGHT AND LICENSE
Copyright 2016 by Bridget McInnes, Nicholas Jordan
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.