NAME
Algorithm::LDA
SYNOPSIS
use Algorithm::LDA;
my $lda = Algorithm::LDA->new("Data", 5, 100, 100, 0, 10, 0.1, 10, "stoplist.txt");
DESCRIPTION
Algorithm::LDA is an implementation of Latent Dirichlet Allocation (LDA) in Perl.
add
description:
Used to add to array of documents ($self->documents)
input:
%args -> hash containing data
output:
1
example:
while (my $line = <$fh2>) {
    my $obj = decode_json($line);
    add(%$obj);
}
init
description:
Initializes alpha, initializes beta, loads documents, starts main loop
input:
None
output:
1
example:
init();
printResults
description:
Prints words in each topic, topics in each document, phi values,
and theta values to text files in the 'Results/$data' directory
input:
None
output:
None
example:
printResults();
load
description:
Loads documents from text files (in "data/$data") or JSON file (in "Documents")
input:
None
output:
None
example:
load();
wordsPerTopic
description:
Creates an array of words in each topic
input:
%args -> hash containing topic
output:
@words -> Array containing words and probabilities (phi value) for $args{topic}
example:
my $words_on_topic = wordsPerTopic(topic => $topic);
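Ranking the words of a topic by probability can be sketched as follows. This is only an illustration: the `%phi_for_topic` hash and its values are hypothetical, and the structure actually returned by wordsPerTopic may differ.

```perl
use strict;
use warnings;

# Hypothetical phi values for one topic (word => probability).
my %phi_for_topic = (apple => 0.5, pear => 0.3, kiwi => 0.2);

# Build [word, probability] pairs, highest probability first;
# ties are broken alphabetically so the order is deterministic.
my @ranked = map  { [ $_, $phi_for_topic{$_} ] }
             sort { $phi_for_topic{$b} <=> $phi_for_topic{$a} or $a cmp $b }
             keys %phi_for_topic;

printf "%s\t%.2f\n", @$_ for @ranked;
```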
topicsPerDocument
description:
Creates an array of topics in each document
input:
%args -> hash containing document
output:
@topics -> Array containing topics and probabilities (theta value) for $args{document}
example:
my $topics_on_document = topicsPerDocument(document => $doc);
sample_topic
description:
Uses Gibbs Sampling to determine a topic given a document and word
input:
$document -> ID of document word is in
$word -> word that is to be evaluated
output:
$topic -> topic ID
$k -> last topic if topic can't be found
example:
my $topic = $self->sample_topic($document, $word);
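The drawing step of such a sampler can be sketched as follows. This is a minimal illustration, not the module's own code: the helper name `draw_topic` and the weights passed to it are hypothetical. It picks a topic index in proportion to unnormalized weights (for example, phi times theta for each topic) and falls back to the last topic if no index is selected, which mirrors the documented `$k` fallback.

```perl
use strict;
use warnings;

# Hypothetical helper: draw an index from unnormalized weights.
# $rand is a number in [0, 1); pass rand() in real use.
sub draw_topic {
    my ($weights, $rand) = @_;

    my $total = 0;
    $total += $_ for @$weights;

    my $threshold  = $rand * $total;
    my $cumulative = 0;
    for my $topic (0 .. $#$weights) {
        $cumulative += $weights->[$topic];
        return $topic if $threshold < $cumulative;
    }
    return $#$weights;    # fallback: last topic
}

my $topic = draw_topic([0.2, 0.5, 0.3], rand());
```

Passing the random number in as an argument keeps the helper deterministic under test.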
computePhi
description:
Computes the expected phi value for a word given a topic ID
input:
$topic -> ID of topic (iteration 0..$k)
$word -> word that is to be evaluated
output:
Phi value
example:
$dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
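For orientation, the smoothed estimate commonly used for phi in collapsed Gibbs sampling is (count of word in topic + beta) / (total words in topic + vocabulary size * beta). The sketch below assumes that estimator; the count tables, `$vocab_size`, and the function name `phi_estimate` are hypothetical and are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical count tables and hyperparameter, for illustration only.
my %topic_word_count = (0 => { apple => 3, pear => 1 });
my %topic_total      = (0 => 4);
my $vocab_size       = 5;
my $beta             = 0.1;

# Smoothed probability of $word under $topic.
sub phi_estimate {
    my ($topic, $word) = @_;
    my $count = $topic_word_count{$topic}{$word} // 0;   # unseen words count as 0
    return ($count + $beta) / ($topic_total{$topic} + $vocab_size * $beta);
}

printf "%.4f\n", phi_estimate(0, 'apple');
```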
computeTheta
description:
Computes the expected theta value for a topic given a document ID
input:
$document -> ID of document
$topic -> ID of topic (iteration 0..$k)
output:
Theta value
example:
$dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
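The corresponding estimate commonly used for theta is (count of topic in document + alpha) / (total words in document + number of topics * alpha). The sketch below assumes that estimator; the count tables and the name `theta_estimate` are hypothetical, and the module's own bookkeeping may differ.

```perl
use strict;
use warnings;

# Hypothetical count tables and hyperparameter, for illustration only.
my %doc_topic_count = (7 => { 0 => 6, 1 => 2 });
my %doc_total       = (7 => 8);
my $num_topics      = 2;
my $alpha           = 0.5;

# Smoothed probability of $topic within $document.
sub theta_estimate {
    my ($document, $topic) = @_;
    my $count = $doc_topic_count{$document}{$topic} // 0;
    return ($count + $alpha) / ($doc_total{$document} + $num_topics * $alpha);
}

printf "%.4f\n", theta_estimate(7, 0);
```

Under this estimator the theta values of one document sum to 1 across its topics.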
increaseMap
description:
Increments the counts in all of the hashmaps for the given document, topic, and word
input:
$document -> ID of document
$topic -> ID of topic
$word -> word in document $document
output:
None
example:
$self->increaseMap($data->{document}, $data->{topic}, $data->{word});
decreaseMap
description:
Decrements the counts in all of the hashmaps for the given document, topic, and word
input:
$document -> ID of document
$topic -> ID of topic
$word -> word in document $document
output:
None
example:
$self->decreaseMap($data->{document}, $data->{topic}, $data->{word});
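In a Gibbs sampler, this kind of bookkeeping usually means removing a word's current assignment from the counts before resampling and adding the new assignment afterwards. The sketch below assumes four count tables; their names and the helper `adjust_counts` are hypothetical, not the module's internal fields.

```perl
use strict;
use warnings;

# Hypothetical count tables; the module's own hash names may differ.
my (%doc_topic, %topic_word, %topic_total, %doc_total);

# Add $delta (+1 or -1) to every count touched by one word assignment.
sub adjust_counts {
    my ($document, $topic, $word, $delta) = @_;
    $doc_topic{$document}{$topic} += $delta;
    $topic_word{$topic}{$word}    += $delta;
    $topic_total{$topic}          += $delta;
    $doc_total{$document}         += $delta;
    return;
}

adjust_counts(1, 0, 'apple', +1);   # record an assignment
adjust_counts(1, 0, 'apple', -1);   # remove it again before resampling
```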
valid
description:
Returns whether or not $data is a valid array (able to be added to the dataset)
input:
$data -> data to be evaluated
output:
Boolean/Integer -> true/1 if $data is an array | false/0 if $data is not an array
example:
return unless (valid($args{data}));
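One plausible way to write such a check is with Perl's built-in `ref`. The function name `is_array_ref` below is hypothetical, and the module's own test may be stricter.

```perl
use strict;
use warnings;

# Hypothetical check: 1 if $data is an array reference, 0 otherwise.
sub is_array_ref {
    my ($data) = @_;
    return ref($data) eq 'ARRAY' ? 1 : 0;
}

print is_array_ref([ 'a', 'b' ]), "\n";
print is_array_ref({ a => 1 }), "\n";
```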
removeSpecialChars
description:
Removes special characters from a word (non-ASCII/non-letter characters)
input:
$word -> word to be cleaned
output:
$newWord -> $word without non-ASCII/non-letter characters
example:
@ws = map { removeSpecialChars($_) } @ws;
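A cleaning step of this kind can be sketched with a single substitution. The helper name `strip_non_letters` is hypothetical, and the sketch assumes that only the ASCII letters A-Z and a-z are kept; the module may keep or drop a different set of characters.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only ASCII letters.
sub strip_non_letters {
    my ($word) = @_;
    (my $clean = $word) =~ s/[^A-Za-z]//g;   # copy first so the input is untouched
    return $clean;
}

my @ws = map { strip_non_letters($_) } ("don't", "caf\x{e9}", "a1b2");
```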
beta
description:
Randomly initializes beta values
input:
None
output:
None
example:
beta();
stop
description:
Stopword subroutine. Generates a regex to remove words in a stopword list
input:
None
output:
$stop_regex -> regex containing stopwords
example:
my $stop = stop();
my $regex = qr/($stop)/;
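Building a stopword pattern can be sketched as an alternation of escaped words. The helper `build_stop_regex`, the word list, and the word-boundary anchors are assumptions for illustration; the module reads its words from the stoplist file and may build the pattern differently.

```perl
use strict;
use warnings;

# Hypothetical helper: join stopwords into one alternation,
# escaping any regex metacharacters they contain.
sub build_stop_regex {
    my @stopwords = @_;
    return join '|', map { quotemeta($_) } @stopwords;
}

my $stop  = build_stop_regex(qw(the of and));
my $regex = qr/\b(?:$stop)\b/i;   # whole words only, case-insensitive

my $text = 'The end of the road';
(my $filtered = $text) =~ s/$regex//g;
$filtered =~ s/\s+/ /g;           # collapse the gaps left behind
$filtered =~ s/^\s+|\s+$//g;      # trim both ends
```

The word-boundary anchors prevent a stopword such as "the" from matching inside a longer word such as "other".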
CONTACT US
If you have any trouble installing or using Algorithm::LDA,
please contact us:
Bridget T. McInnes: btmcinnes at vcu.edu
AUTHORS
Bridget McInnes <btmcinnes at vcu.edu>
COPYRIGHT AND LICENSE
Copyright 2016 by Bridget McInnes, Nicholas Jordan
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.