NAME

Word2vec::Interface - Interface module for word2vec.pm, word2phrase.pm, xmltow2v.pm modules and associated utilities.

SYNOPSIS

use Word2vec::Interface;

my $result = 0;

# Compile a text corpus, execute word2vec training and compute cosine similarity of two words
my $w2vinterface = Word2vec::Interface->new();

my $xmlconv = $w2vinterface->GetXMLToW2VHandler();
$xmlconv->SetWorkingDir( "Medline/XML/Directory/Here" );
$xmlconv->SetSavePath( "textcorpus.txt" );
$xmlconv->SetStoreTitle( 1 );
$xmlconv->SetStoreAbstract( 1 );
$xmlconv->SetBeginDate( "01/01/2004" );
$xmlconv->SetEndDate( "08/13/2016" );
$xmlconv->SetOverwriteExistingFile( 1 );

# If Compound Word File Exists, Store It In Memory
# And Create Compound Word Binary Search Tree Using The Compound Word Data
$xmlconv->ReadCompoundWordDataFromFile( "compoundword.txt" );
$xmlconv->CreateCompoundWordBST();

# Parse XML Files or Directory Of Files
$result = $xmlconv->ConvertMedlineXMLToW2V( "/xmlDirectory/" );

# Check(s)
print( "Error Parsing Medline XML Files\n" ) if ( $result == -1 );
exit if ( $result == -1 );

# Setup And Execute word2vec Training
my $word2vec = $w2vinterface->GetWord2VecHandler();
$word2vec->SetTrainFilePath( "textcorpus.txt" );
$word2vec->SetOutputFilePath( "vectors.bin" );
$word2vec->SetWordVecSize( 200 );
$word2vec->SetWindowSize( 8 );
$word2vec->SetSample( 0.0001 );
$word2vec->SetNegative( 25 );
$word2vec->SetHSoftMax( 0 );
$word2vec->SetBinaryOutput( 0 );
$word2vec->SetNumOfThreads( 20 );
$word2vec->SetNumOfIterations( 12 );
$word2vec->SetUseCBOW( 1 );
$word2vec->SetOverwriteOldFile( 0 );

# Execute word2vec Training
$result = $word2vec->ExecuteTraining();

# Check(s)
print( "Error Training Word2vec On File: \"textcorpus.txt\"\n" ) if ( $result == -1 );
exit if ( $result == -1 );

# Read word2vec Training Data Into Memory And Store As A Binary Search Tree
$result = $word2vec->ReadTrainedVectorDataFromFile( "vectors.bin" );

# Check(s)
print( "Error Unable To Read Word2vec Trained Vector Data From File\n" ) if ( $result == -1 );
exit if ( $result == -1 );

# Compute Cosine Similarity Between "respiratory" and "arrest"
$result = $word2vec->ComputeCosineSimilarity( "respiratory", "arrest" );
print( "Cosine Similarity Between \"respiratory\" and \"arrest\": $result\n" ) if defined( $result );
print( "Error Computing Cosine Similarity\n" ) if !defined( $result );

# Compute Cosine Similarity Between "respiratory arrest" and "heart attack"
$result = $word2vec->ComputeMultiWordCosineSimilarity( "respiratory arrest", "heart attack" );
print( "Cosine Similarity Between \"respiratory arrest\" and \"heart attack\": $result\n" ) if defined( $result );
print( "Error Computing Cosine Similarity\n" ) if !defined( $result );

undef( $w2vinterface );

# or

use Word2vec::Interface;

my $result = 0;
my $w2vinterface = Word2vec::Interface->new();
$w2vinterface->XTWSetWorkingDir( "Medline/XML/Directory/Here" );
$w2vinterface->XTWSetSavePath( "textcorpus.txt" );
$w2vinterface->XTWSetStoreTitle( 1 );
$w2vinterface->XTWSetStoreAbstract( 1 );
$w2vinterface->XTWSetBeginDate( "01/01/2004" );
$w2vinterface->XTWSetEndDate( "08/13/2016" );
$w2vinterface->XTWSetOverwriteExistingFile( 1 );

# If Compound Word File Exists, Store It In Memory
# And Create Compound Word Binary Search Tree Using The Compound Word Data
$w2vinterface->XTWReadCompoundWordDataFromFile( "compoundword.txt" );
$w2vinterface->XTWCreateCompoundWordBST();

# Parse XML Files or Directory Of Files
$result = $w2vinterface->XTWConvertMedlineXMLToW2V( "/xmlDirectory/" );

$result = $w2vinterface->W2VExecuteTraining( "textcorpus.txt", "vectors.bin", 200, 8, undef, 0.001, 25,
                                             undef, 0, 0, 20, 15, 1, 0, undef, undef, undef, 1 );

# Read word2vec Training Data Into Memory And Store As A Binary Search Tree
$result = $w2vinterface->W2VReadTrainedVectorDataFromFile( "vectors.bin" );

# Check(s)
print( "Error Unable To Read Word2vec Trained Vector Data From File\n" ) if ( $result == -1 );
exit if ( $result == -1 );

# Compute Cosine Similarity Between "respiratory" and "arrest"
$result = $w2vinterface->W2VComputeCosineSimilarity( "respiratory", "arrest" );
print( "Cosine Similarity Between \"respiratory\" and \"arrest\": $result\n" ) if defined( $result );
print( "Error Computing Cosine Similarity\n" ) if !defined( $result );

# Compute Cosine Similarity Between "respiratory arrest" and "heart attack"
$result = $w2vinterface->W2VComputeMultiWordCosineSimilarity( "respiratory arrest", "heart attack" );
print( "Cosine Similarity Between \"respiratory arrest\" and \"heart attack\": $result\n" ) if defined( $result );
print( "Error Computing Cosine Similarity\n" ) if !defined( $result );

undef( $w2vinterface );

DESCRIPTION

Word2vec::Interface is an interface module for utilization of word2vec, word2phrase, xmltow2v and their associated functions.
This program houses a set of functions, modules and utilities for use with UMLS Similarity.

XmlToW2v Features:
 - Compilation of a text corpus from plain or gzipped Medline XML files.
 - Multi-threaded text corpus compilation support.
 - Include text corpus articles via date range.
 - Include text corpus articles via title, abstract or both.
 - Compoundifying on-the-fly while building text corpus given a compound word file.

Word2vec Features:
 - Word2vec training with user specified settings.
 - Manipulation of Word2vec word vectors. (Addition/Subtraction/Average)
 - Word2vec binary format to plain text file conversion.
 - Word2vec plain text to binary format file conversion.
 - Multi-word cosine similarity computation. (Pseudo-compound word cosine similarity).

Word2phrase Features:
 - Word2phrase training with user specified settings.

Interface Features:
 - Word Sense Disambiguation via trained word2vec data.

Interface Main Functions

new

Description:

Returns a new "Word2vec::Interface" module object.

Note: Specifying no parameters implies default options.

Default Parameters:
   word2vecDir                 = "../../External/word2vec"
   debugLog                    = 0
   writeLog                    = 0
   ignoreCompileErrors         = 0
   ignoreFileChecks            = 0
   exitFlag                    = 0
   workingDir                  = ""
   word2vec                    = Word2vec::Word2vec->new()
   word2phrase                 = Word2vec::Word2phrase->new()
   xmltow2v                    = Word2vec::Xmltow2v->new()
   util                        = Word2vec::Util->new()
   instanceAry                 = ()
   senseAry                    = ()
   instanceCount               = 0
   senseCount                  = 0

Input:

$word2vecDir                 -> Specifies word2vec package source/executable directory.
$debugLog                    -> Instructs module to print debug statements to the console. ('1' = True / '0' = False)
$writeLog                    -> Instructs module to print debug statements to a log file. ('1' = True / '0' = False)
$ignoreCompileErrors         -> Instructs module to ignore source code compilation errors. ('1' = True / '0' = False)
$ignoreFileChecks            -> Instructs module to ignore file checks. ('1' = True / '0' = False)
$exitFlag                    -> In the event of a run-time check error, exitFlag is set to '1' which gracefully terminates the script.
$workingDir                  -> Specifies the current working directory.
$word2vec                    -> Word2vec::Word2vec object.
$word2phrase                 -> Word2vec::Word2phrase object.
$xmltow2v                    -> Word2vec::Xmltow2v object.
$util                        -> Word2vec::Util object.
$instanceAry                 -> Word Sense Disambiguation: Array of instances.
$senseAry                    -> Word Sense Disambiguation: Array of senses.
$instanceCount               -> Number of Word Sense Disambiguation instances loaded in memory.
$senseCount                  -> Number of Word Sense Disambiguation senses loaded in memory.

Note: Specifying every new() parameter is not recommended, as this usage has not been thoroughly tested. At most, specify the following parameters:
"word2vecDir, debugLog, writeLog, ignoreCompileErrors, ignoreFileChecks"

Output:

Word2vec::Interface object.

Example:

use Word2vec::Interface;

# Parameters: Word2Vec Directory = undef, DebugLog = True, WriteLog = False, IgnoreCompileErrors = False, IgnoreFileChecks = False
my $interface = Word2vec::Interface->new( undef, 1, 0 );

undef( $interface );

# Or

# Parameters: Word2Vec Directory = undef, DebugLog = False, WriteLog = False, IgnoreCompileErrors = False, IgnoreFileChecks = False
use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

undef( $interface );

DESTROY

Description:

Removes member variables and file handle from memory.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->DESTROY();

undef( $interface );

RunFileChecks

Description:

Runs word2vec file checks. Looks for the word2vec executable files; if they are
not found, it looks for the source code, compiles it automatically and places the
executable files in the same directory. Errors out gracefully when the word2vec
executable files are not present and the source files cannot be located.

Notes : Word2vec Executable File List: word2vec, word2phrase, word-analogy, distance, compute-accuracy.

      : This method is called automatically by the Word2vec::Interface new() function. It can be disabled by setting
        the ignoreFileChecks new() parameter to 1.

Input:

$string -> Word2vec source/executable directory.

Output:

$value  -> Returns '1' if checks passed and '0' if file checks failed.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new( undef, 1, 0, 1, 1 );
my $result = $interface->RunFileChecks();

print( "Passed Word2Vec File Checks!\n" ) if $result == 1;
print( "Failed Word2Vec File Checks!\n" ) if $result == 0;

undef( $interface );

_CheckIfExecutableFileExists

Description:

Checks whether the specified executable file exists in the given directory.

Input:

$filePath -> Executable file path
$fileName -> Executable file name

Output:

$value    -> Returns '1' if file is found and '0' if otherwise.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->_CheckIfExecutableFileExists( "../../External/word2vec", "word2vec" );

print( "Executable File Exists!\n" ) if $result == 1;
print( "Executable File Does Not Exist!\n" ) if $result == 0;

undef( $interface );
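
For illustration only, a check of this kind can be sketched with Perl's built-in file test operators. The helper name executable_exists is hypothetical, and the module's own check may differ (for example, in how it handles the ".exe" extension on Windows).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: returns 1 if "$dir/$name" is a regular file with the
# execute permission set, and 0 otherwise.
sub executable_exists
{
    my ( $dir, $name ) = @_;
    my $path = File::Spec->catfile( $dir, $name );
    return ( ( -f $path ) && ( -x $path ) ) ? 1 : 0;
}

my $found = executable_exists( "../../External/word2vec", "word2vec" );
print( "Executable File Exists!\n" )         if $found == 1;
print( "Executable File Does Not Exist!\n" ) if $found == 0;
```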

_CheckIfSourceFileExists

Description:

Checks specified directory (string) for the filename (string).
This ensures the specified files are of file type "text/cpp".

Input:

$filePath -> Source file path
$fileName -> Source file name

Output:

$value    -> Returns '1' if file is found and of type "text/cpp" and '0' if otherwise.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->_CheckIfSourceFileExists( "../../External/word2vec", "word2vec" );

print( "Source File Exists!\n" ) if $result == 1;
print( "Source File Does Not Exist!\n" ) if $result == 0;

undef( $interface );

_CompileSourceFile

Description:

Compiles C++ source filename in a specified directory.

Input:

$filePath -> Source file path (string)
$fileName -> Source file name (string)

Output:

$value    -> Returns '1' if successful and '0' if un-successful.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->_CompileSourceFile( "../../External/word2vec", "word2vec" );

print( "Compiled Source Successfully!\n" ) if $result == 1;
print( "Source Compilation Attempt Unsuccessful!\n" ) if $result == 0;

undef( $interface );

GetFileType

Description:

Checks file in given file path and if it exists, returns the file type.

Input:

$filePath -> File path

Output:

$string -> Returns file type (string).

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $fileType = $interface->GetFileType( "samples/textcorpus.txt" );

print( "File Type: $fileType\n" );

undef( $interface );

GetOSType

Description:

Returns current operating system (string).

Input:

None

Output:

$string -> Operating System Type. (String)

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $os = $interface->GetOSType();

print( "Operating System: $os\n" );

undef( $interface );
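
As a point of comparison, plain Perl exposes the operating system name through the built-in $^O variable. Whether GetOSType returns this exact value is an assumption; the sketch below only shows the built-in.

```perl
use strict;
use warnings;

# $^O holds the name of the operating system Perl was built for,
# for example "linux", "darwin" or "MSWin32".
my $osName    = $^O;
my $isWindows = ( $osName eq "MSWin32" ) ? 1 : 0;

print( "Operating System: $osName\n" );
print( "Windows-specific handling needed\n" ) if $isWindows;
```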

_ModifyWord2VecSourceForWindows

Description:

Modifies the "word2vec.c" file for compilation under the Windows operating system.

Input:

None

Output:

$value -> '1' = Successful / '0' = Un-successful

Example:

This is a private function and should not be utilized.

_RemoveWord2VecSourceModification

Description:

Removes modification of "word2vec.c". Returns source file to its original state.

Input:

None

Output:

$value -> '1' = Successful / '0' = Un-successful.

Example:

This is a private function and should not be utilized.

Interface Command-Line Functions

CLComputeCosineSimilarity

Description:

Command-line Method: Computes cosine similarity between 'wordA' and 'wordB' using the specified 'filePath' for
loading trained word2vec word vector data.

Input:

$filePath -> Word2Vec trained word vectors binary file path. (String)
$wordA    -> First word for cosine similarity comparison.
$wordB    -> Second word for cosine similarity comparison.

Output:

$value    -> Cosine similarity value (float) or undefined.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->CLComputeCosineSimilarity( "../../samples/samplevectors.bin", "of", "the" );
print( "Cosine Similarity Between \"of\" and \"the\": $value\n" ) if defined( $value );
print( "Error: Cosine Similarity Could Not Be Computed\n" ) if !defined( $value );

undef( $interface );
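
For readers unfamiliar with the metric, the cosine similarity of two vectors can be sketched in plain Perl as follows. The helper name cosine_similarity and the sample vectors are hypothetical; this is not the module's implementation, only the standard formula (dot product divided by the product of the vector magnitudes).

```perl
use strict;
use warnings;

# Hypothetical helper: cosine similarity of two equal-length vectors (array refs).
# Returns undef when the lengths differ or either vector has zero magnitude.
sub cosine_similarity
{
    my ( $vecA, $vecB ) = @_;
    return undef if scalar( @{ $vecA } ) != scalar( @{ $vecB } );

    my ( $dot, $normA, $normB ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{ $vecA } )
    {
        $dot   += $vecA->[$i] * $vecB->[$i];
        $normA += $vecA->[$i] ** 2;
        $normB += $vecB->[$i] ** 2;
    }

    return undef if $normA == 0 || $normB == 0;
    return $dot / ( sqrt( $normA ) * sqrt( $normB ) );
}

my $value = cosine_similarity( [ 1, 2, 3 ], [ 2, 4, 6 ] );
print( "Cosine Similarity: $value\n" ) if defined( $value );
print( "Error: Cosine Similarity Could Not Be Computed\n" ) if !defined( $value );
```

Returning undef on failure mirrors the "float or undefined" output convention documented above.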

CLComputeMultiWordCosineSimilarity

Description:

Command-line Method: Computes cosine similarity between 'phraseA' and 'phraseB' using the specified 'filePath'
for loading trained word2vec word vector data.

Note: Supports multiple words concatenated by ':' for each string.

Input:

$filePath -> Word2Vec trained word vectors binary file path. (String)
$phraseA  -> First phrase for cosine similarity comparison. (String)
$phraseB  -> Second phrase for cosine similarity comparison. (String)

Output:

$value    -> Cosine similarity value (float) or undefined.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->CLComputeMultiWordCosineSimilarity( "../../samples/samplevectors.bin", "heart:attack", "myocardial:infarction" );
print( "Cosine Similarity Between \"heart attack\" and \"myocardial infarction\": $value\n" ) if defined( $value );
print( "Error: Cosine Similarity Could Not Be Computed\n" ) if !defined( $value );

undef( $interface );

CLComputeAvgOfWordsCosineSimilarity

Description:

Command-line Method: Averages the word vectors of all words in 'phraseA' and, separately, of all words
in 'phraseB', then computes the cosine similarity between the two averaged vectors, using the
specified 'filePath' for loading trained word2vec word vector data.

Note: Supports multiple words concatenated by ':' for each string.

Input:

$filePath -> Word2Vec trained word vectors binary file path. (String)
$phraseA  -> First phrase for cosine similarity comparison.
$phraseB  -> Second phrase for cosine similarity comparison.

Output:

$value    -> Cosine similarity value (float) or undefined.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->CLComputeAvgOfWordsCosineSimilarity( "../../samples/samplevectors.bin", "heart:attack", "myocardial:infarction" );
print( "Cosine Similarity Between \"heart attack\" and \"myocardial infarction\": $value\n" ) if defined( $value );
print( "Error: Cosine Similarity Could Not Be Computed\n" ) if !defined( $value );

undef( $interface );
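
The averaging step can be sketched in plain Perl, assuming that each phrase vector is the element-wise mean of its word vectors. The %vectors hash and both helper names are hypothetical stand-ins for trained data and are not part of the module.

```perl
use strict;
use warnings;

# Toy vectors standing in for trained word2vec data (hypothetical values,
# chosen so that both phrases below average to the same vector).
my %vectors = (
    heart      => [ 1.0, 0.0, 1.0 ],
    attack     => [ 0.0, 1.0, 1.0 ],
    myocardial => [ 1.0, 0.5, 0.5 ],
    infarction => [ 0.0, 0.5, 1.5 ],
);

# Average the vectors of every ':'-separated word; undef if a word is unknown.
sub average_phrase_vector
{
    my ( $phrase ) = @_;
    my @words = split( /:/, $phrase );
    my $count = scalar( @words );
    return undef if $count == 0;

    my @sum;
    for my $word ( @words )
    {
        my $vec = $vectors{ $word };
        return undef if !defined( $vec );
        $sum[$_] += $vec->[$_] for 0 .. $#{ $vec };
    }
    return [ map { $_ / $count } @sum ];
}

# Cosine similarity of two equal-length vectors; undef for zero-magnitude vectors.
sub cosine_similarity
{
    my ( $vecA, $vecB ) = @_;
    my ( $dot, $normA, $normB ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{ $vecA } )
    {
        $dot   += $vecA->[$i] * $vecB->[$i];
        $normA += $vecA->[$i] ** 2;
        $normB += $vecB->[$i] ** 2;
    }
    return undef if $normA == 0 || $normB == 0;
    return $dot / sqrt( $normA * $normB );
}

my $avgA  = average_phrase_vector( "heart:attack" );
my $avgB  = average_phrase_vector( "myocardial:infarction" );
my $value = ( defined( $avgA ) && defined( $avgB ) ) ? cosine_similarity( $avgA, $avgB ) : undef;
print( "Cosine Similarity: $value\n" ) if defined( $value );
```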

CLMultiWordCosSimWithUserInput

Description:

Command-line Method: Computes cosine similarity depending on user input given a vectorBinaryFile (string).

Note: Words can be compounded by the ':' character.

Input:

$filePath -> Word2Vec trained word vectors binary file path. (String)

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->CLMultiWordCosSimWithUserInput( "../../samples/samplevectors.bin" );

undef( $interface );

CLAddTwoWordVectors

Description:

Command-line Method: Loads the specified word2vec trained binary data file, adds word vectors and returns the summed result.

Input:

$filePath  -> Word2Vec trained word vectors binary file path. (String)
$wordDataA -> Word2Vec word data (String)
$wordDataB -> Word2Vec word data (String)

Output:

$vectorData -> Summed '$wordDataA' and '$wordDataB' vectors

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $wordVtr = $interface->CLAddTwoWordVectors( "../../samples/samplevectors.bin", "of", "the" );

print( "Word Vector for \"of\" + \"the\": $wordVtr\n" ) if defined( $wordVtr );
print( "Word Vector Cannot Be Computed\n" ) if !defined( $wordVtr );

undef( $interface );

CLSubtractTwoWordVectors

Description:

Command-line Method: Loads the specified word2vec trained binary data file, subtracts word vectors and returns the difference result.

Input:

$filePath  -> Word2Vec trained word vectors binary file path. (String)
$wordDataA -> Word2Vec word data (String)
$wordDataB -> Word2Vec word data (String)

Output:

$vectorData -> Difference of '$wordDataA' and '$wordDataB' vectors

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $wordVtr = $interface->CLSubtractTwoWordVectors( "../../samples/samplevectors.bin", "of", "the" );

print( "Word Vector for \"of\" - \"the\": $wordVtr\n" ) if defined( $wordVtr );
print( "Word Vector Cannot Be Computed\n" ) if !defined( $wordVtr );

undef( $interface );
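
Vector addition and subtraction are element-wise operations. A minimal plain-Perl sketch follows, with a hypothetical helper combine_vectors and made-up values; the string format in which the module returns its vectors is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: element-wise $vecA + $sign * $vecB.
# Pass 1 for addition and -1 for subtraction; undef when lengths differ.
sub combine_vectors
{
    my ( $vecA, $vecB, $sign ) = @_;
    return undef if scalar( @{ $vecA } ) != scalar( @{ $vecB } );
    return [ map { $vecA->[$_] + $sign * $vecB->[$_] } 0 .. $#{ $vecA } ];
}

my $sum  = combine_vectors( [ 1, 2, 3 ], [ 0.5, 0.5, 0.5 ],  1 );
my $diff = combine_vectors( [ 1, 2, 3 ], [ 0.5, 0.5, 0.5 ], -1 );

print( "Sum:        @{ $sum }\n" );    # Sum:        1.5 2.5 3.5
print( "Difference: @{ $diff }\n" );   # Difference: 0.5 1.5 2.5
```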

CLStartWord2VecTraining

Description:

Command-line Method: Executes word2vec training given the specified options hash.

Input:

$hashRef -> Hash reference of word2vec options

Output:

$value   -> Returns '0' = Successful / '-1' = Un-successful.

Example:

use Word2vec::Interface;

my %options;
$options{'-trainfile'} = "../../samples/textcorpus.txt";
$options{'-outputfile'} = "../../samples/tempvectors.bin";

my $interface = Word2vec::Interface->new();
my $result = $interface->CLStartWord2VecTraining( \%options );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

CLStartWord2PhraseTraining

Description:

Command-line Method: Executes word2phrase training given the specified options hash.

Input:

$hashRef -> Hash reference of word2phrase options.

Output:

$value   -> Returns '0' = Successful / '-1' = Un-successful.

Example:

use Word2vec::Interface;

my %options;
$options{'-trainfile'} = "../../samples/textcorpus.txt";
$options{'-outputfile'} = "../../samples/tempvectors.bin";

my $interface = Word2vec::Interface->new();
my $result = $interface->CLStartWord2PhraseTraining( \%options );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

CLCleanText

Description:

Command-line Method: Reads an input text file, normalizes based on the settings below and prints to a new file.
  - All Text Converted To Lowercase
  - Duplicate White Spaces Removed
  - "'s" (Apostrophe 's') Characters Removed
  - Hyphen "-" Replaced With Whitespace
  - All Characters Outside Of "a-z" and NewLine Characters Are Removed
  - Lastly, Whitespace Before And After Text Is Removed

Input:

$hashRef -> Hash reference of inputfile/outputfile options.

Output:

$value   -> Returns '0' = Successful / '-1' = Un-successful.

Example:

use Word2vec::Interface;

my %options;
$options{'-inputfile'} = "../../samples/test.txt";
$options{'-outputfile'} = "../../samples/clean_text.txt";

my $interface = Word2vec::Interface->new();
my $result = $interface->CLCleanText( \%options );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );
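
The normalization rules listed in the description can be sketched on a single line of text as shown below. The helper name clean_line, the order of the steps, and the decision to keep spaces alongside "a-z" are assumptions made for illustration; the module's own routine may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper applying the listed normalization rules to one line.
# Assumption: spaces are kept alongside "a-z" so that words stay separated.
sub clean_line
{
    my ( $line ) = @_;
    $line = lc( $line );            # All text converted to lowercase
    $line =~ s/'s//g;               # "'s" removed
    $line =~ s/-/ /g;               # Hyphen replaced with whitespace
    $line =~ s/[^a-z \n]//g;        # Everything outside a-z, space and newline removed
    $line =~ s/ +/ /g;              # Duplicate spaces collapsed
    $line =~ s/^ +| +$//g;          # Leading and trailing spaces removed
    return $line;
}

my $cleaned = clean_line( "  The Patient's Heart-Rate was  HIGH (120 bpm).  " );
print( "$cleaned\n" );   # the patient heart rate was high bpm
```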

CLCompileTextCorpus

Description:

Command-line Method: Compiles a text corpus given the specified options hash.

Input:

$hashRef -> Hash reference of xmltow2v options.

Output:

$value   -> Returns '0' = Successful / '-1' = Un-successful.

Example:

use Word2vec::Interface;

my %options;
$options{'-workdir'} = "../../samples";
$options{'-savedir'} = "../../samples/textcorpus.txt";

my $interface = Word2vec::Interface->new();
my $result = $interface->CLCompileTextCorpus( \%options );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

CLConvertWord2VecVectorFileToText

Description:

Command-line Method: Converts word2vec binary format to plain text word vector data.

Input:

$filePath -> Word2Vec binary file path
$savePath -> Path to save converted file

Output:

$value    -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->CLConvertWord2VecVectorFileToText( "../../samples/samplevectors.bin", "../../samples/convertedvectors.bin" );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

CLConvertWord2VecVectorFileToBinary

Description:

Command-line Method: Converts plain text word vector data to word2vec binary format.

Input:

$filePath -> Word2Vec plain text vector file path
$savePath -> Path to save converted file

Output:

$value    -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->CLConvertWord2VecVectorFileToBinary( "../../samples/samplevectors.bin", "../../samples/convertedvectors.bin" );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );
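
To make the two formats concrete, here is a small in-memory sketch of the conversion in both directions. It assumes the layout written by the original word2vec tool: a "count dimensions" header line, then each word followed by a space and either space-separated decimal values (text) or packed 32-bit floats (binary). Little-endian byte order and both helper names are assumptions; the module's file-based routines are not reproduced here.

```perl
use strict;
use warnings;

# Text layout:   "count dims\n", then "word v1 v2 ...\n" per row.
# Binary layout: same header, then "word " + packed 32-bit floats + "\n" per row.
sub text_to_binary
{
    my ( $text ) = @_;
    my ( $header, @rows ) = split( /\n/, $text );
    my $binary = "$header\n";
    for my $row ( @rows )
    {
        my ( $word, @values ) = split( ' ', $row );
        $binary .= "$word " . pack( "f<*", @values ) . "\n";
    }
    return $binary;
}

sub binary_to_text
{
    my ( $binary ) = @_;
    my ( $header, $rest ) = split( /\n/, $binary, 2 );
    my ( $count, $dims )  = split( ' ', $header );
    my $text = "$header\n";
    for ( 1 .. $count )
    {
        $rest =~ s/^\n//;                              # skip the row separator
        my ( $word, $tail ) = split( / /, $rest, 2 );  # word ends at the first space
        my @values = unpack( "f<$dims", $tail );       # read exactly $dims floats
        $text .= join( " ", $word, @values ) . "\n";
        $rest  = substr( $tail, 4 * $dims );           # 4 bytes per float
    }
    return $text;
}

my $text   = "2 2\nof 0.5 -0.25\nthe 1 2\n";
my $binary = text_to_binary( $text );
print( binary_to_text( $binary ) );
```

The sample values are exactly representable as 32-bit floats, so they survive the round trip unchanged; arbitrary decimals generally lose precision when stored as 32-bit floats.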

CLConvertWord2VecVectorFileToSparse

Description:

Command-line Method: Converts plain text word vector data to sparse vector data format.

Input:

$filePath -> Vectors file path
$savePath -> Path to save converted file

Output:

$value    -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->CLConvertWord2VecVectorFileToSparse( "../../samples/samplevectors.bin", "../../samples/convertedvectors.bin" );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

CLCompoundifyTextInFile

Description:

Command-line Method: Reads a specified plain text file at 'filePath' and 'compoundWordFile', then compoundifies and saves the file to 'savePath'.

Input:

$filePath         -> Text file to compoundify
$savePath         -> Path to save compoundified file
$compoundWordFile -> Compound word file path

Output:

$value            -> Result '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->CLCompoundifyTextInFile( "../../samples/textcorpus.txt", "../../samples/compoundcorpus.txt", "../../samples/compoundword.txt" );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );
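
The idea of compoundifying can be illustrated with a short plain-Perl sketch. It assumes that compound words are supplied as space-separated phrases and that the words of a matched phrase are joined with an underscore; both points, and the helper name compoundify, are assumptions rather than a description of the module's file format or algorithm.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces each known phrase in $text with its
# underscore-joined form. Longer phrases are handled first so that they
# take priority over shorter phrases they contain.
sub compoundify
{
    my ( $text, @compounds ) = @_;
    for my $phrase ( sort { length( $b ) <=> length( $a ) } @compounds )
    {
        ( my $joined = $phrase ) =~ s/ /_/g;
        $text =~ s/\b\Q$phrase\E\b/$joined/g;
    }
    return $text;
}

my $corpus = compoundify( "patient had a heart attack after respiratory arrest",
                          "heart attack", "respiratory arrest" );
print( "$corpus\n" );   # patient had a heart_attack after respiratory_arrest
```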

CLSortVectorFile

Description:

Reads a specified vector file in memory, sorts alphanumerically and saves to a file.

Input:

$hashRef -> Hash reference of parameters. (File path and overwrite parameters)

Output:

$value   -> Result '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my %options;
$options{ "-filepath" }  = "vectors.bin";
$options{ "-overwrite" } = 1;

my $result = $interface->CLSortVectorFile( \%options );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );
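
The sorting step itself can be sketched in memory as follows, assuming plain text vector data whose first line is a "count dimensions" header that must stay in place. The helper name sort_vector_lines and the sample rows are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the header line first and orders the remaining
# rows by their leading word (string comparison).
sub sort_vector_lines
{
    my ( $header, @rows ) = @_;
    my @sorted = sort { ( split( ' ', $a ) )[0] cmp ( split( ' ', $b ) )[0] } @rows;
    return ( $header, @sorted );
}

my @lines  = ( "3 2", "the 0.1 0.2", "of 0.3 0.4", "and 0.5 0.6" );
my @result = sort_vector_lines( @lines );
print( "$_\n" ) for @result;
```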

CLFindSimilarTerms

Description:

Fetches an array containing the nearest n terms using cosine similarity as the metric of determining similar terms.

Input:

$term                  -> Comparison term used to find similar terms.
$numberOfSimilarTerms  -> Integer value used to limit the number of elements in array returned.

Output:

$value                 -> 'Array reference' = Successful / 'undef' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->W2VReadTrainedVectorDataFromFile( "vectors.bin" );
my $terms  = ( $result == 0 ) ? $interface->CLFindSimilarTerms( "cookie", 10 ) : undef;

print "Success\n"                     if  defined( $terms );
print "Error: No Elements Returned\n" if !defined( $terms );
exit if !defined( $terms );

for my $element ( @{ $terms } )
{
   print "$element\n";
}

undef( $interface );
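
A nearest-terms lookup of this kind can be sketched in plain Perl by scoring every other term against the query term and keeping the top n. The %vectors hash, the helper names, and the choice to exclude the query term from the results are assumptions made for illustration.

```perl
use strict;
use warnings;

# Toy vectors standing in for trained word2vec data (hypothetical values).
my %vectors = (
    cookie  => [ 1.0, 0.9, 0.0 ],
    biscuit => [ 0.9, 1.0, 0.1 ],
    cake    => [ 0.8, 0.6, 0.3 ],
    engine  => [ 0.0, 0.1, 1.0 ],
);

# Cosine similarity of two equal-length vectors; undef for zero-magnitude vectors.
sub cosine_similarity
{
    my ( $vecA, $vecB ) = @_;
    my ( $dot, $normA, $normB ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{ $vecA } )
    {
        $dot   += $vecA->[$i] * $vecB->[$i];
        $normA += $vecA->[$i] ** 2;
        $normB += $vecB->[$i] ** 2;
    }
    return undef if $normA == 0 || $normB == 0;
    return $dot / sqrt( $normA * $normB );
}

# Returns an array reference of up to $count terms ranked by cosine similarity
# to $term, or undef when $term has no vector. The query term is excluded.
sub find_similar_terms
{
    my ( $term, $count ) = @_;
    my $target = $vectors{ $term };
    return undef if !defined( $target );

    my %score;
    for my $other ( grep { $_ ne $term } keys( %vectors ) )
    {
        my $value = cosine_similarity( $target, $vectors{ $other } );
        $score{ $other } = $value if defined( $value );
    }

    my @ranked = sort { $score{ $b } <=> $score{ $a } || $a cmp $b } keys( %score );
    $count = scalar( @ranked ) if $count > scalar( @ranked );
    return [ @ranked[ 0 .. $count - 1 ] ];
}

my $terms = find_similar_terms( "cookie", 2 );
print( "$_\n" ) for @{ $terms };
```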

CleanWord2VecDirectory

Description:

Cleans up C object and executable files in word2vec directory.

Input:

None

Output:

$value            -> Result '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->CleanWord2VecDirectory();

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

CLSimilarityAvg

Description:

Computes cosine similarity of average values for a list of specified word comparisons given a file.

Note: Trained vector data must be loaded into memory before calling this method.

Input:

$filePath         -> Text file with list of word comparisons by line.

Output:

$value            -> Result '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->W2VReadTrainedVectorDataFromFile( "vectors.bin" );
$result = $interface->CLSimilarityAvg( "MiniMayoSRS.terms" ) if $result == 0;

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

CLSimilarityComp

Description:

Computes cosine similarity values for a list of specified compound word comparisons given a file.

Note: Trained vector data must be loaded into memory before calling this method.

Input:

$filePath         -> Text file with list of word comparisons by line.

Output:

$value            -> Result '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->W2VReadTrainedVectorDataFromFile( "vectors.bin" );
$result = $interface->CLSimilarityComp( "MiniMayoSRS.terms" ) if $result == 0;

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

CLSimilaritySum

Description:

Computes cosine similarity of summed values for a list of specified word comparisons given a file.

Note: Trained vector data must be loaded into memory before calling this method.

Input:

$filePath         -> Text file with list of word comparisons by line.

Output:

$value            -> Result '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->W2VReadTrainedVectorDataFromFile( "vectors.bin" );
$result = $interface->CLSimilaritySum( "MiniMayoSRS.terms" ) if $result == 0;

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

CLWordSenseDisambiguation

Description:

Command-line Method: Assigns a particular sense to each instance using word2vec trained word vector data.
Stop words are removed if a stoplist is specified before computing cosine similarity average of each instance
and sense context.

Input:

$instanceFilePath -> WSD instance file path
$senseFilePath    -> WSD sense file path
$vectorBinaryFile -> Word2vec trained word vector data file path
$stopListFilePath -> Stop list file path

Output:

$value            -> Returns '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->CLWordSenseDisambiguation( "ACE.instances.sval", "ACE.senses.sval", "vectors.bin", "stoplist" );

print( "Success!\n" ) if $result == 0;
print( "Failed!\n" ) if $result == -1;

undef( $interface );

_WSDAnalyzeSenseData

Description:

Analyzes sense sval files for identification number mismatch and adjusts accordingly in memory.

Input:

None

Output:

None

Example:

This is a private function and should not be utilized.

_WSDReadList

Description:

Reads a WSD list when the '-list' parameter is specified.

Input:

$listPath    -> WSD list file path

Output:

\%listOfFile -> List of files hash reference

Example:

This is a private function and should not be utilized.

_WSDParseList

Description:

Parses the specified list of files for Word Sense Disambiguation computation.

Input:

$listOfFilesHashRef -> Hash reference to a hash of file paths
$vectorBinaryFile   -> Word2vec trained word vector data file
$stopListFilePath   -> Stop list file path

Output:

$value              -> '0' = Successful / '-1' = Un-successful

Example:

This is a private function and should not be utilized.

WSDParseFile

Description:

Parses a specified file in sval format and stores all context in memory. Utilized for
Word Sense Disambiguation cosine similarity computation.

Input:

$filePath       -> WSD instance or sense file path
$stopListRegex  -> Stop list regex ( Automatically generated with stop list file )

Output:

$arrayReference -> Array reference of WSD instances or WSD senses in memory.

Example:

This is a private function and should not be utilized.

WSDCalculateCosineAvgSimiliarity

Description:

For each instance stored in memory, this method computes an average cosine similarity for the context
of each instance and sense with stop words removed via stop list regex. After average cosine similarity
values are calculated for each instance and sense, the cosine similarity of each instance and sense is
computed. The highest cosine similarity value of a given instance to a particular sense is assigned and
stored.

Input:

None

Output:

$value -> Returns '0' = Successful / '-1' = Un-successful

Example:

This is a private function and should not be utilized.
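
The assignment step described above can be sketched in plain Perl: average the context vectors of the instance and of each sense, then pick the sense with the highest cosine similarity. All names and values below are hypothetical, words without a vector are simply skipped (an assumption), and stop word removal is left out for brevity.

```perl
use strict;
use warnings;

# Toy vectors standing in for trained word2vec data (hypothetical values).
my %vectors = (
    patient     => [ 1.0, 0.2 ],
    fever       => [ 1.0, 0.0 ],
    cough       => [ 0.9, 0.1 ],
    illness     => [ 1.0, 0.1 ],
    infection   => [ 0.9, 0.0 ],
    temperature => [ 0.1, 1.0 ],
    ice         => [ 0.0, 1.0 ],
    winter      => [ 0.2, 0.9 ],
);

# Cosine similarity of two equal-length vectors; undef for zero-magnitude vectors.
sub cosine_similarity
{
    my ( $vecA, $vecB ) = @_;
    my ( $dot, $normA, $normB ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{ $vecA } )
    {
        $dot   += $vecA->[$i] * $vecB->[$i];
        $normA += $vecA->[$i] ** 2;
        $normB += $vecB->[$i] ** 2;
    }
    return undef if $normA == 0 || $normB == 0;
    return $dot / sqrt( $normA * $normB );
}

# Element-wise mean of the vectors of all known words; undef if none are known.
sub average_vector
{
    my ( @words ) = @_;
    my ( @sum, $count );
    for my $word ( @words )
    {
        my $vec = $vectors{ $word };
        next if !defined( $vec );
        $sum[$_] += $vec->[$_] for 0 .. $#{ $vec };
        $count++;
    }
    return undef if !$count;
    return [ map { $_ / $count } @sum ];
}

# Returns ( senseId, score ) for the sense whose averaged context is closest
# to the averaged instance context, or an empty list on failure.
sub assign_sense
{
    my ( $instanceContext, $senses ) = @_;
    my $instanceVec = average_vector( split( ' ', $instanceContext ) );
    return if !defined( $instanceVec );

    my ( $bestId, $bestScore );
    for my $senseId ( sort( keys( %{ $senses } ) ) )
    {
        my $senseVec = average_vector( split( ' ', $senses->{ $senseId } ) );
        next if !defined( $senseVec );
        my $score = cosine_similarity( $instanceVec, $senseVec );
        next if !defined( $score );
        ( $bestId, $bestScore ) = ( $senseId, $score ) if !defined( $bestScore ) || $score > $bestScore;
    }
    return ( $bestId, $bestScore );
}

my %senses = ( S1 => "illness infection fever", S2 => "temperature ice winter" );
my ( $senseId, $score ) = assign_sense( "patient fever cough", \%senses );
print( "Assigned Sense: $senseId (cosine similarity $score)\n" ) if defined( $senseId );
```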

_WSDCalculateAccuracy

Description:

Computes accuracy of assigned sense identification for each instance in memory.

Input:

None

Output:

$value -> Returns accuracy percentage (float) or '-1' if un-successful.

Example:

This is a private function and should not be utilized.
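
As a rough sketch, accuracy here can be read as the percentage of instances whose calculated sense ID matches the assigned one. The record layout and helper name below are hypothetical and only illustrate the arithmetic.

```perl
use strict;
use warnings;

# Hypothetical record layout: each instance stores the sense ID given in the
# data ("assigned") and the sense ID chosen by the similarity step ("calculated").
sub calculate_accuracy
{
    my ( $instances ) = @_;
    my $total = scalar( @{ $instances } );
    return -1 if $total == 0;

    my $correct = grep { $_->{ assigned } eq $_->{ calculated } } @{ $instances };
    return 100 * $correct / $total;
}

my @instances = (
    { id => "I1", assigned => "S1", calculated => "S1" },
    { id => "I2", assigned => "S2", calculated => "S2" },
    { id => "I3", assigned => "S1", calculated => "S2" },
    { id => "I4", assigned => "S2", calculated => "S2" },
);

my $accuracy = calculate_accuracy( \@instances );
print( "Accuracy: $accuracy%\n" );   # Accuracy: 75%
```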

WSDPrintResults

Description:

For each instance, this method prints standard information to the console window consisting of:
 - InstanceID
 - Assigned SenseID
 - Calculated SenseID
 - Cosine Similarity Value

Note: Only prints to console if the '--debuglog' or '--writelog' option is passed.

Input:

None

Output:

None

Example:

This is a private function and should not be utilized.

WSDSaveResults

Description:

Saves WSD results post sense identification assignment in the 'instanceFilePath' (string) location. Saved data consists of:
 - InstanceID
 - Assigned SenseID
 - Calculated SenseID
 - Cosine Similarity Value

Input:

$instanceFilePath -> WSD instance file path

Output:

None

Example:

This is a private function and should not be utilized.

_WSDGenerateAccuracyReport

Description:

Fetches saved results for all instance files and stores accuracies for each in a text file.

Input:

$workingDirectory -> Directory of "*.results.txt" files

Output:

None

Example:

This is a private function and should not be utilized.

_WSDStop

Description:

Generates and returns a stop list regex given a 'stopListFilePath' (string). Returns undefined in the event of an error.

Input:

$stopListFilePath -> WSD Stop list file path

Output:

$stopListRegex    -> Returns stop list regex of the WSD stop list file.

Example:

This is a private function and should not be utilized.
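
One way to build such a regex is sketched below, assuming the stop list contains one plain word per entry. The real stop list file format and the module's generated regex may differ, and the helper name build_stop_regex is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one case-sensitive regex matching any stop word
# as a whole word. Returns undef when no usable words are supplied.
sub build_stop_regex
{
    my ( @words ) = @_;
    my @clean;
    for my $word ( @words )
    {
        $word =~ s/^\s+|\s+$//g;
        push( @clean, quotemeta( lc( $word ) ) ) if length( $word );
    }
    return undef if !@clean;

    my $alternation = join( "|", @clean );
    return qr/\b(?:$alternation)\b/;
}

my $stopRegex = build_stop_regex( "the", "of", "and" );
my $context   = "the onset of fever and cough";

$context =~ s/$stopRegex//g;   # drop stop words
$context =~ s/\s+/ /g;         # collapse the gaps they leave behind
$context =~ s/^ | $//g;        # trim the ends

print( "$context\n" );   # onset fever cough
```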

ConvertStringLineEndingsToTargetOS

Description:

Converts the passed string parameter to the current OS line ending format,

i.e. DOS/Windows to Unix/Linux or Unix/Linux to DOS/Windows.

Warning: The legacy MacOS line ending format is not supported; errors may occur if it is used.

Input:

$string -> String to convert

Output:

$string -> Output data with target OS line endings.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my $tempStr = "sample text\r\n";
$tempStr = $interface->ConvertStringLineEndingsToTargetOS( $tempStr );

undef( $interface );
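
The conversion described above can be approximated in plain Perl. This is a minimal standalone sketch, not the module's implementation: the helper name convert_line_endings and its optional $os argument are hypothetical, and it assumes only DOS/Windows (CRLF) and Unix/Linux (LF) endings occur in the input.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Interface): converts CRLF and LF
# line endings to the ending used by the given OS name (defaults to $^O).
sub convert_line_endings
{
    my ( $string, $os ) = @_;
    $os = $^O if !defined( $os );

    # Collapse DOS/Windows endings to a bare line feed first.
    $string =~ s/\r\n/\n/g;

    # Expand to CRLF only when the target is Windows.
    $string =~ s/\n/\r\n/g if $os eq "MSWin32";

    return $string;
}

my $unix = convert_line_endings( "sample text\r\n", "linux" );
my $dos  = convert_line_endings( "sample text\n", "MSWin32" );
```

Normalizing to LF before expanding keeps the function idempotent: input that already uses the target ending passes through unchanged.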

Interface Accessor Functions

GetWord2VecDir

Description:

Returns word2vec executable/source directory.

Input:

None

Output:

$string -> Word2vec directory path

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $filePath = $interface->GetWord2VecDir();

print( "FilePath: $filePath\n" );

undef( $interface );

GetDebugLog

Description:

Returns the _debugLog member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$value -> 0 = False, 1 = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $debugLog = $interface->GetDebugLog();

print( "Debug Logging Enabled\n" ) if $debugLog == 1;
print( "Debug Logging Disabled\n" ) if $debugLog == 0;

undef( $interface );

GetWriteLog

Description:

Returns the _writeLog member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$value -> 0 = False, 1 = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $writeLog = $interface->GetWriteLog();

print( "Write Logging Enabled\n" ) if $writeLog == 1;
print( "Write Logging Disabled\n" ) if $writeLog == 0;

undef( $interface );

GetIgnoreCompileErrors

Description:

Returns the _ignoreCompileErrors member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$value -> 0 = False, 1 = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $ignoreCompileErrors = $interface->GetIgnoreCompileErrors();

print( "Ignore Compile Errors Enabled\n" ) if $ignoreCompileErrors == 1;
print( "Ignore Compile Errors Disabled\n" ) if $ignoreCompileErrors == 0;

undef( $interface );

GetIgnoreFileChecks

Description:

Returns the _ignoreFileChecks member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$value -> 0 = False, 1 = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $ignoreFileChecks = $interface->GetIgnoreFileChecks();

print( "Ignore File Checks Enabled\n" ) if $ignoreFileChecks == 1;
print( "Ignore File Checks Disabled\n" ) if $ignoreFileChecks == 0;

undef( $interface );

GetExitFlag

Description:

Returns the _exitFlag member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$value -> 0 = False, 1 = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $exitFlag = $interface->GetExitFlag();

print( "Exit Flag Set\n" ) if $exitFlag == 1;
print( "Exit Flag Not Set\n" ) if $exitFlag == 0;

undef( $interface );

GetFileHandle

Description:

Returns file handle used by WriteLog() method.

Input:

None

Output:

$fileHandle -> Returns file handle (glob) used by 'WriteLog()' function, or undefined.

Example:

This is a private function and should not be utilized.

GetWorkingDirectory

Description:

Returns the _workingDir member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$string -> Returns working directory

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $dir = $interface->GetWorkingDirectory();

print( "Working Directory: $dir\n" );

undef( $interface );

GetLeskHandler

Description:

Returns the _lesk member variable set during Word2vec::Lesk object initialization of new function.

Note: This returns a new object if not defined with lesk::_debugLog and lesk::_writeLog parameters mirroring interface::_debugLog and interface::_writeLog.

Input:

None

Output:

Word2vec::Lesk -> Returns 'Word2vec::Lesk' object.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $lesk = $interface->GetLeskHandler();

undef( $lesk );
undef( $interface );

GetSpearmansHandler

Description:

Returns the _spearmans member variable set during Word2vec::Spearmans object initialization of new function.

Note: This returns a new object if not defined with spearmans::_debugLog and spearmans::_writeLog parameters mirroring interface::_debugLog and interface::_writeLog.

Input:

None

Output:

Word2vec::Spearmans -> Returns 'Word2vec::Spearmans' object.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $spearmans = $interface->GetSpearmansHandler();

undef( $spearmans );
undef( $interface );

GetWord2VecHandler

Description:

Returns the _word2vec member variable set during Word2vec::Word2vec object initialization of new function.

Note: This returns a new object if not defined with word2vec::_debugLog and word2vec::_writeLog parameters mirroring interface::_debugLog and interface::_writeLog.

Input:

None

Output:

Word2vec::Word2vec -> Returns 'Word2vec::Word2vec' object.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $word2vec = $interface->GetWord2VecHandler();

undef( $word2vec );
undef( $interface );

GetWord2PhraseHandler

Description:

Returns the _word2phrase member variable set during Word2vec::Word2phrase object initialization of new function.

Note: This returns a new object if not defined with word2phrase::_debugLog and word2phrase::_writeLog parameters mirroring interface::_debugLog and interface::_writeLog.

Input:

None

Output:

Word2vec::Word2phrase -> Returns 'Word2vec::Word2phrase' object

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $word2phrase = $interface->GetWord2PhraseHandler();

undef( $word2phrase );
undef( $interface );

GetXMLToW2VHandler

Description:

Returns the _xmltow2v member variable set during Word2vec::Xmltow2v object initialization of new function.

Note: This returns a new object if not defined with xmltow2v::_debugLog and xmltow2v::_writeLog parameters mirroring interface::_debugLog and interface::_writeLog.

Input:

None

Output:

Word2vec::Xmltow2v -> Returns 'Word2vec::Xmltow2v' object

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $xmltow2v = $interface->GetXMLToW2VHandler();

undef( $xmltow2v );
undef( $interface );

GetInstanceAry

Description:

Returns the _instanceAry member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$instance -> Returns array reference of WSD instances.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $aryRef = $interface->GetInstanceAry();

my @instanceAry = @{ $aryRef };
undef( $interface );

GetSensesAry

Description:

Returns the _senseAry member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$senses -> Returns array reference of WSD senses.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $aryRef = $interface->GetSensesAry();

my @sensesAry = @{ $aryRef };
undef( $interface );

GetInstanceCount

Description:

Returns the _instanceCount member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$value -> Returns number of stored WSD instances.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $count = $interface->GetInstanceCount();

print( "Stored WSD instances in memory: $count\n" );

undef( $interface );

GetSenseCount

Description:

Returns the _sensesCount member variable set during Word2vec::Interface object initialization in the new() function.

Input:

None

Output:

$value -> Returns number of stored WSD senses.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $count = $interface->GetSensesCount();

print( "Stored WSD senses in memory: $count\n" );

undef( $interface );

Interface Mutator Functions

SetWord2VecDir

Description:

Sets word2vec executable/source file directory.

Input:

$string -> Word2Vec Directory

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SetWord2VecDir( "/word2vec" );

undef( $interface );

SetDebugLog

Description:

Instructs module to print debug statements to the console.

Input:

$value -> '1' = Print Debug Statements / '0' = Do Not Print Statements

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SetDebugLog( 1 );

undef( $interface );

SetWriteLog

Description:

Instructs module to write a log file.

Input:

$value -> '1' = Write Log File / '0' = Do Not Write Log File

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SetWriteLog( 1 );

undef( $interface );

SetIgnoreCompileErrors

Description:

Instructs module to ignore compile errors when compiling source files.

Input:

$value -> '1' = Ignore warnings/errors, '0' = Display and process warnings/errors.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SetIgnoreCompileErrors( 1 );

undef( $interface );

SetIgnoreFileCheckErrors

Description:

Instructs module to ignore file checking errors.

Input:

$value -> '1' = Ignore warnings/errors, '0' = Display and process warnings/errors.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SetIgnoreFileCheckErrors( 1 );

undef( $interface );

SetWorkingDirectory

Description:

Sets current working directory.

Input:

$path -> Working directory path (String)

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SetWorkingDirectory( "my/new/working/directory" );

undef( $interface );

SetInstanceAry

Description:

Sets the member instance array variable to the de-referenced array passed by reference.

Input:

$arrayReference -> Array reference for Word Sense Disambiguation - Array of instances (Word2vec::Wsddata objects).

Output:

None

Example:

use Word2vec::Interface;

# This array would theoretically contain 'Word2vec::Wsddata' objects.
my @instanceAry = ();

my $interface = Word2vec::Interface->new();
$interface->SetInstanceAry( \@instanceAry );

undef( $interface );

ClearInstanceAry

Description:

Clears member instance array.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->ClearInstanceAry();

undef( $interface );

SetSenseAry

Description:

Sets the member sense array variable to the de-referenced array passed by reference.

Input:

$arrayReference -> Array reference for Word Sense Disambiguation - Array of senses (Word2vec::Wsddata objects).

Output:

None

Example:

use Word2vec::Interface;

# This array would theoretically contain 'Word2vec::Wsddata' objects.
my @senseAry = ();

my $interface = Word2vec::Interface->new();
$interface->SetSenseAry( \@senseAry );

undef( $interface );

ClearSenseAry

Description:

Clears member sense array.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->ClearSenseAry();

undef( $interface );

SetInstanceCount

Description:

Sets member instance count variable to passed value (integer).

Input:

$value -> Integer (Positive)

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SetInstanceCount( 12 );

undef( $interface );

SetSenseCount

Description:

Sets member sense count variable to passed value (integer).

Input:

$value -> Integer (Positive)

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SetSenseCount( 12 );

undef( $interface );

Debug Functions

GetTime

Description:

Returns current time string in "Hour:Minute:Second" format.

Input:

None

Output:

$string -> XX:XX:XX ("Hour:Minute:Second")

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $time = $interface->GetTime();

print( "Current Time: $time\n" ) if defined( $time );

undef( $interface );

GetDate

Description:

Returns current month, day and year string in "Month/Day/Year" format.

Input:

None

Output:

$string -> XX/XX/XXXX ("Month/Day/Year")

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $date = $interface->GetDate();

print( "Current Date: $date\n" ) if defined( $date );

undef( $interface );

WriteLog

Description:

Prints passed string parameter to the console, log file or both depending on user options.

Note: A new line character is printed after the string unless the printNewLine parameter is 0.
An undefined printNewLine parameter also prints the new line character.

Input:

$string -> String to print to the console/log file.
$value  -> 0 = Do not print newline character after string, all else prints new line character including 'undef'.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->WriteLog( "Hello World" );

undef( $interface );
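
The new line rule described above can be illustrated without the module. This is a minimal sketch under stated assumptions: format_log_entry is a hypothetical helper that only builds the text to emit, and it treats nothing but an explicit 0 as a request to suppress the new line.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): returns the text that would be
# emitted for a given string and printNewLine value.
sub format_log_entry
{
    my ( $string, $print_new_line ) = @_;

    # Only an explicit 0 suppresses the new line; undef and other values keep it.
    my $suppress = defined( $print_new_line ) && $print_new_line eq "0";

    return $suppress ? $string : $string . "\n";
}

print format_log_entry( "Hello ", 0 );
print format_log_entry( "World" );
```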

Lesk Main Functions

GetPhraseOverlapBetweenStrings

Description:

Given two strings, this returns a hash of all overlapping (matching) phrases between both strings and their frequency counts. Longer phrases take priority when matching.

Input:

$string_a -> First comparison string
$string_b -> Second comparison string

Output:

$hash_ref -> Returns a hash table reference with keys being the unique matching phrase between two input string parameters and the value as the frequency count of each unique phrase.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my %phrase_overlaps = %{ $interface->GetPhraseOverlapBetweenStrings( "I like to eat cookies", "Sometimes I like to eat cookies" ) };

for my $phrase ( sort keys %phrase_overlaps )
{
   print "$phrase : $phrase_overlaps{ $phrase }\n";
}

undef( %phrase_overlaps );
undef( $interface );
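
One way to picture longest-phrase-first matching is the greedy sketch below. It is an illustration only, not the module's algorithm: phrase_overlaps is a hypothetical function that assumes words are separated by white space, compares them case-sensitively, and lets each word take part in at most one match.

```perl
use strict;
use warnings;

# Hypothetical sketch: repeatedly takes the longest run of consecutive words
# shared by both strings, counts it, and removes it from further matching.
sub phrase_overlaps
{
    my ( $string_a, $string_b ) = @_;
    my @words_a = split( ' ', $string_a );
    my @words_b = split( ' ', $string_b );
    my %overlaps;

    while( 1 )
    {
        my ( $best_len, $best_i, $best_j ) = ( 0, 0, 0 );

        for my $i ( 0 .. $#words_a )
        {
            for my $j ( 0 .. $#words_b )
            {
                # Extend the match while both positions hold the same unused word.
                my $len = 0;
                $len++ while( $i + $len <= $#words_a && $j + $len <= $#words_b
                              && defined( $words_a[ $i + $len ] )
                              && defined( $words_b[ $j + $len ] )
                              && $words_a[ $i + $len ] eq $words_b[ $j + $len ] );

                ( $best_len, $best_i, $best_j ) = ( $len, $i, $j ) if $len > $best_len;
            }
        }

        last if $best_len == 0;

        my $phrase = join( ' ', @words_a[ $best_i .. $best_i + $best_len - 1 ] );
        $overlaps{ $phrase }++;

        # Mark matched words as used so they cannot be matched again.
        $words_a[ $_ ] = undef for $best_i .. $best_i + $best_len - 1;
        $words_b[ $_ ] = undef for $best_j .. $best_j + $best_len - 1;
    }

    return \%overlaps;
}

my %phrase_overlaps = %{ phrase_overlaps( "the cat sat on the mat", "the mat and the cat" ) };

for my $phrase ( sort keys %phrase_overlaps )
{
    print "$phrase : $phrase_overlaps{ $phrase }\n";
}
```

Taking the longest run first means a shared five-word phrase is reported once as a phrase rather than as five separate single-word matches.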

CalculateLeskScore

Description:

Given two strings, this returns a lesk score based on overlapping (matching) features between both strings.

Input:

$string_a -> First comparison string
$string_b -> Second comparison string

Output:

$score    -> Lesk Score (Float)

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my $lesk_score = $interface->CalculateLeskScore( "I like to eat cookies", "Sometimes I like to eat cookies" );

print "Lesk Score: $lesk_score\n";

undef( $interface );

CalculateLeskCosineScore

Description:

Given two strings, this returns a cosine score based on overlapping (matching) features between both strings.

Input:

$string_a -> First comparison string
$string_b -> Second comparison string

Output:

$score    -> Cosine Score (Float)

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my $cosine_score = $interface->CalculateLeskCosineScore( "I like to eat cookies", "Sometimes I like to eat cookies" );

print "Cosine Score: $cosine_score\n";

undef( $interface );

CalculateLeskFScore

Description:

Given two strings, this returns an F score based on overlapping (matching) features between both strings.

Input:

$string_a -> First comparison string
$string_b -> Second comparison string

Output:

$score    -> F Score (Float)

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my $f_score = $interface->CalculateLeskFScore( "I like to eat cookies", "Sometimes I like to eat cookies" );

print "F Score: $f_score\n";

undef( $interface );

CalculateAllLeskScores

Description:

Given two strings, this returns a hash reference of scores (F, Cosine, Lesk, Raw Lesk, Precision, Recall) and frequency counts (features, phrases, string lengths).

Input:

$string_a    -> First comparison string
$string_b    -> Second comparison string

Output:

$result_hash -> Hash reference containing: Lesk, Raw Lesk, F, Precision, Recall, Cosine, Matching Feature Frequency, Matching Phrase Frequency, String A Length and String B Length.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my %scores = %{ $interface->CalculateAllLeskScores( "I like to eat cookies", "Sometimes I like to eat cookies" ) };

for my $score_name ( sort keys %scores )
{
   print "$score_name : $scores{ $score_name }\n";
}

undef( $interface );

Util Main Functions

CleanText

Description:

Normalizes text as follows:
 - Text is converted to lowercase
 - Runs of more than one white space are replaced with a single white space
 - Apostrophe "s" ('s) characters are removed
 - Hyphen characters are replaced with a single white space
 - All characters other than lowercase 'a-z' are removed, except that compoundified terms joined by '_' (underscore) are retained
 - Line-feed/carriage return (LF-CR) endings are cleaned and converted to OS specific LF-CR endings

Input:

$string -> String of text to normalize

Output:

$string -> Cleaned/Normalized text.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $text = "123485clean-text!!@&^#*@";

print( "Original Text: \"$text\"\n" );

$text = $interface->CleanText( $text );

print( "Cleaned Text: \"$text\"\n" );

undef( $interface );
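
The normalization rules listed in the description can be approximated with a few substitutions. This is a rough standalone sketch rather than the module's code: clean_text_sketch is a hypothetical name, the order of the steps and the trimming of leading and trailing blanks are assumptions, and the line ending conversion step is left out.

```perl
use strict;
use warnings;

# Hypothetical sketch of the normalization rules for a single line of text.
sub clean_text_sketch
{
    my ( $text ) = @_;

    $text = lc( $text );          # Convert to lowercase
    $text =~ s/'s\b//g;           # Remove apostrophe "s"
    $text =~ s/-/ /g;             # Replace hyphens with a white space
    $text =~ s/[^a-z_ \t]//g;     # Drop everything except a-z, '_' and blanks
    $text =~ s/[ \t]+/ /g;        # Collapse runs of white space
    $text =~ s/\A //;             # Trim leading blank (assumption)
    $text =~ s/ \z//;             # Trim trailing blank (assumption)

    return $text;
}

# Single quotes keep Perl from interpreting '@' and other symbols in the sample.
my $cleaned = clean_text_sketch( '123485clean-text!!@&^#*@' );
print( "Cleaned Text: \"$cleaned\"\n" );
```

White space is collapsed after the hyphen replacement so that blanks introduced by that step are also reduced to a single space.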

RemoveNewLineEndingsFromString

Description:

Removes new line endings from string. Supports MSWin32, linux and MacOS line endings.

Input:

$string -> String with line-feed/carriage return ending to remove.

Output:

$string -> String without line-feed/carriage return ending.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $text = "this is sample text\n";

print( "Original Text: \"$text\"\n" );

$text = $interface->RemoveNewLineEndingsFromString( $text );

print( "Cleaned Text: \"$text\"\n" );

undef( $interface );
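
A single substitution covers the three line ending styles mentioned above. This is a minimal sketch, assuming that only one trailing line ending should be removed; remove_line_ending is a hypothetical name and not the module's function.

```perl
use strict;
use warnings;

# Hypothetical helper: strips one trailing CRLF (Windows), LF (Unix/Linux)
# or CR (legacy MacOS) from the end of the string.
sub remove_line_ending
{
    my ( $string ) = @_;
    $string =~ s/(?:\r\n|\n|\r)\z//;
    return $string;
}

my $text = remove_line_ending( "this is sample text\r\n" );
print( "Cleaned Text: \"$text\"\n" );
```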

IsFileOrDirectory

Description:

Given a path, returns a string specifying whether this path represents a file or directory.

Input:

$path   -> String representing path to check

Output:

$string -> Returns "file", "dir" or "unknown".

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my $result = $interface->IsFileOrDirectory( "../samples/stoplist" );

print( "Path Type Is A File\n" ) if $result eq "file";
print( "Path Type Is A Directory\n" ) if $result eq "dir";
print( "Path Type Is Unknown\n" ) if $result eq "unknown";

undef( $interface );
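
The same three-way result can be produced with Perl's built-in file test operators. This is a minimal sketch, not the module's implementation; path_type is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: classifies a path using the -f and -d file tests.
sub path_type
{
    my ( $path ) = @_;

    return "unknown" if !defined( $path );
    return "file"    if -f $path;
    return "dir"     if -d $path;
    return "unknown";
}

# "." is the current directory, so this reports "dir".
print( "Path Type: " . path_type( "." ) . "\n" );
```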

IsWordOrCUITerm

Description:

Determines if string parameter is a 'word' or 'cui'.

Input:

$string -> String with single term/cui to examine.

Output:

$string -> Returns "word" or "cui".

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my $result = $interface->IsWordOrCUITerm( "c12345678" );

print( "String Is Word\n" )  if $result eq "word";
print( "String Is A CUI\n" ) if $result eq "cui";

undef( $interface );

GetFilesInDirectory

Description:

Given a path and file tag string, returns a string of the names of files in the specified path whose names contain the file tag string.

Input:

$path    -> String representing path
$fileTag -> String consisting of file tag to fetch.

Output:

$string  -> Returns string of file names containing $fileTag.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

# Looks in specified path for files including ".sval" in their file name.
my $result = $interface->GetFilesInDirectory( "../samples/", ".sval" );

print( "Found File Name(s): $result\n" ) if defined( $result );

undef( $interface );
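
For readers who want the same lookup without the module, the sketch below filters a directory listing by substring. It is an illustration under assumptions: files_with_tag is a hypothetical helper, the names are sorted and joined with a single space (the module's separator is not documented here), and a temporary directory stands in for real sample data.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Hypothetical helper: returns the names of regular files in $path whose
# names contain $tag, sorted and joined by a space.
sub files_with_tag
{
    my ( $path, $tag ) = @_;

    opendir( my $dh, $path ) or return;
    my @names = sort grep { index( $_, $tag ) >= 0 && -f "$path/$_" } readdir( $dh );
    closedir( $dh );

    return join( " ", @names );
}

# Build a throw-away directory with two matching files and one non-matching file.
my $dir = tempdir( CLEANUP => 1 );
for my $name ( "a.sval", "b.sval", "notes.txt" )
{
    open( my $fh, ">", "$dir/$name" ) or die "Cannot create $dir/$name: $!";
    close( $fh );
}

my $found = files_with_tag( $dir, ".sval" );
print( "Found File Name(s): $found\n" ) if defined( $found );
```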

Spearmans Main Functions

SpCalculateSpearmans

Description:

Calculates Spearman's Rank Correlation Score between two data-sets.

Input:

$fileA                  -> Data set to compare
$fileB                  -> Data set to compare
$includeCountsInResults -> Specifies whether to return file counts in score. (undef = False / defined = True)

Output:

$value -> "undef" or Spearman's Rank Correlation Score

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $score     = $interface->SpCalculateSpearmans( "samples/MiniMayoSRS.term.comp_results", "Similarity/MiniMayoSRS.terms.coders", undef );
print "Spearman's Rank Correlation Score: $score\n" if defined( $score );
print "Spearman's Rank Correlation Score: undef\n" if !defined( $score );

undef( $interface );
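
Spearman's rank correlation itself is the Pearson correlation of the ranks of the two data sets. The sketch below computes it for two in-memory lists of scores; it is a standalone illustration, not the module's code, and the function names ranks and spearman, the array reference inputs and the average-rank handling of ties are assumptions.

```perl
use strict;
use warnings;

# Assigns 1-based ranks; tied values share the average of their positions.
sub ranks
{
    my @values = @_;
    my @order  = sort { $values[$a] <=> $values[$b] } 0 .. $#values;
    my @rank;
    my $i = 0;

    while( $i <= $#order )
    {
        # Extend $j over the run of values equal to the one at position $i.
        my $j = $i;
        $j++ while( $j < $#order && $values[ $order[ $j + 1 ] ] == $values[ $order[ $i ] ] );

        my $avg = ( $i + $j ) / 2 + 1;
        $rank[ $order[ $_ ] ] = $avg for $i .. $j;
        $i = $j + 1;
    }

    return @rank;
}

# Returns the correlation of the ranks, or undef if it cannot be computed.
sub spearman
{
    my ( $x_ref, $y_ref ) = @_;
    my $n = scalar( @{ $x_ref } );
    return if $n < 2 || $n != scalar( @{ $y_ref } );

    my @rx = ranks( @{ $x_ref } );
    my @ry = ranks( @{ $y_ref } );

    my ( $mean_x, $mean_y ) = ( 0, 0 );
    $mean_x += $_ / $n for @rx;
    $mean_y += $_ / $n for @ry;

    my ( $cov, $var_x, $var_y ) = ( 0, 0, 0 );
    for my $k ( 0 .. $n - 1 )
    {
        my ( $dx, $dy ) = ( $rx[ $k ] - $mean_x, $ry[ $k ] - $mean_y );
        $cov   += $dx * $dy;
        $var_x += $dx * $dx;
        $var_y += $dy * $dy;
    }

    return if $var_x == 0 || $var_y == 0;
    return $cov / sqrt( $var_x * $var_y );
}

my $score = spearman( [ 1, 2, 3, 4, 5 ], [ 2, 1, 4, 3, 5 ] );
printf( "Spearman's Rank Correlation Score: %.4f\n", $score ) if defined( $score );
```

The printf format plays the role that the precision setting plays in the module: it fixes how many decimal places of the score are shown.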

SpIsFileWordOrCUIFile

Description:

Determines if a file is composed of CUI or word terms by checking the first line.

Input:

$string -> File Path

Output:

$string -> "undef" = Unable to determine, "cui" = CUI Term File, "word" = Word Term File

Example:

use Word2vec::Interface;

my $interface       = Word2vec::Interface->new();
my $isWordOrCuiFile = $interface->SpIsFileWordOrCUIFile( "samples/MiniMayoSRS.terms" );

print( "MiniMayoSRS.terms File Is A \"$isWordOrCuiFile\" File\n" ) if defined( $isWordOrCuiFile );
print( "Unable To Determine Type Of File\n" )                      if !defined( $isWordOrCuiFile );

undef( $interface );

SpGetPrecision

Description:

Returns the number of decimal places used to represent the Spearman's Rank Correlation Score.

Input:

None

Output:

$value -> Integer

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
print "Spearman's Precision: " . $interface->SpGetPrecision() . "\n";

undef( $interface );

SpGetIsFileOfWords

Description:

Returns the variable indicating whether the files to be parsed are files consisting of words or CUI terms.

Input:

None

Output:

$value -> "undef" = Auto-Detect, 0 = CUI Terms, 1 = Word Terms

Example:

use Word2vec::Interface;

my $interface     = Word2vec::Interface->new();
my $isFileOfWords = $interface->SpGetIsFileOfWords();
print "Is File Of Words?: $isFileOfWords\n" if defined( $isFileOfWords );
print "Is File Of Words?: undef\n" if !defined( $isFileOfWords );

undef( $interface );

SpGetPrintN

Description:

Returns the variable indicating whether to print the N value.

Input:

None

Output:

$value -> "undef" = Do not print NValue, "defined" = Print NValue

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $printN    = $interface->SpGetPrintN();
print "Print N\n"        if defined( $printN );
print "Do Not Print N\n" if !defined( $printN );

undef( $interface );

SpGetACount

Description:

Returns the non-negative count for file A.

Input:

None

Output:

$value -> Integer

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
print "A Count: " . $interface->SpGetACount() . "\n";

undef( $interface );

SpGetBCount

Description:

Returns the non-negative count for file B.

Input:

None

Output:

$value -> Integer

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
print "B Count: " . $interface->SpGetBCount() . "\n";

undef( $interface );

SpGetNValue

Description:

Returns the N value.

Input:

None

Output:

$value -> Integer

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
print "N Value: " . $interface->SpGetNValue() . "\n";

undef( $interface );

SpSetPrecision

Description:

Sets the number of decimal places used to represent the Spearman's Rank Correlation Score.

Input:

$value -> Integer

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SpSetPrecision( 8 );
my $score = $interface->SpCalculateSpearmans( "samples/MiniMayoSRS.term.comp_results", "Similarity/MiniMayoSRS.terms.coders", undef );
print "Spearman's Rank Correlation Score: $score\n" if defined( $score );
print "Spearman's Rank Correlation Score: undef\n" if !defined( $score );

undef( $interface );

SpSetIsFileOfWords

Description:

Specifies whether the main method auto-detects if a file consists of CUI or word terms, or whether detection is manually overridden by a user setting.

Input:

$value -> "undef" = Auto-Detect, 0 = CUI Terms, 1 = Word Terms

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SpSetIsFileOfWords( undef );
my $score = $interface->SpCalculateSpearmans( "samples/MiniMayoSRS.term.comp_results", "Similarity/MiniMayoSRS.terms.coders", undef );
print "Spearman's Rank Correlation Score: $score\n" if defined( $score );
print "Spearman's Rank Correlation Score: undef\n" if !defined( $score );

undef( $interface );

SpSetPrintN

Description:

Specifies whether the main method prints _NValue after the Spearmans::CalculateSpearmans() function completes.

Input:

$value -> "undef" = Do Not Print _NValue, "defined" = Print _NValue

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->SpSetPrintN( 1 );
my $score = $interface->SpCalculateSpearmans( "samples/MiniMayoSRS.term.comp_results", "Similarity/MiniMayoSRS.terms.coders", undef );
print "Spearman's Rank Correlation Score: $score\n" if defined( $score );
print "Spearman's Rank Correlation Score: undef\n" if !defined( $score );

undef( $interface );

Word2Vec Main Functions

W2VExecuteTraining

Description:

Executes word2vec training based on parameters. Parameter variables have higher precedence
than member variables: any parameter specified overrides its respective member variable.

Note: If no parameters are specified, this module executes word2vec training based on preset
member variables. Returns a status value indicating whether training succeeded.

Input:

$trainFilePath  -> Specifies word2vec text corpus training file in a given path. (String)
$outputFilePath -> Specifies word2vec trained output data file name and save path. (String)
$vectorSize     -> Size of word2vec word vectors. (Integer)
$windowSize     -> Maximum skip length between words. (Integer)
$minCount       -> Disregard words that appear less than $minCount times. (Integer)
$sample         -> Threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled. (Float)
$negative       -> Number of negative examples. (Integer)
$alpha          -> Sets the starting learning rate. (Float)
$hs             -> Hierarchical Soft-max (Integer)
$binary         -> Save trained data in binary mode. (Integer)
$numOfThreads   -> Number of word2vec training threads. (Integer)
$iterations     -> Number of training iterations to run prior to completion of training. (Integer)
$useCBOW        -> Use the Continuous Bag Of Words model ('1') or the Skip-Gram model ('0'). (Integer)
$classes        -> Output word classes rather than word vectors. (Integer)
$readVocab      -> Read vocabulary from file path without constructing from training data. (String)
$saveVocab      -> Save vocabulary to file path. (String)
$debug          -> Set word2vec debug mode. (Integer)
$overwrite      -> Instructs the module to either overwrite any existing text corpus files or append to the existing file. ( '1' = True / '0' = False )

Note: It is not recommended to specify all new() parameters, as it has not been thoroughly tested.

Output:

$value          -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetTrainFilePath( "textcorpus.txt" );
$interface->W2VSetOutputFilePath( "vectors.bin" );
$interface->W2VSetWordVecSize( 200 );
$interface->W2VSetWindowSize( 8 );
$interface->W2VSetSample( 0.0001 );
$interface->W2VSetNegative( 25 );
$interface->W2VSetHSoftMax( 0 );
$interface->W2VSetBinaryOutput( 0 );
$interface->W2VSetNumOfThreads( 20 );
$interface->W2VSetNumOfIterations( 15 );
$interface->W2VSetUseCBOW( 1 );
$interface->W2VSetOverwriteOldFile( 0 );
$interface->W2VExecuteTraining();

undef( $interface );

# or

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VExecuteTraining( "textcorpus.txt", "vectors.bin", 200, 8, 5, 0.001, 25, 0.05, 0, 0, 20, 15, 1, 0, "", "", 2, 0 );

undef( $interface );

W2VExecuteStringTraining

Description:

Executes word2vec training on a passed string rather than a text corpus file. Parameter variables
have higher precedence than member variables: any parameter specified overrides its respective member variable.

Note: If only the training string is specified, this module executes word2vec training based on preset
member variables. Returns a status value indicating whether training succeeded.

Input:

$trainingStr    -> String to train with word2vec.
$outputFilePath -> Specifies word2vec trained output data file name and save path. (String)
$vectorSize     -> Size of word2vec word vectors. (Integer)
$windowSize     -> Maximum skip length between words. (Integer)
$minCount       -> Disregard words that appear less than $minCount times. (Integer)
$sample         -> Threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled. (Float)
$negative       -> Number of negative examples. (Integer)
$alpha          -> Sets the starting learning rate. (Float)
$hs             -> Hierarchical Soft-max (Integer)
$binary         -> Save trained data in binary mode. (Integer)
$numOfThreads   -> Number of word2vec training threads. (Integer)
$iterations     -> Number of training iterations to run prior to completion of training. (Integer)
$useCBOW        -> Use the Continuous Bag Of Words model ('1') or the Skip-Gram model ('0'). (Integer)
$classes        -> Output word classes rather than word vectors. (Integer)
$readVocab      -> Read vocabulary from file path without constructing from training data. (String)
$saveVocab      -> Save vocabulary to file path. (String)
$debug          -> Set word2vec debug mode. (Integer)
$overwrite      -> Instructs the module to either overwrite any existing text corpus files or append to the existing file. ( '1' = True / '0' = False )

Note: It is not recommended to specify all new() parameters, as it has not been thoroughly tested.

Output:

$value          -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetOutputFilePath( "vectors.bin" );
$interface->W2VSetWordVecSize( 200 );
$interface->W2VSetWindowSize( 8 );
$interface->W2VSetSample( 0.0001 );
$interface->W2VSetNegative( 25 );
$interface->W2VSetHSoftMax( 0 );
$interface->W2VSetBinaryOutput( 0 );
$interface->W2VSetNumOfThreads( 20 );
$interface->W2VSetNumOfIterations( 15 );
$interface->W2VSetUseCBOW( 1 );
$interface->W2VSetOverwriteOldFile( 0 );
$interface->W2VExecuteStringTraining( "string to train here" );

undef( $interface );

# or

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VExecuteStringTraining( "string to train here", "vectors.bin", 200, 8, 5, 0.001, 25, 0.05, 0, 0, 20, 15, 1, 0, "", "", 2, 0 );

undef( $interface );

W2VComputeCosineSimilarity

Description:

Computes cosine similarity between two words using trained word2vec vector data. Returns
float value or undefined if one or more words are not in the dictionary.

Note: Supports single words only and requires vector data to be in memory with W2VReadTrainedVectorDataFromFile() prior to function execution.

Input:

$string -> Single string word
$string -> Single string word

Output:

$value  -> Float or Undefined

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
print "Cosine similarity between words: \"of\" and \"the\": " . $interface->W2VComputeCosineSimilarity( "of", "the" ) . "\n";

undef( $interface );
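
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it for two vectors given as space-separated strings of numbers; it is a standalone illustration, and cosine_similarity is a hypothetical helper rather than the module's function.

```perl
use strict;
use warnings;

# Hypothetical helper: cosine similarity of two space-separated numeric vectors.
# Returns undef when the vectors are empty, differ in size or have zero length.
sub cosine_similarity
{
    my ( $vec_a_str, $vec_b_str ) = @_;
    my @vec_a = split( ' ', $vec_a_str );
    my @vec_b = split( ' ', $vec_b_str );

    return if @vec_a == 0 || @vec_a != @vec_b;

    my ( $dot, $norm_a, $norm_b ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#vec_a )
    {
        $dot    += $vec_a[ $i ] * $vec_b[ $i ];
        $norm_a += $vec_a[ $i ] ** 2;
        $norm_b += $vec_b[ $i ] ** 2;
    }

    return if $norm_a == 0 || $norm_b == 0;
    return $dot / ( sqrt( $norm_a ) * sqrt( $norm_b ) );
}

my $similarity = cosine_similarity( "1 2 3", "2 4 6" );
print "Cosine similarity: $similarity\n" if defined( $similarity );
```

Checking the result with defined() before printing avoids a warning when a vector is missing, which also applies to the undefined return value described above.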

W2VComputeAvgOfWordsCosineSimilarity

Description:

Computes cosine similarity between two words or compound words using trained word2vec vector data.
Returns float value or undefined.

Note: Supports multiple words concatenated by ' ' (space) and requires vector data to be in memory prior
to method execution. This method will not error out when a word is not located within the dictionary.
It averages the vectors of all found words for each parameter, then computes the cosine similarity of the two averaged vectors.

Input:

$string -> string of single or multiple words separated by ' ' (space).
$string -> string of single or multiple words separated by ' ' (space).

Output:

$value  -> Float or Undefined

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
print "Cosine similarity between words: \"heart attack\" and \"acute myocardial infarction\": " .
      $interface->W2VComputeAvgOfWordsCosineSimilarity( "heart attack", "acute myocardial infarction" ) . "\n";

undef( $interface );

W2VComputeMultiWordCosineSimilarity

Description:

Computes cosine similarity between two words or compound words using trained word2vec vector data.

Note: Supports multiple words concatenated by ' ' (space) and requires vector data to be in memory prior to method execution.
If $allWordsMustExist is set to true, this function will error out when a specified word is not found and return undefined.

Input:

$string            -> string of single or multiple words separated by ' ' (space).
$string            -> string of single or multiple words separated by ' ' (space).
$allWordsMustExist -> 1 = True, 0 or undef = False

Output:

$value             -> Float or Undefined

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
print "Cosine similarity between words: \"heart attack\" and \"acute myocardial infarction\": " .
      $interface->W2VComputeMultiWordCosineSimilarity( "heart attack", "acute myocardial infarction" ) . "\n";

undef( $interface );

W2VComputeCosineSimilarityOfWordVectors

Description:

Computes cosine similarity between two word vectors.
Returns float value or undefined if one or more words are not in the dictionary.

Note: Function parameters require actual word vector data with words removed.

Input:

$string -> string of word vector representation data separated by ' ' (space).
$string -> string of word vector representation data separated by ' ' (space).

Output:

$value  -> Float or Undefined

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
my $vectorAData = $interface->W2VGetWordVector( "heart" );
my $vectorBData = $interface->W2VGetWordVector( "attack" );

# Remove Words From Data
$vectorAData = $interface->W2VRemoveWordFromWordVectorString( $vectorAData );
$vectorBData = $interface->W2VRemoveWordFromWordVectorString( $vectorBData );

print "Cosine similarity between words: \"heart\" and \"attack\": " .
      $interface->W2VComputeCosineSimilarityOfWordVectors( $vectorAData, $vectorBData ) . "\n";

undef( $interface );

W2VCosSimWithUserInput

Description:

Computes cosine similarity between two words using trained word2vec vector data based on user input.

Note: No compound word support.

Warning: Requires vector data to be in memory prior to method execution.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
$interface->W2VCosSimWithUserInput();

undef( $interface );

W2VMultiWordCosSimWithUserInput

Description:

Computes cosine similarity between two words or compound words using trained word2vec vector data based on user input.

Note: Supports multiple words concatenated by ':'.

Warning: Requires vector data to be in memory prior to method execution.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
$interface->W2VMultiWordCosSimWithUserInput();

undef( $interface );

W2VComputeAverageOfWords

Description:

Computes the average of the word vectors of all found words, given an array reference parameter of
plain text words. Returns the average values (string) or undefined.

Warning: Requires vector data to be in memory prior to method execution.

Input:

$arrayReference -> Array reference of words

Output:

$string         -> String of word2vec word average values

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
my @wordAry = qw( of the and );
my $data = $interface->W2VComputeAverageOfWords( \@wordAry );
print( "Computed Average Of Words: $data" ) if defined( $data );

undef( $interface );
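
Note: The averaging step can be sketched on its own. The example below is a standalone illustration under stated assumptions, not the module's implementation: the helper name average_of_words is hypothetical, the vocabulary is a plain hash of made-up values, and the result is an array reference rather than a string.

```perl
use strict;
use warnings;

# Toy vocabulary (made-up values); real data would come from a trained model.
my %toyVectors = (
    "of"  => [ 0.2, 0.4, -0.6 ],
    "the" => [ 0.4, 0.0,  0.2 ],
);

# Hypothetical helper: averages the vectors of the words found in the
# vocabulary, skipping unknown words. Returns an array reference, or
# undefined when none of the words are found.
sub average_of_words {
    my ( $vocab, $words ) = @_;
    my @sum;
    my $found = 0;

    for my $word ( @$words ) {
        my $vec = $vocab->{ $word };
        next if !defined( $vec );
        $sum[$_] = ( $sum[$_] // 0 ) + $vec->[$_] for 0 .. $#{ $vec };
        $found++;
    }

    return if $found == 0;
    return [ map { $_ / $found } @sum ];
}

# "and" is not in the toy vocabulary, so only "of" and "the" are averaged
my $avg = average_of_words( \%toyVectors, [ qw( of the and ) ] );
print( "Computed Average Of Words: @$avg\n" ) if defined( $avg );
```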

W2VAddTwoWords

Description:

Adds two word vectors and returns the result.

Warning: Requires vector data to be in memory prior to method execution.

Input:

$string -> Word to add
$string -> Word to add

Output:

$string -> String of word2vec summed word values

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );

my $data = $interface->W2VAddTwoWords( "heart", "attack" );
print( "Computed Sum Of Words: $data" ) if defined( $data );

undef( $interface );

W2VSubtractTwoWords

Description:

Subtracts two word vectors and returns the result.

Warning: Requires vector data to be in memory prior to method execution.

Input:

$string -> Word to subtract
$string -> Word to subtract

Output:

$string -> String of word2vec difference between word values

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );

my $data = $interface->W2VSubtractTwoWords( "king", "man" );
print( "Computed Difference Of Words: $data" ) if defined( $data );

undef( $interface );

W2VAddTwoWordVectors

Description:

Adds two vector data strings and returns the result.

Warning: Text word must be removed from vector data prior to calling this method. This method
also requires vector data to be in memory prior to method execution.

Input:

$string -> Word2vec word vector data (with string word removed)
$string -> Word2vec word vector data (with string word removed)

Output:

$string -> String of word2vec summed word values

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
my $wordAData = $interface->W2VGetWordVector( "of" );
my $wordBData = $interface->W2VGetWordVector( "the" );

# Removing Words From Vector Data
$wordAData = $interface->W2VRemoveWordFromWordVectorString( $wordAData );
$wordBData = $interface->W2VRemoveWordFromWordVectorString( $wordBData );

my $data = $interface->W2VAddTwoWordVectors( $wordAData, $wordBData );
print( "Computed Sum Of Words: $data" ) if defined( $data );

undef( $interface );

W2VSubtractTwoWordVectors

Description:

Subtracts two vector data strings and returns the result.

Warning: Text word must be removed from vector data prior to calling this method. This method
also requires vector data to be in memory prior to method execution.

Input:

$string -> Word2vec word vector data (with string word removed)
$string -> Word2vec word vector data (with string word removed)

Output:

$string -> String of word2vec difference between word values

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
my $wordAData = $interface->W2VGetWordVector( "of" );
my $wordBData = $interface->W2VGetWordVector( "the" );

# Removing Words From Vector Data
$wordAData = $interface->W2VRemoveWordFromWordVectorString( $wordAData );
$wordBData = $interface->W2VRemoveWordFromWordVectorString( $wordBData );

my $data = $interface->W2VSubtractTwoWordVectors( $wordAData, $wordBData );
print( "Computed Difference Of Words: $data" ) if defined( $data );

undef( $interface );

W2VAverageOfTwoWordVectors

Description:

Computes the average of two vector data strings and returns the result.

Warning: Text word must be removed from vector data prior to calling this method. This method
also requires vector data to be in memory prior to method execution.

Input:

$string -> Word2vec word vector data (with string word removed)
$string -> Word2vec word vector data (with string word removed)

Output:

$string -> String of word2vec average between word values

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
my $wordAData = $interface->W2VGetWordVector( "of" );
my $wordBData = $interface->W2VGetWordVector( "the" );

# Removing Words From Vector Data
$wordAData = $interface->W2VRemoveWordFromWordVectorString( $wordAData );
$wordBData = $interface->W2VRemoveWordFromWordVectorString( $wordBData );

my $data = $interface->W2VAverageOfTwoWordVectors( $wordAData, $wordBData );
print( "Computed Average Of Words: $data" ) if defined( $data );

undef( $interface );
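
Note: The sum, difference and average of two vector data strings are all element-wise operations. The following standalone sketch shows the idea under stated assumptions: the helper name combine_vector_strings is hypothetical, both strings hold space-separated numeric values with the words already removed, and this is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: applies $op element-wise to two space-separated
# vector data strings and returns the result as a space-separated string.
# Returns undefined if the strings are empty or differ in length.
sub combine_vector_strings {
    my ( $strA, $strB, $op ) = @_;
    my @valuesA = split( ' ', $strA );
    my @valuesB = split( ' ', $strB );
    return if @valuesA == 0 || @valuesA != @valuesB;
    return join( ' ', map { $op->( $valuesA[$_], $valuesB[$_] ) } 0 .. $#valuesA );
}

my $sum  = combine_vector_strings( "1 2 3", "3 2 1", sub { $_[0] + $_[1] } );
my $diff = combine_vector_strings( "1 2 3", "3 2 1", sub { $_[0] - $_[1] } );
my $avg  = combine_vector_strings( "1 2 3", "3 2 1", sub { ( $_[0] + $_[1] ) / 2 } );

print( "Computed Sum Of Words: $sum\n" )         if defined( $sum );
print( "Computed Difference Of Words: $diff\n" ) if defined( $diff );
print( "Computed Average Of Words: $avg\n" )     if defined( $avg );
```

Passing the operation as a code reference keeps one helper for all three cases. Note that subtraction is order dependent: the second string is subtracted from the first in this sketch.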

W2VGetWordVector

Description:

Searches dictionary in memory for the specified string argument and returns the vector data.
Returns undefined if not found.

Warning: Requires vector data to be in memory prior to method execution.

Input:

$string -> Word to locate in word2vec vocabulary/dictionary

Output:

$string -> Found word2vec word + word vector data or undefined.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
my $wordData = $interface->W2VGetWordVector( "of" );
print( "Word2vec Word Data: $wordData\n" ) if defined( $wordData );

undef( $interface );

W2VIsVectorDataInMemory

Description:

Checks to see if vector data has been loaded in memory.

Input:

None

Output:

$value -> '1' = True / '0' = False

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->W2VIsVectorDataInMemory();

print( "No vector data in memory\n" ) if $result == 0;
print( "Yes vector data in memory\n" ) if $result == 1;

$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
$result = $interface->W2VIsVectorDataInMemory();

print( "No vector data in memory\n" ) if $result == 0;
print( "Yes vector data in memory\n" ) if $result == 1;

undef( $interface );

W2VIsWordOrCUIVectorData

Description:

Checks to see if vector data consists of word or CUI terms.

Input:

None

Output:

$string -> 'cui', 'word' or undef

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
my $isWordOrCUIData = $interface->W2VIsWordOrCUIVectorData();

print( "Vector Data Consists Of \"$isWordOrCUIData\" Terms\n" ) if defined( $isWordOrCUIData );
print( "Cannot Determine Type Of Terms\n" ) if !defined( $isWordOrCUIData );

undef( $interface );

W2VIsVectorDataSorted

Description:

Checks to see if the header of the vector data in memory is flagged as sorted.

Input:

None

Output:

$value -> '1' = True / '0' = False

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );

my $result = $interface->W2VIsVectorDataSorted();

print( "No, vector data is not sorted\n" ) if $result == 0;
print( "Yes, vector data is sorted\n" ) if $result == 1;

undef( $interface );

W2VCheckWord2VecDataFileType

Description:

Checks specified file to see if vector data is in binary or plain text format. Returns 'text'
for plain text and 'binary' for binary data.

Input:

$string -> File path

Output:

$string -> File Type ( "text" = Plain text file / "binary" = Binary data file )

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $fileType = $interface->W2VCheckWord2VecDataFileType( "samples/samplevectors.bin" );

print( "FileType: $fileType\n" ) if defined( $fileType );

undef( $interface );

W2VReadTrainedVectorDataFromFile

Description:

Reads trained vector data from the specified file path into memory, or searches the file for a specific word vector. This function supports and
automatically detects word2vec binary, plain text and sparse vector data formats.

Note: If the search word is undefined, the entire vector file is loaded into memory. If a search word is defined, only that word's vector data is returned, or undef if it is not found.

Input:

$string     -> Word2vec trained vector data file path
$searchWord -> Searches trained vector data file for specific word vector

Output:

$value      -> '0' = Successful / '-1' = Un-successful

Example:

# Loading data in memory
use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );

print( "Success Loading Data\n" ) if $result == 0;
print( "Un-successful, Data Not Loaded\n" ) if $result == -1;

undef( $interface );

# or

# Searching vector data file for a specific word vector
use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin", "medical" );

print( "Found Vector Data In File\n" ) if $result != -1;
print( "Vector Data Not Found\n" )     if $result == -1;

undef( $interface );
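
Note: For orientation, the word2vec plain text format starts with a header line holding the vocabulary size and the vector size, followed by one line per word with the word and its values. The standalone sketch below parses that layout from an in-memory string. It is an illustration only, not the module's reader: the helper name read_text_vectors is hypothetical, the sample values are made up, and the binary and sparse formats are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: parses word2vec plain text vector data from an open
# file handle into a hash reference of word => array reference of values.
# Returns undefined on a missing header or a malformed line.
sub read_text_vectors {
    my ( $fh ) = @_;
    my $header = <$fh>;
    return if !defined( $header );

    # Header line: "<number of words> <vector size>"
    my ( $numWords, $vecSize ) = split( ' ', $header );
    return if !defined( $vecSize );

    my %vectors;
    while ( my $line = <$fh> ) {
        my ( $word, @values ) = split( ' ', $line );
        next if !defined( $word );          # skip blank lines
        return if @values != $vecSize;      # malformed line
        $vectors{ $word } = \@values;
    }
    return \%vectors;
}

# In-memory stand-in for a vector file (made-up values)
my $sample = "2 3\nof 0.1 0.2 0.3\nthe 0.4 0.5 0.6\n";
open( my $fh, '<', \$sample ) or die "Cannot open in-memory data: $!";
my $vectors = read_text_vectors( $fh );
close( $fh );

print( "Loaded " . scalar( keys %$vectors ) . " word vectors\n" ) if defined( $vectors );
```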

W2VSaveTrainedVectorDataToFile

Description:

Saves trained vector data to the specified location in the specified format.

Note: Leaving 'saveFormat' undefined will automatically save as plain text format.

Input:

$string       -> Save Path
$saveFormat   -> Integer ( '0' = Save as plain text / '1' = Save data in word2vec binary format / '2' = Sparse vector data format )

Warning: If the vector data is stored as a binary search tree, this method will error out gracefully.

Output:

$value        -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
$interface->W2VSaveTrainedVectorDataToFile( "samples/newvectors.bin" );

undef( $interface );

W2VStringsAreEqual

Description:

Compares two strings for equality, ignoring case.

Note: This method is not case-sensitive, i.e. "string" equals "StRiNg".

Input:

$string -> String to compare
$string -> String to compare

Output:

$value  -> '1' = Strings are equal / '0' = Strings are not equal

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->W2VStringsAreEqual( "hello world", "HeLlO wOrLd" );

print( "Strings are equal!\n" ) if $result == 1;
print( "Strings are not equal!\n" ) if $result == 0;

undef( $interface );
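
Note: A case-insensitive comparison of this kind can be sketched in plain Perl by lowercasing both strings before comparing them. The helper name strings_are_equal is hypothetical and this is not necessarily how the module implements the check.

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive string comparison using lc().
# Returns 1 when the strings match ignoring case, otherwise 0.
sub strings_are_equal {
    my ( $strA, $strB ) = @_;
    return 0 if !defined( $strA ) || !defined( $strB );
    return ( lc( $strA ) eq lc( $strB ) ) ? 1 : 0;
}

print( "Strings are equal!\n" ) if strings_are_equal( "hello world", "HeLlO wOrLd" );
```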

W2VRemoveWordFromWordVectorString

Description:

Given a vector data string as input, removes the word from the string and returns only the vector data.

Input:

$string          -> Vector word & data string.

Output:

$string          -> Vector data string.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $str = "cookie 1 0.234 9 0.0002 13 0.234 17 -0.0023 19 1.0000";

my $vectorData = $interface->W2VRemoveWordFromWordVectorString( $str );

print( "Success!\n" ) if length( $vectorData ) < length( $str );

undef( $interface );
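
Note: The operation amounts to dropping the first whitespace-separated token. The standalone sketch below shows one way to do this, assuming the word is always the first token. The helper name remove_word_from_vector_string is hypothetical and this is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: drops the leading word token from a vector data
# string and returns the remaining values joined by single spaces.
sub remove_word_from_vector_string {
    my ( $str ) = @_;
    return if !defined( $str );
    my ( undef, @values ) = split( ' ', $str );
    return join( ' ', @values );
}

my $vectorData = remove_word_from_vector_string( "cookie 0.234 0.0002 -0.0023" );
print( "Vector Data: $vectorData\n" ) if defined( $vectorData );
```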

W2VConvertRawSparseTextToVectorDataAry

Description:

Converts a sparse vector string to a dense vector data array.

Input:

$string          -> Vector data string.

Output:

$arrayReference  -> Reference to array of vector data.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $str = "cookie 1 0.234 9 0.0002 13 0.234 17 -0.0023 19 1.0000";

my @vectorData = @{ $interface->W2VConvertRawSparseTextToVectorDataAry( $str ) };

print( "Data conversion successful!\n" ) if @vectorData > 0;
print( "Data conversion un-successful!\n" ) if @vectorData == 0;

undef( $interface );
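
Note: The sparse-to-dense idea can be sketched independently of the module. The example below rests on several assumptions that the documentation above does not confirm: the string is a word followed by index/value pairs, indices are zero-based, and the dense vector size is supplied by the caller. The helper name sparse_to_dense is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: expands "word index value index value ..." into a
# dense array of $size elements, with 0 for every index not listed.
# Returns an array reference, or undefined on malformed input.
sub sparse_to_dense {
    my ( $str, $size ) = @_;
    my ( undef, @pairs ) = split( ' ', $str );
    return if @pairs % 2 != 0;              # incomplete index/value pair

    my @dense = ( 0 ) x $size;
    while ( my ( $index, $value ) = splice( @pairs, 0, 2 ) ) {
        return if $index >= $size;          # index outside the dense vector
        $dense[$index] = $value;
    }
    return \@dense;
}

my $dense = sparse_to_dense( "cookie 1 0.5 3 -2", 5 );
print( "Dense vector: @$dense\n" ) if defined( $dense );
```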

W2VConvertRawSparseTextToVectorDataHash

Description:

Converts sparse vector string to a dense vector format data hash.

Input:

$string          -> Vector data string.

Output:

$hashReference   -> Reference to hash of vector data.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $str = "cookie 1 0.234 9 0.0002 13 0.234 17 -0.0023 19 1.0000";

my %vectorData = %{ $interface->W2VConvertRawSparseTextToVectorDataHash( $str ) };

print( "Data conversion successful!\n" ) if ( keys %vectorData ) > 0;
print( "Data conversion un-successful!\n" ) if ( keys %vectorData ) == 0;

undef( $interface );

Word2Vec Accessor Functions

W2VGetDebugLog

Description:

Returns the _debugLog member variable set during Word2vec::Word2vec object initialization of new function.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $debugLog = $interface->W2VGetDebugLog();

print( "Debug Logging Enabled\n" ) if $debugLog == 1;
print( "Debug Logging Disabled\n" ) if $debugLog == 0;


undef( $interface );

W2VGetWriteLog

Description:

Returns the _writeLog member variable set during Word2vec::Word2vec object initialization of new function.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $writeLog = $interface->W2VGetWriteLog();

print( "Write Logging Enabled\n" ) if $writeLog == 1;
print( "Write Logging Disabled\n" ) if $writeLog == 0;

undef( $interface );

W2VGetFileHandle

Description:

Returns the _fileHandle member variable set during Word2vec::Word2vec object instantiation of new function.

Warning: This is a private function. File handle is used by WriteLog() method. Do not manipulate this file handle as errors can result.

Input:

None

Output:

$fileHandle -> Returns file handle for WriteLog() method or undefined.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $fileHandle = $interface->W2VGetFileHandle();

undef( $interface );

W2VGetTrainFilePath

Description:

Returns the _trainFilePath member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$string -> Returns word2vec training text corpus file path.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $filePath = $interface->W2VGetTrainFilePath();
print( "Training File Path: $filePath\n" );

undef( $interface );

W2VGetOutputFilePath

Description:

Returns the _outputFilePath member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$string -> Returns post word2vec training output file path.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $filePath = $interface->W2VGetOutputFilePath();
print( "File Path: $filePath\n" );

undef( $interface );

W2VGetWordVecSize

Description:

Returns the _wordVecSize member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (integer) size of word2vec word vectors. Default value = 100

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetWordVecSize();
print( "Word Vector Size: $value\n" );

undef( $interface );

W2VGetWindowSize

Description:

Returns the _windowSize member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (integer) word2vec window size. Default value = 5

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetWindowSize();
print( "Window Size: $value\n" );

undef( $interface );

W2VGetSample

Description:

Returns the _sample member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (float) word2vec sample size. Default value = 0.001

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetSample();
print( "Sample: $value\n" );

undef( $interface );

W2VGetHSoftMax

Description:

Returns the _hSoftMax member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (integer) word2vec HSoftMax value. Default = 0

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetHSoftMax();
print( "HSoftMax: $value\n" );

undef( $interface );

W2VGetNegative

Description:

Returns the _negative member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (integer) word2vec negative value. Default = 5

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetNegative();
print( "Negative: $value\n" );

undef( $interface );

W2VGetNumOfThreads

Description:

Returns the _numOfThreads member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (integer) word2vec number of threads to use during training. Default = 12

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetNumOfThreads();
print( "Number of threads: $value\n" );

undef( $interface );

W2VGetNumOfIterations

Description:

Returns the _iterations member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (integer) word2vec number of word2vec iterations. Default = 5

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetNumOfIterations();
print( "Number of iterations: $value\n" );

undef( $interface );

W2VGetMinCount

Description:

Returns the _minCount member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (integer) word2vec min-count value. Default = 5

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetMinCount();
print( "Min Count: $value\n" );

undef( $interface );

W2VGetAlpha

Description:

Returns the _alpha member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (float) word2vec alpha value. Default = 0.05 for CBOW and 0.025 for Skip-Gram.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetAlpha();
print( "Alpha: $value\n" );

undef( $interface );

W2VGetClasses

Description:

Returns the _classes member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (integer) word2vec classes value. Default = 0

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetClasses();
print( "Classes: $value\n" );

undef( $interface );

W2VGetDebugTraining

Description:

Returns the _debug member variable set during Word2vec::Word2vec object instantiation of new function.

Note: 0 = No debug output, 1 = Enable debug output, 2 = Even more debug output

Input:

None

Output:

$value -> Returns (integer) word2vec debug value. Default = 2

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetDebugTraining();
print( "Debug: $value\n" );

undef( $interface );

W2VGetBinaryOutput

Description:

Returns the _binaryOutput member variable set during Word2vec::Word2vec object instantiation of new function.

Note: 1 = Save trained vector data in binary format, 0 = Save trained vector data in plain text format.

Input:

None

Output:

$value -> Returns (integer) word2vec binary flag. Default = 0

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetBinaryOutput();
print( "Binary Output: $value\n" );

undef( $interface );

W2VGetReadVocabFilePath

Description:

Returns the _readVocab member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$string -> Returns (string) word2vec read vocabulary file name or empty string if not set.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $str = $interface->W2VGetReadVocabFilePath();
print( "Read Vocab File Path: $str\n" );

undef( $interface );

W2VGetSaveVocabFilePath

Description:

Returns the _saveVocab member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$string -> Returns (string) word2vec save vocabulary file name or empty string if not set.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $str = $interface->W2VGetSaveVocabFilePath();
print( "Save Vocab File Path: $str\n" );

undef( $interface );

W2VGetUseCBOW

Description:

Returns the _useCBOW member variable set during Word2vec::Word2vec object instantiation of new function.

Note: 0 = Skip-Gram Model, 1 = Continuous Bag Of Words Model.

Input:

None

Output:

$value -> Returns (integer) word2vec Continuous-Bag-Of-Words flag. Default = 1

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetUseCBOW();
print( "Use CBOW?: $value\n" );

undef( $interface );

W2VGetWorkingDir

Description:

Returns the _workingDir member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (string) working directory path or current directory if not specified.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $str = $interface->W2VGetWorkingDir();
print( "Working Directory: $str\n" );

undef( $interface );

W2VGetWord2VecExeDir

Description:

Returns the _word2VecExeDir member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns (string) word2vec executable directory path or empty string if not specified.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $str = $interface->W2VGetWord2VecExeDir();
print( "Word2Vec Executable File Directory: $str\n" );

undef( $interface );

W2VGetVocabularyHash

Description:

Returns the _hashRefOfWordVectors member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns hash reference of vocabulary/dictionary words. (Word2vec trained data in memory)

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $vocabularyHashReference = $interface->W2VGetVocabularyHash();

undef( $interface );

W2VGetOverwriteOldFile

Description:

Returns the _overwriteOldFile member variable set during Word2vec::Word2vec object instantiation of new function.

Input:

None

Output:

$value -> Returns 1 = True or 0 = False.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $value = $interface->W2VGetOverwriteOldFile();
print( "Overwrite Existing File?: $value\n" );

undef( $interface );

Word2Vec Mutator Functions

W2VSetTrainFilePath

Description:

Sets member variable to string parameter. Sets training file path.

Input:

$string -> Text corpus training file path

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetTrainFilePath( "samples/textcorpus.txt" );

undef( $interface );

W2VSetOutputFilePath

Description:

Sets member variable to string parameter. Sets output file path.

Input:

$string -> Post word2vec training save file path

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetOutputFilePath( "samples/tempvectors.bin" );

undef( $interface );

W2VSetWordVecSize

Description:

Sets member variable to integer parameter. Sets word2vec word vector size.

Input:

$value -> Word2vec word vector size

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetWordVecSize( 100 );

undef( $interface );

W2VSetWindowSize

Description:

Sets member variable to integer parameter. Sets word2vec window size.

Input:

$value -> Word2vec window size

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetWindowSize( 8 );

undef( $interface );

W2VSetSample

Description:

Sets member variable to float parameter. Sets word2vec sample size.

Input:

$value -> Word2vec sample size

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetSample( 0.001 );

undef( $interface );

W2VSetHSoftMax

Description:

Sets member variable to integer parameter. Sets word2vec HSoftMax value.

Input:

$value -> Word2vec HSoftMax value

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetHSoftMax( 1 );

undef( $interface );

W2VSetNegative

Description:

Sets member variable to integer parameter. Sets word2vec negative value.

Input:

$value -> Word2vec negative value

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetNegative( 12 );

undef( $interface );

W2VSetNumOfThreads

Description:

Sets member variable to integer parameter. Sets word2vec number of training threads to specified value.

Input:

$value -> Word2vec number of threads value

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetNumOfThreads( 12 );

undef( $interface );

W2VSetNumOfIterations

Description:

Sets member variable to integer parameter. Sets word2vec iterations value.

Input:

$value -> Word2vec number of iterations value

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetNumOfIterations( 12 );

undef( $interface );

W2VSetMinCount

Description:

Sets member variable to integer parameter. Sets word2vec min-count value.

Input:

$value -> Word2vec min-count value

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetMinCount( 7 );

undef( $interface );

W2VSetAlpha

Description:

Sets member variable to float parameter. Sets word2vec alpha value.

Input:

$value -> Word2vec alpha value. (Float)

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetAlpha( 0.0012 );

undef( $interface );

W2VSetClasses

Description:

Sets member variable to integer parameter. Sets word2vec classes value.

Input:

$value -> Word2vec classes value.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetClasses( 0 );

undef( $interface );

W2VSetDebugTraining

Description:

Sets member variable to integer parameter. Sets word2vec debug parameter value.

Input:

$value -> Word2vec debug training value.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetDebugTraining( 0 );

undef( $interface );

W2VSetBinaryOutput

Description:

Sets member variable to integer parameter. Sets word2vec binary parameter value.

Input:

$value -> Word2vec binary output mode value. ( '1' = Binary Output / '0' = Plain Text )

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetBinaryOutput( 1 );

undef( $interface );

W2VSetSaveVocabFilePath

Description:

Sets member variable to string parameter. Sets word2vec save vocabulary file name.

Input:

$string -> Word2vec save vocabulary file name and path.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetSaveVocabFilePath( "samples/vocab.txt" );

undef( $interface );

W2VSetReadVocabFilePath

Description:

Sets member variable to string parameter. Sets word2vec read vocabulary file name.

Input:

$string -> Word2vec read vocabulary file name and path.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetReadVocabFilePath( "samples/vocab.txt" );

undef( $interface );

W2VSetUseCBOW

Description:

Sets member variable to integer parameter. Sets word2vec CBOW parameter value.

Input:

$value -> Word2vec CBOW mode value.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetUseCBOW( 1 );

undef( $interface );

W2VSetWorkingDir

Description:

Sets member variable to string parameter. Sets working directory.

Input:

$string -> Working directory

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetWorkingDir( "/samples" );

undef( $interface );

W2VSetWord2VecExeDir

Description:

Sets member variable to string parameter. Sets word2vec executable file directory.

Input:

$string -> Word2vec directory

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetWord2VecExeDir( "/word2vec" );

undef( $interface );

W2VSetVocabularyHash

Description:

Sets vocabulary/dictionary hash reference to hash reference parameter.

Warning: This will overwrite any existing vocabulary/dictionary data in memory.

Input:

$hashReference -> Vocabulary/Dictionary hash reference of word2vec word vectors.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
my $vocabularyHashReference = $interface->W2VGetVocabularyHash();
$interface->W2VSetVocabularyHash( $vocabularyHashReference );

undef( $interface );

W2VClearVocabularyHash

Description:

Clears vocabulary/dictionary hash.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VClearVocabularyHash();

undef( $interface );

W2VAddWordVectorToVocabHash

Description:

Adds word vector string to vocabulary/dictionary.

Input:

$string -> Word2vec word vector string

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

# Note: This is representational data of word2vec's word vector format and not actual data.
$interface->W2VAddWordVectorToVocabHash( "of 0.4346 -0.1235 0.5789 0.2347 -0.0056 -0.0001" );

undef( $interface );
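
Note: The following is a minimal, illustrative sketch and not the module's implementation. It shows one way a plain-text word vector string of the form used above (a word followed by its vector values) could be split apart. The helper name parse_word_vector and the returned hash layout are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Interface): splits a plain-text
# word vector string into the word and its numeric components.
sub parse_word_vector {
    my ( $line ) = @_;
    return undef if !defined( $line );

    # split( ' ', ... ) splits on runs of whitespace and skips leading spaces
    my ( $word, @values ) = split( ' ', $line );
    return undef if !defined( $word ) || @values == 0;

    return { word => $word, vector => \@values };
}

# Representational data only, as in the example above
my $entry = parse_word_vector( "of 0.4346 -0.1235 0.5789 0.2347 -0.0056 -0.0001" );
print( "Word: $entry->{word}, Dimensions: " . scalar( @{ $entry->{vector} } ) . "\n" );
```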

W2VSetOverwriteOldFile

Description:

Sets member variable to integer parameter. Enables overwriting output file if one already exists.

Input:

$value -> '1' = Overwrite existing file / '0' = Graceful termination when file with same name exists

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2VSetOverwriteOldFile( 1 );

undef( $interface );

Word2Phrase Main Functions

W2PExecuteTraining

Description:

Executes word2phrase training based on parameters. Parameter variables have higher precedence than member variables.
Any parameter specified will override its respective member variable.

Note: If no parameters are specified, this module executes word2phrase training based on preset member
variables. Returns a status value indicating whether training succeeded.

Input:

$trainFilePath  -> Training text corpus file path
$outputFilePath -> Vector binary file path
$minCount       -> Minimum bi-gram frequency (Positive Integer)
$threshold      -> Maximum bi-gram frequency (Positive Integer)
$debug          -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
$overwrite      -> Overwrites old training file when executing training. (0 = False / 1 = True)

Output:

$value          -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetMinCount( 12 );
$interface->W2PSetThreshold( 20 );
$interface->W2PSetTrainFilePath( "textCorpus.txt" );
$interface->W2PSetOutputFilePath( "phraseTextCorpus.txt" );
$interface->W2PExecuteTraining();
undef( $interface );

# Or

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PExecuteTraining( "textCorpus.txt", "phraseTextCorpus.txt", 12, 20, 2, 1 );
undef( $interface );

W2PExecuteStringTraining

Description:

Executes word2phrase training based on parameters. Parameter variables have higher precedence than member variables.
Any parameter specified will override its respective member variable.

Note: If no parameters are specified, this module executes word2phrase training based on preset member
variables. Returns a status value indicating whether training succeeded.

Input:

$trainingString -> String to train
$outputFilePath -> Vector binary file path
$minCount       -> Minimum bi-gram frequency (Positive Integer)
$threshold      -> Maximum bi-gram frequency (Positive Integer)
$debug          -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
$overwrite      -> Overwrites old training file when executing training. (0 = False / 1 = True)

Output:

$value          -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetMinCount( 12 );
$interface->W2PSetThreshold( 20 );
$interface->W2PSetOutputFilePath( "phraseTextCorpus.txt" );
$interface->W2PExecuteStringTraining( "large string to train here" );
undef( $interface );

# Or

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PExecuteStringTraining( "large string to train here", "phraseTextCorpus.txt", 12, 20, 2, 1 );
undef( $interface );

Word2Phrase Accessor Functions

W2PGetDebugLog

Description:

Returns the _debugLog member variable set during Word2vec::Interface object initialization of new function.

Input:

None

Output:

$value -> 0 = False, 1 = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $debugLog = $interface->W2PGetDebugLog();

print( "Debug Logging Enabled\n" ) if $debugLog == 1;
print( "Debug Logging Disabled\n" ) if $debugLog == 0;

undef( $interface );

W2PGetWriteLog

Description:

Returns the _writeLog member variable set during Word2vec::Interface object initialization of new function.

Input:

None

Output:

$value -> 0 = False, 1 = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $writeLog = $interface->W2PGetWriteLog();

print( "Write Logging Enabled\n" ) if $writeLog == 1;
print( "Write Logging Disabled\n" ) if $writeLog == 0;

undef( $interface );

W2PGetFileHandle

Description:

Returns file handle used by word2phrase::WriteLog() method.

Input:

None

Output:

$fileHandle -> Returns file handle blob used by 'WriteLog()' function or undefined.

Example:

<This should not be called.>

W2PGetTrainFilePath

Description:

Returns (string) training file path.

Input:

None

Output:

$string -> word2phrase training file path

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $filePath = $interface->W2PGetTrainFilePath();

print( "Train File Path: $filePath\n" ) if defined( $filePath );
undef( $interface );

W2PGetOutputFilePath

Description:

Returns (string) output file path.

Input:

None

Output:

$string -> word2phrase output file path

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $filePath = $interface->W2PGetOutputFilePath();

print( "Output File Path: $filePath\n" ) if defined( $filePath );
undef( $interface );

W2PGetMinCount

Description:

Returns (integer) minimum bi-gram range.

Input:

None

Output:

$value ->  Minimum bi-gram frequency (Positive Integer)

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $mincount = $interface->W2PGetMinCount();

print( "MinCount: $mincount\n" ) if defined( $mincount );
undef( $interface );

W2PGetThreshold

Description:

Returns (integer) maximum bi-gram range.

Input:

None

Output:

$value ->  Maximum bi-gram frequency (Positive Integer)

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $threshold = $interface->W2PGetThreshold();

print( "Threshold: $threshold\n" ) if defined( $threshold );
undef( $interface );

W2PGetW2PDebug

Description:

Returns word2phrase debug parameter value.

Input:

None

Output:

$value -> 0 = No debugging, 1 = Show debugging, 2 = Show even more debugging

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $interfacedebug = $interface->W2PGetW2PDebug();

print( "Word2Phrase Debug Level: $interfacedebug\n" ) if defined( $interfacedebug );

undef( $interface );

W2PGetWorkingDir

Description:

Returns (string) working directory path.

Input:

None

Output:

$string -> Current working directory path

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $workingDir = $interface->W2PGetWorkingDir();

print( "Working Directory: $workingDir\n" ) if defined( $workingDir );

undef( $interface );

W2PGetWord2PhraseExeDir

Description:

Returns (string) word2phrase executable directory path.

Input:

None

Output:

$string -> Word2Phrase executable directory path

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $exeDir = $interface->W2PGetWord2PhraseExeDir();

print( "Word2Phrase Executable Directory: $exeDir\n" ) if defined( $exeDir );

undef( $interface );

W2PGetOverwriteOldFile

Description:

Returns the current value of the overwrite training file variable.

Input:

None

Output:

$value -> 1 = True/Overwrite or 0 = False/Append to current file

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $overwrite = $interface->W2PGetOverwriteOldFile();

if ( defined( $overwrite ) )
{
   print( "Overwrite Old File: " );
   print( "Yes\n" ) if $overwrite == 1;
   print( "No\n" ) if $overwrite == 0;
}

undef( $interface );

Word2Phrase Mutator Functions

W2PSetTrainFilePath

Description:

Sets training file path.

Input:

$string -> Training file path

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetTrainFilePath( "filePath" );

undef( $interface );

W2PSetOutputFilePath

Description:

Sets word2phrase output file path.

Input:

$string -> word2phrase output file path

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetOutputFilePath( "filePath" );

undef( $interface );

W2PSetMinCount

Description:

Sets minimum range value.

Input:

$value -> Minimum frequency value (Positive integer)

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetMinCount( 1 );

undef( $interface );

W2PSetThreshold

Description:

Sets maximum range value.

Input:

$value -> Maximum frequency value (Positive integer)

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetThreshold( 100 );

undef( $interface );

W2PSetW2PDebug

Description:

Sets word2phrase debug parameter.

Input:

$value -> word2phrase debug parameter (0 = No debug info, 1 = Show debug info, 2 = Show more debug info.)

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetW2PDebug( 2 );

undef( $interface );

W2PSetWorkingDir

Description:

Sets working directory path.

Input:

$string -> Current working directory path.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetWorkingDir( "filePath" );

undef( $interface );

W2PSetWord2PhraseExeDir

Description:

Sets word2phrase executable file directory path.

Input:

$string -> Word2Phrase executable directory path.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetWord2PhraseExeDir( "filePath" );

undef( $interface );

W2PSetOverwriteOldFile

Description:

Enables overwriting word2phrase output file if one already exists with the same output file name.

Input:

$value -> Integer: 1 = Overwrite old file, 0 = Do not overwrite old file.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->W2PSetOverwriteOldFile( 1 );

undef( $interface );

XMLToW2V Main Functions

XTWConvertMedlineXMLToW2V

Description:

Parses the specified Medline XML file or directory of files, creating a text corpus. Returns 0 if successful or -1 on error.

Note: Supports plain Medline XML or gzipped XML files.

Input:

$filePath -> XML file path to parse. (This can be a single file or directory of XML/XML.gz files).

Output:

$value    -> '0' = Successful / '-1' = Un-Successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();   # Note: Specifying no parameters implies default settings
$interface->XTWSetSavePath( "testCorpus.txt" );
$interface->XTWSetStoreTitle( 1 );
$interface->XTWSetStoreAbstract( 1 );
$interface->XTWSetBeginDate( "01/01/2004" );
$interface->XTWSetEndDate( "08/13/2016" );
$interface->XTWSetOverwriteExistingFile( 1 );
$interface->XTWConvertMedlineXMLToW2V( "/xmlDirectory/" );
undef( $interface );

XTWCreateCompoundWordBST

Description:

Creates a binary search tree using compound word data in memory and stores root node. This also clears the compound word array afterwards.

Warning: Compound word file must be loaded into memory using XTWReadCompoundWordDataFromFile() prior to calling this method. This function
         will also delete the compound word array upon completion as it will no longer be necessary.

Input:

None

Output:

$value -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWReadCompoundWordDataFromFile( "samples/compoundword.txt" );
$interface->XTWCreateCompoundWordBST();

XTWCompoundifyString

Description:

Compoundifies string parameter based on compound word data in memory using the compound word binary search tree.

Warning: Compound word file must be loaded into memory using XTWReadCompoundWordDataFromFile() prior to calling this method.

Input:

$string -> String to compoundify

Output:

$string -> Compounded string or "(null)" if string parameter is not defined.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWReadCompoundWordDataFromFile( "samples/compoundword.txt" );
$interface->XTWCreateCompoundWordBST();
my $compoundedString = $interface->XTWCompoundifyString( "String to compoundify" );
print( "Compounded String: $compoundedString\n" );

undef( $interface );
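
Note: The following is a simplified, illustrative sketch of the compoundify idea and not the module's implementation. It swaps the binary search tree for a plain hash lookup and greedily matches the longest known phrase at each position. The helper name compoundify, the underscore-joined output format and the lowercase normalization are assumptions made for this example. The default maximum length of 20 follows the default documented for XTWGetMaxCompoundWordLength.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Interface): joins known
# multi-word phrases into single underscore-separated tokens.
sub compoundify {
    my ( $text, $phrases, $maxLength ) = @_;
    return "(null)" if !defined( $text );
    $maxLength = 20 if !defined( $maxLength );

    # Hash lookup stands in for the compound word binary search tree
    my %known;
    $known{ lc( $_ ) } = 1 for @{ $phrases };

    my @words = split( ' ', lc( $text ) );
    my @out;
    my $i = 0;

    while ( $i < scalar( @words ) ) {
        my $matched   = 0;
        my $remaining = scalar( @words ) - $i;
        my $limit     = $maxLength < $remaining ? $maxLength : $remaining;

        # Try the longest candidate phrase first, then shrink
        for ( my $len = $limit; $len >= 2; $len-- ) {
            my @slice = @words[ $i .. $i + $len - 1 ];
            if ( $known{ join( ' ', @slice ) } ) {
                push( @out, join( '_', @slice ) );
                $i += $len;
                $matched = 1;
                last;
            }
        }

        # No phrase starts here, so keep the single word as-is
        if ( !$matched ) {
            push( @out, $words[ $i ] );
            $i++;
        }
    }

    return join( ' ', @out );
}

my @phrases = ( "heart attack", "myocardial infarction" );
print( compoundify( "Heart attack is also known as myocardial infarction", \@phrases ) . "\n" );
```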

XTWReadCompoundWordDataFromFile

Description:

Reads the compound word file and stores its contents in memory. The $autoSetMaxCompWordLength parameter is optional. When defined,
it instructs the method to automatically set the maximum compound word length to that of the longest compound word found.

Note: $autoSetMaxCompWordLength options: defined = True and Undefined = False.

Input:

$filePath                 -> Compound word file path
$autoSetMaxCompWordLength -> Maximum length of a given compoundified phrase the module's compoundify algorithm will permit.

Note: Calling this method with $autoSetMaxCompWordLength defined will automatically set the maxCompoundWordLength variable to the longest compound phrase.

Output:

$value                    -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWReadCompoundWordDataFromFile( "samples/compoundword.txt", 1 );

undef( $interface );
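
Note: The following is an illustrative sketch of what automatically setting the maximum compound word length could amount to, and not the module's implementation. It measures length as the number of words in a phrase, consistent with the output description of XTWGetMaxCompoundWordLength. The helper name longest_phrase_length and the sample phrases are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Interface): returns the word
# count of the longest phrase in a list of compound word phrases.
sub longest_phrase_length {
    my ( $phrases ) = @_;
    my $max = 0;

    for my $phrase ( @{ $phrases } ) {
        my @words = split( ' ', $phrase );
        $max = scalar( @words ) if scalar( @words ) > $max;
    }

    return $max;
}

my @phrases = ( "heart attack", "acute myocardial infarction", "blood pressure" );
print( "Max Compound Word Length: " . longest_phrase_length( \@phrases ) . "\n" );
```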

XTWSaveCompoundWordListToFile

Description:

Saves compound word data in memory to a specified file location.

Input:

$savePath -> Path to save compound word list to file.

Output:

$value    -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWReadCompoundWordDataFromFile( "samples/compoundword.txt" );
$interface->XTWSaveCompoundWordListToFile( "samples/newcompoundword.txt" );
undef( $interface );

XTWReadTextFromFile

Description:

Reads a plain text file with utf8 encoding into memory. Returns string data if successful and "(null)" if unsuccessful.

Input:

$filePath -> Text file to read into memory

Output:

$string   -> String data if successful or "(null)" if un-successful.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $textData = $interface->XTWReadTextFromFile( "samples/textcorpus.txt" );
print( "Text Data: $textData\n" );
undef( $interface );

XTWSaveTextToFile

Description:

Saves a plain text file with utf8 encoding in a specified location.

Input:

$savePath -> Path to save string data.
$string   -> String to save

Output:

$value    -> '0' = Successful / '-1' = Un-successful

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $result = $interface->XTWSaveTextToFile( "text.txt", "Hello world!" );

print( "File saved\n" ) if $result == 0;
print( "File unable to save\n" ) if $result == -1;

undef( $interface );
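
Note: The following is an illustrative sketch of saving and reading UTF-8 text with core Perl, using the same return conventions documented above ('0' / '-1' for saving, "(null)" for a failed read). It is not the module's implementation. The helper names save_text and read_text are assumptions made for this example.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper: writes $text to $path as UTF-8. Returns 0 or -1.
sub save_text {
    my ( $path, $text ) = @_;
    return -1 if !defined( $path ) || !defined( $text );

    open( my $fh, '>:encoding(UTF-8)', $path ) or return -1;
    print {$fh} $text;
    close( $fh ) or return -1;
    return 0;
}

# Hypothetical helper: reads a whole UTF-8 file. Returns "(null)" on failure.
sub read_text {
    my ( $path ) = @_;
    return "(null)" if !defined( $path );

    open( my $fh, '<:encoding(UTF-8)', $path ) or return "(null)";
    local $/ = undef;               # slurp mode: read the file in one go
    my $text = <$fh>;
    close( $fh );
    return defined( $text ) ? $text : "(null)";
}

# Round trip through a temporary file that is removed on exit
my ( undef, $tmpPath ) = tempfile( UNLINK => 1 );
my $result = save_text( $tmpPath, "Hello world!" );

print( "File saved\n" ) if $result == 0;
print( "Text Data: " . read_text( $tmpPath ) . "\n" );
```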

XTWReadXMLDataFromFile

Description:

Reads an XML file from a specified location. Returns string in memory if successful and "(null)" if unsuccessful.

Input:

$filePath -> Path of the file to read

Output:

$value    -> '0' = Successful / '-1' = Un-successful

Example:

Warning: This is a private function and is called by XML::Twig parsing functions. It should not be called outside of xmltow2v module.

XTWSaveTextCorpusToFile

Description:

Saves text corpus data to specified file path. This method will append to any existing file if $appendToFile parameter
is defined or "overwrite" option is disabled. Enabling "overwrite" option will overwrite any existing files.

Input:

$savePath     -> Path to save the text corpus
$appendToFile -> Specifies whether the module will overwrite any existing data or append to existing text corpus data.

Note: Leaving this variable undefined will fetch the "Overwrite" member variable and set the value to this parameter.

Output:

$value        -> '0' = Successful / '-1' = Un-successful

Example:

Warning: This is a private function and is called by XML::Twig parsing functions. It should not be called outside of xmltow2v module.

XTWIsDateInSpecifiedRange

Description:

Checks to see if $date is within $beginDate and $endDate range. Returns 1 if true and 0 if false.

Note: Date Format: XX/XX/XXXX (Month/Day/Year)

Input:

$date      -> Date to check against minimum and maximum date range. (String)
$beginDate -> Minimum date range (String)
$endDate   -> Maximum date range (String)

Output:

$value     -> '1' = True/Date is within specified range Or '0' = False/Date is not within specified range.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
print( "Is \"01/01/2004\" within the date range: \"02/21/1985\" to \"08/13/2016\"?\n" );
print( "Yes\n" ) if $interface->XTWIsDateInSpecifiedRange( "01/01/2004", "02/21/1985", "08/13/2016" ) == 1;
print( "No\n" ) if $interface->XTWIsDateInSpecifiedRange( "01/01/2004", "02/21/1985", "08/13/2016" ) == 0;

undef( $interface );
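
Note: The following is an illustrative sketch of a range check on dates in the Month/Day/Year format described above, and not the module's implementation. It converts each date to a sortable YYYYMMDD key and compares the keys as strings. The helper names date_key and is_date_in_range, the inclusive treatment of both endpoints, and returning 0 for malformed dates are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: converts "MM/DD/YYYY" into a sortable "YYYYMMDD" key.
# Returns undef if the string does not match the expected format.
sub date_key {
    my ( $date ) = @_;
    return undef if !defined( $date );
    return undef if $date !~ m|^(\d{2})/(\d{2})/(\d{4})$|;
    return "$3$1$2";
}

# Hypothetical helper: returns 1 if $date lies within [$beginDate, $endDate],
# otherwise 0. Both endpoints are treated as inclusive (assumption).
sub is_date_in_range {
    my ( $date, $beginDate, $endDate ) = @_;

    my $dateKey  = date_key( $date );
    my $beginKey = date_key( $beginDate );
    my $endKey   = date_key( $endDate );
    return 0 if !defined( $dateKey ) || !defined( $beginKey ) || !defined( $endKey );

    # Fixed-width keys allow plain string comparison
    return ( $dateKey ge $beginKey && $dateKey le $endKey ) ? 1 : 0;
}

print( "In range\n" )     if is_date_in_range( "01/01/2004", "02/21/1985", "08/13/2016" ) == 1;
print( "Not in range\n" ) if is_date_in_range( "01/01/1980", "02/21/1985", "08/13/2016" ) == 0;
```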

XTWIsFileOrDirectory

Description:

Checks to see if specified path is a file or directory.

Input:

$path   -> File or directory path. (String)

Output:

$string -> Returns: "file" = file, "dir" = directory and "unknown" if the path is not a file or directory (undefined).

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $path = "path/to/a/directory";

print( "Is \"$path\" a file or directory? " . $interface->XTWIsFileOrDirectory( $path ) . "\n" );

$path = "path/to/a/file.file";

print( "Is \"$path\" a file or directory? " . $interface->XTWIsFileOrDirectory( $path ) . "\n" );

undef( $interface );
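
Note: The following is an illustrative sketch of classifying a path with Perl's built-in file test operators, returning the same strings documented above. It is not the module's implementation. The helper name classify_path is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "file", "dir" or "unknown" for a given path.
sub classify_path {
    my ( $path ) = @_;
    return "unknown" if !defined( $path );
    return "file"    if -f $path;   # plain file
    return "dir"     if -d $path;   # directory
    return "unknown";
}

print( "Current directory is a: " . classify_path( "." ) . "\n" );
```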

XTWRemoveSpecialCharactersFromString

Description:

Removes special characters from string parameter, removes extra spaces and converts text to lowercase.

Note: This method is called when parsing and compiling Medline title/abstract data.

Input:

$string -> String passed to remove special characters from and convert to lowercase.

Output:

$string -> String with all special characters removed and converted to lowercase.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();

my $str = "Heart Attack is$ an!@ also KNOWN as an Acute MYOCARDIAL inFARCTion!";

print( "Original String: $str\n" );

$str = $interface->XTWRemoveSpecialCharactersFromString( $str );

print( "Modified String: $str\n" );

undef( $interface );
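
Note: The following is an illustrative sketch of the normalization steps described above (lowercasing, removing special characters, collapsing extra spaces), and not the module's implementation. The exact set of characters the module removes is not stated here, so keeping only ASCII letters, digits and whitespace is an assumption, as is the helper name normalize_text.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercases text, strips special characters and
# collapses runs of whitespace into single spaces.
sub normalize_text {
    my ( $text ) = @_;
    return "" if !defined( $text );

    $text = lc( $text );
    $text =~ s/[^a-z0-9\s]//g;    # assumption: keep ASCII letters, digits, whitespace
    $text =~ s/\s+/ /g;           # collapse extra spaces
    $text =~ s/^\s+|\s+$//g;      # trim leading and trailing whitespace
    return $text;
}

# Single quotes keep '$' and '@' literal in the sample string
my $str = 'Heart Attack is$ an!@ also KNOWN as an Acute MYOCARDIAL inFARCTion!';
print( "Modified String: " . normalize_text( $str ) . "\n" );
```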

XTWGetFileType

Description:

Returns file data type (string).

Input:

$filePath -> Path of the file to check

Output:

$string   -> File type

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $fileType = $interface->XTWGetFileType( "samples/textcorpus.txt" );

undef( $interface );

XTWDateCheck

Description:

Checks specified begin and end date strings for formatting and logic errors.

Input:

None

Output:

$value   -> "0" = Passed Checks / "-1" = Failed Checks

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
print "Passed Date Checks\n" if ( $interface->XTWDateCheck() == 0 );
print "Failed Date Checks\n" if ( $interface->XTWDateCheck() == -1 );

undef( $interface );

XMLToW2V Accessor Functions

XTWGetDebugLog

Description:

Returns the _debugLog member variable set during Word2vec::Interface object initialization of new function.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $debugLog = $interface->XTWGetDebugLog();

print( "Debug Logging Enabled\n" ) if $debugLog == 1;
print( "Debug Logging Disabled\n" ) if $debugLog == 0;


undef( $interface );

XTWGetWriteLog

Description:

Returns the _writeLog member variable set during Word2vec::Interface object initialization of new function.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $writeLog = $interface->XTWGetWriteLog();

print( "Write Logging Enabled\n" ) if $writeLog == 1;
print( "Write Logging Disabled\n" ) if $writeLog == 0;

undef( $interface );

XTWGetStoreTitle

Description:

Returns the _storeTitle member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$value -> '1' = True / '0' = False

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $storeTitle = $interface->XTWGetStoreTitle();

print( "Store Title Option: Enabled\n" ) if $storeTitle == 1;
print( "Store Title Option: Disabled\n" ) if $storeTitle == 0;

undef( $interface );

XTWGetStoreAbstract

Description:

Returns the _storeAbstract member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$value -> '1' = True / '0' = False

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $storeAbstract = $interface->XTWGetStoreAbstract();

print( "Store Abstract Option: Enabled\n" ) if $storeAbstract == 1;
print( "Store Abstract Option: Disabled\n" ) if $storeAbstract == 0;

undef( $interface );

XTWGetQuickParse

Description:

Returns the _quickParse member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$value -> '1' = True / '0' = False

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $quickParse = $interface->XTWGetQuickParse();

print( "Quick Parse Option: Enabled\n" ) if $quickParse == 1;
print( "Quick Parse Option: Disabled\n" ) if $quickParse == 0;

undef( $interface );

XTWGetCompoundifyText

Description:

Returns the _compoundifyText member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$value -> '1' = True / '0' = False

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $compoundify = $interface->XTWGetCompoundifyText();

print( "Compoundify Text Option: Enabled\n" ) if $compoundify == 1;
print( "Compoundify Text Option: Disabled\n" ) if $compoundify == 0;

undef( $interface );

XTWGetStoreAsSentencePerLine

Description:

Returns the _storeAsSentencePerLine member variable set during Word2vec::Xmltow2v object instantiation of new function.

Input:

None

Output:

$value -> '1' = True / '0' = False

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $storeAsSentencePerLine = $interface->XTWGetStoreAsSentencePerLine();

print( "Store As Sentence Per Line: Enabled\n" )  if $storeAsSentencePerLine == 1;
print( "Store As Sentence Per Line: Disabled\n" ) if $storeAsSentencePerLine == 0;

undef( $interface );

XTWGetNumOfThreads

Description:

Returns the _numOfThreads member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$value -> Number of threads

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $numOfThreads = $interface->XTWGetNumOfThreads();

print( "Number of threads: $numOfThreads\n" );

undef( $interface );

XTWGetWorkingDir

Description:

Returns the _workingDir member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$string -> Working directory string

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $workingDirectory = $interface->XTWGetWorkingDir();

print( "Working Directory: $workingDirectory\n" );

undef( $interface );

XTWGetSavePath

Description:

Returns the _saveDir member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$string -> Save directory string

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $savePath = $interface->XTWGetSavePath();

print( "Save Directory: $savePath\n" );

undef( $interface );

XTWGetBeginDate

Description:

Returns the _beginDate member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$date -> Beginning date range - Format: XX/XX/XXXX (Mon/Day/Year)

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $date = $interface->XTWGetBeginDate();

print( "Date: $date\n" );

undef( $interface );

XTWGetEndDate

Description:

Returns the _endDate member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$date -> End date range - Format: XX/XX/XXXX (Mon/Day/Year).

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $date = $interface->XTWGetEndDate();

print( "Date: $date\n" );

undef( $interface );

XTWGetXMLStringToParse

Description:

Returns the XML data (string) to be parsed, stored in the _xmlStringToParse member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$string -> Medline XML data string

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $xmlStr = $interface->XTWGetXMLStringToParse();

print( "XML String: $xmlStr\n" );

undef( $interface );

XTWGetTextCorpusStr

Description:

Returns the _textCorpusStr member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$string -> Text corpus string

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $str = $interface->XTWGetTextCorpusStr();

print( "Text Corpus: $str\n" );

undef( $interface );

XTWGetFileHandle

Description:

Returns the _fileHandle member variable set during Word2vec::Interface object instantiation of new function.

Warning: This is a private function. File handle is used by 'xmltow2v::WriteLog()' method. Do not manipulate this file handle as errors can result.

Input:

None

Output:

$fileHandle -> Returns file handle for WriteLog() method.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $fileHandle = $interface->XTWGetFileHandle();

undef( $interface );

XTWGetTwigHandler

Description:

Returns the XML::Twig handler, stored in the _twigHandler member variable set during Word2vec::Interface object instantiation of new function.

Warning: This is a private function and should not be called or manipulated.

Input:

None

Output:

$twigHandler -> XML::Twig handler.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $xmlHandler = $interface->XTWGetTwigHandler();

undef( $interface );

XTWGetParsedCount

Description:

Returns the _parsedCount member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$value -> Number of parsed Medline articles.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $numOfParsed = $interface->XTWGetParsedCount();

print( "Number of parsed Medline articles: $numOfParsed\n" );

undef( $interface );

XTWGetTempStr

Description:

Returns the _tempStr member variable set during Word2vec::Interface object instantiation of new function.

Warning: This is a private function and should not be called or manipulated. Used by module as a temporary storage
         location for parsed Medline 'Title' and 'Abstract' flag string data.

Input:

None

Output:

$string -> Temporary string storage location.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $tempStr = $interface->XTWGetTempStr();

print( "Temp String: $tempStr\n" );

undef( $interface );

XTWGetTempDate

Description:

Returns the _tempDate member variable set during Word2vec::Interface object instantiation of new function.
Used by module as a temporary storage location for parsed Medline 'DateCreated' flag string data.

Input:

None

Output:

$date -> Date string - Format: XX/XX/XXXX (Mon/Day/Year).

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $date = $interface->XTWGetTempDate();

print( "Temp Date: $date\n" );

undef( $interface );

XTWGetCompoundWordAry

Description:

Returns the _compoundWordAry member array reference set during Word2vec::Interface object instantiation of new function.

Warning: Compound word data must be loaded in memory first via XTWReadCompoundWordDataFromFile().

Input:

None

Output:

$arrayReference -> Compound word array reference.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $arrayReference = $interface->XTWGetCompoundWordAry();
my @compoundWord = @{ $arrayReference };

print( "Compound Word Array: @compoundWord\n" );

undef( $interface );

XTWGetCompoundWordBST

Description:

Returns the _compoundWordBST member variable set during Word2vec::Interface object instantiation of new function.

Input:

None

Output:

$bst -> Compound word binary search tree.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $bst = $interface->XTWGetCompoundWordBST();

undef( $interface );

XTWGetMaxCompoundWordLength

Description:

Returns the _maxCompoundWordLength member variable set during Word2vec::Interface object instantiation of new function.

Note: If not defined, it is automatically set to and returns 20.

Input:

None

Output:

$value -> Maximum number of compound words in a given phrase.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $compoundWordLength = $interface->XTWGetMaxCompoundWordLength();

print( "Maximum Compound Word Length: $compoundWordLength\n" );

undef( $interface );

XTWGetOverwriteExistingFile

Description:

Returns the _overwriteExisitingFile member variable set during Word2vec::Interface object instantiation of new function.
Enables overwriting of existing text corpus if set to '1' or appends to the existing text corpus if set to '0'.

Input:

None

Output:

$value -> '1' = Overwrite existing file / '0' = Append to existing file.

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
my $overwriteExistingFile = $interface->XTWGetOverwriteExistingFile();

print( "Overwrite Existing File? YES\n" ) if ( $overwriteExistingFile == 1 );
print( "Overwrite Existing File? NO\n" ) if ( $overwriteExistingFile == 0 );

undef( $interface );
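
The overwrite/append behavior described above can be illustrated with plain Perl file modes. This is a minimal sketch and not the module's own code: WriteCorpus and ReadCorpus are hypothetical helper names, and the mapping of '1' to ">" and '0' to ">>" is an assumption based on the description.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper: writes $text to $path, overwriting the file when
# $overwrite is 1 and appending to it when $overwrite is 0.
sub WriteCorpus
{
    my ( $path, $text, $overwrite ) = @_;
    my $mode = ( $overwrite == 1 ) ? ">" : ">>";
    open( my $fh, $mode, $path ) or return -1;
    print {$fh} $text;
    close( $fh );
    return 0;
}

# Hypothetical helper: returns the whole file as a single string.
sub ReadCorpus
{
    my ( $path ) = @_;
    open( my $fh, "<", $path ) or return undef;
    local $/ = undef;
    my $data = <$fh>;
    close( $fh );
    return $data;
}

# Use a temporary file so the sketch does not touch a real corpus
my ( $tmpFh, $tmpPath ) = tempfile( UNLINK => 1 );
close( $tmpFh );

WriteCorpus( $tmpPath, "first\n", 1 );     # Overwrite
WriteCorpus( $tmpPath, "second\n", 0 );    # Append
my $appended = ReadCorpus( $tmpPath );

WriteCorpus( $tmpPath, "third\n", 1 );     # Overwrite again
my $overwritten = ReadCorpus( $tmpPath );

print( "After append:\n$appended" );
print( "After overwrite:\n$overwritten" );
```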

XMLToW2V Mutator Functions

XTWSetStoreTitle

Description:

Sets member variable to passed integer parameter. Instructs module to store article title if true or omit if false.

Input:

$value -> '1' = Store Titles / '0' = Omit Titles

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetStoreTitle( 1 );

undef( $interface );

XTWSetStoreAbstract

Description:

Sets member variable to passed integer parameter. Instructs module to store article abstracts if true or omit if false.

Input:

$value -> '1' = Store Abstracts / '0' = Omit Abstracts

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetStoreAbstract( 1 );

undef( $interface );

XTWSetWorkingDir

Description:

Sets member variable to passed string parameter. Represents the working directory.

Input:

$string -> Working directory string

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetWorkingDir( "/samples/" );

undef( $interface );

XTWSetSavePath

Description:

Sets member variable to passed string parameter. Represents the text corpus save path.

Input:

$string -> Text corpus save path

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetSavePath( "samples/textcorpus.txt" );

undef( $interface );

XTWSetQuickParse

Description:

Sets member variable to passed integer parameter. Instructs module to utilize quick parse
routines to speed up text corpus compilation. This method is somewhat less accurate due to its non-exhaustive nature.

Input:

$value -> '1' = Enable Quick Parse / '0' = Disable Quick Parse

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetQuickParse( 1 );

undef( $interface );

XTWSetCompoundifyText

Description:

Sets member variable to passed integer parameter. Instructs module to utilize 'compoundify' option if true.

Warning: This requires compound word data to be loaded into memory with XTWReadCompoundWordDataFromFile() method prior
         to executing text corpus compilation.

Input:

$value -> '1' = Compoundify text / '0' = Do not compoundify text

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetCompoundifyText( 1 );

undef( $interface );

XTWSetStoreAsSentencePerLine

Description:

Sets member variable to passed integer parameter. Instructs module to utilize 'storeAsSentencePerLine' option if true.

Input:

$value -> '1' = Store as sentence per line / '0' = Do not store as sentence per line

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetStoreAsSentencePerLine( 1 );

undef( $interface );

XTWSetNumOfThreads

Description:

Sets member variable to passed integer parameter. Sets the requested number of threads to parse Medline XML files
and compile the text corpus.

Input:

$value -> Integer (Positive value)

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetNumOfThreads( 4 );

undef( $interface );

XTWSetBeginDate

Description:

Sets member variable to passed string parameter. Sets beginning date range for earliest articles to store, by
'DateCreated' Medline tag, within the text corpus during compilation.

Note: Expected format - "XX/XX/XXXX" (Mon/Day/Year)

Input:

$string -> Date string - Format: "XX/XX/XXXX"

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetBeginDate( "01/01/2004" );

undef( $interface );

XTWSetEndDate

Description:

Sets member variable to passed string parameter. Sets ending date range for latest article to store, by
'DateCreated' Medline tag, within the text corpus during compilation.

Note: Expected format - "XX/XX/XXXX" (Mon/Day/Year)

Input:

$string -> Date string - Format: "XX/XX/XXXX"

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetEndDate( "08/13/2016" );

undef( $interface );
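
The begin/end date options imply a range check on each article's date. The following is a minimal sketch of such a check for the "XX/XX/XXXX" (Mon/Day/Year) format, not the module's own code. DateToSortKey and IsDateInRange are hypothetical helper names, and treating both range endpoints as inclusive is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: converts "MM/DD/YYYY" into a sortable integer YYYYMMDD.
# Returns undef when the string does not match the expected format.
sub DateToSortKey
{
    my ( $date ) = @_;
    return undef if !defined( $date );
    return undef if $date !~ m{^(\d{2})/(\d{2})/(\d{4})$};
    my ( $month, $day, $year ) = ( $1, $2, $3 );
    return undef if $month < 1 || $month > 12 || $day < 1 || $day > 31;
    return $year * 10000 + $month * 100 + $day;
}

# Hypothetical helper: returns 1 when $date lies within [$begin, $end]
# (inclusive, by assumption) and 0 otherwise or when any date is malformed.
sub IsDateInRange
{
    my ( $date, $begin, $end ) = @_;
    my ( $dateKey, $beginKey, $endKey ) = map { DateToSortKey( $_ ) } ( $date, $begin, $end );
    return 0 if grep { !defined( $_ ) } ( $dateKey, $beginKey, $endKey );
    return ( $dateKey >= $beginKey && $dateKey <= $endKey ) ? 1 : 0;
}

my $beginDate = "01/01/2004";
my $endDate   = "08/13/2016";

for my $articleDate ( "06/15/2010", "12/31/2003", "2016-08-13" )
{
    my $inRange = IsDateInRange( $articleDate, $beginDate, $endDate );
    print( "$articleDate -> " . ( $inRange == 1 ? "store" : "skip" ) . "\n" );
}
```

Comparing integer keys avoids string comparison on "MM/DD/YYYY", which would order dates by month before year.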

XTWSetXMLStringToParse

Description:

Sets member variable to passed string parameter. This string normally consists of Medline XML data to be
parsed for text corpus compilation.

Warning: This is a private function and should not be called or manipulated.

Input:

$string -> String

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetXMLStringToParse( "Hello World!" );

undef( $interface );

XTWSetTextCorpusStr

Description:

Sets member variable to passed string parameter. Overwrites any stored text corpus data in memory to the string parameter.

Warning: This is a private function and should not be called or manipulated.

Input:

$string -> String

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetTextCorpusStr( "Hello World!" );

undef( $interface );

XTWAppendStrToTextCorpus

Description:

Appends the passed string parameter to the text corpus string in memory.

Warning: This is a private function and should not be called or manipulated.

Input:

$string -> String

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWAppendStrToTextCorpus( "Hello World!" );

undef( $interface );

XTWClearTextCorpus

Description:

Clears text corpus data in memory.

Warning: This is a private function and should not be called or manipulated.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWClearTextCorpus();

undef( $interface );

XTWSetTempStr

Description:

Sets the temporary member string to the passed string parameter. This string is a temporary placeholder
for Medline title and abstract data.

Note: This removes special characters and converts all characters to lowercase.

Warning: This is a private function and should not be called or manipulated.

Input:

$string -> String

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetTempStr( "Hello World!" );

undef( $interface );
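
The note above says the temporary string has special characters removed and is converted to lowercase. A minimal sketch of that kind of normalization follows; it is not the module's own code. NormalizeText is a hypothetical helper name, and the choice of which characters count as "special" (everything except letters, digits and whitespace) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercases $text and removes every character that is
# not a letter, digit or whitespace (an assumed definition of "special").
sub NormalizeText
{
    my ( $text ) = @_;
    $text = lc( $text );
    $text =~ s/[^a-z0-9\s]//g;     # Remove special characters
    $text =~ s/\s+/ /g;            # Collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;       # Trim leading and trailing whitespace
    return $text;
}

my $normalized = NormalizeText( "Hello World!" );
print( "Normalized: $normalized\n" );
```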

XTWAppendToTempStr

Description:

Appends string parameter to temporary member string in memory.

Note: This removes special characters and converts all characters to lowercase.

Warning: This is a private function and should not be called or manipulated.

Input:

$string -> String

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWAppendToTempStr( "Hello World!" );

undef( $interface );

XTWClearTempStr

Description:

Clears the temporary string storage in memory.

Warning: This is a private function and should not be called or manipulated.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWClearTempStr();

undef( $interface );

XTWSetTempDate

Description:

Sets the temporary date string to the passed string parameter.

Note: Date Format - "XX/XX/XXXX" (Mon/Day/Year)

Warning: This is a private function and should not be called or manipulated.

Input:

$string -> Date string - Format: "XX/XX/XXXX"

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetTempDate( "08/13/2016" );

undef( $interface );

XTWClearTempDate

Description:

Clears the temporary date storage location in memory.

Warning: This is a private function and should not be called or manipulated.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWClearTempDate();

undef( $interface );

XTWSetCompoundWordAry

Description:

Stores the compound word array by de-referencing the passed array reference parameter.

Note: Clears previous data if existing.

Warning: This is a private function and should not be called or manipulated.

Input:

$arrayReference -> Array reference of compound words

Output:

None

Example:

use Word2vec::Interface;

my @compoundWordAry = ( "big dog", "respiratory failure", "seven large masses" );

my $interface = Word2vec::Interface->new();
$interface->XTWSetCompoundWordAry( \@compoundWordAry );

undef( $interface );

XTWClearCompoundWordAry

Description:

Clears compound word array in memory.

Warning: This is a private function and should not be called or manipulated.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWClearCompoundWordAry();

undef( $interface );

XTWSetCompoundWordBST

Description:

Sets the compound word binary search tree to the passed Word2vec::Bst parameter.

Note: Un-defines previous binary tree if existing.

Warning: This is a private function and should not be called or manipulated.

Input:

Word2vec::Bst -> Binary Search Tree

Output:

None

Example:

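use Word2vec::Interface;
use Word2vec::Bst;

# The tree is built from a sorted array
my @compoundWordAry = ( "big dog", "respiratory failure", "seven large masses" );
@compoundWordAry = sort( @compoundWordAry );

my $arySize = @compoundWordAry;

# Instantiate the binary search tree object before building the tree
my $bst = Word2vec::Bst->new();
$bst->CreateTree( \@compoundWordAry, 0, $arySize, undef );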

my $interface = Word2vec::Interface->new();
$interface->XTWSetCompoundWordBST( $bst );

undef( $interface );
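
To show why the array is sorted before the tree is created, here is a minimal, self-contained sketch of building a balanced binary search tree from a sorted array and looking up a phrase in it. It does not use or reproduce Word2vec::Bst: BuildTree and TreeContains are hypothetical helpers, nodes are plain hash references, and the half-open index range (0 to array size) is an assumption that mirrors the call pattern in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a balanced tree from the sorted slice
# $ary->[ $low .. $high - 1 ] and returns the root node (a hash reference).
sub BuildTree
{
    my ( $ary, $low, $high ) = @_;
    return undef if $low >= $high;

    # The middle element becomes the root, which keeps the tree balanced
    my $mid  = int( ( $low + $high ) / 2 );
    my %node = (
        data  => $ary->[ $mid ],
        left  => BuildTree( $ary, $low, $mid ),
        right => BuildTree( $ary, $mid + 1, $high ),
    );
    return \%node;
}

# Hypothetical helper: returns 1 if $word is stored in the tree, otherwise 0.
sub TreeContains
{
    my ( $node, $word ) = @_;
    while ( defined( $node ) )
    {
        my $cmp = $word cmp $node->{ data };
        return 1 if $cmp == 0;
        $node = ( $cmp < 0 ) ? $node->{ left } : $node->{ right };
    }
    return 0;
}

my @compoundWordAry = ( "seven large masses", "big dog", "respiratory failure" );
@compoundWordAry = sort( @compoundWordAry );

my $root = BuildTree( \@compoundWordAry, 0, scalar( @compoundWordAry ) );

print( "Found: respiratory failure\n" ) if ( TreeContains( $root, "respiratory failure" ) == 1 );
print( "Not Found: small cat\n" )       if ( TreeContains( $root, "small cat" ) == 0 );
```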

XTWClearCompoundWordBST

Description:

Clears/Un-defines existing compound word binary search tree from memory.

Warning: This is a private function and should not be called or manipulated.

Input:

None

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWClearCompoundWordBST();

undef( $interface );

XTWSetMaxCompoundWordLength

Description:

Sets member variable to passed integer parameter. Sets the maximum number of words in a phrase that are compared
against the compound word data.

e.g. "medical campus of Virginia Commonwealth University" can be interpreted as a compound word of 6 words.
Setting this variable to 3 will only attempt to compoundify a maximum of three words at a time.
The result would be "medical_campus_of Virginia commonwealth university", even though an exact representation
of the full compounded string can exist. Setting this variable to 6 will result in compounding all six words if
they exist in the compound word array/BST.

Warning: This is a private function and should not be called or manipulated.

Input:

$value -> Integer

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetMaxCompoundWordLength( 8 );

undef( $interface );
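
The effect of the maximum compound word length can be sketched with a greedy longest-match loop. This is an illustration under stated assumptions and not the module's own algorithm: Compoundify is a hypothetical helper, a hash lookup stands in for the compound word array/BST, and the input is assumed to be already lowercased.

```perl
use strict;
use warnings;

# Hypothetical helper: joins known compound words with underscores, testing
# phrases of at most $maxLength words and preferring the longest match.
sub Compoundify
{
    my ( $text, $compoundWords, $maxLength ) = @_;
    my %lookup = map { $_ => 1 } @{ $compoundWords };
    my @words  = split( ' ', $text );
    my @result = ();
    my $index  = 0;

    while ( $index < @words )
    {
        my $remaining = @words - $index;
        my $length    = ( $maxLength < $remaining ) ? $maxLength : $remaining;
        my $matched   = 0;

        # Try the longest candidate phrase first, then shrink the window
        for ( ; $length >= 2; $length-- )
        {
            my @window = @words[ $index .. $index + $length - 1 ];
            next if !exists( $lookup{ join( " ", @window ) } );
            push( @result, join( "_", @window ) );
            $index  += $length;
            $matched = 1;
            last;
        }

        next if ( $matched == 1 );

        # No compound word starts here, so keep the single word as-is
        push( @result, $words[ $index ] );
        $index++;
    }

    return join( " ", @result );
}

my @compoundWords = ( "medical campus of", "medical campus of virginia commonwealth university" );
my $text          = "medical campus of virginia commonwealth university";

print( "Max Length 3: " . Compoundify( $text, \@compoundWords, 3 ) . "\n" );
print( "Max Length 6: " . Compoundify( $text, \@compoundWords, 6 ) . "\n" );
```

With a limit of 3 the six-word entry is never tested, so only the three-word entry is joined; with a limit of 6 the full phrase matches first.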

XTWSetOverwriteExistingFile

Description:

Sets member variable to passed integer parameter. Sets option to overwrite existing text corpus during compilation
if 1 or append to existing text corpus if 0.

Input:

$value -> '1' = Overwrite existing text corpus / '0' = Append to existing text corpus during compilation.

Output:

None

Example:

use Word2vec::Interface;

my $interface = Word2vec::Interface->new();
$interface->XTWSetOverwriteExistingFile( 1 );

undef( $interface );

AUTHOR

Clint Cuffy, Virginia Commonwealth University

COPYRIGHT

Copyright (c) 2016

Bridget T McInnes, Virginia Commonwealth University
btmcinnes at vcu dot edu

Clint Cuffy, Virginia Commonwealth University
cuffyca at vcu dot edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.