NAME

Word2vec::Util - Word2vec-Interface Utility Module.

SYNOPSIS

use Word2vec::Util;

my $util = Word2vec::Util->new();

my $result = $util->IsFileOrDirectory( "../samples/stoplist" );

print( "Path Type Is A File\n" ) if $result eq "file";
print( "Path Type Is A Directory\n" ) if $result eq "dir";
print( "Path Type Is Unknown\n" ) if $result eq "unknown";

undef( $util );

DESCRIPTION

Word2vec::Util is a module of utility functions for the Word2vec::Interface package.

Main Functions

new

Description:

Returns a new "Word2vec::Util" module object.

Note: Specifying no parameters implies default options.

Default Parameters:
   debugLog                    = 0
   writeLog                    = 0

Input:

$debugLog                    -> Instructs module to print debug statements to the console. (1 = True / 0 = False)
$writeLog                    -> Instructs module to print debug statements to a log file. (1 = True / 0 = False)

Output:

Word2vec::Util object.

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();

undef( $util );
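
The default handling described above can be illustrated with a small stand-alone sketch. The package name "UtilSketch" is hypothetical, and passing $debugLog and $writeLog positionally in the order listed under "Input" is an assumption, not a confirmed signature of Word2vec::Util->new.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the Word2vec::Util implementation.
package UtilSketch;

sub new {
    # Assumption: options are positional, in the order listed under "Input".
    my ( $class, $debugLog, $writeLog ) = @_;

    my $self = {
        _debugLog => defined( $debugLog ) ? $debugLog : 0,    # default = 0
        _writeLog => defined( $writeLog ) ? $writeLog : 0,    # default = 0
    };

    return bless( $self, $class );
}

sub GetDebugLog { return $_[0]->{ _debugLog }; }
sub GetWriteLog { return $_[0]->{ _writeLog }; }

package main;

# Only the first option is supplied, so the second falls back to its default.
my $sketch = UtilSketch->new( 1 );

print( join( " ", $sketch->GetDebugLog(), $sketch->GetWriteLog() ), "\n" );    # prints "1 0"
```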

DESTROY

Description:

Removes Word2vec::Util object from memory.

Input:

None

Output:

None

Example:

See above example for "new" function.

Note: DESTROY is also called automatically during global destruction when the program exits.

CleanText

Description:

Normalizes text as follows:
 - Text is converted to lowercase
 - Runs of white space are replaced with a single white space
 - Apostrophe "s" ('s) characters are removed
 - Each hyphen character is replaced with a single white space
 - All characters other than lowercase 'a-z' are removed, except that compoundified terms joined by '_' (underscore) are retained
 - Line-feed/carriage return (LF-CR) endings are cleaned and converted to OS-specific line endings

Input:

$string -> String of text to normalize

Output:

$string -> Cleaned/Normalized text.

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();
my $text = "123485clean-text!!@&^#*@";

print( "Original Text: \"$text\"\n" );

$text = $util->CleanText( $text );

print( "Cleaned Text: \"$text\"\n" );

undef( $util );
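
The normalization rules listed in the description can be approximated with a few regular expressions. This is a minimal sketch, not the module's code: the function name "clean_text_sketch" is hypothetical, the order of the steps is an assumption, and line endings are simply dropped here rather than converted to OS-specific endings.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CleanText; NOT the module's implementation.
sub clean_text_sketch {
    my ( $text ) = @_;
    return "" if !defined( $text );

    $text = lc( $text );           # convert to lowercase
    $text =~ s/'s\b//g;            # remove apostrophe "s"
    $text =~ s/-/ /g;              # hyphen -> single white space
    $text =~ s/[^a-z_\s]//g;       # keep only a-z, underscore and white space
    $text =~ s/\s+/ /g;            # collapse white space (line endings included)
    $text =~ s/^ | $//g;           # trim leading/trailing space (simplification)

    return $text;
}

# Single quotes avoid any interpolation of the '@' characters.
print( clean_text_sketch( '123485clean-text!!@&^#*@' ), "\n" );    # prints "clean text"
```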

RemoveNewLineEndingsFromString

Description:

Removes newline endings from a string. Supports MSWin32, Linux and MacOS line endings.

Input:

$string -> String with line-feed/carriage return ending to remove.

Output:

$string -> String without line-feed/carriage return ending.

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();
my $text = "this is sample text\n";

print( "Original Text: \"$text\"\n" );

$text = $util->RemoveNewLineEndingsFromString( $text );

print( "Cleaned Text: \"$text\"\n" );

undef( $util );
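
For illustration, the three line-ending conventions named in the description (CR-LF, LF and CR) can be handled with one substitution. The function name below is hypothetical, and stripping only trailing line endings (rather than every occurrence in the string) is an assumption about the intended behavior.

```perl
use strict;
use warnings;

# Hypothetical stand-in for RemoveNewLineEndingsFromString; NOT the module's code.
sub remove_newline_endings_sketch {
    my ( $text ) = @_;
    return "" if !defined( $text );

    # Strip any run of CR-LF (Windows), LF (Linux/macOS) or CR (classic Mac)
    # found at the very end of the string.
    $text =~ s/(?:\r\n|\n|\r)+\z//;

    return $text;
}

my $sample = remove_newline_endings_sketch( "this is sample text\r\n" );
print( "Cleaned Text: \"$sample\"\n" );    # prints: Cleaned Text: "this is sample text"
```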

IsFileOrDirectory

Description:

Given a path, returns a string specifying whether this path represents a file or directory.

Input:

$path   -> String representing path to check

Output:

$string -> Returns "file", "dir" or "unknown".

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();

my $result = $util->IsFileOrDirectory( "../samples/stoplist" );

print( "Path Type Is A File\n" ) if $result eq "file";
print( "Path Type Is A Directory\n" ) if $result eq "dir";
print( "Path Type Is Unknown\n" ) if $result eq "unknown";

undef( $util );
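
The same three-way classification can be written with Perl's built-in file test operators. This is a sketch under assumptions: the function name is hypothetical, and whether the module itself relies on -f and -d is not confirmed by this document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for IsFileOrDirectory; NOT the module's code.
sub path_type_sketch {
    my ( $path ) = @_;

    return "unknown" if !defined( $path );
    return "file"    if -f $path;    # plain file
    return "dir"     if -d $path;    # directory
    return "unknown";                # missing path or another file type
}

print( path_type_sketch( "." ), "\n" );    # prints "dir" (current directory)
```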

IsWordOrCUITerm

Description:

Checks whether the passed string argument is a word or a CUI term.

Input:

$string   -> Word or CUI string term

Output:

$string -> Returns "cui", "word" or undef

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();

my $result = $util->IsWordOrCUITerm( "Cookie" );

print( "Passed String Argument Term Type: \"$result\"\n" ) if defined( $result );
print( "Cannot Determine String Argument Term Type\n" )    if !defined( $result );

$result = $util->IsWordOrCUITerm( "C08132016" );

print( "Passed String Argument Term Type: \"$result\"\n" ) if defined( $result );
print( "Cannot Determine String Argument Term Type\n" )    if !defined( $result );

undef( $util );
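
A rough sketch of this kind of check follows. The rule used here, "the letter C followed only by digits", is an assumption inferred from the "C08132016" example above; the module's actual matching rule may differ, and the function name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for IsWordOrCUITerm; NOT the module's code.
sub term_type_sketch {
    my ( $term ) = @_;

    return undef if !defined( $term ) || $term eq "";

    # Assumption: a CUI is "C" (either case) followed only by digits.
    return "cui" if $term =~ /^C\d+\z/i;

    return "word";
}

for my $term ( "Cookie", "C08132016" ) {
    my $type = term_type_sketch( $term );
    print( "$term -> $type\n" );
}
# prints:
#   Cookie -> word
#   C08132016 -> cui
```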

GetFilesInDirectory

Description:

Given a path and a file tag string, returns a string of the names of files in the specified path whose names contain the file tag string.

Input:

$path    -> String representing path
$fileTag -> String containing the file tag to search for in file names.

Output:

$string  -> Returns string of file names containing $fileTag.

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();

# Looks in specified path for files including ".sval" in their file name.
my $result = $util->GetFilesInDirectory( "../samples/", ".sval" );

print( "Found File Name(s): $result\n" ) if defined( $result );

undef( $util );
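
A directory scan of this kind can be sketched with opendir and readdir. Several details below are assumptions rather than documented behavior: the function name is hypothetical, names are matched by plain substring, only regular files are considered, and the result is sorted and joined with a single space.

```perl
use strict;
use warnings;

# Hypothetical stand-in for GetFilesInDirectory; NOT the module's code.
sub files_in_directory_sketch {
    my ( $path, $file_tag ) = @_;

    return undef if !defined( $path ) || !defined( $file_tag );

    opendir( my $dh, $path ) or return undef;

    # Keep regular files whose names contain the tag as a substring.
    my @names = grep { -f "$path/$_" && index( $_, $file_tag ) >= 0 } readdir( $dh );
    closedir( $dh );

    @names = sort( @names );

    # Assumption: names are separated by a single space.
    return join( " ", @names );
}

my $found = files_in_directory_sketch( ".", ".sval" );
print( "Found File Name(s): $found\n" ) if defined( $found ) && $found ne "";
```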

GetOSType

Description:

Returns the operating system type as a string.

Input:

None

Output:

$string -> Operating System String

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();

my $result = $util->GetOSType();

print( "Current OS Type: $result\n" ) if defined( $result );

undef( $util );
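
In plain Perl, the operating system name is available through the built-in $^O variable. The sketch below assumes the returned string resembles $^O; whether GetOSType returns exactly this value is not stated in this document, and the function name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for GetOSType; NOT the module's code.
sub os_type_sketch {
    # $^O holds the name of the OS Perl was built for,
    # e.g. "linux", "MSWin32" or "darwin" (macOS).
    return $^O;
}

my $os = os_type_sketch();
print( "Current OS Type: $os\n" );
```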

Accessor Functions

GetDebugLog

Description:

Returns the _debugLog member variable, which is set when the Word2vec::Util object is initialized by the new function.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();
my $debugLog = $util->GetDebugLog();

print( "Debug Logging Enabled\n" ) if $debugLog == 1;
print( "Debug Logging Disabled\n" ) if $debugLog == 0;

undef( $util );

GetWriteLog

Description:

Returns the _writeLog member variable, which is set when the Word2vec::Util object is initialized by the new function.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();
my $writeLog = $util->GetWriteLog();

print( "Write Logging Enabled\n" ) if $writeLog == 1;
print( "Write Logging Disabled\n" ) if $writeLog == 0;

undef( $util );

Debug Functions

WriteLog

Description:

Prints passed string parameter to the console, log file or both depending on user options.

Note: A new line character is printed after the string unless the printNewLine parameter is 0.
Leaving the parameter undefined also prints the new line character.

Input:

$string -> String to print to the console/log file.
$value  -> 0 = Do not print newline character after string, all else prints new line character including 'undef'.

Output:

None

Example:

use Word2vec::Util;

my $util = Word2vec::Util->new();
$util->WriteLog( "Hello World" );

undef( $util );
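
The new line rule described in the note can be made concrete with a small helper that builds the text to be printed. The helper name is hypothetical, and it only models the new line decision; it does not write to a console or log file.

```perl
use strict;
use warnings;

# Hypothetical helper modelling only the new line rule; NOT the module's code.
sub format_log_line_sketch {
    my ( $string, $print_new_line ) = @_;
    $string = "" if !defined( $string );

    # Only an explicit 0 suppresses the new line; undef and all other values print it.
    my $suppress = defined( $print_new_line ) && $print_new_line eq "0";

    return $suppress ? $string : $string . "\n";
}

print( format_log_line_sketch( "Hello ", 0 ) );    # no new line appended
print( format_log_line_sketch( "World" ) );        # new line appended
# prints "Hello World" followed by a new line
```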

AUTHOR

Clint Cuffy, Virginia Commonwealth University

COPYRIGHT

Copyright (c) 2016

Bridget T McInnes, Virginia Commonwealth University
btmcinnes at vcu dot edu

Clint Cuffy, Virginia Commonwealth University
cuffyca at vcu dot edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.