Sponsoring The Perl Toolchain Summit 2025: Help make this important event another success Learn more

NAME

Word2vec::Util - Word2vec-Interface Utility Module.

SYNOPSIS

my $util = Word2vec::Util->new();
my $result = $util->IsFileOrDirectory( "../samples/stoplist" );
print( "Path Type Is A File\n" ) if $result eq "file";
print( "Path Type Is A Directory\n" ) if $result eq "dir";
print( "Path Type Is Unknown\n" ) if $result eq "unknown";
undef( $util );

DESCRIPTION

Word2vec::Util is a module of utility functions for the Word2vec::Interface package.

Main Functions

new

Description:

Returns a new "Word2vec::Util" module object.
Note: Specifying no parameters implies default options.
Default Parameters:
debugLog = 0
writeLog = 0

Input:

$debugLog -> Instructs module to print debug statements to the console. (1 = True / 0 = False)
$writeLog -> Instructs module to print debug statements to a log file. (1 = True / 0 = False)

Output:

Word2vec::Util object.

Example:

my $util = Word2vec::Util->new();
undef( $util );

DESTROY

Description:

Removes Word2vec::Util object from memory.

Input:

None

Output:

None

Example:

See above example for "new" function.
Note: Destroy function is also automatically called during global destruction when exiting the program.

CleanText

Description:

Normalizes text based on the following.
- Text converted to lowercase
- More than one white space is replaced with a single white space
- Apostrophe "s" ('s) characters are removed
- Hyphen character is replaced with a single white space
- All special characters removed outside of lowercase 'a-z' and compoundified terms retained, joined by '_' (underscore).
- Line-feed/carriage return (LF-CR) endings are cleaned and converted to OS specific LF-CR endings

Input:

$string -> String of text to normalize

Output:

$string -> Cleaned/Normalized text.

Example:

my $util = Word2vec::Util->new();
my $text = "123485clean-text!!@&^#*@";
print( "Original Text: \"$text\"\n" );
$text = $util->CleanText( $text );
print( "Cleaned Text: \"$text\"\n" );
undef( $util );

RemoveNewLineEndingsFromString

Description:

Removes new line endings from string. Supports MSWin32, linux and MacOS line endings.

Input:

$string -> String with line-feed/carriage return ending to remove.

Output:

$string -> String without line-feed/carriage return ending.

Example:

my $util = Word2vec::Util->new();
my $text = "this is sample text\n";
print( "Original Text: \"$text\"\n" );
$text = $util->RemoveNewLineEndingsFromString( $text );
print( "Cleaned Text: \"$text\"\n" );
undef( $util );

IsFileOrDirectory

Description:

Given a path, returns a string specifying whether this path represents a file or directory.

Input:

$path -> String representing path to check

Output:

$string -> Returns "file", "dir" or "unknown".

Example:

my $util = Word2vec::Util->new();
my $result = $util->IsFileOrDirectory( "../samples/stoplist" );
print( "Path Type Is A File\n" ) if $result eq "file";
print( "Path Type Is A Directory\n" ) if $result eq "dir";
print( "Path Type Is Unknown\n" ) if $result eq "unknown";
undef( $util );

IsWordOrCUITerm

Description:

Checks whether the passed string argument is word or CUI term.

Input:

$string -> Word or CUI string term

Output:

$string -> Returns "cui", "word" or undef

Example:

my $util = Word2vec::Util->new();
my $result = $util->IsWordOrCUITerm( "Cookie" );
print( "Passed String Argument Term Type: \"$result\"\n" ) if defined( $result );
print( "Cannot Determine String Argument Term Type\n" ) if !defined( $result );
my $result = $util->IsWordOrCUITerm( "C08132016" );
print( "Passed String Argument Term Type: \"$result\"\n" ) if defined( $result );
print( "Cannot Determine String Argument Term Type\n" ) if !defined( $result );
undef( $util );

GetFilesInDirectory

Description:

Given a path and file tag string, returns a string of files consisting of the file tag string in the specified path.

Input:

$path -> String representing path
$fileTag -> String consisting of file tag to fetch.

Output:

$string -> Returns string of file names consisting of $fileTag.

Example:

my $util = Word2vec::Util->new();
# Looks in specified path for files including ".sval" in their file name.
my $result = $util->GetFilesInDirectory( "../samples/", ".sval" );
print( "Found File Name(s): $result\n" ) if defined( $result );
undef( $util );

GetOSType

Description:

Returns (string) operating system type.

Input:

None

Output:

$string -> Operating System String

Example:

my $util = Word2vec::Util->new();
my $result = $util->GetOSType();
print( "Current OS Type: $result\n" ) if defined( $result );
undef( $util );

Accessor Functions

GetDebugLog

Description:

Returns the _debugLog member variable set during Word2vec::Util object initialization of new function.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

my $util = Word2vec::Util->new()
my $debugLog = $util->GetDebugLog();
print( "Debug Logging Enabled\n" ) if $debugLog == 1;
print( "Debug Logging Disabled\n" ) if $debugLog == 0;
undef( $util );

GetWriteLog

Description:

Returns the _writeLog member variable set during Word2vec::Util object initialization of new function.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

my $util = Word2vec::Util->new();
my $writeLog = $util->GetWriteLog();
print( "Write Logging Enabled\n" ) if $writeLog == 1;
print( "Write Logging Disabled\n" ) if $writeLog == 0;
undef( $util );

Debug Functions

WriteLog

Description:

Prints passed string parameter to the console, log file or both depending on user options.
Note: printNewLine parameter prints a new line character following the string if the parameter
is undefined and does not if parameter is 0.

Input:

$string -> String to print to the console/log file.
$value -> 0 = Do not print newline character after string, all else prints new line character including 'undef'.

Output:

None

Example:

my $util = Word2vec::Util->new();
$util->WriteLog( "Hello World" );
undef( $util );

Author

Clint Cuffy, Virginia Commonwealth University

COPYRIGHT

Copyright (c) 2016

Bridget T McInnes, Virginia Commonwealth University
btmcinnes at vcu dot edu
Clint Cuffy, Virginia Commonwealth University
cuffyca at vcu dot edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.