NAME
Word2vec::Util - Word2vec-Interface Utility Module.
SYNOPSIS
    use Word2vec::Util;

    my $util   = Word2vec::Util->new();
    my $result = $util->IsFileOrDirectory( "../samples/stoplist" );

    print( "Path Type Is A File\n" )      if $result eq "file";
    print( "Path Type Is A Directory\n" ) if $result eq "dir";
    print( "Path Type Is Unknown\n" )     if $result eq "unknown";

    undef( $util );
DESCRIPTION
Word2vec::Util is a module of utility functions for the Word2vec::Interface package.
Main Functions
new
Description:
Returns a new "Word2vec::Util" module object.
Note: Specifying no parameters implies default options.
Default Parameters:
debugLog = 0
writeLog = 0
Input:
$debugLog -> Instructs module to print debug statements to the console. (1 = True / 0 = False)
$writeLog -> Instructs module to print debug statements to a log file. (1 = True / 0 = False)
Output:
Word2vec::Util object.
Example:
    use Word2vec::Util;

    my $util = Word2vec::Util->new();

    undef( $util );
DESTROY
Description:
Removes Word2vec::Util object from memory.
Input:
None
Output:
None
Example:
See the example above for the "new" function.
Note: The DESTROY function is also called automatically during global destruction when exiting the program.
CleanText
Description:
Normalizes text as follows:
- Text is converted to lowercase.
- More than one white space is replaced with a single white space.
- Apostrophe "s" ('s) characters are removed.
- Hyphen characters are replaced with a single white space.
- All special characters outside of lowercase 'a-z' are removed; compoundified terms, joined by '_' (underscore), are retained.
- Line-feed/carriage return (LF-CR) endings are cleaned and converted to OS-specific LF-CR endings.
Input:
$string -> String of text to normalize
Output:
$string -> Cleaned/Normalized text.
Example:
    use Word2vec::Util;

    my $util = Word2vec::Util->new();
    my $text = "123485clean-text!!@&^#*@";

    print( "Original Text: \"$text\"\n" );
    $text = $util->CleanText( $text );
    print( "Cleaned Text: \"$text\"\n" );

    undef( $util );
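The normalization steps listed above can be approximated in plain Perl. The following is only a sketch and not the module's own implementation: the helper name clean_text_sketch, the order of the steps, and the choice to delete (rather than replace) special characters are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented normalization steps.
# This is NOT the code used by Word2vec::Util.
sub clean_text_sketch {
    my ($text) = @_;
    return unless defined $text;

    $text =~ s/\r\n?/\n/g;       # normalize CR-LF and bare CR endings to "\n"
    $text = lc $text;            # convert to lowercase
    $text =~ s/'s//g;            # remove apostrophe "s"
    $text =~ s/-/ /g;            # replace hyphen with a single space
    $text =~ tr/a-z_ \t\n//cd;   # delete everything except a-z, '_' and white space
    $text =~ s/[ \t]+/ /g;       # collapse runs of spaces and tabs
    return $text;
}

print clean_text_sketch('123485clean-text!!@&^#*@'), "\n";   # prints "clean text"
```

The underscore is kept so that compoundified terms such as "heart_attack" survive the cleanup intact.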
RemoveNewLineEndingsFromString
Description:
Removes new line endings from string. Supports MSWin32, linux and MacOS line endings.
Input:
$string -> String with line-feed/carriage return ending to remove.
Output:
$string -> String without line-feed/carriage return ending.
Example:
    use Word2vec::Util;

    my $util = Word2vec::Util->new();
    my $text = "this is sample text\n";

    print( "Original Text: \"$text\"\n" );
    $text = $util->RemoveNewLineEndingsFromString( $text );
    print( "Cleaned Text: \"$text\"\n" );

    undef( $util );
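For readers curious how the three line-ending styles can be handled uniformly, here is a minimal sketch. It assumes that only a trailing ending is removed; the helper name strip_line_ending_sketch is hypothetical and the module may behave differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: strips a trailing run of
# CR and LF characters, covering "\r\n", "\n" and "\r" endings.
sub strip_line_ending_sketch {
    my ($text) = @_;
    return unless defined $text;
    $text =~ s/[\r\n]+\z//;   # remove line-ending characters at the end only
    return $text;
}

my $clean = strip_line_ending_sketch("this is sample text\r\n");
print "Cleaned Text: \"$clean\"\n";   # Cleaned Text: "this is sample text"
```

A regular expression is used instead of chomp because chomp only removes the current value of $/, which by default is "\n".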
IsFileOrDirectory
Description:
Given a path, returns a string specifying whether this path represents a file or directory.
Input:
$path -> String representing path to check
Output:
$string -> Returns "file", "dir" or "unknown".
Example:
    use Word2vec::Util;

    my $util   = Word2vec::Util->new();
    my $result = $util->IsFileOrDirectory( "../samples/stoplist" );

    print( "Path Type Is A File\n" )      if $result eq "file";
    print( "Path Type Is A Directory\n" ) if $result eq "dir";
    print( "Path Type Is Unknown\n" )     if $result eq "unknown";

    undef( $util );
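The same three-way classification can be sketched with Perl's built-in file test operators. This is an illustration under the assumption that "file" means a plain file (-f) and "dir" means a directory (-d); path_type_sketch is a hypothetical name, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper built on the -d and -f file tests.
sub path_type_sketch {
    my ($path) = @_;
    return "unknown" unless defined $path;
    return "dir"  if -d $path;   # path exists and is a directory
    return "file" if -f $path;   # path exists and is a plain file
    return "unknown";            # missing path or some other file type
}

print path_type_sketch("."), "\n";   # the current directory, so prints "dir"
```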
IsWordOrCUITerm
Description:
Checks whether the passed string argument is a word or a CUI term.
Input:
$string -> Word or CUI string term
Output:
$string -> Returns "cui", "word" or undef
Example:
    use Word2vec::Util;

    my $util   = Word2vec::Util->new();
    my $result = $util->IsWordOrCUITerm( "Cookie" );

    print( "Passed String Argument Term Type: \"$result\"\n" ) if defined( $result );
    print( "Cannot Determine String Argument Term Type\n" )    if !defined( $result );

    # Reuse $result rather than redeclaring it with "my".
    $result = $util->IsWordOrCUITerm( "C08132016" );

    print( "Passed String Argument Term Type: \"$result\"\n" ) if defined( $result );
    print( "Cannot Determine String Argument Term Type\n" )    if !defined( $result );

    undef( $util );
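The documentation does not state how a CUI is recognized. As a rough sketch, the check below assumes a CUI is the letter "C" followed only by digits, which is consistent with the "C08132016" example above; the pattern and the name term_type_sketch are assumptions, and the module's actual rule may differ.

```perl
use strict;
use warnings;

# Hypothetical classifier: "cui" for C + digits, "word" otherwise,
# undef when the argument is missing or empty.
sub term_type_sketch {
    my ($term) = @_;
    return unless defined $term && length $term;
    return "cui" if $term =~ /\AC\d+\z/i;   # assumed CUI shape
    return "word";
}

for my $term ("Cookie", "C08132016") {
    my $type = term_type_sketch($term);
    print "$term -> $type\n";
}
```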
GetFilesInDirectory
Description:
Given a path and a file tag string, returns a string of the names of files in the specified path whose names include the file tag string.
Input:
$path -> String representing path
$fileTag -> String consisting of file tag to fetch.
Output:
$string -> Returns string of file names that include $fileTag.
Example:
    use Word2vec::Util;

    my $util = Word2vec::Util->new();

    # Looks in specified path for files including ".sval" in their file name.
    my $result = $util->GetFilesInDirectory( "../samples/", ".sval" );

    print( "Found File Name(s): $result\n" ) if defined( $result );

    undef( $util );
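A directory scan of this kind can be sketched with opendir and readdir. The separator between file names is not documented, so a single space is assumed here, as is the sorted order; files_with_tag_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a space-separated string of plain files in
# $path whose names contain $file_tag, or undef if the path cannot be read.
sub files_with_tag_sketch {
    my ($path, $file_tag) = @_;
    return unless defined $path && defined $file_tag;

    opendir(my $dh, $path) or return;
    my @names = grep { -f "$path/$_" && index($_, $file_tag) >= 0 } readdir($dh);
    closedir($dh);

    my @sorted = sort @names;   # sorted for a stable, predictable result
    return join(" ", @sorted);
}

my $found = files_with_tag_sketch(".", ".sval");
print "Found File Name(s): $found\n" if defined $found;
```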
GetOSType
Description:
Returns the operating system type as a string.
Input:
None
Output:
$string -> Operating System String
Example:
    use Word2vec::Util;

    my $util   = Word2vec::Util->new();
    my $result = $util->GetOSType();

    print( "Current OS Type: $result\n" ) if defined( $result );

    undef( $util );
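Perl exposes the operating system name through the built-in $^O variable, with values such as "linux", "MSWin32" and "darwin". Whether GetOSType returns this value unchanged is an assumption; the hypothetical os_type_sketch below only shows the built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: returns Perl's own operating system identifier.
sub os_type_sketch {
    return $^O;
}

print "Current OS Type: ", os_type_sketch(), "\n";
```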
Accessor Functions
GetDebugLog
Description:
Returns the _debugLog member variable, which is set when the "new" function initializes the Word2vec::Util object.
Input:
None
Output:
$value -> '0' = False, '1' = True
Example:
    use Word2vec::Util;

    my $util     = Word2vec::Util->new();
    my $debugLog = $util->GetDebugLog();

    print( "Debug Logging Enabled\n" )  if $debugLog == 1;
    print( "Debug Logging Disabled\n" ) if $debugLog == 0;

    undef( $util );
GetWriteLog
Description:
Returns the _writeLog member variable, which is set when the "new" function initializes the Word2vec::Util object.
Input:
None
Output:
$value -> '0' = False, '1' = True
Example:
    use Word2vec::Util;

    my $util     = Word2vec::Util->new();
    my $writeLog = $util->GetWriteLog();

    print( "Write Logging Enabled\n" )  if $writeLog == 1;
    print( "Write Logging Disabled\n" ) if $writeLog == 0;

    undef( $util );
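To illustrate how the accessors above relate to the defaults described under "new", here is a minimal stand-in class. Everything in it is hypothetical: the package name Sketch::Util, the positional order of the constructor arguments, and the hash-based object layout are assumptions, not the module's actual code.

```perl
use strict;
use warnings;

package Sketch::Util;

# Assumed constructor: both options default to 0 when omitted.
sub new {
    my ($class, $debug_log, $write_log) = @_;
    my $self = {
        _debugLog => $debug_log // 0,
        _writeLog => $write_log // 0,
    };
    return bless $self, $class;
}

# Read-only accessors for the two member variables.
sub GetDebugLog { my ($self) = @_; return $self->{_debugLog}; }
sub GetWriteLog { my ($self) = @_; return $self->{_writeLog}; }

package main;

my $quiet   = Sketch::Util->new();
my $verbose = Sketch::Util->new(1, 0);
print "quiet: debugLog=",   $quiet->GetDebugLog(),   "\n";   # quiet: debugLog=0
print "verbose: debugLog=", $verbose->GetDebugLog(), "\n";   # verbose: debugLog=1
```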
Debug Functions
WriteLog
Description:
Prints the passed string parameter to the console, log file or both depending on user options.
Note: The printNewLine parameter prints a new line character following the string if the parameter is undefined and does not if the parameter is 0.
Input:
$string -> String to print to the console/log file.
$value -> 0 = Do not print newline character after string, all else prints new line character including 'undef'.
Output:
None
Example:
    use Word2vec::Util;

    my $util = Word2vec::Util->new();

    $util->WriteLog( "Hello World" );

    undef( $util );
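The new line rule described in the note can be captured in a few lines. This sketch covers only that rule, not the console or log file output, and format_log_line_sketch is a hypothetical name rather than part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: appends "\n" unless the second argument is exactly 0.
# An undefined second argument still gets a new line, as documented.
sub format_log_line_sketch {
    my ($string, $print_new_line) = @_;
    my $suppress = defined $print_new_line && $print_new_line eq '0';
    return $suppress ? $string : "$string\n";
}

print format_log_line_sketch("Hello ", 0);   # no new line after "Hello "
print format_log_line_sketch("World");       # new line after "World"
```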
Author
Clint Cuffy, Virginia Commonwealth University
COPYRIGHT
Copyright (c) 2016
Bridget T McInnes, Virginia Commonwealth University
btmcinnes at vcu dot edu
Clint Cuffy, Virginia Commonwealth University
cuffyca at vcu dot edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.