WebService::GoogleHack Package
SYNOPSIS
WebService::GoogleHack is a Perl package that interacts with the Google API and provides basic functionality for querying Google and retrieving results. It also has some Natural Language Processing capabilities, such as predicting the semantic orientation of words, building word clusters, and finding words that are common to a pair of words.
DESCRIPTION
This module acts as a driver module: it is the interface between the user and the modules it controls. The modules that are controlled by WebService::GoogleHack are:
WebService::GoogleHack::Text, WebService::GoogleHack::Search, WebService::GoogleHack::Rate, WebService::GoogleHack::Spelling
AUTHOR
Pratheepan Raveendranathan, <rave0029@d.umn.edu>
Ted Pedersen, <tpederse@d.umn.edu>
BUGS
SEE ALSO
WebService::GoogleHack home page Pratheepan Raveendranathan Ted Pedersen
Google-Hack Mailing List <google-hack-users@lists.sourceforge.net>
COPYRIGHT AND LICENSE
Copyright (c) 2003 by Pratheepan Raveendranathan, Ted Pedersen
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
PACKAGE METHODS
__PACKAGE__->new(\%args)
Purpose: This function creates an object of type GoogleHack and returns a blessed reference.
__PACKAGE__->init(\%args)
Purpose: This function can be used to initialize the member variables.
Valid arguments are :
key
string. The key for the Google API.
File_location
string. The WSDL file name.
adverbs_list
string. The location of the adverbs list file
verbs_list
string. The location of the verbs list file
adjectives_list
string. The location of the adjectives list file
nouns_list
string. The location of the nouns list file
stop_list
string. The location of the stop_words list file
__PACKAGE__->setMaxResults(\%args)
Purpose: This function sets the maximum number of results retrieved.
Valid arguments are :
maxResults
number. The maximum number of results to retrieve. Should be less than 10.
__PACKAGE__->setlr(\%args)
Purpose: This function can be used to set the language restriction.
Valid arguments are :
lr
string. The language restriction, e.g. lang_eng.
__PACKAGE__->setoe(\%args)
Purpose: This function can be used to set oe, the output encoding.
Valid arguments are :
oe
string.
__PACKAGE__->setie(\%args)
Purpose: This function can be used to set ie, the input encoding.
Valid arguments are :
ie
string.
__PACKAGE__->setStartPos(\%args)
Purpose: This function sets the start position for the search results.
Valid arguments are :
StartPos
string.
__PACKAGE__->setFilter(\%args)
Purpose: This function turns the search filter on or off.
Valid arguments are :
Filter
boolean. True or False
__PACKAGE__->setRestrict(\%args)
Purpose: This function restricts the search to a specific domain.
Valid arguments are :
Restrict
string. For example, UncleSam for the US Government.
__PACKAGE__->setSafeSearch(\%args)
Purpose: This function enables safe search, which restricts the search to non-abusive material.
Valid arguments are :
Restrict
Boolean. "True" or "False".
__PACKAGE__->measureSemanticRelatedness(\%args)
Purpose: This function measures the relatedness between two words. It calls the measureSemanticRelatedness function in the Rate class.
Valid arguments are :
searchString1
string. The search string which can be a phrase or word
searchString2
string. The search string which can be a phrase or word
Returns: The object containing the PMI measure ($search->{'PMI'}).
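The exact formula used by the Rate class is not given in this document. As a rough illustration only, the standard definition of pointwise mutual information (PMI) can be estimated from hit counts as sketched below. The helper name pmi_from_hits, its arguments, and the counts are hypothetical and are not part of WebService::GoogleHack.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of WebService::GoogleHack.
# PMI(w1, w2) = log2( P(w1, w2) / ( P(w1) * P(w2) ) ),
# with each probability estimated as hits / total pages.
sub pmi_from_hits {
    my ($hits1, $hits2, $hits_both, $total) = @_;

    # PMI is undefined when any count is zero.
    return unless $hits1 > 0 && $hits2 > 0 && $hits_both > 0 && $total > 0;

    my $p1  = $hits1     / $total;
    my $p2  = $hits2     / $total;
    my $p12 = $hits_both / $total;

    # Perl's log() is the natural logarithm; divide by log(2) for base 2.
    return log($p12 / ($p1 * $p2)) / log(2);
}

# Made-up counts, for illustration only.
my $pmi = pmi_from_hits(1_000, 2_000, 500, 1_000_000);
printf "PMI = %.3f\n", $pmi;
```

A positive value means the two terms co-occur more often than they would if they were independent; a negative value means they co-occur less often.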
__PACKAGE__->predictSemanticOrientation(\%args)
Purpose: This function tries to predict the semantic orientation of a paragraph of text.
Valid arguments are :
config_file
string. The location of the review file
positive_inference
string. Positive inference such as excellent
negative_inference
string. Negative inference such as poor
trace_file
string. The location of the trace file. If a file name is given, the results are stored in this file.
Returns: The PMI measure and the prediction, which is 0 or 1.
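How the prediction is derived is not spelled out here. One common way to turn PMI into an orientation score is to compare a phrase's association with the positive inference word against its association with the negative one. The sketch below assumes that approach; the helper semantic_orientation, the 0.01 smoothing constant, the mapping of a positive score to 1, and all counts are assumptions for illustration, not the module's documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of WebService::GoogleHack.
# SO(phrase) = PMI(phrase, pos) - PMI(phrase, neg), which simplifies to
#   log2( hits(phrase with pos) * hits(neg)
#       / ( hits(phrase with neg) * hits(pos) ) )
sub semantic_orientation {
    my (%h) = @_;

    # Assumed smoothing: 0.01 keeps the ratio defined when a pair
    # of terms is never seen together.
    my $num = ($h{with_pos} + 0.01) * $h{neg};
    my $den = ($h{with_neg} + 0.01) * $h{pos};

    return log($num / $den) / log(2);
}

# Made-up hit counts, for illustration only.
my $so = semantic_orientation(
    with_pos => 120,    # phrase together with "excellent"
    with_neg => 30,     # phrase together with "poor"
    pos      => 5_000,  # "excellent" alone
    neg      => 4_000,  # "poor" alone
);

# Assumed mapping: 1 for a positive orientation, 0 otherwise.
my $prediction = $so > 0 ? 1 : 0;
printf "SO = %.3f, prediction = %d\n", $so, $prediction;
```

The sketch assumes the counts for the two inference words alone are non-zero; a real implementation would need to guard against that case.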
__PACKAGE__->phraseSpelling(\%args)
Purpose: This function is used to retrieve a spelling suggestion from Google.
Valid arguments are :
$searchString
string. The search string, which can be a single word.
Returns: The suggested spelling if there is one, otherwise "No Spelling Suggested".
__PACKAGE__->Search(\%args)
Purpose: This function is used to query Google.
Valid arguments are :
$searchString
string. The search string, which can be a single word or a phrase of at most ten words.
num_results
integer. The number of results you want to retrieve. The default is 10 and the maximum is 1000.
Returns: Returns a GoogleHack object containing the search results.
__PACKAGE__->initConfig(\%args)
Purpose: This function reads a configuration file containing information such as the Google API key, the word lists, etc.
Valid arguments are :
filename
string. Location of the configuration file.
Returns: An object which contains the parsed information.
__PACKAGE__->printConfig(\%args)
Purpose: This function is used to print the information read from a configuration file
No arguments.
__PACKAGE__->getSearchSnippetWords(\%args)
Purpose: Given a search word, this function tries to retrieve the text surrounding the search word in the retrieved snippets.
Valid arguments are :
searchString
string. The search string which can be a word or phrase
numResults
string. The number of results to be processed from google.
trace_file
string. The location of the trace file. If a file name is given, the results are stored in this file.
proximity
string. The number of words surrounding the searchString (not implemented yet).
Returns: An object which contains the parsed information.
__PACKAGE__->getCachedSurroundingWords(\%args)
Purpose: Given a search word, this function tries to retrieve the text
surrounding the search word in the retrieved cached Web pages. It performs
the search and passes the search results to the
WebService::GoogleHack::Text::getCachedSurroundingWords function.
Valid arguments are :
searchString
string. The search string which can be a word or phrase
trace_file
string. The location of the trace file. If a file name is given, the results are stored in this file.
Returns: A hash with the words as keys and their frequency of occurrence as values.
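The shape of such a word-to-frequency hash can be illustrated with a small standalone sketch. It does not call the module: the helper word_frequencies, the tokenizing regular expression, and the sample text are all assumptions, and the stop list here only plays the role of the stop_list file passed to init.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of WebService::GoogleHack.
# Counts how often each non-stop word occurs across a list of texts.
sub word_frequencies {
    my ($texts, $stop_words) = @_;

    my %stop;
    $stop{ lc $_ } = 1 for @$stop_words;

    my %freq;
    for my $text (@$texts) {
        # Assumed tokenization: runs of letters and apostrophes.
        for my $token ($text =~ /[A-Za-z']+/g) {
            my $word = lc $token;
            next if $stop{$word};
            $freq{$word}++;
        }
    }
    return \%freq;
}

# Made-up text, for illustration only.
my $freq = word_frequencies(
    [ 'The duck swam across the pond.', 'A duck and a goose share the pond.' ],
    [ qw(the a and) ],
);

# Print words by descending frequency, then alphabetically.
for my $word (sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq) {
    print "$word\t$freq->{$word}\n";
}
```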
__PACKAGE__->getSearchSnippetSentences(\%args)
Purpose: Given a search word, this function tries to retrieve the
sentences in the snippet. It performs the search and passes the
search results to the WebService::GoogleHack::Text::getSnippetSentences
function.
Valid arguments are :
searchString
string. The search string which can be a word or phrase
trace_file
string. The location of the trace file. If a file name is given, the results are stored in this file.
Returns: An array of strings.
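How getSnippetSentences splits text into sentences is not described here. A naive approach, shown only as a sketch, is to split after sentence-ending punctuation and keep the sentences that mention the search string. The helper sentences_containing and the sample text are hypothetical, and real snippets (which often contain "..." elisions) would need more careful handling.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of WebService::GoogleHack.
# Splits text after '.', '!' or '?' and returns the sentences that
# contain the search string (case-insensitive).
sub sentences_containing {
    my ($text, $search) = @_;

    # The lookbehind keeps the punctuation attached to its sentence.
    my @sentences = split /(?<=[.!?])\s+/, $text;

    return grep { /\Q$search\E/i } @sentences;
}

# Made-up text, for illustration only.
my @found = sentences_containing(
    'Ducks swim well. Geese are loud! Does a duck migrate?',
    'duck',
);
print "$_\n" for @found;
```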
__PACKAGE__->getCachedSurroundingSentences(\%args)
Purpose: Given a search word, this function tries to retrieve the
sentences in the cached web page.
Valid arguments are :
searchString
string. The search string which can be a word or phrase
trace_file
string. The location of the trace file. If a file name is given, the results are stored in this file.
Returns: A hash with the web URLs as keys and the parsed sentences as values.
__PACKAGE__->getSearchCommonWords(\%args)
Purpose: Given two search words, this function tries to retrieve the common
text/words surrounding the search strings in the retrieved snippets.
Valid arguments are :
searchString1
string. The search string which can be a word or phrase
searchString2
string. The search string which can be a word or phrase
trace_file
string. The location of the trace file. If a file name is given, the results are stored in this file.
Returns: A hash with the web URLs as keys and the parsed sentences as values.
__PACKAGE__->getWordClusters(\%args)
Purpose: Given a search string, this function retrieves the top frequency
words, does a search on those words, and builds a list of words that can be
regarded as a cluster of related words.
Valid arguments are :
searchString1
string. The search string which can be a word or phrase
iterations
number. The number of iterations that you want the function to search and build cluster on.
trace_file
string. The location of the trace file. If a file name is given, the results are stored in this file.
Returns: A hash with the web URLs as keys and the parsed sentences as values.
__PACKAGE__->getPairWordClusters(\%args)
Purpose: Given two search strings, this function retrieves the snippets for
each string, finds the intersection of words, and then repeats the
search with the intersection of words.
Valid arguments are :
searchString1
string. The search string which can be a word or phrase
searchString2
string. The search string which can be a word or phrase
iterations
number. The number of iterations that you want the function to search and build cluster on.
trace_file
string. The location of the trace file. If a file name is given, the results are stored in this file.
Returns: A hash with the intersecting words as keys and their frequency of occurrence as values.
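The intersection step can be pictured with a standalone sketch that takes two word-frequency hashes and keeps only the words present in both. The helper intersect_frequencies and the sample counts are hypothetical, and summing the two counts is an assumption; this document does not say how the module combines frequencies.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of WebService::GoogleHack.
# Keeps the words that appear in both hashes. The combined value is
# the sum of the two counts (an assumption made for this sketch).
sub intersect_frequencies {
    my ($freq1, $freq2) = @_;

    my %common;
    for my $word (keys %$freq1) {
        next unless exists $freq2->{$word};
        $common{$word} = $freq1->{$word} + $freq2->{$word};
    }
    return \%common;
}

# Made-up frequencies for two search strings, for illustration only.
my %knife = (blade => 4, sharp => 3, kitchen => 2);
my %sword = (blade => 5, sharp => 1, battle  => 6);

my $common = intersect_frequencies(\%knife, \%sword);
print "$_\t$common->{$_}\n" for sort keys %$common;
```

In an iterative scheme like the one described above, the keys of the resulting hash would become the search terms for the next round.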
__PACKAGE__->getText(\%args)
Purpose: Given a search string, this function will retrieve the resulting
URLs from Google, follow those links, and retrieve the text from there. The
function will then clean up the text and store it in a file along with the URL
and the date and time of retrieval. The file will be stored under the name of
the search string.
Valid arguments are :
searchString
string. The search string which can be a word or phrase.
iterations
number. The number of iterations that you want the function to search and build cluster on.
path_to_data_directory
string. The location where the file containing the retrieved information is to be stored.
Returns: Nothing.