NAME
WWW::Search::HotBot - backend for searching hotbot.lycos.com
SYNOPSIS
use WWW::Search;
my $oSearch = WWW::Search->new('HotBot');
my $sQuery = WWW::Search::escape_query("+sushi restaurant +Columbus Ohio");
$oSearch->native_query($sQuery);
while (my $oResult = $oSearch->next_result())
{ print $oResult->url, "\n"; }
DESCRIPTION
This class is a HotBot specialization of WWW::Search. It handles making and interpreting HotBot searches at http://www.hotbot.com.
This class exports no public interface; all interaction should be done through WWW::Search objects.
By default, WWW::Search::HotBot uses hotbot.com's "advanced search" interface. If you want to perform a query with the same default options as if a user typed it in the browser window (i.e. at http://www.hotbot.com), call $oSearch->gui_query($sQuery) instead of ->native_query().
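As a minimal sketch of switching between the two interfaces, the helper below takes an already-constructed search object and calls either gui_query() or native_query(). The name collect_urls and its third argument are hypothetical and not part of WWW::Search; the sketch relies only on the methods shown in this document.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Search: run $sQuery through either
# the browser-style (GUI) interface or the "advanced search" interface and
# collect the URLs of all results.
sub collect_urls {
    my ($oSearch, $sQuery, $bUseGui) = @_;
    if ($bUseGui) {
        $oSearch->gui_query($sQuery);      # same defaults as the browser form
    }
    else {
        $oSearch->native_query($sQuery);   # "advanced search" interface
    }
    my @asURL;
    while (my $oResult = $oSearch->next_result()) {
        push @asURL, $oResult->url;
    }
    return @asURL;
}
```

With a real search object this might be called as collect_urls($oSearch, $sQuery, 1) for a GUI-style query, or with a false third argument for the advanced interface.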
The default behavior is for HotBot to look for "any of" the query terms:
$oSearch->native_query(WWW::Search::escape_query('Dorothy Oz'));
If you want "all of", call native_query like this:
$oSearch->native_query(WWW::Search::escape_query('Dorothy Oz'), {'SM' => 'MC'});
If you want to send HotBot a boolean phrase, call native_query like this:
$oSearch->native_query(WWW::Search::escape_query('Oz AND Dorothy NOT Australia'), {'SM' => 'B'});
See below for other query-handling options.
OPTIONS
The following search options can be activated by sending a hash as the second argument to native_query().
Format / Treatment of Query Terms
The default is logical OR of all the query terms.
- {'SM' => 'MC'}
  "Must Contain": logical AND of all the query terms.
- {'SM' => 'SC'}
  "Should Contain": logical OR of all the query terms. This is the default.
- {'SM' => 'B'}
  "Boolean": the entire query is treated as a boolean expression with AND, OR, NOT, and parentheses.
- {'SM' => 'name'}
  The entire query is treated as a person's name.
- {'SM' => 'phrase'}
  The entire query is treated as a phrase.
- {'SM' => 'title'}
  The query is applied to the page title. (I assume the logical OR of the query terms will be applied to the page title.)
- {'SM' => 'url'}
  The query is assumed to be a URL, and the results will be pages that link to the query URL.
Restricting Search to a Date Range
The default is no date restrictions.
- {'date' => 'within', 'DV' => 90}
  Only return pages updated within 90 days of today. (Substitute any integer in place of 90.)
- {'date' => 'range', 'DR' => 'newer', 'DY' => 97, 'DM' => 12, 'DD' => 25}
  Only return pages updated after Christmas 1997. (Substitute any year, month, and day for 97, 12, 25.)
- {'date' => 'range', 'DR' => 'older', 'DY' => 97, 'DM' => 12, 'DD' => 25}
  Only return pages updated before Christmas 1997. (Substitute any year, month, and day for 97, 12, 25.)
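The 'range' options above can be assembled programmatically. The following is only a sketch: date_range_options is a hypothetical helper, not part of WWW::Search, and it passes the year, month, and day through unchanged in the same form as the examples (e.g. 97, 12, 25), since this document does not say how HotBot expects other years to be written.

```perl
use strict;
use warnings;

# Hypothetical helper: build the option hash for a 'range' date restriction.
# $sWhich is 'newer' or 'older'; the date parts are passed through as given.
sub date_range_options {
    my ($sWhich, $iYear, $iMonth, $iDay) = @_;
    die "direction must be 'newer' or 'older'\n"
        unless $sWhich eq 'newer' || $sWhich eq 'older';
    die "month out of range\n" unless $iMonth >= 1 && $iMonth <= 12;
    die "day out of range\n"   unless $iDay   >= 1 && $iDay   <= 31;
    return {
        'date' => 'range',
        'DR'   => $sWhich,
        'DY'   => $iYear,
        'DM'   => $iMonth,
        'DD'   => $iDay,
    };
}
```

The returned hash reference can then be given as the second argument to native_query(), for example date_range_options('newer', 97, 12, 25).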
Restricting Search to a Geographic Area
The default is no restriction to geographic area.
- {'RD' => 'AN'}
  Return pages from anywhere. This is the default.
- {'RD' => 'DM', 'Domain' => 'microsoft.com, .cz'}
  Restrict search to pages located in the listed domains. (Substitute any list of domain substrings.)
- {'RD' => 'RG', 'RG' => '.com'}
  Restrict search to North American commercial web sites.
- {'RD' => 'RG', 'RG' => '.edu'}
  Restrict search to North American educational web sites.
- {'RD' => 'RG', 'RG' => '.gov'}
  Restrict search to United States Government web sites.
- {'RD' => 'RG', 'RG' => '.mil'}
  Restrict search to United States military web sites.
- {'RD' => 'RG', 'RG' => '.net'}
  Restrict search to North American '.net' web sites.
- {'RD' => 'RG', 'RG' => '.org'}
  Restrict search to North American organizational web sites.
- {'RD' => 'RG', 'RG' => 'NA'}
  "North America": Restrict search to all of the above types of web sites.
- {'RD' => 'RG', 'RG' => 'AF'}
  Restrict search to web sites in Africa.
- {'RD' => 'RG', 'RG' => 'AS'}
  Restrict search to web sites in India and Asia.
- {'RD' => 'RG', 'RG' => 'CA'}
  Restrict search to web sites in Central America.
- {'RD' => 'RG', 'RG' => 'DU'}
  Restrict search to web sites in Oceania.
- {'RD' => 'RG', 'RG' => 'EU'}
  Restrict search to web sites in Europe.
- {'RD' => 'RG', 'RG' => 'ME'}
  Restrict search to web sites in the Middle East.
- {'RD' => 'RG', 'RG' => 'SE'}
  Restrict search to web sites in Southeast Asia.
Requesting Certain Multimedia Data Types
By default, no multimedia types are specifically requested. (Presumably, this does NOT restrict the search to NON-multimedia pages.)
- {'FAC' => 1}
  Return pages which contain Adobe Acrobat PDF data.
- {'FAX' => 1}
  Return pages which contain ActiveX.
- {'FJA' => 1}
  Return pages which contain Java.
- {'FJS' => 1}
  Return pages which contain JavaScript.
- {'FRA' => 1}
  Return pages which contain audio.
- {'FSU' => 1, 'FS' => '.txt, .doc'}
  Return pages which have one of the listed extensions. (Substitute any list of DOS-like file extensions.)
- {'FSW' => 1}
  Return pages which contain ShockWave.
- {'FVI' => 1}
  Return pages which contain images.
- {'FVR' => 1}
  Return pages which contain VRML.
- {'FVS' => 1}
  Return pages which contain VB Script.
- {'FVV' => 1}
  Return pages which contain video.
Requesting Pages at Certain Depths on Website
The default is pages at any level on their website.
- {'PS' => 'A'}
  Return pages at any level on their website. This is the default.
- {'PS' => 'D', 'D' => 3}
  Return pages within 3 links of the "top" page of their website. (Substitute any integer in place of 3.)
- {'PS' => 'F'}
  Only return pages that are the "top" page of their website.
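Since all options travel in one hash, options from several of the groups above can presumably be combined in a single call. This document does not state how HotBot treats such combinations, so the following is only a sketch under that assumption; merge_options is a hypothetical helper, not part of WWW::Search.

```perl
use strict;
use warnings;

# Hypothetical helper: merge several option hashes into one, refusing to
# silently overwrite a key that two groups set to different values.
sub merge_options {
    my %hOptions;
    foreach my $rh (@_) {
        foreach my $sKey (sort keys %$rh) {
            my $sValue = $rh->{$sKey};
            die "conflicting values for option '$sKey'\n"
                if exists $hOptions{$sKey} && $hOptions{$sKey} ne $sValue;
            $hOptions{$sKey} = $sValue;
        }
    }
    return \%hOptions;
}

my $rhOptions = merge_options(
    { 'SM' => 'MC' },                      # all of the query terms
    { 'date' => 'within', 'DV' => 90 },    # updated within 90 days of today
    { 'RD' => 'RG', 'RG' => 'EU' },        # web sites in Europe
    { 'PS' => 'F' },                       # "top" pages only
);
# With a real search object (assumed to exist as $oSearch):
# $oSearch->native_query($sQuery, $rhOptions);
```

Dying on a conflict, rather than letting the last hash win, makes it obvious when two option groups are accidentally set to contradictory values.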
SEE ALSO
To make new back-ends, see WWW::Search.
CAVEATS
When www.hotbot.com reports a "Mirror" URL, WWW::Search::HotBot ignores it. Therefore, the number of URLs returned by WWW::Search::HotBot might not agree with the value returned in approximate_result_count.
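To see how far the two numbers diverge for a given query, something like the following sketch could be used. compare_counts is a hypothetical helper, not part of WWW::Search; it assumes a search object on which a query has already been set with native_query() or gui_query().

```perl
use strict;
use warnings;

# Hypothetical helper: return the engine's estimate alongside the number of
# results actually delivered by next_result().
sub compare_counts {
    my ($oSearch) = @_;
    my $iEstimate = $oSearch->approximate_result_count();
    my $iReturned = 0;
    $iReturned++ while defined $oSearch->next_result();
    return ($iEstimate, $iReturned);
}
```

A caller might print both values and treat any difference as expected behavior rather than as an error.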
BUGS
Please tell the author if you find any!
AUTHOR
As of 1998-02-02, WWW::Search::HotBot is maintained by Martin Thurn (MartinThurn@iname.com).
WWW::Search::HotBot was originally written by Wm. L. Scheding, based on WWW::Search::AltaVista.
LEGALESE
THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
VERSION HISTORY
If a revision is not listed here, it was neither meaningful nor released.
2.24, 2001-10
2.23, 2001-07-19
Tweak pattern for result-count; set agent_name to something that works
2.21, 2000-12-11
new URL for advanced search
2.19, 2000-10-11
added AM1=MC to all URLs in GUI mode (hotbot.com seems to "randomly" add this if you search manually at their site)
2.18, 2000-06-26
fix for only one page of gui results; and "next" link in new place
2.17, 2000-05-24
was still missing first URL of non-gui(?) results!
2.16, 2000-05-17
was missing first URL of gui results
2.15, 2000-04-03
fixed gui_query()
2.14, 2000-02-01
testing now uses WWW::Search::Test module
2.13, 2000-01-31
bugfix: was missing title
2.12, 2000-01-19
new function gui_query(), and handle output from it
2.10, 1999-12-22
handle new result format
2.09, 1999-12-15
handle new result count format
2.08, 1999-12-10
handle new output format
2.07, 1999-11-12
BUGFIX for domain-limited URL parsing (thanks to Leon Brocard)
2.06, 1999-10-18
www.hotbot.com changed their output format slightly; now uses strip_tags() on title and description
2.05, 1999-10-05
now uses hash_to_cgi_string(); new test cases
2.03, 1999-09-28
BUGFIX: was missing the "Next page" link sometimes.
2.02, 1999-08-17
Now is able to parse "URL-only" format (i.e. {'DE' => 0}) and "brief description" format (i.e. {'DE' => 1}) if the user so desires.
1.34, 1999-07-01
New test cases.
1.32, 1999-06-20
Now unescapes the URLs before returning them.
1.31, 1999-06-11
www.hotbot.com changed their output format ever so slightly. (Thanks to Jim jsmyser@bigfoot.com for pointing it out)
1.30, 1999-04-12
BUG FIX: results for domain-limited search were not parsed. (Thanks to Christopher York yorkc@ccwf.cc.utexas.edu for pointing it out)
1.29, 1999-02-22
www.hotbot.com changed their output format. (Thanks to Tim Chklovski timc@mit.edu for pointing it out)
1.27, 1998-11-06
HotBot changed their output format(?). HotBot.pm now uses hotbot.com's text-only search results format. Minor documentation changes.
1.25, 1998-09-11
HotBot changed their output format ever so slightly. Documentation added for all known HotBot query options!
1.23
Better documentation for boolean queries. (Thanks to Jason Titus jason_titus@odsnet.com)
1.22
www.hotbot.com changed their output format.
1.21
www.hotbot.com changed their output format.
1.17
www.hotbot.com changed their search script location and output format on 1998-05-21. Also, as many as 6 fields of each SearchResult are now filled in.
1.13
Fixed the maximum_to_retrieve off-by-one problem. Updated test cases.
1.12
www.hotbot.com does not do truncation. Therefore, if the query contains truncation characters (i.e. '*' at end of words), they are simply deleted before the query is sent to www.hotbot.com.
1.11, 1998-02-05
Fixed and revamped by Martin Thurn.
IMPORTANT NOTICE
As of 2001-10-25, www.hotbot.com is broken as follows: When you click the "next" link on the results page, it returns the same set of results all over again. Therefore, this release of WWW::Search::HotBot can only retrieve the first 100 hits.
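A caller who wants to make this limit explicit, and to guard against a repeated page of results, could use something like the sketch below. first_hits is a hypothetical helper, not part of WWW::Search, and the duplicate check is an added precaution rather than something this module is documented to need; maximum_to_retrieve() is the standard WWW::Search method.

```perl
use strict;
use warnings;

# Hypothetical helper: cap retrieval at 100 hits and stop as soon as a URL
# repeats, which would indicate that the same page of results came back again.
sub first_hits {
    my ($oSearch, $sQuery) = @_;
    $oSearch->maximum_to_retrieve(100);
    $oSearch->native_query($sQuery);
    my (%hSeen, @asURL);
    while (my $oResult = $oSearch->next_result()) {
        my $sURL = $oResult->url;
        last if $hSeen{$sURL}++;    # already seen: results are repeating
        push @asURL, $sURL;
    }
    return @asURL;
}
```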