NAME
Search::Kinosearch::KSearch - Perform searches
DEPRECATED
Search::Kinosearch has been superseded by KinoSearch. Please use the new version.
SYNOPSIS
my $ksearch = Search::Kinosearch::KSearch->new(
-mainpath => '/foo/bar/kindex',
);
my $query = Search::Kinosearch::Query->new(
-string => 'this AND NOT (that OR "the other thing")',
-lowercase => 1,
-tokenize => 1,
-stem => 1,
);
$ksearch->add_query( $query );
$ksearch->process;
while (my $result = $ksearch->fetch_hit_hashref) {
print "$result->{title}\n";
}
DESCRIPTION
KSearch objects perform queries against the kindex files created by Kindexer objects.
Queries are fed into KSearch using Search::Kinosearch::Query objects. You can feed multiple Query objects to a single KSearch object in order to fine-tune your result set, but KSearch objects themselves are single-shot: to perform multiple searches, you need to create multiple KSearch objects.
Multiple calls to add_query()
It is possible to perform a search which combines the results of multiple queries. In fact, that is the only way to implement an "advanced search" interface:
my $find_the_word_people = Search::Kinosearch::Query->new(
-string => 'people',
-required => 1,
-fields => {
title => 3,
bodytext => 1,
},
-tokenize => 1,
-stem => 1,
-lowercase => 1,
);
my $in_article_ii_only = Search::Kinosearch::Query->new(
-string => 'Article II',
-required => 1,
-fields => {
section => 1,
},
);
$ksearch->add_query( $find_the_word_people );
$ksearch->add_query( $in_article_ii_only );
my $status = $ksearch->process;
...
Since both queries are marked as '-required => 1', all documents returned must 1) match 'people' in one or both of the 'title' and 'bodytext' fields, and 2) match 'Article II' in the 'section' field.
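The effect of marking both queries as required can be pictured as a set intersection. The following is a minimal pure-Perl sketch of that logic over toy in-memory documents. It does not use Search::Kinosearch at all: the matches() helper, the @docs data and the plain substring matching are assumptions made for illustration, and real matching also involves tokenizing, stemming and scoring.

```perl
use strict;
use warnings;

# Toy documents standing in for kindex entries (made-up data).
my @docs = (
    { title => 'Preamble',  section => 'Preamble',   bodytext => 'We the People of the United States' },
    { title => 'Elections', section => 'Article I',  bodytext => 'chosen by the people of the several states' },
    { title => 'Executive', section => 'Article II', bodytext => 'the people shall be informed' },
    { title => 'Oath',      section => 'Article II', bodytext => 'he shall take the following oath' },
);

# Hypothetical helper: returns the number of listed fields of $doc in which
# $string occurs (case-insensitive substring match), so 0 means "no match".
sub matches {
    my ( $doc, $string, @fields ) = @_;
    return scalar( grep { index( lc( $doc->{$_} ), lc($string) ) >= 0 } @fields );
}

# A document survives only if every required query matches it.
my @hits = grep {
    matches( $_, 'people', qw( title bodytext ) )
        && matches( $_, 'Article II', 'section' )
} @docs;

print "$_->{title}\n" for @hits;
```

Only the 'Executive' document satisfies both conditions: 'Elections' mentions 'people' but sits in a different section, and 'Oath' is in the right section but never mentions 'people'.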
Excerpts
Kinosearch attempts to find the section of the text with the greatest density of search terms in a field that you specify (typically the bodytext). Any search terms encountered within the text are highlighted with HTML tags. In addition to the field from which the excerpt is taken, Kinosearch gives you control over the length of the excerpt and the text of the highlight tags.
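To make the idea of "greatest density of search terms" concrete, here is a rough pure-Perl sketch of one way such an excerpt could be chosen and highlighted. It is not Kinosearch's actual excerpting code: make_excerpt() and its arguments are hypothetical, and the strategy (try a fixed-length window starting at each term occurrence, keep the one containing the most occurrences) is a simplifying assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Kinosearch: pick the window of
# $length characters containing the most term occurrences, then wrap each
# complete term inside that window in the highlight tags.
sub make_excerpt {
    my ( $text, $terms, $length, $open, $close ) = @_;
    my $alternation = join '|', map { quotemeta($_) } @$terms;
    my $term_re     = qr/\b(?:$alternation)\b/i;

    # Record the starting offset of every term occurrence.
    my @starts;
    while ( $text =~ /$term_re/g ) {
        push @starts, $-[0];
    }

    # No terms found: fall back to the beginning of the text.
    return substr( $text, 0, $length ) unless @starts;

    # Try a window beginning at each occurrence; keep the densest one.
    my ( $best_start, $best_count ) = ( $starts[0], 0 );
    for my $start (@starts) {
        my $count = grep { $_ >= $start && $_ < $start + $length } @starts;
        ( $best_start, $best_count ) = ( $start, $count )
            if $count > $best_count;
    }

    my $excerpt = substr( $text, $best_start, $length );
    $excerpt =~ s/($term_re)/$open$1$close/g;
    return $excerpt;
}

my $text = 'We the People of the United States, in Order to form a more '
    . 'perfect Union, establish Justice, insure domestic Tranquility, and '
    . 'secure the Blessings of Liberty, do ordain and establish this '
    . 'Constitution for the United States of America.';

print make_excerpt( $text, [ 'establish', 'constitution' ], 60, '<strong>', '</strong>' ), "\n";
```

The length and tag arguments play the roles of the -excerpt_length, -hl_tag_open and -hl_tag_close constructor parameters. In this sketch the length is measured before the tags are inserted, which is a design choice of the example rather than a documented property of Kinosearch.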
CONSTRUCTOR
new()
my $ksearch = Search::Kinosearch::KSearch->new(
-mainpath => '/foo/kindex', # default: 'kindex'
-freqpath => '/ramd/fdata', # default: 'kindex/freqdata'
-kindex => $kindex, # default: created using -mainpath
-any_or_all => 'any', # default: 'any'
-sort_by => 'score', # default: 'string'
-allow_boolean => 0, # default: 1
-allow_phrases => 0, # default: 1
-num_results => 20, # default: 10
-offset => 40, # default: 0
#-language => 'Es', # default: 'En'
-stoplist => \%big_list, # default: see below
-excerpt_field => 'bodytext', # default: undef
-excerpt_length => 200, # default: 150
-hl_tag_open => '<b>', # default: '<strong>'
-hl_tag_close => '</b>', # default: '</strong>'
);
Construct a KSearch object.
- -mainpath
The path to your kindex.
- -freqpath
An alternative location for the frequency data, most likely a RAM disk.
- -kindex
A Search::Kinosearch::Kindex object. If you provide such an object, you don't need to specify -mainpath or -freqpath.
- -any_or_all
Whether searches return documents containing 'any' or 'all' of the search terms.
- -sort_by
The sort order for results: 'score' or 'datetime'.
- -allow_boolean
If set to 0, disables parenthetical groupings; the boolean operators "AND", "OR" and "AND NOT"; and prepended +plus and -minus.
- -allow_phrases
Enable (1) or disable (0) phrase-matching.
- -num_results
Maximum number of documents returned.
- -offset
Number of documents to skip when returning ranked results. Example: if -offset is set to 10, the first document returned will be the 11th most highly ranked.
- -language
The language of the query. At present only 'En' works. See Search::Kinosearch::Lingua.
- -stoplist
A hashref of words to exclude from the query. If no list is specified, a default list is loaded based on the -language parameter; for instance, if -language is set to 'Es', then $Search::Kinosearch::Lingua::Es::stoplist is used. Stopwords encountered in the query are reported in the search status hash returned by process().
- -excerpt_field
The field to be used when generating excerpts.
- -excerpt_length
Maximum length of the excerpt, in characters.
- -hl_tag_open
Override the default opening tag used to highlight search terms which appear in the excerpt.
- -hl_tag_close
Override the default closing tag used to highlight search terms which appear in the excerpt.
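Together, -num_results and -offset are enough to page through a result set. As a small sketch, the hypothetical helper below (paging_params() is not part of Search::Kinosearch) turns a 1-based page number into the pair of constructor parameters, assuming every page holds the same number of results.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a 1-based page number into the
# -num_results / -offset pair described above.
sub paging_params {
    my ( $page, $per_page ) = @_;
    $page = 1 if !defined $page || $page < 1;    # clamp bad input to page 1
    return (
        -num_results => $per_page,
        -offset      => ( $page - 1 ) * $per_page,
    );
}

# Page 3 at 20 results per page skips the first 40 documents.
my %params = paging_params( 3, 20 );
printf "-num_results => %d, -offset => %d\n",
    $params{'-num_results'}, $params{'-offset'};
```

The returned list could then be passed along with the other arguments to new(). Because KSearch objects are single-shot, each page requires its own KSearch object.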
METHODS
add_query()
$ksearch->add_query( $query );
Add a query, in the form of a Search::Kinosearch::Query object, to the KSearch object.
process()
my $searchstatus = $ksearch->process;
print "Documents matched: $searchstatus->{num_hits}\n";
Execute the search, generate a result list, and return a hashref pointing to information about the search.
Here's how the status hash might look if you were to search for 'we the people in order to form a more perfect union':
$searchstatus = {
num_docs => 52,
num_hits => 18,
stopwords => {
we => undef,
the => undef,
in => undef,
to => undef,
a => undef,
},
};
- num_docs
The number of documents searched.
- num_hits
The approximate number of documents matched. (The number is only approximate because it may include documents which have been marked as deleted, but not yet purged from the kindex.)
- stopwords
A hashref whose keys are the stopwords encountered in the query. The values are undef.
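As an illustration of how the status hash might be put to use, the sketch below builds a one-line summary for the user. It operates on a literal copy of the sample hash shown earlier rather than on a live search, and status_message() is a hypothetical helper, not part of Search::Kinosearch.

```perl
use strict;
use warnings;

# Sample status hash, matching the example shown earlier.
my $searchstatus = {
    num_docs  => 52,
    num_hits  => 18,
    stopwords => { map { $_ => undef } qw( we the in to a ) },
};

# Hypothetical helper: build a user-facing summary line.
sub status_message {
    my ($status) = @_;
    my $message
        = "About $status->{num_hits} of $status->{num_docs} documents matched.";

    # Only the keys of the stopwords hash carry information.
    my @ignored = sort keys %{ $status->{stopwords} || {} };
    $message .= ' Ignored common words: ' . join( ', ', @ignored ) . '.'
        if @ignored;
    return $message;
}

print status_message($searchstatus), "\n";
```

The word "About" reflects the fact that num_hits is only approximate.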
fetch_hit_hashref()
Shift the next ranked result off of the result array and return it. Each result is a hashref with all stored fields represented. Two special fields are added:
- excerpt
A relevant excerpt taken from the field specified by the -excerpt_field parameter.
- score
The document's numerical score.
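A brief sketch of how a single hit might be rendered for display follows. The format_hit() helper and the sample values are made up for illustration, and the example assumes that 'title' is one of the stored fields in your kindex.

```perl
use strict;
use warnings;

# Hypothetical helper: render one hit hashref as two lines of text.
# Assumes 'title' is a stored field; 'excerpt' and 'score' are the two
# special fields described above.
sub format_hit {
    my ($hit) = @_;

    # The excerpt is undefined unless -excerpt_field was set (assumption).
    my $excerpt = defined $hit->{excerpt} ? $hit->{excerpt} : '';
    return sprintf "%s (score %.2f)\n    %s\n",
        $hit->{title}, $hit->{score}, $excerpt;
}

# Stand-in for a result returned by fetch_hit_hashref (made-up values).
my $hit = {
    title   => 'Preamble',
    score   => 1.5,
    excerpt => 'We the <strong>People</strong> of the United States',
};

print format_hit($hit);
```

In a real program, format_hit() would be called inside a loop like the one in the SYNOPSIS, once for each hashref returned by fetch_hit_hashref.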
TO DO
Think hard about the interface, specifically about all the parameters supplied to the constructor. If KSearch gets broken into smaller pieces, those parameters should go away. Better to do that soon, while the user base is small.
Break out excerpting/highlighting code into a separate module.
Sanity check: process can only be called once.
SEE ALSO
AUTHOR
Marvin Humphrey <marvin at rectangular dot com> http://www.rectangular.com
COPYRIGHT
Copyright (c) 2005 Marvin Humphrey. All rights reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.