NAME
OpenInteract2::FullTextIndexer - Base class for OI2 indexers
SYNOPSIS
my $indexer = CTX->fulltext_indexer;
# Or lookup a specific indexer:
my $indexer = CTX->fulltext_indexer( 'Plucene' );
# Add something to the index
$indexer->add_to_index( 'page', '/foo/listing.html', \$foo_content );
# Remove all index entries for something
$indexer->remove_from_index( 'page', '/foo/listing.html' );
# Refresh the index for a particular item
$indexer->refresh_index( 'page', '/foo/listing.html', \$new_foo_content );
# Search the index with default 'return_type' = 'object'
my $results = $indexer->search_index({
    search_type => 'all',
    terms       => [ 'ulysses', 'grant' ],
});
foreach my $result ( @{ $results } ) {
    my $object = $result->[0];
    my $score  = $result->[1];
    print "Object ", ref( $object ), " with ID ", $object->id, " ",
          "was found with a score of $score\n";
}
# Search the index with different return types
# return type of 'iterator' returns OpenInteract2::FullTextIterator
my $results = $indexer->search_index({
    search_type => 'all',
    terms       => [ 'ulysses', 'grant' ],
    return_type => 'iterator',
});
while ( my $object = $results->get_next ) {
    print "Object ", ref( $object ), " with ID ", $object->id, " ",
          "was found\n";
}
# get additional information from iterator...
while ( my ( $object, $item_num, $score ) = $results->get_next ) {
    print "Object $item_num is a ", ref( $object ), " with ID ",
          $object->id, " and a score of $score\n";
}
# return type of 'raw' returns arrayref of arrayrefs
my $results = $indexer->search_index({
    search_type => 'all',
    terms       => [ 'ulysses', 'grant' ],
    return_type => 'raw',
});
foreach my $result ( @{ $results } ) {
    my ( $class, $id, $full_score, $score_info ) = @{ $result };
    print "Object $class with ID $id was found with total score ",
          "$full_score and individual term scores:\n";
    foreach my $term ( keys %{ $score_info } ) {
        print " * $term: $score_info->{$term}\n";
    }
}
DESCRIPTION
This is the base class for full-text indexers in OpenInteract2. All objects returned by the OpenInteract2::Context method fulltext_indexer() will meet this interface.
METHODS
Public Interface
new( \%params )
Instantiates a new indexer with parameters \%params.
You should not call this directly but instead get an indexer from the OpenInteract2::Context object:
# get the default indexer
my $indexer = CTX->fulltext_indexer;
# get a specific indexer
my $indexer = CTX->fulltext_indexer( 'soundex' );
add_to_index( $content_class, $content_id, \$content_text )
Indexes the text in the scalar reference \$content_text, categorizing it with $content_class and $content_id. The text in \$content_text is not modified by this operation.
While $content_class is typically an SPOPS subclass, it does not have to be. The class merely has to be able to retrieve, identify and describe an object. To do this it must implement:
Class method: fetch( $id )
Returns an object with identifier $id.
Object method: id()
Returns the identifier for an object.
Object method: object_description()
Should return a hashref with the keys as described in SPOPS under object_description().
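As a rough sketch of the three requirements above, here is a non-SPOPS content class. The package name My::Note, the in-memory %STORE and the URL are all hypothetical, and the keys returned by object_description() are illustrative only; the SPOPS documentation has the authoritative list.

```perl
package My::Note;   # hypothetical class name, not part of OI2
use strict;
use warnings;

# In-memory store standing in for a real database (assumption).
my %STORE = (
    1 => { id => 1, title => 'First note' },
);

# Class method: return the object with identifier $id, or undef.
sub fetch {
    my ( $class, $id ) = @_;
    my $data = $STORE{ $id };
    return undef unless ( $data );
    return bless { %{ $data } }, $class;
}

# Object method: return this object's identifier.
sub id { return $_[0]->{id} }

# Object method: describe the object. These keys are illustrative;
# see the SPOPS documentation for the full set.
sub object_description {
    my ( $self ) = @_;
    return {
        class     => ref $self,
        object_id => $self->id,
        title     => $self->{title},
        url       => '/note/show/?note_id=' . $self->id,
    };
}

package main;

my $note = My::Note->fetch( 1 );
print $note->object_description->{title}, "\n";
```

With a class like this, an indexer can store 'My::Note' and the ID at index time and call fetch() later to turn a search hit back into an object.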
refresh_index( $content_class, $content_id, \$content_ref )
Removes existing records from the index marked by $content_class and $content_id, then indexes \$content_ref.
remove_from_index( $content_class, $content_id )
Deletes all records from the index marked by $content_class and $content_id.
search_index( \%params )
Searches the index given the data in \%params:
terms (\@)
Arrayref of terms to search for.
search_type ($): 'all' (default) or 'any'
Determines if matching records must have all or any of the given terms.
return_type ($): 'object' (default), 'iterator' or 'raw'
Determines what type of data to return.
Using 'object' means you get back an arrayref of two-item arrayrefs -- the first is the object, the second the match score.
Using 'iterator' means you get back an OpenInteract2::FullTextIterator object.
Using 'raw' means you get back an arrayref of four-item arrayrefs: the first item is the class, the second the ID, the third the full score for this match, and the fourth a hashref of match scores whose keys are the terms searched and whose values are the match score for each term. (Generally this is just a count of the number of occurrences, but implementations are free to do whatever they want.)
SUBCLASSING
Optional Methods
In addition to overriding the interface method search_index(), subclasses can implement:
init( \%params )
Gives you a chance to set values from \%params in the object.
No return value necessary.
_screen_results( $search_type, $results, @search_terms )
Remove any records from $results -- which is the return value from _run_search(), below -- that do not correspond to $search_type. The default implementation only acts when given a $search_type of 'all', removing records that do not have matches for all the @search_terms.
Return value should be an arrayref of the new results.
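The screening behavior described above can be sketched as a standalone function. This is not the shipped implementation: screen_results() is a hypothetical name, and the sketch assumes each record has the four-item raw shape, with the per-term scores in the fourth slot.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _screen_results(); not the shipped code.
# Each record is [ $class, $id, $full_score, { term => score, ... } ].
sub screen_results {
    my ( $search_type, $results, @search_terms ) = @_;

    # Only 'all' searches need screening; pass everything else through.
    return $results unless ( $search_type eq 'all' );

    my @kept;
    foreach my $record ( @{ $results } ) {
        my $term_scores = $record->[3];
        # Collect the search terms this record has no score for.
        my @missing = grep { ! $term_scores->{ $_ } } @search_terms;
        push @kept, $record unless ( @missing );
    }
    return \@kept;
}

my $raw = [
    [ 'My::Note', 1, 3, { ulysses => 2, grant => 1 } ],
    [ 'My::Note', 2, 1, { ulysses => 1 } ],
];
my $screened = screen_results( 'all', $raw, 'ulysses', 'grant' );
print scalar( @{ $screened } ), " record(s) kept\n";
```

Here the second record is dropped because it has no score for 'grant'. With a $search_type of 'any', both records would be returned unchanged.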
Mandatory Methods
Subclasses must implement:
add_to_index( $content_class, $content_id, \$content_ref )
remove_from_index( $content_class, $content_id )
_run_search( $search_type, @search_terms)
The $search_type is either 'any' or 'all'. This should return only an arrayref of records, each shaped like this:
[ $class, $id, full-score, { search-term => term-score, ... } ]
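To make the record shape concrete, here is a minimal sketch of a _run_search()-style function over an in-memory index. Everything in it is an assumption made for illustration: the %INDEX layout, the run_search() name, using occurrence counts as term scores, summing them for the full score, and sorting by that score. A real subclass would inherit from OpenInteract2::FullTextIndexer and query its own storage.

```perl
use strict;
use warnings;

# Hypothetical in-memory index: term => list of [ class, id, count ].
my %INDEX = (
    ulysses => [ [ 'My::Note', 1, 2 ], [ 'My::Note', 2, 1 ] ],
    grant   => [ [ 'My::Note', 1, 1 ] ],
);

# Returns an arrayref of [ $class, $id, $full_score, { term => score } ].
# $search_type is accepted but unused in this sketch: every record that
# matches at least one term is returned, and no 'all' filtering is done.
sub run_search {
    my ( $search_type, @search_terms ) = @_;
    my %found;   # "class|id" => record under construction
    foreach my $term ( @search_terms ) {
        foreach my $hit ( @{ $INDEX{ $term } || [] } ) {
            my ( $class, $id, $count ) = @{ $hit };
            my $record = $found{ "$class|$id" } ||= [ $class, $id, 0, {} ];
            $record->[2]          += $count;   # running full score
            $record->[3]{ $term }  = $count;   # per-term score
        }
    }
    # Sorting is a choice made here: highest full score first, then by ID.
    return [ sort { $b->[2] <=> $a->[2] || $a->[1] <=> $b->[1] }
             values %found ];
}

my $records = run_search( 'any', 'ulysses', 'grant' );
foreach my $record ( @{ $records } ) {
    my ( $class, $id, $full_score ) = @{ $record };
    print "$class $id scored $full_score\n";
}
```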
SEE ALSO
OpenInteract2::FullTextIterator
The 'full_text' package shipped with OI2.
COPYRIGHT
Copyright (c) 2004 Chris Winters. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
AUTHORS
Chris Winters <chris@cwinters.com>