NAME
OpenInteract2::FullTextRules - Rules for automatically indexing SPOPS objects
SYNOPSIS
# In the object's spops.ini file, tell OI2 you want your objects to be
# indexed; with this, every 'save()' call on the object triggers
# indexing of the object's 'description' and 'title' fields.
[myobj]
is_searchable = yes
fulltext_field = description
fulltext_field = title
METHODS
SPOPS Ruleset
ruleset_add( $class, \%ruleset_table )
Adds the necessary rules to $class, the class that has this class in its @ISA. Currently, these rules are:
post_save_action: reindex this object -- first obliterate all references in the index, then build the references anew (called on both INSERTs and UPDATEs)
post_remove_action: remove all references to this object from the index
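As a rough illustration of how an SPOPS ruleset class typically registers such rules, the sketch below pushes code references onto the action lists in the ruleset table. The package name My::IndexRules, the class name My::Object, and the handlers _reindex_object and _remove_object_from_index are hypothetical placeholders, not this module's actual internals.

```perl
package My::IndexRules;    # hypothetical name, for illustration only

use strict;
use warnings;

# Each key in the ruleset table holds a list of code references that
# run for that action; we append one handler per action.
sub ruleset_add {
    my ( $class, $rs_table ) = @_;
    push @{ $rs_table->{post_save_action} },   \&_reindex_object;
    push @{ $rs_table->{post_remove_action} }, \&_remove_object_from_index;
    return __PACKAGE__;
}

# Placeholder handlers: a real rule would update the index here and
# return a true value to signal success.
sub _reindex_object           { my ( $object ) = @_; return 1 }
sub _remove_object_from_index { my ( $object ) = @_; return 1 }

package main;

# 'My::Object' stands in for the class that lists the ruleset class
# in its @ISA.
my %rules = ();
My::IndexRules::ruleset_add( 'My::Object', \%rules );
printf "%s: %d rule(s)\n", $_, scalar @{ $rules{ $_ } } for sort keys %rules;
```

Because both handlers are plain code references, other ruleset classes can add their own entries to the same lists without interfering with these.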
Internal
_indexable_object_text()
Gets the text out of the object to index. Currently, we treat all text from the object as one big field.
Note that if you have defined 'fulltext_pre_index_method' as a configuration item in your class, that method is called before indexing. This is useful if you have a method that fetches external data into your object.
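The idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the class My::Doc, its load_summary method, the helper indexable_text and the shape of the %config hash are all invented for the example. Only the configuration key names come from this documentation.

```perl
use strict;
use warnings;

# Hypothetical object class with two text fields and a pre-index hook.
package My::Doc;

sub new {
    my ( $class, %fields ) = @_;
    my $self = { %fields };
    return bless $self, $class;
}

# Pretends to fetch external data into the object before indexing.
sub load_summary {
    my ( $self ) = @_;
    $self->{description} .= ' (external summary)';
    return;
}

package main;

# Assumed shape of the per-class configuration; the values are made up.
my %config = (
    fulltext_field            => [ 'title', 'description' ],
    fulltext_pre_index_method => 'load_summary',
);

sub indexable_text {
    my ( $object, $conf ) = @_;
    # Run the optional pre-index hook first so fetched data gets indexed.
    if ( my $method = $conf->{fulltext_pre_index_method} ) {
        $object->$method();
    }
    # Treat all configured fields as one big text field.
    return join ' ',
           grep { defined $_ && length $_ }
           map  { $object->{ $_ } } @{ $conf->{fulltext_field} };
}

my $doc = My::Doc->new( title => 'Boating', description => 'Plants by the lake' );
print indexable_text( $doc, \%config ), "\n";
```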
_tokenize( $text )
Breaks text down into tokens. This process is very simple. First we break the text into words, then we lower case each word, then we 'stem' each word. Here is a brief description of stemming:
Truncation - Also referred to as "root/suffix management" or
"Stemming" or "Word Stemming", truncation allows some search engines
to recognize and shorten long words such as "plants" or "boating" to
their root words (or word stems) "plant" and "boat." This makes
searching for such words much easier because it is not necessary to
consider every permutation of that word when trying to find it. In a
search, the ability to enter the first part of a keyword, insert a
symbol (usually *), and accept any variant spellings or word endings,
from the occurrence of the symbol forward (e.g., femini* retrieves
feminine, feminism, etc.). See also word variants, plurals
and singulars.
(From: http://ollie.dcccd.edu/library/Module2/Books/concepts.htm)
We use the Lingua::Stem module for this, which implements the Porter algorithm for stemming, as most implementations apparently do. (This class treats the stemming itself as a black box :)
Parameters:
text ($)
Text to tokenize
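The three steps (split into words, lower-case, stem) can be sketched as below. To keep the example free of non-core dependencies it swaps in a crude suffix stripper, crude_stem, in place of Lingua::Stem. That stand-in is NOT the Porter algorithm, and both function names are hypothetical.

```perl
use strict;
use warnings;

# Stand-in stemmer: strips a few common English suffixes from longer
# words. It only approximates what a real Porter stemmer does.
sub crude_stem {
    my ( $word ) = @_;
    $word =~ s/(?:ing|ed|s)$// if length( $word ) > 4;
    return $word;
}

# Break the text into words, lower-case each word, then stem it.
sub tokenize {
    my ( $text ) = @_;
    return map { crude_stem( lc $_ ) } $text =~ /(\w+)/g;
}

my @tokens = tokenize( 'Plants and boating: the plant, the BOAT.' );
print join( ' ', @tokens ), "\n";
# prints "plant and boat the plant the boat"
```

With stemming applied on both the indexing and the search side, a search for "plant" matches text that contains "Plants".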
SEE ALSO
OpenInteract2::FullTextIndexer in the 'full_text' package
COPYRIGHT
Copyright (c) 2004 Chris Winters. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
AUTHORS
Chris Winters <chris@cwinters.com>