NAME
XAO::Indexer -- Full text data indexing for XAO::FS
SYNOPSIS
my $keywords=$cgi->param('keywords');
my $cn_index=$odb->fetch('/Indexes/customer_names');
my $sr=$cn_index->search_by_string('name',$keywords);
DESCRIPTION
XAO Indexer allows to build an optimised external index to collections of data stored in a XAO::FS database and then perform keyword based searches.
It is being used with great success on collection of millions of records on some sites, probably most notably on http://ISBNdb.com/ where it powers all the searches.
PROBLEM & SOLUTION
Searches are limited to just keywords, but allow to find many keywords in a specific sequence or just many keywords that belong to a specific collection, but could be in different properties of different objects.
To perform the same kind of search on just two properties of an object with two possible keywords a join similar to the following is required:
( (property1 match keyword1) and (property1 match keyword2) ) or
( (property1 match keyword1) and (property2 match keyword2) ) or
( (property2 match keyword1) and (property2 match keyword2) ) or
( (property2 match keyword1) and (property1 match keyword2) )
With bigger number of keywords and properties the expression becomes too big to be efficiently handled by SQL server and in some cases probably to be even parsed normally by an SQL server.
In addition, such keyword searches are not optimised in SQL databases usually and frequently involve full table scans.
XAO Indexer solves this problem by pre-building a specially formatted index table that has results for specific keywords. As an additional benefit it allows to get results pre-sorted using some (possibly computed) criteria without any performance impact.
It needs to be mentioned though, that XAO Indexer is not integrated with the collection it builds index for in any way. It has to be maintained and updated manually and can return IDs of objects that no longer exist in the database.
The process of re-building indexes can take significant time depending on the content of source collection. In our tests it takes approximately 5 minutes to build an index based on 60,000 records 5..50 fields per record spread over 3 or more related objects (products, categories and specifications).
STRUCTURE
XAO::Indexer is a stub module that only holds common documentation that you are reading now. Real functionality is provided by:
- XAO::DO::Data::Index
-
This is a XAO FS Hash object that gets stored into some container in your database, usually /Indexes. It provides wrapper methods to all indexing functionality, see XAO::DO::Data::Index for details.
Most of the time you will interact with this object in your code. Something like:
my $keywords=$cgi->param('keywords'); my $cn_index=$odb->fetch('/Indexes/customer_names'); my $sr=$cn_index->search('name',$keywords);
- XAO::DO::Indexer::Base
-
This is the core of XAO Indexer -- a base class for derived data collection specific indexers. Usually it is enough to override just a couple of its methods -- analyze_object(), get_collection() and get_orderings(). See XAO::DO::Indexer::Base for details.
- xao-indexer script
-
Provides command-line functions to create, update and delete indexes. Provides also a simple search functionality intended for debugging purposes mainly.
AUTHORS
Copyright (c) 2005 Andrew Maltsev
Copyright (c) 2003-2004 Andrew Maltsev, XAO Inc.
<am@ejelta.com> -- http://ejelta.com/xao/
SEE ALSO
Recommended reading: XAO::DO::Data::Index, XAO::DO::Indexer::Base, XAO::FS, XAO::Web.