NAME

KinoSearch::Schema -- User-created specification for an inverted index.

SYNOPSIS

First, create a subclass of KinoSearch::Schema which describes the structure of your inverted index.

# define fields by subclassing KinoSearch::Schema::FieldSpec

package MySchema::title;
use base qw( KinoSearch::Schema::FieldSpec );

package MySchema::content;
use base qw( KinoSearch::Schema::FieldSpec );

# subclass KinoSearch::Schema to finish your specification

package MySchema;
use base qw( KinoSearch::Schema );
use KinoSearch::Analysis::PolyAnalyzer;

__PACKAGE__->init_fields(qw( title content ));

sub analyzer { 
    return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
}

Use the subclass in an indexing script...

use MySchema;
use KinoSearch::InvIndexer;
my $invindexer = KinoSearch::InvIndexer->new( 
    invindex => MySchema->clobber('/path/to/invindex'),
);

Use it again at search-time...

use MySchema;
use KinoSearch::Searcher;
my $searcher = KinoSearch::Searcher->new( 
    invindex => MySchema->open('/path/to/invindex'),
);

DESCRIPTION

A Schema is a blueprint specifying how other entities should interpret the raw data in an invindex and interact with it. It's akin to an SQL table definition, but implemented using only Perl code.

Subclassing

KinoSearch::Schema is an abstract class. To use it, you must provide your own subclass.

Every Schema subclass must meet two requirements. It must call init_fields(), and it must provide an implementation of analyzer().
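A minimal sketch of how you might confirm that a subclass really supplies its own analyzer() rather than inheriting the abstract one. It runs without KinoSearch: Sketch::Schema is a stand-in for KinoSearch::Schema, and overrides() is a hypothetical helper, not part of the library. It does not check the other requirement, the call to init_fields().

```perl
use strict;
use warnings;

# Stand-in for KinoSearch::Schema; the real base class is not loaded here.
package Sketch::Schema;
sub analyzer { die "analyzer() is abstract\n" }

# A subclass that provides its own analyzer().
package GoodSchema;
our @ISA = ('Sketch::Schema');
sub analyzer { return 'an analyzer object would go here' }

# A subclass that forgot to provide one.
package LazySchema;
our @ISA = ('Sketch::Schema');

package main;

# Hypothetical helper: true if $class resolves $method to different code
# than $base does, i.e. the method has been overridden somewhere below $base.
sub overrides {
    my ( $class, $base, $method ) = @_;
    my $found = $class->can($method) or return 0;
    return $found != $base->can($method) ? 1 : 0;
}

for my $class (qw( GoodSchema LazySchema )) {
    printf "%s %s analyzer()\n", $class,
        overrides( $class, 'Sketch::Schema', 'analyzer' )
        ? 'overrides'
        : 'inherits the abstract';
}
```

Comparing the code references returned by can() is what distinguishes a real override from an inherited method, since can() alone returns true in both cases.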

Always use the same Schema

The same Schema must always be used with any given invindex. If you tell an InvIndexer to build an invindex using a given Schema, then lie about what the InvIndexer did by supplying your Searcher with either a modified version or a completely different Schema, you'll either get incorrect results or a crash.

Once an actual index has been created using a particular Schema, existing fields may not be removed and their definitions may not be changed. However, it is possible to add new fields during subsequent indexing sessions.
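The rule about field lists can be sketched in plain Perl. The helper below, removed_fields(), is hypothetical and not part of KinoSearch; it only compares field names, so it cannot detect a field whose definition has changed.

```perl
use strict;
use warnings;

# Hypothetical helper: list the fields present in the original schema but
# missing from the revised one. An empty list means the revision only
# adds fields.
sub removed_fields {
    my ( $old_fields, $new_fields ) = @_;
    my %still_present = map { $_ => 1 } @$new_fields;
    return grep { !$still_present{$_} } @$old_fields;
}

my @indexed_with = qw( title content );
my @ok_revision  = qw( title content category );    # adds a field
my @bad_revision = qw( title category );            # drops "content"

for my $revision ( \@ok_revision, \@bad_revision ) {
    my @removed = removed_fields( \@indexed_with, $revision );
    print join( ' ', @$revision ), ': ',
        ( @removed ? "incompatible, removes @removed" : 'compatible' ),
        "\n";
}
```

In a real Schema, the compatible revision would correspond to loading one more FieldSpec subclass and passing its name to init_fields() alongside the existing ones.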

CLASS METHODS

init_fields

package MySchema;
__PACKAGE__->init_fields(qw( title content ));

Takes a list of field names as arguments. For each field name, KinoSearch verifies that a corresponding subclass of KinoSearch::Schema::FieldSpec has been loaded and registers it with the Schema subclass.

The FieldSpec subclass names are derived by combining the Schema's class name with the field name -- for instance, in the above example they would be named "MySchema::title" and "MySchema::content".
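The naming convention can be illustrated without loading KinoSearch. In this sketch, Sketch::FieldSpec stands in for KinoSearch::Schema::FieldSpec, and field_spec_class() and check_field() are hypothetical helpers that mimic the lookup described above; the library's actual internals may differ.

```perl
use strict;
use warnings;

# Stand-in for KinoSearch::Schema::FieldSpec so the sketch runs on its own.
package Sketch::FieldSpec;
sub new { return bless {}, shift }

package MySchema::title;
our @ISA = ('Sketch::FieldSpec');

package MySchema::content;
our @ISA = ('Sketch::FieldSpec');

package main;

# Hypothetical helper: combine the Schema class name with the field name.
sub field_spec_class {
    my ( $schema_class, $field_name ) = @_;
    return join '::', $schema_class, $field_name;
}

# Hypothetical helper: die unless the derived class exists and inherits
# from the expected base class.
sub check_field {
    my ( $schema_class, $field_name, $base ) = @_;
    my $class = field_spec_class( $schema_class, $field_name );
    die "No FieldSpec subclass '$class' for field '$field_name'\n"
        unless UNIVERSAL::isa( $class, $base );
    return $class;
}

# Prints "MySchema::title" and "MySchema::content", one per line.
for my $field (qw( title content )) {
    print check_field( 'MySchema', $field, 'Sketch::FieldSpec' ), "\n";
}
```

A field name with no matching package, such as "author" here, would make check_field() die, which is roughly the failure you should expect if you list a field before defining its FieldSpec subclass.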

analyzer

sub analyzer {
    return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
}

Abstract method. Implementations must return an object which isa KinoSearch::Analysis::Analyzer; that object will be used to parse and process field content. Individual fields can override this default by providing their own analyzer().

similarity

sub similarity { KinoSearch::Contrib::LongFieldSim->new }

By default, returns a KinoSearch::Search::Similarity object. If you wish to change scoring behavior by supplying your own subclass of Similarity, override this method.

CONSTRUCTOR

new

my $schema = MySchema->new;
my $folder = KinoSearch::RAMFolder->new;
my $invindex = KinoSearch::InvIndex->create(
    schema => $schema,
    folder => $folder,
);

new() returns an instance of your schema subclass.

Most of the time, you won't need to call new() explicitly, as it is called internally by the factory methods described below.

FACTORY METHODS

A Schema is just a blueprint, so it's not very useful on its own. What you need is an InvIndex built according to your Schema, whose content you can manipulate and search.

Each of these three factory methods returns an InvIndex object representing an index on your file system at the filepath you specify.

create

my $invindex = MySchema->create('/path/to/invindex');

Create a directory and initialize a new invindex at the specified location. Fails if the directory already exists and contains files.

clobber

my $invindex = MySchema->clobber('/path/to/invindex');

Similar to create, but if the specified directory already exists, first attempts to delete any files within it that look like index files.

open

my $invindex = MySchema->open('/path/to/invindex');

Open an existing invindex for either reading or updating.
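One way to choose between the factory methods is to look at the target directory first. The sketch below is plain Perl and does not call KinoSearch; factory_for() is a hypothetical helper, and it assumes that any non-empty directory at the given path holds an invindex built with the same Schema.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a factory method name from the directory state.
# Assumption: a non-empty directory already contains an invindex that was
# built with this Schema.
sub factory_for {
    my ($path) = @_;
    return 'create' unless -d $path;
    opendir( my $dh, $path ) or die "Can't opendir '$path': $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @entries ? 'open' : 'create';
}

my $path   = '/path/to/invindex';
my $method = factory_for($path);
print "Would call MySchema->$method('$path')\n";

# With KinoSearch and MySchema loaded, the call itself would be:
#   my $invindex = MySchema->$method($path);
```

The helper never returns 'clobber', since deleting existing index files is a decision better made explicitly than inferred from the directory contents.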

COPYRIGHT

Copyright 2007 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.20_01.