NAME

KinoSearch::Schema - User-created specification for an inverted index.

SYNOPSIS

First, create a subclass of KinoSearch::Schema which describes the structure of your inverted index.

package MySchema;
use base qw( KinoSearch::Schema );
use KinoSearch::Analysis::PolyAnalyzer;

our %fields = (
    title   => 'text',
    content => 'text',
);

sub analyzer { 
    return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
}

Use the subclass in an indexing script...

use MySchema;
use KinoSearch::InvIndexer;

my $invindexer = KinoSearch::InvIndexer->new(
    invindex => MySchema->clobber('/path/to/invindex'),
);

Use it again at search-time...

use MySchema;
use KinoSearch::Searcher;

my $searcher = KinoSearch::Searcher->new(
    invindex => MySchema->read('/path/to/invindex'),
);

DESCRIPTION

A Schema is a blueprint specifying how other entities should interpret the raw data in an inverted index and interact with it. It's akin to an SQL table definition, but implemented using only Perl code.

Subclassing

KinoSearch::Schema is an abstract class. To use it, you must provide your own subclass.

Every Schema subclass must meet two requirements: it must declare a %fields hash, and it must provide an implementation of analyzer().

Always use the same Schema

The same Schema must always be used with any given invindex. If you tell an InvIndexer to build an invindex using a given Schema, then lie about what the InvIndexer did by supplying your Searcher with either a modified version or a completely different Schema, you'll either get incorrect results or a crash.

Once an actual index has been created using a particular Schema, existing fields may not be associated with new FieldSpec subclasses and their definitions may not be changed. However, it is possible to add new fields during subsequent indexing sessions.

CLASS VARIABLES

%fields

Every Schema subclass must declare a %fields hash using our (not my). Each key in the hash is a field name, and each value must be either

1. a natively supported type, or
2. a class name identifying a class which isa KinoSearch::FieldSpec.

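At present, there is only one natively supported type: text. Its behavior is determined by the FieldSpec subclass KinoSearch::FieldSpec::text. All other lower-case-only names are reserved, so a custom FieldSpec subclass needs a name containing at least one upper-case letter, as in this example: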

package UnAnalyzedField;
use base qw( KinoSearch::FieldSpec::text );
sub analyzed { 0 }

package MySchema;
use base qw( KinoSearch::Schema );

our %fields = (
    title   => 'text',
    content => 'text',
    url     => 'UnAnalyzedField',
);

new() uses the contents of %fields as a base set when initializing each new Schema object. Additional fields may be added subsequently to individual objects using add_field().
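The requirement to use our rather than my, and the "base set" behavior of new(), can be pictured with a small self-contained model. This is only a sketch: ToySchema, ToyGood, ToyBad and field_names() are hypothetical names invented for illustration, and the symbol-table lookup is an assumption about one way a base class could find a subclass's %fields, not a description of KinoSearch's internals.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a schema base class. NOT KinoSearch::Schema.
    package ToySchema;

    sub new {
        my $class = shift;
        # Look up %fields in the subclass's package symbol table. Only
        # variables declared with "our" live there; a "my" hash is
        # invisible to this lookup.
        my %base = do { no strict 'refs'; %{ $class . '::fields' } };
        # Copy the base set so each object can grow independently.
        return bless { fields => {%base} }, $class;
    }

    sub add_field {
        my ( $self, $name, $type ) = @_;
        $self->{fields}{$name} = $type;
        return;
    }

    sub field_names {
        my ($self) = @_;
        return sort keys %{ $self->{fields} };
    }
}

{
    package ToyGood;
    use parent -norequire, 'ToySchema';
    our %fields = ( title => 'text', content => 'text' );
}

{
    package ToyBad;
    use parent -norequire, 'ToySchema';
    my %fields = ( title => 'text' );    # lexical: never seen by new()
}

my $first  = ToyGood->new;
my $second = ToyGood->new;
$first->add_field( url => 'text' );      # affects $first only

my @first_names  = $first->field_names;
my @second_names = $second->field_names;
my @bad_names    = ToyBad->new->field_names;
```

In this model, $first ends up with three fields, $second keeps only the two declared at class level, and the ToyBad object has none, because its lexical %fields is never visible to the base class.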

CLASS METHODS

analyzer

sub analyzer {
    return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
}

Abstract method. Implementations must return an object which isa KinoSearch::Analysis::Analyzer, which will be used to parse and process field content. Individual fields can override this default by providing their own analyzer().

similarity

use KSx::Search::LongFieldSim;
sub similarity { KSx::Search::LongFieldSim->new }

Expert API. By default, returns a KinoSearch::Search::Similarity object. If you wish to change scoring behavior by supplying your own subclass of Similarity, override this method.

pre_sort

sub pre_sort {
    my %spec = ( field => 'price', reverse => 1 );
    return \%spec;
}

Expert, experimental API. Used only in conjunction with Searcher->set_prune_factor. Causes documents to be prioritized for scoring according to their value for the specified field. Ordinarily all documents are scored, so the sort order is immaterial; but if scoring stops early -- that is, when search results are "pruned" -- the sort order determines which documents get scored.

CONSTRUCTOR

new

my $schema = MySchema->new;
my $folder = KinoSearch::Store::RAMFolder->new;
my $invindex = KinoSearch::InvIndex->clobber(
    schema => $schema,
    folder => $folder,
);

new() returns an instance of your schema subclass.

Most of the time, you won't need to call new() explicitly, as it is called internally by the factory methods described below.

FACTORY METHODS

A Schema is just a blueprint, so it's not very useful on its own. What you need is an InvIndex built according to your Schema, whose content you can manipulate and search.

The following factory methods return an InvIndex object representing an index on your file system at the filepath you specify. If they are invoked as instance methods by a Schema object, they use that object; if they are invoked as class methods, a new Schema instance is created.
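The class-or-instance dispatch described above follows a common Perl idiom, sketched below. ToyBlueprint and open_at() are hypothetical names, the returned hash merely stands in for an InvIndex, and nothing here touches the file system or claims to mirror KinoSearch's actual implementation.

```perl
use strict;
use warnings;

{
    # Hypothetical class for illustration; not part of KinoSearch.
    package ToyBlueprint;

    sub new {
        my $class = shift;
        return bless { fields => {} }, $class;
    }

    # Callable as ToyBlueprint->open_at($path) or $object->open_at($path).
    sub open_at {
        my ( $either, $path ) = @_;
        # ref() is true for an object and false for a plain class name.
        my $schema = ref($either) ? $either : $either->new;
        return { schema => $schema, path => $path };
    }
}

my $existing   = ToyBlueprint->new;
my $from_obj   = $existing->open_at('/path/to/toy');       # reuses $existing
my $from_class = ToyBlueprint->open_at('/path/to/toy');    # builds a fresh object
```

The practical consequence is that fields added to an individual object with add_field() are only carried along when the factory method is invoked on that object.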

clobber

my $invindex = MySchema->clobber('/path/to/invindex');
my $invindex = $schema->clobber('/path/to/invindex');

Create a directory and initialize a new invindex at the specified location. If the specified directory already exists, first attempts to delete any files within it that look like index files.

open

my $invindex = MySchema->open('/path/to/invindex');
my $invindex = $schema->open('/path/to/invindex');

Open an invindex for reading/writing, creating a new one if needed. All fields which have ever been defined for this invindex will be loaded/verified via add_field().

read

my $invindex = MySchema->read('/path/to/invindex');
my $invindex = $schema->read('/path/to/invindex');

Open an invindex for either reading or updating. Fails if the invindex doesn't exist. All fields which have ever been defined for this invindex will be loaded/verified via add_field().

INSTANCE METHODS

add_field

$schema->add_field( foo => 'text' );

Add a field to an individual schema object.

Calling add_field() multiple times with the same field name is fine, provided the native type name or FieldSpec subclass is the same each time; otherwise an exception will be thrown.
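The repeat-call rule can be modeled in a few lines. ToyFieldSet and its error message are hypothetical; the sketch only illustrates the documented contract (same name plus same type is accepted, same name plus a different type throws) and makes no claim about the wording or mechanism of the real exception.

```perl
use strict;
use warnings;

{
    # Hypothetical class for illustration; not part of KinoSearch.
    package ToyFieldSet;

    sub new {
        my $class = shift;
        return bless { fields => {} }, $class;
    }

    sub add_field {
        my ( $self, $name, $type ) = @_;
        my $existing = $self->{fields}{$name};
        # Re-adding is harmless only if the type is unchanged.
        die "Field '$name' already defined as '$existing', not '$type'\n"
            if defined $existing && $existing ne $type;
        $self->{fields}{$name} = $type;
        return;
    }
}

my $set = ToyFieldSet->new;
$set->add_field( foo => 'text' );
$set->add_field( foo => 'text' );    # same type again: accepted

# A conflicting type for the same name throws; trap it with eval.
my $ok    = eval { $set->add_field( foo => 'UnAnalyzedField' ); 1 };
my $error = $@;
```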

COPYRIGHT

Copyright 2007 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.20.