
NAME

KSx::Simple - Basic search engine.

SYNOPSIS

First, build an index of your documents.

my $index = KSx::Simple->new(
    path     => '/path/to/index/',
    language => 'en',
);
while ( my ( $title, $content ) = each %source_docs ) {
    $index->add_doc({
        title   => $title,
        content => $content,
    });
}

Later, search the index.

my $total_hits = $index->search(
    query      => $query_string,
    offset     => 0,
    num_wanted => 10,
);
print "Total hits: $total_hits\n";
while ( my $hit = $index->next ) {
    print "$hit->{title}\n";
}

DESCRIPTION

KSx::Simple is a stripped-down interface for the KinoSearch search engine library.

METHODS

new

my $index = KSx::Simple->new(
    path     => '/path/to/index/',
    language => 'en',
);

Create a KSx::Simple object, which can be used for both indexing and searching. Two hash-style parameters are required.

  • path - Where the index directory should be located. If no index is found at the specified location, one will be created.

  • language - The language of the documents in your collection, indicated by a two-letter ISO code. 12 languages are supported:

    |-----------------------|
    | Language   | ISO code |
    |-----------------------|
    | Danish     | da       |
    | Dutch      | nl       |
    | English    | en       |
    | Finnish    | fi       |
    | French     | fr       |
    | German     | de       |
    | Italian    | it       |
    | Norwegian  | no       |
    | Portuguese | pt       |
    | Spanish    | es       |
    | Swedish    | sv       |
    | Russian    | ru       |
    |-----------------------|
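
As a minimal sketch, an application could check a language code against the table above before calling new(). The helper name is_supported_language and the %LANGUAGE_FOR hash are hypothetical and not part of KSx::Simple; whether new() performs its own validation is not stated here, so treat this as an optional pre-flight check.

```perl
use strict;
use warnings;

# ISO codes copied from the table of supported languages.
my %LANGUAGE_FOR = (
    da => 'Danish',
    nl => 'Dutch',
    en => 'English',
    fi => 'Finnish',
    fr => 'French',
    de => 'German',
    it => 'Italian',
    no => 'Norwegian',
    pt => 'Portuguese',
    es => 'Spanish',
    sv => 'Swedish',
    ru => 'Russian',
);

# Hypothetical helper (not provided by KSx::Simple): returns 1 if the
# code appears in the table, 0 otherwise. Matching is exact, so 'EN'
# is rejected.
sub is_supported_language {
    my ($code) = @_;
    return ( defined($code) && exists( $LANGUAGE_FOR{$code} ) ) ? 1 : 0;
}

for my $code (qw( en ru xx )) {
    printf "%s: %s\n", $code,
        is_supported_language($code) ? $LANGUAGE_FOR{$code} : 'not supported';
}
```

Keeping the codes in one hash also gives a human-readable name for error messages or a language picker.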

add_doc

$index->add_doc({
    location => $url,
    title    => $title,
    content  => $content,
});

Add a document to the index. The document must be supplied as a hashref, with field names as keys and content as values.
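
The hashref requirement can be checked before a document is handed to add_doc. The following is only a sketch: check_doc is a hypothetical helper, not part of KSx::Simple, and the rule that each value must be a defined, non-reference scalar is an assumption about what "content" means here.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check (not provided by KSx::Simple).
# Returns an error message, or undef if the document looks usable.
sub check_doc {
    my ($doc) = @_;
    return "document must be a hashref" unless ref($doc) eq 'HASH';
    for my $field ( sort keys %$doc ) {
        # Assumption: field content is a defined, plain (non-reference) string.
        return "field '$field' has no content"
            unless defined $doc->{$field};
        return "field '$field' must be a plain string"
            if ref $doc->{$field};
    }
    return;
}

my $doc = { title => 'Example', content => 'Some text to index.' };
my $error = check_doc($doc);
print defined $error ? "rejected: $error\n" : "ok to add\n";
```

A caller would then pass $doc to $index->add_doc($doc) only when check_doc returns undef.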

search

my $total_hits = $index->search(
    query      => $query_string,    # required
    offset     => 40,               # default 0
    num_wanted => 20,               # default 10
);

Search the index. Returns the total number of documents which match the query. (This number is unlikely to match num_wanted.)

  • query - A search query string.

  • offset - The number of most-relevant hits to discard, typically used when "paging" through hits N at a time. Setting offset to 20 and num_wanted to 10 retrieves hits 21-30, assuming that 30 hits can be found.

  • num_wanted - The number of hits you would like to see after offset is taken into account.
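
The relationship between offset and num_wanted when paging can be sketched as follows. The helper paging_args is hypothetical (it is not part of KSx::Simple) and assumes 1-based page numbers; it only computes the two arguments described above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a 1-based page number into the offset and
# num_wanted arguments accepted by search().
sub paging_args {
    my ( $page, $per_page ) = @_;
    $per_page = 10 unless defined $per_page;    # same as the num_wanted default
    die "page must be 1 or greater\n" if $page < 1;
    return (
        offset     => ( $page - 1 ) * $per_page,    # hits to skip
        num_wanted => $per_page,                    # hits to fetch
    );
}

# Page 3 at 10 hits per page skips 20 hits and fetches hits 21-30.
my %args = paging_args( 3, 10 );
printf "offset=%d num_wanted=%d\n", @args{qw( offset num_wanted )};
# prints "offset=20 num_wanted=10"
```

The returned list can be passed straight through, for example $index->search( query => $query_string, paging_args( 3, 10 ) ).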

BUGS

Not thread-safe.

COPYRIGHT AND LICENSE

Copyright 2007-2011 Marvin Humphrey

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.