
NAME

TM::Index::Match - Topic Maps, Indexing support (match layer)

SYNOPSIS

    # somehow get a map (any subclass of TM will do)
    my $tm = ... 

    # one option: create a lazy index which learns as you go
    use TM::Index::Match;
    my $idx = new TM::Index::Match ($tm);
    
    # operations which end up calling match_forall (reading and
    # querying the map) should now be much faster

    # learn about some statistics, what keys are most likely to be useful
    my $stats          = $idx->statistics;
    my @optimized_keys = @{ $stats->{proposed_keys} };

    # another option: create an eager index
    # (a map can carry only one match index, so detach the lazy one first)
    $idx->detach;
    $idx = new TM::Index::Match ($tm, closed => 1);

    # pre-populate it, use the proposed keys
    $idx->populate (@optimized_keys);
    # this may be a lengthy operation if the map is big
    # but then the index is 'complete'

    # query map now, should also be faster

    # getting rid of an index explicitly
    $idx->detach;

    # cleaning an index
    $idx->discard;

DESCRIPTION

One performance bottleneck when using the TM package or any of its subclasses is the pair of low-level query functions match_forall and match_exists, which look for assertions of a certain nature. Almost all high-level functions, and certainly TM::QL, use them.

This package provides an indexing mechanism to speed up the match_* functions by caching some of their results in a very specific way. Once an index is attached to a map, it intercepts all queries going to these functions.

Open vs. Closed Index

There are two options:

open:

The default is to keep the index lazy. In this mode the index is empty at the start and learns more and more on its own as queries come in. In this sense the index lives under an open world assumption (hence the name): the absence of information in the index does not mean that there is no result.

closed:

A closed world index has to be populated to be useful. If a query is launched and its result is stored in the index, that result is used, just as with an open index. If the index holds no result for a query, the empty result is assumed and the map itself is not consulted.
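The difference between the two modes can be pictured with a toy model. The sketch below is not the module's implementation: slow_match, lookup and the string query keys are hypothetical names invented for illustration, and the real index keys and cached values may look quite different.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive match_forall-style scan of the map.
my %map   = ('type:person' => ['a1', 'a2'], 'type:city' => ['a3']);
my $scans = 0;
sub slow_match {
    my ($query) = @_;
    $scans++;
    return $map{$query} || [];
}

# Toy lookup: $closed selects closed or open world behaviour.
sub lookup {
    my ($cache, $closed, $query) = @_;
    return $cache->{$query} if exists $cache->{$query};   # cached result, both modes
    return [] if $closed;                                  # closed: absence means "no result"
    return $cache->{$query} = slow_match ($query);         # open: ask the map and learn
}

my %open_cache;
lookup (\%open_cache, 0, 'type:person') for 1 .. 3;       # only the first call scans the map
print "scans after three open lookups: $scans\n";         # prints 1

# "populate" a single key, then query in closed mode
my %closed_cache = ('type:person' => slow_match ('type:person'));
my $hit  = lookup (\%closed_cache, 1, 'type:person');
my $miss = lookup (\%closed_cache, 1, 'type:city');        # never populated
print "closed hit: ", scalar @$hit, ", closed miss: ", scalar @$miss, "\n";   # 2 and 0
```

Note how the closed lookup reports nothing for type:city although the toy map does contain a match: a closed index is only as complete as its population.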

Hash Technology

The default implementation uses a plain in-memory hash, nothing fancy. Optionally, you can provide your own hash, for instance one which is tied to a DBM file.
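As a rough sketch of the tied-hash option, the following ties a hash to an SDBM file using only modules shipped with Perl. The file name is made up, and the constructor call is shown only as a comment so that the snippet runs without TM installed. Whether the index stores nested data structures is not stated here; if it does, a plain DBM (which holds strings only) would need a serializing layer such as MLDBM on top, so treat this as an illustration of the tie mechanics rather than a tested setup.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;                   # DBM implementation shipped with Perl
use File::Temp qw(tempdir);

my $dir  = tempdir (CLEANUP => 1);
my $file = "$dir/match-index";   # hypothetical file name

# Tie a hash to the DBM file; everything stored in %cache now lives on disk.
tie (my %cache, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "cannot tie $file: $!";

$cache{'some-key'} = 'some-value';          # plain strings only with SDBM_File
print "stored: $cache{'some-key'}\n";

# With the TM distribution installed, the reference would be handed over as
#   my $idx = new TM::Index::Match ($tm, cache => \%cache);

untie %cache;                               # flush and close the DBM file
```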

Map Attachment

To activate an index, you have to attach it to a map. This is done at constructor time.

It is possible (although it is not clear how useful that is) to attach one particular index to several different maps. It is not possible to have several TM::Index::Match indices attached to one map. Indices of a different nature (non-match related) are not affected.

INTERFACE

Constructor

The only mandatory parameter for the constructor is the map to which this index should apply. The map must be an instance of TM or of any of its subclasses, otherwise an exception is raised. If the map already has an index of this nature, the constructor fails with an exception as well.

Optional parameters are

closed (default: 0)

This controls whether the index is operating under closed or open world assumptions.

cache (default: {})

Optionally, you can pass in your own HASH reference to hold the index content.

Example:

   my $idx = new TM::Index::Match ($tm);

NOTE: When the index object goes out of scope, its destructor makes the index detach itself from the map. Unfortunately, the exact moment at which Perl runs the destructor is not well defined, so it is better to detach manually once you are done.

Example:

   {
       my $idx2 = new TM::Index::Match ($tm, closed => 1);
       # ....
   } # destructor called and index detaches automatically, but only in theory

   {
       my $idx2 = new TM::Index::Match ($tm, closed => 1);
       # ....
       $idx2->detach; # better do things yourself
   }
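If the code between construction and detach can throw an exception, the manual detach is skipped unless it is placed on every exit path. One way to arrange that is sketched below with a toy class: ToyIndex and the hash standing in for the map are hypothetical, and only the method names new and detach mirror the interface described here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an index; the "map" is just a hash reference.
package ToyIndex;
sub new    { my ($class, $map) = @_; $map->{index} = 1; return bless { map => $map }, $class }
sub detach { my ($self) = @_; delete $self->{map}{index} if $self->{map}; $self->{map} = undef }

package main;

my $map = {};                                 # toy "map"
my $idx = ToyIndex->new ($map);

my $ok = eval {
    die "query failed\n";                     # simulate a failure while the index is attached
    1;
};
my $error = $@;                               # save the error before doing anything else

$idx->detach;                                 # runs on success and on failure alike
print "still attached: ", (exists $map->{index} ? "yes" : "no"), "\n";   # prints "no"
print "work failed: $error" unless $ok;
```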

Methods

detach

$idx->detach

Makes the index detach safely from the map. The map is not harmed in this process.

populate

$idx->populate (@list_of_keys)

This method populates the index with canned results. At this stage it is not very clever and may take quite some time to work its way through a larger map, so it is most likely something to be done in the background.

The list of keys to be passed in is a bit of black magic. Your current best bet is to call the statistics method on the index and retrieve a proposed list from there:

   my $stats          = $idx->statistics;
   my @optimized_keys = @{ $stats->{proposed_keys} };

   $idx->populate (@optimized_keys[0..2]); # only take the first few

If this list is empty, nothing clever will happen.

discard

$idx->discard

This throws away the index content.

statistics

$hashref = $idx->statistics

This returns a reference to a hash containing statistical information about certain keys: how much data is behind them, how often they are used when adding information to the index, and how often data is read out successfully. The cost component can give you an estimate of the cost/benefit ratio.

SEE ALSO

TM

COPYRIGHT AND LICENSE

Copyright 200[6] by Robert Barta, <drrho@cpan.org>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.