Algorithm::Classifier::IsolationForest

Isolation Forest (Liu, Ting and Zhou, 2008) detects anomalies by random partitioning rather than by modelling normal points. Each tree repeatedly splits the data at random, and points that are isolated after only a few splits are likely anomalies. The score is derived from the average isolation depth across many trees, normalised so that values approach 1 for anomalies and stay below 0.5 for normal points.
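
As a rough illustration of that normalisation, the sketch below applies the formula from the original paper, s = 2 ** (-E[h] / c(n)), where c(n) is the expected path length of an unsuccessful binary-search-tree lookup over n points. The helper names c_factor and anomaly_score are hypothetical, and it is an assumption that the module computes its scores in exactly this way.

```perl
use strict;
use warnings;

# c(n) = 2*H(n-1) - 2*(n-1)/n, with the harmonic number approximated as
# H(i) ~ ln(i) + Euler-Mascheroni constant. c(2) = 1 and c(n <= 1) = 0
# follow the convention in the original paper.
sub c_factor {
    my ($n) = @_;
    return 0 if $n <= 1;
    return 1 if $n == 2;
    my $h = log($n - 1) + 0.5772156649;
    return 2 * $h - 2 * ($n - 1) / $n;
}

# Anomaly score for a point with the given mean isolation depth,
# where each tree was built on $sample_size points.
sub anomaly_score {
    my ($mean_depth, $sample_size) = @_;
    return 2 ** (-$mean_depth / c_factor($sample_size));
}

printf "c(256)          = %.3f\n", c_factor(256);
printf "depth 2         = %.3f\n", anomaly_score(2, 256);
printf "depth c(256)    = %.3f\n", anomaly_score(c_factor(256), 256);
printf "depth 20        = %.3f\n", anomaly_score(20, 256);
```

A point whose mean depth equals c(n) scores exactly 0.5. Shallower points score higher and deeper points score lower.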

In extended mode the module implements the Extended Isolation Forest variant. Each split is a random hyperplane instead of an axis-aligned cut, which removes the rectangular, axis-aligned bias in the score field and tends to help on elongated or multi-modal data.
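
The difference between the two split types can be sketched in a few lines. The branching test for the oblique case, (x - p) . n <= 0, is the one given in the Extended Isolation Forest paper. The function names are hypothetical and this is not the module's internal code.

```perl
use strict;
use warnings;

# Axis-aligned split: pick one dimension $q and a cut value $p.
# The point goes left when x[q] < p.
sub goes_left_axis {
    my ($x, $q, $p) = @_;
    return $x->[$q] < $p ? 1 : 0;
}

# Oblique split: pick a normal vector and an intercept point.
# The point goes left when (x - intercept) . normal <= 0.
sub goes_left_hyperplane {
    my ($x, $normal, $intercept) = @_;
    my $dot = 0;
    $dot += ($x->[$_] - $intercept->[$_]) * $normal->[$_] for 0 .. $#$x;
    return $dot <= 0 ? 1 : 0;
}

my @pt = (1.0, 3.0);
print goes_left_axis(\@pt, 0, 2.0), "\n";                 # 1.0 < 2.0, so 1
print goes_left_hyperplane(\@pt, [1, -1], [0, 0]), "\n";  # 1 - 3 = -2 <= 0, so 1
```

An axis-aligned cut is the special case where the normal vector has a single non-zero component, which is why the classic variant can only draw boundaries parallel to the axes.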

use Algorithm::Classifier::IsolationForest;

my @data = ([0.1, -0.2], [0.0, 0.1], [5.0, 6.0], ...);

# Classic, axis-parallel Isolation Forest
my $iforest = Algorithm::Classifier::IsolationForest->new(
    n_trees     => 100,
    sample_size => 256,
    seed        => 42,
);
$iforest->fit(\@data);

my $scores = $iforest->score_samples(\@data);  # arrayref, each in (0,1]
my $flags  = $iforest->predict(\@data, 0.6);    # arrayref of 0/1

# Save and reload
$iforest->save('model.json');
my $reloaded = Algorithm::Classifier::IsolationForest->load('model.json');

# Extended Isolation Forest (oblique hyperplane splits)
my $eif = Algorithm::Classifier::IsolationForest->new(mode => 'extended', seed => 42);
$eif->fit(\@data);

Performance options

A handful of constructor / method-level knobs unlock measurable speedups for specific workloads. All of them are no-ops when the optional Inline::C backend is absent.

parallel_fit => N — fork-based parallel training

Builds the n_trees trees across N forked workers (Unix-like platforms only; a no-op elsewhere). Each worker gets a derived RNG seed, so parallel fits are reproducible across runs at a fixed worker count. The trees do differ from a serial fit with the same seed, because the RNG draws happen in a different order. Inference results are unaffected.

my $f = Algorithm::Classifier::IsolationForest->new(
    n_trees      => 200,
    sample_size  => 256,
    seed         => 42,
    parallel_fit => 4,       # 4 forked workers
)->fit(\@training_data);
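
To see why a fit is reproducible only at a fixed worker count, consider one way the work could be divided. The helper worker_plan and the seed derivation (base seed plus worker index plus one) are assumptions made for illustration; the module's actual scheme may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: split $n_trees across $n_workers as evenly as
# possible and give each worker its own seed derived from the base seed.
sub worker_plan {
    my ($n_trees, $n_workers, $seed) = @_;
    my $base  = int($n_trees / $n_workers);
    my $extra = $n_trees % $n_workers;
    return map {
        +{
            worker => $_,
            trees  => $base + ($_ < $extra ? 1 : 0),
            seed   => $seed + $_ + 1,   # assumed derivation
        }
    } 0 .. $n_workers - 1;
}

for my $w (worker_plan(200, 4, 42)) {
    srand($w->{seed});   # each worker would seed its own RNG after fork
    printf "worker %d: %d trees, seed %d, first draw %.6f\n",
        $w->{worker}, $w->{trees}, $w->{seed}, rand();
}
```

With the same base seed and worker count the plan is identical on every run. Changing the worker count changes both the number of trees per worker and the set of seeds, so the resulting forest differs.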

pack_data — score the same dataset many times faster

pack_data returns an opaque wrapper that the scoring methods accept directly, skipping the per-call walk over the arrayref-of-arrayrefs. Use it when the same dataset is scored repeatedly (interactive threshold tuning, dashboards, plotting that updates as parameters change).

my $packed = $f->pack_data(\@data);
my $scores = $f->score_samples($packed);
my $flags  = $f->predict($packed, 0.6);
my ($s, $l) = $f->score_predict_split($packed);  # two flat arrayrefs

score_predict_split — get scores + labels without the AV-of-AVs

When you want both anomaly scores and 0/1 labels but don't need them paired together row-by-row, score_predict_split returns the two as flat arrayrefs and skips the ~2 * n_pts SV allocations that the classic score_predict_samples shape requires.

my ($scores, $labels) = $f->score_predict_split(\@data, 0.6);
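
If paired rows are needed later, the two flat arrayrefs can be zipped back together on demand. The sketch below uses stand-in values in place of a real score_predict_split call, and the helper pair_rows is hypothetical. The [score, label] ordering within each row is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild [score, label] rows from two flat arrayrefs.
sub pair_rows {
    my ($scores, $labels) = @_;
    die "length mismatch\n" unless @$scores == @$labels;
    return [ map { [ $scores->[$_], $labels->[$_] ] } 0 .. $#$scores ];
}

# Stand-in values; in real use these come from score_predict_split.
my ($scores, $labels) = ([0.41, 0.44, 0.83], [0, 0, 1]);

# Flat arrays make index-based filtering straightforward.
my @flagged = grep { $labels->[$_] } 0 .. $#$labels;
print "flagged rows: @flagged\n";

my $rows = pair_rows($scores, $labels);
printf "%.2f %d\n", @$_ for @$rows;
```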

Native acceleration (Inline::C, OpenMP, SIMD)

The scoring hot path (score_samples, predict, path_lengths, score_predict_samples, score_predict_split) is automatically accelerated through Inline::C when it is installed and a working C compiler is present. On top of that, OpenMP and SIMD are used when they are detected as available.

Detection happens once at module load and is cached under _Inline/. None of these dependencies are required: without them the module falls back to a pure-Perl implementation that produces identical results, just slower.

Check which backend is active on your machine:

iforest accel

Sample output on a host with everything wired up:

Algorithm::Classifier::IsolationForest acceleration status
  Inline::C : available
  OpenMP    : available
  SIMD      : available

Active backend: Inline::C with OpenMP + SIMD

User code that wants to introspect the active backend can read three package variables:

$Algorithm::Classifier::IsolationForest::HAS_C       # 0/1
$Algorithm::Classifier::IsolationForest::HAS_OPENMP  # 0/1
$Algorithm::Classifier::IsolationForest::HAS_SIMD    # 0/1
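
A script might use these flags to log which backend it is running on. The following is a minimal sketch: backend_label is a hypothetical helper, and only the all-three-available string is taken from the sample output above; the other labels are illustrative. The module is loaded inside eval so the script also runs where it is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the three 0/1 flags into a one-line description.
sub backend_label {
    my ($has_c, $has_openmp, $has_simd) = @_;
    return 'pure Perl' unless $has_c;
    my @extras;
    push @extras, 'OpenMP' if $has_openmp;
    push @extras, 'SIMD'   if $has_simd;
    return @extras ? 'Inline::C with ' . join(' + ', @extras) : 'Inline::C';
}

my $loaded = eval { require Algorithm::Classifier::IsolationForest; 1 };
if ($loaded) {
    no warnings 'once';
    print 'Active backend: ', backend_label(
        $Algorithm::Classifier::IsolationForest::HAS_C,
        $Algorithm::Classifier::IsolationForest::HAS_OPENMP,
        $Algorithm::Classifier::IsolationForest::HAS_SIMD,
    ), "\n";
}
else {
    print "Algorithm::Classifier::IsolationForest is not installed\n";
}
```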

Install

Source

perl Makefile.PL
make
make test
make install

FreeBSD

pkg install p5-App-Cmd p5-File-Slurp p5-App-cpanminus \
            p5-Inline p5-Inline-C gcc
cpanm Algorithm::Classifier::IsolationForest

gcc ships with libgomp and provides the OpenMP runtime; the system clang does not by default. p5-Inline-C is what makes the C backend build at module load.

Debian

apt-get install libapp-cmd-perl libfile-slurp-perl cpanminus \
                libinline-c-perl gcc
cpanm Algorithm::Classifier::IsolationForest

libinline-c-perl brings in libinline-perl. gcc pulls in libgomp1 (the OpenMP runtime), which is what enables the parallel tree-walk. Both dependencies are optional — leave them out and the module installs and runs in pure-Perl mode.