Algorithm::Classifier::IsolationForest
Isolation Forest (Fei Tony Liu, Kai Ming Ting & Zhi-Hua Zhou, 2008) detects anomalies by random partitioning rather than by modelling normal points. Each tree recursively splits a random subsample of the data at a randomly chosen feature and split value. Points that are isolated after only a few splits are likely anomalies. The score is derived from the average isolation depth across many trees, normalised so that values approach 1 for anomalies and fall well below 0.5 for normal points.
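The normalisation described above can be sketched in a few lines of pure Perl. This follows the formula from the original paper, s(x, n) = 2 ** (-E[h(x)] / c(n)); the helper names c_factor and anomaly_score are hypothetical, and whether the module uses exactly this harmonic-number approximation internally is an assumption.

```perl
use strict;
use warnings;

# Euler-Mascheroni constant, used to approximate the harmonic number H(i).
use constant EULER_GAMMA => 0.5772156649;

# c(n): average path length of an unsuccessful binary-search-tree lookup
# over n points. It serves as the yardstick for "typical" isolation depth.
sub c_factor {
    my ($n) = @_;
    return 0 if $n < 2;
    return 1 if $n == 2;
    my $harmonic = log($n - 1) + EULER_GAMMA;    # H(n-1) ~ ln(n-1) + gamma
    return 2 * $harmonic - 2 * ($n - 1) / $n;
}

# s(x, n) = 2 ** ( -mean_depth / c(n) ); assumes sample_size >= 2.
sub anomaly_score {
    my ($mean_depth, $sample_size) = @_;
    return 2 ** (-$mean_depth / c_factor($sample_size));
}

my $n = 256;
printf "c(%d)              = %.3f\n", $n, c_factor($n);
printf "shallow (depth 3)   = %.3f\n", anomaly_score(3, $n);
printf "typical (depth c)   = %.3f\n", anomaly_score(c_factor($n), $n);
printf "deep    (depth 20)  = %.3f\n", anomaly_score(20, $n);
```

A point whose mean depth equals c(n) scores exactly 0.5; shallower points score higher and deeper points score lower, which is why a threshold such as 0.6 flags only the points that are easy to isolate.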
In extended mode the module implements the Extended Isolation Forest variant. Each split is a random hyperplane instead of an axis-aligned cut, which removes the rectangular, axis-aligned bias in the score field and tends to help on elongated or multi-modal data.
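As a rough illustration of a random hyperplane split, the sketch below draws a normal vector with Gaussian components and an intercept point inside the bounding box of the data, then sends each point left or right by the sign of a dot product. This is the general Extended Isolation Forest idea, not the module's internal code; the function names and the hash layout are hypothetical.

```perl
use strict;
use warnings;

# Standard normal draw via the Box-Muller transform.
sub gaussian {
    my $u1 = 1 - rand();    # in (0, 1], avoids log(0)
    my $u2 = rand();
    return sqrt(-2 * log($u1)) * cos(2 * 3.14159265358979 * $u2);
}

# Build a random hyperplane for a set of points (arrayref of arrayrefs):
# Gaussian normal vector, intercept drawn uniformly in the bounding box.
sub random_hyperplane {
    my ($points) = @_;
    my $dims = scalar @{ $points->[0] };
    my (@normal, @intercept);
    for my $d (0 .. $dims - 1) {
        my ($min, $max) = ($points->[0][$d]) x 2;
        for my $p (@$points) {
            $min = $p->[$d] if $p->[$d] < $min;
            $max = $p->[$d] if $p->[$d] > $max;
        }
        push @normal,    gaussian();
        push @intercept, $min + rand() * ($max - $min);
    }
    return { normal => \@normal, intercept => \@intercept };
}

# A point goes left when (x - intercept) . normal <= 0.
sub goes_left {
    my ($plane, $x) = @_;
    my $dot = 0;
    $dot += ($x->[$_] - $plane->{intercept}[$_]) * $plane->{normal}[$_]
        for 0 .. $#$x;
    return $dot <= 0 ? 1 : 0;
}

srand(42);
my @data  = ([0.1, -0.2], [0.0, 0.1], [5.0, 6.0]);
my $plane = random_hyperplane(\@data);
my @left  = grep {  goes_left($plane, $_) } @data;
my @right = grep { !goes_left($plane, $_) } @data;
printf "left: %d, right: %d\n", scalar @left, scalar @right;
```

An axis-aligned cut is the special case where the normal vector has a single non-zero component, which is where the rectangular artefacts in the classic score field come from.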
use Algorithm::Classifier::IsolationForest;
my @data = ([0.1, -0.2], [0.0, 0.1], [5.0, 6.0], ...);
# Classic, axis-parallel Isolation Forest
my $iforest = Algorithm::Classifier::IsolationForest->new(
n_trees => 100,
sample_size => 256,
seed => 42,
);
$iforest->fit(\@data);
my $scores = $iforest->score_samples(\@data); # arrayref, each in (0,1]
my $flags = $iforest->predict(\@data, 0.6); # arrayref of 0/1
# Save and reload
$iforest->save('model.json');
my $reloaded = Algorithm::Classifier::IsolationForest->load('model.json');
# Extended Isolation Forest (oblique hyperplane splits)
my $eif = Algorithm::Classifier::IsolationForest->new(mode => 'extended', seed => 42);
$eif->fit(\@data);
Performance options
A handful of constructor / method-level knobs unlock measurable speedups for specific workloads. All of them are no-ops when the optional Inline::C backend is absent.
parallel_fit => N — fork-based parallel training
Builds the n_trees across N forked workers (Unix-like platforms; no-op
elsewhere). Each worker gets a derived RNG seed, so parallel fits are
reproducible across runs at fixed worker count — though the trees
differ from a serial fit with the same seed, because the RNG draws
happen in a different order. Inference results are unaffected.
my $f = Algorithm::Classifier::IsolationForest->new(
n_trees => 200,
sample_size => 256,
seed => 42,
parallel_fit => 4, # 4 forked workers
)->fit(\@training_data);
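The reproducibility claim can be illustrated without the module. The sketch below simulates workers sequentially instead of forking, gives each one a seed derived from the base seed, and records one random draw per tree. The derivation in worker_seed and the round-robin assignment of trees to workers are assumptions made for illustration; the module's actual scheme may differ.

```perl
use strict;
use warnings;

# Hypothetical seed derivation; not taken from the module.
sub worker_seed {
    my ($base_seed, $worker) = @_;
    return $base_seed * 1000 + $worker + 1;
}

# Simulate the random draw each tree would start from when the trees
# are dealt round-robin to $n_workers independently seeded workers.
sub draws_for_fit {
    my ($base_seed, $n_workers, $n_trees) = @_;
    my @draws;
    for my $w (0 .. $n_workers - 1) {
        srand(worker_seed($base_seed, $w));
        # Worker $w handles trees $w, $w + N, $w + 2N, ...
        for (my $t = $w; $t < $n_trees; $t += $n_workers) {
            $draws[$t] = rand();
        }
    }
    return \@draws;
}

my $run_a = draws_for_fit(42, 4, 8);
my $run_b = draws_for_fit(42, 4, 8);
my $other = draws_for_fit(42, 2, 8);

print "same worker count, same draws:      ",
    ("@$run_a" eq "@$run_b" ? "yes" : "no"), "\n";
print "different worker count, same draws: ",
    ("@$run_a" eq "@$other" ? "yes" : "no"), "\n";
```

Because each tree's draws depend on which worker built it, changing the worker count reshuffles the random stream per tree, so a model meant to be reproduced later should be fitted with the same parallel_fit value.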
pack_data — score the same dataset many times faster
pack_data returns an opaque wrapper that the scoring methods accept
directly, skipping the per-call walk over the arrayref-of-arrayrefs.
Use it when the same dataset is scored repeatedly (interactive threshold
tuning, dashboards, plotting that updates as parameters change).
my $packed = $f->pack_data(\@data);
my $scores = $f->score_samples($packed);
my $flags = $f->predict($packed, 0.6);
my ($s, $l) = $f->score_predict_split($packed); # two flat arrayrefs
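To give a feel for why packing helps, here is one way such a wrapper could be laid out: the rows are flattened once into a single buffer of native doubles, so later calls can read values by offset instead of dereferencing every inner arrayref. The wrapper returned by pack_data is opaque, so this layout and the names pack_rows and packed_value are purely hypothetical.

```perl
use strict;
use warnings;

# Flatten an arrayref of arrayrefs into one row-major buffer of doubles.
sub pack_rows {
    my ($rows) = @_;
    my $n_cols = @$rows ? scalar @{ $rows->[0] } : 0;
    my $buffer = pack('d*', map { @$_ } @$rows);
    return { buffer => $buffer, n_rows => scalar @$rows, n_cols => $n_cols };
}

# Read a single value back out of the buffer by row and column.
sub packed_value {
    my ($packed, $row, $col) = @_;
    my $width  = length(pack('d', 0));    # bytes per native double
    my $offset = ($row * $packed->{n_cols} + $col) * $width;
    return unpack('d', substr($packed->{buffer}, $offset, $width));
}

my $packed = pack_rows([[0.1, -0.2], [0.0, 0.1], [5.0, 6.0]]);
printf "rows=%d cols=%d bytes=%d\n",
    $packed->{n_rows}, $packed->{n_cols}, length $packed->{buffer};
printf "row 2, col 1 = %s\n", packed_value($packed, 2, 1);
```

The conversion cost is paid once up front, which is why the benefit shows up only when the same data is scored more than once.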
score_predict_split — get scores + labels without the AV-of-AVs
When you want both anomaly scores and 0/1 labels but don't need them
paired together row-by-row, score_predict_split returns the two as
flat arrayrefs and skips the ~2 * n_pts SV allocations that the
classic score_predict_samples shape requires.
my ($scores, $labels) = $f->score_predict_split(\@data, 0.6);
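The two flat arrayrefs are convenient to work with independently. The snippet below uses made-up numbers in place of real output and assumes the paired shape is one [score, label] row per point; it shows a count and a top-score lookup on the flat arrays, plus how to zip them back into pairs if a later step needs that shape.

```perl
use strict;
use warnings;

# Stand-ins for the two return values (values invented for illustration).
my $scores = [0.41, 0.44, 0.78];
my $labels = [0, 0, 1];

# Count flagged points straight from the label array.
my $n_flagged = grep { $_ } @$labels;

# Index of the highest-scoring point, from the score array alone.
my ($top) = sort { $scores->[$b] <=> $scores->[$a] } 0 .. $#$scores;

# Zip into [score, label] rows only when a consumer needs pairs.
my @paired = map { [ $scores->[$_], $labels->[$_] ] } 0 .. $#$scores;

printf "flagged: %d, top index: %d, pairs: %d\n",
    $n_flagged, $top, scalar @paired;
```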
Native acceleration (Inline::C, OpenMP, SIMD)
The scoring hot path (score_samples, predict, path_lengths,
score_predict_samples, score_predict_split) is automatically
accelerated through Inline::C when it is installed and a working C
compiler is present. On top of that:
- if the toolchain accepts -fopenmp and can link against libgomp, the
  per-point tree walk runs in parallel across all available CPU cores
  using OpenMP;
- on OpenMP 4.0+ compilers the extended-mode oblique dot product is
  vectorised via #pragma omp simd, which is substantially faster for
  high-feature-count extended models.
Detection happens once at module load and is cached under _Inline/.
None of these dependencies are required: without them the module falls
back to a pure-Perl implementation that produces identical results,
just slower.
Check which backend is active on your machine:
iforest accel
Sample output on a host with everything wired up:
Algorithm::Classifier::IsolationForest acceleration status
Inline::C : available
OpenMP : available
SIMD : available
Active backend: Inline::C with OpenMP + SIMD
User code that wants to introspect the active backend can read three package variables:
$Algorithm::Classifier::IsolationForest::HAS_C # 0/1
$Algorithm::Classifier::IsolationForest::HAS_OPENMP # 0/1
$Algorithm::Classifier::IsolationForest::HAS_SIMD # 0/1
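A small script can turn those three flags into a one-line summary. The helper backend_summary is hypothetical, and the 'pure Perl' wording is an assumption rather than the output of the iforest tool; the script also runs when the module is not installed, in which case the flags are simply undefined and treated as 0.

```perl
use strict;
use warnings;

# Turn the three 0/1 flags into a short description of the backend.
sub backend_summary {
    my ($has_c, $has_openmp, $has_simd) = @_;
    return 'pure Perl' unless $has_c;
    my @extras;
    push @extras, 'OpenMP' if $has_openmp;
    push @extras, 'SIMD'   if $has_simd;
    return @extras ? 'Inline::C with ' . join(' + ', @extras) : 'Inline::C';
}

# Load the module if it is installed; otherwise the flags stay undefined.
my $loaded = eval { require Algorithm::Classifier::IsolationForest; 1 };
warn "module not installed; flags default to 0\n" unless $loaded;

{
    no warnings 'once';    # the package variables are only read here
    print 'Active backend: ', backend_summary(
        $Algorithm::Classifier::IsolationForest::HAS_C,
        $Algorithm::Classifier::IsolationForest::HAS_OPENMP,
        $Algorithm::Classifier::IsolationForest::HAS_SIMD,
    ), "\n";
}
```

This kind of check is useful in benchmarks or CI logs, where it helps to record which backend produced the timings.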
Install
Source
perl Makefile.PL
make
make test
make install
FreeBSD
pkg install p5-App-Cmd p5-File-Slurp p5-App-cpanminus \
p5-Inline p5-Inline-C gcc
cpanm Algorithm::Classifier::IsolationForest
gcc ships with libgomp and provides the OpenMP runtime; the system
clang does not by default. p5-Inline-C is what makes the C backend
build at module load.
Debian
apt-get install libapp-cmd-perl libfile-slurp-perl cpanminus \
libinline-c-perl gcc
cpanm Algorithm::Classifier::IsolationForest
libinline-c-perl brings in libinline-perl. gcc pulls in libgomp1
(the OpenMP runtime), which is what enables the parallel tree-walk. Both
dependencies are optional — leave them out and the module installs and
runs in pure-Perl mode.