NAME

Data::SpatialHash::Shared - Shared-memory spatial hash index for Linux

SYNOPSIS

use Data::SpatialHash::Shared;

# 100k-entry map, auto-sized buckets, 1.0 world-unit cells
my $s = Data::SpatialHash::Shared->new(undef, 100_000, 0, 1.0);

my $h = $s->insert(10.5, 20.5, 42);      # 2D point id=42 -> handle
$s->move($h, 11.0, 21.0);                # entity moved
my @near = $s->query_radius(10, 20, 5);  # ids within radius 5
my @nn   = $s->query_knn(10, 20, 3);     # 3 nearest ids

my $p = $s->insert(1, 2, 3, 7);          # 3D point id=7
$s->each_in_radius(0, 0, 0, 10, sub { my ($id) = @_; });

$s->remove($h);
my $stats = $s->stats;                   # { count => ..., max_chain => ... }

# toroidal world + a one-call collision broad-phase (e.g. a game tick)
my $w = Data::SpatialHash::Shared->new(undef, 100_000, 0, 16, wrap => [2000, 2000]);
my $a = $w->insert(5, 5, 1);  $w->set_radius($a, 3);   # actor 1, interaction radius 3
$w->move_many([ [$a, 6, 6] ]);                          # bulk-reposition each tick
my $near = $w->query_radius_many([ [6, 6, 3] ]);        # batched neighbor queries, one lock
$w->each_colliding_pair(sub { my ($x, $y) = @_; });     # every colliding pair, seam-aware

# spherical world (planet): geo proximity + cube-sphere chunk/LOD ids
my $g = Data::SpatialHash::Shared->new(undef, 100_000, 0, 1000, sphere => 6_371_000);
my $e = $g->insert_geo(0.81, 0.21, 500, 1);             # lat/lon radians, 500 m altitude
my @hit = $g->query_geo_radius(0.81, 0.21, 500, 2000);  # within 2 km (true 3D distance)
my $chunk = $g->cube_cell_geo(0.81, 0.21, 12);          # level-12 cube-sphere cell id

DESCRIPTION

Sparse spatial hash in shared memory: an unbounded Euclidean grid by default, or a bounded seamless torus (see "Toroidal space"). Positions and values are stored in a flat array of entries; a hash table of buckets maps cell coordinates to entry chains. Coordinates are arbitrary floats in 2D or 3D; for 2D use, simply omit the z argument (it defaults to 0). For planets and other spherical worlds, geo helpers map latitude/longitude/altitude to 3D and a cube-sphere cell scheme provides chunk and level-of-detail ids (see "Spherical worlds").

Multiple processes can map the same hash and read and write it concurrently; access is serialized by a write-preferring futex rwlock that recovers automatically if a lock holder dies (see "CRASH SAFETY").

Linux-only. Requires 64-bit Perl.

Coordinate model

Space is divided into axis-aligned cells of size cell_size. A point at (x, y, z) lands in cell (floor(x/cell_size), floor(y/cell_size), floor(z/cell_size)). Spatial queries walk all cells that overlap the query region; smaller cells reduce false positives at the cost of more bucket lookups.

2D vs 3D

All methods that accept coordinates accept either (x, y) or (x, y, z). When z is omitted it is treated as 0. A handle created with a 2D insert can be queried with either 2D or 3D calls.

Toroidal space

By default the grid is unbounded and Euclidean. Construct with wrap => [$Wx, $Wy] (2D) or [$Wx, $Wy, $Wz] (3D) to make space a seamless torus: neighbour-cell expansion wraps around the grid edges and query_radius, query_knn, each_in_radius, query_radius_many, and the pair emitters all use the minimum-image (shortest wrapping) distance

dx = abs(ax - bx);  dx = $Wx - dx if dx > $Wx / 2;   # per axis

so an entry near 0 and one near $Wx are neighbours. Keep positions within [0, $W) per axis for the metric to be meaningful. Each wrapped extent must be a positive multiple of cell_size so the cells tile the world exactly, otherwise the constructor croaks. query_cell resolves its coordinate to a wrapped cell, but query_aabb uses the literal (non-wrapping) box even in a wrapping world. The wrap configuration is part of the mapped format and is restored on reopen; the world accessor returns the extents.

METHODS

Constructors

my $s = Data::SpatialHash::Shared->new($path, $max, $buckets, $cell);
my $s = Data::SpatialHash::Shared->new(undef, $max, $buckets, $cell);
my $s = Data::SpatialHash::Shared->new_memfd($name, $max, $buckets, $cell);
my $s = Data::SpatialHash::Shared->new_from_fd($fd);
my $s = Data::SpatialHash::Shared->new($path, $max, $buckets, $cell, wrap => [$Wx, $Wy]);

$path is the backing file path; undef creates an anonymous mapping. $max is the maximum number of entries. $buckets is the bucket count (0 = auto). $cell is the cell size (float).

Pass wrap => [$Wx, $Wy] (or [$Wx, $Wy, $Wz]) to make the world a seamless torus of those extents (see "Toroidal space"); omit it for an unbounded Euclidean space.

When reopening an existing backing file or memfd, the stored header wins: the caller's $max, $buckets, $cell, wrap, and sphere arguments are ignored and the file's original values are used.

new_memfd creates a Linux memfd (anonymous but transferable via memfd file descriptor). new_from_fd reopens an existing memfd in another process.

Mutators

my $h = $s->insert(x, y, value);        # 2D insert -- returns handle or undef
my $h = $s->insert(x, y, z, value);     # 3D insert
my $h = $s->insert(x, y, z, value, r);  # 3D insert with an interaction radius
$s->set_radius($h, $r);                 # set/replace an entry's radius
$s->move($h, x, y);                     # relocate entry (2D)
$s->move($h, x, y, z);                  # relocate entry (3D)
$s->remove($h);                         # free entry slot
$s->set_value($h, $v);                  # update stored value
$s->clear;                              # remove all entries
my @ids = $s->insert_many([ [x,y,value], [x,y,value,r], ... ]);  # bulk insert
my $n   = $s->move_many([ [handle,x,y], [handle,x,y,z], ... ]);  # bulk move

insert returns a handle (opaque integer) on success, or undef if max_entries is exhausted. move and remove return true on success, or false if the handle is invalid or already removed. set_value instead croaks on an invalid or freed handle; it and clear return nothing.

Each entry may carry an interaction radius (default 0; must be finite and non-negative), used by each_colliding_pair. Set it with the 5-argument insert or with set_radius (which croaks on an invalid or freed handle); for a 2D entry with a radius, insert then call set_radius. insert_many and move_many apply a whole batch under a single lock acquisition -- each row is an arrayref. insert_many inserts 2D entries (rows [x,y,value] or [x,y,value,radius]; use insert in a loop for 3D) and returns the list of handles, with undef for any row that overflowed the pool, was malformed (not an arrayref of length 3 or 4), or carried a negative or non-finite radius. move_many takes [handle,x,y] or [handle,x,y,z] rows and returns the count successfully moved; freed/invalid handles and malformed rows are skipped.

Handles are entry slot indices starting at 0, and 0 is false in Perl. The very first insert into a fresh hash returns handle 0. Always test the result with defined $h, never for truthiness:

my $h = $s->insert($x, $y, $v);
die "full" unless defined $h;   # correct: handle 0 is valid
# WRONG: "unless $h" would treat the first handle (0) as failure

Accessors

$s->has($h);              # true if handle is live
$s->value($h);            # stored value
$s->get_radius($h);       # stored interaction radius (0 if unset)
my ($x, $y, $z) = $s->position($h);   # current position

has is the safe predicate for a possibly-freed handle; value, position, and get_radius croak on an invalid or freed handle.

Queries

Most query methods return a list of stored values (not handles) for matching entries; query_radius_many instead returns an arrayref of id-list arrayrefs (see "Batched radius queries"). For query_knn, results are in nearest-first order.

my @ids = $s->query_radius(x, y, r);         # 2D radius search
my @ids = $s->query_radius(x, y, z, r);      # 3D radius search
my @ids = $s->query_aabb(x0, y0, x1, y1);    # 2D axis-aligned box
my @ids = $s->query_aabb(x0, y0, z0, x1, y1, z1);  # 3D box
my @ids = $s->query_cell(x, y);              # single cell (2D)
my @ids = $s->query_cell(x, y, z);           # single cell (3D)
my @ids = $s->query_knn(x, y, k);            # k nearest (2D)
my @ids = $s->query_knn(x, y, z, k);         # k nearest (3D)
$s->each_in_radius(x, y, r, sub { my ($v) = @_; ... });    # 2D cb
$s->each_in_radius(x, y, z, r, sub { my ($v) = @_; ... }); # 3D cb
my $lists = $s->query_radius_many([ [x,y,r], [x,y,z,r] ]); # N radius queries, one lock

query_radius and each_in_radius require a finite, non-negative radius, and query_knn requires k to be at least 1; out-of-range values croak. The radius is inclusive (a point at exactly the radius matches), unlike the strict each_pair_within; query_geo_radius is inclusive too, as are query_aabb's box edges.

each_in_radius snapshots the matching values under the read lock, then invokes the callback once per value after the lock is released. Because the lock is dropped before any callback runs, the callback may safely call back into the same map (for example has or move) without deadlock.

Batched radius queries

my $lists = $s->query_radius_many([ [x, y, r], [x, y, z, r], ... ]);
# $lists->[i] is an arrayref of ids, == [ $s->query_radius(@{ $queries->[i] }) ]

query_radius_many runs a whole batch of radius queries under a single read lock and returns an arrayref of id-list arrayrefs, one per query in input order. Each row is [x, y, r] (2D) or [x, y, z, r] (3D); a malformed row (not a 3- or 4-element arrayref, or a negative or non-finite r) yields an empty list for that slot, siblings unaffected -- it cannot croak mid-batch while holding the lock, mirroring insert_many. A region-too-large or out-of-memory condition from any query still croaks (after freeing the partial result). This is purely a lock-amortization win for callers issuing many queries per critical section (for example a per-tick collision broad-phase across many actors): one rdlock/rdunlock pair for the batch instead of one per query.

Collision pairs

$s->each_pair_within($max_r, sub { my ($va, $vb) = @_; ... });
$s->each_colliding_pair(sub { my ($va, $vb) = @_; ... });

each_pair_within invokes the callback once for every unordered pair of entries whose centre distance is less than $max_r. each_colliding_pair instead pairs entries whose centre distance is less than the sum of their two radii (see set_radius) -- a heterogeneous-radius collision test computed in a single grid walk, so a small cell_size stays correct even for large-radius entries. Both emit each pair exactly once, are seam-aware in a toroidal world, and -- like each_in_radius -- snapshot under the read lock then run the callback with the lock released (so it may mutate the map). The callback arguments are the stored values of the pair. Distances are 3D when any entry has a non-zero z (or the world wraps in z), otherwise 2D. each_pair_within croaks on a negative or non-finite $max_r.

Region query cost

The cost of a region query scales with the number of grid cells covering the query region, not with the number of matching points. For query_radius, each_in_radius, and each query_radius_many sub-query that is roughly (2 * radius / cell_size) ** dims cells (similarly for query_aabb, which scans the cells spanning the box). The scan runs while holding a read lock. An over-large radius relative to cell_size therefore scans many empty cells, wasting time and stalling concurrent writers that are waiting for the lock. Size cell_size on the order of your typical query radius so each query touches only a handful of cells.

As a safety net, any region query that would scan more than approximately 67 million cells (2**26) -- a query_radius, query_aabb, each_in_radius, or a query_radius_many sub-query whose region spans that many cells, a query_knn that must walk that many cells across its expanding shells, or a each_pair_within / each_colliding_pair whose per-entry neighbourhood spans that many cells -- croaks with a message containing the word "cells" rather than scanning unbounded. If your use case genuinely requires regions that large, increase cell_size so the same physical region maps to fewer cells.

Spherical worlds

For points on or above a sphere (planets, globes), construct with a body radius and use the geo helpers; a separate cube-sphere scheme gives stable hierarchical cell ids for chunking and level-of-detail. Curvature needs no special handling: with sphere set, geo coordinates are converted to and from Cartesian in C, so proximity is exact straight-line distance -- correct for surface and air entities alike -- not a great-circle approximation.

Geo proximity

my $s = Data::SpatialHash::Shared->new(undef, $max, 0, $cell, sphere => $R);

my $h = $s->insert_geo($lat, $lon, $alt, $value);   # radians; alt above the surface
$s->move_geo($h, $lat, $lon, $alt);
my ($lat, $lon, $alt) = $s->position_geo($h);
my @vals = $s->query_geo_radius($lat, $lon, $alt, $dist);   # $dist in world units

sphere is the body radius -- distinct from a per-entry interaction radius -- and must be finite and greater than zero; it is stored in the map and restored on reopen. sphere and wrap are mutually exclusive (a sphere is not a flat torus); passing both croaks. Latitude and longitude are in radians (lat in -pi/2 .. pi/2, lon in -pi .. pi); alt is height above the sphere of radius $R, so an entity lies at distance $R + $alt from the centre. Each geo method converts to Cartesian and delegates to the ordinary 3D engine, so $dist in query_geo_radius is a true straight-line distance and must be finite and non-negative. At a pole, longitude is undefined and position_geo reports it as 0. Calling a geo method on a map created without sphere croaks. insert_geo returns a handle or undef if the pool is exhausted (test with defined -- handle 0 is valid but false); move_geo returns true, or false for a freed/invalid handle; position_geo croaks on a freed or invalid handle.

Cube-sphere cells

A direction (or lat/lon) maps to a hierarchical cell id on a cube-sphere: six cube faces, each an equal-angle grid subdivided to a chosen level (level 0 = the whole face, level L = 2**L cells per face edge, up to level 24). The ids are stateless integers independent of any stored data -- useful as chunk keys for streaming and level-of-detail.

my $cell = $s->cube_cell($x, $y, $z, $level);     # direction (need not be unit length)
my $cell = $s->cube_cell_geo($lat, $lon, $level);
my @adj  = $s->cube_neighbors($cell);             # 4 edge-adjacent cells, seam-aware
my $up   = $s->cube_parent($cell);                # coarser cell (undef at level 0)
my @kids = $s->cube_children($cell);              # 4 finer cells (empty at level 24)
my $lvl  = $s->cube_level($cell);
my ($x, $y, $z) = $s->cube_center($cell);         # cell centre as a unit vector
my ($lat, $lon) = $s->cube_center_geo($cell);

A cell id packs (level, face, i, j) into an unsigned integer, so cells at different levels are different ids. cube_neighbors returns the four edge-adjacent cells, correct across face seams; diagonal/corner neighbours are not included. These methods read no map state (any handle provides them), and a malformed cell id croaks; a zero or non-finite direction yields an arbitrary but valid cell. The grid is near-uniform (equal-angle), not equal-area.

Introspection

$s->count;        # live entry count
$s->max_entries;  # capacity
$s->num_buckets;  # bucket table size
$s->cell_size;    # cell size in world units
my @w = $s->world;  # wrap extents: (Wx,Wy) or (Wx,Wy,Wz); empty if not toroidal
my $R = $s->sphere; # body radius (sphere => $R), or 0 if not a sphere map
$s->stats;        # diagnostic hashref (see STATS)

Lifecycle

$s->path;              # backing file path, or undef for anon/memfd
$s->memfd;             # memfd fd (-1 for file-backed/anon)
$s->sync;              # msync mmap to backing store
$s->unlink;            # remove backing file
Class->unlink($path);  # class-method form

sync and unlink croak on OS failure.

Event Loop Integration

my $fd = $s->eventfd;         # lazy-create eventfd, returns fd
$s->eventfd_set($fd);         # attach an external eventfd
my $fd = $s->fileno;          # current eventfd fd, or -1
$s->notify;                   # write 1 to eventfd (signal update)
my $n = $s->eventfd_consume;  # read+reset eventfd counter

eventfd_set attaches an external eventfd, closing the previously-attached fd (such as one created by eventfd) unless it is the same descriptor. notify returns false if no eventfd is attached, true after writing the signal. eventfd_consume returns the counter as an integer, or undef if no eventfd is attached or nothing is pending (a spurious wakeup).

TUNING

Choose cell_size close to your typical query radius. A value too small means many cells are scanned per radius query; a value too large packs many entries into each cell and increases false-positive tests.

Choose num_buckets to be a power of two slightly above the expected peak live count; any value you pass is rounded up to the next power of two. The default (0 = auto) picks a power of two near max_entries, which is a safe starting point. After loading real data, inspect stats->{load_factor} (target below 0.7) and stats->{max_chain} (target below 5).

BENCHMARKS

Measured on Linux x86_64, Perl 5.40.2 (bench/single.pl). The first four use 100k entries at cell_size 1.0; each_colliding_pair uses 4000 radius-bearing actors on a 2000x2000 torus; the geo/cube rows use a planet (sphere set):

insert:                4.6M/s
query_radius:           61k/s
query_knn(10):         125k/s
move:                  2.2M/s
move_many:              22M/s    # ~10x move -- one lock acquisition per batch
each_colliding_pair:   770/s     # ~1.3 ms per full broad-phase pass
query_geo_radius:       12k/s    # planet proximity (lat/lon/alt -> 3D in C)
cube_cell:             5.5M/s    # cube-sphere cell id (level 14)

STATS

stats() returns a hashref with keys:

count -- current number of live entries
max_entries -- allocated entry capacity
num_buckets -- size of the bucket table
cell_size -- cell size (float)
free_slots -- available entry slots (max_entries - count)
occupied_buckets -- number of buckets with at least one entry
max_chain -- longest collision chain across all buckets
max_cell -- most entries sharing a single grid cell
load_factor -- count / num_buckets (float)
ops -- total mutation operation counter
mmap_size -- size of the shared mapping in bytes

SECURITY

The mmap region is writable by all processes that open it. Do not share backing files with untrusted processes.

CRASH SAFETY

The write lock is a futex-based rwlock with PID-encoded ownership. If the writer process dies while holding the lock, the next writer that cannot acquire the lock checks whether the owner PID is still alive and, if not, recovers the lock. Reader slots are similarly reclaimed when a dead reader's slot is detected.

Limitation: PID reuse is not detected. If a new process acquires the same PID as a dead lock holder before recovery runs, the stale lock may not be released automatically. This edge case requires the kernel to reassign PIDs faster than lock-recovery attempts, which is very unlikely in practice but cannot be ruled out.

SEE ALSO

Data::Graph::Shared - directed weighted graph

Data::Heap::Shared - priority queue (for Dijkstra, Prim, etc.)

Data::Pool::Shared - fixed-size object pool

Data::HashMap::Shared - concurrent hash table

Data::Buffer::Shared - typed shared array

Data::Queue::Shared - FIFO queue

Data::Stack::Shared - LIFO stack

Data::Deque::Shared - double-ended queue

Data::Log::Shared - append-only log

Data::Sync::Shared - synchronization primitives

Data::PubSub::Shared - publish-subscribe ring

Data::ReqRep::Shared - request-reply

Data::BitSet::Shared - shared bitset (lock-free per-bit ops)

Data::RingBuffer::Shared - fixed-size overwriting ring buffer

AUTHOR

vividsnow

LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.