NAME

Data::Path::XS - Fast path-based access to nested data structures

SYNOPSIS

use Data::Path::XS qw(path_get path_set path_delete path_exists);

my $data = { foo => { bar => [1, 2, 3] } };

path_get   ($data, '/foo/bar/1');         # 2
path_set   ($data, '/foo/bar/1', 42);     # 42 (returns the value set)
path_exists($data, '/foo/baz');           # 0
path_delete($data, '/foo/bar/0');         # 1 (returns the deleted value)

# Pre-parsed path components (binary-safe; allows "/" in keys)
use Data::Path::XS qw(patha_get patha_set);
patha_get($data, ['foo', 'bar', 1]);
patha_set($data, ['foo', 'new'], 'value');

# Pre-compiled paths for hot loops
use Data::Path::XS qw(path_compile pathc_get);
my $cp    = path_compile('/foo/bar/1');
my $other = { foo => { bar => [4, 5, 6] } };
pathc_get($data,  $cp);                   # 42
pathc_get($other, $cp);                   # 5 - reuse across data

# Keyword syntax (compile-time optimized)
use Data::Path::XS ':keywords';

my $v = pathget    $data, "/foo/bar/1";
pathset            $data, "/foo/bar/1", 99;
pathdelete         $data, "/foo/bar/1";
print "ok\n" if pathexists $data, "/foo/bar";

DESCRIPTION

Fast XS access to deeply nested Perl data structures via slash-separated paths (similar in shape to JSON Pointer, but without RFC 6901's ~0/~1 escaping). Four parallel APIs let you trade ergonomics against speed:

  • "STRING PATH API" - path_*, the general-purpose entry point.

  • "ARRAY PATH API" - patha_*, when path components are already parsed or may contain / or other special characters.

  • "COMPILED PATH API" - path_compile + pathc_*, when the same path is reused many times on different data.

  • "KEYWORDS API" - pathget/pathset/etc. as syntax via XS::Parse::Keyword, compiled to inline custom ops or (where possible) native Perl assignment ops.

All four APIs share the same path syntax ("PATH FORMAT") and the same container-dispatch semantics ("Numeric vs String Keys").

IMPORTING

use Data::Path::XS qw(path_get path_set ...);   # function exports
use Data::Path::XS ':keywords';                  # enable keyword syntax
use Data::Path::XS ':keywords', qw(path_get);    # both

The :keywords tag installs lexically scoped keyword hints, so the keywords are visible only inside the importing scope. A no Data::Path::XS; statement removes them again. Function exports follow standard Exporter rules.

PATH FORMAT

  • Components are separated by /. A leading / is optional: "/foo/bar" and "foo/bar" are equivalent.

  • An empty string or "/" refers to the root. Repeated and trailing slashes ("//foo//") are tolerated and yield the same components.

  • Numeric components may address array elements when the parent container is an array; on a hash parent the same string is treated as a hash key. See "Numeric vs String Keys".

  • Negative indices work like Perl's native array access (-1 is the last element). See "Negative Array Indices".

  • No escaping is provided in the string API: keys containing / or the empty string cannot be expressed in a string path. Use the array API (e.g. patha_get($data, ['', 'a/b'])) for those.

  • UTF-8 keys are propagated correctly. The path SV's SvUTF8 flag (or, in the array API, each key SV's flag) is forwarded to hv_fetch/hv_store so "/café" matches hash keys stored under use utf8.
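The splitting rules above can be modelled in a few lines of pure Perl. This is only a sketch of the documented behaviour, not the module's XS parser; split_path is a hypothetical helper name and is not exported by Data::Path::XS.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string path into components following the
# rules listed above. Empty segments produced by leading, repeated, or
# trailing slashes are dropped, so an empty-string key can never appear.
sub split_path {
    my ($path) = @_;
    return grep { length } split m{/}, $path, -1;
}

for my $p ('/foo/bar', 'foo/bar', '//foo//bar/', '', '/') {
    my @parts = split_path($p);
    printf "%-14s => [%s]\n", "'$p'", join(', ', @parts);
}
```

Because empty segments are discarded, a key such as 'a/b' or '' has no string spelling, which is why the array API exists.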

Numeric vs String Keys

All four APIs dispatch by parent container type, not by key shape:

my $h = { '0' => 'zero' };
path_get($h, '/0');               # 'zero' - hash key
pathget $h, "/0";                 # 'zero' - same

my $a = ['x', 'y', 'z'];
path_get($a, '/0');               # 'x' - array index
pathget $a, "/0";                 # 'x' - same

When autovivifying a missing intermediate, the type to create is chosen by the next component's shape: a numeric next component creates an array, otherwise a hash.
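The two rules (dispatch on the parent's type, choose a new container from the next component) can be sketched in pure Perl. This is a simplified model under stated assumptions, not the module's code: looks_like_index and model_set are hypothetical names, and the index test here ignores negative numbers and the overflow rule described under "EDGE CASES".

```perl
use strict;
use warnings;

# Hypothetical helper: true when a component looks like an array index.
# Assumption: a plain non-negative integer without leading zeros.
sub looks_like_index { $_[0] =~ /\A(?:0|[1-9][0-9]*)\z/ ? 1 : 0 }

# Hypothetical model of the set-side walk: dispatch on the parent's type,
# and pick the type of a missing intermediate from the NEXT component.
sub model_set {
    my ($root, $parts, $value) = @_;
    my @parts = @$parts;
    my $last  = pop @parts;
    my $node  = $root;
    for my $i (0 .. $#parts) {
        my $key  = $parts[$i];
        my $next = $i < $#parts ? $parts[$i + 1] : $last;
        my $slot = ref($node) eq 'ARRAY' ? \$node->[$key] : \$node->{$key};
        # A missing or non-reference intermediate is replaced by a new container
        $$slot = looks_like_index($next) ? [] : {} unless ref($$slot);
        $node = $$slot;
    }
    if (ref($node) eq 'ARRAY') { $node->[$last] = $value }
    else                       { $node->{$last} = $value }
    return $value;
}

my $data = {};
model_set($data, [qw(items 0 name)], 'first');   # items becomes an array
model_set($data, [qw(meta 007)],     'bond');    # meta becomes a hash
model_set($data, ['0'],              'zero');    # hash parent: '0' is a key
```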

STRING PATH API

path_get($data, $path)

Returns the value at $path, or undef if any component is missing. An empty path returns $data itself. Never autovivifies.

path_get($data, '/foo/bar');
path_get($data, '');               # returns $data
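For contrast, a native nested read autovivifies intermediate containers, which is the side effect "never autovivifies" rules out. The snippet below uses core Perl only and does not call the module; the variable names are illustrative.

```perl
use strict;
use warnings;

my $data = {};

# A native nested read autovivifies the intermediate hash as a side effect.
my $value   = $data->{foo}{bar};
my $created = exists $data->{foo} ? 1 : 0;      # 'foo' now exists

# Guarding each level by hand avoids that; path_get is documented to give
# the same no-side-effect read in a single call.
my $clean     = {};
my $safe      = ref($clean->{foo}) eq 'HASH' ? $clean->{foo}{bar} : undef;
my $untouched = exists $clean->{foo} ? 0 : 1;   # still no 'foo' key
```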

path_set($data, $path, $value)

Stores $value at $path, creating intermediate hashes/arrays as needed (see "Numeric vs String Keys" for the type-decision rule). Existing non-reference scalars at intermediate positions are silently replaced. Returns $value. Croaks on an empty path or on a path that cannot be navigated (e.g. through a tied container, see "Tied containers").

path_set($data, '/foo/bar', 42);
path_set($data, '/items/0/name', 'first');   # autovivifies array

path_delete($data, $path)

Deletes the value at $path and returns it, or undef if not found. Croaks on an empty path.

my $old = path_delete($data, '/foo/bar');

path_exists($data, $path)

Returns 1 if $path resolves to an existing element (using exists semantics: explicit undef values count as existing), 0 otherwise. The empty path always exists.

do_thing() if path_exists($data, '/foo/bar');

ARRAY PATH API

The patha_* functions take an arrayref of components instead of a slash-separated string. Use this when path pieces are already parsed, when keys may contain /, or when you want to address an empty-string key (['']).

Each key SV's SvUTF8 flag is honoured per component.

patha_get($data, \@path)

patha_get($data, ['foo', 'bar', 0]);
patha_get($data, []);             # returns $data

patha_set($data, \@path, $value)

patha_set($data, ['foo', 'bar'], 42);

patha_delete($data, \@path)

patha_delete($data, ['foo', 'bar']);

patha_exists($data, \@path)

patha_exists($data, ['foo', 'bar']);

COMPILED PATH API

Pre-compile a path once, then reuse it for many lookups. The compiled object holds parsed components, pre-computed array indices, and the UTF-8 flag, so per-call overhead drops to the navigation itself.
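The idea of parsing once and walking many times can be illustrated with a pure-Perl stand-in. This is a sketch, not how the XS object is laid out: model_compile and model_get are hypothetical names, and the compiled form here is just an arrayref of components.

```perl
use strict;
use warnings;

# Hypothetical stand-in for path_compile: split once, keep the parts.
sub model_compile {
    my ($path) = @_;
    my @parts = grep { length } split m{/}, $path;
    return \@parts;
}

# Hypothetical stand-in for pathc_get: walk the pre-split parts without
# autovivifying, returning undef as soon as a step is missing.
sub model_get {
    my ($node, $compiled) = @_;
    for my $key (@$compiled) {
        if (ref($node) eq 'HASH') {
            return undef unless exists $node->{$key};
            $node = $node->{$key};
        }
        elsif (ref($node) eq 'ARRAY' && $key =~ /\A-?[0-9]+\z/) {
            return undef unless exists $node->[$key];
            $node = $node->[$key];
        }
        else {
            return undef;
        }
    }
    return $node;
}

my $cp      = model_compile('/users/0/name');   # parsed once
my @records = (
    { users => [ { name => 'Alice' } ] },
    { users => [ { name => 'Bob'   } ] },
    { users => [] },
);
my @names = map { scalar model_get($_, $cp) } @records;   # reused per record
```

The split cost is paid once in model_compile; each lookup then only pays for navigation.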

path_compile($path)

Returns a compiled path object (a blessed reference). The object owns its own copy of the path string, so the caller may freely mutate or discard the source SV.

my $cp = path_compile('/users/0/name');

pathc_get($data, $compiled)

for my $record (@records) {
    my $val = pathc_get($record, $cp);
}

pathc_set($data, $compiled, $value)

pathc_set($data, $cp, 'new value');

pathc_delete($data, $compiled)

pathc_delete($data, $cp);

pathc_exists($data, $compiled)

pathc_exists($data, $cp);

KEYWORDS API

use Data::Path::XS ':keywords';

The keywords compile to either an inline custom op or, where the path allows, native Perl assignment ops. They never call into XSUB dispatch and so reach near-native speed.

pathget DATA, PATH

Get a value. Returns undef for missing paths and never autovivifies.

my $val = pathget $data, "/users/0/name";

pathset DATA, PATH, VALUE

Set a value, autovivifying intermediates as needed. Returns VALUE.

pathset $data, "/users/0/name", "Alice";

pathdelete DATA, PATH

Delete a value and return it.

my $old = pathdelete $data, "/users/0/name";

pathexists DATA, PATH

True if PATH exists.

print "found\n" if pathexists $data, "/users/0/name";

Constant vs Dynamic Paths

When pathset is called with a compile-time constant path that

  • contains only string components (no numeric pieces), and

  • does not carry the SvUTF8 flag (i.e. is not authored under use utf8),

the keyword compiles directly to a native HELEM-chain assignment with autovivification - zero per-call overhead. Because this uses Perl's native ops:

  • error messages match Perl's (e.g. "Not a HASH reference") rather than this module's ("Cannot navigate to path"), and

  • a non-reference intermediate causes a croak rather than being silently replaced.

In every other case (numeric component, UTF-8 path, non-constant path), the keyword falls through to a custom op with the same semantics as path_set.

The other three keywords (pathget, pathexists, pathdelete) always use custom ops.
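To make the difference concrete, here is what the native form looks like in core Perl. The snippet does not load the module; it only shows the plain hash-element chain that a constant, all-string pathset is described as compiling to, together with the errors native Perl raises. Whether the compiled keyword honours the caller's strict settings is not stated in this document, so the second error is shown for plain Perl under use strict only.

```perl
use strict;
use warnings;

# The native equivalent of:  pathset $data, "/a/b/c", 1;
my $data = {};
$data->{a}{b}{c} = 1;            # intermediates autovivify as hashes

# A reference of the wrong type makes the native ops die with Perl's message.
my $wrong_type = { a => [] };
my $err_ref    = eval { $wrong_type->{a}{b} = 1; 1 } ? '' : $@;

# A non-reference intermediate also dies (under strict refs) instead of
# being replaced, unlike the behaviour documented for path_set.
my $scalar_mid = { a => 'x' };
my $err_scalar = eval { $scalar_mid->{a}{b} = 1; 1 } ? '' : $@;

print "wrong type: $err_ref";
print "non-ref:    $err_scalar";
```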

EDGE CASES

Empty Paths

The empty path ("", "/", "///") addresses the root:

path_get   ($data, "");           # $data
path_exists($data, "/");          # 1
path_set   ($data, "", $v);       # croaks "Cannot set root"
path_delete($data, "");           # croaks "Cannot delete root"

Negative Array Indices

Negative indices behave like Perl's:

my $data = { arr => ['a', 'b', 'c'] };
path_get($data, '/arr/-1');       # 'c'
path_set($data, '/arr/-1', 'z');  # arr now ['a','b','z']

Out-of-range negative indices return undef (or false for exists).
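The native behaviour being mirrored looks like this in core Perl (no module calls; variable names are illustrative):

```perl
use strict;
use warnings;

my @arr = ('a', 'b', 'c');

my $last    = $arr[-1];                   # counts back from the end
my $first   = $arr[-3];                   # -scalar(@arr) is the first element
my $missing = $arr[-10];                  # out of range: undef, no error
my $exists  = exists $arr[-10] ? 1 : 0;   # out of range: false
```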

Leading Zeros

Strings with leading zeros are treated as hash keys, not array indices:

path_get($data, '/arr/007');      # $data->{arr}{'007'} (hash key)
path_get($data, '/arr/0');        # $data->{arr}[0] (single zero ok)

Integer Overflow

Indices with more than 18 digits (9 on 32-bit perls) are treated as hash keys to prevent overflow:

path_get($data, '/arr/12345678901234567890');  # hash key
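The leading-zero and overflow rules amount to a small classifier. The sketch below is a pure-Perl model of the rules as stated here, not the module's C code: is_array_index and MAX_DIGITS are hypothetical names, the 18-digit limit is the 64-bit figure from the text, and how the module treats "-0" is not stated, so that case is an assumption.

```perl
use strict;
use warnings;

# Assumption: the 64-bit limit from the text (the text gives 9 for 32-bit perls).
use constant MAX_DIGITS => 18;

# Hypothetical classifier: 1 if the component may be used as an array index,
# 0 if it must be treated as a hash key.
sub is_array_index {
    my ($s) = @_;
    return 0 unless $s =~ /\A-?([0-9]+)\z/;                  # optional sign, digits only
    my $digits = $1;
    return 0 if length($digits) > 1 && $digits =~ /\A0/;     # leading zero
    return 0 if length($digits) > MAX_DIGITS;                # would overflow
    return 1;
}

for my $c ('0', '42', '-1', '007', '12345678901234567890', 'name') {
    printf "%-22s %s\n", $c, is_array_index($c) ? 'index' : 'hash key';
}
```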

LIMITATIONS

Tied containers

Read and delete operations (path_get, path_exists, path_delete, and their array/compiled/keyword counterparts) work on tied hashes and arrays via the standard fetch/exists/delete magic.

Write operations (path_set, patha_set, pathc_set, and the pathset keyword) currently croak with a message of the form "Cannot ... on tied/magical hash" or "... on tied/magical array", rather than invoking the tied STORE method. For tied write targets, assign through native Perl syntax. This limitation may be relaxed in a future release.
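A minimal sketch of the suggested workaround, using only core Perl: native assignment goes through the tied STORE method. CountingHash is a hypothetical class built on the core Tie::StdHash purely for illustration; the package-block syntax assumes Perl 5.14 or later.

```perl
use strict;
use warnings;
require Tie::Hash;   # core module that provides Tie::StdHash

# Hypothetical tied class that counts STORE calls.
package CountingHash {
    our @ISA    = ('Tie::StdHash');
    our $stores = 0;
    sub STORE { my $self = shift; $stores++; $self->SUPER::STORE(@_) }
}

tie my %config, 'CountingHash';

$config{limits}      = {};   # native assignment invokes STORE on the tie
$config{limits}{max} = 10;   # FETCH on the tie, then a plain store into
                             # the untied inner hash

print "STORE calls: $CountingHash::stores\n";
```

Only the top-level assignment touches the tie; once an untied inner container has been fetched, writes below it are ordinary stores.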

THREAD SAFETY

The module uses no global state and is safe in threaded programs as long as each thread operates on its own data. No locking is performed on shared structures.

Compiled-path objects own internal buffers and should not be shared across threads; create one per thread.

PERFORMANCE

Indicative numbers from bench/benchmark.pl on a single sample run (rate per second, higher is better):

Operation                   Pure Perl    Native Perl    Data::Path::XS
------------------------  -----------  -------------  ----------------
path_get shallow              2.1 M/s       35.4 M/s          22.6 M/s
path_get deep (5 levels)      0.8 M/s        7.0 M/s           8.6 M/s
path_get missing key          1.3 M/s        4.4 M/s          14.7 M/s
path_set deep existing        0.8 M/s        8.1 M/s           7.3 M/s
pathget kw const shallow            -       37.5 M/s          42.2 M/s
pathget kw const deep               -        7.3 M/s           8.5 M/s
pathexists kw const deep            -        6.3 M/s          10.2 M/s

The keyword API matches or exceeds native Perl on most workloads. The compiled API adds another ~20-35% on hot paths by skipping parsing. Run bench/benchmark.pl for a fuller comparison on your hardware.

SEE ALSO

  • Data::Diver - pure-Perl deep accessor with similar reach.

  • JSON::Pointer - RFC 6901 path syntax (with ~0/~1 escaping) over the same kinds of structures.

  • Data::DPath - XPath-like queries over data.

  • XS::Parse::Keyword - the keyword-plugin framework used to install the pathget/pathset/pathdelete/pathexists syntax.

AUTHOR

vividsnow

BUGS

Please report issues at https://github.com/vividsnow/perl5-data-path-xs/issues.

LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.