NAME
PDL::Guide::Migration::NumPy - A Rosetta Stone for the NumPy professional
DESCRIPTION
If you know NumPy, you already know 90% of PDL. Both systems provide N-dimensional arrays (ndarrays), broadcasting, and vectorised C-speed operations.
PDL's primary advantages are its integrated multi-threading, memory efficiency, and native C-interop.
Terminology Mapping
- ndarray
-
The core data object. Known as ndarray in NumPy and PDL.
- Broadcasting
-
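The process of applying an operation to ndarrays of different shapes by implicitly repeating the smaller one across the extra dimensions of the larger one. The concept is the same in PDL as in NumPy, but PDL matches dimensions starting from dimension 0, whereas NumPy aligns them from the last axis.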
- Dimension 0
-
The first (leftmost) index. In PDL, this is the "fastest" dimension in memory (contiguous).
- Virtual ndarray
-
A slice or transformation that shares memory with its parent. Known as a View in NumPy.
- Physicalise
-
Forcing a virtual ndarray to be given its own contiguous block of memory (the make_physical method). Similar to NumPy's ascontiguousarray(), but unlike NumPy's copy() the result is not independent: in PDL, changes to the "parent" ndarray get transmitted to the "child", and vice-versa, even if the child is physicalised. This can be prevented by severing the child (the sever method).
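The relationship between a parent, a physicalised child and a severed child can be sketched as follows. This is a minimal illustration rather than a definitive recipe; the variable names are chosen for the example only, and it assumes a PDL installation where make_physical and sever behave as described above.

```perl
use strict;
use warnings;
use PDL;

my $parent = sequence(6);               # [0 1 2 3 4 5]
my $child  = $parent->slice("0:2");     # virtual ndarray: a window onto $parent

$child->make_physical;                  # $child now has its own memory block...
$parent->slice("0:0") .= 100;           # ...but a write to the parent
print "child after parent write: $child\n";   # is still visible in the child

$child->sever;                          # cut the dataflow link
$parent->slice("1:1") .= 200;           # this write no longer reaches $child
print "child after sever:        $child\n";
print "parent:                   $parent\n";
```

After the sever call the two ndarrays evolve independently, which is the closest match to what a NumPy user expects from copy().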
The "Mental Model" Shift
- 1. Dimension Order and Layout
-
PDL uses the same row-contiguous memory layout as C and as NumPy's default, but it numbers the dimensions in the opposite order: the 0-th (first) dimension is the "fastest" in memory.
# NumPy: a[rows, cols]  -> the last index (cols) varies fastest
# PDL:   $a(cols, rows) -> the first index (cols) varies fastest
Both systems store rows contiguously in memory, but they label the indices differently. In PDL, the first index is the column.
- 2. Automatic Threading
-
PDL can automatically distribute operations on large ndarrays across CPU cores via POSIX threads (see PDL::ParallelCPU). You don't need to manage pools or use multiprocessing for standard numerical tasks.
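Both points above can be tried out in a few lines. The sketch below is illustrative only: the variable names, the target of 4 pthreads and the 1 Meg-element threshold are arbitrary assumptions, and the number of pthreads actually used depends on how PDL was built and on the machine.

```perl
use strict;
use warnings;
use PDL;

# 1. Dimension order: dim 0 is the column index and varies fastest in memory.
my $m = sequence(3, 2);                 # 3 columns, 2 rows
print $m, "\n";                         # printed as two rows of three values
my @dims  = $m->dims;                   # (3, 2)
my $value = $m->at(2, 1);               # column 2, row 1
print "dims: @dims, value at (2,1): $value\n";

# 2. Automatic threading: request a pthread target and a size threshold.
set_autopthread_targ(4);                # aim for up to 4 pthreads (assumed value)
set_autopthread_size(1);                # only split ndarrays above ~1 Meg elements
my $big = zeroes(2000, 2000);           # 4 million elements
my $res = $big + 5;                     # may be split across pthreads
print "pthreads used: ", get_autopthread_actual(), "\n";
```

The NumPy counterpart of the first part would be np.arange(6).reshape(2, 3)[1, 2], which addresses the same element with the indices in the opposite order.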
Slicing and Dicing
NumPy users rely heavily on a[start:stop:step] notation. PDL provides two ways to achieve this: the core slice method and the PDL::NiceSlice source filter. Note that PDL ranges are inclusive: the stop index is part of the result, so NumPy's a[0:5] corresponds to 0:4 in PDL.
1. Core Syntax (The slice method)
The slice method is always available and does not require a filter.
# NumPy: a[0:5] (0,1,2,3,4) # PDL: $a->slice("0:4")
$sub = $a->slice("0:4");
# Reverse a dimension
# NumPy: a[::-1] # PDL: $a->slice("-1:0")
$rev = $a->slice("-1:0");
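A few more slice strings, including a step and negative indices, are sketched below. The variable names are illustrative; $x is used instead of $a only to keep the example clean under strict and warnings.

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(10);                   # [0 1 2 3 4 5 6 7 8 9]

my $head = $x->slice("0:4");            # NumPy: x[0:5]
my $rev  = $x->slice("-1:0");           # NumPy: x[::-1]
my $even = $x->slice("0:-1:2");         # NumPy: x[::2]
my $tail = $x->slice("-3:-1");          # NumPy: x[-3:]

print "head: $head\n";
print "rev:  $rev\n";
print "even: $even\n";
print "tail: $tail\n";
```

Negative indices count from the end as in NumPy, but because ranges are inclusive, -1 as a stop value means "up to and including the last element".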
2. Modern Syntax (PDL::NiceSlice)
For a NumPy-like experience, PDL::NiceSlice lets you write the slice expression in parentheses directly after the variable.
use PDL::NiceSlice;
# Index order is reversed: NumPy a[row, col] is PDL $a(col, row)
# NumPy                    # PDL
subset = a[0:5, :]         $subset = $a(:, 0:4);
rev = a[::-1, ::2]         $rev = $a(0:-1:2, -1:0);
Syntax Rosetta Stone
Creation and Inspection
# NumPy                    # PDL
a = np.array([1,2,3])      $a = pdl(1,2,3);
b = np.arange(10)          $b = sequence(10);
c = np.zeros((3,3))        $c = zeros(3,3);
print(a.shape)             print join(",", $a->dims);
print(a.dtype)             print $a->type;
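The inspection calls can be exercised with a short script. This is a sketch with illustrative variable names; the explicit float() type argument is an addition for the example and is not required by the constructors listed above.

```perl
use strict;
use warnings;
use PDL;

my $v = pdl(1, 2, 3);                   # from a list of values
my $s = sequence(10);                   # 0 .. 9
my $z = zeroes(float(), 3, 3);          # 3x3 zeros with an explicit element type

print "shape: ", join(", ", $z->dims), "\n";   # like a.shape
print "ndims: ", $z->ndims, "\n";              # like a.ndim
print "nelem: ", $z->nelem, "\n";              # like a.size
print "type:  ", $z->type, "\n";               # like a.dtype
```

Note that dims returns a plain Perl list, so it needs join (or similar) to print readably.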
Operations
# NumPy                    # PDL
res = a + b                $res = $a + $b;
res = np.dot(a, b)         $res = $a x $b;
res = a.sum(axis=-1)       $res = $a->sumover;    # sums along dim 0 (NumPy's last axis)
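The operations can be checked on a small matrix. The sketch below uses illustrative variable names; the point to watch is that sumover reduces along dimension 0, which for the same data corresponds to NumPy's last axis, so summing along the other direction needs the dimensions exchanged first (here with xchg).

```perl
use strict;
use warnings;
use PDL;

my $m = pdl([[1, 2], [3, 4]]);
my $n = pdl([[5, 6], [7, 8]]);

my $sum      = $m + $n;                     # elementwise addition
my $prod     = $m x $n;                     # matrix product, like np.dot(m, n)
my $row_sums = $m->sumover;                 # sum within each row:    [3 7]
my $col_sums = $m->xchg(0, 1)->sumover;     # sum within each column: [4 6]

print "sum:      $sum\n";
print "prod:     $prod\n";
print "row sums: $row_sums\n";
print "col sums: $col_sums\n";
```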
In-place Operations (Memory Efficiency)
In NumPy, you prevent memory allocation by passing an out= argument. PDL uses an inplace flag, which tells the next method in the chain to modify the ndarray rather than creating a new one.
# NumPy                    # PDL
np.sqrt(a, out=a)          $a->inplace->sqrt;
a += 5                     $a += 5;               # auto-inplace
This is particularly powerful for large datasets where you want to chain operations without doubling your RAM usage.
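A small sketch of the difference between an in-place call and an ordinary one follows; the variable names are illustrative.

```perl
use strict;
use warnings;
use PDL;

my $x = pdl(1, 4, 9);

$x->inplace->sqrt;          # overwrites $x with its square roots
$x += 5;                    # assignment operators also work in place

my $y = $x->sqrt;           # no inplace flag: $y is new, $x is left untouched

print "x: $x\n";
print "y: $y\n";
```

The inplace flag applies only to the next operation, so each step in a longer chain needs its own inplace call.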
Computed Assignment (Copying into Slices)
In NumPy, you assign values to a slice using the standard = operator. In PDL, the .= (data assignment) operator is used to copy values into the memory pointed to by a slice.
# NumPy                    # PDL
a[0:5, 0:5] = 10           $a(0:4, 0:4) .= 10;
# Copying one ndarray into a slice
# NumPy: a[0:5] = b # PDL: $a(0:4) .= $b
$a(0:4) .= $b;
Why not use "="?
In PDL, = is Perl's ordinary reference assignment: it never copies data into an existing ndarray. Writing $a(0:4) = $b does nothing to $a. Using .= ensures the data reaches the intended physical destination.
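The difference can be seen with a named slice. This sketch uses the core slice method rather than PDL::NiceSlice so that it runs without the source filter; the variable names are illustrative.

```perl
use strict;
use warnings;
use PDL;

my $x    = zeroes(6);
my $view = $x->slice("0:2");

$view = pdl(7, 7, 7);                   # rebinds $view to a new ndarray; $x is untouched
print "after  =: $x\n";

$x->slice("0:2") .= pdl(1, 2, 3);       # copies the values into $x's memory
print "after .=: $x\n";
```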
Note on "Physical" vs "Virtual" ndarrays
NumPy's views are similar to PDL's virtual ndarrays. When you slice an ndarray in PDL, it doesn't copy the data; it creates a "window" into the original.
If you modify a slice in place, the original ndarray changes too:
$a = zeroes(10, 10);
$slice = $a(0:4, 0:4);
$slice .= 5; # Original $a now has a 5x5 block of 5s
Broadcasting
In NumPy, you "broadcast" a smaller array over a larger one. PDL does the same: the smaller ndarray is repeated along the dimensions it lacks.
$a = sequence(10, 10);
$b = pdl(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
# Adds $b to every row of $a
$c = $a + $b;
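Broadcasting along the other direction needs a dimension of size 1 in front, which the dummy method can supply. The following is a sketch with illustrative names; using dummy(0) here is one possible approach, roughly comparable to col[:, np.newaxis] in NumPy.

```perl
use strict;
use warnings;
use PDL;

my $m   = sequence(3, 2);               # [[0 1 2] [3 4 5]]
my $row = pdl(10, 20, 30);              # matches dim 0 (3 elements)
my $col = pdl(100, 200);                # matches dim 1 (2 elements)

my $by_row = $m + $row;                 # $row is added to every row
my $by_col = $m + $col->dummy(0);       # dims (1,2): one value per row, repeated across columns

print "by row: $by_row\n";
print "by col: $by_col\n";
```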
SEE ALSO
PDL::Indexing, PDL::NiceSlice, PDL::Course, PDL::ParallelCPU
And a translation of the PDL API into NumPy: https://github.com/dkogan/numpysane/#whats-wrong-with-existing-numpy-functions