NAME

PDL::Guide::Migration::NumPy - A Rosetta Stone for the NumPy professional

DESCRIPTION

If you know NumPy, you already know 90% of PDL. Both systems provide N-dimensional arrays (ndarrays), broadcasting, and vectorised C-speed operations.

PDL's primary advantages are its integrated multi-threading, memory efficiency, and native C-interop.

Terminology Mapping

ndarray

The core data object. Known as ndarray in NumPy and PDL.

Broadcasting

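Applying an operation across the extra dimensions of its operands, repeating a smaller ndarray over a larger one as needed. PDL does this much as NumPy does, except that PDL matches dimensions starting from dimension 0, whereas NumPy matches them starting from the last axis.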

Dimension 0

The first (leftmost) index. In PDL, this is the "fastest" dimension in memory (contiguous).

Virtual ndarray

A slice or transformation that shares memory with its parent. Known as a View in NumPy.

Physicalise

Forcing a virtual ndarray to hold its data in its own contiguous block of memory (make_physical). The closest NumPy analogue is ascontiguousarray(), with one important difference: in PDL, changes to the "parent" ndarray are still transmitted to the "child", and vice-versa, even after the child is physicalised. To break that link, sever the child (sever) or take a copy (copy); the latter corresponds to NumPy's copy().

The "Mental Model" Shift

1. Dimension Order and Layout

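PDL lists dimensions fastest-first: the 0-th (first) dimension is the one whose elements sit next to each other in memory. NumPy's default (C order) is the reverse, with the last axis contiguous.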

# NumPy: a(rows, cols) -> cols are contiguous
# PDL:   $a(cols, rows) -> cols are contiguous

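Both systems store rows contiguously in memory, but they label the indices differently. In PDL, the first index is the column, so a NumPy array of shape (rows, cols) corresponds to a PDL ndarray with dims (cols, rows). The side-by-side examples in this guide pair up same-numbered dimensions (NumPy axis 0 with PDL dimension 0); to address the same data in memory, reverse the index order.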
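The relationship can be checked with a little index arithmetic. The sketch below is plain Python, not NumPy or PDL code; the helper names are hypothetical and the 3x4 shape is an arbitrary example. It computes the flat memory offset of one element under each naming convention.

```python
# Illustrative only: flat offsets for an array with 3 rows and 4 columns.
N_ROWS, N_COLS = 3, 4

def numpy_offset(row, col):
    """Offset of a[row, col] in a C-ordered NumPy array of shape (N_ROWS, N_COLS)."""
    return row * N_COLS + col

def pdl_offset(col, row):
    """Offset of $a->at(col, row) in a PDL ndarray with dims (N_COLS, N_ROWS)."""
    return col + N_COLS * row

# The same element, addressed in each convention, lives at the same offset.
same_element = numpy_offset(2, 1) == pdl_offset(1, 2)
print(same_element)  # True
```

Only the order in which the indices are written differs; the offset arithmetic is the same.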

2. Automatic Threading

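PDL can split broadcast operations across CPU cores using POSIX threads, so you don't need to manage pools or use multiprocessing for standard numerical tasks. Whether and how far an operation is split depends on the build (pthreads support) and on the thread-count and size thresholds described in PDL::ParallelCPU (set_autopthread_targ, set_autopthread_size).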

Slicing and Dicing

NumPy users rely heavily on a[start:stop:step] notation. PDL provides two ways to achieve this: the core slice method and the PDL::NiceSlice source filter. Note that PDL uses inclusive ranges.

1. Core Syntax (The slice method)

The slice method is always available and does not require a filter.

# NumPy: a[0:5] (0,1,2,3,4)      # PDL: $a->slice("0:4")
$sub = $a->slice("0:4");

# Reverse a dimension
# NumPy: a[::-1]                 # PDL: $a->slice("-1:0")
$rev = $a->slice("-1:0");
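Because PDL ranges are inclusive, translating a NumPy slice means replacing the stop value with the last index actually selected. The following is a minimal Python sketch of that conversion; numpy_slice_to_pdl is a hypothetical helper written for this guide, not part of NumPy or PDL, and it does not attempt to handle empty selections.

```python
def numpy_slice_to_pdl(sl, length):
    """Translate a half-open Python slice into an inclusive PDL slice string.

    `length` is the size of the dimension being sliced.
    """
    start, stop, step = sl.indices(length)  # normalise None and negative values
    picked = range(start, stop, step)       # the indices NumPy would select
    if len(picked) == 0:
        raise ValueError("empty selection; not handled by this sketch")
    return f"{picked[0]}:{picked[-1]}:{step}"

print(numpy_slice_to_pdl(slice(0, 5), 10))            # 0:4:1
print(numpy_slice_to_pdl(slice(None, None, -1), 10))  # 9:0:-1
print(numpy_slice_to_pdl(slice(None, None, 2), 10))   # 0:8:2
```

Each resulting string uses the start:end:step form accepted by the slice method, for example $a->slice("0:8:2").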

2. Modern Syntax (PDL::NiceSlice)

For a NumPy-like experience, PDL::NiceSlice (a source filter) lets you write the slice directly in parentheses after the variable, without quoting it as a string.

use PDL::NiceSlice;

# NumPy                          # PDL
subset = a[0:5, :]               $subset = $a(0:4, :);
rev = a[::-1, ::2]               $rev = $a(-1:0, 0:-1:2);

Syntax Rosetta Stone

Creation and Inspection

# NumPy                          # PDL
a = np.array([1,2,3])            $a = pdl(1,2,3);
b = np.arange(10)                $b = sequence(10);
c = np.zeros((3,3))              $c = zeros(3,3);
print(a.shape)                   print join(',', $a->dims);
print(a.dtype)                   print $a->type;

Operations

# NumPy                          # PDL
res = a + b                      $res = $a + $b;
res = np.dot(a, b)               $res = $a x $b;       # 2-D matrix product; for 1-D use inner($a, $b)
res = a.sum(axis=0)              $res = $a->sumover;   # sums over dim 0 (PDL's fastest dimension)

In-place Operations (Memory Efficiency)

In NumPy, you prevent memory allocation by passing an out= argument. PDL uses an inplace flag, which tells the next method in the chain to modify the ndarray rather than creating a new one.

# NumPy                          # PDL
np.sqrt(a, out=a)                $a->inplace->sqrt;
a += 5                           $a += 5;              # in place automatically

This is particularly powerful for large datasets where you want to chain operations without doubling your RAM usage.

Computed Assignment (Copying into Slices)

In NumPy, you assign values to a slice using the standard = operator. In PDL, the .= (data assignment) operator is used to copy values into the memory pointed to by a slice.

# NumPy                          # PDL
a[0:5, 0:5] = 10                 $a(0:4, 0:4) .= 10;

# Copying one ndarray into a slice
# NumPy                          # PDL
a[0:5] = b                       $a(0:4) .= $b;

Why not use "="?

In Perl, = assigns to the variable, not to the data it refers to. Writing $a(0:4) = $b therefore does nothing to $a. Using .= copies the values of $b into the memory that the slice shares with $a.
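Python draws the same line between rebinding a name and writing through a view. The sketch below is only an analogy, using the standard library's bytearray and memoryview in place of ndarrays; it is not PDL or NumPy code, and the variable names are illustrative.

```python
buf = bytearray(10)            # ten zero bytes, standing in for $a
window = memoryview(buf)[0:5]  # a view onto the first five elements

window = bytes([7] * 5)        # plain "=": rebinds the name, buf is untouched
after_rebind = bytes(buf)

window = memoryview(buf)[0:5]
window[:] = bytes([7] * 5)     # writes through the view, as PDL's .= does
after_write = bytes(buf)

print(list(after_rebind))      # all zeros
print(list(after_write))       # five 7s followed by five zeros
```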

Note on "Physical" vs "Virtual" ndarrays

NumPy's views are similar to PDL's virtual ndarrays. When you slice an ndarray in PDL, it doesn't copy the data; it creates a "window" into the original.

If you assign into a slice, the original ndarray changes too:

$a = zeroes(10, 10);
$slice = $a(0:4, 0:4);
$slice .= 5; # Original $a now has a 5x5 block of 5s

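Broadcasting

In NumPy, you "broadcast" a smaller array over a larger one. PDL does the same automatically, matching dimensions from dimension 0 upwards and repeating the smaller ndarray over any remaining dimensions.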

$a = sequence(10, 10);
$b = pdl(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

# Adds $b to every row of $a
$c = $a + $b;
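With a square 10x10 example the direction of matching is easy to miss. The sketch below models only the shape rules in plain Python, under the assumption that each system stretches size-1 dimensions and pads the shorter shape with 1s (NumPy on the left, PDL on the right); the function names are hypothetical and no arrays are created.

```python
from itertools import zip_longest

def _merge(x, y, a, b):
    """Combine two dimension sizes, or fail if they cannot be broadcast."""
    if x != y and 1 not in (x, y):
        raise ValueError(f"cannot broadcast {a} with {b}")
    return y if x == 1 else x

def numpy_result_shape(a, b):
    """NumPy rule: align shapes at the last axis."""
    pairs = zip_longest(reversed(a), reversed(b), fillvalue=1)
    return tuple(reversed([_merge(x, y, a, b) for x, y in pairs]))

def pdl_result_dims(a, b):
    """PDL rule: align dims at dimension 0."""
    pairs = zip_longest(a, b, fillvalue=1)
    return tuple(_merge(x, y, a, b) for x, y in pairs)

# A table with 4 rows of 3 columns, plus one vector with 3 entries:
print(numpy_result_shape((4, 3), (3,)))  # (4, 3)
print(pdl_result_dims((3, 4), (3,)))     # (3, 4)
```

In both systems the 3-element vector lines up with the contiguous (column) dimension and is repeated for every row; only the position of that dimension in the shape differs.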

SEE ALSO

PDL::Indexing, PDL::NiceSlice, PDL::Course, PDL::ParallelCPU

And a translation of the PDL API into NumPy: https://github.com/dkogan/numpysane/#whats-wrong-with-existing-numpy-functions