NAME
Task::MemManager::Device - Device-specific memory management extensions for Task::MemManager
VERSION
version 0.02
SYNOPSIS
use Task::MemManager::Device;
# Use default NVIDIA_GPU device
my $buffer = Task::MemManager->new(1000, 4);
# Map buffer to GPU
$buffer->device_movement(
action => 'enter',
direction => 'to',
device => 'NVIDIA_GPU',
device_id => 0
);
# Perform GPU operations (using your C code)
my_gpu_function($buffer->get_buffer, $buffer->get_buffer_size);
# Update buffer from GPU back to CPU
$buffer->device_movement(
action => 'update',
direction => 'from'
);
# Exit and deallocate from GPU
$buffer->device_movement(
action => 'exit',
direction => 'from'
);
DESCRIPTION
Task::MemManager::Device extends the Task::MemManager
module by providing device-specific memory management capabilities, particularly for GPU computing using OpenMP target directives. It enables seamless data movement between CPU and GPU memory spaces, supporting various mapping strategies (to, from, tofrom, alloc) and update operations.
The module dynamically generates device-specific modules using Inline::C and OpenMP pragmas, allowing for flexible device support. By default, it provides NVIDIA GPU support with appropriate compilation flags, but can be extended to support AMD GPUs and other devices.
Device modules are automatically loaded and compiled on first use, with the generated code cached by Inline::C for subsequent runs. Each device module implements a set of standard functions for entering data regions, exiting data regions, and updating data between host and device.
LOADING THE MODULE
The module can be loaded with or without specifying device modules:
# Load with default NVIDIA_GPU device
use Task::MemManager::Device;
# Load with specific devices
use Task::MemManager::Device qw(NVIDIA_GPU AMD_GPU);
# Load via Task::MemManager with device specification
use Task::MemManager Device => ['NVIDIA_GPU'];
# Combine with allocator and view specifications
use Task::MemManager
Allocator => 'CMalloc',
View => 'PDL',
Device => 'NVIDIA_GPU';
METHODS
device_movement
$buffer->device_movement(%options);
Manages data movement between CPU and device (GPU) memory spaces using OpenMP target directives. This is the primary method for controlling data placement and updates.
Parameters:
action
- The type of operation to perform. Required. One of:
    'enter'  - Begin a data mapping region (allocate on device, optionally copy)
    'exit'   - End a data mapping region (optionally copy back, deallocate)
    'update' - Update data between host and device without changing mapping
direction
- The data transfer direction. Required. One of:
    'to'      - Copy data from host to device
    'from'    - Copy data from device to host
    'tofrom'  - Copy data both ways (enter: to device, exit: from device)
    'alloc'   - Allocate device memory without copying (enter only)
    'release' - Deallocate device memory without copying (exit only)
    'delete'  - Deallocate device memory, discard changes (exit only)
device
- Device module name. Optional. Default: 'NVIDIA_GPU'
device_id
- Device ID number for multi-device systems. Optional. Default: 0
start
- Starting byte offset in buffer. Optional. Default: 0
end
- Ending byte position in buffer (exclusive). Optional. Default: buffer size
Returns: Nothing (dies on error)
Throws:
Dies if action/direction combination is invalid
Dies if attempting to manage same device_id with different device modules
Dies if attempting to enter-map the same buffer twice on same device
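Because device_movement dies rather than returning an error code, calls whose parameters are computed at run time are best wrapped in eval. A minimal sketch of the guard pattern, using a hypothetical stand-in sub (device_movement_stub, not part of the module) so the example runs without a GPU:

```perl
use strict;
use warnings;

# Hypothetical stand-in that dies the way device_movement does on an
# invalid action/direction combination ('release' is exit-only).
sub device_movement_stub {
    my (%opt) = @_;
    die "invalid combination: $opt{action}/$opt{direction}\n"
        if $opt{action} eq 'enter' && $opt{direction} eq 'release';
    return 1;
}

# Guard the call and report the failure instead of crashing:
my $ok = eval { device_movement_stub( action => 'enter', direction => 'release' ); 1 };
warn "device_movement failed: $@" unless $ok;
```

The same eval wrapper applies verbatim around real $buffer->device_movement calls.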
Examples:
# Map buffer to GPU, copying data
$buffer->device_movement(
action => 'enter',
direction => 'to'
);
# Allocate GPU memory without copying
$buffer->device_movement(
action => 'enter',
direction => 'alloc'
);
# Update partial buffer region from GPU
$buffer->device_movement(
action => 'update',
direction => 'from',
start => 0,
end => 1000
);
# Exit mapping, copying data back and deallocating
$buffer->device_movement(
action => 'exit',
direction => 'from'
);
# Exit mapping with release (keep mapping but allow reuse)
$buffer->device_movement(
action => 'exit',
direction => 'release'
);
DEVICE FUNCTIONS
Each device module provides the following functions (where <DEVICE> is replaced with the device name, e.g., NVIDIA_GPU):
<DEVICE>_enter_to_gpu     - Map data to device (copy from host)
<DEVICE>_enter_tofrom_gpu - Map data bidirectionally
<DEVICE>_enter_alloc_gpu  - Allocate on device without copying
<DEVICE>_exit_from_gpu    - Unmap data from device (copy to host)
<DEVICE>_exit_tofrom_gpu  - Unmap bidirectional data
<DEVICE>_exit_release_gpu - Release mapping without copying
<DEVICE>_exit_delete_gpu  - Delete mapping and discard data
<DEVICE>_update_to_gpu    - Update data to device
<DEVICE>_update_from_gpu  - Update data from device
These functions are automatically registered and called by the device_movement
method. They should not typically be called directly.
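For reference, the registered names follow the documented <DEVICE>_<action>_<direction>_gpu pattern and can be composed mechanically; a small illustrative sketch (the actual dispatch happens inside device_movement):

```perl
use strict;
use warnings;

# Compose the documented <DEVICE>_<action>_<direction>_gpu function name.
sub device_function_name {
    my ($device, $action, $direction) = @_;
    return join '_', $device, $action, $direction, 'gpu';
}

print device_function_name('NVIDIA_GPU', 'enter', 'to'), "\n";   # NVIDIA_GPU_enter_to_gpu
```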
COMPILATION OPTIONS
The module supports device-specific compilation options for optimal performance:
NVIDIA_GPU (default)
COMPILER_FLAGS: -fno-stack-protector -fcf-protection=none -fopenmp
-std=c11 -fPIC -Wall -Wextra
CCEXFLAGS: -foffload=nvptx-none
LINKER_FLAGS: -fopenmp (with system lddlflags)
OPTIMIZE: -O3 -march=native
AMD_GPU
COMPILER_FLAGS: (same as NVIDIA_GPU)
CCEXFLAGS: (none - AMD offloading under development)
LINKER_FLAGS: -fopenmp (with system lddlflags)
OPTIMIZE: -O3 -march=native
DEFAULT (for other devices)
COMPILER_FLAGS: (same as NVIDIA_GPU)
CCEXFLAGS: -fopenmp
LINKER_FLAGS: -fopenmp (with system lddlflags)
OPTIMIZE: -O3 -march=native
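Before relying on these flags, it helps to confirm that the compiler itself was built with offload support. A hedged check for GCC (the verbose output of offload-capable builds lists targets such as OFFLOAD_TARGET_NAMES=nvptx-none; the exact format varies by distribution):

```perl
use strict;
use warnings;

# Return 1 if gcc's verbose output mentions offload targets, 0 otherwise
# (also 0 when gcc is not installed, since the shell error is captured).
sub gcc_reports_offload {
    my $out = `gcc -v 2>&1`;
    return ( defined($out) && $out =~ /offload/i ) ? 1 : 0;
}

print gcc_reports_offload()
    ? "gcc reports offload targets\n"
    : "no offload targets reported (check your toolchain)\n";
```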
EXAMPLES
Example 1 is a complete working example demonstrating basic GPU memory mapping, computation, and retrieval of results. Example 2 shows how to allocate GPU memory without an initial data copy. Example 3 illustrates combining device management with PDL views for seamless integration with the Perl Data Language. Examples 4 and 5 are shorter snippets covering multi-device management and partial buffer updates.
Example 1: Basic GPU Memory Mapping
This example demonstrates the fundamental pattern of mapping memory to GPU, performing computations, and retrieving results.
use Task::MemManager::Device;
use Config;   # provides $Config::Config{lddlflags} used below
use Inline (
C => Config => ccflags => "-fno-stack-protector -fcf-protection=none "
. " -fopenmp -Iinclude -std=c11 -fPIC "
. " -Wall -Wextra -Wno-unused-function -Wno-unused-variable"
. " -Wno-unused-but-set-variable ",
lddlflags => join( q{ }, $Config::Config{lddlflags}, q{-fopenmp} ),
ccflagsex => " -fopenmp ",
libs => q{ -lm -foffload=-lm },
optimize => "-O3 -march=native",
); # replace with your compiler's OpenMP device offload flags
use Inline C => 'DATA';
my $buffer_length = 250000;
my $buffer = Task::MemManager->new($buffer_length, 4);
# Map buffer to GPU
$buffer->device_movement(action => 'enter', direction => 'to');
# Perform GPU computation
assign_as_float($buffer->get_buffer, $buffer->get_buffer_size);
# Update results back to CPU
$buffer->device_movement(action => 'update', direction => 'from');
# Verify results by printing some values
my @values = unpack("f*", $buffer->extract_buffer_region(0,
$buffer->get_buffer_size - 1));
print "First 10 values: ", join(", ", @values[0..9]), "\n";
print "Last 10 values: ", join(", ", @values[-10..-1]), "\n";
# Exit GPU mapping
$buffer->device_movement(action => 'exit', direction => 'from');
__DATA__
__C__
#include "omp.h"
/* The buffer address is passed from Perl as an unsigned long. */
void assign_as_float(unsigned long arr, size_t n) {
    float *array_addr = (float *)arr;
    size_t len = n / sizeof(float);
    #pragma omp target
    for (size_t i = 0; i < len; i++) {
        array_addr[i] = (float)i * 2.0f;
    }
}
Example 2: GPU Memory Allocation Without Initial Copy
When you want to allocate GPU memory but don't need to copy initial data (e.g., for output-only computations):
# look at Example 1 for the use statements and Inline C setup
my $buffer = Task::MemManager->new(1000000, 4);
# Allocate GPU memory without copying
$buffer->device_movement(action => 'enter', direction => 'alloc');
# Perform GPU computation that generates results
alloc_as_float($buffer->get_buffer, $buffer->get_buffer_size);
# Copy results back to CPU
$buffer->device_movement(action => 'exit', direction => 'from');
__DATA__
__C__
#include "omp.h"
/* Fill device-only memory; results reach the host on exit 'from'. */
void alloc_as_float(unsigned long arr, size_t n) {
    float *array_addr = (float *)arr;
    size_t len = n / sizeof(float);
    #pragma omp target
    for (size_t i = 0; i < len; i++) {
        array_addr[i] = (float)i * 3.0f;
    }
}
Example 3: Working with PDL Views
Combining device management with PDL views for seamless integration with Perl Data Language:
use Task::MemManager
Allocator => 'CMalloc',
View => 'PDL',
Device => 'NVIDIA_GPU';
use Config;   # provides $Config::Config{lddlflags} used below
use Inline (
C => Config => ccflags => "-fno-stack-protector -fcf-protection=none "
. " -fopenmp -Iinclude -std=c11 -fPIC "
. " -Wall -Wextra -Wno-unused-function -Wno-unused-variable"
. " -Wno-unused-but-set-variable ",
lddlflags => join( q{ }, $Config::Config{lddlflags}, q{-fopenmp} ),
ccflagsex => " -fopenmp ",
libs => q{ -lm -foffload=-lm },
optimize => "-O3 -march=native",
); # replace with your compiler's OpenMP device offload flags
use Inline C => 'DATA';
my $buffer_length = 1000;
my $buffer = Task::MemManager->new($buffer_length, 4,
{allocator => 'CMalloc'});
# Create PDL view
my $pdl_view = $buffer->create_view('PDL',
{view_name => 'my_pdl_view', pdl_type => 'float'});
# Initialize with random values in PDL
$pdl_view->inplace->random;
# Clone the view for comparison
my $cloned_view = $buffer->clone_view('my_pdl_view');
# Move to GPU and modify
$buffer->device_movement(action => 'enter', direction => 'to');
mod_as_float($buffer->get_buffer, $buffer->get_buffer_size);
$buffer->device_movement(action => 'exit', direction => 'from');
# PDL view automatically reflects changes
my @values = list $pdl_view;
my @original = list $cloned_view;
# Verify: values should be doubled
for my $i (0 .. $#values) {
die "Mismatch!" unless $values[$i] == $original[$i] * 2.0;
}
__DATA__
__C__
#include "omp.h"
/* Double each float of the mapped buffer in place on the device. */
void mod_as_float(unsigned long arr, size_t n) {
    float *array_addr = (float *)arr;
    size_t len = n / sizeof(float);
    #pragma omp target
    for (size_t i = 0; i < len; i++) {
        array_addr[i] *= 2.0f;
    }
}
Example 4: Multiple Device Management
Managing multiple buffers across different devices (code snippet):
# Create multiple buffers
my $buf1 = Task::MemManager->new(1000, 4);
my $buf2 = Task::MemManager->new(2000, 4);
# Map to different devices (if available)
$buf1->device_movement(
action => 'enter',
direction => 'to',
device_id => 0
);
$buf2->device_movement(
action => 'enter',
direction => 'to',
device_id => 1 # Different device
);
# Perform operations on each device - fictional C level functions
process_on_device($buf1->get_buffer, $buf1->get_buffer_size);
process_on_device($buf2->get_buffer, $buf2->get_buffer_size);
# Retrieve results
$buf1->device_movement(action => 'exit', direction => 'from', device_id => 0);
$buf2->device_movement(action => 'exit', direction => 'from', device_id => 1);
Example 5: Partial Buffer Updates
Update only a portion of the buffer between host and device:
my $buffer = Task::MemManager->new(10000, 4);
$buffer->device_movement(action => 'enter', direction => 'to');
# Update only first 1000 bytes from GPU
$buffer->device_movement(
action => 'update',
direction => 'from',
start => 0,
end => 1000
);
# Later, update another region to GPU
$buffer->device_movement(
action => 'update',
direction => 'to',
start => 1000,
end => 2000
);
$buffer->device_movement(action => 'exit', direction => 'release');
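Since start and end are byte positions, ranges expressed in typed elements must be scaled by the element size. A hypothetical helper (element_range_to_bytes is not part of the module) that maps an inclusive element range to the byte offsets used above:

```perl
use strict;
use warnings;

# Convert an inclusive element range to (start, end) byte positions,
# where end is the byte position just past the last element.
sub element_range_to_bytes {
    my ($first_elem, $last_elem, $elem_size) = @_;
    return ( $first_elem * $elem_size, ( $last_elem + 1 ) * $elem_size );
}

# Floats 250..499 of a buffer with 4-byte elements:
my ($start, $end) = element_range_to_bytes(250, 499, 4);
print "start=$start end=$end\n";   # start=1000 end=2000
```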
AUTOMATIC CLEANUP
The module automatically handles cleanup of device mappings when buffer objects are destroyed. The DESTROY method ensures that:
All device mappings are properly released
Device memory is deallocated
No memory leaks occur on the device
Reference counts are properly maintained
Cleanup uses the exit_release_gpu
operation, which allows the runtime to manage the actual deallocation timing while ensuring proper cleanup.
DIAGNOSTICS
If you set the environment variable DEBUG to a non-zero value, the module emits detailed diagnostic messages describing what went wrong and which operations were attempted.
DEPENDENCIES
The module depends on:
Task::MemManager - Base memory management functionality
Inline::C - For C code integration and compilation
Module::Find - For automatic discovery of device modules
Module::Runtime - For dynamic module loading
An OpenMP-capable compiler (e.g., GCC 9+, Clang 10+) for GPU offloading
For NVIDIA GPU support, you need:
GCC with nvptx offload support, or
Clang with CUDA/NVPTX target support (not tested yet with the relevant version of perl)
LIMITATIONS AND CAVEATS
Cannot map the same buffer to the same device_id multiple times
Cannot manage the same device_id with different device modules
Device module compilation happens at first use (may take time)
Requires OpenMP 4.5+ for target directives
GPU offloading support varies by compiler and installation
AMD GPU support is experimental and may require additional setup
TODO
Ensure that clang and icx compilers work correctly
Ensure AMD GPU offloading works correctly
Add support for additional devices (e.g., Intel GPUs, FPGAs)
Add support for asynchronous data transfers
Implement device-to-device direct transfers
Add support for unified memory management
Provide device property queries (memory available, etc.)
Add support for interfacing to other parallel programming models (e.g., CUDA, HIP) using OpenMP's interoperability features
Implement automatic workload distribution across multiple devices
When the DEBUG environment variable is set, the diagnostic output covers:
Device module loading and registration
Function registration for each device
Buffer mapping operations (enter/exit/update)
Device ID management
Buffer lifecycle events
SEE ALSO
Task::MemManager - Base memory management module
Task::MemManager::View - Memory view management
Inline::C - Inline C code in Perl
OpenMP Specification - OpenMP target directives
GCC Offloading - GCC offloading setup
AUTHOR
Christos Argyropoulos, <chrisarg at cpan.org>
The initial documentation was created by Claude Sonnet 4.5, using the human-generated test files for the module and the documentation in the MemManager distribution as context.
COPYRIGHT AND LICENSE
This software is copyright (c) 2025 by Christos Argyropoulos.
This is free software; you can redistribute it and/or modify it under the MIT license. The full text of the license can be found in the LICENSE file. See https://en.wikipedia.org/wiki/MIT_License for more information.