DESCRIPTION

Interface to the runtime CUDA kernel compilation module.

Constructor

MXRtc object in MXNet.
This class allows you to write CUDA kernels in Perl
and call them with NDArrays.

Parameters
----------
name : str
    name of the kernel
inputs : list of (str, mxnet.ndarray) tuples
    list of input names and their ndarrays
outputs : list of (str, mxnet.ndarray) tuples
    list of output names and their ndarrays
kernel : str
    the kernel code.
    Supply only the body of the kernel, i.e. the code
    after { and before }. Rtc generates the function signature
    and the shape constants around it.
    For example, if name = "mykernel" and
    inputs = [('x', mx.nd.zeros((10,)))]
    outputs = [('y', mx.nd.zeros((10,)))]
    kernel = "y[threadIdx.x] = x[threadIdx.x];",
    the kernel that is compiled will be:
    extern "C" __global__ void mykernel(float *x, float *y) {
        const int x_ndim = 1;
        const int x_dims[] = { 10 };
        const int y_ndim = 1;
        const int y_dims[] = { 10 };

        y[threadIdx.x] = x[threadIdx.x];
    }
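
To make the decoration step concrete, here is a minimal Python sketch that
builds the same kind of source text from a name, shapes, and a kernel body.
The helper decorate_kernel is hypothetical and written only for illustration;
it is not part of MXNet and may differ from what Rtc actually emits. It takes
plain shape tuples instead of ndarrays, assumes every argument is a float
pointer, and writes "void" and "_dims[]" so that the result is valid CUDA.

```python
def decorate_kernel(name, inputs, outputs, body):
    """Wrap a kernel body in a signature plus per-array shape constants.

    inputs / outputs: lists of (name, shape) pairs, shape being a tuple of ints.
    Hypothetical helper for illustration; not MXNet's implementation.
    """
    arrays = list(inputs) + list(outputs)
    # Assumption: every array is passed as a float pointer, inputs first.
    args = ", ".join("float *%s" % arr_name for arr_name, _ in arrays)
    lines = ['extern "C" __global__ void %s(%s) {' % (name, args)]
    for arr_name, shape in arrays:
        dims = ", ".join(str(d) for d in shape)
        lines.append("    const int %s_ndim = %d;" % (arr_name, len(shape)))
        lines.append("    const int %s_dims[] = { %s };" % (arr_name, dims))
    lines.append("")
    lines.append("    " + body)
    lines.append("}")
    return "\n".join(lines)


source = decorate_kernel(
    "mykernel",
    [("x", (10,))],
    [("y", (10,))],
    "y[threadIdx.x] = x[threadIdx.x];",
)
print(source)
```

Because the shapes are baked into the source as constants, the body can refer
to x_ndim, x_dims, y_ndim, and y_dims without receiving them as arguments.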

push

Run the kernel.

Parameters
----------
inputs : list of ndarray
    list of inputs. These can be different ndarrays than those passed to
    the constructor, but must have the same shapes and be in the same order.
outputs : list of ndarray
    list of outputs. These can be different ndarrays than those passed to
    the constructor, but must have the same shapes and be in the same order.
grid_dims : tuple of 3 uint
    grid dimensions (x, y, z) for the kernel launch
block_dims : tuple of 3 uint
    block dimensions (x, y, z) for the kernel launch
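
As a rough guide to choosing grid_dims and block_dims, the Python sketch below
computes a one-dimensional launch configuration that covers n elements. The
helper names launch_dims and total_threads are hypothetical and not part of
MXNet; the default of 256 threads per block is only an assumed starting point.

```python
def launch_dims(n, threads_per_block=256):
    """Return (grid_dims, block_dims) for a 1-D launch covering n elements.

    Hypothetical helper for illustration; not part of MXNet.
    """
    if n <= 0 or threads_per_block <= 0:
        raise ValueError("n and threads_per_block must be positive")
    # Round up so the last, possibly partial, block is still launched.
    blocks = (n + threads_per_block - 1) // threads_per_block
    return (blocks, 1, 1), (threads_per_block, 1, 1)


def total_threads(grid_dims, block_dims):
    """Number of threads a launch starts: product of all six dimensions."""
    gx, gy, gz = grid_dims
    bx, by, bz = block_dims
    return gx * gy * gz * bx * by * bz


# The 10-element example kernel indexes with threadIdx.x only,
# so a single block of 10 threads covers it.
small_grid, small_block = launch_dims(10, threads_per_block=10)

# A larger array needs several blocks; some threads fall past the end.
big_grid, big_block = launch_dims(1000)
print(small_grid, small_block, big_grid, big_block)
```

When more than one block is launched, the kernel body would need to combine
blockIdx.x, blockDim.x, and threadIdx.x into a global index and skip indices
beyond the array length, since the rounded-up launch starts more threads than
there are elements.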