NAME

Rex::Rancher - Rancher Kubernetes (RKE2/K3s) deployment automation for Rex

VERSION

version 0.001

SYNOPSIS

use Rex -feature => ['1.4'];
use Rex::Rancher;

# Deploy RKE2 control plane (no GPU)
task "deploy_server", sub {
  rancher_deploy_server(
    distribution    => 'rke2',
    hostname        => 'cp-01',
    domain          => 'k8s.example.com',
    token           => 'my-secret',
    tls_san         => 'k8s.example.com',
    kubeconfig_file => "$ENV{HOME}/.kube/mycluster.yaml",
  );
};

# Deploy RKE2 control plane with GPU support
task "deploy_gpu_server", sub {
  rancher_deploy_server(
    distribution    => 'rke2',
    gpu             => 1,    # requires Rex::GPU installed
    reboot          => 1,    # reboot after driver install (first deploy)
    hostname        => 'gpu-cp-01',
    domain          => 'k8s.example.com',
    token           => 'my-secret',
    tls_san         => 'gpu-cp-01.k8s.example.com',
    kubeconfig_file => "$ENV{HOME}/.kube/gpu-cluster.yaml",
  );
};

# Deploy K3s worker with GPU support
task "deploy_gpu_worker", sub {
  rancher_deploy_agent(
    distribution => 'k3s',
    gpu          => 1,    # requires Rex::GPU installed
    hostname     => 'gpu-01',
    domain       => 'k8s.example.com',
    server       => 'https://10.0.0.1:6443',
    token        => 'K10...',
  );
};

# Deploy a single-node cluster (control plane + workloads on same node)
task "deploy_single_node", sub {
  rancher_deploy_server(
    distribution    => 'rke2',
    token           => 'my-secret',
    tls_san         => '10.0.0.1',
    kubeconfig_file => "$ENV{HOME}/.kube/single.yaml",
  );
  # Remove control-plane taint so workloads can be scheduled
  untaint_node(kubeconfig => "$ENV{HOME}/.kube/single.yaml");
};

DESCRIPTION

Rex::Rancher provides complete, zero-touch Kubernetes cluster deployment for Rancher distributions (RKE2 and K3s) using the Rex orchestration framework. It handles everything from raw Linux node preparation through to a running CNI and GPU device plugin.

GPU support is optional. To enable it, pass gpu => 1 and install Rex::GPU separately; without it, Rex::Rancher deploys non-GPU nodes in exactly the same way, simply skipping the GPU steps.

When deploying a GPU server node, the full pipeline runs automatically:

1. Node preparation — hostname, timezone, locale, NTP, swap off, kernel modules (br_netfilter, overlay), sysctl for Kubernetes networking.
2. GPU setup (gpu => 1) — NVIDIA driver via DKMS, optional reboot, Container Toolkit, CDI specs, containerd runtime config. Handled by Rex::GPU.
3. Cluster bring-up — write config, run RKE2 or K3s install script, wait for kubeconfig file on the remote host, fetch and save it locally, wait for API server readiness via Kubernetes::REST.
4. Cilium CNI — Cilium CLI installed on the remote host, Cilium deployed with distribution-appropriate Helm values.
5. NVIDIA device plugin (gpu => 1 + kubeconfig_file) — DaemonSet applied via the Kubernetes API, wait for nvidia.com/gpu capacity on the node. No kubectl required anywhere.

All Kubernetes API operations (steps 3 and 5) run locally on the machine executing Rex using Kubernetes::REST and IO::K8s. No kubectl binary is needed on the remote host.

This distribution supports hosts without an SFTP subsystem (common on Hetzner dedicated servers). Use set connection => "LibSSH" and install Rex::LibSSH.
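
A minimal Rexfile fragment enabling the LibSSH backend (the address and authentication details are placeholders):

```perl
use Rex -feature => ['1.4'];
use Rex::Rancher;

# Hosts without an SFTP subsystem need the LibSSH connection backend;
# this requires the separate Rex::LibSSH distribution.
set connection => "LibSSH";

user "root";
key_auth;    # authenticate with the default SSH key

group servers => "203.0.113.10";    # placeholder address
```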

For fine-grained control, use the individual modules directly:

Rex::Rancher::Node — Node preparation
Rex::Rancher::Server — Control plane installation and config retrieval
Rex::Rancher::Agent — Worker node installation
Rex::Rancher::Cilium — Cilium CNI installation and upgrade
Rex::Rancher::K8s — Kubernetes API operations (device plugin, readiness, untaint)
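
The same stages can be driven by hand. A hypothetical sketch, assuming the stage functions exported by these modules accept options mirroring the combined call (check each module's documentation for the authoritative names):

```perl
use Rex -feature => ['1.4'];
use Rex::Rancher::Node;
use Rex::Rancher::Server;
use Rex::Rancher::Cilium;

# Illustrative stepwise deployment instead of rancher_deploy_server().
task "deploy_stepwise", sub {
  prepare_node(hostname => 'cp-01', timezone => 'UTC');
  install_server(distribution => 'rke2', token => 'my-secret');
  install_cilium();
};
```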

rancher_deploy_server(%opts)

Full control plane deployment in a single call: prepare the node, optionally set up GPU support, install the Kubernetes distribution, wait for the API, install Cilium CNI, and deploy the NVIDIA device plugin.

When gpu => 1 is passed and Rex::GPU is installed, GPU detection and driver installation are performed automatically as step 2 before the cluster is brought up. After Cilium is running, the NVIDIA device plugin DaemonSet is deployed via the local Kubernetes API (no kubectl required on the remote host) and the function waits for nvidia.com/gpu resources to appear on the node.

The full pipeline for a GPU server deployment:

1. prepare_node — hostname, timezone, swap off, kernel modules, sysctl
2. gpu_setup (only with gpu => 1) — driver + toolkit + CDI + containerd config
3. install_server — write config, run installer, wait for kubeconfig file
4. Fetch kubeconfig locally, patch 127.0.0.1 to the real server address, save to kubeconfig_file, wait for API with "wait_for_api" in Rex::Rancher::K8s
5. install_cilium — install Cilium CLI on remote, apply via cilium install
6. deploy_nvidia_device_plugin (only with gpu => 1 and kubeconfig_file)

Options:

distribution

Kubernetes distribution to install. rke2 (default) or k3s.

gpu

If true, detect GPUs and run the full GPU setup pipeline via Rex::GPU before installing the Kubernetes distribution. Requires Rex::GPU to be installed. Default: 0.

reboot

If true, reboot the host after GPU driver installation and wait for it to come back before proceeding. Only meaningful with gpu => 1. Required on first deploy when nouveau was previously loaded. Default: 0.

hostname

Short hostname to set on the node (optional). If omitted, the existing hostname is left unchanged.

domain

Domain suffix for the FQDN (optional). Used together with hostname to set /etc/hosts. If hostname is given without domain, hostname is still set but no hosts entry is written.

timezone

Timezone string, e.g. Europe/Berlin. Default: UTC.

token

Shared cluster secret used for node joining. Auto-generated if omitted.

tls_san

Additional TLS Subject Alternative Names for the API server certificate. Accepts a string (single SAN or comma-separated list) or an arrayref. The first SAN is used as the server address when patching the kubeconfig (see kubeconfig_file below).

kubeconfig_file

Local file path where the cluster kubeconfig is saved once the server is running. Optional: if omitted, no local kubeconfig is saved, and device plugin deployment is skipped even when gpu => 1. It must be set for the NVIDIA device plugin step to run.

RKE2 and K3s write https://127.0.0.1 into the kubeconfig. The first tls_san entry (or kubeconfig_server if provided) is substituted for 127.0.0.1 so the saved file connects to the real server address.
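
The substitution itself is a one-line rewrite of the fetched file; a standalone sketch of the idea (illustrative only, the real patching lives inside this distribution):

```perl
# Sketch of the address patch Rex::Rancher applies to the fetched
# kubeconfig: swap the loopback address written by RKE2/K3s for the
# real server name (first tls_san entry or kubeconfig_server).
sub patch_kubeconfig {
    my ($yaml, $server) = @_;
    $yaml =~ s{https://127\.0\.0\.1:}{https://$server:}g;
    return $yaml;
}

my $fetched = "server: https://127.0.0.1:6443\n";
print patch_kubeconfig($fetched, 'k8s.example.com');
# server: https://k8s.example.com:6443
```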

kubeconfig_server

Explicit server address to use when patching the kubeconfig. Overrides the tls_san-based default.

node_labels

Node labels to apply, as an arrayref of key=value strings.

registries

Private registry mirror configuration hashref, written to registries.yaml. See "install_server" in Rex::Rancher::Server for the structure.
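
A hypothetical registries hashref, following the registries.yaml schema used by RKE2 and K3s (mirrors plus per-registry configs); see Rex::Rancher::Server for the exact keys this module accepts:

```perl
rancher_deploy_server(
  distribution => 'rke2',
  token        => 'my-secret',
  registries   => {
    # Pull docker.io images through a private mirror.
    mirrors => {
      'docker.io' => { endpoint => ['https://mirror.example.com'] },
    },
    # Credentials for the mirror itself (placeholders).
    configs => {
      'mirror.example.com' => {
        auth => { username => 'pull-user', password => 'secret' },
      },
    },
  },
);
```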

cilium

Whether to install Cilium as the CNI. Default: 1. Set to 0 to keep the distribution's built-in CNI (Canal for RKE2, Flannel for K3s).

rancher_deploy_agent(%opts)

Full worker node deployment: prepare the node, optionally set up GPU support, install the Kubernetes agent, and join the existing cluster.

The pipeline is shorter than "rancher_deploy_server" — there is no Cilium installation or kubeconfig retrieval. GPU support via gpu => 1 works identically to the server case.

Options: same as "rancher_deploy_server" plus:

server

URL of the server to join. For RKE2: https://SERVER_IP:9345. For K3s: https://SERVER_IP:6443. Required.

token

Node join token. Obtain from the server with "get_token" in Rex::Rancher::Server. Required.

node_name

Override the node name registered in Kubernetes (optional).
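
Joining a worker typically pairs get_token from Rex::Rancher::Server (run against the control plane) with rancher_deploy_agent on the new node. A sketch, assuming two host groups named servers and workers:

```perl
use Rex -feature => ['1.4'];
use Rex::Rancher;
use Rex::Rancher::Server;

my $token;

# Run against the control plane first to read the join token.
task "fetch_token", group => "servers", sub {
  $token = get_token();
};

# Then run against the new worker to join it to the cluster.
task "deploy_worker", group => "workers", sub {
  rancher_deploy_agent(
    distribution => 'rke2',
    server       => 'https://10.0.0.1:9345',    # RKE2 joins on port 9345
    token        => $token,
  );
};
```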

SEE ALSO

Rex, Rex::LibSSH, Rex::GPU, Rex::Rancher::K8s, Kubernetes::REST, IO::K8s

SUPPORT

Issues

Please report bugs and feature requests on GitHub at https://github.com/Getty/rex-rancher/issues.

CONTRIBUTING

Contributions are welcome! Please fork the repository and submit a pull request.

AUTHOR

Torsten Raudssus <getty@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2026 by Torsten Raudssus <torsten@raudssus.de> https://raudssus.de/.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.