NAME

Rex::Rancher::K8s - Kubernetes API operations for Rex::Rancher (device plugin, readiness)

VERSION

version 0.001

SYNOPSIS

use Rex::Rancher::K8s;

# Wait for API to become available after cluster installation
wait_for_api(kubeconfig => "$ENV{HOME}/.kube/mycluster.yaml");

# Deploy NVIDIA device plugin and wait for gpu resource to appear
deploy_nvidia_device_plugin(
  kubeconfig => "$ENV{HOME}/.kube/mycluster.yaml",
  version    => 'v0.17.0',
);

# Remove control-plane taints on a single-node cluster
untaint_node(kubeconfig => "$ENV{HOME}/.kube/single.yaml");

DESCRIPTION

Rex::Rancher::K8s provides Kubernetes API operations for Rex::Rancher using Kubernetes::REST and IO::K8s. All three public functions run entirely on the local machine against the cluster's HTTP API — no kubectl binary is required anywhere, and no SSH connection to the cluster nodes is needed for these operations.

The module is used internally by "rancher_deploy_server" in Rex::Rancher to:

1. Wait for the API server to respond after install_server returns
2. Deploy the NVIDIA device plugin when gpu => 1

It can also be used standalone for post-deploy operations such as removing control-plane taints on single-node clusters.
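
Put together, a standalone post-deploy sequence for a single-node GPU cluster might look like this (the kubeconfig path is illustrative; all three functions are documented below):

  use Rex::Rancher::K8s;

  my $kubeconfig = "$ENV{HOME}/.kube/mycluster.yaml";   # illustrative path

  # 1. Block until the API server answers (up to 5 minutes)
  wait_for_api(kubeconfig => $kubeconfig)
    or die "Kubernetes API did not come up in time\n";

  # 2. Single-node cluster: allow workloads on the control plane
  untaint_node(kubeconfig => $kubeconfig);

  # 3. GPU node: deploy the device plugin and wait for capacity
  deploy_nvidia_device_plugin(kubeconfig => $kubeconfig);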

No kubectl required

All Kubernetes API calls are made via Kubernetes::REST, which implements the Kubernetes REST API client in pure Perl using IO::K8s for object serialization. The kubeconfig file is parsed by Kubernetes::REST::Kubeconfig to extract the cluster address and credentials.

wait_for_api(%opts)

Wait for the Kubernetes API server to become reachable by polling list(Node) via Kubernetes::REST. Runs from the local machine — no SSH connection to the cluster is needed.

Polls up to 60 times with a 5-second delay between attempts (5-minute total timeout). Returns 1 as soon as the API responds, or 0 if it does not respond within the timeout.
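
The retry behaviour amounts to a plain polling loop. In the sketch below, api_answers() is a hypothetical stand-in for the internal list(Node) request, not part of the module's API:

  # Sketch of the polling strategy, not the module's actual code.
  sub wait_for_api_sketch {
      my $max_attempts = 60;
      my $delay        = 5;    # seconds; 60 * 5s = 5-minute ceiling

      for my $attempt (1 .. $max_attempts) {
          return 1 if api_answers();    # hypothetical list(Node) call
          sleep $delay unless $attempt == $max_attempts;
      }
      return 0;    # API never responded within the timeout
  }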

Required options:

kubeconfig

Absolute path to the kubeconfig file saved locally. This file must have the real server address (not 127.0.0.1) — "rancher_deploy_server" in Rex::Rancher patches the address automatically.

wait_for_api(kubeconfig => "$ENV{HOME}/.kube/mycluster.yaml");

deploy_nvidia_device_plugin(%opts)

Deploy the NVIDIA Kubernetes device plugin DaemonSet to kube-system and wait for nvidia.com/gpu capacity to appear on at least one node.

All operations run locally via Kubernetes::REST — no kubectl or SSH to the cluster is needed.

The DaemonSet is created with:

  • runtimeClassName: nvidia — uses the NVIDIA container runtime (registered by "configure_containerd" in Rex::GPU::NVIDIA) to enumerate devices.

  • priorityClassName: system-node-critical — ensures the plugin pod is scheduled even under resource pressure.

  • A nvidia.com/gpu:NoSchedule toleration — allows the pod to run on nodes that still have the GPU taint.
  • An nvidia.com/gpu:NoSchedule toleration — allows the pod to run on nodes that still have the GPU taint.

  • FAIL_ON_INIT_ERROR=false — the plugin starts even if CDI or driver initialisation fails, reporting partial GPU availability rather than crash-looping.
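
In kubectl-style YAML, the four settings above correspond to a pod template along these lines. This is an abridged illustration: the DaemonSet metadata and selectors are omitted, and the image path follows the upstream device-plugin manifest rather than being taken from this module's source:

  spec:
    runtimeClassName: nvidia
    priorityClassName: system-node-critical
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
    containers:
      - name: nvidia-device-plugin-ctr
        image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0
        env:
          - name: FAIL_ON_INIT_ERROR
            value: "false"

The module builds the equivalent structure as IO::K8s objects rather than YAML.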

If the DaemonSet already exists it is updated (resourceVersion is fetched from the cluster before the update to satisfy the optimistic concurrency requirement).
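
The create-or-update flow follows the usual Kubernetes optimistic-concurrency pattern. In this sketch, get_daemonset, create_daemonset, and update_daemonset are hypothetical placeholders for the underlying Kubernetes::REST calls, and the DaemonSet name is illustrative:

  # Sketch only: helper names are placeholders, not the module's API.
  my $existing = eval { get_daemonset('kube-system', 'nvidia-device-plugin') };

  if ($existing) {
      # Updates must carry the current resourceVersion, or the API
      # server rejects them with a 409 Conflict.
      $new_ds->{metadata}{resourceVersion}
          = $existing->{metadata}{resourceVersion};
      update_daemonset('kube-system', $new_ds);
  }
  else {
      create_daemonset('kube-system', $new_ds);
  }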

After applying, polls up to 24 times (2-minute timeout) for nvidia.com/gpu capacity to appear in any node's status.capacity.
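
Detecting the capacity is a matter of scanning each node's status.capacity map on every attempt. Here list_nodes() is a hypothetical stand-in for the list(Node) call:

  # Sketch only: list_nodes() is a placeholder for the REST call.
  sub wait_for_gpu_capacity_sketch {
      for my $try (1 .. 24) {    # 24 * 5s = 2-minute ceiling
          for my $node (@{ list_nodes() }) {
              my $cap = $node->{status}{capacity} // {};
              return 1 if exists $cap->{'nvidia.com/gpu'};
          }
          sleep 5;
      }
      return 0;    # no GPU capacity appeared within the timeout
  }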

Required options:

kubeconfig

Local path to the cluster kubeconfig.

Optional options:

version

NVIDIA device plugin image tag. Default: v0.17.0.

deploy_nvidia_device_plugin(kubeconfig => "$ENV{HOME}/.kube/mycluster.yaml");

untaint_node(%opts)

Remove node-role.kubernetes.io/control-plane:NoSchedule and node-role.kubernetes.io/master:NoSchedule taints from all nodes in the cluster.

Kubernetes adds these taints to control-plane nodes to prevent general workloads from being scheduled there. On single-node clusters (where the control plane is also the only worker) these taints must be removed so that pods can run.

Each node is fetched fresh before patching (to get the current resourceVersion for optimistic concurrency), and only patched if it actually has one of the taints. Nodes that are already untainted are skipped silently.
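
The filtering step is plain list manipulation: keep every taint that is not one of the two control-plane taints, and patch only if the list actually shrank. A minimal sketch of that logic (pure Perl, no API calls; $node stands for a Node hash freshly fetched from the cluster):

  my %remove = map { $_ => 1 } qw(
      node-role.kubernetes.io/control-plane
      node-role.kubernetes.io/master
  );

  my @taints = @{ $node->{spec}{taints} // [] };
  my @kept   = grep {
      !( $remove{ $_->{key} } && $_->{effect} eq 'NoSchedule' )
  } @taints;

  if (@kept != @taints) {
      # Node had at least one of the taints: write back the reduced
      # list (with the current resourceVersion), keeping any
      # unrelated taints intact.
      $node->{spec}{taints} = \@kept;
      # ... update the node via Kubernetes::REST ...
  }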

All operations run locally via Kubernetes::REST.

Required options:

kubeconfig

Local path to the cluster kubeconfig.

untaint_node(kubeconfig => "$ENV{HOME}/.kube/single-node.yaml");

SEE ALSO

Rex::Rancher, Rex::GPU, Kubernetes::REST, IO::K8s

SUPPORT

Issues

Please report bugs and feature requests on GitHub at https://github.com/Getty/rex-rancher/issues.

CONTRIBUTING

Contributions are welcome! Please fork the repository and submit a pull request.

AUTHOR

Torsten Raudssus <getty@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2026 by Torsten Raudssus <torsten@raudssus.de> https://raudssus.de/.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.