Prerequisites
Before installing HAMi, make sure the following tools and dependencies are properly installed in your environment:
- NVIDIA drivers >= 440
- nvidia-docker version > 2.0
- default runtime configured as nvidia for containerd/docker/cri-o container runtime
- Kubernetes version >= 1.18
- glibc >= 2.17 and < 2.30
- kernel version >= 3.10
- helm > 3.0
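The node-local requirements above can be spot-checked with a short script. This is a minimal sketch, not part of HAMi: the helpers `version_ge` and `report` are hypothetical names introduced here, the comparison relies on GNU `sort -V`, and any tool that is missing is reported as "not detected" rather than treated as a failure.

```shell
#!/bin/sh
# version_ge A B: true when version A >= version B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# report NAME FOUND MINIMUM: print whether FOUND satisfies MINIMUM.
report() {
  if [ -z "$2" ]; then
    echo "$1: not detected (check manually)"
  elif version_ge "$2" "$3"; then
    echo "$1: $2 (ok, need >= $3)"
  else
    echo "$1: $2 (too old, need >= $3)"
  fi
}

# Collect versions; each value stays empty if the tool is unavailable.
driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)
glibc=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')
kernel=$(uname -r | cut -d- -f1)

report "NVIDIA driver" "$driver" 440
report "glibc" "$glibc" 2.17
report "kernel" "$kernel" 3.10

# glibc also has an upper bound in the list above (< 2.30).
if [ -n "$glibc" ] && version_ge "$glibc" 2.30; then
  echo "glibc: $glibc is not below 2.30"
fi
```

Run it on each GPU node. The Kubernetes, Helm, and nvidia-docker versions are not covered here and still need to be checked separately.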
Preparing your GPU Nodes
Execute the following steps on all your GPU nodes.
This README assumes that the NVIDIA drivers and the nvidia-container-toolkit are already installed, and that nvidia-container-runtime is configured as the default low-level runtime.
For details, see Installing the NVIDIA Container Toolkit.
Example for Debian-based systems with Docker and containerd
Install the nvidia-container-toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
Configure Docker
When running Kubernetes with Docker, use the nvidia-ctk tool to automatically configure Docker:
sudo nvidia-ctk runtime configure --runtime=docker
And then restart Docker:
sudo systemctl daemon-reload && sudo systemctl restart docker
Configure containerd
When running Kubernetes with containerd, use the nvidia-ctk tool to automatically configure containerd:
sudo nvidia-ctk runtime configure --runtime=containerd
And then restart containerd:
sudo systemctl daemon-reload && sudo systemctl restart containerd
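Since the prerequisites call for nvidia as the default runtime, it can help to confirm the setting after the restart. The following is a rough sketch under stated assumptions: `is_nvidia_default` is a hypothetical helper, the Docker check reads the `DefaultRuntime` field of `docker info`, and the containerd check assumes the default runtime appears under the `default_runtime_name` key of the dumped configuration, which may differ between containerd versions.

```shell
#!/bin/sh
# is_nvidia_default VALUE: true when the reported default runtime is "nvidia".
is_nvidia_default() {
  [ "$1" = "nvidia" ]
}

# Docker: `docker info` exposes the default runtime via a Go template.
if command -v docker >/dev/null 2>&1; then
  runtime=$(docker info --format '{{.DefaultRuntime}}' 2>/dev/null)
  if is_nvidia_default "$runtime"; then
    echo "docker: default runtime is nvidia"
  else
    echo "docker: default runtime is '${runtime:-unknown}', expected nvidia"
  fi
fi

# containerd: print the default_runtime_name line from the merged config.
# The key name is an assumption about the CRI plugin config layout.
if command -v containerd >/dev/null 2>&1; then
  containerd config dump 2>/dev/null | grep default_runtime_name \
    || echo "containerd: default_runtime_name not found in config dump"
fi
```

If the reported runtime is not nvidia, review the runtime configuration file that nvidia-ctk modified before continuing.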
Label your nodes
Label your GPU nodes for scheduling with HAMi by adding the label "gpu=on". Without this label, the nodes cannot be managed by the HAMi scheduler. In the command below, replace {nodeid} with the name of the node.
kubectl label nodes {nodeid} gpu=on
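When several nodes need the label, the command can be wrapped in a small loop. This is a sketch only: `label_gpu_nodes` and the `KUBECTL` variable are hypothetical names introduced here (the variable makes it possible to substitute another binary or do a dry run), and the node names in the usage comment are placeholders.

```shell
#!/bin/sh
# KUBECTL can be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL=${KUBECTL:-kubectl}

# label_gpu_nodes NODE...: apply gpu=on to every node passed as an argument.
# --overwrite keeps the command from failing if the label already exists.
label_gpu_nodes() {
  for node in "$@"; do
    "$KUBECTL" label nodes "$node" gpu=on --overwrite
  done
}

# Example usage (hypothetical node names):
#   label_gpu_nodes gpu-node-1 gpu-node-2
# List the nodes that now carry the label:
#   kubectl get nodes -l gpu=on
```

Listing the nodes with the `-l gpu=on` selector afterwards is a quick way to confirm that every GPU node was labeled.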