Lab 5: Fake-GPU Scheduling with nvml-mock

Intermediate
Duration: about 40 minutes
Environment: Linux/macOS with kind · no real GPU required
Cost: free
Verified: 2026-06-13
By: @maishivamhoo123

This lab uses NVIDIA's nvml-mock library to simulate a high-end GPU node — 8 fake A100 GPUs — inside a local kind cluster. You will build HAMi directly from the main branch, then verify GPU scheduling features: sharing, memory/core limits, percentage-based memory requests, and multi-GPU allocation — all without physical hardware.

What You'll Get

After completing this lab, you will have a local Kubernetes cluster with:

nvml-mock making the node report 8 fake A100 GPUs (nvidia.com/gpu: 80 after HAMi slices each physical GPU into 10 virtual slots)
HAMi device-plugin and scheduler running from an image built from the current main branch
Pods verified for: single GPU, GPU sharing, memory/core limits, percentage-based memory, and multi-GPU allocation

note

No real CUDA runtime exists in this environment. Pods use busybox with CUDA_DISABLE_CONTROL=true to prevent HAMi's control library from attempting real device access. Runtime enforcement of memory and core limits still requires physical GPUs.

Installation Overview

The lab consists of 10 steps:

Fake-GPU Scheduling Installation Overview

Step	Purpose	What It Solves
Create kind Cluster	Bootstrap local Kubernetes	Provide test environment
Build & Deploy nvml-mock	Simulate 8 fake A100 GPUs	Enable GPU discovery without hardware
Build HAMi from main	Compile latest scheduler & device-plugin	Ensure MIG fix is included
Deploy HAMi	Install control plane components	Enable GPU partitioning and scheduling
Verify GPU Resources	Check `nvidia.com/gpu: 80`	Confirm virtual GPU slots registered
Basic GPU Scheduling	Single-GPU Pod allocation	Verify basic scheduler functionality
GPU Sharing	Run 4 GPU Pods concurrently	Test concurrent GPU access
Memory & Core Limits	Enforce `gpumem` and `gpucores`	Validate resource constraints
Percentage Memory	Request 30% GPU memory	Test percentage-based allocation
Multi-GPU Allocation	Single Pod with 2 GPUs	Verify multi-GPU binding

Prerequisites

macOS (Intel or Apple Silicon)

Docker Desktop or OrbStack installed and running
Homebrew available

Install prerequisites:

brew install kind kubectl helm git go

Verify versions:

kind version                     # 0.20+
kubectl version --client         # 1.31+ (the --short flag was removed in kubectl 1.28)
helm version                     # 3.x
go version                       # 1.21+

Linux (Ubuntu 20.04 LTS or later, x86_64)

Docker Engine installed and running

Install prerequisites:

# kind
KIND_VERSION=v0.23.0
curl -Lo ./kind "https://kind.sigs.k8s.io/dl/${KIND_VERSION}/kind-linux-amd64"
chmod +x ./kind && sudo mv ./kind /usr/local/bin/kind

# kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl && rm kubectl

# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Go
GO_VERSION=1.24.0
curl -LO "https://go.dev/dl/go${GO_VERSION}.linux-amd64.tar.gz"
sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go${GO_VERSION}.linux-amd64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc && source ~/.bashrc

Verify versions:

kind version                     # 0.20+
kubectl version --client         # 1.31+ (the --short flag was removed in kubectl 1.28)
helm version                     # 3.x
go version                       # 1.21+

tip

Windows users: use WSL2 with Ubuntu and follow the Linux (Ubuntu) instructions above.
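The version comments above are minimums. The following is a minimal sketch for checking them in a script. It assumes a `sort` that supports `-V` version ordering, as GNU coreutils `sort` does. `version_ge` is a hypothetical helper, not part of any of these tools.

```shell
#!/bin/sh
# version_ge ACTUAL MINIMUM (hypothetical helper): succeeds when ACTUAL >= MINIMUM.
# A leading "v" (as in "v0.23.0") is ignored on both arguments.
# The body runs in a subshell so its variables do not leak into the caller.
version_ge() (
  actual=${1#v}
  minimum=${2#v}
  # sort -V compares each dotted component numerically. If MINIMUM sorts first
  # (or the two are equal), ACTUAL is new enough.
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
)

# Example checks (run manually; they call the real tools):
#   version_ge "$(go env GOVERSION | sed 's/^go//')" 1.21 && echo "go OK"
#   version_ge "$(kind version | awk '{print $2}')" 0.20 && echo "kind OK"
```

A plain string comparison would wrongly treat 1.9 as newer than 1.21. Version sort compares each component as a number, which avoids that problem.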

Step 1: Create the kind Cluster

kind create cluster --name nvml-mock-test

Set the NODE_NAME variable once — all subsequent commands use it:

NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
echo "NODE_NAME=${NODE_NAME}"

Example output:

NODE_NAME=nvml-mock-test-control-plane

Step 2: Build and Deploy nvml-mock

nvml-mock provides a fake libnvidia-ml.so, virtual /dev/nvidia* device nodes, and PCI topology entries so HAMi's device-plugin sees 8 A100 GPUs on the node.

2.1 Clone and Build

git clone https://github.com/NVIDIA/k8s-test-infra.git
cd k8s-test-infra
docker build -t nvml-mock:local -f deployments/nvml-mock/Dockerfile .

note

The first build downloads base layers and may take 5–10 minutes. Subsequent builds use Docker layer cache.

2.2 Load into kind

kind load docker-image nvml-mock:local --name nvml-mock-test

2.3 Install via Helm

helm install nvml-mock oci://ghcr.io/nvidia/k8s-test-infra/chart/nvml-mock \
  --set image.repository=nvml-mock \
  --set image.tag=local \
  --wait --timeout 120s

The chart configures an A100 profile by default: 8 GPUs per node, driver version 550.163.01, fake driver root at /var/lib/nvml-mock/driver. This driver root path is passed to HAMi in Step 4.

2.4 Verify GPU Discovery

kubectl get node ${NODE_NAME} \
  -o custom-columns=NAME:.metadata.name,GPU_PRESENT:.metadata.labels.nvidia\\.com/gpu\\.present

Expected output:

NAME                             GPU_PRESENT
nvml-mock-test-control-plane     true

Step 3: Build HAMi from the `main` Branch

The main branch contains a fix preventing nvidia-mig-parted from being called when MIG is not enabled. Building from source ensures the fix is present without waiting for a tagged release.

3.1 Clone and Initialize Submodules

cd ~
git clone https://github.com/Project-HAMi/HAMi.git
cd HAMi
git submodule update --init --recursive

3.2 Build the Docker Image

docker build -t hami:local -f docker/Dockerfile .

note

HAMi uses a three-stage Dockerfile: a Go build stage, a CUDA library build stage, and a final runtime stage. The first build takes several minutes as it pulls the CUDA base images; subsequent runs use the layer cache.

3.3 Load into kind

kind load docker-image hami:local --name nvml-mock-test

Both the scheduler and device-plugin binaries are packaged into the single hami:local image.

Step 4: Deploy HAMi

4.1 Install via Helm

helm install hami ./charts/hami \
  -n kube-system \
  --set devicePlugin.image.repository=hami \
  --set devicePlugin.image.tag=local \
  --set scheduler.image.repository=hami \
  --set scheduler.image.tag=local \
  --set devicePlugin.nvidiaDriverRoot=/var/lib/nvml-mock/driver \
  --set scheduler.kubeScheduler.imageTag=v1.35.0

devicePlugin.nvidiaDriverRoot points HAMi at the fake driver libraries installed by nvml-mock.

4.2 Label the Node

warning

Required before the device-plugin can start: the HAMi device-plugin DaemonSet uses the node selector gpu=on. Without this label, DESIRED stays at 0, no device-plugin Pod is scheduled, and no GPUs are registered.

kubectl label node ${NODE_NAME} gpu=on

Confirm the DaemonSet now schedules a Pod:

kubectl -n kube-system get daemonset hami-device-plugin

Expected output:

NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
hami-device-plugin   1         1         0       1            0           gpu=on          4m22s

4.3 Set the NVML Device Discovery Strategy

kubectl -n kube-system set env daemonset/hami-device-plugin \
  -c device-plugin \
  DEVICE_DISCOVERY_STRATEGY=nvml

This tells the device-plugin to enumerate GPUs via the NVML API rather than scanning /dev. Without this, the plugin defaults to a file-based strategy that cannot see nvml-mock's virtual devices.

4.4 Roll Out and Verify

kubectl -n kube-system rollout restart daemonset/hami-device-plugin
kubectl -n kube-system rollout status daemonset/hami-device-plugin --timeout=120s

Check for MIG errors — an empty response is the expected output:

kubectl -n kube-system logs daemonset/hami-device-plugin -c device-plugin | grep -i mig

Check overall Pod status:

kubectl -n kube-system get pods -l app.kubernetes.io/name=hami

Expected output:

NAME                              READY   STATUS             RESTARTS   AGE
hami-device-plugin-lbctx          1/2     CrashLoopBackOff   6          9m24s
hami-scheduler-7858c744cc-7pb79   2/2     Running            0          13m

note

The vgpu-monitor sidecar crashes because it requires real GPU monitoring infrastructure. The device-plugin container is running correctly — 1/2 is expected here and does not affect GPU scheduling.
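The Pod-level READY column combines both containers, so it can help to check the device-plugin container on its own. The following is a minimal sketch that relies on the container names device-plugin and vgpu-monitor used above. `container_ready` is a hypothetical helper that reads `<name> <ready>` pairs; the kubectl JSONPath query in the comment produces input in that shape.

```shell
#!/bin/sh
# container_ready NAME (hypothetical helper): reads "<container-name> <ready>" lines
# on stdin. It succeeds only if NAME appears and every matching line reports "true".
container_ready() {
  awk -v want="$1" '
    $1 == want { seen = 1; if ($2 != "true") bad = 1 }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}

# Example (run manually against the cluster):
#   kubectl -n kube-system get pods -l app.kubernetes.io/name=hami \
#     -o jsonpath='{range .items[*]}{range .status.containerStatuses[*]}{.name}{" "}{.ready}{"\n"}{end}{end}' \
#     | container_ready device-plugin && echo "device-plugin container is ready"
```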

Step 5: Verify GPU Resources

HAMi partitions each physical GPU into 10 virtual slots. With 8 physical GPUs the node should advertise 80 allocatable virtual GPUs.

kubectl describe node ${NODE_NAME} | grep nvidia.com/gpu

Expected output:

                    nvidia.com/gpu.present=true
  nvidia.com/gpu:     80
  nvidia.com/gpu:     80
  nvidia.com/gpu     0           0

The two 80 values are Capacity and Allocatable, which confirms that the device-plugin registered all 80 virtual GPU slots. The last line comes from the Allocated resources table (Requests and Limits columns). It shows 0 because no Pods have claimed GPUs yet.
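The expected count is the number of physical GPUs multiplied by the split factor. The following minimal sketch computes that product and compares it with the node's allocatable value. `check_vgpu_capacity` is a hypothetical helper. The split factor of 10 is this lab's value; it is assumed to correspond to HAMi's device split count, left at its default.

```shell
#!/bin/sh
# check_vgpu_capacity ACTUAL GPUS SPLIT (hypothetical helper):
# succeeds when the advertised vGPU count equals GPUS * SPLIT.
check_vgpu_capacity() {
  expected=$(( $2 * $3 ))
  if [ "$1" -eq "$expected" ]; then
    echo "OK: ${1} vGPUs advertised (${2} GPUs x ${3} slots)"
  else
    echo "MISMATCH: ${1} advertised, expected ${expected}" >&2
    return 1
  fi
}

# Example (run manually). The dots in the resource name must be escaped in JSONPath:
#   alloc=$(kubectl get node "${NODE_NAME}" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}')
#   check_vgpu_capacity "$alloc" 8 10
```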

Step 6: Test Basic GPU Scheduling

Deploy a minimal Pod requesting one GPU. CUDA_DISABLE_CONTROL=true prevents HAMi's injected CUDA shim from attempting real device access:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-1
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

Watch until the Pod reaches Running (press Ctrl+C to stop watching):

kubectl get pod gpu-test-1 -w

Expected output:

NAME         READY   STATUS    RESTARTS   AGE
gpu-test-1   1/1     Running   0          9s

Verify the allocation annotation:

kubectl describe pod gpu-test-1 | grep vgpu-devices-allocated

Expected output:

hami.io/vgpu-devices-allocated: GPU-12345678-1234-1234-1234-123456780006,NVIDIA,40960,100:;

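Each device entry has the form <UUID>,<vendor>,<memMiB>,<cores>, where cores is a percentage of the GPU's compute. The mock A100 profile reports 40960 MiB of memory per GPU. This Pod sets no memory limit, so it is allocated the full 40960 MiB; the final field, 100, is its core percentage. The annotation confirms that the scheduler allocated one virtual GPU and recorded it on the Pod.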
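To read the annotation programmatically, the following minimal POSIX-shell sketch splits it into fields. It assumes the separators visible in the sample output: ',' between fields, ':' after each device entry, and ';' after each container's list. `parse_vgpu_annotation` is a hypothetical helper, not a HAMi tool.

```shell
#!/bin/sh
# parse_vgpu_annotation ANNOTATION (hypothetical helper): prints one line per
# allocated device. The body runs in a subshell so IFS and set -f changes stay local.
parse_vgpu_annotation() (
  set -f                          # no globbing while splitting unquoted values
  IFS=';'
  for container in $1; do         # ';' ends each container's device list
    IFS=':'
    for device in $container; do  # ':' ends each device entry
      [ -n "$device" ] || continue
      IFS=','
      set -- $device              # $1=UUID $2=vendor $3=memory (MiB) $4=cores (%)
      printf 'uuid=%s vendor=%s mem_mib=%s cores=%s\n' "$1" "$2" "$3" "$4"
      IFS=':'
    done
    IFS=';'
  done
)

# Example (run manually):
#   parse_vgpu_annotation "$(kubectl get pod gpu-test-1 \
#     -o jsonpath='{.metadata.annotations.hami\.io/vgpu-devices-allocated}')"
```

For the sample annotation above, this prints a single line with `mem_mib=40960 cores=100`.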

Step 7: Test GPU Sharing (Time-slicing)

Deploy three more Pods, each requesting 1 GPU:

for i in 2 3 4; do
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-$i
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
done

warning

Use <<EOF, not <<'EOF', inside the loop. Quoting the delimiter suppresses shell expansion, so $i would not be substituted. Every iteration would then submit the literal name gpu-test-$i, which the API server rejects as an invalid Pod name.

Verify all Pods are running:

kubectl get pods | grep gpu-test

Expected output:

gpu-test-1   1/1     Running   0          3m19s
gpu-test-2   1/1     Running   0          10s
gpu-test-3   1/1     Running   0          10s
gpu-test-4   1/1     Running   0          9s

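All four Pods run concurrently, drawing on the pool of 80 virtual GPU slots. The scheduler records each allocation in that Pod's own vgpu-devices-allocated annotation. None of these Pods sets an nvidia.com/gpumem limit, so each is allocated a GPU's full 40960 MiB (as the gpu-test-1 annotation shows). As a result, no two of them fit on the same physical GPU. To make it possible for several Pods to share one GPU, give each an nvidia.com/gpumem request small enough that their combined memory fits on a single device.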
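To see how these allocations map onto physical GPUs, compare the UUIDs across the four annotations. The following minimal sketch assumes the annotation layout shown in Step 6, where the UUID is the first comma-separated field of each device entry. `distinct_gpus` is a hypothetical helper. If the result equals the number of Pods, no two Pods share a GPU; a smaller number means at least two Pods landed on the same GPU.

```shell
#!/bin/sh
# distinct_gpus (hypothetical helper): reads vgpu-devices-allocated annotations on
# stdin (one per line) and prints how many distinct GPU UUIDs they reference.
distinct_gpus() {
  tr ';:' '\n\n' |      # one device entry (or an empty line) per line
    cut -d, -f1 |       # keep only the UUID field
    grep -v '^$' |      # drop empty lines left by the separators
    sort -u |
    wc -l |
    tr -d ' '           # some wc implementations pad the count with spaces
}

# Example (run manually):
#   for p in gpu-test-1 gpu-test-2 gpu-test-3 gpu-test-4; do
#     kubectl get pod "$p" -o jsonpath='{.metadata.annotations.hami\.io/vgpu-devices-allocated}'
#     echo
#   done | distinct_gpus
```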

Step 8: Test Memory and Core Limits

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-limits
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: 1
        nvidia.com/gpumem: "10"
        nvidia.com/gpucores: "30"
EOF

info

Resource limits format: nvidia.com/gpumem takes an absolute value in MiB, so "10" means 10 MiB. nvidia.com/gpucores is a percentage of the GPU's compute capacity, so "30" limits the container to 30% of the selected GPU's cores.

Verify the allocation:

kubectl describe pod gpu-limits | grep vgpu-devices-allocated

Expected output:

hami.io/vgpu-devices-allocated: GPU-12345678-1234-1234-1234-123456780002,NVIDIA,10,30:;

The annotation records 10 MiB of memory and a 30% core share, exactly the values requested.
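This comparison can be scripted. The following minimal sketch assumes the annotation layout shown in Step 6. `check_alloc` is a hypothetical helper that checks only the first device entry, which is enough for single-GPU Pods such as gpu-limits.

```shell
#!/bin/sh
# check_alloc ANNOTATION MEM_MIB CORES_PCT (hypothetical helper): succeeds when the
# first device entry records exactly the requested memory and core percentage.
check_alloc() (
  want_mem=$2
  want_cores=$3
  entry=${1%%[;:]*}     # text before the first ':' or ';' = first device entry
  set -f                # no globbing while splitting the unquoted $entry
  IFS=,
  set -- $entry         # $1=UUID $2=vendor $3=memory (MiB) $4=cores (%)
  [ "$3" = "$want_mem" ] && [ "$4" = "$want_cores" ]
)

# Example (run manually):
#   ann=$(kubectl get pod gpu-limits -o jsonpath='{.metadata.annotations.hami\.io/vgpu-devices-allocated}')
#   check_alloc "$ann" 10 30 && echo "gpumem/gpucores recorded as requested"
```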

Step 9: Test Percentage-Based Memory Request

Instead of a fixed MiB value, nvidia.com/gpumem-percentage lets you request a fraction of the GPU's total memory. On the mock A100 (40960 MiB), requesting 30% allocates 40960 × 30 / 100 = 12288 MiB.

tip

Why percentage-based allocation? It is useful when you want workloads to scale proportionally across different GPU models without hardcoding absolute sizes.

Create the Pod:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-mem-30pct
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: 1
        nvidia.com/gpumem-percentage: "30"
EOF

Watch until the Pod reaches Running (press Ctrl+C to stop watching):

kubectl get pod gpu-mem-30pct -w

Expected output:

NAME            READY   STATUS    RESTARTS   AGE
gpu-mem-30pct   1/1     Running   0          8s

Inspect the allocation annotation:

kubectl get pod gpu-mem-30pct \
  -o jsonpath='{.metadata.annotations.hami\.io/vgpu-devices-allocated}'

Expected output:

GPU-12345678-1234-1234-1234-123456780003,NVIDIA,12288,100:;

The third field shows 12288 MiB — 30% of 40960 MiB — confirming the scheduler correctly translated the percentage into an absolute memory budget for the allocation.
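The conversion is integer arithmetic on the GPU's total memory. The following minimal sketch assumes the scheduler truncates any fractional MiB; 30% of 40960 MiB has no remainder, so rounding does not matter here. `pct_to_mib` is a hypothetical helper. The 81920 MiB figure describes a hypothetical larger GPU and is used only to show proportional scaling.

```shell
#!/bin/sh
# pct_to_mib TOTAL_MIB PERCENT (hypothetical helper): integer share of GPU memory.
pct_to_mib() {
  echo $(( $1 * $2 / 100 ))
}

annotation='GPU-12345678-1234-1234-1234-123456780003,NVIDIA,12288,100:;'
expected=$(pct_to_mib 40960 30)                       # mock A100 profile
actual=$(printf '%s\n' "$annotation" | cut -d, -f3)   # third field = memory (MiB)

if [ "$actual" = "$expected" ]; then
  echo "30% of 40960 MiB -> ${actual} MiB, as recorded"
else
  echo "unexpected: annotation has ${actual} MiB, expected ${expected}" >&2
fi

# The same request on a hypothetical 81920 MiB GPU:
echo "30% of 81920 MiB -> $(pct_to_mib 81920 30) MiB"
```

This prints `30% of 40960 MiB -> 12288 MiB, as recorded` followed by `30% of 81920 MiB -> 24576 MiB`. The same percentage request yields a proportionally larger budget on the larger card.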

Step 10: Test Multi-GPU Allocation

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-multi
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: "2"
EOF

Verify the Pod is running:

kubectl get pod gpu-multi

Expected output:

NAME        READY   STATUS    RESTARTS   AGE
gpu-multi   1/1     Running   0          64s

Check the scheduler events:

kubectl describe pod gpu-multi | tail -20

Expected output:

Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  Scheduled         70s   hami-scheduler  Successfully assigned default/gpu-multi to nvml-mock-test-control-plane
  Normal  FilteringSucceed  70s   hami-scheduler  find fit node(nvml-mock-test-control-plane), 0 nodes not fit, 1 nodes fit(nvml-mock-test-control-plane:13.63)
  Normal  BindingSucceed    70s   hami-scheduler  Successfully binding node [nvml-mock-test-control-plane] to default/gpu-multi
  Normal  Pulling           69s   kubelet         spec.containers{sleep}: Pulling image "busybox"
  Normal  Pulled            67s   kubelet         Successfully pulled image "busybox" in 3.548s
  Normal  Created           67s   kubelet         Container created
  Normal  Started           67s   kubelet         Container started

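The FilteringSucceed, Scheduled, and BindingSucceed events come from hami-scheduler. They confirm that HAMi's scheduler, not the default kube-scheduler, handled this Pod and bound it to the node. The events do not name the GPUs; the allocation annotation records which two devices were assigned.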

Viewing the full vgpu-devices-allocated annotation

kubectl get pod gpu-multi \
  -o jsonpath='{.metadata.annotations.hami\.io/vgpu-devices-allocated}'

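You should see two device entries, one per allocated GPU. Within a container, the entries are separated by a colon (:), and the container's list ends with a semicolon (;). This is why single-GPU annotations end in :;.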
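The following minimal sketch counts the device entries. It assumes the separators seen in the single-GPU annotations: ':' after each device entry and ';' after each container. `count_devices` is a hypothetical helper, and the UUIDs in the sample call are placeholders.

```shell
#!/bin/sh
# count_devices ANNOTATION (hypothetical helper): number of device entries recorded
# for the Pod's first container.
count_devices() (
  set -f
  container=${1%%;*}    # everything before the first ';' = first container's list
  IFS=:
  n=0
  for entry in $container; do
    if [ -n "$entry" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
)

# Placeholder UUIDs for illustration. For the real Pod, pass the output of the
# kubectl jsonpath command above instead.
count_devices 'GPU-aaaa,NVIDIA,40960,100:GPU-bbbb,NVIDIA,40960,100:;'
```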

Summary of Verified Features

Feature	Test Pod	How It Is Verified
Basic GPU scheduling	`gpu-test-1`	Annotation shows 1 vGPU UUID + 40960 MiB
GPU sharing (time-slicing)	`gpu-test-1` through `gpu-test-4`	All 4 Pods run concurrently
Memory limit (`gpumem`)	`gpu-limits`	Annotation shows `10` MiB
Core limit (`gpucores`)	`gpu-limits`	Annotation shows `30` (% of GPU compute)
Percentage memory (`gpumem-percentage`)	`gpu-mem-30pct`	Annotation shows `12288` MiB (30% of A100)
Multi-GPU allocation	`gpu-multi`	hami-scheduler events show `BindingSucceed`

Real GPU required for the following

Actual CUDA program execution
Runtime enforcement of gpumem and gpucores limits
Real DCGM GPU metrics (temperature, utilisation)
Memory overcommit and memory override features

Cleanup

Delete all test Pods:

kubectl delete pod gpu-test-1 gpu-test-2 gpu-test-3 gpu-test-4 \
  gpu-limits gpu-mem-30pct gpu-multi

Remove the GPU node label:

kubectl label node ${NODE_NAME} gpu-

Uninstall HAMi:

helm uninstall hami -n kube-system

Uninstall nvml-mock:

helm uninstall nvml-mock

Delete the kind cluster:

kind delete cluster --name nvml-mock-test

tip

Skip the cluster deletion step if you want to keep the environment for further experimentation.

Next Steps

Move to a real GPU cluster (see Lab 1: Online HAMi Installation) to test memory and core isolation with actual CUDA workloads.
Add Prometheus and HAMi WebUI for visual resource tracking (see Lab 2: Local Fake GPU Setup).

