GPU Software Stack Overview

When you use a GPU on a server, you are not dealing with a single piece of software or hardware. Instead, you are working with an entire software stack built around NVIDIA GPUs. From the lowest-level physical hardware to the highest-level Kubernetes scheduling, it can be roughly divided into 5 layers:

  1. Hardware Layer
  2. Linux Kernel Driver Layer
  3. User-Space Libraries/Tools Layer
  4. Container Runtime Layer
  5. Kubernetes / HAMi Scheduling Layer

Understanding this layered structure is the foundation for troubleshooting GPU issues and understanding how HAMi works.

5-Layer Architecture Overview

The diagram below illustrates the complete layered structure of the GPU software stack:

  • Layer 5 (Kubernetes Scheduling): Kubernetes Scheduler, HAMi Scheduler, Pod
  • Layer 4 (Container Runtime): containerd, NVIDIA Container Toolkit, Container
  • Layer 3 (User-Space Libraries/Tools): CUDA Runtime / Toolkit, NVML (libnvidia-ml.so), nvidia-smi, DCGM, MIG
  • Layer 2 (Kernel Driver): nvidia.ko (Kernel Module), nvidia-uvm.ko, nvidia-modeset.ko
  • Layer 1 (Hardware): NVIDIA GPU, PCIe Bus, NVLink (Optional)

GPU Software Stack 5-Layer Architecture

Layer Details

Layer 1: Hardware

The physical hardware is the foundation of everything:

  • GPU: NVIDIA GPU chips (such as A100, H100, and L40), responsible for parallel computing.
  • PCIe Bus: Connects the GPU to the CPU and serves as the primary data transfer channel between them.
  • NVLink (optional): A high-speed interconnect between multiple GPUs, offering far greater bandwidth than PCIe.

Layer 2: Linux Kernel Driver

The kernel driver is the bridge between user-space programs and the GPU:

  • nvidia.ko: The core NVIDIA kernel module that manages GPU hardware resources, video memory allocation, and command submission.
  • nvidia-uvm.ko: The Unified Virtual Memory module, enabling transparent sharing of memory address spaces between the CPU and GPU.
  • nvidia-modeset.ko: The display mode setting module, used for GPU graphics output management.

The kernel driver exposes device nodes under /dev (for example /dev/nvidia0, /dev/nvidiactl, and /dev/nvidia-uvm), through which user-space programs interact with the GPU.
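To check Layer 2 on a node, it can help to look at the device nodes and the loaded kernel modules directly. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function names are illustrative and not part of any NVIDIA tool. Note that the kernel lists module names with underscores (nvidia_uvm rather than nvidia-uvm).

```python
import glob
from pathlib import Path


def nvidia_device_nodes(dev_dir="/dev"):
    """Return the sorted entries under dev_dir whose names start with 'nvidia'."""
    return sorted(glob.glob(f"{dev_dir}/nvidia*"))


def loaded_nvidia_modules(proc_modules="/proc/modules"):
    """Return the names of loaded kernel modules that start with 'nvidia'."""
    path = Path(proc_modules)
    if not path.exists():  # not Linux, or /proc is not mounted
        return []
    # The first whitespace-separated field of each line is the module name.
    names = [line.split()[0] for line in path.read_text().splitlines() if line.strip()]
    return sorted(name for name in names if name.startswith("nvidia"))


if __name__ == "__main__":
    print("device nodes:", nvidia_device_nodes() or "none found")
    print("kernel modules:", loaded_nvidia_modules() or "none loaded")
```

If both lists are empty on a machine that has a GPU installed, the driver is probably not loaded, and the layers above it are unlikely to work.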

Layer 3: User-Space Libraries/Tools

This layer contains the tools and libraries that developers and administrators interact with most frequently:

  • CUDA: NVIDIA's parallel computing platform and programming model, including the compiler (nvcc), runtime libraries, and development tools. Nearly all GPU applications access the GPU through CUDA.
  • NVML (NVIDIA Management Library): The GPU management library (libnvidia-ml.so), providing APIs for querying GPU status (temperature, video memory, utilization, etc.). Tools such as nvidia-smi and DCGM all depend on it.
  • nvidia-smi: A command-line management tool for viewing GPU status, processes, video memory usage, and more. It is the CLI front-end for NVML.
  • DCGM (Data Center GPU Manager): A data center GPU management tool providing health monitoring, diagnostics, group management, and other capabilities, suitable for large-scale GPU clusters.
  • MIG (Multi-Instance GPU): A hardware partitioning feature that splits a single supported GPU (such as A100 or H100) into multiple isolated GPU instances, each with its own dedicated video memory and compute cores.

Layer 4: Container Runtime

To make GPUs usable within containers, additional runtime components are required:

  • containerd: The container runtime responsible for image management and container lifecycle management. Most Kubernetes distributions use containerd as their default runtime.
  • NVIDIA Container Toolkit (formerly nvidia-docker2): Automatically mounts GPU device nodes and NVIDIA driver libraries (including the CUDA driver library, libcuda.so) into the container at startup. It is the key bridge enabling containers to use GPUs.
  • Container: A running application container that gains GPU access through the Toolkit.
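When a container cannot see the GPU, a quick look at what the Toolkit injected often narrows the problem down. The sketch below assumes it runs inside the container. NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES are environment variables documented for the NVIDIA Container Toolkit; the function name and the returned dictionary layout are illustrative only. In minimal images without ldconfig, `ctypes.util.find_library` may return None even when the library is present, so treat a None result as a hint rather than proof.

```python
import ctypes.util
import os


def container_gpu_view(environ=os.environ):
    """Summarize what this process can see of the GPU stack."""
    return {
        # Environment variables read by the NVIDIA Container Toolkit.
        "NVIDIA_VISIBLE_DEVICES": environ.get("NVIDIA_VISIBLE_DEVICES"),
        "NVIDIA_DRIVER_CAPABILITIES": environ.get("NVIDIA_DRIVER_CAPABILITIES"),
        # Driver libraries the Toolkit is expected to mount from the host.
        "libcuda": ctypes.util.find_library("cuda"),
        "libnvidia-ml": ctypes.util.find_library("nvidia-ml"),
    }


if __name__ == "__main__":
    for key, value in container_gpu_view().items():
        print(f"{key}: {value}")
```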

Layer 5: Kubernetes / HAMi Scheduling

Managing GPUs in a Kubernetes cluster involves the following components:

  • NVIDIA Device Plugin: A Kubernetes device plugin that reports GPU resources on the node to kubelet, enabling Kubernetes to be aware of GPUs and schedule GPU workloads.
  • GPU Operator: A Kubernetes Operator provided by NVIDIA that automates the deployment and management of drivers, Container Toolkit, Device Plugin, DCGM, and other components.
  • HAMi Device Plugin: HAMi's device plugin, supporting fine-grained partitioning and sharing of GPU memory and compute resources.
  • HAMi Scheduler: HAMi's scheduler extension, supporting advanced scheduling policies such as Binpack/Spread, priorities, and targeted GPU card scheduling.

Key Call Chains

Understanding the key call chains within the GPU software stack helps with troubleshooting and understanding the relationships between components.

Complete Dependency Chain

From creation to execution, a GPU-using Pod goes through the following dependency chain:

Pod -> containerd -> NVIDIA Container Toolkit -> CUDA -> User-Space Driver Library -> nvidia.ko -> PCIe -> GPU

Complete GPU Pod Dependency Chain

nvidia-smi Call Chain

The complete path for nvidia-smi to query GPU information:

nvidia-smi -> NVML -> libnvidia-ml.so -> nvidia.ko -> GPU

nvidia-smi Call Chain
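The top of this chain can be exercised from a script by running nvidia-smi in its CSV query mode and parsing the result. This is a minimal sketch: `--query-gpu` and `--format=csv,noheader,nounits` are existing nvidia-smi options, while the chosen fields and the function names are illustrative. `query_gpus` returns None when nvidia-smi is missing or fails, which by itself indicates that something lower in the chain is broken.

```python
import csv
import shutil
import subprocess

QUERY_FIELDS = ["index", "name", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV output (no header, no units) into a list of dicts."""
    rows = []
    for record in csv.reader(text.strip().splitlines(), skipinitialspace=True):
        if record:
            rows.append(dict(zip(QUERY_FIELDS, (value.strip() for value in record))))
    return rows


def query_gpus():
    """Run nvidia-smi and return parsed rows, or None if it is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    print(query_gpus())
```

All values are returned as strings; with `nounits`, the memory fields are plain numbers that the caller can convert as needed.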

Management Tool Call Chain

Multiple management tools all access the GPU through NVML:

nvidia-smi, DCGM, Device Plugin -> NVML -> libnvidia-ml.so -> nvidia.ko -> GPU

Management Tool Call Chain

As shown, command-line tools, monitoring components, and Kubernetes device plugins all ultimately reach the hardware through the same path: NVML -> kernel driver -> GPU.
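To illustrate that shared path, a program can call NVML directly instead of going through nvidia-smi. The sketch below loads libnvidia-ml.so.1 with ctypes and asks for the device count using the NVML C functions `nvmlInit_v2`, `nvmlDeviceGetCount_v2`, and `nvmlShutdown`. The wrapper function and its None-on-failure convention are illustrative choices, not how any of the tools above are implemented.

```python
import ctypes

NVML_SUCCESS = 0  # nvmlReturn_t value for success


def nvml_device_count(lib_name="libnvidia-ml.so.1"):
    """Return the number of GPUs reported by NVML, or None if NVML is unusable."""
    try:
        nvml = ctypes.CDLL(lib_name)
    except OSError:
        return None  # library not installed or not mounted into this environment
    if nvml.nvmlInit_v2() != NVML_SUCCESS:
        return None  # library present, but the driver could not be reached
    try:
        count = ctypes.c_uint(0)
        rc = nvml.nvmlDeviceGetCount_v2(ctypes.byref(count))
        return count.value if rc == NVML_SUCCESS else None
    finally:
        nvml.nvmlShutdown()


if __name__ == "__main__":
    print("GPUs visible through NVML:", nvml_device_count())
```

A None result separates two failure modes worth checking in order: the library cannot be loaded (a Layer 3 or container mount problem), or it loads but initialization fails (usually a Layer 2 driver problem).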

Kubernetes GPU Scheduling Chain

In Kubernetes, the flow of GPU resources from hardware to Pod:

GPU -> Driver -> Device Plugin -> kubelet -> Kubernetes API -> K8s Scheduler -> Pod

Kubernetes GPU Scheduling Chain
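One way to confirm that this chain works up to the Kubernetes API is to check whether nodes advertise GPUs in their allocatable resources. The sketch below parses the output of `kubectl get nodes -o json`; it assumes the device plugin registers the resource name nvidia.com/gpu, and the function name is illustrative.

```python
import json


def gpu_allocatable(nodes_json, resource="nvidia.com/gpu"):
    """Map node name to its allocatable count of `resource`.

    `nodes_json` is the text produced by `kubectl get nodes -o json`.
    Nodes that do not advertise the resource are omitted.
    """
    result = {}
    for node in json.loads(nodes_json)["items"]:
        amount = node.get("status", {}).get("allocatable", {}).get(resource)
        if amount is not None:
            result[node["metadata"]["name"]] = int(amount)
    return result


if __name__ == "__main__":
    import sys

    # Usage: kubectl get nodes -o json | python this_script.py
    print(gpu_allocatable(sys.stdin.read()))
```

A GPU node missing from the result suggests the break is at or below the Device Plugin: either the plugin is not running on that node, or it cannot reach the driver.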

HAMi Enhanced Scheduling Chain

Building on the native scheduling chain, HAMi substitutes its own Device Plugin and scheduler extension to implement GPU partitioning and sharing:

GPU -> Driver -> HAMi Device Plugin -> kubelet -> Kubernetes API -> HAMi Scheduler -> Pod

HAMi Enhanced Scheduling Chain
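From the Pod's side, partitioning shows up as extra resource limits next to the usual GPU count. The sketch below builds such a Pod manifest as a Python dictionary. The resource names nvidia.com/gpumem (device memory, in megabytes) and nvidia.com/gpucores (percentage of compute) are taken from the HAMi documentation as I understand it and should be verified against the version you deploy; the function name, Pod name, and values are hypothetical.

```python
import json


def hami_pod_manifest(name, image, gpus=1, gpumem_mb=None, gpucores_pct=None):
    """Build a Pod manifest that requests a (possibly partial) GPU."""
    limits = {"nvidia.com/gpu": gpus}
    # Assumed HAMi resource names; omit them to request whole GPUs.
    if gpumem_mb is not None:
        limits["nvidia.com/gpumem"] = gpumem_mb
    if gpucores_pct is not None:
        limits["nvidia.com/gpucores"] = gpucores_pct
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": name, "image": image, "resources": {"limits": limits}}
            ]
        },
    }


if __name__ == "__main__":
    manifest = hami_pod_manifest("gpu-demo", "ubuntu:22.04", gpumem_mb=3000, gpucores_pct=30)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the printed output can be piped to `kubectl apply -f -` on a cluster where HAMi is installed.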

Component Quick Reference

| Component | Summary |
| --- | --- |
| GPU | NVIDIA GPU hardware, executes parallel computing tasks |
| PCIe | Bus connecting the GPU and CPU, responsible for data transfer |
| nvidia.ko | NVIDIA kernel module, manages GPU hardware resources and exposes device nodes |
| nvidia-smi | Command-line tool for viewing GPU status, video memory, processes, and more |
| NVML | NVIDIA Management Library, a C-language API for GPU management |
| libnvidia-ml.so | NVML shared library implementation; all management tools communicate with the driver through it |
| CUDA | NVIDIA parallel computing platform, the core programming and runtime framework for GPU applications |
| MIG | Multi-Instance GPU, physically partitions a single GPU into multiple isolated instances |
| DCGM | Data Center GPU Manager, providing monitoring, diagnostics, and health checks |
| containerd | Container runtime, manages container images and lifecycle |
| NVIDIA Container Toolkit | Container GPU support, automatically mounts GPU devices and libraries into containers at startup |
| Device Plugin | Kubernetes device plugin, reports node GPU resource information to the cluster |
| GPU Operator | NVIDIA Operator, automates deployment and management of the full GPU software stack |
| HAMi | GPU virtualization middleware, supporting fine-grained partitioning and sharing of memory and compute |
| HAMi Device Plugin | HAMi's device plugin, replaces the native Device Plugin and supports GPU partition reporting |
| CNCF | HAMi is a CNCF Sandbox project |