
Performance Benchmarks

vGPU-device-plugin performance was evaluated by running a vLLM benchmark on three test instances.

Test Environment

| Parameter | Value |
|---|---|
| Kubernetes version | v1.35.4 |
| Docker version | 29.4.0 |
| GPU Type | A100-SXM4-40GB |
| GPU Count | 2 |

Test Instances

| Instance | Description |
|---|---|
| Native | k8s + nvidia k8s-device-plugin |
| Opensource_v280 | k8s + VGPU k8s-device-plugin, opensource version v280 |
| Opensource_v290 | k8s + VGPU k8s-device-plugin, opensource version v290 |

Test Cases

| Test ID | Case | Type | Parameters |
|---|---|---|---|
| 6.1 | Qwen3-8B (vLLM) | inference | batch=1, stream=True, max_model_len=8192 |
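
Because the test case streams its output (stream=True), latency can be derived from token arrival times. The sketch below shows one common way to do that: time to first token (TTFT) as the delay between sending the request and receiving the first token, and per-token latency as the mean gap between later tokens. The function name `stream_latency` and the timestamps are illustrative assumptions; how the benchmark itself computes these values is not stated on this page.

```python
from statistics import mean


def stream_latency(request_sent, token_times):
    """Return (ttft, per_token) in seconds for one streamed response.

    request_sent: timestamp at which the request was issued.
    token_times:  arrival timestamps of the streamed tokens, in order.
    """
    if not token_times:
        raise ValueError("no tokens were received")
    # Time to first token: delay until the first streamed token arrives.
    ttft = token_times[0] - request_sent
    # Gaps between consecutive tokens after the first one.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    per_token = mean(gaps) if gaps else 0.0
    return ttft, per_token


# Synthetic timestamps in seconds (assumed for illustration, not measured data).
ttft, per_token = stream_latency(10.0, [10.5, 10.75, 11.0, 11.25])
print(f"TTFT={ttft:.3f}s per-token={per_token:.3f}s")
```

In a real run the timestamps would typically come from a monotonic clock such as `time.perf_counter()`, read as each streamed chunk arrives.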

Results

| Metric | Native | Opensource_v280 | Opensource_v290 |
|---|---|---|---|
| TTFT p50 (s) | 0.0621 | 0.0670 | 0.0629 |
| TTFT p95 (s) | 0.0642 | 0.0713 | 0.0650 |
| TTFT p99 (s) | 0.0652 | 0.0735 | 0.0674 |
| Per-token latency (clean mean, s) | 0.0285 | 0.0310 | 0.0291 |
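
To make the gap to the native baseline easier to read, the results can be expressed as relative overhead. The following is a small sketch that copies the values from the results above and computes `(value - native) / native` as a percentage; the helper name `overhead_pct` is an assumption made for illustration.

```python
# Values copied from the results above, in seconds.
results = {
    "TTFT p50": {"Native": 0.0621, "Opensource_v280": 0.0670, "Opensource_v290": 0.0629},
    "TTFT p95": {"Native": 0.0642, "Opensource_v280": 0.0713, "Opensource_v290": 0.0650},
    "TTFT p99": {"Native": 0.0652, "Opensource_v280": 0.0735, "Opensource_v290": 0.0674},
    "Per-token latency": {"Native": 0.0285, "Opensource_v280": 0.0310, "Opensource_v290": 0.0291},
}


def overhead_pct(value, baseline):
    """Relative overhead of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Overhead of each vGPU instance relative to the Native instance.
overheads = {
    metric: {
        name: overhead_pct(value, row["Native"])
        for name, value in row.items()
        if name != "Native"
    }
    for metric, row in results.items()
}

for metric, row in overheads.items():
    print(metric + ": " + ", ".join(f"{name} {pct:+.1f}%" for name, pct in row.items()))
```

On these numbers, Opensource_v290 stays within roughly 1 to 3.5% of Native on every metric, while Opensource_v280 is roughly 8 to 13% slower.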

Reproducing the Benchmark

  1. Install k8s-vGPU-scheduler and configure it properly.

  2. Build the benchmark images:

         cd benchmarks/ai-benchmark
         sh build.sh

  3. Run the benchmark job:

         kubectl apply -f benchmarks/deployments/job-on-nvidia-device-plugin.yml
         kubectl apply -f benchmarks/deployments/job-on-hami.yml

  4. View the results:

         kubectl cp <pod-name>:/results ./results
         python3 benchmarks/ai-benchmark/gen_report.py \
           --dataset native ./results/bench_native.jsonl \
           --dataset hami ./results/bench_hami.jsonl
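
For a quick sanity check of a results file before generating the report, percentiles can be computed directly from JSONL records. This is a minimal sketch, not a substitute for `gen_report.py`: the field name `ttft` is hypothetical (the actual schema of the `.jsonl` files is not described here), the nearest-rank percentile method is an assumption, and the records are synthetic.

```python
import json
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, for integer pct in 1..100."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(lines, field="ttft"):
    """Compute p50/p95/p99 of one numeric field across JSONL lines.

    The default field name "ttft" is hypothetical; adjust it to match
    the real schema of the results file.
    """
    samples = [json.loads(line)[field] for line in lines if line.strip()]
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


# Synthetic records standing in for a file such as ./results/bench_native.jsonl.
demo = [json.dumps({"ttft": 0.01 * i}) for i in range(1, 101)]
summary = summarize(demo)
print(summary)
```

With a real file, the same function can be called as `summarize(open(path))`, since iterating over a file object yields one line at a time.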
HAMi is a CNCF Sandbox project