Version: v2.9.0

Performance Benchmarks

Three instances from the vLLM benchmark were used to evaluate vGPU-device-plugin performance.

Test Environment

Parameter	Value
Kubernetes version	v1.35.4
Docker version	29.4.0
GPU Type	A100-SXM4-40GB
GPU Count	2

Test Instances

Instance	Description
Native	k8s + nvidia k8s-device-plugin
Opensource_v280	k8s + VGPU k8s-device-plugin, opensource version v280
Opensource_v290	k8s + VGPU k8s-device-plugin, opensource version v290

Test Cases

Test ID	Case	Type	Parameters
6.1	Qwen3-8B (vLLM)	inference	batch=1, stream=True, max_model_len=8192

Results

Metric	Native	Opensource_v280	Opensource_v290
TTFT p50 (s)	0.0621	0.0670	0.0629
TTFT p95 (s)	0.0642	0.0713	0.0650
TTFT p99 (s)	0.0652	0.0735	0.0674
Per-token latency (clean mean, s)	0.0285	0.0310	0.0291

Reproducing the Benchmark

Install k8s-vGPU-scheduler and configure it properly.
Build the benchmark images:

cd benchmarks/ai-benchmark
sh build.sh

Run the benchmark job:

kubectl apply -f benchmarks/deployments/job-on-nvidia-device-plugin.yml
kubectl apply -f benchmarks/deployments/job-on-hami.yml

View the results:

kubectl cp <pod-name>:/results ./results
python3 benchmarks/ai-benchmark/gen_report.py \
    --dataset native ./results/bench_native.jsonl \
    --dataset hami ./results/bench_hami.jsonl

Test Environment​

Test Instances​

Test Cases​

Results​

Reproducing the Benchmark​

Test Environment

Test Instances

Test Cases

Results

Reproducing the Benchmark