Version: Next

Real-time device usage endpoint

You can retrieve real-time device memory and core utilization by querying <GPU-node-ip>:31992/metrics directly, or by adding that address as a Prometheus scrape target. For example, with curl:

curl <GPU-node-ip>:31992/metrics
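The same request can be made from a script. The following is a minimal Python sketch, not part of HAMi itself: the function names `fetch_metrics` and `hami_lines` are hypothetical, and it assumes the endpoint is served over plain HTTP, which the curl command above implies but does not state.

```python
import urllib.request


def fetch_metrics(node_ip: str, port: int = 31992, timeout: float = 5.0) -> str:
    """Return the raw text served at http://<node_ip>:<port>/metrics.

    Assumption: the endpoint speaks plain HTTP on the port shown in the docs.
    """
    url = f"http://{node_ip}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def hami_lines(text: str) -> list:
    """Keep only sample lines whose metric name starts with 'hami_'.

    Comment lines (# HELP / # TYPE) and metrics from other exporters are dropped.
    """
    return [line for line in text.splitlines() if line.startswith("hami_")]


# Usage (replace the address with the IP of a GPU node):
#   for line in hami_lines(fetch_metrics("<GPU-node-ip>")):
#       print(line)
```

Separating the network call from the filtering keeps the filter easy to test against saved output.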

It contains the following host-level metrics:

| Metric | Description | Example |
|---|---|---|
| hami_host_gpu_utilization_ratio | GPU core utilization ratio on host (0-100) | {device_index="0",device_type="NVIDIA-NVIDIA H200",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",zone="vGPU"} 0 |
| hami_host_gpu_memory_used_bytes | GPU real-time device memory usage on host | {device_index="0",device_type="NVIDIA-NVIDIA H200",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",zone="vGPU"} 2.87244288e+08 |
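Each example above is one sample in the Prometheus text format: a metric name, a label set in braces, and a float value. As a rough illustration, the sketch below splits such a line into its parts and converts the byte count to MiB. The `parse_sample` helper is hypothetical and deliberately simplified: it does not handle optional timestamps and does not unescape label values.

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognised sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


line = (
    'hami_host_gpu_memory_used_bytes{device_index="0",'
    'device_type="NVIDIA-NVIDIA H200",'
    'device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",zone="vGPU"} '
    '2.87244288e+08'
)
name, labels, value = parse_sample(line)
print(labels["device_type"])  # NVIDIA-NVIDIA H200
print(value / 2**20)          # 273.9375 (MiB)
```

For production use, a maintained parser such as the one shipped with the `prometheus_client` Python package is likely a better choice than a hand-written regular expression.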

It also exposes per-container and per-vGPU metrics for each scheduled task:

| Metric | Description | Example |
|---|---|---|
| hami_container_device_utilization_ratio | Container device SM utilization ratio | {container="cuda",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",namespace="default",pod="vgpu-share",vdevice_index="0",zone="vGPU"} 0 |
| hami_container_device_memory_bytes | Container device memory usage breakdown in bytes | {buffer_size="0",container="cuda",context_size="0",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",module_size="0",namespace="default",offset="0",pod="vgpu-share",vdevice_index="0",zone="vGPU"} 0 |
| hami_container_last_kernel_elapsed_seconds | Seconds since last kernel execution in container | {container="cuda",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",namespace="default",pod="vgpu-share",vdevice_index="0",zone="vGPU"} 3664 |
| hami_vgpu_memory_used_bytes | vGPU device memory usage in bytes | {container="cuda",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",namespace="default",pod="vgpu-share",vdevice_index="0",zone="vGPU"} 0 |
| hami_vgpu_memory_limit_bytes | vGPU device memory limit in bytes | {container="cuda",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",namespace="default",pod="vgpu-share",vdevice_index="0",zone="vGPU"} 2.097152e+10 |
| hami_vgpu_memory_buffer_bytes | Container device memory buffer size in bytes | {container="cuda",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",namespace="default",pod="vgpu-share",vdevice_index="0",zone="vGPU"} 6.83935744e+08 |
| hami_vgpu_memory_context_bytes | Container device memory context size in bytes | {container="cuda",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",namespace="default",pod="vgpu-share",vdevice_index="0",zone="vGPU"} 0 |
| hami_vgpu_memory_module_bytes | Container device memory module size in bytes | {container="cuda",device_uuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",namespace="default",pod="vgpu-share",vdevice_index="0",zone="vGPU"} 0 |
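One way to use the per-vGPU metrics is to compare usage against the limit for each container. The sketch below is an illustration only: `vgpu_memory_ratio` is a hypothetical helper, it takes samples that have already been parsed into (metric name, labels, value) tuples, and grouping by namespace, pod, container and vdevice_index is an assumption about which labels identify one vGPU.

```python
def vgpu_memory_ratio(samples):
    """Return used/limit per vGPU from already-parsed samples.

    samples: iterable of (metric_name, labels_dict, value) tuples.
    Entries without a matching non-zero limit are skipped.
    """
    used, limit = {}, {}
    for name, labels, value in samples:
        # Assumed identity of one vGPU slice inside one container.
        key = (
            labels.get("namespace"),
            labels.get("pod"),
            labels.get("container"),
            labels.get("vdevice_index"),
        )
        if name == "hami_vgpu_memory_used_bytes":
            used[key] = value
        elif name == "hami_vgpu_memory_limit_bytes":
            limit[key] = value
    return {key: used[key] / limit[key] for key in used if limit.get(key)}


# Values taken from the example column of the table above.
labels = {
    "container": "cuda",
    "namespace": "default",
    "pod": "vgpu-share",
    "vdevice_index": "0",
}
samples = [
    ("hami_vgpu_memory_used_bytes", labels, 0.0),
    ("hami_vgpu_memory_limit_bytes", labels, 2.097152e10),
]
ratios = vgpu_memory_ratio(samples)
print(ratios)  # {('default', 'vgpu-share', 'cuda', '0'): 0.0}
```

In a Prometheus setup the same ratio would normally be computed in a query rather than in client code; the Python form is shown only to make the relationship between the two metrics explicit.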
Note: The context_size, module_size, buffer_size and offset labels on hami_container_device_memory_bytes will be deprecated in v2.10.0. Use hami_vgpu_memory_context_bytes, hami_vgpu_memory_module_bytes and hami_vgpu_memory_buffer_bytes instead.
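When auditing existing dashboards or alert rules ahead of that deprecation, a small check like the following may help. It is a sketch under assumptions: the label-to-metric mapping is inferred from the matching names in the note, the note names no replacement for `offset`, and `deprecated_label_hints` is a hypothetical helper that only detects labels used in a selector matcher (for example `buffer_size!="0"`), not in `by (...)` clauses.

```python
import re

# Inferred from the names in the note above; not an official mapping.
REPLACEMENTS = {
    "context_size": "hami_vgpu_memory_context_bytes",
    "module_size": "hami_vgpu_memory_module_bytes",
    "buffer_size": "hami_vgpu_memory_buffer_bytes",
    "offset": None,  # the note does not name a replacement metric
}


def deprecated_label_hints(query: str) -> dict:
    """Map each deprecated label matched in the query to its suggested metric."""
    if "hami_container_device_memory_bytes" not in query:
        return {}
    hints = {}
    for label, replacement in REPLACEMENTS.items():
        # Require a matcher operator so the PromQL `offset` modifier is not flagged.
        if re.search(rf"\b{label}\s*(?:=~|!~|!=|=)", query):
            hints[label] = replacement
    return hints


query = 'hami_container_device_memory_bytes{pod="vgpu-share",buffer_size!="0"}'
print(deprecated_label_hints(query))
# {'buffer_size': 'hami_vgpu_memory_buffer_bytes'}
```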

HAMi is a CNCF Incubating project