Version: v1.3.0

Cluster device allocation

Cluster device allocation endpoint

To get an overview of cluster device allocation and limits, visit {scheduler node ip}:31993/metrics, or add that address to Prometheus as a scrape target. For example:

curl {scheduler node ip}:31993/metrics
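The same request can be made from a script. The following is a minimal Python sketch, not part of the project: the function names `fetch_metrics` and `allocation_lines` are hypothetical, the `http://` scheme is assumed (it is what `curl` uses when no scheme is given), and the metric names are the ones listed in the table on this page.

```python
from urllib.request import urlopen

# Metric names taken from the table on this page.
ALLOCATION_METRICS = (
    "GPUDeviceCoreLimit",
    "GPUDeviceMemoryLimit",
    "GPUDeviceCoreAllocated",
    "GPUDeviceMemoryAllocated",
    "GPUDeviceSharedNum",
    "vGPUPodsDeviceAllocated",
)


def fetch_metrics(scheduler_ip: str, port: int = 31993, timeout: float = 5.0) -> str:
    """Return the raw text served at http://{scheduler_ip}:{port}/metrics."""
    url = f"http://{scheduler_ip}:{port}/metrics"  # scheme is an assumption
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def allocation_lines(metrics_text: str) -> list:
    """Keep only sample lines whose metric name is in ALLOCATION_METRICS."""
    kept = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name in ALLOCATION_METRICS:
            kept.append(line)
    return kept
```

A typical call would be `allocation_lines(fetch_metrics("<scheduler node ip>"))`, which drops any other metrics the endpoint may also serve and leaves only the allocation samples.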

The endpoint exposes the following metrics:

| Metric | Description | Example |
| --- | --- | --- |
| GPUDeviceCoreLimit | Device core limit for a certain GPU | {deviceidx="0",deviceuuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",nodeid="aio-node67",zone="vGPU"} 100 |
| GPUDeviceMemoryLimit | Device memory limit for a certain GPU | {deviceidx="0",deviceuuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",nodeid="aio-node67",zone="vGPU"} 3.4359738368e+10 |
| GPUDeviceCoreAllocated | Device core allocated for a certain GPU | {deviceidx="0",deviceuuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",nodeid="aio-node67",zone="vGPU"} 45 |
| GPUDeviceMemoryAllocated | Device memory allocated for a certain GPU | {devicecores="0",deviceidx="0",deviceuuid="aio-node74-arm-Ascend310P-0",nodeid="aio-node74-arm",zone="vGPU"} 3.221225472e+09 |
| GPUDeviceSharedNum | Number of containers sharing this GPU | {deviceidx="0",deviceuuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",nodeid="aio-node67",zone="vGPU"} 1 |
| vGPUPodsDeviceAllocated | vGPU allocated from pods | {containeridx="Ascend310P",deviceusedcore="0",deviceuuid="aio-node74-arm-Ascend310P-0",nodename="aio-node74-arm",podname="ascend310p-pod",podnamespace="default",zone="vGPU"} 3.221225472e+09 |
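As an illustration of how these samples can be combined, the sketch below parses metric lines and divides GPUDeviceCoreAllocated by GPUDeviceCoreLimit for each device. It is a hedged example and not project code: `parse_samples` and `core_allocation_ratio` are hypothetical helpers, the sample text assumes each metric name directly precedes its label set, and the regular expressions are a simplification that does not handle escaped quotes inside label values.

```python
import re

# Two sample lines built from the examples in the table above.
SAMPLE = (
    'GPUDeviceCoreLimit{deviceidx="0",deviceuuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",'
    'nodeid="aio-node67",zone="vGPU"} 100\n'
    'GPUDeviceCoreAllocated{deviceidx="0",deviceuuid="GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec",'
    'nodeid="aio-node67",zone="vGPU"} 45\n'
)

# name, optional {labels}, value, optional integer timestamp
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified pattern cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def core_allocation_ratio(samples):
    """Map (nodeid, deviceuuid) to allocated cores divided by the core limit."""
    limit, allocated = {}, {}
    for name, labels, value in samples:
        key = (labels.get("nodeid"), labels.get("deviceuuid"))
        if name == "GPUDeviceCoreLimit":
            limit[key] = value
        elif name == "GPUDeviceCoreAllocated":
            allocated[key] = value
    # Devices with no allocation sample count as 0; zero limits are skipped.
    return {k: allocated.get(k, 0.0) / v for k, v in limit.items() if v > 0}


if __name__ == "__main__":
    for device, ratio in core_allocation_ratio(parse_samples(SAMPLE)).items():
        print(device, ratio)
```

Keying on both `nodeid` and `deviceuuid` is a design choice made here so that devices on different nodes stay distinct. The result describes what has been allocated, not what is being used.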

Note: These metrics give an overview of device allocation. They are NOT real-time device usage metrics. For those, see real-time device usage.