# Global Config

## Device Configs: ConfigMap
All the configurations listed below are managed within the hami-scheduler-device ConfigMap.
You can update these configurations using one of the following methods:
- Directly edit the ConfigMap: if HAMi has already been installed, manually update the `hami-scheduler-device` ConfigMap with the `kubectl edit` command:

  ```bash
  kubectl edit configmap hami-scheduler-device -n <namespace>
  ```

  After making changes, restart the related HAMi components to apply the updated configuration (see the restart sketch after this list).
- Modify the Helm chart: update the corresponding values in the chart, then reapply the Helm chart to regenerate the ConfigMap.
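For the restart step, a typical sequence looks like the following (a sketch assuming the default component names `hami-scheduler` and `hami-device-plugin`; verify the actual names in your cluster):

```bash
# Restart the scheduler and the device plugin so they pick up the new ConfigMap
kubectl -n <namespace> rollout restart deployment hami-scheduler
kubectl -n <namespace> rollout restart daemonset hami-device-plugin
```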
| Argument | Type | Description | Default |
|---|---|---|---|
| nvidia.deviceMemoryScaling | Float | The ratio for NVIDIA device memory scaling; can be greater than 1 (enables virtual device memory, experimental feature). For an NVIDIA GPU with M memory, if set to S, vGPUs split from this GPU get S * M memory in Kubernetes. | 1 |
| nvidia.deviceSplitCount | Integer | Maximum number of jobs assigned to a single GPU device. | 10 |
| nvidia.migstrategy | String | "none" to ignore MIG features, "mixed" to allocate MIG devices as separate resources. | "none" |
| nvidia.disablecorelimit | String | "true" to disable the core limit, "false" to enable it. | "false" |
| nvidia.defaultMem | Integer | The default device memory of the current job, in MB. 0 means using 100% of the device memory. | 0 |
| nvidia.defaultCores | Integer | Percentage of GPU cores reserved for the current job. 0 allows any GPU with enough memory; 100 reserves the entire GPU exclusively. | 0 |
| nvidia.defaultGPUNum | Integer | Default number of GPUs. If set to 0, it is filtered out. If nvidia.com/gpu is not set in the pod resources, the webhook checks nvidia.com/gpumem, resource-mem-percentage, and nvidia.com/gpucores, and adds nvidia.com/gpu with this default value if any of them is set. | 1 |
| nvidia.resourceCountName | String | vGPU number resource name. | "nvidia.com/gpu" |
| nvidia.resourceMemoryName | String | vGPU memory size resource name. | "nvidia.com/gpumem" |
| nvidia.resourceMemoryPercentageName | String | vGPU memory fraction resource name. | "nvidia.com/gpumem-percentage" |
| nvidia.resourceCoreName | String | vGPU core resource name. | "nvidia.com/cores" |
| nvidia.resourcePriorityName | String | vGPU job priority name. | "nvidia.com/priority" |
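For orientation, the table above maps onto the ConfigMap roughly as follows (a minimal sketch, not a verbatim dump: the `device-config.yaml` key, the namespace, and the exact nesting are assumptions; check the ConfigMap rendered by your chart):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hami-scheduler-device
  namespace: kube-system            # assumption: adjust to your install namespace
data:
  device-config.yaml: |             # assumed key name
    nvidia:
      deviceMemoryScaling: 1        # 1 = no overcommit; >1 enables virtual device memory
      deviceSplitCount: 10          # at most 10 jobs per physical GPU
      migstrategy: "none"
      disablecorelimit: "false"
      defaultMem: 0                 # 0 = 100% of device memory
      defaultCores: 0               # 0 = any GPU with enough memory qualifies
      defaultGPUNum: 1
      resourceCountName: "nvidia.com/gpu"
      resourceMemoryName: "nvidia.com/gpumem"
      resourceMemoryPercentageName: "nvidia.com/gpumem-percentage"
      resourceCoreName: "nvidia.com/cores"
      resourcePriorityName: "nvidia.com/priority"
```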
## Node Configs: ConfigMap
HAMi allows configuring per-node behavior of the device plugin. Edit the plugin ConfigMap (a full sketch follows the list below):

```bash
kubectl -n <namespace> edit cm hami-device-plugin
```

- `name`: name of the node.
- `operatingmode`: operating mode of the node, either "hami-core" or "mig" (default: "hami-core").
- `devicememoryscaling`: overcommit ratio of device memory.
- `devicecorescaling`: overcommit ratio of device cores.
- `devicesplitcount`: allowed number of tasks sharing a device.
- `filterdevices`: devices that are not registered to HAMi:
  - `uuid`: UUIDs of devices to ignore.
  - `index`: indexes of devices to ignore.
  - A device is ignored by HAMi if it appears in either the `uuid` or the `index` list.
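Put together, a per-node entry might look like this (a minimal sketch: the `config.json` key name and the `nodeconfig` wrapper are assumptions based on a typical HAMi deployment; the field names come from the list above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hami-device-plugin
  namespace: kube-system            # assumption: adjust to your install namespace
data:
  config.json: |                    # assumed key name
    {
      "nodeconfig": [
        {
          "name": "gpu-node-1",
          "operatingmode": "hami-core",
          "devicememoryscaling": 1.5,
          "devicecorescaling": 1.0,
          "devicesplitcount": 10,
          "filterdevices": {
            "uuid": ["GPU-AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE"],
            "index": [0]
          }
        }
      ]
    }
```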
## Chart Configs: arguments
You can customize your vGPU support by setting the following arguments with the `--set` flag, for example:

```bash
helm install hami hami-charts/hami --set devicePlugin.deviceMemoryScaling=5 ...
```
| Argument | Type | Description | Default |
|---|---|---|---|
| devicePlugin.service.schedulerPort | Integer | Scheduler webhook service nodePort. | 31998 |
| scheduler.defaultSchedulerPolicy.nodeSchedulerPolicy | String | GPU node scheduling policy: "binpack" allocates jobs to the same GPU node as much as possible; "spread" allocates jobs to different GPU nodes as much as possible. | "binpack" |
| scheduler.defaultSchedulerPolicy.gpuSchedulerPolicy | String | GPU scheduling policy: "binpack" allocates jobs to the same GPU as much as possible; "spread" allocates jobs to different GPUs as much as possible. | "spread" |
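Since `--set` flags map one-to-one onto Helm values, the same configuration can be kept in a values file (an illustrative sketch using only the keys from the table above):

```yaml
# values.yaml, applied with: helm install hami hami-charts/hami -f values.yaml
devicePlugin:
  service:
    schedulerPort: 31998            # scheduler webhook nodePort
scheduler:
  defaultSchedulerPolicy:
    nodeSchedulerPolicy: "binpack"  # pack jobs onto as few GPU nodes as possible
    gpuSchedulerPolicy: "spread"    # spread jobs across the GPUs of a node
```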
## Pod configs: annotations
| Argument | Type | Description | Example |
|---|---|---|---|
| nvidia.com/use-gpuuuid | String | If set, devices allocated to this pod must have one of the UUIDs listed in this string. | "GPU-AAA,GPU-BBB" |
| nvidia.com/nouse-gpuuuid | String | If set, devices allocated to this pod must NOT have any of the UUIDs listed in this string. | "GPU-AAA,GPU-BBB" |
| nvidia.com/nouse-gputype | String | If set, devices allocated to this pod must NOT be of any type listed in this string. | "Tesla V100-PCIE-32GB, NVIDIA A10" |
| nvidia.com/use-gputype | String | If set, devices allocated to this pod MUST be of one of the types listed in this string. | "Tesla V100-PCIE-32GB, NVIDIA A10" |
| hami.io/node-scheduler-policy | String | GPU node scheduling policy: "binpack" schedules the pod onto already-used GPU nodes; "spread" schedules the pod onto different GPU nodes. | "binpack" or "spread" |
| hami.io/gpu-scheduler-policy | String | GPU scheduling policy: "binpack" schedules the pod onto the same GPU card; "spread" schedules the pod onto different GPU cards. | "binpack" or "spread" |
| nvidia.com/vgpu-mode | String | The type of vGPU instance this pod wishes to use. | "hami-core" or "mig" |
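As an illustration, a pod that restricts itself to specific card types and packs onto a single card could combine these annotations with the vGPU resource names defined earlier (the image and resource amounts are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    nvidia.com/use-gputype: "Tesla V100-PCIE-32GB,NVIDIA A10"  # only these card types
    hami.io/gpu-scheduler-policy: "binpack"                    # prefer an already-used card
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04               # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1          # one vGPU
          nvidia.com/gpumem: 3000    # 3000 MB of device memory
```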
## Container configs: env
| Argument | Type | Description | Default |
|---|---|---|---|
| GPU_CORE_UTILIZATION_POLICY | String | Defines the GPU core utilization policy: "default" applies the standard utilization limit; "force" always keeps core utilization below nvidia.com/gpucores; "disable" ignores the utilization limit while the task runs. | "default" |
| CUDA_DISABLE_CONTROL | Boolean | If "true", HAMi-core will not be used inside the container, resulting in no resource isolation or limitation (for debugging purposes). | false |
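For example, a debugging container might opt out of the limits like this (a minimal sketch; only the two variables from the table are set, and the image is a placeholder):

```yaml
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04  # placeholder image
      env:
        - name: GPU_CORE_UTILIZATION_POLICY
          value: "disable"    # ignore the core utilization limit in this container
        - name: CUDA_DISABLE_CONTROL
          value: "true"       # debugging only: disables HAMi-core isolation entirely
```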