Version: v1.3.0

Enable dynamic-mig feature

Introduction

HAMi now supports dynamic-mig, using mig-parted to adjust MIG devices dynamically. This includes:

Dynamic MIG instance management: Users don't need to operate on the GPU node with 'nvidia-smi -i 0 -mig 1' or other commands to manage MIG instances; HAMi-device-plugin handles all of this.

Dynamic MIG adjustment: Each MIG device managed by HAMi dynamically adjusts its MIG template according to the tasks submitted, when necessary.

Device MIG observation: Each MIG instance generated by HAMi is shown in scheduler-monitor, including task information, so users get a clear overview of MIG nodes.

Compatible with HAMi-core nodes: HAMi can manage a unified GPU pool of HAMi-core nodes and MIG nodes. A task can be scheduled to either kind of node unless one is specified manually with the nvidia.com/vgpu-mode annotation.

Unified API with HAMi-core: No changes are needed to make a job compatible with the dynamic-mig feature.

Prerequisites

  • NVIDIA Blackwell, Hopper™, or Ampere devices
  • HAMi > v2.5.0
  • nvidia-container-toolkit

Enabling Dynamic-mig Support

  • Install the chart using Helm; see the 'Enabling vGPU support in Kubernetes' section here

  • Set the operating mode to mig for MIG nodes in the device-plugin ConfigMap

kubectl describe cm hami-device-plugin -n kube-system
{
  "nodeconfig": [
    {
      "name": "MIG-NODE-A",
      "operatingmode": "mig",
      "filterdevices": {
        "uuid": [],
        "index": []
      }
    }
  ]
}
  • Restart the following pods for the change to take effect:
    • hami-scheduler
    • hami-device-plugin on 'MIG-NODE-A'
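The nodeconfig edit in the steps above can be sketched in a few lines of Python. This is a minimal illustration that only manipulates the JSON text; the helper name set_mig_mode is hypothetical, and the starting value "hami-core" for operatingmode is an assumption, not something this page specifies. Applying the result to the cluster is still done with your usual ConfigMap workflow.

```python
import json


def set_mig_mode(config_text: str, node_name: str) -> str:
    """Return the config JSON with `node_name` switched to operatingmode "mig".

    Hypothetical helper: it edits the JSON text only and does not talk to
    the cluster.
    """
    config = json.loads(config_text)
    nodes = config.setdefault("nodeconfig", [])
    for node in nodes:
        if node.get("name") == node_name:
            node["operatingmode"] = "mig"
            break
    else:
        # Node not listed yet: add an entry with empty device filters,
        # mirroring the structure shown in the listing above.
        nodes.append({
            "name": node_name,
            "operatingmode": "mig",
            "filterdevices": {"uuid": [], "index": []},
        })
    return json.dumps(config, indent=2)


# Assumed starting point: the node is listed with a non-MIG mode.
example = json.dumps({
    "nodeconfig": [{
        "name": "MIG-NODE-A",
        "operatingmode": "hami-core",  # assumption, for illustration only
        "filterdevices": {"uuid": [], "index": []},
    }]
})
updated = json.loads(set_mig_mode(example, "MIG-NODE-A"))
added = json.loads(set_mig_mode('{"nodeconfig": []}', "MIG-NODE-B"))
```

The function is idempotent: running it again on a node that is already in mig mode leaves the configuration unchanged.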

Custom mig configuration (Optional)

HAMi ships with a built-in MIG configuration.

You can customize the MIG configuration by following the steps below:

Edit 'device-configmap.yaml' in charts/hami/templates/scheduler as follows:

  nvidia:
    resourceCountName: {{ .Values.resourceName }}
    resourceMemoryName: {{ .Values.resourceMem }}
    resourceMemoryPercentageName: {{ .Values.resourceMemPercentage }}
    resourceCoreName: {{ .Values.resourceCores }}
    resourcePriorityName: {{ .Values.resourcePriority }}
    overwriteEnv: false
    defaultMemory: 0
    defaultCores: 0
    defaultGPUNum: 1
    deviceSplitCount: {{ .Values.devicePlugin.deviceSplitCount }}
    deviceMemoryScaling: {{ .Values.devicePlugin.deviceMemoryScaling }}
    deviceCoreScaling: {{ .Values.devicePlugin.deviceCoreScaling }}
    knownMigGeometries:
    - models: [ "A30" ]
      allowedGeometries:
        -
          - name: 1g.6gb
            memory: 6144
            count: 4
        -
          - name: 2g.12gb
            memory: 12288
            count: 2
        -
          - name: 4g.24gb
            memory: 24576
            count: 1
    - models: [ "A100-SXM4-40GB", "A100-40GB-PCIe", "A100-PCIE-40GB" ]
      allowedGeometries:
        -
          - name: 1g.5gb
            memory: 5120
            count: 7
        -
          - name: 2g.10gb
            memory: 10240
            count: 3
          - name: 1g.5gb
            memory: 5120
            count: 1
        -
          - name: 3g.20gb
            memory: 20480
            count: 2
        -
          - name: 7g.40gb
            memory: 40960
            count: 1
    - models: [ "A100-SXM4-80GB", "A100-80GB-PCIe", "A100-PCIE-80GB" ]
      allowedGeometries:
        -
          - name: 1g.10gb
            memory: 10240
            count: 7
        -
          - name: 2g.20gb
            memory: 20480
            count: 3
          - name: 1g.10gb
            memory: 10240
            count: 1
        -
          - name: 3g.40gb
            memory: 40960
            count: 2
        -
          - name: 7g.79gb
            memory: 80896
            count: 1

Note: Helm installations and upgrades use the configuration in this file, which overrides the built-in configuration.

Note: HAMi finds and uses the first MIG template that suits the task, following the order of the entries in this ConfigMap.
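To make the ordering rule concrete, here is a simplified first-fit model in Python. It is a sketch under stated assumptions, not HAMi's actual implementation: it looks at a single requested instance, ignores instances that are already allocated on the GPU, and the function name first_fit is hypothetical. The geometry data is copied from the A100 40GB entry of the configuration.

```python
from typing import Optional, Tuple

# Geometries for the A100 40GB models, in the same order as the ConfigMap.
A100_40GB_GEOMETRIES = [
    [{"name": "1g.5gb", "memory": 5120, "count": 7}],
    [{"name": "2g.10gb", "memory": 10240, "count": 3},
     {"name": "1g.5gb", "memory": 5120, "count": 1}],
    [{"name": "3g.20gb", "memory": 20480, "count": 2}],
    [{"name": "7g.40gb", "memory": 40960, "count": 1}],
]


def first_fit(geometries, requested_memory: int) -> Optional[Tuple[int, str]]:
    """Return (geometry index, profile name) of the first geometry that
    contains a profile with at least `requested_memory` of device memory.

    Simplified model: one instance, empty GPU, memory as the only criterion.
    """
    for index, geometry in enumerate(geometries):
        for profile in geometry:
            if profile["memory"] >= requested_memory:
                return index, profile["name"]
    return None  # no configured geometry can satisfy the request


small = first_fit(A100_40GB_GEOMETRIES, 4000)
medium = first_fit(A100_40GB_GEOMETRIES, 8000)
too_big = first_fit(A100_40GB_GEOMETRIES, 50000)
```

Under this model, order matters: if the 7g.40gb geometry were listed first, even a small request would match it and take the whole GPU. Listing the finest-grained geometries first keeps small tasks on small instances.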

Running MIG jobs

A container can now request a MIG instance the same way as with hami-core, simply by specifying the nvidia.com/gpu and nvidia.com/gpumem resource types.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    nvidia.com/vgpu-mode: "mig" # (Optional) If not set, this pod can be assigned to a MIG instance or a hami-core instance
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: 2
          nvidia.com/gpumem: 8000

In the example above, the task allocates two MIG instances, each with at least 8G of device memory.
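If you generate such manifests programmatically, the same pod can be built as a plain dictionary and serialized to JSON. The following is a minimal sketch; the helper name mig_pod and its parameters are hypothetical, and the resource values are written as strings, which Kubernetes accepts for quantities.

```python
import json


def mig_pod(name: str, gpus: int, gpumem: int, force_mig: bool = True) -> dict:
    """Build a pod manifest equivalent to the example above.

    Hypothetical helper. With force_mig=False the vgpu-mode annotation is
    omitted, so the pod may land on a MIG instance or a hami-core instance.
    """
    metadata = {"name": name}
    if force_mig:
        metadata["annotations"] = {"nvidia.com/vgpu-mode": "mig"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": metadata,
        "spec": {
            "containers": [{
                "name": "ubuntu-container",
                "image": "ubuntu:18.04",
                "command": ["bash", "-c", "sleep 86400"],
                "resources": {"limits": {
                    "nvidia.com/gpu": str(gpus),
                    "nvidia.com/gpumem": str(gpumem),
                }},
            }],
        },
    }


manifest = mig_pod("gpu-pod", gpus=2, gpumem=8000)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file or piped into kubectl apply -f -, since kubectl accepts JSON manifests as well as YAML.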

Monitor MIG Instance

MIG instances managed by HAMi are displayed in the scheduler monitor (scheduler node IP:31993/metrics), as follows:

# HELP nodeGPUMigInstance GPU Sharing mode. 0 for hami-core, 1 for mig, 2 for mps
# TYPE nodeGPUMigInstance gauge
nodeGPUMigInstance{deviceidx="0",deviceuuid="GPU-936619fc-f6a1-74a8-0bc6-ecf6b3269313",migname="3g.20gb-0",nodeid="aio-node15",zone="vGPU"} 1
nodeGPUMigInstance{deviceidx="0",deviceuuid="GPU-936619fc-f6a1-74a8-0bc6-ecf6b3269313",migname="3g.20gb-1",nodeid="aio-node15",zone="vGPU"} 0
nodeGPUMigInstance{deviceidx="1",deviceuuid="GPU-30f90f49-43ab-0a78-bf5c-93ed41ef2da2",migname="3g.20gb-0",nodeid="aio-node15",zone="vGPU"} 1
nodeGPUMigInstance{deviceidx="1",deviceuuid="GPU-30f90f49-43ab-0a78-bf5c-93ed41ef2da2",migname="3g.20gb-1",nodeid="aio-node15",zone="vGPU"} 1
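The metric lines above are easy to post-process. Below is a minimal Python sketch that parses nodeGPUMigInstance samples and counts, per GPU index, the instances whose value is 1. The interpretation of value 1 as "instance in use" is an assumption based on the sample output, not something stated on this page, and the function names are hypothetical. The sample text is embedded so the sketch runs without a cluster.

```python
import re
from collections import defaultdict

SAMPLE = '''\
# HELP nodeGPUMigInstance GPU Sharing mode. 0 for hami-core, 1 for mig, 2 for mps
# TYPE nodeGPUMigInstance gauge
nodeGPUMigInstance{deviceidx="0",deviceuuid="GPU-936619fc-f6a1-74a8-0bc6-ecf6b3269313",migname="3g.20gb-0",nodeid="aio-node15",zone="vGPU"} 1
nodeGPUMigInstance{deviceidx="0",deviceuuid="GPU-936619fc-f6a1-74a8-0bc6-ecf6b3269313",migname="3g.20gb-1",nodeid="aio-node15",zone="vGPU"} 0
nodeGPUMigInstance{deviceidx="1",deviceuuid="GPU-30f90f49-43ab-0a78-bf5c-93ed41ef2da2",migname="3g.20gb-0",nodeid="aio-node15",zone="vGPU"} 1
nodeGPUMigInstance{deviceidx="1",deviceuuid="GPU-30f90f49-43ab-0a78-bf5c-93ed41ef2da2",migname="3g.20gb-1",nodeid="aio-node15",zone="vGPU"} 1
'''

LINE_RE = re.compile(r'^nodeGPUMigInstance\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_mig_instances(text: str) -> list:
    """Return one dict per nodeGPUMigInstance sample (labels plus 'value')."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skips HELP/TYPE comments and other metrics
        row = dict(LABEL_RE.findall(match.group("labels")))
        row["value"] = float(match.group("value"))
        rows.append(row)
    return rows


def count_value_one_per_gpu(rows: list) -> dict:
    """Count samples with value 1 per deviceidx (assumed to mean 'in use')."""
    counts = defaultdict(int)
    for row in rows:
        if row["value"] == 1:
            counts[row["deviceidx"]] += 1
    return dict(counts)


rows = parse_mig_instances(SAMPLE)
per_gpu = count_value_one_per_gpu(rows)
```

In a real setup, the text would come from an HTTP GET against the scheduler monitor endpoint instead of the embedded sample.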

Notes

  1. You don't need to do anything on the MIG node; everything is managed by mig-parted in hami-device-plugin.

  2. NVIDIA devices older than the Ampere architecture can't use 'mig' mode.

  3. You won't see any MIG resources (e.g., nvidia.com/mig-1g.10gb) on the node; HAMi uses a unified resource name for both 'mig' and 'hami-core' nodes.