Enable Cambricon MLU Sharing
HAMi now supports Cambricon MLU devices (cambricon.com/mlu) and implements most of the device-sharing features available for NVIDIA GPUs, including:
- MLU Sharing: Tasks can request a fraction of an MLU instead of an entire MLU card. This enables multiple tasks to share the same MLU device.
- Device Memory Control: You can allocate MLUs with a specified memory size, with guaranteed enforcement to ensure usage does not exceed the requested limit.
- Device Core Control: MLUs can be assigned a specific number of compute cores, and enforcement ensures core usage remains within bounds.
- MLU Type Selection: You can use annotations to specify which MLU types a task must use or must avoid by setting cambricon.com/use-mlutype or cambricon.com/nouse-mlutype.
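As a rough illustration of the type-selection annotations, the sketch below writes a Pod manifest that asks for one MLU type. The annotation key comes from the list above. The Pod name `mlu-type-demo`, the file name, and the value `MLU370-X4` are placeholders: the exact type strings accepted by the scheduler are an assumption and should be checked against your cluster.

```shell
# Write an illustrative Pod manifest that restricts the task to one MLU type.
# "MLU370-X4" is a placeholder value; the accepted type strings are an assumption.
cat > mlu-type-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: mlu-type-demo
  annotations:
    cambricon.com/use-mlutype: "MLU370-X4"
spec:
  containers:
    - name: c-1
      image: ubuntu:18.04
      command: ["sleep", "100000"]
      resources:
        limits:
          cambricon.com/vmlu: "1"
EOF

# Apply it once MLU sharing is enabled on the cluster:
# kubectl apply -f mlu-type-pod.yaml
```

To keep a task away from a given type instead, the same manifest would carry cambricon.com/nouse-mlutype in place of cambricon.com/use-mlutype.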
Prerequisites
- neuware-mlu370-driver > 5.10
- cntoolkit > 2.5.3
Enabling MLU Sharing
- Install HAMi via Helm
  Follow the instructions under the Enabling vGPU Support in Kubernetes section in the HAMi README.
- Enable SMLU mode on each MLU device
    cnmon set -c 0 -smlu on
    cnmon set -c 1 -smlu on
    # Repeat for all devices...
- Deploy the Cambricon device plugin
  Get the cambricon-device-plugin from your device provider, and configure it with the following parameters:
  - mode=dynamic-smlu: Enables dynamic SMLU support.
  - min-dsmlu-unit=256: Sets the minimum allocatable memory unit to 256 MB.
  Refer to your provider’s documentation for additional details.
- Apply the configured plugin
    kubectl apply -f cambricon-device-plugin-daemonset.yaml
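The per-device cnmon commands in the SMLU step can be scripted rather than typed one by one. The following is a minimal sketch, not an official tool: it assumes card indices run contiguously from 0, and that the operator supplies the card count. The function name `enable_smlu` and the `MLU_COUNT` and `DRY_RUN` variables are hypothetical. By default it only prints the commands.

```shell
# Hypothetical helper: issue "cnmon set -c <index> -smlu on" for card indices 0..count-1.
# Assumes contiguous indices starting at 0; the card count is not detected automatically.
enable_smlu() {
  count="$1"
  i=0
  while [ "$i" -lt "$count" ]; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run (the default): print the command instead of executing it.
      echo "cnmon set -c $i -smlu on"
    else
      cnmon set -c "$i" -smlu on
    fi
    i=$((i + 1))
  done
}

# Dry run for a node with MLU_COUNT cards (2 if unset).
enable_smlu "${MLU_COUNT:-2}"
```

Setting DRY_RUN=0 on a node where cnmon is installed would execute the commands instead of printing them.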
Running MLU Jobs
To request shared MLU resources in a container, use the following resource types:
- cambricon.com/vmlu
- cambricon.com/mlu.smlu.vmemory
- cambricon.com/mlu.smlu.vcore
Here is a YAML example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: binpack-1
  template:
    metadata:
      labels:
        app: binpack-1
    spec:
      containers:
        - name: c-1
          image: ubuntu:18.04
          command: ["sleep"]
          args: ["100000"]
          resources:
            limits:
              cambricon.com/vmlu: "1"
              cambricon.com/mlu.smlu.vmemory: "20"
              cambricon.com/mlu.smlu.vcore: "10"
- Init containers are not supported for MLU sharing.
  Pods with the cambricon.com/mlumem resource specified in an init container will not be scheduled.
- Resource constraints only apply to shared mode (vmlu=1).
  The cambricon.com/mlu.smlu.vmemory and cambricon.com/mlu.smlu.vcore resources are only effective when cambricon.com/vmlu is set to 1. If vmlu > 1, a full MLU device will be allocated regardless of vmemory and vcore values.
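The shared-mode rule above can be restated as a small decision function. This is only a sketch of the documented behavior for reasoning about a request before submitting it. `describe_mlu_request` is a hypothetical helper, not part of HAMi or the device plugin, and treating values below 1 as invalid is an assumption.

```shell
# Hypothetical helper that restates the note above; it is not part of HAMi.
# Usage: describe_mlu_request <vmlu> <vmemory> <vcore>
describe_mlu_request() {
  vmlu="$1"
  vmemory="$2"
  vcore="$3"
  if [ "$vmlu" -eq 1 ]; then
    # Shared mode: the vmemory and vcore limits take effect.
    echo "shared: vmemory=$vmemory vcore=$vcore"
  elif [ "$vmlu" -gt 1 ]; then
    # vmlu > 1: full device allocation; vmemory and vcore are ignored.
    echo "full-device: vmemory and vcore ignored"
  else
    # Assumption: a request below 1 is treated as invalid in this sketch.
    echo "invalid: vmlu must be at least 1" >&2
    return 1
  fi
}

describe_mlu_request 1 20 10
describe_mlu_request 2 20 10
```

With the values from the Deployment example (vmlu=1, vmemory=20, vcore=10), the function reports shared mode; raising vmlu to 2 reports full-device allocation with the other two values ignored.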