Version: Next

Enable cambricon MLU sharing

Introduction

We now support cambricon.com/mlu by implementing most device-sharing features as nvidia-GPU, including:

MLU sharing: Each task can allocate a portion of MLU instead of a whole MLU card, thus MLU can be shared among multiple tasks.

Device Memory Control: MLUs can be allocated with certain device memory size and guarantee it that it does not exceed the boundary.

Device Core Control: MLUs can be allocated with certain compute cores and guarantee it that it does not exceed the boundary.

MLU Type Specification: You can specify which type of MLU to use or to avoid for a certain task, by setting "cambricon.com/use-mlutype" or "cambricon.com/nouse-mlutype" annotations.

Prerequisites

neuware-mlu370-driver > 5.10
cntoolkit > 2.5.3

Install the chart using helm, See 'enabling vGPU support in kubernetes' section here
Activate the smlu mode for each MLUs on that node

cnmon set -c 0 -smlu on
cnmon set -c 1 -smlu on
# ...

Get cambricon-device-plugin from your device provider and specify the following parameters during deployment:

mode=dynamic-smlu, min-dsmlu-unit=256

These two parameters represent enabling the dynamic smlu function and setting the minimum allocable memory unit to 256 MB, respectively. You can refer to the document from device provider for more details

Deploy the cambricon-device-plugin you just specified

kubectl apply -f cambricon-device-plugin-daemonset.yaml

Running MLU jobs

Cambricon MLUs can now be requested by a container using the cambricon.com/vmlu ,cambricon.com/mlu.smlu.vmemory and cambricon.com/mlu.smlu.vcore resource type:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: binpack-1
  template:
    metadata:
      labels:
        app: binpack-1
    spec:
      containers:
        - name: c-1
          image: ubuntu:18.04
          command: ["sleep"]
          args: ["100000"]
          resources:
            limits:
              cambricon.com/vmlu: "1"
              cambricon.com/mlu.smlu.vmemory: "20"
              cambricon.com/mlu.smlu.vcore: "10"

Notes

Mlu-sharing in init container is not supported, pods with "combricon.com/mlumem" in init container will never be scheduled.
cambricon.com/mlu.smlu.vmemory, cambricon.com/mlu.smlu.vcore only work when cambricon.com/vmlu=1, otherwise, whole MLUs are allocated when cambricon.com/vmlu>1 regardless of cambricon.com/mlu.smlu.vmemory and cambricon.com/mlu.smlu.vcore.

Introduction​

Prerequisites​

Enabling MLU-sharing Support​

Running MLU jobs​

Notes​

Introduction

Prerequisites

Enabling MLU-sharing Support

Running MLU jobs

Notes