Version: v2.8.0
Enable Kunlunxin VXPU
Introduction
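This component supports multiplexing Kunlunxin XPU devices (P800-OAM) and provides the following vGPU-like multiplexing features:
XPU sharing: each task can occupy only a portion of a device, so multiple tasks can share a single XPU.
Memory allocation limits: you can allocate an XPU by memory value (for example, 24576M), and the component ensures that a task does not exceed its allocated device memory.
Device UUID selection: you can use annotations to specify which XPU devices to use or to exclude.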
Prerequisites
- driver version >= 5.0.21.16
- xpu-container-toolkit >= xpu_container_1.0.2-1
- XPU device type: P800-OAM
Enable XPU sharing support
- Deploy [vxpu-device-plugin]
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vxpu-device-plugin
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "update", "watch", "patch"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vxpu-device-plugin
subjects:
  - kind: ServiceAccount
    name: vxpu-device-plugin
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: vxpu-device-plugin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vxpu-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/component: vxpu-device-plugin
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vxpu-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/component: vxpu-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: vxpu-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/component: vxpu-device-plugin
        hami.io/webhook: ignore
    spec:
      priorityClassName: "system-node-critical"
      serviceAccountName: vxpu-device-plugin
      containers:
        - image: projecthami/vxpu-device-plugin:v1.0.0
          name: device-plugin
          resources:
            requests:
              memory: 500Mi
              cpu: 500m
            limits:
              memory: 500Mi
              cpu: 500m
          command:
            - xpu-device-plugin
            - --memory-unit=MiB
            - --resource-name=kunlunxin.com/vxpu
            - -logtostderr
          securityContext:
            privileged: true
            capabilities:
              add: [ "ALL" ]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: xre
              mountPath: /usr/local/xpu
            - name: dev
              mountPath: /dev
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: KUBECONFIG
              value: /etc/kubernetes/kubelet.conf
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: xre
          hostPath:
            path: /usr/local/xpu
        - name: dev
          hostPath:
            path: /dev
      nodeSelector:
        xpu: "on"
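
Because the DaemonSet sets a nodeSelector of xpu: "on", its pods are scheduled only onto nodes that carry that label. As a minimal sketch, a labeled node's metadata would look roughly like the fragment below. The node name is hypothetical, and the fragment shows the expected state of the node rather than a manifest to apply.

```yaml
# Excerpt of a Node object as the DaemonSet's nodeSelector expects it.
# "xpu-node-1" is a hypothetical node name used only for illustration.
apiVersion: v1
kind: Node
metadata:
  name: xpu-node-1
  labels:
    xpu: "on"   # must match the nodeSelector in the DaemonSet (string "on", not a boolean)
```

The value is quoted because an unquoted on is read as a boolean by YAML 1.1 parsers, while label values must be strings.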
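Note
The default resource names are:
- kunlunxin.com/vxpu for the VXPU count
- kunlunxin.com/vxpu-memory for device memory allocation
You can customize these names using the parameters shown above (for example, --resource-name in the DaemonSet command).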
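
To illustrate how a workload might consume these resources, here is a minimal sketch of a Pod that requests one VXPU with a device memory limit. The pod name, container name, image, and command are placeholders. The memory value is assumed to be counted in MiB, following the --memory-unit=MiB flag in the DaemonSet, and the default resource names are assumed to be unchanged.

```yaml
# Hypothetical Pod requesting a share of one XPU.
apiVersion: v1
kind: Pod
metadata:
  name: vxpu-demo                 # placeholder name
spec:
  containers:
    - name: worker                # placeholder container name
      image: ubuntu:22.04         # placeholder image; substitute your XPU workload image
      command: ["sleep", "infinity"]
      resources:
        limits:
          kunlunxin.com/vxpu: 1             # number of VXPU devices requested
          kunlunxin.com/vxpu-memory: 24576  # device memory limit; assumed to be in MiB
```

Extended resources such as these take integer quantities and are set under limits. If you customized the resource names, use your own names in place of the defaults.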