Troubleshooting
- If you don't request vGPUs when using the device plugin with NVIDIA images, all the GPUs on the machine may be exposed inside your container.
- Currently, A100 MIG is supported only in "none" and "mixed" modes.
- Tasks with the "nodeName" field cannot be scheduled at the moment; please use "nodeSelector" instead.
- Only computing tasks are currently supported; video codec processing is not supported.
- The `device-plugin` env var was renamed from `NodeName` to `NODE_NAME`. If you use image version `v2.3.9`, `device-plugin` may fail to start. There are two ways to fix it:
  - Manually run `kubectl edit daemonset` and change the `device-plugin` env var from `NodeName` to `NODE_NAME`.
  - Upgrade to the latest version using Helm. The latest `device-plugin` image version is `v2.3.10`; run `helm upgrade hami hami/hami -n kube-system` and the problem will be fixed automatically.
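For the manual fix, the fragment below is a rough sketch of what the edited env entry in the `device-plugin` container might look like. It assumes the variable is populated from the pod's node name through the Downward API (`fieldRef`); that is an assumption about the chart, not something stated above, so keep whatever value source your daemonset already has and change only the `name`.

```yaml
# Sketch of the env section of the device-plugin container after the edit.
# Assumption: the value comes from the Downward API; only `name` changes.
env:
  - name: NODE_NAME          # previously: NodeName
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```

Once the edit is saved, the daemonset should roll out new pods with the renamed variable.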
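As an illustration of the first and third notes above, here is a minimal pod manifest sketch that requests a vGPU explicitly and pins the pod with `nodeSelector` rather than `nodeName`. The pod name, image, and node hostname are hypothetical placeholders, and `nvidia.com/gpu` is assumed to be the resource name your installation exposes; adjust all of them to your cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                            # hypothetical name
spec:
  # Use nodeSelector instead of spec.nodeName to target a node.
  nodeSelector:
    kubernetes.io/hostname: gpu-node-1     # hypothetical node hostname
  containers:
    - name: worker
      image: ubuntu:22.04                  # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1                # assumed resource name; requests one vGPU
```

Requesting the resource explicitly is what keeps the container from seeing every GPU on the machine.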