Troubleshooting
- If you don't request vGPUs when using the device plugin with NVIDIA images, all the GPUs on the machine may be exposed inside your container.
- Currently, A100 MIG is supported only in "none" and "mixed" modes.
- Tasks with the "nodeName" field cannot be scheduled at the moment; please use "nodeSelector" instead.
- Only computing tasks are currently supported; video codec processing is not supported.
- We changed the `device-plugin` env var name from `NodeName` to `NODE_NAME`. If you use image version `v2.3.9`, `device-plugin` may fail to start. There are two ways to fix it:
  - Manually run `kubectl edit daemonset` to rename the `device-plugin` env var from `NodeName` to `NODE_NAME`.
  - Upgrade to the latest version using helm. The latest `device-plugin` image version is `v2.3.10`; run `helm upgrade hami hami/hami -n kube-system` and the issue will be fixed automatically.
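To catch the first pitfall before deploying, you could scan a pod manifest for containers that set no vGPU limit. The sketch below works on a plain dict (for example, a parsed YAML manifest). It assumes the vGPU resource is named `nvidia.com/gpu`; the resource name is configurable, so treat this value as an assumption and adjust it to your deployment. The helper `containers_missing_vgpu` is hypothetical, not part of HAMi.

```python
# Assumption: "nvidia.com/gpu" is the vGPU resource name in your deployment.
# The name is configurable, so change this constant if yours differs.
VGPU_RESOURCE = "nvidia.com/gpu"


def containers_missing_vgpu(pod: dict, resource: str = VGPU_RESOURCE) -> list:
    """Return the names of containers in a pod manifest that set no vGPU limit.

    Hypothetical helper: such containers may see every GPU on the node.
    """
    missing = []
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if resource not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing
```

A pod whose only container sets `nvidia.com/gpu: 1` under `resources.limits` yields an empty list; any container listed in the result is a candidate for the "all GPUs exposed" problem.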
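A pod with `nodeName` set is bound directly to that node and bypasses the Kubernetes scheduler, which is why such tasks cannot be scheduled here. The sketch below rewrites a pod manifest dict so that `nodeName` becomes a `nodeSelector` on the well-known `kubernetes.io/hostname` label. It assumes each node's `kubernetes.io/hostname` label equals its node name; this is common but not guaranteed, so verify with `kubectl get nodes --show-labels`. The helper `node_name_to_selector` is hypothetical.

```python
import copy

# Assumption: every node's "kubernetes.io/hostname" label equals its node name.
# Verify this on your cluster before relying on the conversion.
HOSTNAME_LABEL = "kubernetes.io/hostname"


def node_name_to_selector(pod: dict) -> dict:
    """Return a copy of the pod manifest with spec.nodeName turned into a nodeSelector."""
    result = copy.deepcopy(pod)  # leave the caller's manifest untouched
    spec = result.setdefault("spec", {})
    node_name = spec.pop("nodeName", None)
    if node_name is not None:
        selector = spec.setdefault("nodeSelector", {})
        selector[HOSTNAME_LABEL] = node_name
    return result
```

Existing `nodeSelector` entries are preserved; only the hostname key is added, so the pod still goes through the scheduler.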
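If you prefer a scripted fix over `kubectl edit daemonset`, the rename can be applied to an exported DaemonSet manifest (for example, from `kubectl get daemonset <name> -n kube-system -o json`) and then re-applied with `kubectl apply -f`. The sketch below is a hypothetical helper, `rename_env_var`, that renames `NodeName` to `NODE_NAME` in every container of the pod template; the DaemonSet name is left as a placeholder because it depends on your installation.

```python
import copy

OLD_NAME = "NodeName"
NEW_NAME = "NODE_NAME"


def rename_env_var(daemonset: dict, old: str = OLD_NAME, new: str = NEW_NAME) -> dict:
    """Return a copy of a DaemonSet manifest with env var `old` renamed to `new`.

    Hypothetical helper: only the variable name changes; its value or
    valueFrom source is kept as-is.
    """
    result = copy.deepcopy(daemonset)  # do not mutate the input manifest
    containers = (
        result.get("spec", {}).get("template", {}).get("spec", {}).get("containers", [])
    )
    for container in containers:
        for env in container.get("env", []):
            if env.get("name") == old:
                env["name"] = new
    return result
```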