v2.9.0
Major features
- Add HAMi-core mode for Ascend devices, enabling user-space virtualization for fine-grained memory and compute sharing.
- Optimize HAMi-core performance and add the latest benchmark data for HAMi-core.
- HAMi-DRA for NVIDIA is ready for use.
- Sync Volcano vGPU Device Plugin with version 0.19 and add CDI support.
- Add HAMi skills for debugging and development workflows.
- Support module-pair allocation for Ascend 910C devices in SuperPod environments by (@ashergaga) in #1610
- Add support for Vast.ai devices by (@DSFans2014) in #1645
- Add Ascend `ResourceCoreName` and `Ascendxxx-core` resources to support hami-vnpu-core virtualization by (@ashergaga) and (@DSFans2014) in #1771 and #1804
- Support node filtering based on hami-vnpu-core annotations and multi-device requests with hami-vnpu-core enabled by (@ashergaga) in #1812 and #1837
Major bug fixes
- Fix initialization errors when using tensor parallelism on vLLM versions greater than 0.18.
- Fix schedulerName precedence checks by (@hoteye) in #1627
- Add nil checks to prevent leader election panics by (@haitwang-cloud) in #1603
- Fix panic on nil resource requests in scheduler scoring by (@yxxhero) in #1626
- Fix reversed binpack and spread scheduling policies for Iluvatar devices by (@qiangwei1983) in #1631
- Resolve cardinality explosion in `Device_memory_desc_of_container` by (@maishivamhoo123) in #1628
- Handle `GetMemoryInfo` `ERROR_NOT_SUPPORTED` for unified memory GPUs by (@jsl9208) in #1637
- Optimize nodelock scalability with exponential backoff and listers by (@maishivamhoo123) in #1663
- Fix readiness probes when replicas are greater than one by (@Shouren) in #1677
- Fix scheduler slot usage prediction and device type filtering by (@maishivamhoo123) in #1700
- Retain terminating pods in cache to prevent premature eviction by (@maishivamhoo123) in #1719
- Fix multi-container allocation when init containers are present by (@haitwang-cloud) in #1650
- Align kubelet allocation with scheduler annotations by (@xrwang8) in #1743
- Handle Linux kernel 6.17 handshake edge cases in NVIDIA health checks by (@maishivamhoo123) in #1810
- Fix MIG allocation failures in CDI mode by (@DSFans2014) in #1826
What's changed
Other changes
- Add `vGPUmonitor --metrics-bind-address` flag by (@dongjiang1989) in #1613
- Add Prometheus ServiceMonitor support in Helm charts and device plugins by (@dongjiang1989) in #1614 and #1633
- Check resource quota in webhook by (@DSFans2014) in #1605
- Add namespaceSelector and objectSelector configuration for the webhook Helm chart by (@haitwang-cloud) in #1653
- Align Prometheus metric and label names with best practices by (@MyoungHaSong) in #1644
- Optimize log verbosity and add unit tests by (@haitwang-cloud) in #1710
- Add local-deploy target for minikube and kind clusters by (@anandj91) in #1760
- Add `hami_vgpu_metrics_summarizer` and `k8s-debug-gpu-pod` skills by (@haitwang-cloud) in #1755 and #1654
- Add DeepCopy functions for `DeviceUsage` and nested types by (@Shouren) in #1818
- Add `enableGetPreferredAllocation` flag by (@DSFans2014) in #1824
- Add device type labels to metrics by (@xiyichan) in #1612
- Add `io.LimitReader` to scheduler routes to prevent denial-of-service risks by (@maishivamhoo123) in #1620
- Remove deprecated scheduler policy ConfigMap by (@haitwang-cloud) in #1651
- Update NVIDIA device plugin and NVIDIA container runtime modules by (@archlitchi) in #1731
- Upgrade Go to 1.26.2 and address related security issues by (@luohua13) and (@Shouren) in #1791 and #1772
- Disable host network for the device plugin by (@luohua13) in #1789
- Bump HAMi-DRA version to v0.2.0 by (@FouoF) in #1845
New contributors
- maishivamhoo123 (@maishivamhoo123)
- hoteye (@hoteye)
- jsl9208 (@jsl9208)
- ashergaga (@ashergaga)
- Atroxgod (@Atroxgod)
- MyoungHaSong (@MyoungHaSong)
- charford (@charford)
- jcustenborder (@jcustenborder)
- Nov11 (@Nov11)
- ilia-medvedev (@ilia-medvedev)
- Yonsun-w (@Yonsun-w)
- CFH2436 (@CFH2436)
- kenwoodjw (@kenwoodjw)
- anandj91 (@anandj91)
- ManishSharma1609 (@ManishSharma1609)
- maverick123123 (@maverick123123)
- almazkhalikov (@almazkhalikov)
- lin121291 (@lin121291)
- mesutoezdil (@mesutoezdil)
Committers: Contributors
- anandj91 (@anandj91)
- archlitchi (@archlitchi)
- ashergaga (@ashergaga)
- Atroxgod (@Atroxgod)
- CFH2436 (@CFH2436)
- charford (@charford)
- CoderTH (@CoderTH)
- dongjiang1989 (@dongjiang1989)
- DSFans2014 (@DSFans2014)
- FouoF (@FouoF)
- haitwang-cloud (@haitwang-cloud)
- hoteye (@hoteye)
- ilia-medvedev (@ilia-medvedev)
- jcustenborder (@jcustenborder)
- jsl9208 (@jsl9208)
- kenwoodjw (@kenwoodjw)
- lin121291 (@lin121291)
- luohua13 (@luohua13)
- maishivamhoo123 (@maishivamhoo123)
- ManishSharma1609 (@ManishSharma1609)
- maverick123123 (@maverick123123)
- mesutoezdil (@mesutoezdil)
- MyoungHaSong (@MyoungHaSong)
- Nov11 (@Nov11)
- peachest (@peachest)
- qiangwei1983 (@qiangwei1983)
- saiyam1814 (@saiyam1814)
- Shouren (@Shouren)
- wawa0210 (@wawa0210)
- xiyichan (@xiyichan)
- xrwang8 (@xrwang8)
- Yonsun-w (@Yonsun-w)
- yxxhero (@yxxhero)
Full Changelog: https://github.com/Project-HAMi/HAMi/compare/v2.8.0...v2.9.0









