Lab 6: Run vLLM on HAMi GPU Shares

Install HAMi on a GPU cluster and schedule vLLM inference services with GPU partitioning.
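
As a rough illustration of what scheduling a vLLM service on a GPU share can look like, the sketch below builds a Pod manifest that requests a slice of one GPU and prints it as JSON, which `kubectl apply -f -` accepts. It is a minimal sketch, not the lab's own manifest. It assumes HAMi's default NVIDIA resource names (`nvidia.com/gpu`, `nvidia.com/gpumem` in MiB, `nvidia.com/gpucores` as a percentage) and the public `vllm/vllm-openai` image. The function name, Pod name, model, and share sizes are hypothetical placeholders.

```python
import json


def vllm_pod_manifest(name, model, gpu_mem_mib, gpu_core_percent):
    """Build a Pod manifest (as a dict) that asks for a share of one GPU.

    Assumption: the cluster runs HAMi with its default resource names.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "vllm",
                    # Assumed image; pin a specific tag in practice.
                    "image": "vllm/vllm-openai:latest",
                    "args": ["--model", model],
                    "ports": [{"containerPort": 8000}],
                    "resources": {
                        "limits": {
                            # One (virtual) GPU device.
                            "nvidia.com/gpu": "1",
                            # Device memory cap for this container, in MiB.
                            "nvidia.com/gpumem": str(gpu_mem_mib),
                            # Share of GPU compute, as a percentage.
                            "nvidia.com/gpucores": str(gpu_core_percent),
                        }
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical values: a small model on an 8000 MiB, 50% compute share.
    manifest = vllm_pod_manifest("vllm-demo", "facebook/opt-125m", 8000, 50)
    print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as `vllm_pod.py`, piping its output into `kubectl apply -f -` would submit the Pod. Several such Pods with different share sizes could then land on the same physical GPU, provided their memory requests fit together.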