GPU Memory Snapshots: fast container cold boots (modal.com)
9 points by luiscape 4 months ago | 1 comment


Modal eng here.

We have been using the new CUDA Checkpoint API (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CH...) in combination with gVisor's checkpoint/restore API and our custom file system to greatly reduce container cold boot times. This is particularly impactful if you need to warm up GPUs before serving, for example if you are using torch.compile: on a cold boot restored from a snapshot, the torch.compile step is skipped entirely.
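
To make the "skip the warm-up on restore" idea concrete, here is a toy sketch in plain Python. It is only an analogy, not how the CUDA Checkpoint API or gVisor work: `Worker`, `checkpoint`, and `restore` are hypothetical names, the "snapshot" is just JSON of the object's state, and `warm_up` stands in for slow one-time work such as torch.compile.

```python
import json


class Worker:
    """Hypothetical stand-in for a container with an expensive warm-up."""

    def __init__(self):
        self.warm = False
        self.warmups_run = 0
        self.table = None

    def warm_up(self):
        # Stand-in for slow one-time work (e.g. compiling a model).
        self.table = [i * i for i in range(1000)]
        self.warm = True
        self.warmups_run += 1

    def handle(self, i):
        # Warm up lazily on the first request only.
        if not self.warm:
            self.warm_up()
        return self.table[i]


def checkpoint(worker):
    # Toy snapshot: serialize the already-warm state.
    return json.dumps(vars(worker))


def restore(blob):
    # Rebuild a worker from the snapshot without calling __init__ or warm_up.
    worker = Worker.__new__(Worker)
    worker.__dict__.update(json.loads(blob))
    return worker


# First boot pays for the warm-up, then the warm state is snapshotted.
first = Worker()
first.handle(3)
snapshot = checkpoint(first)

# A later "cold boot" restores the snapshot and serves immediately.
restored = restore(snapshot)
result = restored.handle(7)
```

After the restore, `restored.warmups_run` is still 1: the warm-up ran once before the snapshot and never again. The real system presumably captures GPU and process memory rather than a JSON blob, but the ordering (warm up, snapshot, restore, serve) is the point of the sketch.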



