Modal engineers detail their approach to truly serverless GPUs for AI inference, combining cloud buffers, a custom content-addressed filesystem, and CPU/GPU checkpoint/restore to scale replicas in tens of seconds rather than minutes.