
Not affiliated, but a happy modal.com user; it has very fast cold starts for the few demos I run with them.


The coding paradigms that Modal imposes make it very hard to develop for, in comparison to, say, Replicate or Runpod.


Founder of Modal here. We've spent a ton of time on this, including building our own distributed file system optimized for low-latency, high-throughput workloads. We don't use K8s or Docker and built our own custom infrastructure instead.

Cold starting containers quickly is a fascinating problem. We've come a long way, but there's still a lot more to do. For GPU-based inference, starting containers isn't enough – you also need to initialize the model on the GPU quickly. We are working on a long list of things that will bring cold start latency down even further.
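The usual way to keep that GPU initialization cost off the request path is to load model weights once per container lifetime and reuse them across requests, so only the first request (the cold start) pays the price. A minimal pure-Python sketch of that pattern — the names (`load_model`, `_MODEL_CACHE`) and the simulated load are illustrative, not Modal's actual API:

```python
import time

# Process-level cache: one entry per model, shared by all requests
# served by this container.
_MODEL_CACHE = {}

def load_model(name: str) -> dict:
    """Return the cached model, initializing it on first call (cold start)."""
    if name not in _MODEL_CACHE:
        # Stand-in for the expensive part: reading weights from storage
        # and moving them into GPU memory.
        time.sleep(0.05)
        _MODEL_CACHE[name] = {"name": name, "ready": True}
    return _MODEL_CACHE[name]

def predict(name: str, prompt: str) -> str:
    model = load_model(name)  # warm after the first request
    return f"{model['name']}: echo {prompt}"
```

The first `predict` call for a given model pays the simulated load; every later call hits the cache. Real systems layer more on top of this (weight snapshotting, lazy page-in of weights), but the cache-per-container structure is the common core.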


Is Modal a good solution for running fine-tuned LLMs and Whisper models? If the cold-start time is low we're more than willing to modify our code to use Modal's infra. Happy to follow up via email but didn't see one in your profile.





