Founder of Modal here. We've spent a ton of time on this, including building our own distributed file system optimized for low-latency, high-throughput workloads. We don't use K8s or Docker; we built our own custom infrastructure instead.
Cold starting containers quickly is a fascinating problem. We've come a long way, but there's still a lot more to do. For GPU-based inference, starting the container quickly isn't enough – you also need to get the model onto the GPU quickly. We're working on a long list of things that will bring cold-start latency down even further.
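For anyone curious what separating container start from model load looks like in user code, here's a minimal sketch using Modal's container lifecycle hooks, so the weights hit the GPU once per container rather than once per request. The app name, model size, and GPU type are illustrative assumptions, not a statement of what Modal does internally:

```python
import modal

app = modal.App("whisper-coldstart-sketch")  # hypothetical app name

# openai-whisper needs ffmpeg for audio decoding
image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg")
    .pip_install("openai-whisper")
)

@app.cls(gpu="A10G", image=image)  # GPU type is an illustrative choice
class WhisperTranscriber:
    @modal.enter()  # runs once per container start, not once per request
    def load_model(self):
        import whisper
        # Weights land on the GPU during cold start, so each
        # subsequent request only pays for inference.
        self.model = whisper.load_model("base", device="cuda")

    @modal.method()
    def transcribe(self, audio: bytes) -> str:
        import tempfile
        # Arguments are serialized over the wire, so we accept raw
        # bytes and write them to a local file for ffmpeg to decode.
        with tempfile.NamedTemporaryFile(suffix=".wav") as f:
            f.write(audio)
            f.flush()
            return self.model.transcribe(f.name)["text"]
```

Calling `WhisperTranscriber().transcribe.remote(audio_bytes)` reuses a warm container when one is available, so only the first request on a fresh container pays the model-load cost.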
Is Modal a good solution for running fine-tuned LLMs and Whisper models? If the cold-start time is low, we're more than willing to modify our code to use Modal's infra.
Happy to follow up via email but didn't see one in your profile.