Hello Hacker News! We're Yangqing, Xiang and JJ from lepton.ai. We are building a platform to run any AI model as easily as writing local code, and to get your favorite models running in minutes. It's like containers for AI, but without the hassle of actually building a Docker image.
We built and contributed to some of the world's most popular AI software - PyTorch 1.0, ONNX, Caffe, etcd, Kubernetes, etc. We also managed hundreds of thousands of computers in our previous jobs. Along the way we found that the AI software stack is usually unnecessarily complex - and we want to change that.
Imagine you are a developer who sees a good model on GitHub or HuggingFace. To turn it into a production-ready service, the current solution usually requires you to build a Docker image. But think about it - all you have is a few Python files and a few Python dependencies. Building a whole image for that sounds like huge overhead, right?
lepton.ai is a Pythonic way to free you from such difficulties. You write a simple Python scaffold around your PyTorch / TensorFlow code, and Lepton launches it as a full-fledged service callable from Python, JavaScript, or any language that understands OpenAPI. We use containers under the hood, but you don't need to worry about the infrastructure nuts and bolts.
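To make the "scaffold" idea concrete, here is a minimal sketch. The `Photon` class and `@Photon.handler` decorator come from the `leptonai` package; the toy `Echo` service (and the tiny fallback stub so the sketch stays readable and runnable even without `leptonai` installed) is our own illustration - in real use you'd load a PyTorch / TensorFlow model in `init` instead of the placeholder.

```python
try:
    from leptonai.photon import Photon
except ImportError:
    # Minimal stand-in so this sketch runs without leptonai installed.
    class Photon:
        handler = staticmethod(lambda fn: fn)


class Echo(Photon):
    # init() runs once at service startup -- load your PyTorch /
    # TensorFlow model here instead of this placeholder.
    def init(self):
        self.prefix = "echo: "

    # Each handler method becomes an HTTP endpoint in the
    # generated OpenAPI service.
    @Photon.handler
    def run(self, text: str) -> str:
        return self.prefix + text
```

The `lep photon` CLI can then serve a class like this locally or in the cloud, and clients call `run` through the generated OpenAPI spec.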
One of the biggest challenges in AI is that it's really "all-stack": in addition to a plethora of models, AI applications usually involve GPUs, cloud infra, web services, DevOps, and SysOps. We want you to focus on your job - and we take care of the rest of the "boring but essential" work.
We're really excited we get to show this to you all! Please let us know your thoughts and questions in the comments. Getting started takes two commands:
pip install -U leptonai
lep photon run -n sdxl -m hf:stabilityai/stable-diffusion-xl-base-1.0 --local
And you have a local OpenAPI server that runs it! Go to http://0.0.0.0:8080/docs, or use your favorite OpenAPI client.
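For instance, you can hit the local server from plain Python. The endpoint path (`/run`) and parameter name (`prompt`) below are assumptions for illustration only - the generated docs page shows the actual schema for the photon you launched.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint and parameter names -- check /docs for the real schema.
req = urllib.request.Request(
    "http://0.0.0.0:8080/run",
    data=json.dumps({"prompt": "an astronaut riding a horse"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=60) as resp:
        with open("astronaut.png", "wb") as f:
            f.write(resp.read())  # save whatever the endpoint returns
except urllib.error.URLError as err:
    print(f"server not reachable: {err}")
```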
We've been building AI API services with these tools ourselves. The easiest way to try out Lepton is to head to https://lepton.ai/playground and use our API service for popular models: Stable Diffusion, LLaMA, WhisperX, and other interesting showcases.
We are proud of our performance. For example, we have probably the fastest LLaMA 7B and 70B model APIs, and inference costs $0.80 per 1 million tokens - we believe it's the most affordable on the market. In addition, during the open beta phase, calling these services is free when you sign up for the Lepton AI platform.
Under the hood, we built a platform that lets you run things on the cloud with ease. For example, if you find Pygmalion to be a great conversation model but you don't have a GPU, use Lepton's Remote() capability to launch a service:
from leptonai import Remote
pygmalion = Remote("hf:PygmalionAI/pygmalion-2-7b", resource_shape="gpu.a10")
Wait a few minutes for the model to download and start, and you can then use it as if it were a standard Python function:
print(pygmalion.run(inputs="Once upon a time", max_new_tokens=128))
If you are interested in the operational details, the fully managed platform at https://dashboard.lepton.ai/ gives you fine-grained controls - and we also support BYOC (bring your own compute) if you are an enterprise that needs more autonomy over infrastructure.