
The hardware is primarily standard Nvidia GPUs (A100s, H100s), but the scale of the infrastructure is on another level entirely. These models currently need clusters of GPU-powered servers to make predictions fast enough, which explains why OpenAI partnered with Microsoft and received billions in funding to spend on compute.

You can run (much) smaller LLMs on consumer-grade GPUs though. A single Nvidia GPU with 8 GB of VRAM is enough to get started with models like Zephyr, Mistral or Llama 2 in their smallest versions (7B parameters). But it will be both slower and lower quality than anything OpenAI currently offers.
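To make that concrete, here is a minimal sketch of one common way to do it (not from the original comment): loading a 7B model with 4-bit quantization via Hugging Face transformers and bitsandbytes so the weights fit in roughly 8 GB of VRAM. The model ID and prompt are just examples; any similarly sized model works the same way.

    # Sketch: run a 7B model on a ~8 GB consumer GPU using 4-bit quantization.
    # Assumes `transformers`, `accelerate` and `bitsandbytes` are installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # example 7B model

    # 4-bit quantization shrinks the weights from ~14 GB (fp16) to roughly 4 GB.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # place layers on the GPU automatically
    )

    prompt = "Explain what a GPU does in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Quantization is what makes the 8 GB figure workable at all; at full fp16 precision a 7B model alone needs around 14 GB before you account for activations and KV cache.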



> But it will be both slower and lower quality than anything OpenAI currently offers.

It will definitely not be slower. Local inference with a 7B model on a 3090/4090 will outpace GPT-3.5 Turbo and smoke GPT-4 Turbo.
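If you want to sanity-check that claim on your own card rather than take it on faith, a rough tokens-per-second measurement is enough to compare against the API. This sketch reuses the `model` and `tokenizer` from the quantized-loading example above; the prompt and token count are arbitrary.

    # Rough throughput check for a local model: new tokens generated per second.
    import time

    prompt = "Write a short paragraph about GPUs."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=256)
    elapsed = time.perf_counter() - start

    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{new_tokens / elapsed:.1f} tokens/sec")

Note this only measures raw generation speed; it says nothing about output quality, which is the other half of the original comment's comparison.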



