
The hardware is primarily standard Nvidia GPUs (A100s, H100s), but the scale of the infrastructure is on another level entirely. These models currently need clusters of GPU-powered servers to make predictions fast enough, which explains why OpenAI partnered with Microsoft and received billions in funding to spend on compute.

You can run (much) smaller LLMs on consumer-grade GPUs though. A single Nvidia GPU with 8 GB of VRAM is enough to get started with models like Zephyr, Mistral or Llama 2 in their smallest versions (7B parameters). But it will be both slower and lower quality than anything OpenAI currently offers.
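To make that concrete, here is a minimal sketch of one common way to do it (not from the original comment): loading a 7B model with 4-bit quantization via Hugging Face transformers and bitsandbytes so the weights fit in roughly 8 GB of VRAM. The model ID and prompt are just examples; any similarly sized model works the same way.

    # Sketch: run a 7B model on a ~8 GB consumer GPU using 4-bit quantization.
    # Assumes `transformers`, `accelerate` and `bitsandbytes` are installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # example 7B model

    # 4-bit quantization shrinks the weights from ~14 GB (fp16) to roughly 4 GB.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # place layers on the GPU automatically
    )

    prompt = "Explain what a GPU does in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Quantization is what makes the 8 GB figure workable at all; at full fp16 precision a 7B model alone needs around 14 GB before you account for activations and KV cache.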



> But it will be both slower and lower quality than anything OpenAI currently offers.

It will definitely not be slower. Local inference with a 7B model on a 3090/4090 will outpace GPT-3.5 Turbo and smoke GPT-4 Turbo.
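If you want to sanity-check that claim on your own card rather than take it on faith, a rough tokens-per-second measurement is enough to compare against the API. This sketch reuses the `model` and `tokenizer` from the quantized-loading example above; the prompt and token count are arbitrary.

    # Rough throughput check for a local model: new tokens generated per second.
    import time

    prompt = "Write a short paragraph about GPUs."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=256)
    elapsed = time.perf_counter() - start

    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{new_tokens / elapsed:.1f} tokens/sec")

Note this only measures raw generation speed; it says nothing about output quality, which is the other half of the original comment's comparison.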



