The hardware is primarily standard Nvidia GPUs (A100s, H100s), but the scale of the infrastructure is on another level entirely. These models currently need clusters of GPU-powered servers to make predictions fast enough, which explains why OpenAI partnered with Microsoft and received billions in funding to spend on compute.
You can run (much) smaller LLMs on consumer-grade GPUs, though. A single Nvidia GPU with 8 GB of VRAM is enough to get started with models like Zephyr, Mistral, or Llama 2 in their smallest versions (7B parameters). But it will be both slower and lower quality than anything OpenAI currently offers.
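In practice, fitting a 7B model into 8 GB of VRAM requires quantized weights, since the full 16-bit weights alone take roughly 14 GB. Here is a minimal sketch of one way to do that, assuming Hugging Face `transformers` with 4-bit quantization via `bitsandbytes`; the specific model ID is just one example, and any similarly sized model loads the same way.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Example model ID (assumption): any 7B-class model works similarly.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit quantization shrinks the ~14 GB fp16 weights to roughly 4 GB,
# leaving headroom for activations on an 8 GB consumer GPU.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the GPU automatically
)

prompt = "Explain why GPUs are good at matrix math:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Quantization trades a bit of output quality for a much smaller memory footprint, which is part of why these local setups lag behind hosted models.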