Model API Performance
1 point by hpcaitech 9 days ago
We’ve been benchmarking a few models on our API platform and got some interesting performance numbers:

- MiniMax M2.5 → 0.118s time-to-first-token (TTFT), 103 tokens/sec
- GLM 5.1 → 120 tokens/sec throughput
- Kimi K2.5 → 0.643s TTFT, 69 tokens/sec
- All models → ~99.9% request success rate

The latency difference is especially noticeable: ~0.1s TTFT feels almost instant in interactive apps.

Let me know how you're evaluating LLM APIs. Are you optimizing more for latency, throughput, or cost?
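For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of how TTFT and post-first-token throughput can be computed from any streaming token iterator. The `fake_stream` generator is a hypothetical stand-in for a real streaming API response; the post doesn't describe its actual benchmarking harness, so the function names and parameters here are assumptions for illustration.

```python
import time
from typing import Iterable, Tuple


def measure_stream(tokens: Iterable[str]) -> Tuple[float, float]:
    """Consume a token stream and return (ttft_seconds, tokens_per_second).

    TTFT is the wall-clock delay until the first token arrives; throughput
    is measured over the tokens that follow the first one, so a slow first
    token doesn't distort the tokens/sec figure.
    """
    start = time.perf_counter()
    ttft = None
    first_token_time = None
    count = 0
    for _ in tokens:
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start
            first_token_time = now
        count += 1
    if ttft is None:
        raise ValueError("stream produced no tokens")
    elapsed = time.perf_counter() - first_token_time
    tps = (count - 1) / elapsed if elapsed > 0 else float("inf")
    return ttft, tps


def fake_stream(n=50, first_delay=0.1, gap=0.01):
    # Hypothetical simulated model stream; replace with a real API's
    # streaming iterator (e.g. iterating server-sent-event chunks).
    time.sleep(first_delay)
    yield "tok"
    for _ in range(n - 1):
        time.sleep(gap)
        yield "tok"


if __name__ == "__main__":
    ttft, tps = measure_stream(fake_stream())
    print(f"TTFT: {ttft:.3f}s, throughput: {tps:.1f} tok/s")
```

For real comparisons you'd run many requests per model and report percentiles rather than single numbers, since both TTFT and throughput vary with load.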


