
Layman's understanding:

Measured against hardware and electricity costs, a “cloud” GPU will be many times more efficient per output token. It isn’t loading/offloading models, and no part of the GPU sits waiting for input. Requests from many users get batched together, so the hardware stays close to fully saturated nearly all the time.
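
A rough back-of-the-envelope sketch of that argument, in Python. Every number here is a made-up placeholder, not a measurement, and the function name and parameters are my own. It also assumes the card draws full power the whole time, which overstates the electricity cost of an idle home GPU.

```python
def cost_per_million_tokens(hardware_cost_usd, lifetime_hours, power_watts,
                            electricity_usd_per_kwh, tokens_per_second,
                            utilization):
    """Amortized hardware + electricity cost (USD) per one million output tokens.

    utilization is the fraction of wall-clock time the GPU is actually
    generating tokens (0 < utilization <= 1).
    """
    # Hardware cost is spread evenly over the card's assumed lifetime.
    hardware_usd_per_hour = hardware_cost_usd / lifetime_hours
    # Simplification: full power draw is billed even while the card idles.
    power_usd_per_hour = (power_watts / 1000.0) * electricity_usd_per_kwh
    # Tokens are only produced during the busy fraction of each hour.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (hardware_usd_per_hour + power_usd_per_hour) / tokens_per_hour * 1_000_000


THREE_YEARS_HOURS = 3 * 365 * 24

# Hypothetical home setup: one user, GPU mostly idle, no batching.
local = cost_per_million_tokens(
    hardware_cost_usd=2_000, lifetime_hours=THREE_YEARS_HOURS,
    power_watts=350, electricity_usd_per_kwh=0.15,
    tokens_per_second=30, utilization=0.05,
)

# Hypothetical datacenter GPU: pricier card, but batched requests keep it busy
# and push aggregate throughput far higher.
cloud = cost_per_million_tokens(
    hardware_cost_usd=30_000, lifetime_hours=THREE_YEARS_HOURS,
    power_watts=700, electricity_usd_per_kwh=0.10,
    tokens_per_second=2_000, utilization=0.8,
)

print(f"local: ${local:.2f} per 1M tokens")
print(f"cloud: ${cloud:.2f} per 1M tokens")
```

The point is only the shape of the formula: the cost per token is divided by throughput times utilization, so a card that is busy most of the time and serves batched requests wins even if it costs far more up front.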


