Only if you'll settle for less than state of the art. The best models still tend... | Hacker News


fc417fc802 7 months ago | on: Compiling LLMs into a MegaKernel: A path to low-la...

Only if you'll settle for less than state of the art. The best models still tend to be some of the largest ones. Anything that overflows VRAM is going to slow down response time drastically. "Space heater" is determined by computational horsepower rather than available RAM. How big a context window do you want? Last I checked, context was very expensive in terms of RAM, and having a large window was highly desirable.
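To put rough numbers on the VRAM point, here is a minimal sketch. It assumes a standard decoder-only transformer whose KV cache stores one key and one value vector per layer, per key/value head, per token. The `ModelConfig` shape, the 70B example values, and the helper names are hypothetical, and the estimate ignores activations, framework overhead, and fragmentation, so real usage will be higher.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelConfig:
    """Hypothetical transformer shape; every value is an illustrative assumption."""
    n_params: float         # total parameter count
    bytes_per_param: float  # 2.0 for fp16/bf16 weights, roughly 0.5 for 4-bit quantized weights
    n_layers: int
    n_kv_heads: int         # key/value heads per layer
    head_dim: int
    bytes_per_kv: float = 2.0  # fp16 KV cache entries


def weight_bytes(cfg: ModelConfig) -> float:
    """Memory needed just to hold the weights."""
    return cfg.n_params * cfg.bytes_per_param


def kv_cache_bytes(cfg: ModelConfig, context_len: int, batch: int = 1) -> float:
    """KV cache: one K and one V vector per layer, per KV head, per token, per sequence."""
    per_token = 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_kv
    return per_token * context_len * batch


def fits_in_vram(cfg: ModelConfig, context_len: int, vram_gib: float) -> bool:
    """True if weights plus KV cache fit; ignores activations and runtime overhead."""
    total = weight_bytes(cfg) + kv_cache_bytes(cfg, context_len)
    return total <= vram_gib * GIB


if __name__ == "__main__":
    # Hypothetical 70B-parameter model stored at 16-bit precision.
    cfg = ModelConfig(n_params=70e9, bytes_per_param=2.0,
                      n_layers=80, n_kv_heads=8, head_dim=128)
    for ctx in (4_096, 32_768, 131_072):
        kv_gib = kv_cache_bytes(cfg, ctx) / GIB
        print(f"ctx={ctx:>7}: weights {weight_bytes(cfg) / GIB:.1f} GiB, "
              f"KV cache {kv_gib:.2f} GiB, fits in 24 GiB: {fits_in_vram(cfg, ctx, 24)}")
```

With this hypothetical shape, the weights alone come to roughly 130 GiB, far beyond a single 24 GiB card, while the KV cache grows linearly from 1.25 GiB at 4,096 tokens to 40 GiB at 131,072 tokens.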

otabdeveloper4 7 months ago

State of the art is achieved by finetuning. Increasing parameter counts is a dead end.

Large contexts are very important, but in terms of RAM they are cheap compared to the cost of increasing parameter count.
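The RAM comparison can be made concrete with a back-of-the-envelope calculation, assuming a decoder-only transformer with a per-layer key/value cache. The symbols are introduced here for illustration: $P$ is the parameter count, $b_w$ and $b_{kv}$ are bytes per weight and per cache entry, $L$ is the number of layers, $n_{kv}$ the number of key/value heads, $d_h$ the head dimension, and $T$ the context length in tokens. The numbers plugged in are a hypothetical 70B configuration, not any particular model.

$$
M_{\text{weights}} = P\, b_w, \qquad
M_{\text{KV}}(T) = 2\, L\, n_{kv}\, d_h\, b_{kv}\, T
$$

Setting $M_{\text{KV}}(T^{*}) = M_{\text{weights}}$ gives the context length at which the cache costs as much RAM as the weights:

$$
T^{*} = \frac{P\, b_w}{2\, L\, n_{kv}\, d_h\, b_{kv}}
= \frac{7\times10^{10}\cdot 2}{2\cdot 80\cdot 8\cdot 128\cdot 2}
= \frac{1.4\times10^{11}}{327\,680}
\approx 4.3\times10^{5}\ \text{tokens}
$$

Under these assumptions the cache only matches the weight memory at several hundred thousand tokens. With one key/value head per query head (say $n_{kv} = 64$), $T^{*}$ shrinks eightfold to roughly $5.3\times10^{4}$ tokens, so how cheap a long context is depends heavily on the attention layout.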
