Anything that overflows VRAM is going to slow down the response time drastically.
"Space heater" is determined by computational horsepower rather than available RAM.
How big a context window do you want? Last I checked that was very expensive in terms of RAM and having a large one was highly desirable.
Large contexts are very important, but they are cheap in terms of RAM compared to the cost of increasing the parameter count.
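For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes a standard transformer KV cache (one key and one value vector per layer per token) and fp16 storage. The model shape (32 layers, 8 KV heads, head dimension 128, 7 billion parameters) is a made-up example for illustration, not any specific model, and the function names are hypothetical.

```python
def weight_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Memory needed just to hold the weights (fp16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2,
                   batch: int = 1) -> int:
    """Memory for the KV cache at a given context length.

    The leading 2 accounts for storing both a key and a value
    vector per layer, per KV head, per token.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem * batch)


GIB = 2 ** 30

# Hypothetical model shape, chosen only to make the arithmetic concrete.
weights = weight_bytes(7_000_000_000)
cache_8k = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                          context_len=8192)

print(f"weights:        {weights / GIB:.2f} GiB")
print(f"KV cache @ 8k:  {cache_8k / GIB:.2f} GiB")
```

Under these assumptions the 8k-token cache comes to 1 GiB against roughly 13 GiB of weights, which is the sense in which context is "cheap". The cache grows linearly with context length and with the number of KV heads, though, so a model without grouped KV heads or a much longer window can push the cache toward the size of the weights themselves.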