But on the Ryzen, the VRAM can be allocated entirely dynamically. I saw a review showing excellent full GPU usage during inference with the BIOS VRAM allocation set to the minimum level, using a very large model. So it's not as simple as you describe (I used to think this was the case too).
Beyond that, in practice the 395 seems to smash the DGX Spark in inference speed for most models. I haven't seen NVFP4 comparisons yet and would be very interested to see them.
That's what I'm saying: in the review video I saw, they allocated as little memory as possible to the GPU in the BIOS, then relied on some kind of kernel-level dynamic control.
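If anyone wants to sanity-check the dynamic-allocation behaviour on Linux themselves, the amdgpu driver exposes both the fixed BIOS carve-out (VRAM) and the dynamically grown GTT pool via sysfs, so you can watch where a large model actually lands while inference runs. Rough sketch below (not from the review; it assumes the iGPU shows up as card0, adjust for your box):

    #!/usr/bin/env python3
    # Watch the fixed VRAM carve-out vs the dynamically allocated GTT pool
    # on an AMD APU while an inference job runs.
    # Assumes Linux with the amdgpu driver and the iGPU at card0.
    import time
    from pathlib import Path

    DEV = Path("/sys/class/drm/card0/device")  # assumption: iGPU is card0

    def read_mib(name: str) -> float:
        # amdgpu sysfs counters report bytes; convert to MiB
        return int((DEV / name).read_text()) / (1024 * 1024)

    while True:
        vram_used = read_mib("mem_info_vram_used")
        vram_total = read_mib("mem_info_vram_total")  # the BIOS carve-out
        gtt_used = read_mib("mem_info_gtt_used")
        gtt_total = read_mib("mem_info_gtt_total")    # dynamic system-RAM pool
        print(f"VRAM {vram_used:8.0f}/{vram_total:.0f} MiB | "
              f"GTT {gtt_used:8.0f}/{gtt_total:.0f} MiB")
        time.sleep(2)

If the GTT numbers climb to cover the model while the VRAM carve-out stays tiny, that matches what the review showed.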