Thanks, this is a good sign. I had a bad experience with the Jetson Nano, only to realize a few years later that it would be stuck on Ubuntu 20.04 forever. I don't want to repeat that experience, especially with a more expensive device.
Grayskull was made before LLMs were a thing. And their plan is, like Groq's, to distribute the compute graph across multiple processors to get higher effective memory bandwidth and throughput by pipelining. But it's better-ish, since having RAM on the cards means models fit on far fewer of them. Grayskull doesn't have this scale-out ability; the next generation, Wormhole, does, via 100GbE interfaces on the cards.
Also, the CPUs on Grayskull are 32-bit. Memory is addressed through the bank address, so it works for now, but they'll have to upgrade to 64-bit soon.
Not fully. 8 bits has only 256 values, so it's easy to keep a lookup table in the L1 cache of any CPU or the constant cache of any GPU. For ASICs and FPGAs, it's a simple 256-entry LUT. It's not ideal, yes, but not a deal breaker, especially considering LLMs are memory-bound. GGML dequantizes weights on the fly and still gets near-linear scaling on GPUs.
I've wanted to play with large VLIW processors (Itanium is a disaster we don't talk about, and there's little support for it anyway). It's almost my dream. I hope this is real!