See this blog post for a reference on Blackwell:
https://hazyresearch.stanford.edu/blog/2025-03-15-tk-blackwe...
If not, what's fundamentally difficult about doing 32 vs 256 here?