Hmm, but supposing the accelerated NVIDIA-specific inference data types were available in Triton, you would just use that, right? Why not contribute to Triton? They accept PRs. So what if contributing to Triton means doing free product ecosystem development for NVIDIA and giant corporations?
I mean, you can look at the most recent commit and see that the infrastructure for this is being built out right now (though of course OpenAI doesn't care about sm_120).
By all means, the guy could have written the Triton fixes he needs and NOT sent them upstream. That would still make more sense! He's obviously an expert, and I was sincerely wondering: why bother with the C++ stuff if he already knows the better way and also has the chops to implement it?