Unfortunately it's hardly progress. The expert models in this approach:

- are still large and still have to be trained in the usual, linear way;
- require classifying the training set upfront to partition it across experts;
- are completely independent, so each one has to relearn shared concepts from scratch;
- cap overall quality at that of the single dedicated expert handling a given query;
- scale only to low counts (e.g. 8, not thousands), since each expert still has to be loaded when used, and anything larger would restrict inference to a beefed-up cluster.
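
To make the "upfront classification" objection concrete, here's a minimal sketch of that routing pattern, with independently trained experts selected by a topic classifier before inference. All names here (classify_topic, load_expert, EXPERT_PATHS) are hypothetical, illustration only, not any actual library's API:

    # Routing a prompt to one of a few independently trained experts.
    from functools import lru_cache

    EXPERT_PATHS = {
        "code": "experts/code.bin",
        "math": "experts/math.bin",
        # only a handful of domains are feasible in practice
    }

    def classify_topic(prompt: str) -> str:
        # Upfront routing: the training data was partitioned by domain,
        # so inference must first decide which partition a prompt fits.
        return "code" if "def " in prompt else "math"  # toy stand-in

    @lru_cache(maxsize=2)  # each expert is a full model; few fit in memory
    def load_expert(domain: str):
        path = EXPERT_PATHS[domain]  # full checkpoint loaded per expert
        return lambda p: f"<{domain} expert output for {p!r}>"  # stub

    def generate(prompt: str) -> str:
        expert = load_expert(classify_topic(prompt))
        return expert(prompt)

Note the structural costs the sketch makes visible: a misrouted prompt never reaches the right expert, no parameters are shared across domains, and the lru_cache bound is the "experts still have to be loaded" problem in miniature.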

