As in it's processing at the speed that data can be fed into the CPU. This particular use case was files coming in from the network on 10Gbps hardware but that was about the speed the AES HW ran via openssl perf tests. How many sessions and message sizes are irrelevant. Hardware was AMD EPYC 7642.
If there’s no HW that demonstrates a speed difference, then maybe the theoretical CS concerns aren’t properly modeled? Also, the approach I outlined has a strength whereby there’s no nonce to mismanage which is a big strength.