The limitations on processing letters aren’t in the math, they are in the encoding. Language is the map, and concepts are the territory. You may as well complain that someone doesn’t really understand their neighborhood if they can’t find it on a map.
It's math, but specifically an independent piece you could swap out for a different one that does much better on this problem (e.g. use characters instead of tokens). It's just that doing so would make training and inference much more expensive (read: much worse model performance for a given training/compute budget), so it's not worth the trade-off.
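A toy sketch of why the swap matters. The token boundaries and IDs here are hypothetical, not any real tokenizer's vocabulary; the point is only that a character-level model sees letters directly, while a subword model sees opaque IDs:

```python
# Toy illustration: character-level vs subword encoding of one word.
word = "strawberry"

# Character-level encoding: every letter is its own unit,
# so letter-counting questions are trivially answerable.
char_tokens = list(word)
print(char_tokens.count("r"))  # 3

# Subword encoding (hypothetical vocabulary): the model receives
# two integer IDs, and the letter 'r' appears nowhere in that view.
vocab = {"straw": 1012, "berry": 2077}
subword_tokens = ["straw", "berry"]
ids = [vocab[t] for t in subword_tokens]
print(ids)  # [1012, 2077]
```

The cost trade-off mentioned above follows from sequence length: the character version of this one word is 10 units long, the subword version 2, and attention cost grows with sequence length.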
It's not like humans read letter by letter either, at least not past the age of 6 or so. They can, if needed, but it requires extra effort. The same is true of LLMs.
But that's really what I meant. When you say the limitation on processing is not in the math, I would say it is a mathematical limitation of processing, because they had to choose math that works on parts of words instead of letters, due to the limits on how much math can feasibly be done during training and inference.
They chose to use limiting math, which prevents the LLM from easily answering questions like this.
It's not a limitation of math in general. It's a limitation of the math they chose to build the LLM on, which is what was going through my head when I wrote it.
The LLM only sees tokens. The limitation is in the end-to-end product because of the encoder chosen. Change the encoder, keep the LLM, and different limitations appear.
Perhaps it’s a pedantic difference, but to someone in the field the complaint reads like saying TCP/IP is deficient because it doesn’t support encryption: technically true but missing context about the whole stack.