DeepMind published a system that does this sort of thing with a backend theorem prover a year ago. My point is, I don’t think transformer-based text prediction systems are the right model here. I could be wrong, but when I think about how formal systems work, they seem a far cry from what decoder architectures are doing.

https://www.nature.com/articles/s41586-021-04086-x