Hmm, I'm not sure that's quite right. ARMv8 supports per-TTBR translation granules [1], so you can have 4K and 16K user processes coexisting under a kernel of arbitrary page size just by context switching TCR.TG0 at the same time as TTBR0. There is no such thing as a global granule size.
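A minimal sketch of that mechanism, modeled in Python rather than as real register writes. It assumes the architectural TCR_EL1 layout, where TG0 (bits [15:14]) selects the granule for TTBR0 walks and TG1 (bits [31:30]) the granule for TTBR1 walks. The `Process` record and `context_switch` helper are hypothetical. The model only shows that swapping TG0 together with TTBR0 leaves the kernel's TG1 untouched. It does not model the barriers, write ordering, or TLB maintenance a real kernel would need, which is the point in dispute in this thread.

```python
from dataclasses import dataclass

# TCR_EL1.TG0: bits [15:14], granule for the TTBR0 (user) half.
TG0_SHIFT = 14
TG0_MASK = 0b11 << TG0_SHIFT
# TCR_EL1.TG1: bits [31:30], granule for the TTBR1 (kernel) half.
TG1_SHIFT = 30

# Architectural TG0 encodings; note that TG1 uses a different encoding table.
TG0_ENCODING = {4096: 0b00, 65536: 0b01, 16384: 0b10}
TG0_DECODING = {v: k for k, v in TG0_ENCODING.items()}


def tcr_with_tg0(tcr: int, page_size: int) -> int:
    """Return tcr with only the TG0 field replaced; all other fields, including TG1, are kept."""
    return (tcr & ~TG0_MASK) | (TG0_ENCODING[page_size] << TG0_SHIFT)


def tg0_page_size(tcr: int) -> int:
    """Decode the user-half granule size in bytes from a TCR value."""
    return TG0_DECODING[(tcr & TG0_MASK) >> TG0_SHIFT]


@dataclass
class Process:
    """Hypothetical per-process state: page-table root and the granule it was built with."""
    name: str
    ttbr0: int
    page_size: int


def context_switch(regs: dict, nxt: Process) -> None:
    """Load TTBR0 and TG0 together for the incoming process.

    A real kernel would write TCR_EL1 and TTBR0_EL1 with MSR and follow with an
    ISB, and would have to get ordering and TLB/ASID handling right; none of
    that is modeled here.
    """
    regs["TCR_EL1"] = tcr_with_tg0(regs["TCR_EL1"], nxt.page_size)
    regs["TTBR0_EL1"] = nxt.ttbr0


if __name__ == "__main__":
    # Kernel half fixed at 4K: the TG1 encoding for 4KB is 0b10.
    regs = {"TCR_EL1": 0b10 << TG1_SHIFT, "TTBR0_EL1": 0}
    for proc in (Process("a", 0x1000, 4096), Process("b", 0x2000, 16384)):
        context_switch(regs, proc)
        print(proc.name, tg0_page_size(regs["TCR_EL1"]), hex(regs["TTBR0_EL1"]))
```

Only the TG0 bits change between the two processes; the TG1 bits chosen for the kernel survive every switch.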
Well, if you want to run headfirst into the magical land of hardware errata, I guess you could go around creating heterogeneous, switched mappings.
I doubt the TCRs were ever intended to support rapid runtime switching, or that the TLBs were ever intended to hold heterogeneous entries, even with ASID tagging.
You've listed things that could go wrong without citing specific errata. Should we just assume that hardware doesn't work as documented? It seems premature to deem the feature buggy without having tried it.
[1]: https://arm.jonpalmisc.com/2023_09_sysreg/AArch64-tcr_el2#fi...