As an aside, it's a shame that hardware page table walking won out over the software-filled TLBs that some older computers had. I wonder what clever and wonderful hacks we might have invented had we not needed to give the CPU a raw pointer to a data structure whose layout is fixed forever.
Software table walks perform badly on modern out-of-order processors because taking the TLB miss exception means the core has to finish every older instruction in flight and redirect the front end to the exception vector. This can take several hundred cycles. A hardware table walk can take <20 cycles when it hits in the next-level cache.
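To put rough numbers on that, here is a back-of-the-envelope sketch. The miss rate and the exact 300-cycle figure are assumptions picked for illustration (the only figures given above are "several hundred" and "<20"), so treat the output as a comparison of orders of magnitude, not a measurement.

```python
def tlb_miss_overhead(miss_rate, penalty_cycles):
    """Average extra cycles per memory access spent servicing TLB misses."""
    return miss_rate * penalty_cycles


# All values below are illustrative assumptions, not measurements.
MISS_RATE = 0.01         # hypothetical: 1 TLB miss per 100 memory accesses
SOFTWARE_PENALTY = 300   # stand-in for "several hundred cycles"
HARDWARE_PENALTY = 20    # upper bound taken from the "<20 cycles" figure

software = tlb_miss_overhead(MISS_RATE, SOFTWARE_PENALTY)
hardware = tlb_miss_overhead(MISS_RATE, HARDWARE_PENALTY)

print(f"software refill: {software:.2f} cycles/access")
print(f"hardware walk:   {hardware:.2f} cycles/access")
print(f"ratio:           {software / hardware:.0f}x")
```

The ratio is just the ratio of the two penalties, so it does not depend on the assumed miss rate. The miss rate only decides how much that gap matters for a given workload.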