I too read the book and was just searching for anyone mentioning it.
Why can such an idea be called new, when someone else already described it decades ago?
I'm running a totally usable 13B-parameter LLaMA model on my MacBook Air, which seems to give outputs comparable to what I was getting from GPT-3 in June 2022.
How much more hardware would really be needed for GPT-4-level outputs natively? Perhaps software optimizations alone could do most of the trick.