The thing about ML models is the penalty is a function of parameters and precision. It sounds like the researchers cranked them to max to try to get the very best compression. Maybe later they will take that same model, and flatten layers and quantize the weights to can get it running 100x faster and see how well it still compresses. I feel like neural networks have a lot of potential in compression. Their whole job is finding patterns.