
There was an earlier article (Sep 20, 2022) about using the Stable Diffusion VAE to perform image compression. It uses the VAE to map from pixel space to latent space, dithers the latents down to 256 colors, and then de-noises the result when it's time to decompress.

https://pub.towardsai.net/stable-diffusion-based-image-compr...

HN discussion: https://news.ycombinator.com/item?id=32907494
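For anyone who wants to poke at the idea, here's a rough sketch of the encode/quantize/decode round trip using the diffusers AutoencoderKL. This is not the article's actual code (it skips the dithering and the diffusion-based de-noising step), and the checkpoint and file names are just placeholders:

    # Rough sketch, not the article's pipeline: encode to latents, uniformly
    # quantize to 256 levels, dequantize, and decode.
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

    # Load an RGB image; height and width should be multiples of 8.
    img = Image.open("input.png").convert("RGB")
    x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0   # scale to [-1, 1]
    x = x.permute(2, 0, 1).unsqueeze(0)                         # (1, 3, H, W)

    with torch.no_grad():
        z = vae.encode(x).latent_dist.mean                      # (1, 4, H/8, W/8)

    # Uniform quantization of the latents to 256 levels (8 bits per value).
    levels = 256
    z_min, z_max = z.min(), z.max()
    q = torch.round((z - z_min) / (z_max - z_min) * (levels - 1))   # what you'd store
    z_hat = q / (levels - 1) * (z_max - z_min) + z_min              # dequantize

    with torch.no_grad():
        recon = vae.decode(z_hat).sample                        # (1, 3, H, W)

    out = ((recon.clamp(-1, 1) + 1) * 127.5).round().byte()
    Image.fromarray(out.squeeze(0).permute(1, 2, 0).numpy()).save("roundtrip.png")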



I've done a bunch of my own experiments with the Stable Diffusion VAE.

Even when going down to 4-6 bits per latent-space pixel, the results are surprisingly good.
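(For scale, assuming "latent-space pixel" means one value of the 4 × (H/8) × (W/8) latent tensor: 6 bits per value works out to 6·4/64 ≈ 0.4 bits per original image pixel, and 4 bits per value to 0.25.)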

It's also interesting what happens if you ablate individual channels; ablating channel 0 results in faithful color but shitty edges, ablating channel 2 results in shitty color but good edges, etc.
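Concretely, ablation here means something like zeroing out one latent channel before decoding. A self-contained sketch with the diffusers AutoencoderKL (checkpoint and file names are placeholders, and zero-filling is just one choice of "ablate"):

    # Sketch: knock out each of the 4 latent channels in turn and decode,
    # then compare color vs. edge degradation across the outputs.
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

    img = Image.open("input.png").convert("RGB")                # H, W multiples of 8
    x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0   # scale to [-1, 1]
    x = x.permute(2, 0, 1).unsqueeze(0)                         # (1, 3, H, W)

    with torch.no_grad():
        z = vae.encode(x).latent_dist.mean                      # (1, 4, H/8, W/8)
        for ch in range(z.shape[1]):
            z_ablated = z.clone()
            z_ablated[:, ch] = 0.0                              # zero out channel `ch`
            recon = vae.decode(z_ablated).sample.clamp(-1, 1)
            out = ((recon + 1) * 127.5).round().byte()
            Image.fromarray(out.squeeze(0).permute(1, 2, 0).numpy()).save(f"ablate_ch{ch}.png")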

The one thing it fails catastrophically on, though, is small text in images. The Stable Diffusion VAE is not designed to represent text faithfully. (It's possible to train a VAE that does slightly better at this, though.)


How does the type of image (anime vs. photorealistic vs. painting, etc.) affect the compression results? Is there a noticeable difference?


I haven't noticed much difference between these. They're all well-represented in the VAE training set.



