
There was an earlier article (Sep 20, 2022) about using the Stable Diffusion VAE to perform image compression. It uses the VAE to map from pixel space to latent space, dithers the latents down to 256 colors, and then de-noises the result when it's time to decompress.

https://pub.towardsai.net/stable-diffusion-based-image-compr...

HN discussion: https://news.ycombinator.com/item?id=32907494
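For anyone who wants to poke at the idea, here's a rough sketch of the encode/quantize/decode round trip using the diffusers AutoencoderKL. This is not the article's actual code (it skips the dithering and the diffusion-based de-noising step), and the checkpoint and file names are just placeholders:

    # Rough sketch, not the article's pipeline: encode to latents, uniformly
    # quantize to 256 levels, dequantize, and decode.
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

    # Load an RGB image; height and width should be multiples of 8.
    img = Image.open("input.png").convert("RGB")
    x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0   # scale to [-1, 1]
    x = x.permute(2, 0, 1).unsqueeze(0)                         # (1, 3, H, W)

    with torch.no_grad():
        z = vae.encode(x).latent_dist.mean                      # (1, 4, H/8, W/8)

    # Uniform quantization of the latents to 256 levels (8 bits per value).
    levels = 256
    z_min, z_max = z.min(), z.max()
    q = torch.round((z - z_min) / (z_max - z_min) * (levels - 1))   # what you'd store
    z_hat = q / (levels - 1) * (z_max - z_min) + z_min              # dequantize

    with torch.no_grad():
        recon = vae.decode(z_hat).sample                        # (1, 3, H, W)

    out = ((recon.clamp(-1, 1) + 1) * 127.5).round().byte()
    Image.fromarray(out.squeeze(0).permute(1, 2, 0).numpy()).save("roundtrip.png")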



I've done a bunch of my own experiments with the Stable Diffusion VAE.

Even when going down to 4-6 bits per latent-space pixel, the results are surprisingly good.
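(For scale, assuming "latent-space pixel" means one value of the 4 × (H/8) × (W/8) latent tensor: 6 bits per value works out to 6·4/64 ≈ 0.4 bits per original image pixel, and 4 bits per value to 0.25.)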

It's also interesting what happens if you ablate individual channels; ablating channel 0 results in faithful color but shitty edges, ablating channel 2 results in shitty color but good edges, etc.
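Concretely, ablation here means something like zeroing out one latent channel before decoding. A self-contained sketch with the diffusers AutoencoderKL (checkpoint and file names are placeholders, and zero-filling is just one choice of "ablate"):

    # Sketch: knock out each of the 4 latent channels in turn and decode,
    # then compare color vs. edge degradation across the outputs.
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

    img = Image.open("input.png").convert("RGB")                # H, W multiples of 8
    x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0   # scale to [-1, 1]
    x = x.permute(2, 0, 1).unsqueeze(0)                         # (1, 3, H, W)

    with torch.no_grad():
        z = vae.encode(x).latent_dist.mean                      # (1, 4, H/8, W/8)
        for ch in range(z.shape[1]):
            z_ablated = z.clone()
            z_ablated[:, ch] = 0.0                              # zero out channel `ch`
            recon = vae.decode(z_ablated).sample.clamp(-1, 1)
            out = ((recon + 1) * 127.5).round().byte()
            Image.fromarray(out.squeeze(0).permute(1, 2, 0).numpy()).save(f"ablate_ch{ch}.png")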

The one thing it fails catastrophically on, though, is small text in images. The Stable Diffusion VAE is not designed to represent text faithfully. (It's possible to train a VAE that does slightly better at this, though.)


How does the type of image (anime vs. photorealistic vs. painting, etc.) affect the compression results? Is there a noticeable difference?


I haven't noticed much difference between these. They're all well-represented in the VAE training set.



