So say I have a site with 3,000 images, 2 MP each. How many GPU-months would it take to mark them? And how many gigabytes would I have to keep around for the model?
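Rough back-of-envelope (the throughput number here is completely made up, just to show the arithmetic; plug in whatever you measure on your own GPU):

    # Hypothetical watermarking cost estimate. assumed_imgs_per_sec is
    # an assumption, not a number from the paper.
    num_images = 3_000
    assumed_imgs_per_sec = 2.0  # e.g. ~0.5 s per 2 MP image on one GPU
    total_gpu_hours = num_images / assumed_imgs_per_sec / 3600
    print(f"~{total_gpu_hours:.2f} GPU-hours")  # ~0.42 GPU-hours at this rate

Even if the real per-image time is 10x worse, inference for 3,000 images is GPU-hours, not GPU-months; the big numbers in the paper are training cost, not marking cost.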
The GPU-time figures in the paper are for all the experiments, not just the training run for the final open-sourced model (which is what usually gets reported). People don't one-shot the final model.
Yes, although parameter count isn't directly tied to the FLOPs/speed of inference. What's nice about this AE architecture is that most of the compute (message embedding and merging) happens at low resolution, the same idea as behind latent diffusion models.
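To make the "work at low resolution" idea concrete, here's a minimal sketch of that pattern (shapes, layer choices, and message width are all my assumptions, not the paper's actual architecture):

    # Sketch: embed a watermark message into an image, doing the
    # message merging at 1/8 resolution. Not the paper's real model.
    import torch
    import torch.nn as nn

    class LowResWatermarker(nn.Module):
        def __init__(self, msg_bits: int = 48, latent_ch: int = 64):
            super().__init__()
            # Downsample 8x so the expensive work runs on a small tensor.
            self.encode = nn.Sequential(
                nn.Conv2d(3, latent_ch, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(latent_ch, latent_ch, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(latent_ch, latent_ch, 4, stride=2, padding=1),
            )
            # Project the message bits so they can be merged with the latent.
            self.msg_proj = nn.Linear(msg_bits, latent_ch)
            self.merge = nn.Conv2d(latent_ch, latent_ch, 3, padding=1)
            # Upsample back to full resolution to get a watermark residual.
            self.decode = nn.Sequential(
                nn.ConvTranspose2d(latent_ch, latent_ch, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(latent_ch, latent_ch, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(latent_ch, 3, 4, stride=2, padding=1),
            )

        def forward(self, img: torch.Tensor, msg: torch.Tensor) -> torch.Tensor:
            z = self.encode(img)                      # (B, C, H/8, W/8)
            m = self.msg_proj(msg)[:, :, None, None]  # broadcast message over space
            z = self.merge(z + m)                     # merging happens at low res
            return img + self.decode(z)               # add residual to the input

The point: the message-conditioned compute touches 64x fewer pixels than the input, so per-image cost at 2 MP is dominated by the cheap down/up-sampling convs, exactly the trick latent diffusion models use.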