I worked on the Stable Diffusion and GPT-J integrations on NLP Cloud (https://nlpcloud.com/). Both can be used in FP16 without any noticeable quality drop (in my opinion).
In FP16, Stable Diffusion requires about 7GB of VRAM on a Tesla T4 GPU.
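If you want to try this yourself, here's roughly what FP16 loading looks like with Hugging Face diffusers (a sketch, not what NLP Cloud actually runs; the model ID is just an example):

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the weights directly in half precision to roughly halve VRAM usage
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # example model ID
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe("an astronaut riding a horse").images[0]
    image.save("out.png")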
GPT-J requires 12GB of VRAM in FP16 (but if you actually use the full 2048-token context, usage climbs to something like 20GB).
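Same idea for GPT-J with transformers. EleutherAI publishes an fp16 weights branch, so you can avoid downloading the full fp32 checkpoint (again a sketch, not NLP Cloud's actual setup):

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B",
        revision="float16",          # fp16 weights branch of the repo
        torch_dtype=torch.float16,   # keep the model in half precision on GPU
    ).to("cuda")

    inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True)
    print(tokenizer.decode(out[0]))

The ~12GB figure is for short prompts; the KV cache for attention grows with sequence length, which is why pushing toward the 2048-token limit inflates VRAM well past the weights themselves.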