The example of the cat with the detective hat shows that even with the latest update, it isn't really "editing" the image. The generated cat is younger, with bigger, brighter eyes and more "perfect" ears.
I found that when editing images of myself, the result looked weird, like a funky version of me. For the cat it looks "more attractive," I guess, but for humans (and, I'd imagine, for a cat looking at the edited cat with a keen eye for cat faces), the features often stop working together when they're changed even slightly.
ChatGPT 4o's advanced image generation seems to have a low-resolution autoregressive stage that generates image tokens directly, followed by a decoding/upscaling stage that turns the (perhaps ~100 px wide) token image into the actual 1024 px wide final result. The first stage can nail things almost perfectly, but the second stage always changes things slightly. That's why it's so good at, say, generating large text but still struggles with fine text, and why it always introduces subtle variations when you ask it to edit an existing image.
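To make the guess concrete, here's a toy sketch of that two-stage structure. Everything is a stand-in: the vocabulary size, the 32x32 token grid, the uniform sampling, and the nearest-neighbor "decoder" are all assumptions for illustration, not OpenAI's actual architecture. The added noise in stage 2 is the point: a learned decoder hallucinates fine detail rather than copying it, which would explain the subtle drift in edits.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: autoregressive token generation (toy stand-in) ---
# A real model would condition each token on the prompt and on all
# previous tokens; here we sample uniformly just to show the shape of
# the process.
VOCAB_SIZE = 256   # assumed codebook size (e.g. a VQ-style vocabulary)
GRID = 32          # tokens per side of the low-res "token image"

tokens = np.empty(GRID * GRID, dtype=np.int64)
for i in range(tokens.size):
    # p(token_i | token_0..token_{i-1}, prompt) -- faked as uniform here
    tokens[i] = rng.integers(VOCAB_SIZE)
token_image = tokens.reshape(GRID, GRID)

# --- Stage 2: decoding/upscaling to the final resolution ---
# A real decoder is a learned network; this stand-in maps each token to
# a gray level, upsamples 32x via nearest neighbor, then perturbs the
# result slightly, mimicking a decoder that invents detail instead of
# preserving pixels exactly.
SCALE = 32  # 32 tokens/side * 32 px/token = 1024 px/side
low_res = token_image.astype(np.float32) / (VOCAB_SIZE - 1)
upscaled = np.kron(low_res, np.ones((SCALE, SCALE), dtype=np.float32))
final = np.clip(upscaled + rng.normal(0, 0.02, upscaled.shape), 0.0, 1.0)

print(token_image.shape, "->", final.shape)  # (32, 32) -> (1024, 1024)
```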
Has anyone tried putting a model in front of the process that selects the editing region first? Training data would probably be hard to come by, but existing object-detection tech that draws bounding boxes might be a start; a sketch of the compositing half of that idea is below.
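Here's a minimal sketch of what the compositing step could look like, assuming a detector has already produced a bounding box for the edit target: only pixels inside the box come from the regenerated image, so everything outside is bit-identical to the original. The box, the images, and `composite_edit` itself are all hypothetical; a real pipeline would also need feathering or inpainting at the seam rather than a hard rectangle.

```python
import numpy as np

def composite_edit(original: np.ndarray,
                   edited: np.ndarray,
                   box: tuple[int, int, int, int]) -> np.ndarray:
    """Keep `original` pixels everywhere except inside `box`, where the
    regenerated `edited` pixels are used. `box` is (x0, y0, x1, y1),
    e.g. a detector's output for the prompt's target ("the cat's head")."""
    x0, y0, x1, y1 = box
    out = original.copy()
    out[y0:y1, x0:x1] = edited[y0:y1, x0:x1]
    return out

# Toy data: in practice `edited` would come from the image generator and
# `box` from an off-the-shelf detector; both are fabricated here just to
# demonstrate the guarantee.
rng = np.random.default_rng(1)
original = rng.random((1024, 1024, 3))
edited = rng.random((1024, 1024, 3))
result = composite_edit(original, edited, (300, 200, 700, 500))

outside = np.ones((1024, 1024), dtype=bool)
outside[200:500, 300:700] = False
# Pixels outside the box are untouched, unlike a full regeneration pass.
assert np.array_equal(result[outside], original[outside])
```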