Show HN: Playing Telephone with GPT-4 and Dall-E-3

mcbrit · on Dec 20, 2023

This is funny and interesting AND GPT4 gives such a bad photo for the prompt.

Is this because GPT4 is out of a highly stylized space? I really haven't played with it. My objections include: water droplets on the mug, bad framing, B- latte art, mug looks basic (and the handle has geometry that makes you question geometry), table surface looks sus, unidentified brown splotches on the table, what's going on with the chairs, what's up with the floor, what's the small bright vertical plane that descends from halfway in the plane about, and so on. I don't know if the caption is random, but here is the caption I'm responding to:

  An exquisitely detailed photograph captures the essence of a calming coffee break. In the foreground, the star of the scene is a matte black ceramic mug with a lustrous outer finish that gently reflects the ambient light, its thick walls hinting at the mug's thermal-retentive quality. Nestled inside the rim, a creamy surface of latte art presents a delicate feathered pattern, comprised of swirling strokes with varying shades ranging from the rich dark brown of the freshly brewed coffee to the light beige of the perfectly frothed milk. Each line in the pattern is smooth and purposeful, evidence of a skilled pour, and comes to a focal point with a singular, tiny coffee bubble, an accidental yet charming centerpiece. A soft shadow wraps around the mug's rim, adding depth and dimension, while the subtle sheen on the milk's frothy canvas catches the light, drawing the eye to the textural contrasts.

  In the softly blurred background, everyday domestic tranquility is hinted at—a pristine white countertop underpinning the scene, a red-rimmed napkin holder casually placed to the left hints at a touch of color and homeliness. Behind, the clinical metallic sheen of a laptop lies closed, inviting its user to savor the moment of respite it offers. The space is intimately peaceful, a gentle invitation to pause and enjoy the craftsmanship and simple pleasure of a well-made cup of coffee.

elijahbenizzy · on Dec 20, 2023

Hey now, that's a latte I made :) But yes, agreed, B- art without question. The caption is based on the original image, which is one I snapped a few mornings ago. Feels like it only moves towards an overdramatic representation, which probably has to do with its training data? Also likely with how I prompted it -- you can see the prompt + caption + embeddings here: https://d1lf8m1wnxcl0a.cloudfront.net/latte_2_20231217/metad... (it'll download a JSON file).

Note the prompt is meant to keep it interesting (ish) -- this is all parameterizable by the code, but my guess is if I removed the second sentence it would be a little more boring:

"Please provide a caption for this image. The caption should be obsessively descriptive"

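To illustrate what "parameterizable" could look like, here is a minimal sketch of a prompt builder. The name `build_caption_prompt` and its `adverb` parameter are hypothetical and not taken from the project's code; only the default output is meant to match the prompt quoted above.

```python
from typing import Optional

BASE_PROMPT = "Please provide a caption for this image."


def build_caption_prompt(adverb: Optional[str] = "obsessively") -> str:
    """Return a captioning prompt (hypothetical helper, not the project's code).

    adverb=None drops the second sentence entirely; adverb="" keeps it
    but without any intensifier ("The caption should be descriptive").
    """
    if adverb is None:
        return BASE_PROMPT
    style = f"{adverb} descriptive" if adverb else "descriptive"
    return f"{BASE_PROMPT} The caption should be {style}"
```

Calling `build_caption_prompt(None)` gives the one-sentence variant, which is the "more boring" case guessed at above.
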
mcbrit · on Dec 21, 2023

If GPT4 spits out wildly different captions when prompted 'the caption should be descriptive' vs 'the caption should be obsessively descriptive', i.e. picks up on 'obsessive', then that's pretty funny.

Also I didn't understand the game of telephone and am still synthesizing what it probably is/was via your comment so sorry about that; I did not mean to be rude.

elijahbenizzy · on Dec 21, 2023

Wasn’t interpreted that way! And yeah, part of it is stochastic, but part is just that it has crazy associations and no social cues so it kinda just does whatever.

tivert · on Dec 20, 2023

This one is kind of interesting:

It very quickly goes to nuclear cooling towers; then a weird nuclear cooling towers/Thomas Kinkade mashup; then Thomas Kinkade cabin; then Orthodox church; then Orthodox church, ice-palace edition; then finally cathedral interior.

https://image-telephone.streamlit.app/?seed_image=first_phot...

elijahbenizzy · on Dec 20, 2023

Yeah! It went through quite a few creative spots before ending at the starry cathedral, which plenty of others ended up at as well (seems like a local minimum/maximum...)
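
One rough way to check whether a chain has settled into such an attractor is to compare the embeddings of consecutive steps. The sketch below assumes each step's caption embedding is available as a plain list of floats (the exact layout of the published metadata is not shown here); the function names, the 0.95 threshold, and the window of 3 are arbitrary illustrative choices, not anything from the project.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_settled_step(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.95,  # assumed cutoff for "basically the same caption"
    window: int = 3,          # assumed number of consecutive stable transitions
) -> Optional[int]:
    """Return the index where a run of `window` stable transitions begins, else None."""
    run = 0
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) >= threshold:
            run += 1
            if run >= window:
                return i - window  # first step of the stable run
        else:
            run = 0
    return None
```

Running this over several chains and comparing where (and on what) they settle would be one way to test the "everything ends at the cathedral" impression.
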

temp00345 · on Dec 20, 2023

I like how the images seem to go towards magical, mystical and psychedelic with more iterations. Mushrooms just start appearing out of the blue...

elijahbenizzy · on Dec 20, 2023

Yeah, it's kind of crazy sometimes. Quite possibly training data on DeviantArt?

xp84 · on Dec 21, 2023

Oh, my goodness. DALL-E really loves bombing cities. Both the Golden Gate and the "DAG Diagram" starting image turn into disturbing mushroom-cloud series.

It looks like "Sun centered over city -> Eye of Sauron -> Energy Beam of Destruction -> Mushroom Cloud" is a reliable pathway.

Oh well, at least it didn't blow up Royce [Taj Ma]Hall.

elijahbenizzy · on Dec 22, 2023

Yeah! Kind of terrifying... The only thing giving me comfort here is that these were all done in a single day, and I can't help but wonder if there's some bug or correlation. That said, even if it's a bug in the model that they rolled back, it's still scary.

mkgeorge7 · on Dec 20, 2023

I am VERY curious to see how this recursion plays out, but I'm seeing the below error.

RuntimeError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).

elijahbenizzy · on Dec 20, 2023

Yeah, scaling issues. Hold tight, I'm reaching out to see if I can get help. Otherwise refreshing should work (or just DDOS it more :))

joshstrange · on Dec 20, 2023

There was a previous HN discussion [0] about a similar service named Dall-E Party [1]

[0] https://news.ycombinator.com/item?id=38432486

[1] https://dalle.party/

elijahbenizzy · on Dec 20, 2023

No kidding! That's great to know, and they run it for you. Thanks! The difference is that I've run really long chains (and have a few nuances in captioning/prompting), but essentially it's the same idea. Thank you for sending!

flir · on Dec 20, 2023

Just in case you haven't realised, ChatGPT's responses are influenced by the filename you hand it. If you're sticking with the default filename it gives you, you're passing hints to the next generation via a backchannel.

elijahbenizzy · on Dec 21, 2023

Good point -- in this case I'm using the URL it passes back, which was the only information it had. The original one was passed either as base64-encoded data or as a URL. IIRC the URL they passed back was random-feeling, but I'd have to look to check it out. It's possible that the initial URL encoded a little more, depending on how the filename works, but my guess is that the model is good enough that, say, "royce_hall.jpeg" wouldn't change it too drastically, especially not over time...
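
For anyone who wants to rule the filename out entirely when working with local files, a minimal sketch is to copy each image to a random name before handing it to the model. Whether filenames actually influence the output is the claim made above and is not verified here; `copy_with_neutral_name` is a hypothetical helper, not part of the project.

```python
import shutil
import uuid
from pathlib import Path


def copy_with_neutral_name(src: str, dest_dir: str) -> Path:
    """Copy `src` into `dest_dir` under a random name, keeping only the extension."""
    source = Path(src)
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # uuid4 carries no information about the original name or content.
    target = target_dir / f"{uuid.uuid4().hex}{source.suffix.lower()}"
    shutil.copyfile(source, target)
    return target
```

With this, something like "royce_hall.jpeg" would be uploaded as a 32-character hex name ending in ".jpeg", so the only hint left is the file type.
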

gerash · on Dec 20, 2023

This was an interesting experiment. The images tend to converge to a specific style

elijahbenizzy · on Dec 20, 2023

Yeah, and some specific images even... Part of this is likely due to my prompting (I use an adverb to prompt, e.g. "describe this image obsessively"), which likely leads to some sort of clustering...

elijahbenizzy · on Dec 20, 2023

Hey folks! Hitting some scaling issues. It should work if you refresh enough (and also probably make it worse :)).

Hoping to get some help from Streamlit (see if they'll be nice enough to press the magic scale button), but in the meantime I've just made the source data public. Not as pretty of a viz, but feel free to download/play around with it! Has prompts, embeddings, etc. Still uploading -- it's not a small dataset (11 GB)...

https://drive.google.com/drive/u/0/folders/1pZBRHxvygHPAjACS...

elijahbenizzy · on Dec 20, 2023

Looks to be back up and running faster! Possibly because this is no longer on the front page. Also I have confirmation that someone at Snowflake is looking into it :) Enjoy the dataset -- still uploading...

webmaven · on Dec 20, 2023

I can't tell what this is. Other than a couple of incidental UI elements that are part of the hosting platform, I just see a grey placeholder that pulses forever...

elijahbenizzy · on Dec 20, 2023

Ughh that’s Streamlit/Community edition. Refresh and it’ll work -- I’ll see if I can quickly spin up a deployment. Otherwise I’ll post static resources :)

webmaven · on Dec 20, 2023

Is there an option to kick off a new sequence by providing an image?

elijahbenizzy · on Dec 20, 2023

Unfortunately nope, see the description above. Costs about $15/sequence of 150, but if you have one you'd like to see I'm happy to kick it off...

That said, all the instructions for getting started and running your own are there (including a Jupyter notebook!). If you generate some data I'll happily add it to the platform.
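
For readers who just want the shape of the game, here is a minimal sketch of the caption-then-generate loop. This is not the project's actual code: `play_telephone` and its two callables are hypothetical stand-ins, so you would plug in your own functions that call a captioning model and an image model.

```python
from typing import Callable, List, Tuple


def play_telephone(
    seed_image: str,
    caption_image: Callable[[str], str],   # image reference -> caption text
    generate_image: Callable[[str], str],  # caption text -> new image reference
    steps: int,
) -> List[Tuple[str, str]]:
    """Alternate captioning and generation; return (caption, image) for each step."""
    history: List[Tuple[str, str]] = []
    current = seed_image
    for _ in range(steps):
        caption = caption_image(current)    # describe the current image
        current = generate_image(caption)   # draw a new image from that description
        history.append((caption, current))
    return history
```

At about $15 for a sequence of 150, that works out to roughly $0.10 per step, so keeping `steps` small while experimenting is probably wise.
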