Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: Playing Telephone with GPT-4 and Dall-E-3 (streamlit.io)
34 points by elijahbenizzy on Dec 20, 2023 | hide | past | favorite | 24 comments
Hello Hacker News!

I’ve been having some fun recently and I wanted to share with you. My idea was to feed ChatGPT an image and ask for a caption, feed that back to DallE, then call this in a loop, and observe how the image that was generated changed over time. Similar to playing the game “telephone”. The results were really intriguing, so I built something that you can play with.

Note this is best on a desktop (streamlit is optimized for a large screen), but if you’re on mobile you’ll want to expand the sidebar to start — it’s the carrot on top.

My initial intention was for you to be able to submit seed images on the site and watch them progress, but that was too expensive and slow. So instead I built an explorer. While this also illustrates a framework I’m developing that helps structure code, it should be interesting regardless of whether or not you’re looking for new tools.

I’m happy to add more images — feel free to suggest starting ones you’d like to see!



This is funny and interesting AND GPT4 gives such a bad photo for the prompt.

Is this because GPT4 is out of a highly stylized space? I really haven't played with it. My objections include: water droplets on the mug, bad framing, B- latte art, mug looks basic (and the handle has geometry that makes you question geometry), table surface looks sus, unidentified brown splotches on the table, what's going on with the chairs, what's up with the floor, what's the small bright vertical plane that descends from halfway in the plane about, and so on. I don't know if the caption is random, but here is the caption I'm responding to:

  An exquisitely detailed photograph captures the essence of a calming coffee break. In the foreground, the star of the scene is a matte black ceramic mug with a lustrous outer finish that gently reflects the ambient light, its thick walls hinting at the mug's thermal-retentive quality. Nestled inside the rim, a creamy surface of latte art presents a delicate feathered pattern, comprised of swirling strokes with varying shades ranging from the rich dark brown of the freshly brewed coffee to the light beige of the perfectly frothed milk. Each line in the pattern is smooth and purposeful, evidence of a skilled pour, and comes to a focal point with a singular, tiny coffee bubble, an accidental yet charming centerpiece. A soft shadow wraps around the mug's rim, adding depth and dimension, while the subtle sheen on the milk's frothy canvas catches the light, drawing the eye to the textural contrasts.

  In the softly blurred background, everyday domestic tranquility is hinted at—a pristine white countertop underpinning the scene, a red-rimmed napkin holder casually placed to the left hints at a touch of color and homeliness. Behind, the clinical metallic sheen of a laptop lies closed, inviting its user to savor the moment of respite it offers. The space is intimately peaceful, a gentle invitation to pause and enjoy the craftsmanship and simple pleasure of a well-made cup of coffee.


Hey now, that's a latte I made :) But yes, agreed, B- art without question. The caption is based on the original image, which is one I snapped a few mornings ago. Feels like it only moves towards an overdramatic representation, which probably has to do with its training data? Also likely with how I prompted it -- you can see the prompt + caption + embeddings here: https://d1lf8m1wnxcl0a.cloudfront.net/latte_2_20231217/metad... (it'll download a json).

Note the prompt is meant to keep it interesting (ish) -- this is all parameterizable by the code, but my guess is if I removed the second sentence it would be a little more boring:

"Please provide a caption for this image. The caption should be obsessively descriptive"


If GPT4 spits out wildly different captions when prompted 'the caption should be descriptive' vs 'the caption should be obsessively descriptive', ie picks up on 'obsessive', then that's pretty funny.

Also I didn't understand the game of telephone and am still synthesizing what it probably is/was via your comment so sorry about that; I did not mean to be rude.


Wasn’t interpreted that way! And yeah, part of it is stochastic, but part is just that it has crazy associations and no social cues so it kinda just does whatever.


This one is kind of interesting:

It very quickly goes to nuclear cooling towers; then a weird nuclear cooling towers/Thomas Kincaid mashup; then Thomas Kincaid cabin; then Orthodox church; then Orthodox church, ice-palace edition; then finally cathedral interior.

https://image-telephone.streamlit.app/?seed_image=first_phot...


Yeah! It went through quite a few creative spots before ending at the starry cathedral, which plenty others also ended up at as well (seems like a local minima/maxima...)


I like how the images seem to go towards magical, mystical and psychedelic with more iterations. Mushrooms just start appearing out of the blue..


Yeah, its kind of crazy sometimes. Quite possibly training data on DeviantArt?


Oh, my goodness. DALL-E really loves bombing cities. Both the Golden Gate and the "DAG Diagram" starting image turn into disturbing mushroom-cloud series.

It looks like "Sun centered over city -> Eye of Sauron -> Energy Beam of Destruction -> Mushroom Cloud" is a reliable pathway.

Oh well, at least it didn't blow up Royce [Taj Ma]Hall.


Yeah! Kind of terrifying... The only thing giving me comfort here is that these were all done in a single day, and I can't help but wonder if there's some bug or correlation. That said, even if its a bug in the model that they rolled back, it's still scary.


I am VERY curious to see how this recursion plays out, but I'm seeing the below error.

RuntimeError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).


Yeah, scaling issues. Hold tight, I'm reaching out to see if I can get help. Otherwise refreshing should work (or just DDOS it more :))


There was a previous HN discussion [0] about a similar service named Dall-E Party [1]

[0] https://news.ycombinator.com/item?id=38432486

[1] https://dalle.party/


No kidding! That's great to know, and they run it for you. Thanks! The difference is that I've run really long chains (and have a few nuances in captioning/prompting), but essentially its the same idea. Thank you for sending!


Just in case you haven't realised, ChatGPT's responses are influenced by the filename you hand it. If you're sticking with the default filename it gives you, you're passing hints to the next generation via a backchannel.


Good point -- in this case I'm using the URL it passes back, which was the only information it had. The original one was passed either as base64 encoded or a URL. IIRC the URL they passed back was random-feeling, but I'd have to look to check it out. Its possible that the initial URL encoded a little more, depending on how the filename works, but my guess is that the model is good enough that, say, "royce_hall.jpeg" wouldn't change it too drastically, especially not over time...


This was an interesting experiment. The images tend to converge to a specific style


Yeah, and some specific images even... Part of this is likely due to my prompting (I use an adverb to prompt, E.G. "describe this image obsessively"), which likely leads to some sort of clustering...


Hey folks! Hitting some scaling issues. It should work if you refresh enough (and also probably make it worse :)).

Hoping to get some help from Streamlit (see if they'll be nice enough to press the magic scale button), but in the meanwhile I've just made the source data public. Not as pretty of a viz, but feel free to download/play around with it! Has prompts, embeddings, etc. In the process of uploading -- not a small dataset. 11gb, currently uploading...

https://drive.google.com/drive/u/0/folders/1pZBRHxvygHPAjACS...


Looks to be back up and running faster! Possibly cause this is no longer on the front page. Also I have confirmation that someone at snowflake is looking into it :) Enjoy the dataset -- still uploading...


I can't tell what this is. Other than a couple of incidental UI elements that are part of the hosting platform, I just see a grey placeholder that pulses forever...


Ughh that’s streamlit/community edition. Refresh and it’ll work — I’ll see if I can quickly spin up a deployment. Otherwise I’ll post static resources :)


Is there an option to kick off a new sequence by providing an image?


Unfortunately nope, see the description above. Costs about $15/sequence of 150, but if you have one you'd like to see I'm happy to kick it off...

That said, all the instructions for getting started and running your own are there (including a jupyter notebook!). If you have some data you generate I'll happily add it to the platform.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: