The first image of each pair was made with the new model, the second with the old one. Same prompt for both.


  • The models are not 5.2 or 4o; they are gpt-image-1.5 and gpt-image-1, respectively.

    Yep, the LLM just orders a separate model to generate the image in chat.

    I thought these new image generation models were also transformer-based, but they just output image tokens instead of text ones, meaning image generation could be integrated into the same model that outputs text.
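
    Roughly, yes. Here's a toy sketch of the idea (my own illustration, not OpenAI's actual architecture; the sizes, names, and vocabulary split are all made up): one decoder-only transformer whose vocabulary holds both text tokens and discrete image-codebook tokens, so the same autoregressive loop that writes text can also emit image tokens, which a separate VQ-style decoder would then turn into pixels.

    ```python
    # Toy sketch only: a single decoder-only transformer whose vocabulary
    # contains both text tokens and discrete image tokens, so one
    # autoregressive loop can emit either. Not OpenAI's real architecture.
    import torch
    import torch.nn as nn

    TEXT_VOCAB = 50_000               # hypothetical text vocabulary size
    IMAGE_VOCAB = 8_192               # hypothetical image codebook size
    VOCAB = TEXT_VOCAB + IMAGE_VOCAB  # image token ids sit above the text range

    class TinyOmniModel(nn.Module):
        def __init__(self, d_model=256, n_heads=4, n_layers=2):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, VOCAB)

        def forward(self, ids):
            x = self.embed(ids)
            causal = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
            return self.head(self.blocks(x, mask=causal))

    @torch.no_grad()
    def generate_image_tokens(model, prompt_ids, n_image_tokens=16):
        """Sample tokens autoregressively, masking logits so only image-token
        ids can be emitted; a VQ decoder would turn them into pixels."""
        ids = prompt_ids.clone()
        for _ in range(n_image_tokens):
            logits = model(ids)[:, -1, :]
            logits[:, :TEXT_VOCAB] = float("-inf")  # restrict to image vocab
            next_id = torch.multinomial(logits.softmax(-1), 1)
            ids = torch.cat([ids, next_id], dim=1)
        return ids[:, prompt_ids.size(1):] - TEXT_VOCAB  # codebook indices

    model = TinyOmniModel()
    prompt = torch.randint(0, TEXT_VOCAB, (1, 8))      # stand-in tokenized prompt
    print(generate_image_tokens(model, prompt).shape)  # torch.Size([1, 16])
    ```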

    The model transforms the prompt and uses the text2im function to pass it to the image gen model. You can check exactly what the different models accept and output here
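
    For illustration, the handoff looks roughly like this. The field names below come from commonly shared ChatGPT system-prompt dumps, so treat them as an approximation of the internal text2im schema, not a documented API:

    ```python
    # Rough illustration (field names may not match the current internal schema):
    # the chat model writes a JSON payload for its text2im tool, and a separate
    # image model consumes it.
    text2im_call = {
        "prompt": (
            "A candid photo of a golden retriever asleep on a sunlit kitchen "
            "floor, 35mm, shallow depth of field"
        ),
        "size": "1024x1024",
        "n": 1,
        # ids of earlier images in the chat to reference or edit, if any
        "referenced_image_ids": [],
    }
    ```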

    The entire point of this new image generation phase is that the LLMs themselves are generating the images. Gemini 2.0 Flash, 2.5 Flash, 3.0 Pro, and GPT-4o (now GPT-5.2?) generate the images themselves, at least to a degree (GPT-4o image gen, or gpt-image-1, might also use diffusion to help upscale the base image generation from the model). The first instance we saw of this is:

    GPT-4o (search for the "Hello GPT-4o" blog post on Google; every time I try to link it in the post, Reddit just corrupts the entire text box for some reason)

    it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs

    And if you scroll down you see examples of what the model itself is able to generate. This wasn’t released until like August 2025 though.

    It is kind of a return to roots, though. The first big image generation model OpenAI trained was Image GPT, a variant of the GPT-2 architecture trained to generate images, and DALL-E 1 was a variant of the GPT-3 architecture (12 billion params) trained to generate images. With DALL-E 2 and 3 we moved to diffusion, but now we are back to autoregressive image generation with Gemini and (partially) gpt-image, except image generation can now just be an interleaved modality alongside regular text or speech generation instead of there being several distinct models.

    But I do think the models have a tool they trigger to help them come up with a good, context-relevant prompt for themselves, in addition to the conversational context, which is then passed along for the image generation.

    you clearly didn't check the resource I sent 😔

    It might be something to do with multimodality, meaning the endpoint is the same but text generation and image generation are separate in the backend. This started long ago when image vision was added to models, since you first need to process the image with another tool and then send readable data to the text model. Or OpenAI made some black magic and it actually works in a single input/output.

    Edit: not sure what the text output is used for. Every API response shows that text tokens were used, but the response doesn't show any text that was generated. Maybe it's some internal reasoning, or it helps render text in images, dunno. There's also no documentation about it, and we can't disable it.
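
    If anyone wants to see it themselves, here's a minimal sketch with the openai Python SDK (assuming the current field names; the usage object at the end is the unexplained text-token count I mean):

    ```python
    # Minimal sketch, assuming the current openai-python SDK field names.
    # gpt-image-1 returns the image as base64 plus a token-usage block,
    # even though no visible text comes back.
    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    result = client.images.generate(
        model="gpt-image-1",
        prompt="a watercolor fox reading a newspaper on a park bench",
        size="1024x1024",
    )

    # Save the generated image.
    with open("fox.png", "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))

    # The response also carries input/output token counts with no visible text.
    print(result.usage)
    ```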

    Well, gpt-image models are fine-tuned/post-trained variants of the LLMs themselves, specialized for image gen, which is why they appear as separate models in the API docs. GPT-5 and GPT-5.1 are listed as completely separate models even though the only distinction is post-training; same thing with the image gen models.
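
    You can see that split directly in the API: listing models returns the image models as their own ids next to the chat models. A quick sketch (exact ids depend on your account and the date):

    ```python
    # Quick sketch: list model ids and keep the ones that look like image models.
    from openai import OpenAI

    client = OpenAI()
    image_models = [m.id for m in client.models.list() if "image" in m.id]
    print(image_models)  # e.g. ['gpt-image-1', ...] depending on access
    ```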

    In the DALL-E 3 era, sure. But the entire point of these image generators now is that the LLMs generate the images themselves. There are probably variations, though: the Gemini-3-pro-image model probably uses the same base model Gemini 3 was built on, but fine-tuned further for better image generation (instruction following/prompt adherence, boosts in quality, etc.), and in the Gemini interface Gemini probably helps refine a prompt and then triggers the Gemini 3 image gen model (which is kind of itself) to generate the image.

    gpt-image-1 was GPT-4o trained to generate images, just like how it can generate voices in Advanced Voice Mode. The separate gpt-image-1 name does make it easily distinguishable in the API, but that was the entire advance. The "o" in GPT-4o means omni: it takes in many modalities and outputs many modalities, and the phase shift in image gen is that LLMs are now generating a lot of the image themselves. Gemini-flash-2.0-image-gen was the first of these models, gpt-image-1 was the next (a polished version of the GPT-4o image generation demoed in May 2024), Gemini-flash-2.5-image-gen (Nano Banana) was Google's next iteration, then came Gemini-3-image-gen (Nano Banana Pro), and now we have gpt-image-1.5, which could very well be a tuned version of GPT-5.2 for image gen, although it's hard to know exactly which LLM is the base for this specific model.

  • Very generic and unsuspicious choice of prompts

    wtf is that piggy thing

    Damn I'm bad with sarcasm.

  • I still feel like Nano Banana is better. I can clearly tell each of the GPT 1.5 images is AI; some Nano Banana images I can't tell.

    I'd love to see the prompts used, because that could be playing a factor here, but yeah, I'm with you so far. Need to see more testing. I'm really curious how it handles photorealism of just candid shots, because that's an area where Nano Banana Pro is crazy good.

    I also need to see the prompts for reasons.

    Yeah, like there's a major difference in output between "generate an image of Timothee Chalamet laying down and sleeping with King Shark in a messy apartment" and a detailed prompt. All good though, I'm sure we'll get plenty more posts to compare with.

    Nah I agree with you 100%. There is so much noise on GPT's pics too!

    My experience with Nano Banana is that it really sucks with illustrations and drawings, so I would actually expect it to do worse on this particular set of prompts...

    I still prefer nano banana because GPT can’t get rid of the piss filter. It’s better in 1.5 for sure, but you still see it in some pictures.

    gpt-images-2 is probably coming out around January with GPT-5.(5?); that should be a much-improved version of image generation, even over gemini-3-pro-image-preview.

  • It's a big improvement, but it's still behind Nano Banana Pro.

  • Loving the Hannibal comic! 😄

  • Should have tested some photorealistic imaging prompts.

  • I see what you were trying to achieve in the last prompts...

  • Isaac referenced on image 4!

  • Mama’s thang is hanging

  • You can tell it steals Zacian's design from Pokémon when prompted with the sword dog thing

  • Piss filter is gone

    I thought it would never happen

  • Image #11 cracks me up. So many eyes, yet the ones that are STILL fucked up are the actual ones. Some things never change.

  • So. Basically the same. Shite...

  • My first thoughts after seeing DivaWaluigi was "ohhh good grief, my eyes!"...followed by a pic of someone covered in eyes. What comedic timing 😅

  • Give it a week before the anime and Studio Ghibli filters are back on every image generated

  • Sims 2 Dwarves and Giants

    Take my money!

  • Our technology is god-like. Our brains are stone-age.

  • Is that a fucking

    Waluginetta

  • How can you use it? Do you just use ChatGPT?

  • Yeah, we all know what's gonna happen next