The first image of each pair was made with the new model, the second with the old one. Same prompt for both.


  • The models are not 5.2 or 4o; they are gpt-image-1.5 and gpt-image-1, respectively.

    Yep, the LLM just orders a separate model to generate the image in chat.

    I thought these new image generation models were also transformer-based, but they just output image tokens instead of text ones, meaning image generation could be integrated into the same model that outputs text.
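
    Roughly, yes. Here's a toy sketch of the idea (my own illustration, not OpenAI's actual architecture; the sizes, names, and vocabulary split are all made up): one decoder-only transformer whose vocabulary holds both text tokens and discrete image-codebook tokens, so the same autoregressive loop that writes text can also emit image tokens, which a separate VQ-style decoder would then turn into pixels.

    ```python
    # Toy sketch only: a single decoder-only transformer whose vocabulary
    # contains both text tokens and discrete image tokens, so one
    # autoregressive loop can emit either. Not OpenAI's real architecture.
    import torch
    import torch.nn as nn

    TEXT_VOCAB = 50_000               # hypothetical text vocabulary size
    IMAGE_VOCAB = 8_192               # hypothetical image codebook size
    VOCAB = TEXT_VOCAB + IMAGE_VOCAB  # image token ids sit above the text range

    class TinyOmniModel(nn.Module):
        def __init__(self, d_model=256, n_heads=4, n_layers=2):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, VOCAB)

        def forward(self, ids):
            x = self.embed(ids)
            causal = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
            return self.head(self.blocks(x, mask=causal))

    @torch.no_grad()
    def generate_image_tokens(model, prompt_ids, n_image_tokens=16):
        """Sample tokens autoregressively, masking logits so only image-token
        ids can be emitted; a VQ decoder would turn them into pixels."""
        ids = prompt_ids.clone()
        for _ in range(n_image_tokens):
            logits = model(ids)[:, -1, :]
            logits[:, :TEXT_VOCAB] = float("-inf")  # restrict to image vocab
            next_id = torch.multinomial(logits.softmax(-1), 1)
            ids = torch.cat([ids, next_id], dim=1)
        return ids[:, prompt_ids.size(1):] - TEXT_VOCAB  # codebook indices

    model = TinyOmniModel()
    prompt = torch.randint(0, TEXT_VOCAB, (1, 8))      # stand-in tokenized prompt
    print(generate_image_tokens(model, prompt).shape)  # torch.Size([1, 16])
    ```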

    The model transforms the prompt and uses the text2im function to pass it to the image gen model. You can check exactly what the different models accept and output here
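
    For illustration, the handoff looks roughly like this. The field names below come from commonly shared ChatGPT system-prompt dumps, so treat them as an approximation of the internal text2im schema, not a documented API:

    ```python
    # Rough illustration (field names may not match the current internal schema):
    # the chat model writes a JSON payload for its text2im tool, and a separate
    # image model consumes it.
    text2im_call = {
        "prompt": (
            "A candid photo of a golden retriever asleep on a sunlit kitchen "
            "floor, 35mm, shallow depth of field"
        ),
        "size": "1024x1024",
        "n": 1,
        # ids of earlier images in the chat to reference or edit, if any
        "referenced_image_ids": [],
    }
    ```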

    The entire point of this new image generation phase is that the LLMs themselves are generating the images. Gemini 2.0 Flash, 2.5 Flash, 3.0 Pro, and GPT-4o (now GPT-5.2?) generate the images themselves, at least to a degree (GPT-4o image gen, or gpt-image-1, might also use diffusion to help upscale the base image generation from the model). The first instance we saw of this is:

    GPT-4o (search for the "Hello GPT-4o" blog post on Google; every time I try to link it in the post, Reddit just corrupts the entire text box for some reason)

    it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs

    And if you scroll down you see examples of what the model itself is able to generate. This wasn’t released until like August 2025 though.

    It is kind of a return to roots, though. The first big image generation model OpenAI trained was Image GPT, a variant of the GPT-2 architecture trained to generate images, and DALL-E 1 was a variant of the GPT-3 architecture (12 billion params) trained to generate images. With DALL-E 2 and 3 we moved to diffusion, but now we are back to autoregressive image generation with Gemini and (partially) gpt-image, except image generation can now just be an interleaved modality alongside regular text or speech generation instead of there being several distinct models.

    But I do think the models have a tool they trigger to help them come up with a good, context-relevant prompt for themselves, in addition to the conversational context, which is then passed along for the image generation.

    you clearly didn't check the resource I sent 😔

    It might be something to do with multimodality, meaning the endpoint is the same but text generation and image generation are separate in the backend. This started long ago when image vision was added to models, since you first need to process the image with another tool and then send readable data to the text model. Or OpenAI made some black magic and it actually works in a single input/output.

    Edit: not sure what the text output is used for. Every API response shows that text tokens were used, but the response doesn't show any text that was generated. Maybe it's some internal reasoning, or it helps render text in images, dunno. There's also no documentation about it, and we can't disable it.
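
    If anyone wants to see it themselves, here's a minimal sketch with the openai Python SDK (assuming the current field names; the usage object at the end is the unexplained text-token count I mean):

    ```python
    # Minimal sketch, assuming the current openai-python SDK field names.
    # gpt-image-1 returns the image as base64 plus a token-usage block,
    # even though no visible text comes back.
    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    result = client.images.generate(
        model="gpt-image-1",
        prompt="a watercolor fox reading a newspaper on a park bench",
        size="1024x1024",
    )

    # Save the generated image.
    with open("fox.png", "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))

    # The response also carries input/output token counts with no visible text.
    print(result.usage)
    ```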

    Well, gpt-image models are fine-tuned/post-trained variants of the LLMs themselves, specialized for image gen, which is why they appear as separate models in the API docs. GPT-5 and GPT-5.1 are listed as completely separate models even though the only distinction is post-training; same thing with the image gen models.
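
    You can see that split directly in the API: listing models returns the image models as their own ids next to the chat models. A quick sketch (exact ids depend on your account and the date):

    ```python
    # Quick sketch: list model ids and keep the ones that look like image models.
    from openai import OpenAI

    client = OpenAI()
    image_models = [m.id for m in client.models.list() if "image" in m.id]
    print(image_models)  # e.g. ['gpt-image-1', ...] depending on access
    ```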

    In the DALL-E 3 era, sure. But the entire point of these image generators now is that the LLMs generate the images themselves. There are probably variations, though: the Gemini-3-pro-image model probably uses the same base model Gemini 3 was built on, but fine-tuned further for better image generation (instruction following/prompt adherence, boosts in quality, etc.), and in the Gemini interface Gemini probably helps refine a prompt and then triggers the Gemini 3 image gen model (which is kind of itself) to generate the image.

    gpt-image-1 was GPT-4o trained to generate images, just like how it can generate voices in Advanced Voice Mode. The separate gpt-image-1 name does make it easily distinguishable in the API, but that was the entire advance. The "o" in GPT-4o means omni: it takes in many modalities and outputs many modalities, and the phase shift in image gen is that LLMs are now generating a lot of the image themselves. Gemini-flash-2.0-image-gen was the first of these models, gpt-image-1 was the next (a polished version of the GPT-4o image generation demoed in May 2024), Gemini-flash-2.5-image-gen (Nano Banana) was Google's next iteration, then came Gemini-3-image-gen (Nano Banana Pro), and now we have gpt-image-1.5, which could very well be a tuned version of GPT-5.2 for image gen, although it's hard to know exactly which LLM is the base for this specific model.

  • Very generic and unsuspicious choice of prompts

    wtf is that piggy thing

    Damn I'm bad with sarcasm.

  • I still feel like Nano Banana is better. I can clearly tell each of the GPT 1.5 images is AI; some Nano Banana images I can't tell.

    I'd love to see the prompts used, because that could be playing a factor here, but yeah, I'm with you so far. Need to see more testing. I'm really curious how it handles photorealism of just candid shots, because that's an area where Nano Banana Pro is crazy good.

    I also need to see the prompts for reasons.

    Yeah, like there's a major difference in output between "generate an image of Timothee Chalamet laying down and sleeping with King Shark in a messy apartment" and a detailed prompt. All good though, I'm sure we'll get plenty more posts to compare with.

    Nah I agree with you 100%. There is so much noise on GPT's pics too!

    My experience with Nano Banana is that it really sucks with illustrations and drawings, so I would actually expect it to do worse on this particular set of prompts...

    I still prefer nano banana because GPT can’t get rid of the piss filter. It’s better in 1.5 for sure, but you still see it in some pictures.

    gpt-images-2 is probably coming out around January with GPT-5.(5?); that should be a much-improved version of image generation, even over gemini-3-pro-image-preview.

  • It's a big improvement, but it's still behind Nano Banana Pro.

  • Loving the Hannibal comic! 😄

  • Should have tested some photorealistic imaging prompts.

  • I see what you were trying to achieve in the last prompts...

  • Isaac referenced on image 4!

  • Mama’s thang is hanging

  • You can tell it steals Zacian's design from Pokémon when prompted with the sword dog thing

  • Piss filter is gone

    I thought it would never happen

  • Image #11 cracks me up. So many eyes, yet the ones that are STILL fucked up are the actual ones. Some things never change.

  • So. Basically the same. Shite...

  • My first thoughts after seeing DivaWaluigi was "ohhh good grief, my eyes!"...followed by a pic of someone covered in eyes. What comedic timing 😅

  • Give it a week before the anime and Studio Ghibli filters are back on every image generated

  • Sims 2 Dwarves and Giants

    Take my money!

  • Our technology is god-like. Our brains are stone-age.

  • Is that a fucking

    Waluginetta

  • How can you use it? Do you just use ChatGPT?

  • Yeah, we all know what's gonna happen next