One of the things that pleasantly surprised me about Z-Image is how well it understands emotions and turns them into facial expressions. It’s not perfect (it doesn’t know all of them), but it handles a wider range of emotions than I expected, maybe because there’s no censorship in the dataset or training process.
I decided to run a test with 30 different feelings to see how it performed, and I really liked the results. Here’s what came out of it. I used 9 steps, euler/simple (sampler/scheduler), 1024x1024, and the prompt was:
Portrait of a middle-aged man with a <FEELING> expression on his face.
At the bottom of the image there is black text on a white background: “<FEELING>”
visible skin texture and micro-details, pronounced pore detail, minimal light diffusion, compact camera flash aesthetic, late 2000s to early 2010s digital photo style, cool-to-neutral white balance, moderate digital noise in shadow areas, flat background separation, no cinematic grading, raw unfiltered realism, documentary snapshot look, true-to-life color but with flash-driven saturation, unsoftened texture.
Where, of course, <FEELING> was replaced by each emotion.
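For anyone who wants to repeat the test, the substitution step can be sketched in a few lines of Python. This is only a minimal sketch: the names `PROMPT_TEMPLATE`, `SETTINGS`, `FEELINGS`, and `build_prompts` are made up for illustration, the feelings listed are just the handful mentioned in this thread (not the full set of 30), and nothing here calls Z-Image itself. It only produces the prompt strings to paste or feed into whatever frontend you use.

```python
# Prompt template copied from the post; <FEELING> is the placeholder.
PROMPT_TEMPLATE = (
    "Portrait of a middle-aged man with a <FEELING> expression on his face.\n"
    "At the bottom of the image there is black text on a white background: "
    "“<FEELING>”\n"
    "visible skin texture and micro-details, pronounced pore detail, "
    "minimal light diffusion, compact camera flash aesthetic, "
    "late 2000s to early 2010s digital photo style, "
    "cool-to-neutral white balance, moderate digital noise in shadow areas, "
    "flat background separation, no cinematic grading, raw unfiltered realism, "
    "documentary snapshot look, true-to-life color but with flash-driven "
    "saturation, unsoftened texture."
)

# Generation settings from the post, kept as plain data (hypothetical key names).
SETTINGS = {
    "steps": 9,
    "sampler": "euler",
    "scheduler": "simple",
    "width": 1024,
    "height": 1024,
}

# Partial list: only feelings named in this thread, not the full 30.
FEELINGS = ["aroused", "menacing", "distracted", "irritated", "surprised"]


def build_prompts(feelings):
    """Return a dict mapping each feeling to its filled-in prompt."""
    return {
        feeling: PROMPT_TEMPLATE.replace("<FEELING>", feeling)
        for feeling in feelings
    }


if __name__ == "__main__":
    for feeling, prompt in build_prompts(FEELINGS).items():
        print(f"--- {feeling} ---")
        print(prompt)
        print()
```

Each feeling appears twice in its prompt, once in the expression sentence and once in the caption text, which is why a misspelled caption in the render is easy to spot.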
PS: This same test also exposed one of Z-Image’s biggest weaknesses: the lack of variation (faces, composition, etc.) when the same prompt is repeated. Aside from a couple of outliers, it almost looks like I used a LoRA to keep the same person across every render.

https://preview.redd.it/vwfw98c42a6g1.jpeg?width=1043&format=pjpg&auto=webp&s=544dc9b931591c8e1988dcb0249adc7e75512aef
mugshot lol
the mug is full
lmfao I knew Dr. Aroused has criminal ties but not like this
So this is how half of the sub looks, hmm.
LOLOLOL
Well, my "aroused" is definitely not like that, lol
https://preview.redd.it/lr7ot6pg2a6g1.png?width=275&format=png&auto=webp&s=b6a76589e1d1e2949288394a79ec3b72955d7c37
anti west bias in “menacing” lol
'Menacing' ethnicity straight up changed lol
How about the NSFW face expressions? 😉
https://preview.redd.it/px74jr1ax96g1.png?width=1079&format=png&auto=webp&s=35cd49ec8b916c1a5aa8f73a3d3f2915ec2c67f1
I am sure there is a lora for that already .. dripping off the tongue
lmao love that the distracted guy is the only one not facing the camera
Exactly! He was so distracted that he missed the click! The aroused one is also funny, he is somewhere between "this woman is nice" and the "O face" from the "Office Space" movie.
I'll split the difference between the sfw and the nsfw. Try sultry or flirty.
Shouldn't the fun guy have a cap?
What I find most surprising about this is that I keep seeing how people still think one of this model's best features is actually its weakness.
This depends on what you want to do. I know that if you give a detailed description of the composition, scene, etc. in the prompt, it will do what you ask for with remarkable precision (which solves the lack of variation for compositions). But the face is not that easy. I've tried random names (mostly no effect), nationalities (they work, but every nationality gets an almost identical face between renders), and detailed facial features (somewhat works, but not for face shape, etc.)... The only real solution is a LoRA, but then the LoRA bleeds into all faces in the render.
I'm absolutely LOVING the model, don't get me wrong, but this can be a feature or a weakness. It depends heavily on what you want to do with the model.
I have gotten great variation in the faces by prompt alone. You don't need LoRAs at all. Maybe there is a limit on how much variation you can get, but so far I haven't found it. Remember that real humans are not as varied either. We are made of archetypes.
Would a bit more context help, seeing how this model likes detailed prompts? Instead of just 'surprised' you could say 'surprised as he's found out his bank account is empty' :D or 'terrified as he witnesses a giant monster ripping someone's head off'. Hehe. Some people think you shouldn't mention things that aren't visible, but I think it's often very helpful to provide emotional context.
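A rough sketch of that idea, in case someone wants to try it across several feelings: attach an optional situation to the expression sentence and fall back to the bare feeling when none is given. The `CONTEXTS` mapping and the `expression_line` helper are hypothetical names, the two situations are the ones from the comment above, and whether this actually improves the renders is untested here.

```python
# Hypothetical mapping of feeling -> situation; both entries come from the comment.
CONTEXTS = {
    "surprised": "he's found out his bank account is empty",
    "terrified": "he witnesses a giant monster ripping someone's head off",
}


def expression_line(feeling, contexts=CONTEXTS):
    """Build the first prompt sentence, adding a situation when one is known."""
    base = f"Portrait of a middle-aged man with a {feeling} expression on his face"
    context = contexts.get(feeling)
    if context is None:
        # No situation for this feeling: keep the original plain sentence.
        return base + "."
    return f"{base}, as {context}."


if __name__ == "__main__":
    for feeling in ["surprised", "terrified", "sad"]:
        print(expression_line(feeling))
```

Comparing renders from the plain sentence and the contextual one with the same seed would show whether the extra context changes the expression or just the scene.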
love your analysis, couldn't agree more
Good thing that the LLM it uses can figure out most of our spelling mistakes. "Irritatd" is up there. Although I think it is basically a higher-definition version of angry.
In fact I wrote it correctly (IRRITATED), but I tried twice and Z-Image misspelled it both times (the other misspelling was way worse), so I gave up. 😂
menacing turns into a white guy. LOL
Kinda irrelevant to the whole Z image thing but I find it interesting that 100 years ago that guy wouldn't have been considered white. Whiteness is a political project. Italians / Mediterranean people were only allowed in the whiteness club when it became useful for the Anglo Saxons in the US. (I'm not Italian, not trying to get oppression points)
Turns out a menacing Asian is a white man.
Gonna generate a serious + determined + blank stare and see what results it's going to give me
I've tried some combinations. Most of them gave me nothing different from one of the feelings. Some of them (for example "sad smile") worked as intended.
Haha yeah, that's expected. Just joking around to see if the same facial expression will somehow generate something entirely different 😂
You forgot embarrassed.
That menacing person doesn't look Asian, while the rest of them do 😆
Some don't work and become "neutral". You could also try "amused", "exhausted", "sour", "disdain", "smug", etc.