Full res comparisons and images with embedded workflows available here.
I had multiple people insist to me over the last few hours that CFG and negative prompts do not work with Z-Image Turbo.
Based on my own cursory experience to the contrary, I decided to investigate further, and I feel I can fairly definitively say that CFG and negative prompting absolutely have an impact (and a potentially useful one) on Z-Image Turbo outputs.
Granted: you really have to up the steps for high guidance not to totally fry the image; some scheduler/sampler combos work better with higher CFG than others; and Z-image negative prompting works less well/reliably than it did for SDXL.
Nevertheless, it does seem to work to an extent.
That's a rough 38... This guy is at least 48 yrs old tho.
Don't smoke, kids
Same way all Asians look young to us, we must look really old to them.
It’s in the skincare and SPF.
In the States, skincare products tend to be viewed as an older woman thing, whereas in many Asian countries (especially Korea), it’s viewed as a standard step of personal hygiene.
It also doesn’t help that the US has effectively banned any new SPF products, so we’re about 40 years behind the curve unless you import them.
Yeah dude, I’m like mid 30s and this dude looks old AF to me; I had to double-check the mirror.
Yes, the main problem is the hair: too much white hair.
Some people's hair starts going grey at 18, others' at 50; it is totally individual.
I've been nearly as grey as him since 25, mate. His looks particularly "bad" for his age because his non-grey hair colour is light brown.
He dyes it gray.
No, this looks about right. I know plenty of white people around this age to 45 and they're old like this. They get mad all the time when I guess their age around 50, lol. I'm in my mid 30s and people keep guessing my age around mid to late 20s.
It's still better to use NAG on distilled models though.
https://www.reddit.com/r/StableDiffusion/comments/1pbrbrt/nag_normalized_attention_guidance_works_on_zimage/
https://preview.redd.it/bc84vrwh966g1.png?width=3072&format=png&auto=webp&s=8cb57c7beafc6b9899e2a781c6df947a12b017fc
Try removing items by putting it in negative, just like OP did, just to prove NAG has the same effect.
For me, usually getting the CFG to 1.2 is enough to preserve style and allow negs to work.
In my tests, something I found is that the more negs you add the higher you need to take your CFG. Based on my (puny) understanding of the multidimensional latent space, this is not surprising.
It really drops in speed when you go past CFG 1 tho.
It's good at around 1.4-7 CFG; it actually improves the images and prompt adherence a decent bit too. Who decided CFG didn't work, other than people who didn't actually try it?
Also, any robust LoRA that isn't a single concept will undo some of the distillation, requiring more steps and CFG. So if you use a high-end LoRA you might have to do these things anyway.
CFG can work. But on average it's harmful to the distilled model.
You're brute-forcing it (while also increasing render time) to go against its training.
A distilled model mimics the teacher model's CFG behaviour, essentially baking in the guidance scale taught by the base/teacher model. That lets it reach the guided result in far fewer steps, with the tradeoff of little variation/versatility.
In other words, CFG is already "baked into" the model, making it "useless" to toggle.
By using it, you're pretty much losing the benefits of having a distilled model in the first place while arguably not gaining much.
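For anyone who wants the mechanics: with CFG > 1 the sampler runs the model twice per step (once on the positive prompt, once on the negative/empty prompt) and pushes the prediction away from the unconditional one. A minimal, generic sketch of that combine step, not Z-Image's actual code, looks roughly like this:

```python
import torch

def cfg_combine(noise_uncond: torch.Tensor, noise_cond: torch.Tensor, scale: float) -> torch.Tensor:
    """Classic classifier-free guidance combine, applied once per sampling step.

    scale == 1.0 collapses to the plain conditional prediction, which is the
    single-pass regime a distilled/turbo model was trained for; scale > 1.0
    requires the second (unconditional) forward pass, doubling the cost.
    """
    return noise_uncond + scale * (noise_cond - noise_uncond)
```

That second forward pass is also why everyone in this thread sees roughly 2x generation time once CFG goes above 1.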
Do you mean that e.g. negatives are baked in? Like a distilled model would have difficulty producing 6 or 4 fingers, because unwanted elements were kinda baked in as negatives?
What about nodes such as skimmed CFG?
I mean, it's clearly not ideal, especially compared to the way it works with something like SDXL.
Nevertheless, it does work in a pinch and, somewhat interestingly, does seem to help create a smidge more output diversity.
This node generates great image variance with Z-Image and is tuneable: https://github.com/ChangeTheConstants/SeedVarianceEnhancer
Of course it will create diversity. The whole point of a distilled model is to ramp up speed by killing off the CFG interference.
Please look up what CFG is and how distilled models work. You'll understand why people are telling you "it doesn't work".
SDXL base (and most models used by the community) isn't distilled, so yes, it is designed with CFG in mind.
In the case of Z-Image Turbo, which is distilled, you're fighting a losing battle by enabling CFG. Once the training has baked the base model's CFG into the distilled weights, it's actually quite detrimental (speed- and quality-wise) to turn it back on.
Sure if you don't care about either of those, and absolutely want to get rid of a random detail, go for it.
Tired of these distilled-model purists popping up everywhere cfg>1 is mentioned and going, "Uhhhh, ACTUALLY, you are not supposed to do it🤓." Yes, I know, and it doesn't matter if the image is better.
I got downvoted for saying negative prompts work fine in ZIT when it first came out even though I posted examples. Because "it's distilled, so it's not possible" decided the scientists on this sub.
I mean a large group of people on this sub seem to think previous prompts will influence later prompts and there's something more than just math happening in the models. 🤷
That can sometimes happen, but I think it has something to do with caching in some WebUIs.
With the former, that makes sense if they come from using ChatGPT because it absolutely does. It doesn’t here, but I can see the confusion.
The other part… ugh people who try to personify AI are so irritating
Wouldn't say "fine", as it often ignores them and gets polluted by previous generations. But they definitely kinda work lol
The resetksampler is quite useful with the model.
Lol, yes it's a bit like they are saying birds can't fly while standing at a beach watching them in the sky.
I don't think they're wrong with the technical aspects, but from the images we can clearly see it has an effect. Unless OP is faking it, you can remove stuff by putting some words in the negative.
Right or wrong, I see birds fly, and therefore I believe birds can fly. If I saw a flying car I would believe that too (after some investigating).
That's the problem with most people: they don't try it for themselves. Literally, the first couple of days after Z-Image came out, they already stated that negatives don't work, but I noticed one can go above 1 CFG. So I tried it and it worked. No one wanted to listen to me, so there's that, lol.
Nobody said negatives don't work. What we are saying is, if you turn CFG above 1, it will burn almost instantly.
It takes double the time, but it doesn't burn in my case. It actually helps with the very greyish images for me. I use low CFG values like 1.5-2.5.
It also takes double the time to generate if you include the negative with cfg > 1.
ZIT's already thrice as fast as Flux on my machine, so twice as slow is still faster.
My examples above prove that they do not necessarily burn almost instantly, especially if you change other settings to compensate.
They clearly work, and increasing the CFG scale along with using more steps can significantly improve the quality of the final image. Combining LoRAs also works very effectively, even applying negative strength to LoRAs, though it feels like we have to rediscover the same techniques over and over again.
Tell that to the people in the other post of mine that keep insisting I was doing generations "wrong" 😜
If by "wrong" you mean "out of spec" then yes. The problem was that YOU WERE DOING COMPARISONS while using parameters outside those indicated by the model creators.
You may try using the scheduled CFG node from kjnodes to avoid an overbaked image (and it's faster than running CFG > 1 on every step), or NAG is another option.
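The idea behind CFG scheduling, independent of any particular node's implementation, is to pay for guidance only on the early steps where composition is decided, then drop back to the single-pass turbo regime. A rough sketch of that logic (the names, the 2.0 scale, and the two-step cutoff are just illustrative, not kjnodes' actual API):

```python
def scheduled_cfg(step_index: int, high_scale: float = 2.0, guided_steps: int = 2) -> float:
    """CFG scale for a given sampling step: guided early, plain turbo after."""
    return high_scale if step_index < guided_steps else 1.0

# e.g. for a 12-step run: [2.0, 2.0, 1.0, 1.0, ...]
schedule = [scheduled_cfg(i) for i in range(12)]
```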
This is very interesting. In your conclusion, do you think 2.5 is the lower limit for reflecting negative prompts?
Great question! I do not think it's the lower limit. Based on a variety of tests, I think that 1.1 is (as you might expect) the ultimate lower limit. However, the more negatives you want to include, and the more closely the thing you want to remove is associated with the subject of your image, the higher you will need to crank the CFG.
At some point, though, negative prompting will not work at all. For example, Z-Image believes very strongly that dogs should have collars at all times, so if you try to negative-prompt away the collar it is very difficult, even with high CFG.
I haven't used CFG lower than 2, ever. It increases the contrast, which is something I like.
The use of negatives to remove objects in the scene sounds very useful.
Anytime a “distill brigade” member tells you you're doing it wrong by going past one, ask them since when has any creative tool had only one way to use it. You don’t criticise a painter for using a particular brush stroke by telling them their faces will be less accurate, because those outside the creative process for a given piece are not privy to the creator's intention and should, to be honest, stfu. As long as people know what the “defaults” are, let them explore the edges, where creativity and not conformity is found.
Ah, but you see, I was saying nice things about Flux 2 and pointing out that there are at least some subjects where it has better model knowledge than Z-Image, so naturally the reason Z-Image doesn't know Jabba the Hutt or what a hood hair dryer looks like must be that I was simply using the wrong generation settings or prompts. 😛
I used automatic CFG warp drive and CFG norm, then I could raise CFG without burning and have negative prompts. Unfortunately it slowed down the gens way too much for my daily use.
Try just applying it to the first 2 steps.
Heh.. that's a good idea.
I'm using CFG most of the time tbh.
Are you entering 'Positive' and 'Negative' in the same node or separate nodes?
Separate. You can follow the link to download the PNGs with embedded workflows.
Great. Thanks.
Lmao those are not 38-year-old men... those are like 54-year-olds.
I mean, I'm told that Z-image can do no wrong, so I guess 38 is the new 54.
This guy again. We get it, you love flux, go marry it.
Nobody said negatives don't work. What we are saying is, if you turn CFG above 1, it will burn almost instantly. So don't use it! The negative prompt should not be used because of this.
Examples above are not burnt
"The champagne is buhrned..."
Yeah, but it's not true at all. CFG around 2 doesn't usually result in burned images, with or without a negative prompt; I've seen workflows that split generation into multiple phases and use CFG up to 4 for parts of the process, and they do very well.
The Z-Image-Turbo paper says the model uses CFG
That's a reference to Z-Image Base ("our standard SFT model"), which uses 100 NFEs for generation in their preferred configuration (50 steps, since CFG doubles the NFEs per step). For Z-Image Turbo they state 9 NFEs (9 steps without CFG), but you can obviously set more steps and use CFG, and CFG around 2 does seem to benefit some generations, IME.
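To make the arithmetic explicit (the 2.0 CFG value below is just an arbitrary example of "greater than 1", not a number from the paper):

```python
def nfe(steps: int, cfg: float) -> int:
    """Number of model forward passes (NFEs) per image.

    With CFG > 1 each step needs a conditional and an unconditional pass.
    """
    return steps * (2 if cfg > 1.0 else 1)

print(nfe(50, cfg=2.0))  # 100 NFEs, matching the Base model's 50-step CFG config
print(nfe(9, cfg=1.0))   # 9 NFEs, matching Turbo's default 9 steps without CFG
```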
Can you read? They are talking about the BASE model not the TURBO.
Jesus fucking christ.