Software engineer of 23 years and music producer of 15 years. Extensively iterated with SUNO for a month now, and from my actual testing and some intuitive knowledge from other areas of my life, I’d like to share an important discovery:
AI audio models tend to be similar to dyslexic humans. They read and pronounce things phonetically.
So if SUNO struggles with my words I then spell them phonetically. If I want to change syllable stressing I use capital letters, and to affect the cadence I use hyphens or ellipses.
Sometimes cues for things like whispering and screaming work, sometimes the cues get pronounced. That is still very hit or miss.
Iteration is key here. Experimentation is key.
Yep, I’ve found the same. A couple examples:
“Vee hick ole” — Ensuring vehicle is pronounced to rhyme with “roll”
“En urge ee” - Ensuring energy is pronounced to rhyme with “me”
“En ur jay” - Ensuring energy is pronounced to rhyme with “away”
“Cru-stay-shin lev-aye-a-thin risin’ from out the oh-shy-inn” — Adding a weird texture to an already weird sentence for added flavour!
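For anyone who keeps reusing the same respellings, here is a minimal Python sketch of a lookup table that swaps them into a lyric sheet before pasting it in. The `RESPELLINGS` table and the `respell` function are hypothetical names made up for illustration; the entries are just the examples from this thread, and nothing here guarantees the model will actually sing them the intended way.

```python
import re

# Hypothetical respelling table, taken from the examples in this thread.
# Keys must be lowercase; values are the phonetic spelling to paste in.
RESPELLINGS = {
    "vehicle": "Vee hick ole",
    "energy": "En urge ee",
}

def respell(lyrics: str, table: dict[str, str] = RESPELLINGS) -> str:
    """Replace whole words (case-insensitive) with their phonetic respelling."""
    if not table:
        return lyrics
    # Match any table word as a whole word, regardless of case.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in table) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: table[m.group(1).lower()], lyrics)

print(respell("My vehicle runs on ENERGY"))
```

Keeping the table separate from the lyrics means the readable version stays intact and the phonetic version can be regenerated whenever a respelling is tweaked.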
I’ve included your suggestions in a guide compilation for the group on Facebook.
And thus “Kaiju crab” was written into the historical record 😂
Which group if I may ask?
SUNO Music Creator’s Universe - https://www.facebook.com/share/g/1AbUyQyJ8m/?mibextid=wwXIfr
Google the linguistic pronunciation schema and put that in the style prompt
The International Phonetic Alphabet?
'Pronunciation' - IPA (International Phonetic Alphabet):
/prəˌnʌn.siˈeɪ.ʃən/
Ok that’s what I thought
Yea, the phonetics thing works for me, and things like “(oh-ohhhhh-oh OH OH OH!)”
Square brackets [ ] are for instructions and parentheses ( ) are for ad-libs
I've been doing the same, intuitively. Using dots in the middle of the verse to get a certain pace, too
Included yours, too.
Cadence can be helped with hyphens, e.g. "in-cog-nition" will help force the timing of those parts onto the rhythm. Without that you sometimes get "incognition", "in-cognition", "incog-nition" or "incogni-tion".
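If you want to try several break points for the same word, a small helper can generate the hyphenated spellings. This is only a sketch: `hyphenate` is a hypothetical function name, and the break positions are plain character indices you choose by ear, not something derived from real syllable rules.

```python
def hyphenate(word: str, breaks: list[int]) -> str:
    """Insert a hyphen before each character index listed in `breaks`."""
    pieces = []
    start = 0
    for cut in sorted(set(breaks)):
        if not 0 < cut < len(word):
            raise ValueError(f"break {cut} is outside the word")
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return "-".join(pieces)

# The spellings mentioned above, produced from one source word.
print(hyphenate("incognition", [2, 5]))  # in-cog-nition
print(hyphenate("incognition", [2]))     # in-cognition
print(hyphenate("incognition", [5]))     # incog-nition
```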
Sometimes an ellipsis (...) gets a longer pause than a comma, but sometimes not.
Capitalising can help shift the emphasis, but it doesn't always work. E.g. "SAID it was beginner's luck" gets a more natural result (and "said" doesn't actually end up emphasised), whereas without the capitals you get odd-sounding emphasis like "said IT was beginner's luck" or "said it WAS beginner's luck".
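Since the effect of capitals is unpredictable, it can help to generate one variant per word and audition them in turn. A rough sketch, with `emphasize` and `all_variants` as hypothetical helper names; it only rewrites the text and makes no claim about which variant the model will stress.

```python
def emphasize(line: str, index: int) -> str:
    """Return the line with the word at position `index` in capitals."""
    words = line.split()
    words[index] = words[index].upper()
    return " ".join(words)

def all_variants(line: str) -> list[str]:
    """One variant per word, each with a different word capitalised."""
    return [emphasize(line, i) for i in range(len(line.split()))]

for variant in all_variants("said it was beginner's luck"):
    print(variant)
```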
Vocal cues in square brackets [] seem to be more reliable than in parentheses (), with less chance of being accidentally sung. Sometimes it is damned near impossible to get them followed, e.g. adding a spoken verse at the start of a country song: it works easily at the end but not at the start (it will be fully or half sung).
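To keep the two kinds of markup consistent across a lyric sheet, the convention can be wrapped in a couple of tiny helpers. This is a sketch under assumptions: `cue`, `adlib` and `build_section` are hypothetical names, and the tag text "Spoken" is only an example from this thread, not a confirmed keyword that the model is guaranteed to honour.

```python
from typing import Optional

def cue(text: str) -> str:
    """Wrap an instruction in square brackets, e.g. [Spoken]."""
    return f"[{text.strip()}]"

def adlib(text: str) -> str:
    """Wrap an ad-lib in parentheses, e.g. (oh-oh)."""
    return f"({text.strip()})"

def build_section(tag: str, lines: list[str], adlib_text: Optional[str] = None) -> str:
    """Put the cue on its own line, then the lyrics, with an optional trailing ad-lib."""
    out = [cue(tag)] + list(lines)
    if adlib_text and lines:
        out[-1] = f"{out[-1]} {adlib(adlib_text)}"
    return "\n".join(out)

print(build_section("Spoken", ["Said it was beginner's luck"], "oh-oh"))
```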
Yeah I also find that if you’re remixing or replacing a section Suno can get absolutely enamoured with the prior pronunciation. You may have to start fresh or use a DAW to paste in a correctly pronounced iteration for any hope to break it free. Especially if there is a difference in syllables.
Suno really wanted to pronounce “viper” as “vie-ah-per” for me lol. No amount of “vie-per”, “vy-per”, “vyper”, etc seemed to salvage it.
And yours
Pro tip
If you use language "translators" for accents, you can sometimes get the model to use a particular accent. For example, Jamaican Patois works really well, I've had mixed success with Scottish, and some English accents may work. I haven't tried other languages yet but the same may apply. To be clear, paste the translator's phonetic spelling into the lyrics to test it out.
I am just doing my first flip of a Brazilian phonk song and exploring foreign language vocals so this is great. Thank you