Tips and tricks to get your presenter's voice and pronunciation exactly as you want them.
(Not yet a Synthesia user? Create your account now.)
Occasionally the AI system will pronounce words incorrectly or add strange pauses and breaks. Names and acronyms in particular can be a challenge. Below are some quick tips to get your video presenter's voice just right.
- 🔈 Audio preview: use the new audio preview feature. Type in your script and press "Listen" to preview your voice. This will help you fix most obvious problems and prevent generating faulty videos.
- 📝 Correct spelling: make sure you have used correct spelling in your script.
- 🚫 Don't mix languages: e.g. avoid using English words in a Spanish script.
- 🗣️ Correct pronunciation of words, acronyms and numbers: please see the section below on how to tweak this.
- 💬 Tweak rhythm & breaks: please see the section below on how to add breaks and tweak the rhythm of your speech.
Improving results is a matter of creatively using phonetic spelling, periods/commas and sometimes rearranging sentence structure to get a better result.
🙋 We are here to help: if you have any issues with your videos, simply let the Synthesia team know via chat. We will delete the faulty videos and add additional video credits to your account.
Add additional breaks to the video
Some of our voices now support SSML (Speech Synthesis Markup Language). SSML has quite a few different tags but, for now, the most important one is the break tag, which instructs the voice to pause.
Wherever you want an additional break in your text, simply insert the tag below. The 2s is only an example; you can specify the time in seconds (s) or milliseconds (ms):
<break time="2s" />
The break can be up to 5 minutes long.
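If you prepare scripts programmatically, the tag can be built from a duration. The following is a minimal Python sketch and not part of Synthesia: the helper name `break_tag` is hypothetical, and the 5-minute cap simply mirrors the limit stated above.

```python
MAX_BREAK_MS = 5 * 60 * 1000  # 5-minute limit stated in this article


def break_tag(milliseconds: int) -> str:
    """Return an SSML break tag for the given duration (hypothetical helper)."""
    if not 0 < milliseconds <= MAX_BREAK_MS:
        raise ValueError("break must be longer than 0 ms and at most 5 minutes")
    # Use whole seconds when possible, otherwise fall back to milliseconds.
    if milliseconds % 1000 == 0:
        return f'<break time="{milliseconds // 1000}s" />'
    return f'<break time="{milliseconds}ms" />'


print(break_tag(2000))  # <break time="2s" />
print(break_tag(50))    # <break time="50ms" />
```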
For example, say I have the following text in Synthesia:
Hey John! How are you doing today?
Let's say I'm not happy with the default break after "John!". Breaks are especially useful to better separate sentences.
I can now simply input the following markup to add a break:
Hey John!<break time="50ms" />How are you doing today?
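The same edit can be scripted. This sketch assumes you hold the script as a plain Python string before pasting it into Synthesia; `add_break_after` is a hypothetical helper, not a Synthesia function.

```python
def add_break_after(script: str, phrase: str, time: str) -> str:
    """Insert an SSML break right after the first occurrence of `phrase`."""
    index = script.find(phrase)
    if index == -1:
        return script  # phrase not found: leave the script unchanged
    end = index + len(phrase)
    # Drop the whitespace that followed the phrase so the tag sits between the sentences.
    return f'{script[:end]}<break time="{time}" />{script[end:].lstrip()}'


print(add_break_after("Hey John! How are you doing today?", "Hey John!", "50ms"))
# Hey John!<break time="50ms" />How are you doing today?
```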
⚠️ SSML tags should not span multiple lines: if you are testing other SSML tags, please note that currently an opening tag, its text and its closing tag must all sit on the same line. Other SSML tags are also only supported on a few voices. Here's an example.
✅ This will work:
<prosody> text </prosody>
⛔ This will not work:
<prosody>
text
</prosody>
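The failing case is a tag whose opening and closing parts end up on different lines. If you prepare scripts outside Synthesia, a small clean-up step can put each tag pair back on one line before you paste the text in. The sketch below is a Python illustration only: `collapse_ssml_lines` is a hypothetical helper, and it handles simple, non-nested tag pairs.

```python
import re

# Matches <tag ...> ... </tag>, including any line breaks in between.
PAIRED_TAG = re.compile(r"<(\w+)([^<>]*)>(.*?)</\1>", re.DOTALL)


def collapse_ssml_lines(script: str) -> str:
    """Rewrite each paired tag so it sits on a single line (hypothetical helper)."""
    def on_one_line(match: re.Match) -> str:
        name, attrs, body = match.groups()
        # Collapse every run of whitespace, line breaks included, into one space.
        body = " ".join(body.split())
        return f"<{name}{attrs}> {body} </{name}>"
    return PAIRED_TAG.sub(on_one_line, script)


print(collapse_ssml_lines("<prosody>\ntext\n</prosody>"))
# <prosody> text </prosody>
```

Self-closing tags such as the break tag are left untouched, because the pattern only matches a tag that has a closing counterpart.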
Correct pronunciation of words, acronyms and numbers
Pronouncing company names, acronyms, business terms or slang can sometimes be difficult for the AI because they are ambiguous. Getting the pronunciation right is a matter of spelling the word phonetically or using spaces, dashes and quotation marks for emphasis.
- For words, try spelling the word the way it sounds. This takes a bit of practice; use the example below as a guide:
- [ Desert → de zert ]
- For acronyms, try spelling them out the way they should sound:
- [ AI → a-eye ]
- [ AWS → a-"double you"-s ]
- If you want the acronym to be pronounced letter by letter, put spaces between the letters: [ NYC → N Y C ]
- For numbers, change how you write them depending on how you want them to sound:
- [ Ten eighty-nine → 10 89 ]
- [ Two five eight six → 2 5 8 6 ]
- [ One hundred and forty eight → 148 ]
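If the same terms recur across many scripts, the respellings can be kept in one place and applied automatically. This is a minimal Python sketch, assuming the script is plain text: the `RESPELLINGS` table and both function names are hypothetical, and the entries are just the examples from this article.

```python
import re

# Respellings taken from the examples above; extend with your own terms.
RESPELLINGS = {
    "AI": "a-eye",
    "AWS": 'a-"double you"-s',
}


def spell_out(acronym: str) -> str:
    """Separate the letters with spaces so they are read one by one."""
    return " ".join(acronym)


def apply_respellings(script: str) -> str:
    """Replace whole-word matches of each term with its respelling."""
    for term, respelling in RESPELLINGS.items():
        # \b keeps "AI" from matching inside longer words such as "AIM".
        # A function replacement avoids treating the respelling as a regex template.
        script = re.sub(rf"\b{re.escape(term)}\b", lambda _m, r=respelling: r, script)
    return script


print(spell_out("NYC"))                      # N Y C
print(apply_respellings("AI runs on AWS."))  # a-eye runs on a-"double you"-s.
```

Always listen to the result with the audio preview, since a respelling that works for one voice may sound different with another.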
Prosody, breaks and sentence rhythm
If you are having issues with the rhythm of the sentence, try adding commas/periods, quotes or re-arranging the sentence.
To get the result you want, try editing your text in ways that might not be grammatically correct or would not read well as written text.
- Commas will add shorter breaks than a period.
- Periods will add a longer break and downwards inflection.
- "Quotes" will add emphasis to that part of the sentence (for some voices only).
For example, these two versions of the same text will result in different rhythm and pauses:
[ Here’s a demonstration of how a sentence without any breaks or commas at all compares to a sentence that has as you can see the video without can be difficult to follow because there are no breaks or pauses in it. ]
[ Here’s a demonstration of how a sentence, without any breaks or commas at all, compares to a sentence that has. As you can see, the video without can be difficult to follow, because there are no breaks or pauses in it. ]