How do I get audio pronunciation right?

Here are some tips for getting your avatar's voice and pronunciation exactly the way you want them.


The AI system that produces our voices is great, but not always perfect: occasionally it mispronounces words or inserts strange pauses. Below are some quick tips to get your avatar's voice just right.

Quick Tips:

  • 🔈 Use audio previews: type your script and press "Play script" to preview the voice. To listen to a particular part of your script, highlight the text and press "Play selected".
  • 🔠 Use short paragraphs: split long scripts into smaller paragraphs to avoid errors when generating videos.
  • 📝 Spell words correctly: make sure your script has no spelling mistakes.
  • 🚫 Don't mix languages: e.g. don't use English words in a Spanish script.
  • ℹ️ Language detection problems: if the AI system doesn't recognise your chosen language (e.g. your script is in Croatian but Bosnian is detected instead), type a few typical Croatian words at the start of your script to help the system recognise it. Once the right language is detected, delete those words.
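If you write long scripts in a text editor before pasting them into Synthesia, you can split them into short paragraphs automatically. Below is a minimal Python (3.9+) sketch, not a Synthesia feature: the function name `split_into_paragraphs` and the paragraph size are our own assumptions, and the simple splitter treats every period, exclamation mark or question mark followed by a space as the end of a sentence, so abbreviations will also cause a split.

```python
import re


def split_into_paragraphs(script: str, max_sentences: int = 3) -> list[str]:
    """Group a script's sentences into paragraphs of at most `max_sentences`.

    Hypothetical helper: a sentence is assumed to end with '.', '!' or '?'
    followed by whitespace, so abbreviations such as "e.g." also split.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


script = (
    "Welcome to the course. Today we cover safety. First, wear gloves. "
    "Second, wear goggles. Questions? Ask your manager."
)
# Separate the paragraphs with a blank line before pasting them into the editor.
print("\n\n".join(split_into_paragraphs(script, max_sentences=2)))
```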

More Voice Improvements:

  • 💬 Insert breaks if needed: you can add extra pauses to the script by inserting break tags, for example <break time="2s" />. See the section below for more info.
  • ✍️ Use punctuation marks: a script without proper commas and periods will sound rushed and hard to follow. Use periods, commas, hyphens and question marks to help our AI system sound the way you want it to. More on this below.
  • 🗣️ Fix pronunciation of words, acronyms and numbers if needed: it's sometimes useful to split a word with a hyphen to help our AI system pronounce it correctly, for example writing "con-tent" instead of "content". More tips on this below.


🙋 Most importantly, use breaks and hyphens: improving voice results is largely a matter of using periods and commas creatively, and sometimes rearranging sentence structure. We highly recommend getting used to inserting break tags and influencing the pronunciation of specific words by splitting them with hyphens.

Adding breaks to the video

Our voices support SSML (Speech Synthesis Markup Language). SSML has quite a few different tags, but for now the most important one is the break tag, which instructs the voice to pause.

Wherever you want an extra break in your text, insert the following tag (2s is just an example; you can specify the time in seconds or milliseconds):

<break time="2s" />

💡 The break can be up to 5 minutes long.
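If you generate scripts programmatically, a small helper can build break tags for you and enforce the 5-minute limit. This is a minimal Python sketch under our own assumptions: the helper name `break_tag` is hypothetical, it takes the pause in milliseconds, and it writes whole seconds with "s" and everything else with "ms", following the two units mentioned above.

```python
MAX_BREAK_MS = 5 * 60 * 1000  # the 5-minute limit stated in this article


def break_tag(milliseconds: int) -> str:
    """Return an SSML break tag for a pause of `milliseconds` (hypothetical helper)."""
    if not 0 < milliseconds <= MAX_BREAK_MS:
        raise ValueError("a break must be longer than 0 ms and at most 5 minutes")
    if milliseconds % 1000 == 0:
        # Whole seconds are written in seconds, e.g. 2000 -> "2s".
        return f'<break time="{milliseconds // 1000}s" />'
    # Anything else is written in milliseconds, e.g. 50 -> "50ms".
    return f'<break time="{milliseconds}ms" />'


print(f"That's the end of part one.{break_tag(2000)}Now, part two.")
```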

For example, say I have the following text in Synthesia:

Hey John! How are you doing today?

Let's say I'm not happy with the default pause after "John!". Breaks are especially useful for separating sentences more clearly.

I can simply insert the following markup to add a break:

Hey John!<break time="50ms"/>How are you doing today?
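The same edit can be scripted. The hypothetical helper `add_break_after` below inserts a break tag right after the first occurrence of a phrase. It is a sketch that assumes a plain-text script, and it drops the whitespace after the phrase, as in the example above.

```python
def add_break_after(text: str, phrase: str, time: str = "50ms") -> str:
    """Insert an SSML break tag right after the first occurrence of `phrase`.

    Hypothetical helper: whitespace following the phrase is removed.
    """
    index = text.find(phrase)
    if index == -1:
        raise ValueError(f"{phrase!r} not found in the script")
    end = index + len(phrase)
    return f'{text[:end]}<break time="{time}"/>{text[end:].lstrip()}'


script = "Hey John! How are you doing today?"
print(add_break_after(script, "John!"))
# Hey John!<break time="50ms"/>How are you doing today?
```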

Correcting pronunciation

Pronouncing company names, acronyms, business terms or slang can sometimes be difficult for the AI because they can be ambiguous. Getting the pronunciation right is a matter of inserting hyphens or spelling the word phonetically.

Words: try inserting hyphens to make the word sound the way you want. Example:

  • [ Content → con-tent ]
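If the same words trip up the voice across many scripts, you can keep your respellings in one place and apply them before pasting a script in. A minimal Python sketch, assuming a hypothetical `RESPELLINGS` dictionary that you fill in yourself: it replaces whole words only, ignores case when matching, and does not preserve the original capitalization.

```python
import re

# Hypothetical respellings; add the words your voice mispronounces.
RESPELLINGS = {
    "content": "con-tent",
    "desert": "de-zert",
}


def respell(script: str, respellings: dict[str, str]) -> str:
    """Replace whole words (matched case-insensitively) with their respellings.

    The replacement is inserted exactly as written, so "Content" at the start
    of a sentence becomes "con-tent".
    """
    for word, spelling in respellings.items():
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        script = pattern.sub(spelling, script)
    return script


print(respell("Our content team crossed the desert.", RESPELLINGS))
# Our con-tent team crossed the de-zert.
```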

Alternatively, you can help the system by spelling words phonetically. You can read more on this below.


Acronyms: if you want an acronym to be pronounced like a word, try spelling it the way it should sound. Examples:

  • [ AI → a-eye ]
  • [ AWS → a-"double you"-s ]
  • If you want an acronym to be pronounced letter by letter, put spaces between the letters: [ NYC → N Y C ]
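Spacing out acronyms can be automated the same way. The hypothetical `spell_out_acronyms` helper below is a sketch, not a Synthesia feature: it assumes you list the acronyms yourself and matches them case-sensitively as whole words, so longer words that merely contain the same letters are left alone.

```python
import re


def spell_out_acronyms(script: str, acronyms: set[str]) -> str:
    """Put spaces between the letters of each listed acronym (NYC -> N Y C).

    Hypothetical helper: acronyms are matched case-sensitively as whole words.
    """
    for acronym in acronyms:
        pattern = rf"\b{re.escape(acronym)}\b"
        script = re.sub(pattern, " ".join(acronym), script)
    return script


print(spell_out_acronyms("Our NYC office opens soon.", {"NYC"}))
# Our N Y C office opens soon.
```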


Numbers: change how you write them depending on how you want them to sound. Examples:

  • [ Ten eighty-nine → 10 89 ]
  • [ Two five eight six → 2 5 8 6 ]
  • [ One hundred and forty-eight → 148 ]
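For numbers such as codes or years, a helper can insert the spaces for you. The sketch below uses a hypothetical `group_digits` function and assumes you pass the digits as a string together with the group sizes you want. How each grouping is finally read aloud depends on the voice, so check the result with an audio preview.

```python
def group_digits(number: str, group_sizes: list[int]) -> str:
    """Split a string of digits into space-separated groups (hypothetical helper).

    `group_sizes` must be positive and add up to the number of digits,
    e.g. [2, 2] for "1089".
    """
    if not number.isdigit():
        raise ValueError("number must contain digits only")
    if any(size <= 0 for size in group_sizes) or sum(group_sizes) != len(number):
        raise ValueError("group sizes must be positive and add up to the digit count")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(number[start:start + size])
        start += size
    return " ".join(groups)


print(group_digits("1089", [2, 2]))        # 10 89
print(group_digits("2586", [1, 1, 1, 1]))  # 2 5 8 6
print(group_digits("148", [3]))            # 148
```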

Using punctuation marks

If you're having issues with the rhythm of a sentence, try adding commas, periods or quotes, or rearranging the sentence:

  • Commas add shorter breaks than periods
  • Periods add a longer break and a downward inflection
  • "Quotes" add emphasis to that part of the sentence

For example, these two versions will result in different rhythm and pauses:

[ Here’s a demonstration of how a sentence without any breaks or commas at all compares to a sentence that has as you can see the video without can be difficult to follow because there are no breaks or pauses in it. ]

[ Here’s a demonstration of how a sentence, without any breaks or commas at all, compares to a sentence that has. As you can see, the video without can be difficult to follow, because there are no breaks or pauses in it. ]

Video example of a script read without and with breaks:

Advanced: fix pronunciation by using phonetic spelling

You can sometimes fix a word's pronunciation by spelling it phonetically. Below we've included a handy table to help you replace letters with phonetic alternatives. Example:

  • [ Desert → de-zert ]
