How to Leverage High Quality TTS Voices for a Localization-to-Vocalization Solution

Posted by Doug McGowan on Mon, Mar 27, 2017 @ 05:00 PM

Have you ever been frustrated at having to find and audition narrators for each language, and book the recording studios, only to have an unexpected revision require re-recording at a later date? For e-learning and training videos, the solution may just be Text-to-Speech (TTS). And for languages other than English, that solution might best be sourced from a local entity that really knows the language.

As my colleague Jill Polanycia wrote in her article “Text-to-Speech: Trendy Tech That’s Surprisingly Everywhere,” TTS can take trainings to the next level, since people can perform tasks while listening to instructions instead of having to keep their eyes glued to the screen. And when you consider that a TTS system can churn out in five minutes what it would take a voice talent about eight hours to produce, you could be looking at some major savings in time and money.
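To put that comparison in numbers, here is a minimal sketch using only the two figures quoted above (five minutes of TTS versus about eight hours of voice talent). The function name is made up for illustration, and the result ignores any time spent reviewing or fine-tuning the TTS output.

```python
def speedup(human_minutes: float, tts_minutes: float) -> float:
    """Return how many times faster TTS is than a human recording session."""
    return human_minutes / tts_minutes


# Figures from the paragraph above: about 8 hours of voice talent vs. 5 minutes of TTS.
ratio = speedup(human_minutes=8 * 60, tts_minutes=5)
print(f"TTS is roughly {ratio:.0f}x faster")  # prints "TTS is roughly 96x faster"
```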

Voices that say it like you mean it

TTS from a decade ago sounded quite robotic. Nowadays, TTS voices sound more human (well, some of them do). This is because the voices are built from sampled human speech rather than from purely algorithmic, fully synthetic audio waves.

There are many online sources that let you test-drive their voices with your own text strings, so I went through and tested Acapela Group, NaturalReader, Voice Reader by Linguatec, VoiceText by HOYA, and AITalk. By paying for these services, you get access to more voices as well as the ability to tweak the voices and fine-tune their delivery. But for this article I stuck to what’s free.

The text I fed these TTS engines was a couple of lines from our website: “The world has gone digital and it’s changed the way content is created and distributed. To keep up, your content has to be agile, omni-channel, multi-media, and quick enough to move at the speed of social.” Great message, right? Here are the TTS voices that I found to be closest to human quality, beginning with the ladies.

Top five female English TTS voices

As you probably guessed, this was not a scientific study, and the selections may have been skewed by personal preference. But in our business we've seen that the criteria for what sounds robotic and what sounds human tend to be subjective, so it's always the client that makes the final decision on the voice.

All of the services had great and not-so-great voices, so my top five were spread across providers. Acapela accounted for two, with one each for NaturalReader, Voice Reader, and VoiceText.

Top five male English TTS voices

For some reason, the male voices seemed to show a lot more character. Or was that just because many of the high quality voices were UK English?

Always a strong performer, Acapela provided two of the top five voices that you just heard. Perhaps surprising is that the remaining three were from VoiceText by HOYA, based in Japan, and Voice Reader by Linguatec, based in Germany.

If these companies based in Germany and Japan have a lot of know-how related to their own native languages, which they surely do, one might assume that their voices for those languages are top-notch. And although I cannot speak for German, the Japanese selections from VoiceText and AITalk were indeed comprehensive.

Top five female Japanese TTS voices

The text I had them read was the Japanese localized version of the previous English text: デジタル化が進み、コンテンツの作成や発信の方法にも変化が生じています。この動きに乗り遅れないためには、オムニチャネルとマルチメディアに対応したアジャイルなコンテンツを作成し、世の中のスピードにあわせて提供することが必要です。

Much as expected, all of my Top 5 selections for best female Japanese voice were from Japanese providers—four from VoiceText and one from AITalk. Undoubtedly, AITalk would have been much stronger in a head-to-head comparison of the full version software applications, but as far as their free online samplers go, VoiceText had the upper hand.

Although Acapela’s Sakura was not bad, it was overshadowed by the quality and rich selection offered by the Japanese vendors. Acapela’s voice did find its way into the humanoid robot Kokoro being tested at Narita Airport, but the voice used is the US English voice Sharon (one of my Top 5) while the Japanese voice is provided by AITalk.

Image: the humanoid robot Kokoro

Source: http://www.acapela-group.com/humanoid-assistance-test-kokoro/

Top five male Japanese TTS voices

This was not by design, but the Top 5 selections for best male Japanese turned out to be four from VoiceText and one from AITalk, exactly as it was for the female voices.

Perhaps I should mention that the voice named SHOW by VoiceText has a slightly non-standard intonation, but that is because its fictitious persona is said to have come from Kumamoto Prefecture. It is used for the narration of the TV show Moya-Moya Summers, and its slightly off intonation adds a unique accent to the show (pun intended).

Many other possibilities

A number of TTS providers do not offer online samples but have downloadable software you can try for free, such as Balabolka, AudioBookMaker, and NaturalReader (which also offers online sampling).

Depending on your particular needs, you may wish to go all the way and integrate with IBM Watson (which uses a male voice for English and a female voice for Japanese), or try Amazon Polly.
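For a sense of what such an integration might involve, here is a minimal sketch for Amazon Polly using the boto3 SDK. It is an illustration rather than a recommended setup: the `VOICES` mapping, the function names, and the output file names are assumptions made for this example, and the call itself needs an AWS account with credentials already configured.

```python
from pathlib import Path

# Illustrative mapping (an assumption for this sketch): language -> Polly voice ID.
VOICES = {"en-US": "Joanna", "ja-JP": "Mizuki"}


def build_request(text: str, language: str) -> dict:
    """Return the keyword arguments for Polly's synthesize_speech call."""
    return {
        "Text": text,
        "OutputFormat": "mp3",
        "VoiceId": VOICES[language],
    }


def synthesize(text: str, language: str, out_path: Path) -> None:
    """Send one localized string to Polly and save the returned audio as MP3."""
    import boto3  # imported here so build_request works without the AWS SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, language))
    out_path.write_bytes(response["AudioStream"].read())
```

With credentials in place, a call such as `synthesize("The world has gone digital.", "en-US", Path("intro_en.mp3"))` would write one audio file per localized string, so a revised script only means running the call again rather than rebooking a studio.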

The internet is a wonderful place—so look around (like I did) to see what solutions are available for you. If you haven’t done so already, check out these great articles here and here by eLearning Industry.

Almost forgot…

Acapela’s rich selection of voices includes a number of unique offerings that can spice up your projects in interesting ways.

And if you want your voices not only to talk but to sing too, you can go to VoiceText and have digital voice actors (seiyu) sing tunes for you. By the time we reach Vocaloid by Yamaha, we will have veered completely off topic, but in a sense we are talking about the same technology.

There was a time when human singers sang to digital music. Now you can watch projected digital characters sing in their digital voices, backed by a live band, in front of the most enthusiastic live audiences.

Digital voices never grow old, nor do they catch cold. They are reliable, dependable, and capable of reproducing identical quality indefinitely. (Ever have a voice talent show up for a follow-up job sounding completely different? No such problem with digital voices.)

To some, toony digital characters in concert may be a bit too far out there, as we can see in these reactions from kids and elders. But whatever your gig, the technology is out there and ready for you to leverage it to your advantage. If you’re looking to localize, why not go a step further and vocalize?

Topics: Localization