Microsoft's Leaps in Speech Recognition Boost Entire Language Sector

Posted by Libor Safar on Fri, Aug 25, 2017 @ 11:47 AM

Speech Recognition

Language technology lovers have cause for celebration this week. Microsoft announced that its conversational speech recognition technology has reached parity with professional human transcribers. Its 5.1 percent error rate represents a roughly 12 percent reduction in errors compared with last year's result. It also sets a new industry standard and is expected to benefit a wide range of Microsoft business services, including those in the translation space.
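Conversational speech recognition results on benchmarks such as Switchboard are conventionally reported as word error rate (WER). WER counts the word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, then divides that count by the number of words in the reference. The Python sketch below is our own illustration, not Microsoft's scoring pipeline; the function names are hypothetical. It computes WER with a word-level edit distance. It also shows that "12 percent" describes a relative reduction rather than percentage points. Assuming both years were measured the same way, a 12 percent relative cut that ends at 5.1 percent implies a starting rate of roughly 5.8 percent.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def implied_previous_rate(new_rate: float, relative_reduction: float) -> float:
    """Hypothetical helper: the earlier error rate implied by a relative reduction."""
    return new_rate / (1.0 - relative_reduction)


if __name__ == "__main__":
    # One deleted word out of six reference words: WER = 1/6
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # 5.1 percent after a 12 percent relative cut implies about 5.8 percent before
    print(round(implied_previous_rate(5.1, 0.12), 1))
```

Using relative reduction is the usual convention in speech recognition reporting. Once error rates fall into the single digits, small absolute gains become large relative ones: the change here is less than one percentage point.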

According to Xuedong Huang, chief speech scientist of Microsoft's Speech and Dialog Research Group, one star of this success story is Microsoft Cognitive Toolkit 2.1. The toolkit, distributed free on GitHub under an open-source license, is built for processing massive datasets. In this case, it was used to tackle Switchboard, a dataset of 260 hours of recorded American English telephone conversations. The conversations were collected by Texas Instruments in 1990 and 1991 and have since been made available to a wide variety of industry and academic speech recognition projects.

A number of Microsoft products have already benefited from the research group's work. Among them is Presentation Translator, launched just this July. Presentation Translator is a PowerPoint add-in powered by the Microsoft Translator live feature. It translates live presentations and outputs the result as slide subtitles, covering ten spoken languages (Arabic, Mandarin Chinese, English, French, German, Italian, Japanese, Portuguese, Russian, and Spanish) and 60 supported text languages. In addition, English and Chinese speakers can customize the speech input to handle their industry-specific jargon and terminology. According to Microsoft, this boosts accuracy by as much as 30 percent.

Getting started with Presentation Translator. Source: Microsoft Research

As Huang notes in his blog post on the group’s achievement, such improved accuracy in conversational speech recognition comes with some caveats. Heavily accented speech, multilingual and multi-party conversations, and even noisy background environments continue to challenge the technology. Additionally, as machine translation users can readily attest, not all languages are as well supported as the world’s most spoken languages.

Nevertheless, this and other advances in speech recognition hold real promise for translation and localization customers. Global players, including Microsoft, Apple, and Google, are combining AI, deep learning, and machine translation engines. Their aim is to deliver multilingual products and services, along with the accompanying multilingual marketing, seamlessly to business customers worldwide.

End consumers are benefiting too. These speech recognition systems power intelligent virtual assistants (IVAs) such as Microsoft's Cortana (for Windows 10), Apple's Siri, and Amazon's Alexa, which are making their way into a growing number of homes.

Just last week, in fact, Amazon announced the launch of the Alexa Voice Service Device SDK, opening Alexa to outside developers. Mozilla also recently announced Common Voice, a project that invites volunteers to contribute to an open-source voice recognition system intended as a non-proprietary alternative. According to research firm Global Market Insights, the global, multilingual IVA market will exceed USD 7.5 billion by 2024. The firm expects this growth to be driven (unsurprisingly) by advances in voice recognition technology and by expanding mobile technology markets worldwide.

Getting started with the Alexa Voice Service Device SDK. Source: Amazon Alexa Developers

Whether developed for the private or the public sector, and whether used in our workplaces or our cars, advances in speech recognition technology are set to transform multilingual markets worldwide. Kudos to the Microsoft research team for their contribution.

Topics: Localization Technology
