Google’s decision to open up its 80-language Cloud Speech API to developers is considered by some to be a direct challenge to Nuance, the market leader in speech recognition and voice services. But there may be another reason for the announcement, driven by the success of the Amazon Echo last year. Whatever the motive turns out to be, the announcement has certainly ignited discussion about the merits and capabilities of voice user interfaces.
Amazon’s Alexa Voice Services (AVS) came to public attention last year following the launch of the Amazon Echo, a breakthrough product in a completely new category, the smart speaker. The Echo is an embedded device that handles voice capture, microphone and voice DSP, and encapsulation of signals that are sent to the AVS servers as an MP3 file over a “thin-client” interface. Speech recognition and access to supporting services happen in the cloud, where Alexa sends back MP3-encoded voice replies to the Echo for playback.
While the Echo was the first consumer implementation of AVS, Amazon clearly wants developers to use the Alexa Skills Kit, a collection of APIs, to develop Alexa-based, voice-controlled domestic devices for everything from home automation to AV systems, and even white goods. New Alexa-aware products are appearing every week. As AVS terminals proliferate, Amazon will gain point-of-presence in the home through which profitable goods and services may be delivered.
The smart speaker and point-of-presence opportunities have not gone unnoticed in China. Baidu’s Duer, iFlytek’s Ding Dong, and Rokid’s beautifully designed units have all appeared in recent months. However, with the exception of Baidu, none of these companies yet has an end-to-end service offering or Amazon’s ecosystem support.
Until now there’s been no announcement from Google, so the new API strategy is significant: it looks like the Internet giant realizes that it needs to catch up. Established speaker manufacturers Bose and Sonos face the same task, since the Amazon Echo captured 26% of online sales in the wireless speakers market in 2015, according to 1010data.
If Google just wants to challenge Nuance, the APIs are a step in the right direction, but it seems more likely that Google wants to create an ecosystem to rival Amazon’s AVS, or it risks losing significant market opportunities. It will take considerable investment to replicate the Alexa Skills Kit, and there’s currently no access for embedded devices to the Cloud Speech API, but a control hub in people’s homes is a crucial prize.
For providers of embedded voice recognition and control devices, Google’s announcement looks set to give the market a significant boost, particularly if the intention is to drive ecosystem development. But to accelerate the impact, Google now needs to open up embedded device access to its cloud speech services, too. We await that announcement with bated breath.
Huw Geddes, Director of Marketing for XMOS, has an extensive background in the delivery of technology to designers, developers, and engineers. Prior to joining XMOS as an Information/Documentation Manager, Huw worked as Technology Transfer Manager at Superscape, and Technical Author at VideoLogic. He also has a strong background and interest in fine art and exhibition management.