Sound is getting smarter, from voice activation to spatial audio

October 17, 2017 Moshe Sheier, CEVA

Thanks to the explosion of smart speakers and other voice-activated devices, you’re likely comfortable talking to your devices. What about using language-translating earphones while traveling abroad? These aren’t so common yet, but Google just made that possible too with its Pixel Buds.

In a recent post, I reviewed the second wave of smart speakers, characterized by efficient production and mass shipment. As I speculated there, this phase will put pressure on the market leaders to innovate and create new product categories, raising the bar further for the rest of the pack. That’s exactly what we’ve been seeing this month, with a slew of new audio and voice products announced by Google, Amazon, Apple, and others.

[Google's new earbuds offer real-time translation in 40 different languages.]

The Google Home was introduced last year as an answer to the enormously successful Amazon Echo. Now, Google is expanding the line to include answers to the Echo Dot, the Apple HomePod, and the Apple AirPods. This brings the voice-activated Google Assistant to more use cases and a wider range of budgets.

The new pin-cushion-shaped Google Home Mini is a smaller and cheaper alternative to the Google Home. For those who want better sound and are willing to spend more, the Google Home Max delivers a high-end sound experience with all the functionality of the Google Home. The new Pixel Buds offer an on-the-go experience: like the Apple AirPods, they pair with a smartphone over Bluetooth audio streaming to create an in-ear personal assistant. One of the most intriguing features of these buds, as you can see in the video, is the integration of Google Translate to offer in-ear simultaneous translation. Google says that it’ll support 40 different languages, a number that will undoubtedly grow as the product evolves.

[Figure 1 | From top to bottom: AirPods vs. Pixel Buds, Echo Dot vs. Home Mini, and HomePod vs. Home Max.]

Another exciting development in voice activation is that the new GoPro Hero6 action camera can be turned on with a voice command. This is GoPro’s second-generation device with a voice interface, which was first introduced in the GoPro Hero5. Most of the voice commands, such as “GoPro start recording” and “GoPro stop recording,” aren’t new.

The innovation of the new model is that you can use your voice to power on the camera by saying “GoPro turn on.” The feature is optional, can be enabled in the settings, and stays active for eight hours after the camera is powered off. While this isn’t a fully always-listening device, it’s a major step in that direction. The voice-activated GoPro is another example of something we’ve been envisaging for a while: that voice will become the main user interface. Voice is the most natural and intuitive interface for human-machine interaction, and soon all our devices will be always-listening, awaiting our command.
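The wake-window behavior described above can be sketched in a few lines of Python. This is only an illustration of the logic, not GoPro’s firmware: the function name `should_power_on`, the `feature_enabled` flag, and the exact phrase matching are assumptions made for the example. Only the eight-hour window and the “GoPro turn on” phrase come from the description above.

```python
from datetime import datetime, timedelta

# Taken from the article: the wake phrase and the eight-hour listening window.
WAKE_PHRASE = "gopro turn on"
WAKE_WINDOW = timedelta(hours=8)


def should_power_on(heard_phrase, powered_off_at, now, feature_enabled=True):
    """Hypothetical check: wake the camera only if the optional feature is
    enabled, the window since power-off has not expired, and the phrase matches."""
    if not feature_enabled:
        return False
    elapsed = now - powered_off_at
    if elapsed < timedelta(0) or elapsed > WAKE_WINDOW:
        return False  # outside the listening window
    return heard_phrase.strip().lower() == WAKE_PHRASE


if __name__ == "__main__":
    off = datetime(2017, 10, 17, 8, 0)
    print(should_power_on("GoPro turn on", off, off + timedelta(hours=2)))
    print(should_power_on("GoPro turn on", off, off + timedelta(hours=9)))
```

Limiting the listening window is one plausible way to trade convenience against battery drain, which is why the device falls short of being fully always-listening.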

Cars are next for Alexa, then your face

Amazon is making that vision a reality, and is still a step ahead of the “newcomer” (Google) in the smart speaker market. Ahead of the Google announcement, Amazon unveiled several new Echo products, including some sleekly designed Echo models and a smart alarm clock called the Echo Spot, which continues the line of screen-integrated models started by the Echo Show. Amazon also revealed that Alexa will soon be integrated into BMW 2018 models. This might be the first step toward Alexa becoming a favorite road trip companion, after powering the most popular in-home smart speaker.

More interesting than anything that was officially announced is the speculation about an upcoming Amazon glasses product. Rumors are rampant about Alexa-based, hands-free, battery-powered glasses. Per the speculation, the glasses would not offer visuals, but would serve as a wearable that lets the user speak to Alexa anytime, anywhere. This would be a huge step forward for Amazon, both entering wearables and becoming always-on. It’s obvious that the power outlet must be eliminated to unleash the full potential of voice assistants, and the technology is here to implement it.

Spatial audio can make or break virtual/augmented reality

Having Amazon and Google enter this market will directly lead to an overall improvement in hearables. There are already many interesting concepts in this category, such as a Kickstarter project for an A.I. personal trainer named Vi, which learns the user’s biometrics and customizes training to achieve athletic goals. Now, imagine adding multi-dimensional spatial audio to create the impression that Vi is ahead of or behind the user, giving an extra motivational boost to break a personal record (like in this patent application from a decade ago).

[Figure 2 | There’s a lot of hype about visuals in AR/VR applications, but for an immersive experience, audio is crucial.]

That’s the idea behind the latest audio innovation in Apple’s newest iPhones. For the first time, the iPhone 8 includes dual speakers. This is a big deal because dual speakers enable multi-dimensional spatial audio, which means that Apple is betting on augmented and virtual reality (AR/VR). To create an immersive AR/VR experience, spatial audio is a must. Otherwise, all the amazing graphics won’t be enough to make the experience convincing.
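To see why a second speaker matters, consider the simplest positional effect two channels allow: constant-power stereo panning, sketched below in Python. This is far simpler than real spatial audio rendering, which typically relies on head-related transfer functions, and it is not a description of what Apple does. The function name `constant_power_pan` and the pan range of -1 (full left) to 1 (full right) are assumptions made for the illustration.

```python
import math


def constant_power_pan(pan):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    Uses the constant-power pan law: the gains follow cos/sin of an angle
    between 0 and pi/2, so left**2 + right**2 stays equal to 1 and the
    perceived loudness remains roughly constant as the sound moves.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    for position in (-1.0, 0.0, 1.0):
        left, right = constant_power_pan(position)
        print(f"pan={position:+.1f} left={left:.3f} right={right:.3f}")
```

With a single speaker there is nothing to balance, so even this basic left-to-right placement is impossible.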

Next is neural-network-enabled sound sensing

What’s the next step for audio? The iPhone 8 and X already include a dedicated neural network engine. For hearables and voice-activated devices, neural networks could be used for sound sensing and audio analytics. This is already used in the home to identify certain sounds, like a doorbell or breaking glass, and trigger the appropriate response. In hearables, this could improve the safety of personal trainers like Vi and of other immersive AR/VR applications. The neural nets could sense important sounds, such as sirens or barking dogs, let them through, and notify the user. This way, neither safety nor enjoyment needs to be compromised for the ultimate user experience.
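The pass-through idea can be illustrated with a short Python sketch of the decision that would sit behind a sound classifier. It assumes a neural network has already labeled an audio frame. The `Detection` type, the label names, and the 0.8 confidence threshold are all hypothetical choices for the example, not part of any particular product.

```python
from dataclasses import dataclass

# Hypothetical labels for sounds that should always reach the user.
PASS_THROUGH_LABELS = {"siren", "dog_bark"}
# Hypothetical threshold below which a detection is ignored.
MIN_CONFIDENCE = 0.8


@dataclass
class Detection:
    label: str         # class name produced by an (assumed) sound classifier
    confidence: float  # classifier score between 0.0 and 1.0


def handle_detection(detection):
    """Return (pass_through, notification) for one classifier result."""
    is_important = detection.label in PASS_THROUGH_LABELS
    if is_important and detection.confidence >= MIN_CONFIDENCE:
        name = detection.label.replace("_", " ")
        return True, f"Heads up: {name} detected nearby"
    return False, None


if __name__ == "__main__":
    print(handle_detection(Detection("siren", 0.93)))
    print(handle_detection(Detection("music", 0.99)))
```

Keeping the policy separate from the classifier means the list of important sounds could be tuned per application, for example a running trainer versus a VR game, without retraining the network.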

If you’re looking for more information on how to design an ultra-low-power, always-listening voice interface, check out CEVA’s audio, voice, and speech products and resources.

Moshe Sheier is the Director of Strategic Marketing at CEVA. He oversees corporate development and strategic partnerships for the company’s core target markets and future growth areas.
