Low power solutions for always-on, always-aware voice command systems, part 1

By Paul Beckmann

DSP Concepts

May 07, 2018

Explore new applications, techniques, hardware, and software that will make always-on voice command systems possible.

This is part one of a three-part series. Catch part two next week.

Recent advances in hardware and software have made it possible for compact, battery-powered products to include always-on voice command systems, which have already proven their reliability and appeal in tens of millions of smart speakers. This paper describes new applications, techniques, hardware, and software that will make these products possible.

What is always-on voice command?

In an always-on voice command system, the user’s voice activates – or “wakes up” – the system using a specific wake word (also called a trigger word), so it can then respond to the voice commands. The user does not have to push a button to make the system active; it is always “listening” for the wake word. This is how most current “smart speakers,” such as the Amazon Echo, Apple HomePod and Google Home, work.
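The wake-word behavior described above can be pictured as a two-state machine. The sketch below is only an illustration: it uses plain text strings in place of audio, and the class name `AlwaysOnListener`, the `hear` method, and the example wake phrase are hypothetical. A real product would run a keyword-spotting model on microphone audio rather than compare strings.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()  # idle: only the wake word is acted on
    AWAKE = auto()      # wake word heard: the next utterance is a command


class AlwaysOnListener:
    """Toy model of an always-on system (hypothetical, text stands in for audio)."""

    def __init__(self, wake_word: str) -> None:
        self.wake_word = wake_word.strip().lower()
        self.state = State.LISTENING
        self.commands: list[str] = []

    def hear(self, utterance: str) -> None:
        text = utterance.strip().lower()
        if self.state is State.LISTENING:
            # Everything except the wake word is ignored while idle.
            if text == self.wake_word:
                self.state = State.AWAKE
        else:
            # Accept exactly one command, then go back to idle listening.
            self.commands.append(text)
            self.state = State.LISTENING
```

Feeding it "play music", the wake phrase, "play music", and "stop" in that order records a single command, the "play music" spoken after the wake phrase. The other utterances arrive while the system is idle and are ignored.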

Consumers vastly prefer always-on voice command to push-button voice command because the user’s hands remain free, and the user does not need to interact physically with the device. Always-on systems have, to date, been implemented mostly in devices designed for home use and powered from an AC wall outlet. However, recent technological developments have made it possible to add always-on voice command to portable/mobile products powered by batteries.

[Figure 1 | Power consumption of an Ambiq Micro MCU during Wake on Sound and keyword processing.]

Applications for always-on voice command in portable products

Because the concept of using always-on voice command in portable and battery-powered products is new, the applications for this technology are just starting to emerge. Possibilities include:

Hearables: Always-on voice command allows users to start and stop playback of audio program material; to select the material; to skip or repeat music tracks; to answer phone calls; and to access personal assistant features when the headphone or headset is tethered (via Bluetooth) to an Internet-connected smartphone. Because no button push is required, the user’s hands remain free for other tasks, making a voice-command headphone ideal for sports/fitness and office work or as an in-ear personal assistant.

Remote controls: Most current voice-command remote controls require the user to push a button to wake the system before speaking a command. Many require the user to hold the remote close to his or her mouth. An always-on system would allow the user to access the remote’s functions when the remote is out of reach (or even misplaced).

Smart home devices: Equipping an on-wall control panel with always-on voice command would allow the user to control home systems (HVAC, lighting, security, etc.) from anywhere within voice range, instead of having to go up to the panel or pull out a smartphone and call up the necessary control app. This can also reduce cost by eliminating the need for an expensive touchscreen. Battery power is desirable in these products because it eliminates the need to hire an electrician to run dedicated power lines, and it opens up voice command to a wider variety of smart home products—such as auto-opening trash cans, and voice-controlled shades and drapes.

Automotive: Always-on voice command frees the driver from having to feel around for the wake button currently used in most automotive voice-command systems, thus potentially offering safer vehicle operation. It also allows other vehicle occupants to access the system, triggering such functions as remote operation of the rear hatch.

Wearables: Always-on voice command systems can benefit many types of wearables. In a fitness tracker or collar-mounted device, always-on voice command would allow a user to control the device while running or walking, without having to fumble for the controls. Voice command is especially practical in smaller products, which may not have room for a sizeable display and control buttons. A small wearable device could serve as a clip-on personal assistant, or as an interface with a smart speaker or other device that is out of voice range. Data could be communicated via Bluetooth to a tethered phone, or through WiFi to a local network.

Challenges for always-on voice command in portable products

There are many good reasons why voice command has not, to date, been implemented in many portable and battery-powered products. The challenges are considerable.

Power consumption: For a voice command system to be able to receive commands at any time, it must be active at all times. This is no problem for smart speakers plugged into AC power, but for battery-powered products it can be a big problem—especially when battery run time is one of the primary concerns of consumers buying portable tech products, and when engineers must often minimize battery size in order to maintain a compact form factor.

In a voice-command system, at least one microphone must always be active, and the processor tasked with recognizing the wake word must also be active. In larger systems, some of these functions may be isolated to special-purpose components, allowing most of the device’s other components to be powered down when the device is idling. Smaller portable products tend to rely on a system-on-a-chip (SoC), in which a single component performs almost all of the device’s functions, so there may be few or no inactive components that can be shut down.

Battery life expectations: As noted above, battery life is often a primary concern for tech consumers. Most will expect to get at least a full day’s use (8 hours) from a product without recharging or replacing batteries. Most active headphones and earphones now run for 18 to 20 hours; even inexpensive models can usually manage 10 hours. While some of the latest wearables, such as “true wireless” earphones and clip-on wireless speakers, have battery run times in the range of 5 hours, manufacturers are under pressure from consumers and reviewers to improve this performance.

The challenges are even greater in voice command products that serve as control interfaces. Consumers currently expect the batteries in a remote control to last 6 months to a year; even remotes with rechargeable batteries and charging bases need to last at least a couple of weeks on a charge, as the remote will often be left on the couch instead of being returned to the charging base. On-wall control panels for smart home systems tend to run for about a year (or even two) on a set of AA or AAA batteries; it’s unrealistic to expect consumers to change these batteries frequently, and impractical to use rechargeable batteries in an on-wall control panel.

Questionable Internet connection: While home products can rely on a nearly always-present Internet connection, allowing most voice recognition processing to be off-loaded to external servers, portable products cannot. Most such products need to be tethered to a smartphone through Bluetooth Low Energy to achieve an Internet connection, and in many locations, cellular data connections are unreliable or even impossible.

Because of the unreliability of Internet connections in portable applications, portable products using voice command must recognize and process a small vocabulary of voice commands on their own, without help from external servers. This requirement demands more powerful processing, while also limiting the functions that can be controlled through voice command.

Form factor: The compact size of most portable products may demand compromises in the number of microphones used in an array, and may also force engineers to position microphones in a way that compromises their performance and makes precise matching of response and sensitivity of multiple microphones difficult or impossible.

The form factor of wearables and other compact portable products also forces product designers to choose smaller batteries, which offer less capacity. For example, a typical AA alkaline battery might offer 3000 milliamp-hours (mAh) of capacity, while a CR2032 lithium “coin cell” of the type used in many tiny tech products offers only 220 mAh. This means that a product drawing 10 mA (or 10,000 µA) will run for only 22 hours when powered by a CR2032 (220 mAh ÷ 10 mA).
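The battery arithmetic above can be turned around to estimate a current budget for a target battery life. The following is a minimal sketch under idealized assumptions: a constant current draw and the full rated capacity being usable, with no allowance for self-discharge, cutoff voltage, temperature, or pulsed loads. The function names are hypothetical; the capacity figures are the ones quoted in the text.

```python
def run_time_hours(capacity_mah: float, draw_ma: float) -> float:
    """Ideal run time in hours: rated capacity divided by a constant draw."""
    if draw_ma <= 0:
        raise ValueError("draw_ma must be positive")
    return capacity_mah / draw_ma


def max_average_draw_ua(capacity_mah: float, target_hours: float) -> float:
    """Largest constant draw, in microamps, that still lasts target_hours."""
    if target_hours <= 0:
        raise ValueError("target_hours must be positive")
    return capacity_mah / target_hours * 1000.0


CR2032_MAH = 220.0       # coin-cell capacity quoted in the text
AA_MAH = 3000.0          # AA alkaline capacity quoted in the text
HOURS_PER_YEAR = 24 * 365

if __name__ == "__main__":
    print(run_time_hours(CR2032_MAH, 10.0))             # 22.0
    print(max_average_draw_ua(AA_MAH, HOURS_PER_YEAR))
    print(max_average_draw_ua(CR2032_MAH, HOURS_PER_YEAR))
```

Under these assumptions, a single AA cell lasting one year allows an average draw of roughly 342 µA, and a CR2032 lasting one year allows only about 25 µA. That gap is why the always-on portion of the system has to operate in the microamp range.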

Environmental factors: Portable products are exposed to far more challenging environments than home products. Wearable products must be at least sweatproof (requiring an ingress protection rating of IPx5), and portable products intended for rugged outdoor use are expected to be fully immersible (a rating of IPx7). The seals required to achieve these ratings may impair the function of microphones and place limitations on the configuration of microphone arrays.

Paul Beckmann, PhD, is Founder/CTO of DSP Concepts. He has extensive experience developing audio products and implementing numerically intensive algorithms. Paul spent 9 years at Bose Corporation, where he developed the first Lifestyle digital home theater product and was awarded the “Best of What’s New” award from Popular Science for contributions made to the Videostage decoding algorithm. Paul was tasked by Dr. Bose to charter Bose Institute with industry courses on digital signal processing, and holds a variety of patents in signal processing techniques. He received BS and MS degrees in Electrical Engineering from MIT in 1989 and a PhD in Electrical Engineering, also from MIT, in 1992. He was a Rockwell Fellow while in graduate school.

Aaron Grassian is Vice President of Marketing at Ambiq Micro. Aaron is an international business development executive with experience in building, organizing, and managing global teams and channel partners for both public and private technology companies. He has repeated success in driving ultra-low power solutions through multiple channels, targeting and winning initial designs, and quickly growing revenue through customer engagements and partnerships. His career, which started in product marketing at Motorola, includes 15 years of experience in building and managing Asia sales for startup companies including SigmaTel, Luminary Micro (acquired by Texas Instruments), and Calxeda. He has global customer experience including managing worldwide distribution for SigmaTel while being stationed in Hong Kong. Aaron holds a BSEE from the University of Florida.

Matt Crowley is CEO at Vesper. Matt’s passion for building great teams to bring disruptive technologies to market is fully evident at Vesper – which introduced its first product to market nearly five times faster than the industry average. Prior to joining Vesper, Matt was founder and vice president of business development at Sand 9. At Sand 9, Matt pioneered the mass commercialization of piezoelectric MEMS devices and led partnerships with industry leaders such as Intel, Ericsson, Analog Devices and CSR. Analog Devices acquired Sand 9 in 2015. Sand 9 was spun out of Boston University Office of Technology Development, where Matt was responsible for evaluating new inventions, forming new companies to commercialize those technologies and managing a venture capital program. Before joining BU, Matt worked at Mars & Co strategy consulting, where he advised Fortune 500 companies on operational and strategic issues. Matt received an interdisciplinary degree in Physics and the Philosophy of Science from Princeton University. He is fluent in Japanese and has lived in Japan. 
