Voice rec is terrifying

November 15, 2017 Brandon Lewis

In a world obsessed with Internet privacy, it’s surprising how little we talk about always-listening devices like the Amazon Echo. After all, a company that wants to learn intimate details about your life in order to sell you more stuff has a microphone permanently fired up in your kitchen.

If you own an Echo and weren’t aware of this feature, open up your Alexa app, select the “Settings” menu, and then select “History.” Take a listen. Were all of those recordings intended for the Echo?

I guess privacy is the price of convenience in modern consumerism. And things are about to get a whole lot more convenient.

Cacophonies, cocktail parties, convenience, and Christmas

XMOS is a fabless semiconductor company that spun out of the University of Bristol to focus on voice and music processing ICs. Among those ICs, devices based on the 32-bit xCORE MCU architecture have had notable success in the voice recognition market, delivering 16 programmable cores (partitioned into two tiles of eight cores with a shared address space for each) with DSP functions integrated in the same chip.

XMOS recently parlayed the xCORE architecture into the VocalFusion 4-Mic Dev Kit for Amazon’s Alexa Voice Service (AVS). The kit is designed around the VocalFusion XVF3000 integrated far-field voice processor and four high signal-to-noise-ratio (SNR) MEMS microphones from Infineon (Figure 1). XMOS claims the kit is the first far-field linear microphone array solution available on the market.

Figure 1. The XMOS VocalFusion 4-Mic Dev Kit for Amazon’s Alexa Voice Service (AVS) is based on the XVF3000 integrated far-field voice processor and a linear MEMS microphone array from Infineon.

Range aside, far-field voice processing gets really interesting when it takes on the “cocktail party” problem: situations in which a platform needs to isolate the voice of a single speaker in a noisy environment. At distances of 5 m or more, the VocalFusion 4-Mic Dev Kit uses a combination of acoustic echo cancellation (AEC), adaptive beamforming, dynamic de-reverberation, and automatic gain control (AGC) to isolate and extract the voice signal of a primary speaker. Beyond this is where things start to get spooky.
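To make the beamforming step a little more concrete, here is a minimal sketch of the simplest fixed variant, delay-and-sum, for a four-microphone linear array. It is not XMOS's adaptive algorithm. The function names, the 33 mm microphone spacing, and the 16 kHz sample rate are assumptions chosen for illustration, not specifications of the VocalFusion kit.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in room-temperature air


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Whole-sample arrival delays for a far-field plane wave hitting a
    uniform linear array from angle_deg (0 = broadside, straight ahead)."""
    delays = []
    for m in range(num_mics):
        seconds = m * spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
        delays.append(round(seconds * sample_rate))
    earliest = min(delays)  # shift so the first microphone reached has delay 0
    return [d - earliest for d in delays]


def simulate_array(signal, delays):
    """Hypothetical test helper: build one recording per microphone by
    delaying `signal` by that microphone's delay (zero padded at the start)."""
    return [[0.0] * d + list(signal[:len(signal) - d]) for d in delays]


def delay_and_sum(channels, delays):
    """Undo each channel's delay, then average the aligned channels.
    Sound from the steered direction adds coherently; sound from other
    directions is misaligned and partially cancels."""
    usable = len(channels[0]) - max(delays)
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(usable)
    ]
```

Steering toward a talker amounts to calling `steering_delays` with that talker's angle and passing the result, together with the four microphone channels, to `delay_and_sum`. An adaptive beamformer goes further by updating its weights as the acoustic scene changes, which this sketch does not attempt.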

Earlier this year, XMOS acquired Setem Technologies, Inc. of Boston, MA, which develops blind-source signal separation technology built on massive Fourier transforms. These blind-source separation algorithms mathematically decompose a set of mixed signals into elements of the underlying sources and then reconstruct those sources, either individually or as groups (Figure 2). In voice recognition this can be applied to an individual speaker, or even a conversation.

Figure 2. Setem Technologies, now a part of XMOS, develops blind-source separation algorithms that can be used to isolate a speaker or speakers in noisy environments.
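For a feel of what blind-source separation means in practice, the sketch below unmixes two signals from two mixtures without knowing how they were mixed. It uses a toy kurtosis-based rotation search in the spirit of independent component analysis, which is a different and far simpler technique than the Fourier-based approach attributed to Setem. All function names are hypothetical, and the method assumes exactly two non-Gaussian sources and two instantaneous (echo-free) mixtures.

```python
import math


def _center(x):
    """Subtract the mean so the sequence averages to zero."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def whiten(x1, x2):
    """Decorrelate two zero-mean mixtures and scale each to unit variance."""
    n = len(x1)
    a = sum(v * v for v in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    c = sum(v * v for v in x2) / n
    # Closed-form rotation that diagonalizes the 2x2 covariance [[a, b], [b, c]].
    phi = 0.5 * math.atan2(2 * b, a - c)
    cp, sp = math.cos(phi), math.sin(phi)
    p1 = [cp * u + sp * v for u, v in zip(x1, x2)]
    p2 = [-sp * u + cp * v for u, v in zip(x1, x2)]
    s1 = math.sqrt(sum(v * v for v in p1) / n)
    s2 = math.sqrt(sum(v * v for v in p2) / n)
    return [v / s1 for v in p1], [v / s2 for v in p2]


def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance sequence (0 for a Gaussian)."""
    return sum(v ** 4 for v in y) / len(y) - 3.0


def separate(x1, x2, steps=180):
    """Return two estimated sources, recovered up to order, sign, and scale."""
    z1, z2 = whiten(_center(x1), _center(x2))
    best = None
    for k in range(steps):
        # Rotations repeat (up to order and sign) every 90 degrees.
        theta = (math.pi / 2) * k / steps
        ct, st = math.cos(theta), math.sin(theta)
        y1 = [ct * u + st * v for u, v in zip(z1, z2)]
        y2 = [-st * u + ct * v for u, v in zip(z1, z2)]
        # Independent sources are the least Gaussian outputs, so keep the
        # rotation with the largest total squared excess kurtosis.
        score = excess_kurtosis(y1) ** 2 + excess_kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]
```

Calling `separate(x1, x2)` on two microphone mixtures returns two estimated sources. Real rooms add echoes, delays, and more talkers than this toy model allows, which is part of why production systems work in the frequency domain instead.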

Now, in theory (and perhaps in practice), blind-source separation can be used to isolate the voice frequencies of multiple speakers in a room, and thereby establish a biometric identity for each. As you can imagine, the application of such technology could be widespread, and not just in the sense that Amazon wants to know what every member of your family wants for Christmas. Surveillance, for instance, immediately comes to mind.

This takes us back to the VocalFusion 4-Mic Dev Kit’s linear microphone array. While many platforms such as the Amazon Echo and Google Home use a circular array of omni-directional microphones to provide 360-degree room coverage, a linear array is designed for 180-degree arcs. This is of interest because leaders in the voice recognition space envision a future where the tower-based virtual assistants of today recede into everyday objects like TVs, refrigerators, sofas, walls – you name it.

This future is designed to be ultra-convenient, delivering service by the syllable. But be careful. You probably won’t know who, or what, is listening.


About the Author

Brandon Lewis

Brandon Lewis, Editor-in-Chief of Embedded Computing Design, is responsible for guiding the property's content strategy, editorial direction, and engineering community engagement, which includes IoT Design, Automotive Embedded Systems, the Power Page, Industrial AI & Machine Learning, and other publications. As an experienced technical journalist, editor, and reporter with an aptitude for identifying key technologies, products, and market trends in the embedded technology sector, he enjoys covering topics that range from development kits and tools to cyber security and technology business models. Brandon received a BA in English Literature from Arizona State University, where he graduated cum laude. He can be reached by email at brandon.lewis@opensysmedia.com.
