The idea of delivering an audio product from start to finish in one day may seem ambitious, but that’s what the Audio Engineering Society (AES) set out to do in an all-day seminar at the AES 2016 show in Los Angeles on October 1.
The goal: to design “Speak2Me,” a voice command speaker similar in concept to the Amazon Echo, but with added capabilities and improved sound. Specifically, Speak2Me added a driver array that steers the speaker’s voice-synthesized command responses in a tight beam directly back at the user, and it was designed to understand natural language commands rather than the relatively small set of specific commands that the Echo understands. It was also designed with a larger form factor and better audio components so it could deliver deeper bass and fuller sound than the Echo.
The seminar featured presentations from experts in the many disciplines required to create such a complex product. These included product management, user experience, industrial design, acoustic design, digital signal processing (DSP), natural voice processing, validation and testing, and sourcing and supply. The idea, according to Scott Leslie, chairman of the product development track, was to combine lecture and lab so attendees could “get a learning experience in all aspects of product development.”
The seminar featured live demos of the Speak2Me prototype. Although the seminar explained how to design the product “in one day,” delivering the working prototype of course took much longer; the team collaborated for about three months before AES.
The process for creating Speak2Me began with a product definition created by Leslie. The next step was to come up with a pleasing product look. Designer Myk Lum of LDA created an industrial design that loosely resembled a vase, which was 3D-printed for the prototype presented at the seminar.
With the product’s configuration determined, it was up to acoustical designer Mark Trainer of Minimum Phase LLC and Dr. Paul Beckmann of DSP Concepts to engineer quality audio. Trainer created a driver layout and amplifier specification, then provided Beckmann with a block diagram of the required signal processing functions, as well as the specific settings required for the various filters, limiters, and other processing functions needed. Using Audio Weaver, a set of graphical interface audio DSP tools developed by DSP Concepts, Beckmann implemented Trainer’s signal path design and settings.
Beckmann also used Audio Weaver to improve the performance of the voice-recognition microphone array in Speak2Me, and to implement the product’s most distinctive feature: the beamforming array of six small drivers that used phase processing to beam voice responses from Speak2Me directly at the person who issued the voice command. Directing the responses only at the listener helped limit noise pollution.
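For readers curious how a driver array can aim sound, the idea can be illustrated with a minimal delay-and-sum sketch in Python. This is not the Speak2Me implementation and does not use Audio Weaver. The ring geometry, the 5 cm radius, the speed-of-sound constant, and the function names `ring_positions`, `steering_delays`, and `phase_shifts` are all assumptions made for illustration; only the count of six drivers comes from the seminar.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def ring_positions(num_drivers, radius_m):
    """Place drivers evenly on a circle (assumed layout, not the real one)."""
    step = 2 * math.pi / num_drivers
    return [(radius_m * math.cos(k * step), radius_m * math.sin(k * step))
            for k in range(num_drivers)]


def steering_delays(positions, azimuth_deg):
    """Per-driver delays (seconds) that aim a far-field beam at azimuth_deg.

    Drivers nearer the listener fire later, so every driver's wavefront
    reaches a distant listener in that direction at the same moment.
    """
    az = math.radians(azimuth_deg)
    ux, uy = math.cos(az), math.sin(az)
    # How far each driver sits along the steering direction.
    projections = [x * ux + y * uy for x, y in positions]
    farthest = min(projections)  # driver farthest from the listener
    return [(p - farthest) / SPEED_OF_SOUND for p in projections]


def phase_shifts(delays, freq_hz):
    """Equivalent phase shift (radians) of each delay at one frequency."""
    return [-2 * math.pi * freq_hz * d for d in delays]


if __name__ == "__main__":
    drivers = ring_positions(6, 0.05)  # six drivers, hypothetical 5 cm ring
    delays = steering_delays(drivers, azimuth_deg=0.0)
    for k, (d, ph) in enumerate(zip(delays, phase_shifts(delays, 1000.0))):
        print(f"driver {k}: delay {d * 1e6:7.1f} us, phase at 1 kHz {ph:6.2f} rad")
```

In this sketch a delay in time corresponds to a frequency-dependent phase shift, which is one common way to read "phase processing" in a beamforming context. A production design would typically add per-driver filtering and level control as well.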
“We had less than three months to design the prototype. The design team is scattered around the West Coast so we never actually met face-to-face until three days before AES,” said Trainer. “My job was to design the audio system. As everyone is aware, systems continue to get more compact and audio is often the first element to be compromised. Working on such a tight deadline was tough. It would have been extremely difficult to complete the project on time without Audio Weaver.”
“The Speak2Me project was the perfect way to highlight Audio Weaver’s strengths,” said Beckmann. “We were working under so many constraints and had to incorporate several key features: music playback, voice input, and output. Mark did an amazing job with measurements and providing the target equalizations. He also worked with Audio Weaver during the design phase. This allowed me to complete my portion of the work quickly.”
The next step was to add the voice recognition and command features, a process explained by Dan Carter, VP Engineering of VoiceBox Technologies, a company that has been developing voice recognition algorithms for a decade. Carter explained some of the challenges of voice recognition. “I have a house my team runs with four different rooms set up, and we basically go in there and read scripts. We move around to make different acoustic patterns. We get as much variance as we can with different people, with different accents, speaking to the devices at different distances.”
The day concluded with presentations by Jonathan Novick and Paul Messick of Avermetrics, a designer and manufacturer of test and measurement equipment. Novick explained how the performance of a product like Speak2Me can be measured and evaluated during development and, later, on the production line. Messick explained the process of selecting the best original design manufacturer (ODM) for a product like Speak2Me, and suggested some guidelines for getting optimum results and consistency from an ODM.
A demo of a Speak2Me prototype showed attendees not only that the “designed-in-one-day” product improved on the Amazon Echo’s sound quality, but also that its voice recognition and its input and output beaming technologies worked well despite a glitch here and there. For a product created by a team of scattered professionals, some of whom had never met before and all of whom were working on an extremely tight deadline, it was an achievement, to say the least.