Beamforming is the processing of signals from multiple omnidirectional microphones to focus on the sound coming from the direction of the most prominent source (i.e., the user’s voice) and disregard sounds coming from other directions. The traditional way of evaluating beamformers is to look at their beam patterns. However, our experience at DSP Concepts has been that the ratio of the user’s voice to the background noise is the ultimate determiner of the performance of a voice UI system. This is like a signal-to-noise ratio (SNR); the signal is the level of speech and the noise corresponds to interfering sounds in the room.
In voice UI applications, the user’s voice may be at the same level as the music playing from a smart speaker, which is an SNR of 0 dB. Thus, processing that can elevate the user’s voice even a few dB above the noise can produce a large improvement in voice-recognition accuracy. For example, because the level of direct sound falls by about 6 dB with each doubling of distance from the talker, a 6 dB improvement in voice UI system SNR allows reliable operation from roughly twice as far away.
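The link between the 6 dB figure and a doubling of distance can be checked with a few lines of arithmetic. The sketch below assumes free-field, point-source spreading (the inverse-square law) and a fixed noise level; real rooms add reverberation, so treat it as an approximation. The function names are illustrative only.

```python
import math

def level_drop_db(d_near: float, d_far: float) -> float:
    """Drop in direct-sound level (dB) when the listener moves from d_near to d_far.

    Assumes free-field spreading from a point source (inverse-square law).
    """
    return 20.0 * math.log10(d_far / d_near)

def range_multiplier(snr_gain_db: float) -> float:
    """Factor by which the talker distance can grow for the same SNR,
    assuming the noise level at the microphones stays constant."""
    return 10.0 ** (snr_gain_db / 20.0)

if __name__ == "__main__":
    # Doubling the distance costs about 6 dB of speech level...
    print(f"1 m -> 2 m: {level_drop_db(1.0, 2.0):.2f} dB drop")
    # ...so 6 dB of extra SNR buys back roughly a factor of two in range.
    print(f"6 dB of SNR gain: {range_multiplier(6.0):.2f}x the distance")
```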
To give product designers an idea of how the different parameters of microphone array design affect SNR (and thus voice UI reliability), we recently studied a wide variety of microphone arrays, using different numbers and types of microphones and various microphone spacings. Testing was done in an environment with diffuse-field noise at 50 dB SPL and a speech signal averaging 60 dB SPL. Signal processing was performed using DSP Concepts’ Audio Weaver Voice UI algorithm package.
Test #1: Number of microphones
The polar plots below show the pickup patterns of circular arrays using two to six microphones. Ideally, the pickup pattern should show a tight beam pointed directly to the right, with little variation at different frequencies.
As the plots show, increasing the number of microphones generally allows for a tighter, more focused beam. The two-microphone array does a relatively poor job of rejecting sounds from 180°. This error can be especially problematic if the unit is placed near a wall or other large sound-reflecting object, where the reflection might cause the voice UI system to think the user’s voice is coming from the wall instead of from the user. The six-mic array produces the best results, with tightly focused beams on the 0° axis, negligible off-axis lobing, and excellent rejection of sounds from 180°.
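For readers who want to see how a pickup pattern is evaluated numerically, here is a minimal sketch. It assumes a plain delay-and-sum beamformer, far-field plane waves, ideal omnidirectional mics, and a speed of sound of 343 m/s. This is not the Audio Weaver processing used in the tests and will not reproduce the plotted patterns; it only shows the mechanics of computing a beam pattern for a uniform circular array.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def circular_array_pattern(num_mics, diameter_m, freq_hz,
                           look_deg=0.0, step_deg=1.0):
    """Return (angles_deg, magnitude) of a delay-and-sum beam.

    Mics sit evenly on a circle; the beam is steered toward look_deg.
    Magnitude is linear and normalized so the look direction equals 1.
    """
    radius = diameter_m / 2.0
    k = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND          # wavenumber
    mic_angles = 2.0 * np.pi * np.arange(num_mics) / num_mics
    angles_deg = np.arange(0.0, 360.0, step_deg)
    theta = np.deg2rad(angles_deg)[:, None]             # shape (A, 1)
    look = np.deg2rad(look_deg)

    # Phase of a plane wave at each mic, relative to the array center.
    arrival = k * radius * np.cos(theta - mic_angles[None, :])
    steering = k * radius * np.cos(look - mic_angles)

    # Delay-and-sum: undo the look-direction phases, then average the mics.
    response = np.exp(1j * (arrival - steering)).mean(axis=1)
    return angles_deg, np.abs(response)

if __name__ == "__main__":
    for n in (2, 6):
        angles, mag = circular_array_pattern(n, 0.071, 1000.0)
        rear_db = 20.0 * np.log10(mag[angles == 180.0][0])
        print(f"{n} mics, 1 kHz: response at 180 deg = {rear_db:.1f} dB")
```

Plotting `mag` against `angles` on a polar axis for several frequencies gives a figure of the same kind as the ones discussed above, though the shapes depend heavily on the beamforming algorithm.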
Test #2: Microphone SNR
Because system SNR is critical to voice recognition, it’s tempting to assume that using microphones with higher SNR would improve voice UI performance. To test this assumption, total system SNR was tested with microphones rated at 64 and 70 dB SNR, each type arranged in arrays comprising one to six microphones. The main advantage of a high-SNR mic would occur at low frequencies, because the improved SNR would permit more aggressive processing of low frequencies, which is where most environmental noise in homes and autos occurs.
The following graph shows how microphone signal-to-noise ratio affected the performance of the different microphone arrays. The higher the trace is on the chart, the better the SNR and the better the performance of the voice UI system should be. Solid lines show the results with the 64 dB SNR microphone; dotted lines show the result with the 70 dB SNR microphone.
The graph above shows results in 50 dB SPL ambient noise, a level typical of a residential living room with common household noise. In this case, using microphones with better SNR would not noticeably improve voice UI performance.
We also conducted tests at a background noise level of 35 dB SPL, which corresponds to a very quiet home environment. Under these conditions, using the higher-SNR microphones increases system SNR by as much as 1 dB. However, the reduction in ambient noise already improves SNR by about 14 dB, so the benefit of an additional 1 dB improvement would be insignificant.
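A rough single-microphone calculation suggests why the microphone rating matters so little. The sketch below assumes the usual convention that microphone SNR is rated against a 94 dB SPL reference, so a 64 dB SNR mic has an equivalent self-noise of about 30 dB SPL. It also assumes self-noise and ambient noise add as uncorrelated powers, and it ignores weighting curves and array processing, so the numbers are indicative only.

```python
import math

REFERENCE_SPL_DB = 94.0  # assumed reference level for microphone SNR ratings

def self_noise_db_spl(mic_snr_db: float) -> float:
    """Equivalent input noise of a mic, expressed in dB SPL."""
    return REFERENCE_SPL_DB - mic_snr_db

def power_sum_db(*levels_db: float) -> float:
    """Combine uncorrelated noise levels (dB) by summing their powers."""
    return 10.0 * math.log10(sum(10.0 ** (lv / 10.0) for lv in levels_db))

def noise_floor_db(ambient_db_spl: float, mic_snr_db: float) -> float:
    """Total noise seen by one mic: room noise plus mic self-noise."""
    return power_sum_db(ambient_db_spl, self_noise_db_spl(mic_snr_db))

def benefit_db(ambient_db_spl: float, low_snr=64.0, high_snr=70.0) -> float:
    """Noise-floor reduction gained by swapping the low-SNR mic for the high-SNR one."""
    return (noise_floor_db(ambient_db_spl, low_snr)
            - noise_floor_db(ambient_db_spl, high_snr))

if __name__ == "__main__":
    for ambient in (50.0, 35.0):
        print(f"ambient {ambient:.0f} dB SPL: "
              f"70 dB mic lowers the noise floor by {benefit_db(ambient):.2f} dB")
```

Under these assumptions the benefit is a few hundredths of a dB in 50 dB SPL noise and a little under 1 dB in 35 dB SPL noise, which is in line with the measurements described above.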
Test #3: Microphone gain matching
It is common for the gain of two samples of the same microphone model to vary by as much as ±3 dB, depending on the specified tolerance. To evaluate the effects of microphone gain mismatch on system SNR, models of theoretical arrays of one to six perfectly matched microphones were compared with the same arrays with gains mismatched by ±3 dB.
The following graph shows how microphone gain tolerance affected the performance of the different arrays with a ±3 dB gain mismatch, with system SNR shown relative to frequency. The higher the trace is on the chart, the better the SNR and the better the performance of the voice UI system should be. Solid lines show the results with perfect gain matching; dotted lines show the result with the gain mismatched ±3 dB.
These charts show that gain mismatches in arrayed microphones can have a large negative impact on system SNR, often comparable to the impact that reducing the number of microphones might have.
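A simplified illustration of why mismatch hurts: many beamformers reject an interfering sound by subtracting time-aligned mic signals, and the subtraction only cancels fully if the gains are equal. The sketch below assumes an idealized two-mic subtractive null with perfect time alignment. It is not the algorithm used in these tests, and the function name is hypothetical.

```python
import math

def null_depth_db(gain_a_db: float, gain_b_db: float) -> float:
    """Residual interferer level after subtracting two aligned mic signals.

    The result is in dB relative to the interferer level at a single mic
    with nominal (0 dB) gain. Perfectly matched gains give -inf.
    """
    gain_a = 10.0 ** (gain_a_db / 20.0)
    gain_b = 10.0 ** (gain_b_db / 20.0)
    residual = abs(gain_a - gain_b)
    if residual == 0.0:
        return float("-inf")
    return 20.0 * math.log10(residual)

if __name__ == "__main__":
    for a_db, b_db in ((0.0, 0.0), (0.25, -0.25), (1.0, -1.0), (3.0, -3.0)):
        print(f"gains {a_db:+.2f}/{b_db:+.2f} dB -> "
              f"null depth {null_depth_db(a_db, b_db):.1f} dB")
```

With one mic at +3 dB and the other at -3 dB, the idealized null leaves the interferer only about 3 dB below its level at a single mic, so most of the spatial rejection is lost.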
These tests were performed on a theoretical array without an enclosure. Once the mics are mounted in an enclosure, gain and frequency response will change depending on how and where the mics are mounted and the consistency of the acoustic seals around the mics. For this reason, using mics of better consistency, or supplied with factory calibration data, may not produce an optimal result because the acoustical effects of the enclosure and mounting may introduce performance inconsistencies.
The best solution in this case is for gain to be measured with the mics installed, and the gain for each mic adjusted in software. Ideally, each unit would be individually measured and calibrated in the factory after the product is assembled.
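Per-mic software calibration of this kind can be sketched as follows. The example assumes every installed mic is exposed to the same test signal at the same acoustic level (for instance in a factory fixture), and it matches broadband RMS levels only; a production procedure might calibrate per frequency band. The function names and the simulated gain errors are illustrative.

```python
import numpy as np

def rms(signals: np.ndarray) -> np.ndarray:
    """RMS level along the last axis (one value per mic)."""
    return np.sqrt(np.mean(np.square(signals), axis=-1))

def calibration_gains(recordings: np.ndarray) -> np.ndarray:
    """Linear gain per mic that equalizes the measured RMS levels.

    recordings has shape (num_mics, num_samples), captured while all mics
    receive the same test signal. The target is the average level in dB,
    i.e. the geometric mean of the linear RMS values.
    """
    levels = rms(recordings)
    target = np.exp(np.mean(np.log(levels)))
    return target / levels

if __name__ == "__main__":
    # Simulated measurement: a 1 kHz tone seen through four mics whose
    # installed gains are off by up to +/-3 dB (hypothetical values).
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2.0 * np.pi * 1000.0 * t)
    gain_errors_db = np.array([-3.0, -1.0, 0.5, 3.0])
    recordings = 10.0 ** (gain_errors_db[:, None] / 20.0) * tone[None, :]

    gains = calibration_gains(recordings)
    calibrated = gains[:, None] * recordings
    spread_db = 20.0 * np.log10(rms(calibrated).max() / rms(calibrated).min())
    print("correction gains (dB):", np.round(20.0 * np.log10(gains), 2))
    print(f"level spread after calibration: {spread_db:.6f} dB")
```

The resulting gains would be stored per unit and applied to each mic channel ahead of the beamformer.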
Test #4: Microphone spacing
Increasing mic spacing in an array might be expected to create greater differences in level among all the mics because the source-to-mic distances will be greater. It will also alter the relative phase among the mics. To find out how spacing affects SNR, arrays using two to six mics were tested, with mics placed on circles ranging from 5 to 71 mm in diameter.
The following graph shows how mic spacing affected performance of a six-mic array, with system SNR shown relative to frequency. The higher the trace is on the chart, the better the system SNR and the better the performance of the voice UI should be.
Results were similar with fewer microphones; in most cases, 71 mm spacing delivered the best results. We also tested a three-mic array with even wider mic spacings of up to 320 mm, but did not measure a significant improvement when mic spacing increased beyond 80 mm.
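The effect of spacing on relative phase can be estimated directly. The sketch below assumes a far-field plane wave arriving along the line joining two mics (the largest possible path difference) and a speed of sound of 343 m/s. It treats the circle diameter as the distance between two opposing mics, which is a simplification for arrays with an odd number of mics.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def max_time_difference_s(spacing_m: float) -> float:
    """Largest arrival-time difference between two mics spacing_m apart."""
    return spacing_m / SPEED_OF_SOUND

def max_phase_difference_deg(spacing_m: float, freq_hz: float) -> float:
    """Largest inter-mic phase difference at freq_hz, in degrees."""
    return 360.0 * freq_hz * max_time_difference_s(spacing_m)

if __name__ == "__main__":
    for diameter_mm in (5.0, 71.0):
        for freq_hz in (200.0, 500.0, 1000.0):
            phase = max_phase_difference_deg(diameter_mm / 1000.0, freq_hz)
            print(f"{diameter_mm:>4.0f} mm at {freq_hz:>6.0f} Hz: "
                  f"up to {phase:6.2f} deg between mics")
```

At 500 Hz, a 5 mm array sees phase differences of under 3 degrees, while a 71 mm array sees roughly 37 degrees. Larger phase differences give the beamformer more to work with at low frequencies, which may help explain why the wider spacing performed better.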
For more information and specific recommendations
Complete results of our tests can be found in our white paper, "Optimizing Performance of Microphone Arrays for Voice UI Systems." The paper also includes in-depth testing of the voice UI performance of the Amazon Echo and Google Home smart speakers, along with specific recommendations and guidelines for voice UI microphone array designs.