When most of us play word association and get the phrase “user interface”, we say “touchscreen” or some derivative. Whatever happened to voice interfaces? There’s news that might prompt a rethink.
Sensory has come out with its new NLP-5X speech chip, which provides a natural-language interface. I had a great conversation with Todd Mozer, CEO of Sensory, about a bit of history and this new direction.
Sensory’s product line to date has been based on an 8-bit MCU. The NLP-5X is a serious departure, moving to a 16-bit DSP engine while keeping the low price point (around $2 in 100k-unit quantities). The goal was to provide more flexibility in recognizing speech. They’re cheating in a sense: this isn’t free-form speech recognition. The chip looks for context-sensitive phrases of the kind one would use with a specific device. For a convection oven, that might be “bake, 375, 1 hour”, spoken in any order and with flexible phrasing (such as “3-70-5” versus “three hundred seventy five”).
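To make the order independence and number flexibility concrete, here is a minimal sketch in Python of how a device might fill command slots from an already-recognized phrase. It is an illustration only, not Sensory’s method or API: the function names, the vocabulary tables, and the assumption that the recognizer hands over plain text are all hypothetical.

```python
import re

# Hypothetical vocabulary for an oven; not taken from any Sensory product.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
NUMBER_WORDS = set(UNITS) | set(TEENS) | set(TENS) | {"hundred"}
MODES = {"bake", "broil", "roast"}
TIME_UNITS = {"hour": 60, "hours": 60, "minute": 1, "minutes": 1}


def spoken_number(words):
    """Convert number words to an int.

    Accepts both "three hundred seventy five" and the colloquial
    "three seventy five", where a single digit is followed by tens.
    """
    total = 0
    for word in words:
        if word == "hundred":
            total *= 100
        elif word in TENS or word in TEENS:
            value = TENS.get(word, TEENS.get(word))
            # A lone digit before tens means hundreds: "three seventy" -> 370.
            total = total * 100 + value if 0 < total < 10 else total + value
        else:
            total += UNITS[word]
    return total


def parse_oven_command(text):
    """Fill mode/temperature/minutes slots regardless of word order."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    slots = {"mode": None, "temperature": None, "minutes": None}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in MODES:
            slots["mode"] = tok
            i += 1
        elif tok.isdigit() or tok in NUMBER_WORDS:
            if tok.isdigit():
                value, j = int(tok), i + 1
            else:
                j = i
                while j < len(tokens) and tokens[j] in NUMBER_WORDS:
                    j += 1
                value = spoken_number(tokens[i:j])
            # A number followed by a time unit is a duration; otherwise
            # this sketch assumes it is the temperature.
            if j < len(tokens) and tokens[j] in TIME_UNITS:
                slots["minutes"] = value * TIME_UNITS[tokens[j]]
                j += 1
            else:
                slots["temperature"] = value
            i = j
        else:
            i += 1  # Ignore filler words such as "at" or "for".
    return slots
```

Under these assumptions, “bake, 375, 1 hour” and “one hour, three seventy five, bake” fill the same three slots. A real grammar would also need to handle ambiguity, for example two number-word runs spoken back to back with no filler word between them.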

But that’s just the start. The improved processing speed allows truly hands-free triggers. The NLP-5X can listen for a phrase and wake the rest of the system when needed, triggered by nothing more than being addressed by voice. No training is needed to recognize a particular voice. Mozer says they’ve implemented some “math breakthroughs” to avoid false triggering and promote quick, accurate startup.
Mozer and the Sensory team are also looking further ahead, toward truly enabled SCIDs: speech-controlled Internet devices. The NLP-5X can not only recognize speech and sound, it can also produce speech and sound and do text-to-speech (TTS). They are looking at things like Wi-Fi connections into the cloud to do the heavy lifting for more complex recognition.
The low price point is certainly of note, but a faster, more accurate, more flexible natural voice interface that saves both power and steps could apply to many embedded devices. This is something to listen to.









