NVIDIA Enables Era of Interactive Conversational AI with New Inference Software

January 2, 2020 Tiera Oliver

NVIDIA, a technology company that designs graphics processing units for the gaming and professional markets, and system-on-a-chip units for the mobile computing and automotive markets, introduced inference software that developers can use to deliver conversational AI applications, cutting inference latency and enabling interactive engagement.

NVIDIA TensorRT 7, according to the company, opens the door to smarter human-to-AI interactions, enabling real-time engagement with applications such as voice agents, chatbots and recommendation engines.

According to Juniper Research, an estimated 3.25 billion digital voice assistants are in use in devices around the world. By 2023, that number is expected to reach 8 billion, more than the world's total population.

TensorRT 7 features a new deep learning compiler designed to optimize and accelerate the recurrent and transformer-based neural networks needed for AI speech applications. According to the company, this speeds the components of conversational AI by more than 10x compared with running them on CPUs, driving latency below the 300-millisecond threshold considered necessary for real-time interactions.
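The 300-millisecond figure is an end-to-end budget shared by every stage of a conversational pipeline (speech recognition, language understanding, speech synthesis). The sketch below illustrates the arithmetic; the per-stage timings are hypothetical values chosen only for illustration, and only the 300 ms threshold and the "more than 10x" CPU comparison come from the article.

```python
# Illustrative latency-budget check for a conversational AI pipeline.
# The 300 ms real-time threshold is from the article; the per-stage
# latencies below are hypothetical, chosen only to show the arithmetic.

REAL_TIME_BUDGET_MS = 300

# Hypothetical per-stage latencies (milliseconds) for a GPU-accelerated
# pipeline: automatic speech recognition, natural language understanding,
# and text-to-speech.
gpu_pipeline_ms = {"asr": 60, "nlu": 40, "tts": 90}

def total_latency(stages):
    """Sum per-stage latencies to get end-to-end response time."""
    return sum(stages.values())

def meets_real_time(stages, budget_ms=REAL_TIME_BUDGET_MS):
    """True if the whole pipeline fits within the real-time budget."""
    return total_latency(stages) <= budget_ms

# A 10x slowdown (the article's CPU comparison) blows the same budget:
cpu_pipeline_ms = {name: ms * 10 for name, ms in gpu_pipeline_ms.items()}
```

With these illustrative numbers the GPU pipeline totals 190 ms and fits the budget, while the 10x-slower CPU pipeline totals 1,900 ms and does not, which is why the whole pipeline, not just one model, has to be accelerated.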

Some companies are already taking advantage of NVIDIA’s conversational AI acceleration capabilities. Among these is Sogou, which provides search services to WeChat, a frequently used application on mobile phones.

Rising Importance of Recurrent Neural Networks
With TensorRT's new deep learning compiler, developers everywhere can now automatically optimize these networks (such as bespoke automatic speech recognition networks, or WaveRNN and Tacotron 2 for text-to-speech) and deliver high performance at low latency.

The new compiler also optimizes transformer-based models like BERT for natural language processing.

Accelerating Inference from Edge to Cloud
According to NVIDIA, TensorRT 7 can optimize, validate, and deploy a trained neural network for inference on hyperscale data center, embedded, or automotive GPU platforms.

NVIDIA’s inference platform, which includes TensorRT, as well as several NVIDIA CUDA-X AI libraries and NVIDIA GPUs, delivers low-latency, high-throughput inference for applications beyond conversational AI, including image classification, fraud detection, segmentation, object detection and recommendation engines. Its capabilities are used by some of the world’s leading enterprise and consumer technology companies, including Alibaba, American Express, Baidu, PayPal, Pinterest, Snap, Tencent and Twitter.

TensorRT 7 will be available in the coming days for development and deployment, free of charge to members of the NVIDIA Developer program, from the TensorRT webpage. The latest versions of plug-ins, parsers, and samples are also available as open source from the TensorRT GitHub repository.

For more information, please visit: https://www.nvidia.com/en-us/#source=pr

About the Author

Tiera Oliver, editorial intern for Embedded Computing Design, is responsible for web content edits as well as newsletter updates. She also assists with news content, constructing and editing stories. Before interning for ECD, Tiera graduated from Northern Arizona University, where she received her B.A. in journalism and political science and worked as a news reporter for the university's student-led newspaper, The Lumberjack.
