Smart OCR Solution Using Xilinx Ultrascale+ and Vitis AI

October 05, 2020

Automatic text reading from natural environments, also known as scene text detection/recognition or PhotoOCR, has become an increasingly popular and important research topic in computer vision.

Text is among the most brilliant and influential creations of humankind. The rich, precise, high-level semantics embodied in text help us understand the world around us and build autonomous-capable solutions that can be deployed in a live environment. This is why reading text automatically from natural scenes attracts so much attention in computer vision research.

As the written form of human languages evolved, we developed thousands of unique font families. When we add case (capitals/lower case/uni-case/small caps), skew (italic/roman), proportion (horizontal scale), weight, size-specific variants (display/text), swash, and serifization (serif/sans in super-families), the number of variations grows into the millions, which makes text identification an exciting discipline for machine learning.

Xilinx as a choice for OCR solutions

Today, Xilinx powers 7 out of 10 new developments through its wide variety of powerful platforms and leads trends in FPGA-based system design. Softnautics chose Xilinx for implementing this solution because of the integrated Vitis™ AI stack and strong hardware capabilities.

Xilinx Vitis™ is a free and open-source development platform that packages hardware modules as software-callable functions and is compatible with standard development environments, tools, and open-source libraries. It automatically adapts software and algorithms to Xilinx hardware without the need for VHDL or Verilog expertise.

Selecting the right Xilinx Platform

The comprehensive and rich Xilinx toolset and ecosystem make prototyping a very predictable process and expedite solution development, reducing overall development time by up to 70%.

We selected the Xilinx Ultrascale+ platform because it offers the best of application processing and FPGA acceleration capabilities. It also provides impressive high-level synthesis capability, resulting in 5x system-level performance per watt compared to earlier variants. It supports Xilinx Vitis AI, which offers a wide range of capabilities to build AI inferencing using acceleration libraries.

We combined the Xilinx Vitis AI stack and hardware acceleration with software to create a hybrid application, and implemented LSTM functionality for effective sequence prediction by porting TensorFlow Lite to ARM. This part runs on the Processing System (PS) using the N2Cube software. Image pre- and post-processing was implemented using HLS through Vivado, and Vitis was used for inferencing with CTPN (Connectionist Text Proposal Network). We eventually graduated the solution to real-time scene text detection with a video pipeline and improved the model with a robust dataset.
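To give a feel for what post-processing means for a CTPN-style detector, here is a minimal Python sketch. CTPN predicts many narrow, fixed-width text proposals, which are then connected into text lines. The sketch below is a simplified greedy version of that linking step, not the Softnautics implementation and not the HLS code described above. The function names, the box format (x1, y1, x2, y2), and the thresholds max_gap and min_overlap are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def vertical_overlap(a: Box, b: Box) -> float:
    """Vertical intersection divided by vertical union of two boxes."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    union = max(a[3], b[3]) - min(a[1], b[1])
    return max(inter, 0.0) / union if union > 0 else 0.0


def merge_proposals(
    proposals: List[Box],
    max_gap: float = 50.0,      # assumed horizontal gap limit, in pixels
    min_overlap: float = 0.7,   # assumed minimum vertical overlap
) -> List[Box]:
    """Greedily chain narrow proposals into text-line bounding boxes."""
    lines: List[List[Box]] = []
    for box in sorted(proposals, key=lambda b: b[0]):  # left to right
        for line in lines:
            last = line[-1]
            close_enough = box[0] - last[2] <= max_gap
            if close_enough and vertical_overlap(last, box) >= min_overlap:
                line.append(box)
                break
        else:
            # No existing line accepted this proposal, so start a new one.
            lines.append([box])
    # Each text line is the bounding box of all proposals chained into it.
    return [
        (
            min(b[0] for b in line),
            min(b[1] for b in line),
            max(b[2] for b in line),
            max(b[3] for b in line),
        )
        for line in lines
    ]


if __name__ == "__main__":
    demo = [(0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 29), (200, 100, 216, 120)]
    print(merge_proposals(demo))
```

In the demo, the first three 16-pixel-wide proposals sit side by side on roughly the same baseline and are chained into one line, while the fourth is too far away and becomes a line of its own. A production pipeline would also filter proposals by score and apply non-maximum suppression before linking.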

Scene Text Detection

Many implementations are available, and new ones are being researched. Even so, a series of grand challenges may be encountered when detecting and recognizing text in the wild. The difficulties in natural scenes mainly stem from three differences when compared to text in documents:

Diversity and variability of text, arising from languages, colors, fonts, sizes, orientations, etc.
Vibrant, complex backgrounds on which the text is written
Aspect ratios and layouts of scene text that may vary significantly

This type of solution has extensive applicability in fields that require real-time text detection on a video stream with high accuracy and quick recognition. A few of these application areas are:

Parking validation: Cities and towns are using mobile OCR to automatically validate whether cars are parked according to city regulations. Parking inspectors can use a mobile device with OCR to scan the license plates of vehicles and check them against an online database to see if they are permitted to park.

Mobile document scanning: A variety of mobile applications allow users to take a photo of a document and convert it to text. This OCR task is more challenging than traditional document scanning because photos have unpredictable image angles, lighting conditions, and text quality.

Digital asset management: Digital asset management (DAM) software helps organize rich media assets such as images, videos, and animations. A key aspect of DAM systems is the searchability of rich media. By running OCR on uploaded images and video frames, a DAM system can make rich media searchable and enrich it with meaningful tags.
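As an illustration of the digital asset management case, the following Python sketch shows one simple way OCR output could make media searchable: an inverted index from recognized words to asset identifiers. This is an assumption-laden example rather than a description of any particular DAM product. The asset names, the sample OCR strings, and the functions build_index and search are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(ocr_text_by_asset: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every recognized token to the set of assets it appears in."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for asset_id, text in ocr_text_by_asset.items():
        for token in tokenize(text):
            index[token].add(asset_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the assets whose OCR text contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


if __name__ == "__main__":
    # Hypothetical OCR results keyed by hypothetical asset identifiers.
    ocr_results = {
        "img_001.jpg": "EXIT 12 Main Street",
        "clip_07_frame_120": "Main Street Bakery",
        "img_002.jpg": "No Parking",
    }
    index = build_index(ocr_results)
    print(sorted(search(index, "main street")))
```

The same tokens can double as tags on each asset. A real system would typically add spelling tolerance for OCR errors and store the index in a search engine instead of in memory.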

The Softnautics team has been working on Xilinx FPGA-based solutions that require design and software framework implementation. Our extensive experience with Xilinx and our understanding of its intricacies ensured that we took this solution from conceptualization to proof of concept within four weeks. Using our end-to-end solution-building expertise, you can visualize your ideas with the fastest concept realization service on Xilinx platforms and achieve greatly reduced time to market.

Get in touch with Softnautics to explore and build your next accelerated AI solution. Read our success stories.

About Author: Prasant Agarwal

Prasant is Marketing Director at Softnautics. He has 15+ years of experience in developing cutting-edge multimedia and connectivity products for STMicroelectronics, Samsung, and Solarflare Communications (now Xilinx), and he led corporate rebranding for Persistent Systems. Leveraging his technology domain experience, he now focuses on enabling technology buyers to make the right business choices by bridging business challenges and best-fit technology solutions.

Embedded Computing Design
