The Internet of Things is being humanized as electronic device manufacturers develop more products based on human sensory input. The “voice-first revolution” is fully underway as tech companies continue to improve the audio signal processing chain in connected objects. With advances in computer vision, engineers now want to give these intelligent devices eyes.
Capturing raw images and video data is nothing new, but giving cameras enough local intelligence to do something with that imaging data is. Because camera-based systems are often compact, perhaps mobile, and almost always restricted in terms of size, weight, power consumption, and cost, enabling this intelligence starts with optimized hardware.
Three Axes of Advanced Vision
The Mali-C32 and -C52 cores were designed to improve frame rate, pixel area, and pixel resolution for intelligent vision systems in applications such as access control and surveillance, car dashboard cameras, high-end drones, and robotic personal assistants. To achieve greater frame rates and larger pixel areas, Arm implemented a new pipeline design in the ISPs capable of processing 600 million pixels per second (essentially DSLR quality), which enables a maximum frame size of 16 MP and 4K resolution at up to 60 frames per second (fps).
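The throughput figures above can be sanity-checked with some quick arithmetic. The sketch below assumes that "4K" means 3840 x 2160 pixels and that "16 MP" means 16,000,000 pixels per frame. Neither definition is stated in the article, and real pipelines also spend cycles on blanking intervals and other overhead that this ignores.

```python
# Back-of-the-envelope check of the throughput figures quoted in the article.
PIXEL_RATE = 600_000_000  # pixels per second, as quoted

# Assumption: "4K" is UHD, 3840 x 2160. The article gives no dimensions.
UHD_WIDTH, UHD_HEIGHT = 3840, 2160
FPS_4K = 60

uhd_pixels_per_second = UHD_WIDTH * UHD_HEIGHT * FPS_4K
headroom = PIXEL_RATE - uhd_pixels_per_second

# Assumption: "16 MP" is 16,000,000 pixels per frame.
MAX_FRAME_PIXELS = 16_000_000
max_fps_at_16mp = PIXEL_RATE / MAX_FRAME_PIXELS

print(f"4K at 60 fps needs {uhd_pixels_per_second:,} pixels/s")
print(f"Headroom under 600 Mpixel/s: {headroom:,} pixels/s")
print(f"Full 16 MP frames: about {max_fps_at_16mp:.1f} fps")
```

Under those assumptions, 4K at 60 fps needs just under 500 million pixels per second, which fits inside the quoted 600 million, while full 16 MP frames would run at roughly 37 fps.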
To improve pixel resolution, the Mali ISPs also integrate more than 25 image processing algorithms in hardware that yield better dynamic range, noise reduction, and color management. These algorithms include:
- High Dynamic Range (HDR) Precision Management – Arm Iridix, Mesh Shading, Radial Shading
- Color Management – White Balance, Color Noise Reduction, 3D Color Enhancement
- RAW Noise Reduction – Arm Sinter, Arm Temper, Chromatic Aberration Correction
Higher Dynamic Range, More Colors, Less Noise
With respect to dynamic range (the ability of a vision system to effectively render contrast in an image), Thomas Ensergueix, Senior Director of Embedded at Arm, explains the traditional challenges in a recent blog post.
“To faithfully represent [an] HDR scene, image technology requires 20 or 24 bits of precision per pixel, but digital system displays are typically 8-bit or 10-bit, thus limiting the amount of data they can handle,” Ensergueix writes. “This also means that any computer vision engine post the ISP is bounded to 8 to 10-bit precision.
“In this scenario, the ISP needs to process HDR data (at higher bit depth), then compress it and utilize it further. Ultimately, if the dynamic range is not managed properly, then the details in the shadows are lost,” he adds.
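To see why naive compression loses shadow detail, consider a minimal sketch that squeezes 20-bit linear pixel values into 8 bits by simply dropping the low-order bits. The sample values and the `compress_linear` helper are hypothetical illustrations, not part of any Arm software.

```python
def compress_linear(value, in_bits=20, out_bits=8):
    """Globally rescale a linear pixel value by dropping its low-order bits."""
    return value >> (in_bits - out_bits)

# Hypothetical 20-bit samples: four distinct shadow levels, four highlight levels.
shadows = [100, 900, 2000, 4000]
highlights = [200_000, 400_000, 800_000, 1_000_000]

shadow_codes = [compress_linear(v) for v in shadows]
highlight_codes = [compress_linear(v) for v in highlights]

print(shadow_codes)      # [0, 0, 0, 0]
print(highlight_codes)   # [48, 97, 195, 244]
```

All four shadow levels collapse to the same 8-bit code, while the highlights remain distinguishable. Any vision algorithm running after this step has no shadow information left to work with.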
As you can see in the histogram comparison below, the dynamic range compression helps balance an image, but in doing so eliminates detail from lighter portions of the exposure.
For a better visual, here is how the image looks before and after compression:
To overcome the lossiness associated with dynamic range compression, the Mali ISPs leverage dynamic range management and tone mapping algorithms such as Arm Iridix version 8. Iridix technology is based on research into how the human eye responds to dynamically lit environments, and processes each pixel in real time during compression to preserve as much detail as possible.
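Iridix itself is proprietary, but the general idea behind local tone mapping can be sketched in a few lines. The example below is a generic textbook-style approach, not Arm's algorithm: it works on a single row of pixels in the log domain, treats a box average as the local brightness, compresses only that brightness, and leaves the local contrast untouched. The function name, window radius, and sample values are all assumptions made for illustration.

```python
import math

def local_tone_map(scanline, in_bits=20, out_bits=8, radius=2):
    """Compress a 1-D row of linear HDR samples while keeping local contrast."""
    compression = out_bits / in_bits          # how strongly brightness is squeezed
    max_out = (1 << out_bits) - 1
    logs = [math.log2(v + 1) for v in scanline]
    result = []
    for i, log_value in enumerate(logs):
        window = logs[max(0, i - radius): i + radius + 1]
        base = sum(window) / len(window)      # local brightness (box average)
        detail = log_value - base             # local contrast around that brightness
        mapped = compression * base + detail  # squeeze brightness, keep contrast
        code = round(2 ** mapped - 1)
        result.append(min(max(code, 0), max_out))  # clamp to the output range
    return result

# Hypothetical row: a textured shadow region followed by a textured highlight region.
row = [1000, 1400] * 3 + [600_000, 840_000] * 3

global_codes = [v >> 12 for v in row]   # plain 20-bit to 8-bit truncation
local_codes = local_tone_map(row)

print(global_codes)
print(local_codes)
```

With plain truncation the shadow texture disappears entirely, whereas the local version keeps neighboring shadow pixels at different output codes. A simple box window like this one also produces halos and clipping next to strong edges, which is one of the problems production algorithms are designed to avoid.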
Returning to the histogram comparison, local tone mapping (or contrast adaptation) using Arm Iridix generates an image that is closer to what the human eye perceives.
And the visual result:
While the image above produced “with Arm technology” makes heavy use of Iridix, innovations in noise reduction and color management have also been applied. These help with “clipping,” which occurs when the light captured by an image sensor exceeds the sensor’s range and distorts the resulting image.
The -C52 addresses clipping with a 9 x 9 x 9 3D RGB lookup table (LUT), which performs real-time one-to-one pixel mapping to restore color to washed-out areas of an image. Both ISPs also include Arm’s Sinter and Temper algorithms, which provide 2D spatial and motion-compensated temporal noise reduction, respectively.
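A 9 x 9 x 9 table holds only 729 color nodes, so mapping every possible pixel through it requires blending between neighboring nodes. The sketch below assumes trilinear interpolation, a common choice for 3D LUTs; the article does not say which scheme the -C52 uses. The helper names are hypothetical, and the table here is an identity mapping that returns each color unchanged. A tuned table would store corrected colors at its nodes instead.

```python
GRID = 9  # nodes per axis, matching the 9 x 9 x 9 table size in the article

def identity_lut(grid=GRID):
    """Build a LUT whose nodes map each RGB value to itself (range 0.0 to 1.0)."""
    step = 1.0 / (grid - 1)
    return [[[(r * step, g * step, b * step) for b in range(grid)]
             for g in range(grid)] for r in range(grid)]

def apply_lut(lut, rgb):
    """Map one RGB pixel (components in 0.0 to 1.0) using trilinear interpolation."""
    grid = len(lut)
    idx, frac = [], []
    for c in rgb:
        pos = min(max(c, 0.0), 1.0) * (grid - 1)
        i = min(int(pos), grid - 2)   # lower node; keeps i + 1 inside the table
        idx.append(i)
        frac.append(pos - i)
    out = [0.0, 0.0, 0.0]
    # Blend the eight nodes at the corners of the surrounding cube.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                weight = ((frac[0] if dr else 1 - frac[0]) *
                          (frac[1] if dg else 1 - frac[1]) *
                          (frac[2] if db else 1 - frac[2]))
                node = lut[idx[0] + dr][idx[1] + dg][idx[2] + db]
                for ch in range(3):
                    out[ch] += weight * node[ch]
    return tuple(out)

lut = identity_lut()
pixel = (0.2, 0.55, 1.0)
mapped = apply_lut(lut, pixel)
print(mapped)  # approximately the input, since this table is an identity mapping
```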
These technologies improve RAW image quality and lay the groundwork for more advanced post-processing in neural network-based computer vision applications.
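Sinter and Temper are proprietary, so the following is only a generic illustration of the difference between spatial and temporal noise reduction. The spatial filter averages each pixel with its neighbors within one frame, while the temporal filter blends each frame into a running average of earlier frames. Both functions and the `alpha` parameter are hypothetical, and the temporal filter here deliberately omits the motion compensation that the article attributes to Temper.

```python
def spatial_denoise(frame):
    """2-D spatial filter: replace each pixel with the mean of its 3 x 3 neighborhood."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbors = [frame[yy][xx]
                         for yy in range(max(0, y - 1), min(h, y + 2))
                         for xx in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(neighbors) / len(neighbors))
        out.append(row)
    return out

def temporal_denoise(previous, current, alpha=0.25):
    """Temporal filter: blend the current frame into a running average (no motion compensation)."""
    return [[(1 - alpha) * p + alpha * c for p, c in zip(prow, crow)]
            for prow, crow in zip(previous, current)]

# Hypothetical 3 x 3 frames: a flat gray patch, then the same patch with one noisy pixel.
clean = [[10.0, 10.0, 10.0] for _ in range(3)]
noisy = [[10.0, 10.0, 10.0], [10.0, 19.0, 10.0], [10.0, 10.0, 10.0]]

print(spatial_denoise(noisy)[1][1])          # 11.0
print(temporal_denoise(clean, noisy)[1][1])  # 12.25
```

In both cases the outlier of 19 is pulled back toward the surrounding value of 10. Without motion compensation, a temporal average like this would smear moving objects, which is why a production filter tracks motion before blending.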
The ISPs target 16 nm process technology and are delivered with a software package that includes auto-exposure, auto-white balance, and auto-focus libraries, as well as tuning and calibration tools for bare-metal or Linux environments.
The Internet of Eyes
The main difference between the two Mali ISPs is a tradeoff between image quality and resource utilization, as the -C52 delivers more advanced color management and noise reduction capabilities while the -C32 is optimized for low-power, cost-sensitive devices.
With such scalable ISP solutions coming to market, the Internet of Everything could soon be the Internet of Eyes.
About the Author: Brandon Lewis