Optimization, Stabilization, and Tuning of BSPs for Production
Introduction
Intrinsyc Technologies Corp. offers a variety of Open-Q System on Modules (SOMs) and Single Board Computers (SBCs) for intelligent computing at the edge of IoT networks. Many of these SOMs and SBCs employ Qualcomm Snapdragon-based processing solutions (SoCs), which are ideal platforms for building low-power smart vision systems by using FastCV and SNPE together. FastCV, the Fast Computer Vision library, is an optimized computer vision solution for Qualcomm platforms; its image processing routines can run on the Snapdragon Hexagon DSP (including the HVX vector extensions). The Snapdragon Neural Processing Engine (SNPE), also from Qualcomm, includes a Deep Neural Network (DNN) runtime that supports the Hexagon DSP and can convert common network formats (including TensorFlow, Caffe and ONNX) to an SNPE-compatible format.
Overview
Many Qualcomm Snapdragon-series SoCs contain multiple Hexagon DSPs. These Hexagon DSPs are subsystems of the SoC with the same level of access to peripherals as the Kryo/ARM cores, and they run a real-time operating system from Qualcomm called QuRT. QuRT provides a POSIX-style platform for this multi-hardware-thread environment, and most of Qualcomm's software frameworks (such as Elite and FastRPC) run in the user space of this OS. In this article we take a quick look at the FastCV and SNPE SDKs, and finish with a simple example of a handwritten digit recognition system.
FastCV
The FastCV SDK is a collection of computer vision algorithms implemented for ARM and optimized for Qualcomm’s Snapdragon processor. You can find this SDK on the Qualcomm Developer Network (QDN) website: https://developer.qualcomm.com/software/fastcv-sdk/.
The libraries currently supported are:
Android 32-bit and 64-bit libraries
IA-32 (x86) Win32 and MS Visual C++ 2010, 2012, and 2013
IA-32 (x86) Win64 and MS Visual C++ 2012 and 2013
To take advantage of the FastCV implementations optimized for Qualcomm's Snapdragon processors, a few APIs should be called as part of the application's initialization and de-initialization. For initialization, the following API should be called (a sketch of wiring this up from an Android application follows the operation-mode table below):
FASTCV_API int fcvSetOperationMode( fcvOperationMode mode )
A suitable operation mode should be selected based on the application's goal. The available fcvOperationMode options are:
Operation mode | Description
FASTCV_OP_LOW_POWER | The QDSP implementation is used unless it is three times slower than the CPU implementation.
FASTCV_OP_PERFORMANCE | The fastest available implementation is used.
FASTCV_OP_CPU_OFFLOAD | The QDSP implementation is used when available; otherwise FastCV falls back to the GPU or CPU implementation.
FASTCV_OP_CPU_PERFORMANCE | The fastest CPU implementation is used.
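As a rough sketch of how an Android application might wire this up over JNI (the class name, native library name and method signatures below are illustrative assumptions, not part of the FastCV SDK), the native side would call fcvSetOperationMode() once at start-up and fcvCleanUp() before exit:
// Hypothetical Java-side JNI bridge; the library and method names are assumptions for illustration.
public final class FastCVBridge {
    static { System.loadLibrary("fastcvjni"); } // assumed name of the app's native wrapper library

    // Assumed to mirror the fcvOperationMode enum values in fastcv.h; verify against the header.
    public static final int OP_LOW_POWER = 0;
    public static final int OP_PERFORMANCE = 1;
    public static final int OP_CPU_OFFLOAD = 2;
    public static final int OP_CPU_PERFORMANCE = 3;

    // Native implementation would call fcvSetOperationMode((fcvOperationMode) mode) and return its result.
    public static native int setOperationMode(int mode);

    // Native implementation would call fcvCleanUp() when vision processing is finished.
    public static native void cleanUp();
}
A typical pattern is to call FastCVBridge.setOperationMode(FastCVBridge.OP_PERFORMANCE) once when the application starts and FastCVBridge.cleanUp() when processing is finished.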
Here is the link to a complete list of FastCV APIs:
https://developer.qualcomm.com/docs/fastcv/api/index.html
SNPE
The SNPE SDK is provided by Qualcomm and contains tools and examples showing how to convert and deploy DNNs on Qualcomm's Snapdragon processors.
More information about this SDK is available on QDN:
https://developer.qualcomm.com/docs/snpe/model_conv_tensorflow.html
The “snpe-sample” application included in the SDK shows how to load a DLC (network model) file and test it. With standard SNPE capabilities, you can run a DNN on the Snapdragon's ARM, GPU or DSP cores. Depending on the language your target application is written in, you can refer to the corresponding SNPE examples in the SDK.
Example
This example is based on a previous article about DNNs, which you can find on Intrinsyc's website: https://www.intrinsyc.com/artificial-neural-networks-ann-on-snapdragon-based-edge-devices/. In that article we went through the steps to create an ONNX network for a handwritten digit recognition system using MATLAB. The goal of this example is to take that network and create an Android application around it using the SNPE and FastCV SDKs.
Figure 1 shows a block diagram for this example.
Figure 1 - Block Diagram of System
In the first block (screen capture), we use an Android Canvas to create a bitmap drawing surface with a painting brush:
drawPath = new Path();
drawPaint = new Paint();
drawPaint.setColor(Color.WHITE);
drawPaint.setAntiAlias(true);
drawPaint.setStrokeWidth(20);
drawPaint.setStyle(Paint.Style.STROKE);
drawPaint.setStrokeJoin(Paint.Join.ROUND);
drawPaint.setStrokeCap(Paint.Cap.ROUND);
canvasPaint = new Paint(Paint.DITHER_FLAG);
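The bitmap buffer itself can be captured from the drawing view roughly as follows. This is a minimal sketch; the canvasBitmap field, the single-channel packing and the use of one color channel as the gray level (reasonable for a white brush on a dark background) are assumptions about this particular app:
// Minimal sketch: copy the drawing bitmap into a byte buffer for the native resize step.
// canvasBitmap is assumed to be the Bitmap backing the drawing Canvas.
Bitmap image = canvasBitmap.copy(Bitmap.Config.ARGB_8888, false);
int[] argb = new int[image.getWidth() * image.getHeight()];
image.getPixels(argb, 0, image.getWidth(), 0, 0, image.getWidth(), image.getHeight());

// Keep one luminance byte per pixel; with a white brush on a dark background,
// any single color channel serves as the gray level.
ByteBuffer pixelsBatched = ByteBuffer.allocate(argb.length);
for (int p : argb) {
    pixelsBatched.put((byte) (p & 0xFF)); // blue channel used as gray level
}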
After capturing the bitmap buffer, we pass it to the resizing module (erosion is optional) to scale it down to the 28x28 image size expected by our DNN:
short[] tout = resizeImage(pixelsBatched.array(), image.getWidth(), image.getHeight());
The resizeImage function is a Java Native Interface (JNI) function that calls FastCV functions:
fcvScaleDownBLu8(pJimgData, w, h, 0, pJimgDataOut, 28, 28, 0);
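On the Java side, resizeImage is just a native method declaration, and the resized 28x28 buffer still has to be converted to the float array that is written into the input tensor later. A minimal sketch, assuming this app scales pixel values to the 0..1 range used during training:
// Java-side declaration of the JNI resize helper shown above (native library name is an assumption).
public native short[] resizeImage(byte[] pixels, int width, int height);

// Normalize the 28x28 result to floats for the SNPE input tensor.
// The 0..1 scaling is an assumption; it must match the preprocessing used to train the network.
float[] rgbBitmapAsFloat = new float[28 * 28];
for (int i = 0; i < rgbBitmapAsFloat.length; i++) {
    rgbBitmapAsFloat[i] = (tout[i] & 0xFF) / 255.0f;
}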
Finally, we pass the resulting image to our DNN. To do that, we first need to load the model:
File modelFile = new File(networkModel);
builder.setDebugEnabled(false);
builder.setCpuFallbackEnabled(true);
builder.setUseUserSuppliedBuffers(false);
builder.setRuntimeOrder(NeuralNetwork.Runtime.DSP);
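The snippet above assumes the builder already exists; with the SNPE Java API it is created from the application context and turned into a NeuralNetwork roughly as follows. This is a minimal sketch: the File-based setModel overload and the getApplication() reference are assumptions about this app, and error handling is omitted.
// Minimal sketch of where the builder and network come from (SNPE Java API).
SNPE.NeuralNetworkBuilder builder = new SNPE.NeuralNetworkBuilder(getApplication());
builder.setModel(modelFile); // assumes the File-based setModel overload (may throw IOException)
// ... the setDebugEnabled / setCpuFallbackEnabled / setUseUserSuppliedBuffers / setRuntimeOrder calls above ...
NeuralNetwork network = builder.build();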
Then we feed the input into the network:
tensor.write(rgbBitmapAsFloat, 0, rgbBitmapAsFloat.length);
inputs.put(mInputLayer, tensor);
final Map<String, FloatTensor> outputs = network.execute(inputs);
Finally, the result is returned in the outputs Map.
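Reading the classification back is then a matter of copying the output tensor into a float array and taking the index of the largest probability. A minimal sketch, assuming a single output layer whose name is held in mOutputLayer (mirroring mInputLayer above):
// Minimal sketch: copy the output tensor and take the arg-max as the recognized digit.
// mOutputLayer is assumed to hold the network's output layer name, like mInputLayer above.
final FloatTensor outTensor = outputs.get(mOutputLayer);
final float[] probs = new float[outTensor.getSize()];
outTensor.read(probs, 0, probs.length);

int digit = 0;
for (int i = 1; i < probs.length; i++) {
    if (probs[i] > probs[digit]) {
        digit = i; // index 0..9 corresponds to the recognized digit
    }
}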
Figure 2 - Example Android Application
This example was tested using Intrinsyc's Open-Q 820 development kit running the Android 8 release v4.0 BSP software. To compare results across different types of processing cores, you can select different fcvOperationMode values (via fcvSetOperationMode) for FastCV, and different runtimes for the network by changing the NeuralNetwork.Runtime value passed to the SNPE builder.
Example results we measured for some of the runtime options in this example are as follows:
SNPE on CPU and FastCV on Performance mode ~ 105 ms
SNPE on DSP and FastCV on LOW_POWER mode ~ 75 ms
SNPE on GPU_FLOAT16 and FastCV on GPU ~ 84 ms
Summary
As this article describes, heterogeneous Snapdragon SoCs are a suitable platform for many systems that target low-power, highly efficient intelligent operation. Using FastCV in conjunction with SNPE lets an application take advantage of the Hexagon DSP subsystems for lower power consumption and, in many cases, better performance. The source code and package for this example are available from Intrinsyc.
Intrinsyc’s software engineers can help you design and develop many kinds of neural network, AI and signal processing systems on the different series of Qualcomm Snapdragon processors, including customizing FastCV, training models, setting up DNNs, and porting solutions to the different DSPs on these processors (audio DSP, modem DSP, compute DSP and sensor low-power islands). Intrinsyc has the expertise and tools to support these designs end to end. Contact us for more information at [email protected]
Author
Shahrad Payandeh is an embedded software engineer at Intrinsyc Technologies. He has been working as an embedded engineer for almost 15 years, with more than 6 years of experience in DSP development for audio/voice and video processing. He has worked on SoCs from a range of vendors and has experience with Linux, Android, QNX, QuRT and GHS operating systems on embedded platforms, from device drivers up to the HLOS level. For the last 3 years he has been working extensively on Qualcomm's Hexagon DSP platforms.