12 November 2020
Posted by Oli Gaymond, Product Manager Android Machine Learning
On-device machine learning enables cutting-edge features to run locally without transmitting data to a server. Processing the data on-device lowers latency, can improve privacy, and allows features to work without connectivity. Achieving the best performance and power efficiency requires taking advantage of all available hardware.
The Android Neural Networks API (NNAPI) is designed for running computationally intensive operations for machine learning on Android devices. It provides a single set of APIs to benefit from available hardware accelerators including GPUs, DSPs and NPUs.
In Android 11, we released Neural Networks API 1.3, adding support for Quality of Service APIs, Memory Domains and expanded quantization support. This release builds on the comprehensive support for over 100 operations, floating-point and quantized data types, and hardware implementations from partners across the Android ecosystem.
Hardware acceleration is particularly beneficial for always-on, real-time models such as on-device computer vision or audio enhancement. These models tend to be compute-intensive, latency-sensitive and power-hungry. One such use case is segmenting the user from the background in video calls. Facebook is now testing NNAPI within the Messenger application to enable the immersive 360 backgrounds feature. Using NNAPI, Facebook saw a 2x speedup and a 2x reduction in power requirements. This is in addition to offloading work from the CPU, which frees it to perform other critical tasks.
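To see why speed and power gains matter together for a real-time feature, here is a back-of-the-envelope sketch. The baseline power, baseline latency and 30 fps frame rate are hypothetical values picked for illustration, not measurements from Messenger, and the sketch assumes that both 2x figures apply to the inference step itself.

```python
def energy_per_inference_mj(power_w: float, latency_ms: float) -> float:
    """Energy in millijoules: watts x milliseconds = millijoules."""
    return power_w * latency_ms


def fits_frame_budget(latency_ms: float, fps: float) -> bool:
    """True if one inference finishes within a single frame interval."""
    return latency_ms <= 1000.0 / fps


# Hypothetical CPU baseline (illustrative numbers only, not from the post).
cpu_power_w, cpu_latency_ms = 2.0, 40.0

# Apply the 2x speedup and 2x power reduction quoted in the post.
nnapi_power_w = cpu_power_w / 2
nnapi_latency_ms = cpu_latency_ms / 2

cpu_energy_mj = energy_per_inference_mj(cpu_power_w, cpu_latency_ms)
nnapi_energy_mj = energy_per_inference_mj(nnapi_power_w, nnapi_latency_ms)

print(f"CPU:   {cpu_energy_mj:.0f} mJ per inference, "
      f"fits 30 fps: {fits_frame_budget(cpu_latency_ms, 30)}")
print(f"NNAPI: {nnapi_energy_mj:.0f} mJ per inference, "
      f"fits 30 fps: {fits_frame_budget(nnapi_latency_ms, 30)}")
```

Under these assumptions the two gains compound: energy per inference drops by 4x, and a model that missed the roughly 33 ms frame interval on the CPU now fits inside it. If the reported power figure already accounts for the shorter run time, the combined saving would be smaller.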
NNAPI can be accessed directly via an Android C API or via higher level frameworks such as TensorFlow Lite. Today, PyTorch Mobile announced a new prototype feature supporting NNAPI that enables developers to use hardware accelerated inference with the PyTorch framework.
Today’s initial release includes support for well-known linear convolutional and multilayer perceptron models on Android 10 and above. Performance testing using the MobileNetV2 model shows up to a 10x speedup compared to single-threaded CPU execution. As part of the development towards a full stable release, future updates will include support for additional operators and model architectures including Mask R-CNN, a popular object detection and instance segmentation model.
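A speedup figure like this is the ratio of baseline latency to accelerated latency. As a rough sketch of how such a number could be measured, the snippet below times two callables and compares their median latencies. The helpers `time_runs` and `speedup`, along with the warm-up and run counts, are hypothetical names and values chosen for illustration. They are not part of NNAPI or PyTorch Mobile, and the two placeholder workloads merely stand in for real CPU and NNAPI inference calls.

```python
import statistics
import time
from typing import Callable, List, Sequence


def time_runs(fn: Callable[[], object], runs: int = 50, warmup: int = 5) -> List[float]:
    """Return per-call latencies in milliseconds, excluding warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def speedup(baseline_ms: Sequence[float], accelerated_ms: Sequence[float]) -> float:
    """Speedup = median baseline latency / median accelerated latency."""
    return statistics.median(baseline_ms) / statistics.median(accelerated_ms)


# Placeholder workloads standing in for real inference calls (assumption).
def cpu_inference() -> int:
    return sum(i * i for i in range(20_000))


def accelerated_inference() -> int:
    return sum(i * i for i in range(2_000))


if __name__ == "__main__":
    cpu_ms = time_runs(cpu_inference)
    acc_ms = time_runs(accelerated_inference)
    print(f"median baseline:    {statistics.median(cpu_ms):.3f} ms")
    print(f"median accelerated: {statistics.median(acc_ms):.3f} ms")
    print(f"speedup:            {speedup(cpu_ms, acc_ms):.1f}x")
```

Using the median rather than the mean keeps a few slow outlier runs from skewing the result. Meaningful numbers come only from running the real model on the target device, since accelerator availability and driver quality vary between phones.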
We would like to thank the PyTorch Mobile team at Facebook for their partnership and commitment to bringing accelerated neural networks to millions of Android users.