Enabling machine learning on the edge using SRAM conserving efficient neural networks execution approach

Sudharsan, Bharath; Patel, Pankesh; Breslin, John G.; Ali, Muhammad Intizar

Date

2021-09-13

Recommended Citation

Sudharsan, Bharath, Patel, Pankesh, Breslin, John G., & Ali, Muhammad Intizar. (2021). Enabling machine learning on the edge using SRAM conserving efficient neural networks execution approach. Paper presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Bilbao, Spain, Virtual, 13-17 September. DOI: 10.13025/azew-5w09

Published Version

https://doi.org/10.13025/azew-5w09

Abstract

Edge analytics refers to the application of data analytics and Machine Learning (ML) algorithms on Internet of Things (IoT) devices. The concept of edge analytics is gaining popularity due to its ability to perform AI-based analytics at the device level, enabling autonomous decision-making without depending on the cloud. However, the majority of IoT devices are embedded systems with a low-cost microcontroller unit (MCU) or a small CPU as their brain, and such devices are often incapable of handling complex ML algorithms. In this paper, we propose an approach for the efficient execution of already deeply compressed, large neural networks (NNs) on tiny IoT devices. After NNs are optimized using state-of-the-art deep model compression methods, executing the resultant models on MCUs or small CPUs in the execution sequence produced by our approach conserves more SRAM. During the evaluation of nine popular models, comparing the default NN execution sequence with the sequence produced by our approach, we found that 1.61-38.06% less SRAM was used to produce inference results, inference time was reduced by 0.28-4.9 ms, and energy consumption was reduced by 4-84 mJ. Despite achieving such high conserved levels of SRAM, our meth
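The abstract does not state how the execution sequence is computed. The Python below is a minimal sketch of why operator order alone can change peak SRAM. It assumes a simplified memory model in which each operator's output buffer stays in SRAM until its last consumer has run. Under that model, it selects the order with the lowest peak by brute force over all topological orders. The graph, tensor sizes, and function names (`peak_sram`, `topological_orders`, `best_order`) are hypothetical. This is not the authors' algorithm, and exhaustive search is only practical for toy graphs.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical operator graph: op name -> (output tensor size in bytes, input op names).
Graph = Dict[str, Tuple[int, List[str]]]


def peak_sram(graph: Graph, order: Sequence[str]) -> int:
    """Peak bytes resident in SRAM when ops run in `order` (simplified model)."""
    # Count how many consumers still need each op's output tensor.
    remaining = {name: 0 for name in graph}
    for _, inputs in graph.values():
        for src in inputs:
            remaining[src] += 1
    live: Dict[str, int] = {}  # output buffers currently held in SRAM
    peak = 0
    for op in order:
        size, inputs = graph[op]
        # While `op` runs, all live buffers (incl. its inputs) and its output coexist.
        peak = max(peak, sum(live.values()) + size)
        live[op] = size
        for src in inputs:
            remaining[src] -= 1
            if remaining[src] == 0:
                del live[src]  # last consumer has run, so free the buffer
    return peak


def topological_orders(graph: Graph) -> Iterator[List[str]]:
    """Yield every valid execution order (exponential, so toy graphs only)."""
    pending = {n: len(inputs) for n, (_, inputs) in graph.items()}
    consumers: Dict[str, List[str]] = {n: [] for n in graph}
    for n, (_, inputs) in graph.items():
        for src in inputs:
            consumers[src].append(n)
    order: List[str] = []

    def extend() -> Iterator[List[str]]:
        if len(order) == len(graph):
            yield list(order)
            return
        for n in graph:
            if pending[n] == 0 and n not in order:
                order.append(n)
                for c in consumers[n]:
                    pending[c] -= 1
                yield from extend()
                for c in consumers[n]:
                    pending[c] += 1
                order.pop()

    yield from extend()


def best_order(graph: Graph) -> Tuple[List[str], int]:
    """Return the execution order with the lowest peak SRAM, and that peak."""
    return min(
        ((o, peak_sram(graph, o)) for o in topological_orders(graph)),
        key=lambda pair: pair[1],
    )


toy: Graph = {
    "in": (10, []),
    "a1": (20, ["in"]),
    "a2": (20, ["a1"]),
    "b1": (30, ["in"]),
    "b2": (2, ["b1"]),
    "out": (5, ["a2", "b2"]),
}

print(peak_sram(toy, list(toy)))  # 60 (listed order: branch a, then branch b)
print(best_order(toy))  # (['in', 'b1', 'b2', 'a1', 'a2', 'out'], 42)
```

In this toy graph, running branch a first keeps its 20-byte result `a2` resident while `b1` allocates 30 bytes. Running branch b first leaves only its 2-byte result `b2` waiting for `out`. The model deliberately ignores operator scratch buffers, weights held in Flash, and in-place buffer reuse.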

URI

http://hdl.handle.net/10379/16827

Collections

Data Science Institute (Conference Papers)

Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC BY 4.0)