An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory
Abstract
We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, a good match for the typical weight distribution of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy of the SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high On/Off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
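The abstract's key argument is that an error magnitude proportional to the conductance state (rather than a fixed fraction of the maximum conductance) is well matched to weight distributions concentrated near zero. The following sketch illustrates that intuition with a toy numerical experiment; the error magnitudes, weight distribution, and weight-to-conductance mapping are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: trained weights are concentrated near zero,
# modeled here as a narrow Gaussian.
weights = rng.normal(0.0, 0.05, size=10_000)

# Assumed mapping: each weight magnitude sets a target conductance in [0, g_max].
g_max = 1.0
g_target = np.abs(weights) / np.abs(weights).max() * g_max

# Two toy programming-error models:
# (a) error with a fixed spread relative to g_max (typical when low
#     conductances cannot be programmed precisely), and
# (b) error proportional to the target state itself (the behavior that
#     subthreshold operation approximates: small states incur small errors).
err_fixed = rng.normal(0.0, 0.02 * g_max, size=g_target.shape)
err_proportional = rng.normal(0.0, 0.02, size=g_target.shape) * g_target

rms_fixed = np.sqrt(np.mean(err_fixed**2))
rms_proportional = np.sqrt(np.mean(err_proportional**2))

# Because most weights map to small conductances, state-proportional error
# produces a much smaller aggregate perturbation on the programmed array.
print(f"RMS programming error, fixed-spread model:        {rms_fixed:.4f}")
print(f"RMS programming error, state-proportional model:  {rms_proportional:.4f}")
```

Under these assumptions the state-proportional model yields a far smaller RMS perturbation, which is the qualitative reason subthreshold operation preserves inference accuracy for near-zero weights.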
BibTeX
@article{xiao2022sonos,
  author  = {T. Patrick Xiao and Ben Feinberg and Christopher H. Bennett and Vineet Agrawal and Prashant Saxena and Venkatraman Prabhakar and Krishnaswamy Ramkumar and Harsha Medu and Vijay Raghavan and Ramesh Chettuvetty and Sapan Agarwal and Matthew J. Marinella},
  title   = {{An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory}},
  journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
  year    = {2022},
  volume  = {69},
  number  = {4},
  pages   = {1480--1493},
  doi     = {10.1109/TCSI.2021.3134313}
}