Making Memristive Neural Network Accelerators Reliable

Abstract

Honorable Mention for IEEE MICRO Top Picks.

Deep neural networks (DNNs) have attracted substantial interest in recent years because they outperform other supervised learning models on many classification and regression tasks. DNNs often require a large amount of data movement, which incurs performance and energy overheads. One promising way to address this problem is an accelerator based on in-situ analog computing, which leverages the fundamental electrical properties of memristive circuits to perform matrix-vector multiplication. Recent work on analog neural network accelerators has shown great potential for improving both system performance and energy efficiency. However, detecting and correcting the errors that occur during in-memory analog computation remains largely unexplored. The same electrical properties that provide the performance and energy improvements make these systems especially susceptible to errors, which can severely degrade the accuracy of neural network accelerators.
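The in-situ matrix-vector multiplication mentioned above can be illustrated with an idealized crossbar model. Each cell's conductance stores a weight, the input is applied as row voltages, and by Ohm's law and Kirchhoff's current law each column current is the dot product of the voltages with that column's conductances. This is a minimal sketch under simplifying assumptions: it ignores wire resistance, sneak paths, and converter effects, and the function name `crossbar_mvm` and the Gaussian `noise_std` perturbation are hypothetical. The noise term only shows that a deviation in a cell's conductance appears directly as an error in the analog result; it is not the error model characterized in the paper.

```python
import random


def crossbar_mvm(conductance: list[list[float]], voltages: list[float],
                 noise_std: float = 0.0, seed: int = 0) -> list[float]:
    """Idealized crossbar: I[j] = sum_i G[i][j] * V[i].

    Each cell contributes a current G*V (Ohm's law); the currents on a column
    add up (Kirchhoff's current law). noise_std is a hypothetical relative
    Gaussian perturbation applied to each cell's conductance.
    """
    rng = random.Random(seed)
    n_cols = len(conductance[0])
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):
        for j in range(n_cols):
            g = conductance[i][j]
            if noise_std > 0.0:
                g *= 1.0 + rng.gauss(0.0, noise_std)  # perturbed conductance
            currents[j] += g * v
    return currents


G = [[1.0, 0.0],
     [0.5, 2.0]]   # weights stored as conductances (arbitrary units)
V = [1.0, 2.0]     # input vector applied as row voltages
print(crossbar_mvm(G, V))                  # [2.0, 4.0]
print(crossbar_mvm(G, V, noise_std=0.05))  # perturbed result (depends on seed)
```

Because every column sums many analog contributions at once, a small error in any one cell propagates into the output without any digital checkpoint in between, which is why the error correction has to operate on the computed result.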

This paper proposes a new error correction scheme for analog neural network accelerators based on arithmetic codes. The proposed scheme encodes the data through multiplication by an integer; because multiplication distributes over addition, the sums computed by the accelerator remain multiples of that integer when no error occurs. Error detection and correction are performed through a modulus operation and a correction table lookup. This basic scheme is further improved by data-aware encoding, which exploits the state dependence of the errors, and by knowledge of how critical each portion of the computation is to overall system accuracy. By leveraging the observation that a physical row containing fewer 1s is less susceptible to errors, the proposed scheme increases the effective error correction capability with less than 4.5% area and less than 4.7% energy overhead. When applied to a memristive DNN accelerator performing inference on the MNIST and ILSVRC-2012 datasets, the proposed technique reduces the misclassification rates by 1.5x and 1.1x, respectively.
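The encode, detect, and correct steps described above can be sketched with a textbook AN code. This is a minimal sketch, assuming the only error is a single additive error of the form +2^i or -2^i in the encoded output. The multiplier `A = 59`, the 16-bit output width, and the helper names are hypothetical choices for illustration, not parameters from the paper, and the sketch omits the data-aware encoding and criticality-based refinements. Stored values are multiplied by A before the dot product, so an error-free result is a multiple of A; a nonzero residue modulo A is looked up in a precomputed table to identify the error.

```python
# Hypothetical parameters chosen for illustration; not taken from the paper.
A = 59       # code multiplier: prime, with 2 as a primitive root mod 59
WIDTH = 16   # assumed bit-width of the encoded dot-product output


def build_correction_table(a: int, width: int) -> dict[int, int]:
    """Map the residue (mod a) of each single error +2**i or -2**i to that error."""
    table: dict[int, int] = {}
    for i in range(width):
        for err in (1 << i, -(1 << i)):
            r = err % a  # Python's % always returns a value in [0, a)
            if r == 0 or r in table:
                raise ValueError(f"A={a} cannot distinguish all errors over {width} bits")
            table[r] = err
    return table


TABLE = build_correction_table(A, WIDTH)


def encode(values: list[int]) -> list[int]:
    """AN encoding: multiply every stored value by A."""
    return [A * v for v in values]


def dot(weights: list[int], inputs: list[int]) -> int:
    """Integer stand-in for the accelerator's analog dot product."""
    return sum(w * x for w, x in zip(weights, inputs))


def decode(y: int) -> int:
    """Detect via y mod A, correct via table lookup, then divide out A."""
    r = y % A
    if r != 0:
        if r not in TABLE:
            raise ValueError(f"uncorrectable error (residue {r})")
        y -= TABLE[r]  # subtract the identified error
    return y // A


weights, inputs = [3, 1, 4], [1, 0, 2]
y = dot(encode(weights), inputs)  # equals A * dot(weights, inputs) = 649
print(decode(y))              # 11 (no error)
print(decode(y + (1 << 5)))   # 11 (a +2**5 error is corrected)
print(decode(y - (1 << 3)))   # 11 (a -2**3 error is corrected)
```

Because 2 is a primitive root modulo 59, the residues of +2^i and -2^i are all distinct for every i below 29, so the table has no collisions at the assumed 16-bit width. Residues that are absent from the table are reported as uncorrectable rather than guessed.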

Publication
In International Symposium on High-Performance Computer Architecture (HPCA) 2018