Open Access Journal Article (DOI)

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

TLDR
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs) that optimizes the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, by reconfiguring its architecture.
Abstract
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but pose throughput and energy-efficiency challenges to the underlying hardware, because their computation requires a large amount of data, creating significant on-chip and off-chip data movement that consumes more energy than the computation itself. Minimizing the energy cost of data movement for any CNN shape is therefore the key to high throughput and energy efficiency. Eyeriss achieves these goals with a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements. The RS dataflow reconfigures the computation mapping for a given shape, optimizing energy efficiency by maximally reusing data locally to reduce expensive data movement such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s with 0.0029 DRAM accesses per multiply-and-accumulate (MAC) operation for AlexNet at 278 mW (batch size $N = 4$), and at 0.7 frames/s with 0.0035 DRAM accesses/MAC for VGG-16 at 236 mW ($N = 3$).
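The core idea of the row-stationary dataflow described in the abstract can be sketched in a few lines of Python: a row of filter weights stays resident in a processing element (PE) while an input row streams past it, and the partial-sum rows from several such PEs are accumulated into one output row. This is a minimal illustrative sketch under assumed names (`row_stationary_1d`, `conv2d_rs` are not from the paper), not the Eyeriss hardware mapping itself:

```python
import numpy as np

def row_stationary_1d(filter_row, input_row):
    """One PE primitive in a row-stationary (RS) style dataflow:
    the filter row stays resident in the PE while the input row
    slides past it, producing one row of partial sums.
    Illustrative sketch only, not the Eyeriss RTL."""
    R = len(filter_row)
    W = len(input_row)
    out = np.zeros(W - R + 1)
    for x in range(W - R + 1):
        # The resident weights are reused for every output position
        # without being re-fetched from outside the PE.
        out[x] = np.dot(filter_row, input_row[x:x + R])
    return out

def conv2d_rs(filt, image):
    """2D convolution assembled from 1D row primitives: for output
    row p, the R partial-sum rows from filter rows r = 0..R-1
    (applied to image rows p+r) are accumulated together."""
    R, _ = filt.shape
    H, _ = image.shape
    E = H - R + 1  # output height
    partial = [row_stationary_1d(filt[r], image[p + r])
               for p in range(E) for r in range(R)]
    # sum the R partial-sum rows belonging to each output row
    return np.array([sum(partial[p * R:(p + 1) * R]) for p in range(E)])
```

The sketch shows why the mapping saves data movement: each filter row and each input row is delivered to a PE once and then reused across many MACs, which is the local reuse the RS dataflow exploits to cut DRAM accesses.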


Citations
Proceedings ArticleDOI

An Optimized Hardware Implementation of Deep Learning Inference for Diabetes Prediction

TL;DR: The authors present a Field-Programmable Gate Array (FPGA) implementation of deep learning inference for predicting diabetes mellitus (DM), achieving an accuracy of 91.15%.
Posted Content

Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models.

TL;DR: The authors propose Mensa, an acceleration framework that incorporates multiple heterogeneous ML edge accelerators (both on-chip and near-data), each catering to the characteristics of a particular subset of models, and schedules each layer to run on the best-suited accelerator, accounting for both efficiency and inter-layer dependencies.
Journal ArticleDOI

An Edge 3D CNN Accelerator for Low-Power Activity Recognition

TL;DR: The proposed accelerator architecture is equipped with a redundancy detection and elimination mechanism that skips computations with identical activations and parameters when convolutional filters are reused along the temporal dimension, yielding a considerable energy-efficiency boost on state-of-the-art activity-recognition benchmarks and datasets.
Journal ArticleDOI

Efficient Hardware Architectures for 1D- and MD-LSTM Networks

TL;DR: This article presents the first hardware architecture for MD-LSTM, provides a trade-off analysis of accuracy versus hardware cost across various precisions, and presents a new DRAM-PIM architecture for 1D-LSTM targeting energy-efficient compute platforms such as portable devices.
Proceedings ArticleDOI

DRACO: Co-Optimizing Hardware Utilization, and Performance of DNNs on Systolic Accelerator

TL;DR: This work proposes data-reuse-aware co-optimization (DRACO), the first work to resolve the resource-underutilization challenge at the algorithm level, and demonstrates a trade-off between computational efficiency, PE utilization, and the predictive performance of DNNs.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: The authors propose a residual learning framework to ease the training of networks substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The authors' deep convolutional neural network, consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art classification performance on ImageNet.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: The authors investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting and show that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal ArticleDOI

Deep learning

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Journal ArticleDOI

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.