Open Access · Journal ArticleDOI

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

TLDR
Eyeriss as mentioned in this paper is an accelerator for state-of-the-art deep convolutional neural networks (CNNs) that optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, by reconfiguring the architecture.
Abstract
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but also pose throughput and energy-efficiency challenges to the underlying hardware, because their computation requires a large amount of data, creating significant on-chip and off-chip data movement that is more energy-consuming than the computation itself. Minimizing the energy cost of data movement for any CNN shape is therefore the key to high throughput and energy efficiency. Eyeriss achieves these goals by using a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements (PEs). The RS dataflow reconfigures the computation mapping of a given shape, optimizing energy efficiency by maximally reusing data locally to reduce expensive data movement, such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s and 0.0029 DRAM accesses per multiply-and-accumulate (MAC) for AlexNet at 278 mW (batch size $N = 4$), and at 0.7 frames/s and 0.0035 DRAM accesses/MAC for VGG-16 at 236 mW ($N = 3$).
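
To make the row stationary (RS) idea concrete, below is a minimal NumPy sketch, not the chip's implementation, of its 1-D convolution primitive: each PE keeps one filter row stationary in its local register file and slides a row of input activations across it, while a column of PEs accumulates the partial sums across filter rows to produce one output row. Function names and the toy array sizes are illustrative assumptions.

```python
import numpy as np

def pe_1d_conv(filter_row, input_row):
    """One PE: the filter row stays stationary in local storage while
    the input row slides across it, yielding a row of partial sums."""
    R, W = len(filter_row), len(input_row)
    return np.array([np.dot(filter_row, input_row[x:x + R])
                     for x in range(W - R + 1)])

def rs_conv2d(filt, ifmap):
    """A column of R PEs computes output row y: PE r convolves filter
    row r with input row y + r; their partial sums are accumulated
    vertically across the PE array instead of being spilled to DRAM."""
    R, H = filt.shape[0], ifmap.shape[0]
    return np.stack([
        sum(pe_1d_conv(filt[r], ifmap[y + r]) for r in range(R))
        for y in range(H - R + 1)
    ])

# Sanity check against a direct sliding-window 2-D convolution.
rng = np.random.default_rng(0)
filt, ifmap = rng.random((3, 3)), rng.random((8, 8))
ref = np.array([[np.sum(filt * ifmap[y:y + 3, x:x + 3])
                 for x in range(6)] for y in range(6)])
assert np.allclose(rs_conv2d(filt, ifmap), ref)
```

In hardware terms, keeping the filter row in the PE's register file is what makes the dataflow "row stationary": the most frequently reused data never leaves the cheapest level of the storage hierarchy.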


Citations
Proceedings ArticleDOI

NNSim: A Fast and Accurate SystemC/TLM Simulator for Deep Convolutional Neural Network Accelerators

TL;DR: A fast and accurate SystemC/TLM virtual platform for DCNN accelerator design; compared to an RTL implementation, the proposed model achieves a worst-case accuracy of 97–99% with a 3000–13000× simulation speedup.
Journal ArticleDOI

Dynamic Dataflow Scheduling and Computation Mapping Techniques for Efficient Depthwise Separable Convolution Acceleration

TL;DR: In this paper, the authors propose two efficient dynamic design techniques, adaptive row-based dataflow scheduling and adaptive computation mapping, to achieve a much better trade-off between hardware efficiency and performance for depthwise separable convolution (DSC)-based lightweight CNN accelerators.
Journal ArticleDOI

A Real-Time 17-Scale Object Detection Accelerator With Adaptive 2000-Stage Classification in 65 nm CMOS

TL;DR: An energy-efficient programmable ASIC accelerator for object detection with 2,000 classifiers for rigid boosted templates, which can be implemented with more modular hardware than support vector machine (SVM) and deformable parts model (DPM) designs.
Proceedings ArticleDOI

DIAN: differentiable accelerator-network co-search towards maximal DNN efficiency

TL;DR: DIAN as discussed by the authors is a differentiable accelerator-network co-search framework that automatically searches for matched networks and accelerators to maximize both accuracy and efficiency, and is applicable to both FPGA- and ASIC-based DNN accelerators.
Journal ArticleDOI

An Efficient FIFO Based Accelerator for Convolutional Neural Networks

TL;DR: An improved architecture for processing the convolutional layers of a CNN that exploits sparsity in the layers' inputs and outputs to improve performance, exceeding that of state-of-the-art architectures.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won first place in the ILSVRC 2015 classification task.
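
The core mechanism is the identity shortcut: the stacked layers learn a residual F(x) and the block outputs F(x) + x. A minimal PyTorch sketch of a basic residual block follows, omitting batch normalization and the projection shortcuts used when dimensions change; the class name is illustrative.

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Learns a residual F(x); the identity shortcut adds x back,
    which eases the optimization of very deep networks."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.f(x) + x)  # residual plus identity shortcut
```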
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network that achieved state-of-the-art ImageNet classification performance as discussed by the authors; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
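
For reference, a minimal PyTorch sketch of this layer structure, following the torchvision variant of the architecture (channel counts differ slightly from the original two-GPU version, and dropout is omitted):

```python
import torch.nn as nn

# Five convolutional layers (three followed by max-pooling) and
# three fully-connected layers ending in a 1000-way classifier.
alexnet = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # softmax applied at inference/loss time
)
```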
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 layers.
Journal ArticleDOI

Deep learning

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Journal ArticleDOI

Gradient-based learning applied to document recognition

TL;DR: In this article, graph transformer networks (GTNs) are proposed for document recognition; trained with gradient-based learning, they can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters.