Ramulator: A Fast and Extensible DRAM Simulator
TLDR
This paper presents Ramulator, a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility and provides out-of-the-box support for a wide array of DRAM standards.
Abstract:
Recently, both industry and academia have proposed many different roadmaps for the future of DRAM. Consequently, there is a growing need for an extensible DRAM simulator, which can be easily modified to judge the merits of today's DRAM standards as well as those of tomorrow. In this paper, we present Ramulator, a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility. Unlike existing simulators, Ramulator is based on a generalized template for modeling a DRAM system, which is only later infused with the specific details of a DRAM standard. Thanks to such a decoupled and modular design, Ramulator is able to provide out-of-the-box support for a wide array of DRAM standards: DDR3/4, LPDDR3/4, GDDR5, WIO1/2, HBM, as well as some academic proposals (SALP, AL-DRAM, TL-DRAM, RowClone, and SARP). Importantly, Ramulator does not sacrifice simulation speed to gain extensibility: according to our evaluations, Ramulator is 2.5$\times$ faster than the next fastest simulator. Ramulator is released under the permissive BSD license.
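The decoupling described in the abstract (a generic DRAM template that is only later filled in with a standard's details) can be illustrated with a toy sketch. This is not Ramulator's code or API: the `Spec` and `Bank` names, the three timing fields, and all numeric values are assumptions made up for illustration, and the timing model is deliberately simplified to a single bank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical per-standard parameters (illustrative values only)."""
    name: str
    tRCD: int  # cycles from row activation to a read command
    tRP: int   # cycles to precharge (close) an open row
    tCL: int   # cycles from a read command to data


class Bank:
    """Generic bank model: its logic never mentions a concrete standard."""

    def __init__(self, spec: Spec) -> None:
        self.spec = spec
        self.open_row = None  # no row is open initially
        self.ready_at = 0     # earliest cycle the next command may issue

    def read(self, row: int, now: int) -> int:
        """Return the cycle at which data for `row` becomes available."""
        t = max(now, self.ready_at)
        if self.open_row != row:
            if self.open_row is not None:
                t += self.spec.tRP   # close the currently open row
            t += self.spec.tRCD      # open the requested row
            self.open_row = row
        self.ready_at = t
        return t + self.spec.tCL


# Two made-up "standards": only the data differs, the Bank logic is shared.
STD_A = Spec(name="StdA", tRCD=10, tRP=10, tCL=10)
STD_B = Spec(name="StdB", tRCD=5, tRP=5, tCL=5)

bank_a = Bank(STD_A)
first = bank_a.read(row=5, now=0)      # row closed: activate, then read
hit = bank_a.read(row=5, now=first)    # same row: read only
miss = bank_a.read(row=7, now=hit)     # other row: precharge, activate, read
```

Supporting another standard in this sketch means adding another `Spec` instance; the `Bank` class is left untouched, which is the property the abstract attributes to the decoupled design.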
Citations
Proceedings Article
Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology
Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie S. Kim, Michael Kozuch, Onur Mutlu, Phillip B. Gibbons, Todd C. Mowry
TL;DR: This work proposes Ambit, an accelerator-in-memory for bulk bitwise operations that largely exploits the existing DRAM structure and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area).
Proceedings Article
Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks
Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul Thakur, Dae Hyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, Onur Mutlu
TL;DR: This work comprehensively analyzes the energy and performance impact of data movement for several widely-used Google consumer workloads, and finds that processing-in-memory (PIM) can significantly reduce data movement for all of these workloads by performing part of the computation close to memory.
Journal Article
Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives
TL;DR: In this article, the authors provide rigorous experimental data from state-of-the-art MLC and TLC NAND flash devices on various types of flash memory errors, to motivate the need for such techniques.
Journal Article
CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories
TL;DR: A tool is designed that carefully models I/O power in the memory system, explores the design space, and gives the user the ability to define new types of memory interconnects/topologies, and a new relay-on-board chip that partitions a DDR channel into multiple cascaded channels is introduced.
Proceedings Article
Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM
TL;DR: A new DRAM substrate, Low-Cost Inter-Linked Subarrays (LISA), whose goal is to enable fast and efficient data movement across a large range of memory at low cost, and whose combined benefit is higher than the benefit of each alone, on a variety of workloads and system configurations.
References
Journal Article
The gem5 simulator
Nathan Binkert, Bradford M. Beckmann, Gabriel Black, Steven K. Reinhardt, Ali G. Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, Darien Wood
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, makes gem5 a valuable full-system simulation tool.
Proceedings Article
Analyzing CUDA workloads using a detailed GPU simulator
TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
Proceedings Article
Memory access scheduling
TL;DR: This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.
Journal Article
DRAMSim2: A Cycle Accurate Memory System Simulator
TL;DR: The process of validating DRAMSim2 timing against manufacturer Verilog models in an effort to prove the accuracy of simulation results is described.
Journal Article
RAIDR: Retention-Aware Intelligent DRAM Refresh
TL;DR: This paper proposes RAIDR (Retention-Aware Intelligent DRAM Refresh), a low-cost mechanism that identifies and skips unnecessary refreshes using knowledge of cell retention times: it groups DRAM rows into retention-time bins and applies a different refresh rate to each bin.