Crossbar memory systems for machine learning: circuits, architecture, and evaluation frameworks
Time: Monday, December 6th, 1:30pm - 5:00pm PST
Description: Traditional computing systems based on the von Neumann architecture are fundamentally bottlenecked by data transfers between processors and memory. The emergence of data-intensive workloads, such as machine learning (ML), creates an urgent need to address this bottleneck by designing computing platforms built on the principle of collocated memory and processing units. Such an approach, known as “in-memory computing,” can potentially eliminate data movement costs by computing inside the memory array itself. Crossbars based on resistive nonvolatile memory (NVM) devices have shown immense promise as the building blocks of in-memory computing systems for ML workloads. Their high density enables greater on-chip storage capacity, and they can perform massively parallel, in situ matrix–vector multiplication (MVM) operations, thereby accelerating the main computational kernel of ML workloads. However, resistive crossbar-based analog computing is inherently approximate due to device- and circuit-level non-idealities. Such non-idealities can significantly degrade the algorithmic efficacy of large-scale deep neural networks (DNNs) when mapped onto NVM crossbars. Furthermore, the area and energy costs of the peripheral circuits that convert between the analog and digital domains can greatly diminish the intrinsic efficiency of crossbar-based MVM computation.
In this tutorial, we present a comprehensive overview of the emerging paradigm of computing with NVM crossbars to accelerate ML workloads. We describe the design principles of resistive crossbars, including the devices and associated circuits that constitute them. We discuss test-chip data illustrating device challenges, as well as the circuits and peripherals that address these challenges. We further discuss intrinsic approximations arising from the device and circuit characteristics and study their functional impact on the MVM operation. Next, we present an overview of spatial architectures that exploit the high storage density of NVM crossbars. We discuss how compilers can be designed to perform memory mapping that addresses device variations and reduces the latency gap between arrays. Furthermore, we elaborate on evaluation frameworks for performance and functional simulation of large-scale DNNs mapped onto NVM crossbars. The performance simulation framework evaluates hardware metrics such as energy, latency, and area for the aforementioned spatial architectures. The functional simulation framework captures device–circuit–architecture characteristics to evaluate the algorithmic performance of large-scale DNNs on resistive crossbar-based hardware. Finally, we discuss open challenges and future research directions that must be explored to realize the vision of resistive crossbars as the building blocks of future computing platforms.
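To make the functional impact of non-idealities on the MVM operation concrete, the following is a minimal NumPy sketch and not the tutorial's simulation framework. It assumes a differential conductance mapping for signed weights, multiplicative Gaussian device variation, and a uniform ADC. The function names, the conductance range (`g_min`, `g_max`), and the parameters `sigma` and `adc_bits` are all hypothetical choices made for illustration. Wire resistance, device nonlinearity, and DAC effects are ignored, and the weight matrix and input vector are assumed to be non-zero.

```python
import numpy as np

def map_to_conductances(W, g_min=1e-6, g_max=1e-4):
    """Map signed weights to a differential conductance pair (assumed range, in siemens)."""
    scale = (g_max - g_min) / np.max(np.abs(W))   # siemens per unit weight
    g_pos = g_min + scale * np.clip(W, 0, None)   # holds the positive part of W
    g_neg = g_min + scale * np.clip(-W, 0, None)  # holds the negative part of W
    return g_pos, g_neg, scale

def quantize(x, bits, full_scale):
    """Symmetric uniform quantizer with 2**bits - 1 levels (simplified ADC model)."""
    levels = 2 ** (bits - 1) - 1
    step = full_scale / levels
    return np.clip(np.round(x / step), -levels, levels) * step

def crossbar_mvm(W, v, sigma=0.0, adc_bits=None, rng=None):
    """Approximate W @ v as a crossbar would: currents I = G V summed per output line.

    W has shape (outputs, inputs); a physical array would store its transpose,
    with inputs applied on rows and currents collected on columns.
    """
    rng = np.random.default_rng() if rng is None else rng
    g_pos, g_neg, scale = map_to_conductances(W)
    if sigma > 0:
        # Assumed device variation: multiplicative Gaussian noise per device.
        g_pos = g_pos * (1 + sigma * rng.standard_normal(g_pos.shape))
        g_neg = g_neg * (1 + sigma * rng.standard_normal(g_neg.shape))
    i_out = (g_pos - g_neg) @ v                   # Ohm's law plus Kirchhoff summation
    if adc_bits is not None:
        # Worst-case output current sets the ADC full-scale range.
        full_scale = scale * np.max(np.abs(W)) * np.sum(np.abs(v))
        i_out = quantize(i_out, adc_bits, full_scale)
    return i_out / scale                          # convert currents back to weight units

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))                   # toy weight matrix
v = rng.uniform(0, 0.2, size=8)                   # toy input voltages
ideal = W @ v
approx = crossbar_mvm(W, v, sigma=0.05, adc_bits=6, rng=rng)
print("max abs error:", np.max(np.abs(approx - ideal)))
```

With `sigma=0` and `adc_bits=None` the sketch reproduces the exact product, because the `g_min` offsets cancel in the differential pair. Raising `sigma` or lowering `adc_bits` shows how device variation and converter resolution each perturb the result.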