Resistive RAM Endurance Array-Level Characterization and Correction Techniques Targeting Deep Learning Applications
Abstract
Limited endurance of resistive RAM (RRAM) is a major challenge for future computing systems. Using thorough endurance tests that incorporate fine-grained read operations at the array level, we quantify for the first time temporary write failures (TWFs) caused by intrinsic RRAM cycle-to-cycle and cell-to-cell variations. We also quantify permanent write failures (PWFs) caused by irreversible breakdown/dissolution of the conductive filament. We show how technology-, RRAM programming-, and system resilience-level solutions can be effectively combined to design new generations of energy-efficient computing systems that can successfully run deep learning (and other machine learning) applications despite TWFs and PWFs. We analyze the corresponding system lifetimes and TWF bit error ratios.
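To make the distinction between TWFs and PWFs concrete, the following minimal sketch shows one way per-cycle read-back results from an endurance test could be split into the two failure types. It is an illustration under stated assumptions, not the characterization procedure used in this work: the function name `classify_write_failures`, the `success[cycle][cell]` layout, and the rule that a run of failed writes lasting until the end of the test counts as permanent are all hypothetical.

```python
def classify_write_failures(success):
    """Split write failures into temporary (TWF) and permanent (PWF).

    success[c][k] is True when the write to cell k at cycle c was
    confirmed by the following read (hypothetical data layout).

    Assumption: a cell whose writes fail from some cycle through the
    end of the test is treated as permanently failed from that cycle;
    any failure followed by a later success is treated as temporary.
    """
    n_cycles = len(success)
    n_cells = len(success[0]) if n_cycles else 0

    pwf_cycle = [None] * n_cells  # cycle at which each cell fails for good
    twf_count = 0                 # temporary failures over all live cells
    live_writes = 0               # writes issued before permanent failure

    for k in range(n_cells):
        # Index of the last cycle with a successful write (-1 if none).
        last_ok = -1
        for c in range(n_cycles):
            if success[c][k]:
                last_ok = c

        # Failures after the last success never recover within the test.
        if last_ok < n_cycles - 1:
            pwf_cycle[k] = last_ok + 1

        # Failures up to the last success were followed by a recovery.
        for c in range(last_ok + 1):
            live_writes += 1
            if not success[c][k]:
                twf_count += 1

    twf_ber = twf_count / live_writes if live_writes else 0.0
    return pwf_cycle, twf_count, twf_ber
```

Under these assumptions, the TWF bit error ratio is the number of temporary failures divided by the number of writes issued to cells that had not yet failed permanently. A real test would need a longer observation window or a retry criterion before declaring a cell permanently failed, since a single failed write at the very end of the test is indistinguishable from a TWF here.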