MoF-LoRA: Mixture of Low-Rank Fault-Tolerant Experts for RRAM-based In-Memory Computing
Abstract
Resistive random-access memory (RRAM)-based in-memory computing (R-IMC) architectures achieve high energy efficiency by avoiding data movement during matrix-vector multiplication. However, R-IMC suffers from reliability challenges, such as programming errors, conductance drift, and stuck-at faults, which introduce noise and degrade inference accuracy. Several approaches have been proposed in the literature to enhance reliability; however, these solutions address only a few specific types of non-idealities, and they cannot be applied to large models, pre-trained models, or R-IMC architectures without introducing considerable computation and memory overhead. We propose a design-time solution that uses a mixture of fault-tolerant experts, created by adapting pre-trained models via low-rank adaptation (LoRA). We train non-ideality-specific experts, which are merged with the base weights before mapping, incurring no added runtime cost while enabling reliable R-IMC inference. For a range of vision and language tasks with transformer models, we show that our approach improves inference accuracy by up to 76% under a mixture of non-idealities, enabling reliable mapping and inference of vision–language models with no additional cost at runtime.
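The abstract notes that the fault-tolerant experts are merged with the base weights before mapping to the RRAM crossbar, so no extra computation is needed at inference time. The sketch below illustrates the standard LoRA merge step this relies on, W' = W + (α/r)·BA; the function name, shapes, and scaling convention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def merge_lora(W, A, B, alpha, rank):
    """Fold a low-rank adapter into the base weight matrix.

    Standard LoRA merge: W' = W + (alpha / rank) * (B @ A).
    After merging, inference uses only W', so the adapter adds
    no runtime cost -- the property the abstract highlights.
    """
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2
W = rng.standard_normal((d_out, d_in))   # pre-trained base weights
A = rng.standard_normal((r, d_in))       # down-projection (rank r)
B = rng.standard_normal((d_out, r))      # up-projection

W_merged = merge_lora(W, A, B, alpha=4.0, rank=r)

# The merged matrix has the same shape as W, so it can be mapped to the
# crossbar exactly like the original weights.
assert W_merged.shape == W.shape
# The update folded in equals the scaled low-rank product.
assert np.allclose(W_merged - W, (4.0 / r) * (B @ A))
```

In a mixture-of-experts setting, one would train several such (A, B) pairs, each targeting a different non-ideality, and fold the selected expert(s) into W before programming the array.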
BibTeX
@article{ahmed2026moflora,
author = {Soyed Tuhin Ahmed and Eduardo Ortega and T. Patrick Xiao and Ben Feinberg and Christopher H. Bennett and Matthew J. Marinella and Krishnendu Chakrabarty},
title = {{MoF-LoRA: Mixture of Low-Rank Fault-Tolerant Experts for RRAM-based In-Memory Computing}},
journal = {IEEE Journal on Emerging and Selected Topics in Circuits and Systems},
year = {2026},
pages = {1--1},
doi = {10.1109/JETCAS.2026.3655242}
}