δ-mem adds compact online memory to frozen LLMs without retraining

Researchers propose δ-mem, a memory mechanism that attaches a small associative-memory state to an unmodified, frozen LLM backbone. The state is just an 8×8 matrix, updated via a delta-rule learning step as new information arrives, and its readout produces low-rank corrections to the model’s attention during generation. The approach sidesteps the usual options for giving models long-term recall: expanding the context window, swapping in a different architecture, or fine-tuning the base model.
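To make the mechanism concrete, here is a minimal NumPy sketch of a delta-rule associative memory held in an 8×8 matrix. It illustrates the general technique under stated assumptions and is not the authors' implementation. The function names, the key normalization, the step size `beta`, and the projections `W_down` and `W_up` are all hypothetical, chosen only to show how such a small state could yield a correction confined to an at most 8-dimensional subspace.

```python
import numpy as np

D_MEM = 8  # memory width, matching the 8x8 state described in the article


def _unit(x):
    """Scale a vector to unit length (assumed here to keep updates stable)."""
    return x / (np.linalg.norm(x) + 1e-8)


def delta_update(S, k, v, beta=1.0):
    """One delta-rule step: nudge the memory's output for key k toward v.

    S <- S + beta * (v - S k) k^T
    """
    k = _unit(k)
    error = v - S @ k  # what the memory currently gets wrong for this key
    return S + beta * np.outer(error, k)


def read(S, q):
    """Read the value associated with query q."""
    return S @ _unit(q)


def low_rank_correction(S, h, W_down, W_up):
    """Hypothetical coupling: project a hidden state down to the memory
    width, read the memory, and project back up. The output always lies
    in the column space of W_up, so it spans at most D_MEM dimensions."""
    return W_up @ read(S, W_down @ h)


rng = np.random.default_rng(0)
S = np.zeros((D_MEM, D_MEM))

# Store one key/value pair, then recall it.
k = rng.normal(size=D_MEM)
v = rng.normal(size=D_MEM)
S = delta_update(S, k, v)
recalled = read(S, k)

# Writing a new value under the same key overwrites the old one.
v2 = rng.normal(size=D_MEM)
S2 = delta_update(S, k, v2)

# Couple the memory to a (made-up) 32-dimensional hidden state.
d_model = 32
W_down = rng.normal(size=(D_MEM, d_model))
W_up = rng.normal(size=(d_model, D_MEM))
h = rng.normal(size=d_model)
correction = low_rank_correction(S, h, W_down, W_up)
```

With `beta=1.0` and unit-norm keys, a single update stores the value exactly, and a later write to the same key replaces it instead of accumulating. This error-correcting behavior is what distinguishes the delta rule from a plain additive (Hebbian) update.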

On benchmarks, the technique lifts the average score to 1.10× that of the frozen backbone and 1.15× that of the next-best memory baseline. Gains widen on memory-intensive evaluations, reaching 1.31× on MemoryAgentBench and 1.20× on LoCoMo, while general capabilities reportedly remain intact. The headline result is that a very small online state, coupled directly into attention, can deliver meaningful recall improvements at a fraction of the cost of larger context windows.

For agent and long-running assistant use cases, where history accumulates over many turns, the result points to a cheaper path than scaling context length. It also fits a broader trend of bolt-on adapters that augment frozen foundation models rather than retraining them, lowering the barrier to adding persistent-memory behavior to existing deployments.