FIND-20260323-035 · 2026-03-23 · Innovation Watch

EverMind MSA — Memory Sparse Attention for 100M-Token Context Windows

framework | MEDIUM
MSA (Memory Sparse Attention) is a research framework published by EverMind-AI on March 19, 2026 that enables large language models to process up to 100 million tokens in a single context window with near-linear compute complexity. It achieves this through four mechanisms:

1. Content-based sparse attention that dynamically selects the most relevant memory subsets.
2. Document-wise RoPE that decouples per-document position from global memory position to prevent drift.
3. KV cache compression paired with a Memory Parallel inference engine, achieving 100M-token throughput on just two A800 GPUs.
4. A Memory Interleave mechanism for multi-hop reasoning across scattered memory segments.

The model is trained on a Qwen3-4B backbone with 158.95B-token continuous pretraining and reaches 94.84% accuracy at 1M tokens on RULER NIAH tasks, surpassing same-backbone RAG stacks at a fraction of the context overhead. The repository has 1,972 stars four days after launch, no declared dependencies (research artefact: paper and assets only), and is MIT-licensed.
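EverMind has not released code, so MSA's actual selection algorithm is not reproduced here. The following is a minimal NumPy sketch of the general idea behind mechanism (1), content-based block-sparse attention: memory keys are grouped into blocks, each block is scored by the query's similarity to a block summary, and dense attention runs only over the top-k blocks. All names, the mean-key summary, and the scoring scheme are illustrative assumptions, not EverMind's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, keys, values, block_size=4, top_k=2):
    """Content-based block-sparse attention sketch (illustrative, not MSA's algorithm).

    q:      (d,)   single query vector
    keys:   (n, d) memory keys
    values: (n, d) memory values
    Scores each block of keys by similarity between the query and the
    block's mean key, then attends densely over only the top_k blocks,
    so compute scales with top_k * block_size instead of n.
    """
    n, d = keys.shape
    n_blocks = n // block_size
    k_blocks = keys[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    v_blocks = values[: n_blocks * block_size].reshape(n_blocks, block_size, d)

    # Content-based block scoring: query vs. each block's summary key.
    block_summaries = k_blocks.mean(axis=1)          # (n_blocks, d)
    scores = block_summaries @ q                     # (n_blocks,)
    selected = np.argsort(scores)[-top_k:]           # indices of the top_k blocks

    # Dense scaled-dot-product attention restricted to the selected blocks.
    sel_k = k_blocks[selected].reshape(-1, d)        # (top_k * block_size, d)
    sel_v = v_blocks[selected].reshape(-1, d)
    attn = softmax(sel_k @ q / np.sqrt(d))
    return attn @ sel_v
```

With top_k equal to the total number of blocks this reduces to ordinary dense attention; shrinking top_k is what buys the near-linear scaling, at the cost of the block-selection heuristic occasionally missing relevant keys.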

Source

https://github.com/EverMind-AI/MSA

ODS Impact

Direct application to the ADLC pipeline and any ODS service that performs document understanding at scale:

- ADLC orchestrator: it currently loads full spec files and history into context; an MSA-style compressed KV store could let the BA, architect, and security agents reason over entire project histories without context-window truncation.
- ODS DocStore and DocEditor: MSA-style long-context retrieval could replace the planned RAG pipeline for large document corpora (legal contracts, CLM datasets), removing the chunking and embedding infrastructure entirely.
- Workflow Engine: the Memory Interleave multi-hop mechanism is directly relevant to reasoning chains that span many documents.

No code integration is possible today (research artefact only, no installable package), but the paper is worth tracking for when EverMind publishes inference weights or an SDK.

Security Review

License: MIT | Maintenance: ACTIVE | Risk: LOW | Recommendation: USE_WITH_CAUTION

Tags

ai llm long-context attention research rag-alternative adlc docstore