Mamba 3 is anonymously sneaking into ICLR 2026. I plan to review it after TRM.… - @gonzo_ML

Mamba 3 is anonymously sneaking into ICLR 2026. I plan to review it after TRM.

https://openreview.net/forum?id=HwCvaJOiCj

Mamba3 just silently dropped on ICLR 🤯 A faster, longer-context, and more scalable LLM architecture than Transformers.

A few years ago, some researchers started rethinking sequence modeling from a different angle. Instead of stacking more attention layers, they went back to an older idea: state-space models, systems that keep an internal state evolving over time. That became the foundation for Mamba.

The early versions were promising. Mamba-1 used continuous-time dynamics with selective memory updates, so it could remember efficiently without the heavy cost of attention. Mamba-2 went further and showed that state-space updates and attention are two sides of the same math, which made it run much faster on GPUs while keeping similar performance.

Now Mamba-3 feels like the design has finally matured. It refines how the internal state evolves, how it remembers, and how it uses hardware. The main update is the switch from a simple Euler step to trapezoidal integration, which takes into account both the start and the end of each time interval. That small change makes its memory smoother and more stable over long sequences. It also lets the hidden state move in the complex plane, which adds a kind of rhythmic, oscillating memory. Instead of just decaying over time, the model can now represent repeating or periodic patterns, the kind of structure language and music often have. And with a new multi-input, multi-output (MIMO) design, Mamba-3 can process several streams in parallel, making much better use of modern GPUs.

In practice, Mamba-3 opens up a lot of possibilities. Its ability to handle long sequences efficiently makes it a strong fit for tasks like long-document understanding, scientific time series, or genome modeling: areas where Transformers struggle with context limits. Because it runs in linear time and keeps latency stable, it's also well suited to real-time applications like chat assistants, translation, and speech interfaces, where responsiveness matters more than raw scale. And its hardware-friendly design means Mamba-3 could eventually power on-device or edge AI systems, running large models locally without depending on the cloud. It's the kind of architecture that could quietly expand from large-context reasoning on servers to lightweight intelligence on everyday devices.

https://x.com/JundeMorsenWu/status/1977664753011916859?t=xoorer9sscloa78ZjuvcsQ&s=19
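
To make the Euler vs. trapezoidal point above concrete, here is a minimal sketch on a toy scalar system h'(t) = a·h(t) + b·x(t). It uses the textbook forward Euler step and the textbook trapezoidal rule, not the paper's actual parameterization. The function names and the values of `a`, `b`, `dt`, and `steps` are assumptions chosen only for illustration. The coefficient `a` is complex: its real part gives decay and its imaginary part gives oscillation, loosely mirroring the "state in the complex plane" idea.

```python
import cmath


def euler_step(h, x_k, a, b, dt):
    # Forward Euler: uses only the derivative at the start of the interval.
    return h + dt * (a * h + b * x_k)


def trapezoid_step(h, x_k, x_next, a, b, dt):
    # Trapezoidal rule: averages the derivative at the start and the end of
    # the interval. The update is implicit in the new state; for a linear
    # system it can be solved in closed form, which gives this expression.
    return ((1 + a * dt / 2) * h + (dt / 2) * b * (x_k + x_next)) / (1 - a * dt / 2)


def simulate(xs, a, b, dt, method):
    # Run the chosen discretization over the input sequence, starting at h = 0.
    h = 0j
    for k in range(len(xs) - 1):
        if method == "euler":
            h = euler_step(h, xs[k], a, b, dt)
        else:
            h = trapezoid_step(h, xs[k], xs[k + 1], a, b, dt)
    return h


# Hypothetical toy parameters, for illustration only.
a = complex(-0.05, 2.0)   # real part: slow decay; imaginary part: oscillation
b = 1.0
dt = 0.1
steps = 200
xs = [1.0] * (steps + 1)  # constant input signal

# Closed-form solution of the continuous system for constant input, h(0) = 0.
t_end = steps * dt
exact = (cmath.exp(a * t_end) - 1) * b / a

err_euler = abs(simulate(xs, a, b, dt, "euler") - exact)
err_trap = abs(simulate(xs, a, b, dt, "trapezoid") - exact)
print(f"Euler error:       {err_euler:.4f}")
print(f"Trapezoidal error: {err_trap:.4f}")
```

With these toy values the Euler update factor `1 + a*dt` has magnitude above 1, so the oscillating state grows even though the true system decays, while the trapezoidal factor stays below 1 in magnitude. That is the sense in which using both ends of the interval gives a more stable memory over long sequences, at least in this simplified setting.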
