Understanding Mamba
State Space Models
This architecture is still early, and the largest model released so far is only 2.8B parameters.
That said, State Space Models like Mamba compress the context of a prompt into a fixed-size state, which avoids attending over the full history when generating each new token.
This is a fundamental advantage. Very loosely speaking, it's a transformer with compression built in.
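To make the idea concrete, here is a minimal sketch of a plain discrete state space recurrence (not Mamba's selective, input-dependent version): the state has a fixed size no matter how long the prompt is, and each new token is folded into that state rather than attended against the whole history. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run h_t = A @ h_{t-1} + B @ x_t ; y_t = C @ h_t over a sequence."""
    d_state = A.shape[0]
    h = np.zeros(d_state)      # fixed-size state: the "compressed context"
    ys = []
    for x_t in x:              # one step per token; memory does not grow with sequence length
        h = A @ h + B @ x_t    # fold the new token into the state
        ys.append(C @ h)       # read out from the state only, not from past tokens
    return np.stack(ys)

# Toy usage: a 1000-token sequence of 4-dim inputs, summarised by a 16-dim state.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))
A = 0.9 * np.eye(16)               # stable toy dynamics
B = 0.1 * rng.normal(size=(16, 4))
C = rng.normal(size=(1, 16))
y = ssm_scan(x, A, B, C)
print(y.shape)                     # (1000, 1)
```

Mamba additionally makes A, B, and C functions of the input (the "selective" part) and uses a parallel scan for training, but the fixed-size state carrying the context is the same idea.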
Cheers, Ronan
Links:

