Understanding Mamba
State Space Models
This architecture is still early, and the largest model released so far is only 2.8B parameters.
That said, State Space Models like Mamba compress the context of a prompt into a fixed-size state, which avoids attending over the full history when generating each new token.
This is a fundamental advantage. Very loosely speaking, it's a transformer with compression built in.
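To make the idea concrete, here is a minimal sketch of a plain discrete state space recurrence (not Mamba's selective, input-dependent version): the state has a fixed size no matter how long the prompt is, and each new token is folded into that state rather than attended against the whole history. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run h_t = A @ h_{t-1} + B @ x_t ; y_t = C @ h_t over a sequence."""
    d_state = A.shape[0]
    h = np.zeros(d_state)      # fixed-size state: the "compressed context"
    ys = []
    for x_t in x:              # one step per token; memory does not grow with sequence length
        h = A @ h + B @ x_t    # fold the new token into the state
        ys.append(C @ h)       # read out from the state only, not from past tokens
    return np.stack(ys)

# Toy usage: a 1000-token sequence of 4-dim inputs, summarised by a 16-dim state.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))
A = 0.9 * np.eye(16)               # stable toy dynamics
B = 0.1 * rng.normal(size=(16, 4))
C = rng.normal(size=(1, 16))
y = ssm_scan(x, A, B, C)
print(y.shape)                     # (1000, 1)
```

Mamba additionally makes A, B, and C functions of the input (the "selective" part) and uses a parallel scan for training, but the fixed-size state carrying the context is the same idea.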
Cheers, Ronan
Links:

