Understanding Input Selectivity in Mamba

State-Space Models (SSMs), and particularly Mamba, have recently emerged as a promising alternative to Transformers.
Mamba introduces input selectivity to its SSM layer (S6) and
incorporates convolution and gating into its block definition.
While these modifications do improve Mamba’s performance over its SSM predecessors, it remains largely unclear how Mamba leverages the additional functionalities provided by input selectivity, and how these interact with the other operations in the Mamba architecture.
In this work, we demystify the role of input selectivity in Mamba, investigating its…
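For concreteness, here is a minimal NumPy sketch of the selective (S6-style) recurrence described above, following the standard formulation from the Mamba paper: the step size delta_t and the matrices B_t and C_t are computed from the current input x_t, whereas in earlier SSMs such as S4 they are input-independent. The weight names (W_delta, W_B, W_C), shapes, and the diagonal parameterization of A are illustrative assumptions for this sketch, not Mamba's actual implementation.

```python
# Minimal NumPy sketch of an S6-style selective recurrence. Assumed names and
# shapes (W_delta, W_B, W_C, diagonal A) are illustrative, not Mamba's API.
import numpy as np

def selective_scan(x, A, W_delta, W_B, W_C):
    """x: (T, d) inputs; A: (d, n) diagonal state matrix (one row per channel).

    Unlike S4, the step size delta_t and the matrices B_t, C_t are functions
    of the current input x_t -- this is what "input selectivity" means.
    Returns y: (T, d).
    """
    T, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                           # one n-dim state per channel
    y = np.empty((T, d))
    for t in range(T):
        delta = np.logaddexp(0.0, x[t] @ W_delta)  # softplus: input-dependent step
        B_t = x[t] @ W_B                           # (n,) input-dependent input map
        C_t = x[t] @ W_C                           # (n,) input-dependent readout
        A_bar = np.exp(delta[:, None] * A)         # ZOH discretization of A
        B_bar = delta[:, None] * B_t[None, :]      # Euler discretization of B
        h = A_bar * h + B_bar * x[t][:, None]      # selective state update
        y[t] = h @ C_t                             # project state back to output
    return y

# Toy usage: A is kept negative so the state decays stably.
rng = np.random.default_rng(0)
T, d, n = 16, 4, 8
x = rng.standard_normal((T, d))
A = -np.exp(rng.standard_normal((d, n)))
W_delta, W_B, W_C = (0.1 * rng.standard_normal(s) for s in [(d, d), (d, n), (d, n)])
y = selective_scan(x, A, W_delta, W_B, W_C)
print(y.shape)  # (16, 4)
```

One consequence worth noting: because A_bar and B_bar now depend on x_t, the recurrence can no longer be evaluated as a single global convolution the way S4's fixed-parameter recurrence can, which is why Mamba computes it with a hardware-aware parallel scan instead.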