LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement
Paper: https://arxiv.org/abs/2504.16053
LongMamba is a training-free technique that significantly enhances the long-context capabilities of Mamba models. LongMamba builds on our discovery that the hidden channels in Mamba can be categorized into local and global channels based on their receptive field lengths, with global channels primarily responsible for long-context capability. These global channels become the key bottleneck as the input context lengthens: when the input length far exceeds the training sequence length, global channels fail to adaptively extend their receptive fields, leading to Mamba's poor long-context performance. The key idea of LongMamba is to mitigate hidden state memory decay in these global channels by preventing the accumulation of unimportant tokens in their memory. This is achieved by first identifying critical tokens in the global channels and then applying token filtering so that only those critical tokens are accumulated.
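For intuition, here is a minimal sketch of the delta-threshold token-filtering idea on a single channel of a Mamba-style selective scan: tokens whose step size falls below a cutoff are skipped, so they neither decay nor dilute the channel's accumulated memory. The function name, shapes, recurrence, and threshold choice below are simplifying assumptions for illustration only, not the implementation in this repository.

```python
# Illustrative sketch of delta_t-threshold token filtering for a single
# "global" channel of a Mamba-style selective scan. NOT the repository code;
# shapes and the recurrence are simplified assumptions.
import torch

def filtered_scan(delta, A, Bx, threshold):
    """Run a 1-D selective scan, letting only "critical" tokens
    (those with delta above `threshold`) update the hidden state.

    delta:     (seq_len,) per-token step sizes for this channel
    A:         scalar (negative) state-transition parameter
    Bx:        (seq_len,) per-token input contributions B_t * x_t
    threshold: scalar cutoff separating critical from unimportant tokens
    """
    h = torch.zeros(())                  # hidden state of this channel
    keep = delta > threshold             # token-filtering mask
    states = []
    for t in range(delta.shape[0]):
        if keep[t]:
            # Critical token: apply the usual discretized update
            # h_t = exp(delta_t * A) * h_{t-1} + delta_t * (B_t x_t)
            h = torch.exp(delta[t] * A) * h + delta[t] * Bx[t]
        # Unimportant token: skip the update entirely, so it neither
        # decays the accumulated memory nor adds noise to it.
        states.append(h)
    return torch.stack(states)

# Toy usage: a long input where only a few tokens exceed the threshold.
delta = torch.rand(16)
out = filtered_scan(delta, A=torch.tensor(-1.0), Bx=torch.randn(16), threshold=0.8)
```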
```bash
bash ./build_env.sh
```
This script will create a dedicated Python environment and install all required packages.
Place the following datasets under the `./artifacts/` directory:

- Align targets: download from the Google Drive link, unzip, and place under `./artifacts/{base_model}-{align_folder}/` (e.g. `./artifacts/mamba2-1.3b-longmamba/delta_t-thre/`).
- PG19 test sequences: download from the Google Drive link, unzip, and place under `./artifacts/ppl_test/` (e.g. `./artifacts/ppl_test/pg19/`).
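Before launching an evaluation, you can optionally confirm the layout with a small check like the one below (a convenience sketch, not part of the repository; the paths mirror the examples above and should be adjusted if you use a different base model or align folder):

```python
# Optional sanity check: verify the expected artifact directories exist.
from pathlib import Path

expected = [
    Path("./artifacts/mamba2-1.3b-longmamba/delta_t-thre"),  # align targets
    Path("./artifacts/ppl_test/pg19"),                       # PG19 test sequences
]
for path in expected:
    status = "ok" if path.is_dir() else "MISSING"
    print(f"{status:7s} {path}")
```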
| Flag | Description | Default |
|---|---|---|
| `-lt`, `--long_eval_task` | Run LongBench evaluation (use `e` to run LongBench-E) | `no` |
| `-dt`, `--deci_task` | Run PG19 evaluation (use `pg19` to run the PG19 task) | `no` |
| `-ppl`, `--perplexity` | Compute perplexity on a custom `.txt` dataset | Disabled |
| `--model` | Hugging Face model path or local checkpoint | `state-spaces/mamba2-1.3b` |
| `--model_arch` | Model architecture: `vanilla` or `ours` | `ours` |
| `--align_path` | Name of the align folder without the base-model prefix (use `longmamba`) | `longmamba` |
| `--our_method` | Token filtering method (use `dt_thre`) | `dt_thre` |
| `--sample_path` | Path to a custom `.txt` file for perplexity calculation | `subseq_lambada.txt` |
| `-d`, `--device` | CUDA device index | `0` |
LongBench-E evaluation:

```bash
# vanilla
CUDA_VISIBLE_DEVICES=0 python my_evaluation.py \
--model state-spaces/mamba2-1.3b \
--model_arch vanilla \
-lt e

# ours
CUDA_VISIBLE_DEVICES=0 python my_evaluation.py \
--model state-spaces/mamba2-1.3b \
--model_arch ours \
--align_path longmamba-mamba2-1.3b \
--our_method dt_thre \
-lt e
```
PG19 evaluation:

```bash
# vanilla
CUDA_VISIBLE_DEVICES=0 python my_evaluation.py \
--model state-spaces/mamba2-1.3b \
--model_arch vanilla \
-dt pg19

# ours
CUDA_VISIBLE_DEVICES=0 python my_evaluation.py \
--model state-spaces/mamba2-1.3b \
--model_arch ours \
--align_path longmamba-mamba2-1.3b \
--our_method dt_thre \
-dt pg19
```
Perplexity on a custom `.txt` file:

```bash
# vanilla
CUDA_VISIBLE_DEVICES=0 python my_evaluation.py \
--model state-spaces/mamba2-1.3b \
--model_arch vanilla \
-ppl \
--sample_path subseq_lambada.txt

# ours
CUDA_VISIBLE_DEVICES=0 python my_evaluation.py \
--model state-spaces/mamba2-1.3b \
--model_arch ours \
--align_path longmamba-mamba2-1.3b \
--our_method dt_thre \
-ppl \
--sample_path subseq_lambada.txt
```
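For reference, the number reported by `-ppl` is standard perplexity, i.e. the exponential of the mean per-token negative log-likelihood. The snippet below is a self-contained illustration of that definition, not the evaluation code in `my_evaluation.py`:

```python
# Illustration only: perplexity from per-token log-probabilities of any
# causal language model. Not the repository's evaluation pipeline.
import math

def perplexity(token_log_probs):
    """token_log_probs: list of log p(token_t | tokens_<t) in nats."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Toy example: three tokens, each predicted with probability 0.25.
print(perplexity([math.log(0.25)] * 3))  # ~4.0
```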
If you find our work valuable, please consider citing our paper:
```bibtex
@inproceedings{ye2025longmamba,
  title={LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement},
  author={Zhifan Ye and Kejing Xia and Yonggan Fu and Xin Dong and Jihoon Hong and Xiangchi Yuan and Shizhe Diao and Jan Kautz and Pavlo Molchanov and Yingyan Celine Lin},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=fMbLszVO1H}
}
```