This is an official PyTorch implementation of MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models.
- Inference code and pretrained weights
- Training scripts
We recommend Python 3.9.21 and CUDA 12.2. Then build the environment as follows:
git clone https://github.com/kkakkkka/MambaTalk -b main
# [Optional] Create a virtual env
conda create -n mambatalk python==3.9.21
conda activate mambatalk
# Install ffmpeg for media processing and libstdcxx-ng for rendering
conda install -c conda-forge libstdcxx-ng ffmpeg
# Install with pip:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install 'git+https://github.com/facebookresearch/pytorch3d.git@stable'
pip install https://github.com/Dao-AILab/causal-conv1d/releases/download/v1.4.0/causal_conv1d-1.4.0+cu122torch2.1cxx11abiTRUE-cp39-cp39-linux_x86_64.whl
pip install https://github.com/state-spaces/mamba/releases/download/v2.2.4/mamba_ssm-2.2.4+cu11torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
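To sanity-check the environment, a minimal sketch (an assumed helper, not part of the repo) that verifies the compiled packages import and CUDA is visible:
# check_env.py -- quick environment sanity check (hypothetical helper)
import torch
import causal_conv1d  # compiled causal conv kernels used by Mamba
import mamba_ssm      # selective state space model package

print("torch:", torch.__version__, "cuda available:", torch.cuda.is_available())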
You may run the following commands to download the weights from Hugging Face into ./pretrained/:
pip3 install "huggingface_hub[cli]"
huggingface-cli download --resume-download kkakkkka/MambaTalk --local-dir pretrained
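Alternatively, a minimal Python sketch using the huggingface_hub API, equivalent to the CLI command above (the script name is hypothetical):
# download_weights.py -- hypothetical helper mirroring the CLI download
from huggingface_hub import snapshot_download

snapshot_download(repo_id="kkakkkka/MambaTalk", local_dir="pretrained")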
These weights should be organized as follows:
./pretrained/
|-- pretrained_vq
| |-- face.bin
| |-- foot.bin
| |-- hands.bin
| |-- lower_foot.bin
| |-- upper.bin
|-- smplx_models
| |-- smplx/SMPLX_NEUTRAL_2020.npz
|-- test_sequences
|-- mambatalk_100.bin
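A small sketch to verify that the expected files are in place (the file list is taken from the tree above; the script itself is an assumed helper):
# check_pretrained.py -- hypothetical check of the weight layout above
from pathlib import Path

expected = [
    "pretrained_vq/face.bin",
    "pretrained_vq/foot.bin",
    "pretrained_vq/hands.bin",
    "pretrained_vq/lower_foot.bin",
    "pretrained_vq/upper.bin",
    "smplx_models/smplx/SMPLX_NEUTRAL_2020.npz",
    "mambatalk_100.bin",
]
root = Path("pretrained")
missing = [p for p in expected if not (root / p).exists()]
print("all weights found" if not missing else f"missing: {missing}")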
Download the unzipped version of BEAT2 from Hugging Face into <your root>:
git lfs install
git clone https://huggingface.co/datasets/H-Liu1997/BEAT2
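If git-lfs is inconvenient, a hedged alternative is the huggingface_hub API; this sketch downloads the same dataset repo (place the target directory under your chosen root):
# download_beat2.py -- hypothetical alternative to the git-lfs clone above
from huggingface_hub import snapshot_download

# downloads into BEAT2/ under the current directory; adjust local_dir to your root
snapshot_download(repo_id="H-Liu1997/BEAT2", repo_type="dataset", local_dir="BEAT2")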
After you have downloaded the BEAT2 dataset, run:
bash run_scripts/test.sh
For your own data, you should organize it as follows:
.
├── smplxflame_30
│ ├── 2_scott_0_1_1.npz
│ ├── 2_scott_0_2_2.npz
├── test.csv
├── textgrid
│ ├── 2_scott_0_1_1.TextGrid
│ ├── 2_scott_0_2_2.TextGrid
├── wave16k
│ ├── 2_scott_0_1_1.wav
│ ├── 2_scott_0_2_2.wav
In test.csv, please list your files as shown below:
id,type
2_scott_0_1_1,test
2_scott_0_2_2,test
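A minimal sketch that generates test.csv from the motion files, assuming your data follows the layout above (the script name is hypothetical):
# make_test_csv.py -- hypothetical helper to build test.csv from the layout above
import csv
from pathlib import Path

root = Path(".")  # your data root
ids = sorted(p.stem for p in (root / "smplxflame_30").glob("*.npz"))
with open(root / "test.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "type"])
    for clip_id in ids:
        writer.writerow([clip_id, "test"])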
If you want to generate corresponding TextGrid files from your speech recordings, we recommend installing Montreal Forced Aligner (MFA). These aligned text files should then be used as input alongside your audio files.
pip install git+https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner
conda install -c conda-forge kalpy
pip install pgvector
pip install Bio
mfa model download acoustic english_us_arpa
mfa model download dictionary english_us_arpa
# Please put your speech recordings under ./data first
mfa align ./data english_us_arpa english_us_arpa ./data/result
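After copying the aligned TextGrid files into textgrid/, a small sketch (an assumed helper, following the data layout above) to confirm every recording received an alignment:
# check_alignment.py -- hypothetical check that every wav has a matching TextGrid
from pathlib import Path

wavs = {p.stem for p in Path("wave16k").glob("*.wav")}
grids = {p.stem for p in Path("textgrid").glob("*.TextGrid")}
missing = sorted(wavs - grids)
print("all aligned" if not missing else f"missing TextGrids for: {missing}")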
With the generated result files (.npz), you can visualize them using the command below:
npy_path="./res_2_scott_0_1_1.npz"
wav_path="./BEAT2/beat_english_v2.0.0/wave16k/2_scott_0_1_1.wav"
save_dir="outputs/render"
xvfb-run -a python render.py --npy_path $npy_path --wav_path $wav_path --save_dir $save_dir
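To inspect a generated result before rendering, a minimal sketch that lists whatever arrays the file contains (it makes no assumption about specific key names; the script itself is hypothetical):
# inspect_result.py -- hypothetical peek at the generated motion file
import numpy as np

data = np.load("./res_2_scott_0_1_1.npz", allow_pickle=True)
for key in data.files:
    arr = data[key]
    print(key, arr.shape, arr.dtype)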
To train MambaTalk, run:
bash run_scripts/train.sh
To train the individual VQ-VAE components, run:
python train.py --config configs/cnn_vqvae_face_30.yaml
python train.py --config configs/cnn_vqvae_hands_30.yaml
python train.py --config configs/cnn_vqvae_lower_30.yaml
python train.py --config configs/cnn_vqvae_lower_foot_30.yaml
python train.py --config configs/cnn_vqvae_upper_30.yaml
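A minimal sketch that runs the five VQ-VAE trainings above in sequence (the config paths are the ones listed; the wrapper script itself is hypothetical):
# train_all_vqvae.py -- hypothetical wrapper around the commands above
import subprocess

configs = [
    "configs/cnn_vqvae_face_30.yaml",
    "configs/cnn_vqvae_hands_30.yaml",
    "configs/cnn_vqvae_lower_30.yaml",
    "configs/cnn_vqvae_lower_foot_30.yaml",
    "configs/cnn_vqvae_upper_30.yaml",
]
for cfg in configs:
    subprocess.run(["python", "train.py", "--config", cfg], check=True)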
The code is based on EMAGE. We thank the authors for their open-source code and encourage users to cite their work when applicable.
If MambaTalk is useful for your research, please consider citing:
@article{xu2024mambatalk,
  title={MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models},
  author={Xu, Zunnan and Lin, Yukang and Han, Haonan and Yang, Sicheng and Li, Ronghui and Zhang, Yachao and Li, Xiu},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={20055--20080},
  year={2024}
}