- 📝: The open-access paper related to this project is published in Speech Communication and is also available on arXiv.
- 🤗: The dataset used in this project is hosted on Hugging Face. You can access it here (a minimal loading sketch follows this list).
- 🌐: For more information about the project, visit our project page.
- 🏆: Explore Leaderboards on Papers With Code.
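
To explore the dataset directly, here is a minimal loading sketch with the Hugging Face `datasets` library; the `speech_clean` subset and `train` split names are assumptions taken from the recipes below, so check the dataset card for the authoritative list:

```python
# A minimal sketch of streaming Vibravox with `datasets`. The "speech_clean"
# subset and "train" split names are assumptions based on this README's
# recipes; see the dataset card for the full list.
from datasets import load_dataset

dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", streaming=True)
sample = next(iter(dataset["train"]))
print(sample.keys())
```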
```bash
pyenv update
pyenv install 3.12.0
pyenv virtualenv 3.12.0 vibravox-env
pyenv local vibravox-env
pip install vibravox
```
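
As a quick sanity check that the package resolved inside the new virtualenv (a minimal sketch, nothing project-specific):

```python
# Confirm the package imports from the vibravox-env environment.
import vibravox
print(vibravox.__file__)
```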
- 🟣: `headset_microphone` (not available for Bandwidth Extension, as it is the reference microphone)
- 🟡: `throat_microphone`
- 🟢: `forehead_accelerometer`
- 🔵: `rigid_in_ear_microphone`
- 🔴: `soft_in_ear_microphone`
- 🧊: `temple_vibration_pickup`
(non-exhaustive list of models, just examples)

- EBEN or Mimi for Bandwidth Extension
  - Train and test on `speech_clean`, for recordings in a quiet environment:
    - For EBEN:
      ```bash
      python run.py \
          lightning_datamodule=bwe \
          lightning_datamodule.sensor=throat_microphone \
          lightning_module=eben \
          lightning_module.generator.p=2 \
          +callbacks=[bwe_checkpoint] \
          ++trainer.check_val_every_n_epoch=15 \
          ++trainer.max_epochs=500
      ```
    - For Mimi (see the codec sketch after this list):

      ```bash
      python run.py \
          sample_rate=24000 \
          lightning_datamodule=bwe \
          lightning_datamodule.sensor=throat_microphone \
          lightning_datamodule.batch_size=16 \
          lightning_module=regressive_mimi \
          lightning_module.optimizer.lr=1e-4 \
          +callbacks=[bwe_checkpoint]
      ```
  - Train on `speech_clean` mixed with `speechless_noisy` and test on `speech_noisy`, for recordings in a noisy environment:
    - For EBEN, with weights initialized from vibravox_EBEN_models (a loading sketch follows this list):
      ```bash
      python run.py \
          lightning_datamodule=noisybwe \
          lightning_datamodule.sensor=throat_microphone \
          lightning_module=eben \
          lightning_module.description=from_pretrained-throat_microphone \
          ++lightning_module.generator=dummy \
          ++lightning_module.generator._target_=vibravox.torch_modules.dnn.eben_generator.EBENGenerator.from_pretrained \
          ++lightning_module.generator.pretrained_model_name_or_path=Cnam-LMSSC/EBEN_throat_microphone \
          ++lightning_module.discriminator=dummy \
          ++lightning_module.discriminator._target_=vibravox.torch_modules.dnn.eben_discriminator.DiscriminatorEBENMultiScales.from_pretrained \
          ++lightning_module.discriminator.pretrained_model_name_or_path=Cnam-LMSSC/DiscriminatorEBENMultiScales_throat_microphone \
          +callbacks=[bwe_checkpoint] \
          ++callbacks.checkpoint.monitor=validation/torchmetrics_stoi/synthetic \
          ++trainer.check_val_every_n_epoch=15 \
          ++trainer.max_epochs=200
      ```
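
The noisy-environment recipe above initializes EBEN from published checkpoints via the `from_pretrained` overrides. As a minimal sketch of what those overrides resolve to, the generator can also be loaded directly in Python; only the import path and checkpoint id below are taken from the command, the rest is illustrative:

```python
# A minimal sketch of loading the pretrained EBEN generator referenced in the
# command above. The import path and checkpoint id are copied verbatim from
# the Hydra overrides; everything else is illustrative.
from vibravox.torch_modules.dnn.eben_generator import EBENGenerator

generator = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_throat_microphone")
generator.eval()
print(sum(p.numel() for p in generator.parameters()), "parameters")
```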
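
The Mimi recipe sets `sample_rate=24000` because the Mimi neural codec operates on 24 kHz audio. Below is a minimal round-trip sketch of the codec itself, assuming a recent `transformers` release with Mimi support and the `kyutai/mimi` checkpoint; the repository's `regressive_mimi` module may wrap Mimi differently:

```python
# A minimal sketch of a Mimi encode/decode round trip with `transformers`.
# The "kyutai/mimi" checkpoint id is an assumption, and the 1-second input
# (batch, channels, samples) at 24 kHz is illustrative.
import torch
from transformers import MimiModel

codec = MimiModel.from_pretrained("kyutai/mimi")
codec.eval()

audio = torch.randn(1, 1, 24000)  # hypothetical 1 s of 24 kHz audio
with torch.no_grad():
    codes = codec.encode(audio).audio_codes        # discrete token grid
    reconstruction = codec.decode(codes).audio_values
print(codes.shape, reconstruction.shape)
```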
- wav2vec2 for Speech to Phoneme
  - Train and test on `speech_clean`, for recordings in a quiet environment (weights initialized from facebook/wav2vec2-base-fr-voxpopuli):
    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=wav2vec2_for_stp \
        lightning_module.optimizer.lr=1e-5 \
        ++trainer.max_epochs=10
    ```
  - Train and test on `speech_noisy`, for recordings in a noisy environment (weights initialized from vibravox_phonemizers):
    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_datamodule.subset=speech_noisy \
        lightning_datamodule/data_augmentation=aggressive \
        lightning_module=wav2vec2_for_stp \
        lightning_module.wav2vec2_for_ctc.pretrained_model_name_or_path=Cnam-LMSSC/phonemizer_throat_microphone \
        lightning_module.optimizer.lr=1e-6 \
        ++trainer.max_epochs=30
    ```
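
The override above points `wav2vec2_for_ctc` at the published phonemizer checkpoint, which can also be used directly for inference. A minimal sketch, assuming the checkpoint is a standard wav2vec2 CTC model with a bundled processor; the silent 16 kHz input is illustrative:

```python
# A minimal sketch of phoneme transcription with the published checkpoint
# named in the command above. Assumes a standard wav2vec2 CTC layout with
# a bundled processor; input is a hypothetical 1 s of 16 kHz silence.
import numpy as np
import torch
from transformers import AutoModelForCTC, AutoProcessor

checkpoint = "Cnam-LMSSC/phonemizer_throat_microphone"
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForCTC.from_pretrained(checkpoint)
model.eval()

waveform = np.zeros(16000, dtype=np.float32)  # hypothetical 1 s of silence
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(logits.argmax(dim=-1)))
```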
- ECAPA2 for Speaker Verification
  - Test the model on `speech_clean`:
    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```
  - Test on `speech_clean` mixed with `speechless_noisy` (representative of `speech_noisy`), with the exact same pairs that were used on `speech_clean`, allowing direct comparison of results:
    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_datamodule.dataset_name=Cnam-LMSSC/vibravox_mixed_for_spkv \
        lightning_datamodule.subset=speech_noisy_mixed \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```
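
For context on what these test runs evaluate: speaker verification scores a trial pair of utterances by comparing their embeddings. The sketch below shows the generic scoring step only, not the repository's exact evaluation code; the embedding size and decision threshold are hypothetical:

```python
# A generic sketch of the speaker-verification scoring step: two utterance
# embeddings (e.g. produced by ECAPA2) are compared with cosine similarity
# and thresholded. Embedding size and threshold are hypothetical.
import torch
import torch.nn.functional as F

def score_trial(emb_a: torch.Tensor, emb_b: torch.Tensor) -> float:
    """Cosine similarity between two speaker embeddings."""
    return F.cosine_similarity(emb_a, emb_b, dim=0).item()

emb_a, emb_b = torch.randn(192), torch.randn(192)  # hypothetical embeddings
print("same speaker?", score_trial(emb_a, emb_b) > 0.5)
```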
If you use the code in this repository or the Vibravox dataset (either the curated or non-curated version) for research, please cite this paper:
```bibtex
@article{hauret2025vibravox,
  title={{Vibravox: A dataset of french speech captured with body-conduction audio sensors}},
  author={Hauret, Julien and Olivier, Malo and Joubaud, Thomas and Langrenne, Christophe and
          Poir{\'e}e, Sarah and Zimpfer, V{\'e}ronique and Bavu, {\'E}ric},
  journal={Speech Communication},
  pages={103238},
  year={2025},
  publisher={Elsevier}
}
```
and this Hugging Face repository, which is linked to a DOI:
```bibtex
@misc{cnamlmssc2024vibravoxdataset,
  author={Hauret, Julien and Olivier, Malo and Langrenne, Christophe and
          Poir{\'e}e, Sarah and Bavu, {\'E}ric},
  title={{Vibravox} (Revision 7990b7d)},
  year={2024},
  url={https://huggingface.co/datasets/Cnam-LMSSC/vibravox},
  doi={10.57967/hf/2727},
  publisher={Hugging Face}
}
```