jhauret/vibravox


Speech to Phoneme, Bandwidth Extension and Speaker Verification using the Vibravox dataset.

Resources:

  • 📝: The open-access paper related to this project is published in Speech Communication and is also available on arXiv.
  • 🤗: The dataset used in this project is hosted by Hugging Face. You can access it here.
  • 🌐: For more information about the project, visit our project page.
  • 🏆: Explore Leaderboards on Papers With Code.

Setup

Create your environment (an example using pyenv is given here)

pyenv update
pyenv install 3.12.0
pyenv virtualenv 3.12.0 vibravox-env
pyenv local vibravox-env

Install vibravox

pip install vibravox

Available sensors

  • 🟣:headset_microphone (not available for Bandwidth Extension, as it is the reference microphone)
  • 🟡:throat_microphone
  • 🟢:forehead_accelerometer
  • 🔵:rigid_in_ear_microphone
  • 🔴:soft_in_ear_microphone
  • 🧊:temple_vibration_pickup
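
These sensor identifiers are the values passed to the `lightning_datamodule.sensor` override in the commands below. As a minimal sketch (the helper function is hypothetical, not part of the vibravox codebase), a sensor choice can be validated before building the override:

```python
# Sensor identifiers from the list above, usable as values for the
# lightning_datamodule.sensor Hydra override.
SENSORS = [
    "headset_microphone",       # reference mic, excluded from Bandwidth Extension
    "throat_microphone",
    "forehead_accelerometer",
    "rigid_in_ear_microphone",
    "soft_in_ear_microphone",
    "temple_vibration_pickup",
]

def sensor_override(sensor: str) -> str:
    """Build the Hydra override string for a sensor, validating the name first."""
    if sensor not in SENSORS:
        raise ValueError(f"Unknown sensor: {sensor!r}. Choose one of {SENSORS}")
    return f"lightning_datamodule.sensor={sensor}"

print(sensor_override("throat_microphone"))
```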

Run some models

(non-exhaustive list of models; these are just examples)

  • EBEN or Mimi for Bandwidth Extension

    • Train and test on speech_clean, for recordings in a quiet environment:
      • For EBEN:
      python run.py \
        lightning_datamodule=bwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.generator.p=2 \
        +callbacks=[bwe_checkpoint] \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=500
      
      • For Mimi:
      python run.py \
        sample_rate=24000 \
        lightning_datamodule=bwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_datamodule.batch_size=16 \
        lightning_module=regressive_mimi \
        lightning_module.optimizer.lr=1e-4 \
        +callbacks=[bwe_checkpoint]
      
    • Train on speech_clean mixed with speechless_noisy and test on speech_noisy, for recordings in a noisy environment:
      python run.py \
        lightning_datamodule=noisybwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.description=from_pretrained-throat_microphone \
        ++lightning_module.generator=dummy \
        ++lightning_module.generator._target_=vibravox.torch_modules.dnn.eben_generator.EBENGenerator.from_pretrained \
        ++lightning_module.generator.pretrained_model_name_or_path=Cnam-LMSSC/EBEN_throat_microphone \
        ++lightning_module.discriminator=dummy \
        ++lightning_module.discriminator._target_=vibravox.torch_modules.dnn.eben_discriminator.DiscriminatorEBENMultiScales.from_pretrained \
        ++lightning_module.discriminator.pretrained_model_name_or_path=Cnam-LMSSC/DiscriminatorEBENMultiScales_throat_microphone \
        +callbacks=[bwe_checkpoint] \
        ++callbacks.checkpoint.monitor=validation/torchmetrics_stoi/synthetic \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=200
      
  • wav2vec2 for Speech to Phoneme

    python run.py \
      lightning_datamodule=stp \
      lightning_datamodule.sensor=throat_microphone \
      lightning_module=wav2vec2_for_stp \
      lightning_module.optimizer.lr=1e-5 \
      ++trainer.max_epochs=10
    
    • Train and test on speech_noisy, for recordings in a noisy environment (weights initialized from vibravox_phonemizers):
    python run.py \
      lightning_datamodule=stp \
      lightning_datamodule.sensor=throat_microphone \
      lightning_datamodule.subset=speech_noisy \
      lightning_datamodule/data_augmentation=aggressive \
      lightning_module=wav2vec2_for_stp \
      lightning_module.wav2vec2_for_ctc.pretrained_model_name_or_path=Cnam-LMSSC/phonemizer_throat_microphone \
      lightning_module.optimizer.lr=1e-6 \
      ++trainer.max_epochs=30
    
  • ECAPA2 for Speaker Verification

    • Test the model on speech_clean:
    python run.py \
      lightning_datamodule=spkv \
      lightning_module=ecapa2 \
      logging=csv \
      ++trainer.limit_train_batches=0 \
      ++trainer.limit_val_batches=0
    
    • Test on speech_clean mixed with speechless_noisy (representative of speech_noisy), using the exact same pairs as the speech_clean test to allow direct comparison of results:
    python run.py \
      lightning_datamodule=spkv \
      lightning_datamodule.dataset_name=Cnam-LMSSC/vibravox_mixed_for_spkv \
      lightning_datamodule.subset=speech_noisy_mixed \
      lightning_module=ecapa2 \
      logging=csv \
      ++trainer.limit_train_batches=0 \
      ++trainer.limit_val_batches=0
    

Cite our work

If you use code in this repository or the Vibravox dataset (either curated or non-curated versions) for research, please cite this paper:

@article{hauret2025vibravox,
      title={{Vibravox: A dataset of french speech captured with body-conduction audio sensors}},
      author={Hauret, Julien and Olivier, Malo and Joubaud, Thomas and Langrenne, Christophe and
        Poir{\'e}e, Sarah and Zimpfer, V{\'e}ronique and Bavu, {\'E}ric},
      journal={Speech Communication},
      pages={103238},
      year={2025},
      publisher={Elsevier}
}

and this HuggingFace repository, which is linked to a DOI:

@misc{cnamlmssc2024vibravoxdataset,
    author={Hauret, Julien and Olivier, Malo and Langrenne, Christophe and
        Poir{\'e}e, Sarah and Bavu, {\'E}ric},
    title        = { {Vibravox} (Revision 7990b7d) },
    year         = 2024,
    url          = { https://huggingface.co/datasets/Cnam-LMSSC/vibravox },
    doi          = { 10.57967/hf/2727 },
    publisher    = { Hugging Face }
}