Motivational Speech Synthesis

Text-to-motivational-speech with adjustable motivational factor to control motivational prosody.

Preliminary Paper | Project Page | Colab Demo


Motivational speech has emerged as a popular audiovisual phenomenon within Western subcultures, conveying optimal strategies and principles for success through expressive, high-energy delivery. The present paper artistically explores methods for synthesizing the distinctive prosodic patterns inherent to motivational speech, while critically examining its sociocultural foundations. Drawing on recent advances in emotion-controllable text-to-speech (TTS) systems and speech emotion recognition (SER), we employ deep learning models and frameworks to replicate and analyze this genre of speech. Within our proposed architecture, we introduce a one-dimensional motivational factor derived from high-dimensional emotional speech representations, enabling the control of motivational prosody according to intensity. Situated within broader discourses on self-optimization and meritocracy, Motivational Speech Synthesis contributes to the field of emotional speech synthesis, while also prompting reflection on the societal values embedded in such mediated narratives.

Cloning

Use the --recurse-submodules flag to also clone the submodules:

git clone --recurse-submodules git@github.com:MotivationalSpeechSynthesis/motivational-speech-synthesis.git

If cloning over HTTPS rather than SSH, clone without submodules first, then point the submodule at its HTTPS URL:

git clone https://github.com/MotivationalSpeechSynthesis/motivational-speech-synthesis.git
cd motivational-speech-synthesis
git config submodule.emoknob.url https://github.com/tonychenxyz/emoknob.git
git submodule update --init --recursive

Requirements

  • Linux OS recommended (Windows support expected but not tested, macOS currently unsupported)

Installation and Running

Note: Each standalone script execution recompiles the model. For repeated experiments and faster iteration, use the provided Jupyter notebook.

Using uv

Run script

uv run motivationalTTS.py "Every journey begins with a single step."

Create a virtual environment for jupyter-notebook:

uv venv

Start jupyter-notebook

uv run jupyter-notebook

Using pip

python -m venv env
source env/bin/activate
pip install -r requirements.txt

Run script

python motivationalTTS.py "Every journey begins with a single step."

Start jupyter-notebook

jupyter-notebook

Optional Parameters

You can customize the synthesis with the following optional arguments:

uv run motivationalTTS.py "Every journey begins with a single step." \
    --motivational-factor 0.8 \
    --seed 42 \
    --intermediate-dir "./output_audio" \
    --output-name "my_audio.wav" \
    --device "cuda:0" \
    --dtype "float16" \
    --debug \
    --average-speaker-emb-dir "average-speaker-embeddings/average-speaker-embeddings_400"
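
To compare several intensities of the motivational factor, the command above can be generated programmatically. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `sweep_commands`, the output file naming scheme, and the factor values are illustrative assumptions. The valid range of `--motivational-factor` is not stated here, so check the script before choosing values. Only flags shown in the example above are used.

```python
import shlex


def build_command(text, factor, seed=42, output_name=None):
    """Build the argument list for one motivationalTTS.py run.

    Hypothetical helper: it only assembles flags that appear in the
    README example and does not execute anything.
    """
    if output_name is None:
        # Assumed naming scheme, chosen so each factor gets its own file.
        output_name = f"motivational_{factor:.2f}.wav"
    return [
        "uv", "run", "motivationalTTS.py", text,
        "--motivational-factor", str(factor),
        "--seed", str(seed),
        "--output-name", output_name,
    ]


def sweep_commands(text, factors, seed=42):
    """Return one command per motivational factor, sharing the same seed."""
    return [build_command(text, f, seed=seed) for f in factors]


if __name__ == "__main__":
    sentence = "Every journey begins with a single step."
    # Illustrative factor values; the supported range is an assumption.
    for cmd in sweep_commands(sentence, [0.2, 0.5, 0.8]):
        # Print a copy-pasteable shell line. To execute instead, pass
        # cmd to subprocess.run(cmd, check=True) from the repository root.
        print(shlex.join(cmd))
```

Keeping the seed fixed across the sweep is intended to make the motivational factor the only setting that varies between runs. Because each standalone execution recompiles the model, a sweep like this is slow, and the Jupyter notebook remains the better choice for repeated experiments.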

Google Colab

The model can also be run in Google Colab; see the Colab Demo linked at the top of this README.
