
A Streamlit web app for AI-powered voice cloning using Coqui XTTS v2. Record or upload reference voices, clone speech in multiple languages, and generate natural audio outputs.


EbrahimAR/AI-Voice-Cloner-XTTS-v2


🗣️ AI Voice Cloner (XTTS v2)

A Streamlit web app for cloning voices using Coqui TTS.
Supports uploading, recording, or selecting reference voices and generating natural-sounding speech in multiple languages.


🚀 Features

  • 🎤 Record voice directly in the browser
  • 📤 Upload reference voice samples (WAV, MP3, M4A, OGG, FLAC)
  • 🎵 Save and reuse voices from a gallery
  • 🌎 Multilingual support (English, Spanish, French, German, Chinese, Japanese, etc.)
  • 🎭 Control speech style, speed, and emotion
  • 📂 Keep track of recent generations with playback & downloads
  • 📥 Export audio as WAV or MP3

🛠 Tech Stack

Frontend / UI

  • Streamlit: Web interface for interaction and real-time updates
  • streamlit-option-menu: Sidebar navigation with a clean UI

Backend / Logic

  • Coqui TTS (XTTS v2): Voice cloning & text-to-speech model
  • pydub: Audio file conversion (WebM/MP3 ↔ WAV)
  • librosa & soundfile: Audio signal processing
  • NumPy: Array and numerical operations
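As a rough illustration of how the backend pieces fit together, the sketch below wraps a Coqui TTS model in a small helper. It is not taken from app.py: `clone_voice`, `LANGUAGE_CODES`, and the file paths are hypothetical names chosen for this example, and the `tts` argument is assumed to be an object exposing Coqui's `tts_to_file` method (for instance `TTS.api.TTS` loaded with the XTTS v2 model name).

```python
from pathlib import Path

# XTTS v2 language codes for the languages this README names.
LANGUAGE_CODES = {
    "English": "en",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Chinese": "zh-cn",
    "Japanese": "ja",
}

# Model identifier used by Coqui TTS for XTTS v2.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def clone_voice(tts, text, reference_wav, language, out_path):
    """Speak `text` in the voice of `reference_wav`; return the output path.

    `tts` is assumed to expose Coqui's `tts_to_file(...)` method.
    """
    if language not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {language}")
    out_path = Path(out_path)
    # Create the output folder on demand, mirroring the auto-created outputs/.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    tts.tts_to_file(
        text=text,
        speaker_wav=str(reference_wav),
        language=LANGUAGE_CODES[language],
        file_path=str(out_path),
    )
    return out_path


# Possible usage (downloads the model on first run, so it is left commented):
#   from TTS.api import TTS
#   model = TTS(XTTS_MODEL)
#   clone_voice(model, "Hello!", "voices/ref_sample.wav", "English",
#               "outputs/xtts_demo.wav")
```

Passing the model in as an argument keeps the helper easy to test with a stand-in object and avoids reloading the model on every call.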

Utilities

  • requests: API calls (if extending functionality)
  • python-dotenv: Manage secrets/API keys
  • ffmpeg (system dependency): Required for audio encoding/decoding

Environment

  • Virtual Environment (venv/) for dependency isolation

📂 Project Structure

voice_clone_app/
│
├── outputs/                  # Generated outputs (auto-created)
│   └── xtts_*.wav            # Generated files
│
├── voices/                   # Reference voices (auto-created)
│   └── ref_*.wav             # Uploaded/recorded samples
│
├── app.py                    # Main Streamlit application
├── run_app.py                # Launcher (cross-platform, auto-opens browser)
├── test.py                   # Experimental/testing script
├── requirements.txt          # Python dependencies
├── LICENSE                   # Open-source license
├── .gitignore                # Ignored files/folders
└── README.md                 # Project documentation
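The `voices/` and `outputs/` folders are described as auto-created, with files following the `ref_*.wav` and `xtts_*.wav` patterns. A minimal sketch of how that could work is shown below. The helper names and the timestamp format are assumptions for illustration only; the README does not say what replaces the `*` in practice.

```python
from datetime import datetime
from pathlib import Path


def ensure_dirs(base="."):
    """Create voices/ and outputs/ under `base` if missing; return both paths."""
    base = Path(base)
    voices, outputs = base / "voices", base / "outputs"
    for folder in (voices, outputs):
        folder.mkdir(parents=True, exist_ok=True)
    return voices, outputs


def timestamped_name(prefix, when=None, ext=".wav"):
    """Build a name such as xtts_20240102_030405.wav (timestamp format assumed)."""
    when = when or datetime.now()
    return f"{prefix}_{when:%Y%m%d_%H%M%S}{ext}"
```

For example, `ensure_dirs()[1] / timestamped_name("xtts")` gives a fresh path inside `outputs/`, and the `ref` prefix does the same for reference samples in `voices/`.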

⚙️ Installation

1. Clone the Repository

git clone https://github.com/EbrahimAR/AI-Voice-Cloner-XTTS-v2.git
cd AI-Voice-Cloner-XTTS-v2

2. Create a virtual environment

python -m venv .venv
source .venv/bin/activate    # On Linux/Mac
.venv\Scripts\activate       # On Windows

3. Install dependencies

pip install -r requirements.txt

4. Install system dependencies

  • ffmpeg is required for MP3 conversion:
    • Windows: choco install ffmpeg
    • macOS: brew install ffmpeg
    • Linux (Debian/Ubuntu): sudo apt-get install ffmpeg
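To confirm that ffmpeg is on the PATH before starting the app, a small check like the one below can help. This is a sketch rather than part of the project: `ffmpeg_status` and `INSTALL_HINTS` are hypothetical names, and the hints simply repeat the commands listed above, keyed by the values Python reports in `sys.platform`.

```python
import shutil

# Install commands from this README, keyed by sys.platform values.
INSTALL_HINTS = {
    "win32": "choco install ffmpeg",
    "darwin": "brew install ffmpeg",
    "linux": "sudo apt-get install ffmpeg",
}


def ffmpeg_status(platform, which=shutil.which):
    """Report whether ffmpeg is on the PATH, with an install hint if not.

    `which` is injectable so the check can be exercised without ffmpeg installed.
    """
    path = which("ffmpeg")
    if path:
        return f"ffmpeg found at {path}"
    hint = INSTALL_HINTS.get(platform, "install ffmpeg with your package manager")
    return f"ffmpeg not found; try: {hint}"
```

Calling `ffmpeg_status(sys.platform)` after `import sys` prints a status suited to the current machine.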

▶️ Run the app

streamlit run app.py

Or use the helper script:

python run_app.py

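The app opens at http://localhost:8502. Streamlit's default port is 8501, so if the page does not load, use the URL printed in the terminal.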
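The repository's run_app.py is not reproduced here, but a cross-platform launcher that opens the browser automatically could look roughly like the sketch below. The function names, the two-second delay, and the choice to pass the port explicitly are assumptions; `--server.port` and `--server.headless` are standard Streamlit options.

```python
import subprocess
import sys
import threading
import webbrowser


def build_command(script="app.py", port=8502):
    """Return the argument list that starts Streamlit with the current interpreter."""
    return [
        sys.executable, "-m", "streamlit", "run", script,
        "--server.port", str(port),
        # Headless mode stops Streamlit opening its own tab; launch() opens one.
        "--server.headless", "true",
    ]


def launch(script="app.py", port=8502, delay=2.0):
    """Start the app and open the browser shortly afterwards; return the exit code."""
    url = f"http://localhost:{port}"
    # Open the browser from a timer thread so the server has time to start.
    threading.Timer(delay, webbrowser.open, args=(url,)).start()
    return subprocess.call(build_command(script, port))
```

Using `sys.executable -m streamlit` instead of the bare `streamlit` command keeps the launcher inside the active virtual environment on Windows, macOS, and Linux alike.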


👨‍💻 Author

Ebrahim Abdul Raoof

LinkedIn

GitHub


📜 License

This project is licensed under the MIT License. See LICENSE for details.
