A collection of applications built with the Sesame CSM-1b text-to-speech model. Generate natural-sounding speech and clone voices from short audio samples.
| Application | Description | Key Features | Status |
|---|---|---|---|
| Personal Voice Diary | Convert diary entries into natural-sounding speech | Voice cloning, entry management, playback | 🔜 Planned |
| Audiobook Creator | Create audiobooks from any text | Text chunking, multiple voices, background processing | Completed |
| Voice Message Creator | Generate sharable voice messages | Custom voices, QR codes, expiring messages | 🔜 Planned |
| Story Narrator for Children | Narrate children's stories with character voices | Character voices, sound effects | 🔜 Planned |
| Emotion-based Voice Generator | Generate speech with different emotions | Multiple emotion presets, intensity control | 🔜 Planned |
| Voice Style Transfer | Transfer voice to different speaking styles | Style presets, voice preservation | 🔜 Planned |
| Voice-based Social Media Post Creator | Create audio for social media | Background music, platform templates | 🔜 Planned |
| Multilingual Accent Tool | Generate speech with different accents | Multiple accent options, pronunciation tools | 🔜 Planned |
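
The Audiobook Creator lists text chunking among its key features. As a rough illustration of the idea only, the sketch below groups whole sentences into chunks under a character limit so that long texts can be synthesized piece by piece. The function name `chunk_text`, the `max_chars` default, and the sentence-splitting rule are assumptions made for this example; the app's actual chunking logic may differ.

```python
from __future__ import annotations

import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    (Hypothetical helper; not taken from the repository.)
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent to the speech model separately and the resulting audio segments joined in order.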
- Natural Voice Generation: Create realistic speech with the power of CSM-1b
- Voice Cloning: Clone any voice from a short audio sample
- Independent Applications: Each app is fully self-contained and ready to use
- Modern Architecture: Built with FastAPI backends and Streamlit UIs
- Cloud Deployment: Configured for easy deployment with Modal
- High Performance: Optimized for both CPU and GPU environments
Getting started with any application is straightforward:
- Clone the repository and enter it: `git clone https://github.com/mahimairaja/awesome-csm-1b.git`, then `cd awesome-csm-1b`
- Choose an application: `cd src/<app-name>`
- Install dependencies: `pip install -r requirements.txt`
- Set up your Hugging Face token in a `.env` file: `HF_TOKEN=your_hugging_face_token`
- Start the backend: `python app.py`
- In a new terminal, start the UI: `streamlit run ui.py`
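
To make the `.env` step concrete, here is a minimal standard-library sketch of reading `HF_TOKEN` from a `.env` file, with an exported environment variable taking precedence. The helper names `load_env_file` and `get_hf_token` are hypothetical, and the apps themselves may load the file differently (for example through a library such as python-dotenv), so treat this as an illustration of the file format rather than the project's own code.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_hf_token(path: str = ".env") -> str:
    """Return HF_TOKEN from the environment, falling back to the .env file."""
    token = os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN", "")
    # Treat the README placeholder as "not configured".
    if not token or token == "your_hugging_face_token":
        raise RuntimeError("HF_TOKEN is missing; add it to .env or export it.")
    return token
```

Failing early with a clear message when the token is absent or still set to the placeholder tends to be easier to debug than an authentication error raised later during model download.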
- FastAPI: Backend API framework
- Streamlit: User interface
- PyTorch & Torchaudio: Audio processing
- Hugging Face: Model access and management
- Modal: Cloud deployment
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Python 3.10 or higher
- Hugging Face account with access to the CSM-1b model
- Hugging Face API token
- CUDA-compatible GPU recommended for optimal performance
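
The requirements above can be checked before launching an app. The following is a small preflight sketch, not part of the repository: `pick_device` and `preflight` are hypothetical names, and the only PyTorch call used is `torch.cuda.is_available()`. It verifies the Python version, checks that `HF_TOKEN` is exported, and falls back to the CPU when no CUDA GPU (or no PyTorch install) is found.

```python
from __future__ import annotations

import os
import sys


def pick_device() -> str:
    """Return "cuda" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def preflight(min_python: tuple[int, int] = (3, 10)) -> list[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems: list[str] = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    if not os.environ.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(f"warning: {problem}")
    print(f"device: {pick_device()}")
```

Note that this check only looks at exported environment variables, so a token stored solely in a `.env` file would be reported as not set unless it is loaded first.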
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ by Mahimai Raja