A modern, user-friendly desktop application for interacting with the ElevenLabs text-to-speech API.
- Playground: Convert text to speech using ElevenLabs' advanced AI voices
- Voice Cloning (NEW v2.0): Create custom voice clones by uploading audio samples
- Voice Management: Browse, manage, and organize your voice library
- Voice Library (NEW v2.0): Access and search through available voices
- API Key Management: Save and manage multiple named API keys securely
- Voice Parameter Controls: Fine-tune voice output with visual sliders:
  - Speed (0.5-2.0)
  - Stability (0-1)
  - Similarity Boost (0-1)
  - Style Exaggeration (0-1)
- Preset Management (NEW v2.0): Save and load custom voice parameter combinations
- Test Settings: Preview voice settings with sample text
- Break Tags: Insert SSML break tags for natural pauses
- History Tracking: Review and replay previous generations
- Tips and Tricks: Best practices guide for optimal results
- An ElevenLabs API key (get one at elevenlabs.io)
- Node.js and npm installed on your system
- Go to the Releases page
- Download the latest version for your operating system
- Install and run the application
- Clone this repository:
  git clone https://github.com/SannidhyaSah/ElevenLabs-GUI-Studio-.git
- Navigate to the project directory:
  cd ElevenLabs-GUI-Studio-
- Install dependencies:
  npm install
- Start the application in development mode:
  npm start
To build the application for your platform:
npm run build
This will create distributable packages in the dist directory.
- When you first start the application, you'll be directed to the Settings tab to add an API key
- Enter your ElevenLabs API key in the input field
- Give your API key a name (e.g., "Personal", "Work", "Testing")
- Click "Save API Key"
- You can add multiple API keys and switch between them using the dropdown
- API keys are stored securely in the data folder
- Navigate to the Playground tab (formerly Text to Speech)
- Select a voice and model
- Adjust voice parameters using the sliders
- Enter the text you want to convert to speech
- Use the "Add Break" button to insert pause tags if needed
- Click "Generate Speech" to create the audio
- Use the player controls to listen to the generated speech
- Click "Save Audio" to save the audio file to your computer
- Click the "New" button to start a fresh generation
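For readers curious about what "Generate Speech" might send over the network, here is a minimal sketch that assembles (but does not send) a text-to-speech request. The helper name `buildTtsRequest` is hypothetical, and the endpoint path, the `xi-api-key` header, and the body field names reflect the public ElevenLabs REST API as I understand it, not necessarily what this application does internally. The voice and model IDs are placeholders.

```javascript
// Hypothetical helper: builds a request description without sending it.
// Assumption: endpoint, header, and field names follow the public ElevenLabs
// REST API; check the official API reference before relying on them.
function buildTtsRequest({ apiKey, voiceId, modelId, text, settings }) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text,
      model_id: modelId,
      voice_settings: {
        stability: settings.stability,
        similarity_boost: settings.similarityBoost,
        style: settings.style,
        speed: settings.speed,
      },
    }),
  };
}

// Placeholder values for illustration only.
const request = buildTtsRequest({
  apiKey: "YOUR_API_KEY",
  voiceId: "example-voice-id",
  modelId: "example-model-id",
  text: "Hello there.",
  settings: { stability: 0.5, similarityBoost: 0.75, style: 0, speed: 1.0 },
});
console.log(request.url);
```

The returned object can be passed to any HTTP client, for example `fetch(request.url, request)`; the response body would then be the audio data.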
- Select a preset from the dropdown to instantly apply voice settings:
  - Balanced: Default settings for general use
  - Expressive: Lower stability for emotional range
  - Stable: High stability for consistent narration
  - Fast Speech: 1.5x speed for quick delivery
  - Slow & Clear: 0.8x speed for clarity
- To save the current settings as a preset:
  - Adjust the parameters to your liking
  - Enter a name in the "Save as..." field
  - Click the save icon
- To delete a preset:
  - Select it from the dropdown
  - Click the delete icon
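The preset workflow above can be pictured as a small named store of parameter combinations. The sketch below is illustrative only: the function names are hypothetical, and apart from the two speed values (1.5 and 0.8) taken from the preset descriptions, every number is a placeholder assumption rather than the application's actual defaults.

```javascript
// Illustrative preset store. Only the speeds 1.5 and 0.8 come from the preset
// descriptions; all other values are placeholder assumptions.
const presets = new Map([
  ["Balanced", { stability: 0.5, similarityBoost: 0.75, style: 0, speed: 1.0 }],
  ["Expressive", { stability: 0.2, similarityBoost: 0.75, style: 0, speed: 1.0 }],
  ["Stable", { stability: 0.9, similarityBoost: 0.75, style: 0, speed: 1.0 }],
  ["Fast Speech", { stability: 0.5, similarityBoost: 0.75, style: 0, speed: 1.5 }],
  ["Slow & Clear", { stability: 0.5, similarityBoost: 0.75, style: 0, speed: 0.8 }],
]);

// Save the current slider values under a name (overwrites an existing preset).
function savePreset(name, settings) {
  const trimmed = name.trim();
  if (trimmed === "") throw new Error("Preset name must not be empty");
  // Store a copy so later slider changes do not mutate the saved preset.
  presets.set(trimmed, { ...settings });
}

// Remove a preset; returns true if it existed.
function deletePreset(name) {
  return presets.delete(name);
}

// Return a copy of the preset's settings, ready to apply to the sliders.
function applyPreset(name) {
  const preset = presets.get(name);
  if (!preset) throw new Error(`Unknown preset: ${name}`);
  return { ...preset };
}
```

For example, `applyPreset("Fast Speech").speed` yields `1.5`, and `savePreset("Podcast", currentSettings)` would add a custom entry alongside the defaults.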
- Adjust the voice parameters (Stability, Similarity, Style, Speed) using the sliders
- Click the "Test Settings" button to generate a sample audio with current settings
- Listen to the audio to hear how your settings affect the voice
- Click "Reset Settings" to return to default values if needed
- Navigate to the Voice Management tab
- Click the "Clone Voice" sub-tab
- Enter a name for your voice clone
- Add an optional description
- Upload audio samples by:
  - Clicking the upload area to browse files
  - Dragging and dropping audio files
- Add optional labels (e.g., "accent:british, age:middle")
- Click "Create Voice Clone" to generate your custom voice
- Your cloned voice will appear in the voice selection dropdown
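The label format shown in the steps above ("accent:british, age:middle") is a comma-separated list of key:value pairs. As a rough sketch, and assuming that is the only syntax involved, such a string could be turned into an object like this. The function name `parseLabels` is hypothetical and not taken from the application.

```javascript
// Hypothetical parser for label strings such as "accent:british, age:middle".
// Assumption: labels are comma-separated key:value pairs; malformed entries
// are skipped rather than rejected.
function parseLabels(input) {
  const labels = {};
  for (const pair of input.split(",")) {
    const idx = pair.indexOf(":");
    if (idx === -1) continue; // no key:value separator in this entry
    const key = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim();
    if (key !== "" && value !== "") labels[key] = value;
  }
  return labels;
}

console.log(parseLabels("accent:british, age:middle"));
```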
Tips for Voice Cloning:
- Use clear, high-quality audio samples
- Provide multiple samples for better results
- Ensure minimal background noise
- Samples should be between 30 seconds and 3 minutes long
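The length guideline above can be checked before uploading. The following is a minimal sketch, assuming you already know each sample's duration in seconds (reading durations from audio files is out of scope here); the function name and the message format are hypothetical.

```javascript
// Recommended sample length from the tips above: 30 seconds to 3 minutes.
const MIN_SECONDS = 30;
const MAX_SECONDS = 180;

// Returns one message per sample whose duration falls outside the range.
// Each sample is { name, seconds }; durations are assumed to be known already.
function checkSamples(samples) {
  return samples
    .filter((s) => s.seconds < MIN_SECONDS || s.seconds > MAX_SECONDS)
    .map((s) => `${s.name}: ${s.seconds}s is outside the ${MIN_SECONDS}-${MAX_SECONDS}s range`);
}

console.log(checkSamples([
  { name: "intro.mp3", seconds: 45 },
  { name: "clip.mp3", seconds: 10 },
]));
```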
- Navigate to the History tab to view your previous generations
- Click "Play" on any history item to hear it again
- Click "Use Text" to load the text from a previous generation
- Click "Delete" to remove a specific history item
- Click "Clear History" to remove all history items
- Stability (0-1): Controls how stable/consistent the voice is. Lower values (0.0-0.3) allow for more emotional range and variability, while higher values (0.7-1.0) make the voice more monotonous but consistent.
- Similarity Boost (0-1): Controls how closely the AI adheres to the original voice. Higher values (0.7-1.0) make it sound more like the original speaker, while lower values (0.0-0.3) allow for more creativity but may sound less like the original voice.
- Style (0-1): Controls style exaggeration of the voice. Higher values (0.7-1.0) amplify the style of the original speaker, making the voice more distinctive and characterized. Default is 0.0 (no style exaggeration).
- Speed (0.5-2.0): Controls the speed of the generated speech. Lower values create slower speech (0.5 is half speed), while higher values create faster speech (2.0 is double speed). Default is 1.0 (normal pace).
You can test different combinations of these parameters using the "Test Settings" button to find the perfect voice for your needs.
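If you script against these parameters yourself, it can help to clamp values to the documented ranges before using them. The sketch below uses the ranges listed above; the function name `clampSettings` and the camelCase property names are my own choices, not the application's.

```javascript
// Slider ranges as documented above: [min, max] per parameter.
const RANGES = {
  stability: [0, 1],
  similarityBoost: [0, 1],
  style: [0, 1],
  speed: [0.5, 2.0],
};

// Returns a new settings object with every value forced into its range.
// Missing or non-numeric values are rejected rather than silently defaulted.
function clampSettings(settings) {
  const result = {};
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    const value = Number(settings[name]);
    if (!Number.isFinite(value)) throw new Error(`${name} must be a number`);
    result[name] = Math.min(max, Math.max(min, value));
  }
  return result;
}

console.log(clampSettings({ stability: 1.4, similarityBoost: -0.2, style: 0.3, speed: 3 }));
```

Rejecting missing values is a deliberate choice here, since only the Style and Speed defaults are documented.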
- Use proper punctuation to guide the pacing and intonation of the speech
- Use the "Add Break" button to insert pauses of a specific duration with <break time="Xs" /> tags
- Break long texts into smaller paragraphs for better results
- Different voices work better with different models - experiment to find the best combination
- Use the "Test Settings" button to quickly hear how different parameter combinations sound
- For emotional speech, use lower stability values (0.1-0.3)
- For narration or audiobooks, use medium stability (0.4-0.6) and high similarity (0.7-0.9)
- For consistent voice assistants, use high stability (0.7-0.9)
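The three recommendations above can be written down as a lookup table and used to flag settings that fall outside the suggested range for a given use case. This is only a sketch: the use-case keys and the function name are hypothetical, while the numeric ranges are the ones from the tips.

```javascript
// Recommended ranges from the tips above, as [min, max] per parameter.
const RECOMMENDED = {
  emotional: { stability: [0.1, 0.3] },
  narration: { stability: [0.4, 0.6], similarityBoost: [0.7, 0.9] },
  assistant: { stability: [0.7, 0.9] },
};

// Returns the names of parameters that fall outside the recommended range.
function outOfRange(useCase, settings) {
  const ranges = RECOMMENDED[useCase];
  if (!ranges) throw new Error(`Unknown use case: ${useCase}`);
  return Object.entries(ranges)
    .filter(([name, [min, max]]) => settings[name] < min || settings[name] > max)
    .map(([name]) => name);
}

console.log(outOfRange("narration", { stability: 0.5, similarityBoost: 0.5 }));
```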
The application supports SSML (Speech Synthesis Markup Language) tags for more control over the speech:
- <break time="Xs" />: Add a pause of X seconds (use the "Add Break" button)
- <emphasis>text</emphasis>: Emphasize text
- <prosody rate="slow/medium/fast">text</prosody>: Control speech rate
- <prosody pitch="low/medium/high">text</prosody>: Control pitch
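As an illustration of what an "Add Break" action amounts to, the sketch below inserts a break tag at a given character position in the text. The function name `insertBreak` and its signature are assumptions for this example; the application's own implementation may differ.

```javascript
// Inserts <break time="Xs" /> at a character position (for example the cursor).
// Hypothetical helper; positions outside the text are clamped to its bounds.
function insertBreak(text, position, seconds) {
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error("Pause duration must be a positive number of seconds");
  }
  const pos = Math.min(text.length, Math.max(0, position));
  const tag = `<break time="${seconds}s" />`;
  return text.slice(0, pos) + tag + text.slice(pos);
}

console.log(insertBreak("Hello. World.", 6, 1.5));
// prints: Hello.<break time="1.5s" /> World.
```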
Created by @SannidhyaSah
This is an unofficial application and is not affiliated with ElevenLabs. You must have a valid ElevenLabs API key to use this application. All API usage is subject to ElevenLabs' terms of service.
- Voice Cloning Feature: Added complete voice cloning functionality
  - Upload multiple audio samples
  - Create custom voice clones
  - Manage voice labels and descriptions
- Preset Management System: Save and load voice parameter combinations
  - Default presets included (Balanced, Expressive, Stable, Fast Speech, Slow & Clear)
  - Create custom presets
  - Delete unwanted presets
- Voice Library Integration: Browse and search available voices
- Major UI Overhaul: Complete redesign with modern dark theme
- Performance Optimization: Removed heavy effects for smoother operation
- Improved Visibility: Enhanced slider tracks and controls
- Better Spacing: Fixed button overlaps and improved layout
- Color Scheme Update: Changed from purple to professional gray (#23272e)
- Notification System: Reduced duration to 1 second for better UX
- Preset Management: Improved preset selector to prevent overflow
- Responsive Design: Better adaptation to different screen sizes
- Basic text-to-speech functionality
- Voice and model selection
- Parameter controls
- History tracking
- API key management
MIT