[Feature Request] Make TTS engine configurable, with optional pre-processing #5543
sealad886 started this conversation in Feature Requests
Configurable TTS Engine with LLM Preprocessing
Why This Matters
I’ve been fighting with the built-in TTS: it’s flat, robotic, and often unintelligible, especially when it tries to read raw code aloud. Rather than settling for a one-size-fits-all voice, this proposal makes the TTS layer fully configurable, so that anyone (even a lone developer) can swap in the engine, voice, and preprocessing pipeline that suits them.
That way, the system finally speaks like a human instead of a broken robot, and it stops subjecting your ears to raw code dumps.
Core Ideas
Engine Abstraction
Define a simple interface so you can plug in any TTS backend—Amazon Polly, Google Cloud, open-source neural models, or whatever you prefer.
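As a minimal sketch of what such an abstraction could look like (assuming a Python codebase; `TTSEngine`, `PollyEngine`, and `speak` are illustrative names, not existing APIs in this project):

```python
# Sketch of a pluggable TTS interface. Any backend that implements
# synthesize() can be dropped in without touching calling code.
from abc import ABC, abstractmethod


class TTSEngine(ABC):
    """Minimal contract every TTS backend must satisfy."""

    @abstractmethod
    def synthesize(self, text: str, voice: str = "default") -> bytes:
        """Return raw audio bytes for the given text and voice."""


class PollyEngine(TTSEngine):
    # Placeholder: a real adapter would call the Amazon Polly SDK here.
    def synthesize(self, text: str, voice: str = "default") -> bytes:
        return f"[polly:{voice}] {text}".encode()


def speak(engine: TTSEngine, text: str) -> bytes:
    # Callers depend only on the abstract interface, so backends swap freely.
    return engine.synthesize(text)
```

Adapters for Google Cloud, Coqui, or any other backend would follow the same pattern: one small class per engine, all interchangeable behind `TTSEngine`.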
Voice Profiles
Let users choose from named personas (e.g. “en-US-Alice”, “es-ES-Jorge”) with metadata for gender, style, and mood.
Prosody Controls
Expose parameters for rate, pitch shift, volume, and emphasis tags so you can get the tone just right.
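Voice profiles and prosody controls could live in one small config object. A hedged sketch (field names and the SSML mapping are assumptions, not a defined schema):

```python
# Illustrative voice-profile dataclass combining persona metadata with
# prosody knobs; a real schema might live in YAML or JSON instead.
from dataclasses import dataclass


@dataclass
class VoiceProfile:
    name: str                     # e.g. "en-US-Alice"
    language: str                 # BCP-47 language tag
    gender: str = "neutral"
    style: str = "conversational"
    rate: float = 1.0             # 1.0 = normal speaking speed
    pitch_shift: float = 0.0      # semitones up/down
    volume_db: float = 0.0        # gain relative to engine default

    def to_ssml(self, text: str) -> str:
        """Wrap text in an SSML prosody element reflecting these settings."""
        return (
            f'<prosody rate="{int(self.rate * 100)}%" '
            f'pitch="{self.pitch_shift:+g}st" '
            f'volume="{self.volume_db:+g}dB">{text}</prosody>'
        )
```

Engines that understand SSML get the full prosody markup; engines that don't can simply read the plain fields.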
Smart Fallback
If your primary engine is slow or errors out, automatically switch to a backup—no dead air.
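The fallback chain is straightforward to sketch. Here `FallbackTTS` is a hypothetical wrapper; it assumes each engine exposes `synthesize(text)` and raises an exception on failure:

```python
# Hypothetical failover wrapper: tries engines in priority order and
# falls through to the next one whenever a backend raises.
class FallbackTTS:
    def __init__(self, engines):
        self.engines = list(engines)

    def synthesize(self, text: str) -> bytes:
        errors = []
        for engine in self.engines:
            try:
                return engine.synthesize(text)
            except Exception as exc:  # any backend failure triggers failover
                errors.append(exc)
        raise RuntimeError(f"all {len(self.engines)} engines failed: {errors}")
```

A production version would also want a per-call timeout so a slow (not just erroring) engine triggers the switch, but the shape is the same.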
Optional LLM Preprocessor
Before sending text to TTS, run it through a configurable LLM prompt that rewrites code blocks, markup, and other non-speakable content into concise spoken descriptions.
You get clean, user-friendly speech instead of a garbled code read-out.
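A sketch of where the preprocessor would sit, assuming the LLM is injected as a callable; the rule-based fallback below is a stand-in stub, not the proposed implementation:

```python
# Optional preprocessing step: text in, speakable text out. `llm` stands
# in for any configurable LLM client; the prompt wording is illustrative.
import re

SPEECH_PROMPT = (
    "Rewrite the following so it can be read aloud. Summarize code blocks "
    "in one sentence instead of reading them verbatim."
)


def rewrite_for_speech(text: str, llm=None) -> str:
    if llm is not None:
        # A real implementation would send SPEECH_PROMPT plus the text
        # to the configured model and return its rewrite.
        return llm(SPEECH_PROMPT, text)
    # Fallback stub: replace fenced code blocks with a short spoken cue.
    return re.sub(r"```.*?```", "(code block omitted)", text, flags=re.DOTALL)
```

Because the preprocessor is just a function in front of `synthesize`, it stays optional: leave it unset and text passes through untouched apart from the stub's code-block handling.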
Why It’s Worth It
Superior Audio Quality
Pick the best neural voices on the market instead of settling for tinny monotone.
Accessibility & Global Reach
Support any language or dialect—and make sure it actually sounds intelligible.
Cost Control
Route low-priority traffic to free/open-source engines, use paid voices when it counts.
Reliability
Automatic failover and LLM checks prevent both silence and nonsense.
Future-Proof
Swap in new TTS or LLM innovations without rewriting your app.
Bottom Line
Making the TTS engine configurable and adding LLM-driven content refinement turns a frustrating feature into a powerful tool. You get natural output, better accessibility, and complete control—without ripping out your existing code.
One might imagine having a natural-language, natural-sounding, bidirectional conversation about your code...
Imagine...
Combine this with some of the ideas in the discussion on External Control APIs, and you could use a mobile device (smartwatch, wireless headphones, a car's radio, ...) to design, understand, and publish complex code without ever looking at your computer.