[Feature Request] Make TTS engine configurable, with optional pre-processing #5543
sealad886 started this conversation in Feature Requests
Configurable TTS Engine with LLM Preprocessing
Why This Matters
I’ve been fighting with the built-in TTS: it’s flat, robotic, and often unintelligible, especially when it tries to read raw code aloud. Rather than settling for a one-size-fits-all voice, this proposal makes the TTS layer fully configurable, so that anyone (even a lone developer) can swap in the engine, voice, and preprocessing pipeline that suits them.
That way, the system finally speaks like a human instead of a broken robot, and it stops subjecting your ears to raw code dumps.
Core Ideas
Engine Abstraction
Define a simple interface so you can plug in any TTS backend—Amazon Polly, Google Cloud, open-source neural models, or whatever you prefer.
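As a minimal sketch of what such an abstraction could look like (assuming a Python codebase; `TTSEngine`, `PollyEngine`, and `speak` are illustrative names, not existing APIs in this project):

```python
# Sketch of a pluggable TTS interface. Any backend that implements
# synthesize() can be dropped in without touching calling code.
from abc import ABC, abstractmethod


class TTSEngine(ABC):
    """Minimal contract every TTS backend must satisfy."""

    @abstractmethod
    def synthesize(self, text: str, voice: str = "default") -> bytes:
        """Return raw audio bytes for the given text and voice."""


class PollyEngine(TTSEngine):
    # Placeholder: a real adapter would call the Amazon Polly SDK here.
    def synthesize(self, text: str, voice: str = "default") -> bytes:
        return f"[polly:{voice}] {text}".encode()


def speak(engine: TTSEngine, text: str) -> bytes:
    # Callers depend only on the abstract interface, so backends swap freely.
    return engine.synthesize(text)
```

Adapters for Google Cloud, Coqui, or any other backend would follow the same pattern: one small class per engine, all interchangeable behind `TTSEngine`.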
Voice Profiles
Let users choose from named personas (e.g. “en-US-Alice”, “es-ES-Jorge”) with metadata for gender, style, and mood.
Prosody Controls
Expose parameters for rate, pitch shift, volume, and emphasis tags so you can get the tone just right.
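Voice profiles and prosody controls could live in one small config object. A hedged sketch (field names and the SSML mapping are assumptions, not a defined schema):

```python
# Illustrative voice-profile dataclass combining persona metadata with
# prosody knobs; a real schema might live in YAML or JSON instead.
from dataclasses import dataclass


@dataclass
class VoiceProfile:
    name: str                     # e.g. "en-US-Alice"
    language: str                 # BCP-47 language tag
    gender: str = "neutral"
    style: str = "conversational"
    rate: float = 1.0             # 1.0 = normal speaking speed
    pitch_shift: float = 0.0      # semitones up/down
    volume_db: float = 0.0        # gain relative to engine default

    def to_ssml(self, text: str) -> str:
        """Wrap text in an SSML prosody element reflecting these settings."""
        return (
            f'<prosody rate="{int(self.rate * 100)}%" '
            f'pitch="{self.pitch_shift:+g}st" '
            f'volume="{self.volume_db:+g}dB">{text}</prosody>'
        )
```

Engines that understand SSML get the full prosody markup; engines that don't can simply read the plain fields.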
Smart Fallback
If your primary engine is slow or errors out, automatically switch to a backup—no dead air.
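The fallback chain is straightforward to sketch. Here `FallbackTTS` is a hypothetical wrapper; it assumes each engine exposes `synthesize(text)` and raises an exception on failure:

```python
# Hypothetical failover wrapper: tries engines in priority order and
# falls through to the next one whenever a backend raises.
class FallbackTTS:
    def __init__(self, engines):
        self.engines = list(engines)

    def synthesize(self, text: str) -> bytes:
        errors = []
        for engine in self.engines:
            try:
                return engine.synthesize(text)
            except Exception as exc:  # any backend failure triggers failover
                errors.append(exc)
        raise RuntimeError(f"all {len(self.engines)} engines failed: {errors}")
```

A production version would also want a per-call timeout so a slow (not just erroring) engine triggers the switch, but the shape is the same.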
Optional LLM Preprocessor
Before sending text to TTS, run it through a configurable LLM prompt that rewrites code blocks, markup, and other non-speakable content into concise spoken descriptions.
You get clean, user-friendly speech instead of a garbled code read-out.
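A sketch of where the preprocessor would sit, assuming the LLM is injected as a callable; the rule-based fallback below is a stand-in stub, not the proposed implementation:

```python
# Optional preprocessing step: text in, speakable text out. `llm` stands
# in for any configurable LLM client; the prompt wording is illustrative.
import re

SPEECH_PROMPT = (
    "Rewrite the following so it can be read aloud. Summarize code blocks "
    "in one sentence instead of reading them verbatim."
)


def rewrite_for_speech(text: str, llm=None) -> str:
    if llm is not None:
        # A real implementation would send SPEECH_PROMPT plus the text
        # to the configured model and return its rewrite.
        return llm(SPEECH_PROMPT, text)
    # Fallback stub: replace fenced code blocks with a short spoken cue.
    return re.sub(r"```.*?```", "(code block omitted)", text, flags=re.DOTALL)
```

Because the preprocessor is just a function in front of `synthesize`, it stays optional: leave it unset and text passes through untouched apart from the stub's code-block handling.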
Why It’s Worth It
Superior Audio Quality
Pick the best neural voices on the market instead of settling for tinny monotone.
Accessibility & Global Reach
Support any language or dialect—and make sure it actually sounds intelligible.
Cost Control
Route low-priority traffic to free/open-source engines, use paid voices when it counts.
Reliability
Automatic failover and LLM checks prevent both silence and nonsense.
Future-Proof
Swap in new TTS or LLM innovations without rewriting your app.
Bottom Line
Making the TTS engine configurable and adding LLM-driven content refinement turns a frustrating feature into a powerful tool. You get natural output, better accessibility, and complete control—without ripping out your existing code.
One might imagine having a natural-language, natural-sounding, bidirectional conversation about your code...
Imagine...
Combine this with some of the ideas in the discussion on External Control APIs, and you could use a mobile device (smartwatch, wireless headphones, a car's radio, ...) to design, understand, and publish complex code without ever looking at your computer.