args, server: Simplify and fix bugs #86

danielzgtg · 2025-06-03T13:41:09Z

User-facing improvements:

Bare ./tts-cli -p no longer crashes
Merged --use-espeak into just omitting --phonemizer-path
Errors are now text/plain instead of JSON
- We don't need to write so much error handling code
- OpenAI compatibility isn't important for errors
Server startup page now only disappears after everything loads
Fixed buffer reuse race condition between audio generation and HTTP output
conditional_prompt is now a JSON key, no longer an HTTP endpoint
- This lifts the limitation of 1 worker thread

Internal improvements:

struct arg is now pointer-free and like a regular std::variant
- args["my-arg"] (+ cast rarely) should feel like Python
Deduplicated args.add, updated .md, unified descriptions
Converted many copies to pass-by-reference
- Less related files are left for the code style PR
Cached the only 2 JSON success responses
The main thread is now simply the listener thread, not a worker
Timeouts are now also enforced on the HTTP threads
Removed simple_response_map to avoid collisions and thread contention
- simple_text_prompt_task is pooled and no longer leaked every request
This is the first time we're freeing runner memory

This PR was split from the future of #58, and depends on and includes the PR #85.

It's been a few weeks and no opposition surfaced. This change allows new code to be written faster and less cluttered.

kokoro_model was being freed twice, firstly by kokoro_duration_runner while in use by kokoro_runner, which then had a use-after-free when it tried to properly free it. kokoro_context was also double-freeing some backend data that its base class runner_context would later free again. There was a mismatched new/free in prepare_post_load. After removing the double-free, I chose to add unique_ptr to let this part hold on to RAII and ownership. reference_wrapper is instead used in another PR to indicate non-ownership. Nearby pointers have not been upgraded to unique_ptr, because they involve backends, the situation of which has been difficult to untangle. Before, there was a SIGSEGV on post-args-refactor server shutdown. After, there the server exits cleanly on Ctrl+C. This isn't visible on the main branch, which just leaks the memory.

constexpr also fixed ODR problems.

mmwillet · 2025-06-03T17:06:30Z

@danielzgtg Thanks! this looks good. I won't have time to review this tonight, but will review it tomorrow. Please wait on review before merging (seeing as this is a big one).

User-facing improvements * Bare `./tts-cli -p` no longer crashes * Merged --use-espeak into just omitting --phonemizer-path * Errors are now text/plain instead of JSON * We don't need to write so much error handling code * OpenAI compatibility isn't important for errors * Server startup page now only disappears after everything loads * Fixed buffer reuse race condition between audio generation and HTTP output * conditional_prompt is now a JSON key, no longer an HTTP endpoint * This lifts the limitation of 1 worker thread Internal improvements * struct arg is now pointer-free and like a regular std::variant * `args["my-arg"]` (+ cast rarely) should feel like Python * Deduplicated args.add, updated .md, unified descriptions * Converted many copies to pass-by-reference * Less related files are left for the code style PR * Cached the only 2 JSON success responses * The main thread is now simply the listener thread, not a worker * Timeouts are now also enforced on the HTTP threads * Removed simple_response_map to avoid collisions and thread contention * simple_text_prompt_task is pooled and no longer leaked every request * This is the first time we're freeing runner memory

danielzgtg · 2025-06-04T11:27:33Z

src/kokoro_model.h

+}()};
+
+constexpr str get_espeak_id_from_kokoro_voice(str voice) {
+	return KOKORO_LANG_TO_ESPEAK_ID[voice[0]] ? KOKORO_LANG_TO_ESPEAK_ID[voice[0]] : "gmw/en-US";


The lack of cast to unsigned may permit an out-of-bounds read with Unicode, which none of the voices are named using, so is minor. It'll be fixed in the next PR.

mmwillet

Excellent work! Thanks for doing this. This is really excellent engineering work!

I've made some comments on a few minor issues that I've found while reviewing, but the only issue that is blocking is the change to no longer run the first worker on the main thread in server.cpp. See my comments below and this PR for more information.

The term Nit: prior to a pull request comment refers to the informal english verb nitpick which means to find or point out minor faults in a fussy or pedantic way. As such, when I use the prefix Nit: I am expressing a minor note that is often preferential and unnecessary. Feel free to ignore any Nit: comment that you disagree with. It will not bother me in the slightest.

mmwillet · 2025-06-04T17:03:14Z

include/imports.h

+using namespace std;
+using namespace std::string_view_literals;
+typedef std::string_view sv;
+typedef const char * str;


Nit (see the review comment for an explanation of what I mean when I prefix a comment with Nit:): I suspect it is more explicit and clear to just use const char * instead of str.

Will revert both aliases. I'll see if I can move them to my IDE's live templates, so that it doesn't feel like 3x the amount of typing again every time I use a string.

mmwillet · 2025-06-04T17:09:52Z

include/common.h

-    std::string voice = "";
-    bool sample = true;
-    std::string espeak_voice_id = "";
+    bool use_cross_attn{true}; // TODO split out this load-time option from the rest of the generate-time configuration


Technically this isn't a load time option because we compile the graph at inference. With that said, I am probably the only one interested in using this option. There is little to no function for it in most use cases. My vote would be we simply remove it.

include/tts.h

mmwillet · 2025-06-04T17:24:58Z

examples/args.cpp

+        cout << " (-" << abbreviation << ")";
+    }
+    if (*description) {
+        cout << (required ? ":\n    (REQUIRED) " : ":\n    (OPTIONAL) ") << description << ".\n";


Nit: it is strange to have the last period added here given that all other punctuation is added by the description.

examples/args.cpp

examples/server/server.cpp

mmwillet · 2025-06-04T18:09:11Z

src/tts.cpp

+void update_conditional_prompt(tts_runner * runner, str file_path, str prompt, bool cpu_only) {
+    const auto parler{dynamic_cast<parler_tts_runner *>(runner)};
+    if (!parler) {
+        fprintf(stderr, "Wrong model for conditional prompt\n");


We should be more explicit here.

Suggested change

fprintf(stderr, "Wrong model for conditional prompt\n");

fprintf(stderr, "The '%s' architecture does not support conditional generation.\n", SUPPORTED_ARCHITECTURES[runner->arch]);

examples/server/server.cpp

mmwillet · 2025-06-04T19:05:05Z

examples/server/server.cpp

+    void process_task(simple_text_prompt_task & task) {
+        tts_runner * runner = &*runners[task.model];
+        if (*task.conditional_prompt) {
+            TTS_ASSERT(*q.get().text_encoder_path);


This isn't possible because of our request validation right (i.e. line 405)?

Additionally, I don't think we generally want to use TTS_ASSERT inside the worker where it can be avoided, especially in response to request options, as that would give the end user a way of terminating a worker.

Will remove as unnecessary. Although I interpret TTS_ASSERT as more like std::unreachable than input validation.

examples/server/server.cpp

mmwillet

Per comments, I am fine with this being merged, assuming we fast follow with an improvement to the conditional prompt behavior.

add back missing header from resolving merge conflicts

danielzgtg added 3 commits June 3, 2025 09:47

chore: Enable C++23 and abbreviations

f2ebb39

It's been a few weeks and no opposition surfaced. This change allows new code to be written faster and less cluttered.

refactor: Encapsulate args and quantize_impl into examples/

c443aae

constexpr also fixed ODR problems.

danielzgtg force-pushed the refactor/args branch 2 times, most recently from ac0db97 to 16471e5 Compare June 3, 2025 14:41

mmwillet self-requested a review June 3, 2025 17:06

danielzgtg force-pushed the refactor/args branch from 16471e5 to 5fed88e Compare June 4, 2025 07:21

danielzgtg force-pushed the refactor/args branch from 5fed88e to 2d4d444 Compare June 4, 2025 08:05

danielzgtg commented Jun 4, 2025

View reviewed changes

mmwillet requested changes Jun 4, 2025

View reviewed changes

mmwillet self-requested a review June 5, 2025 17:40

mmwillet approved these changes Jun 5, 2025

View reviewed changes

mmwillet added 2 commits June 5, 2025 13:48

Merge branch 'main' into refactor/args

1ce486d

Update server.cpp

d3946b4

add back missing header from resolving merge conflicts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

args, server: Simplify and fix bugs #86

args, server: Simplify and fix bugs #86

Uh oh!

danielzgtg commented Jun 3, 2025

Uh oh!

mmwillet commented Jun 3, 2025 •

edited

Loading

Uh oh!

danielzgtg Jun 4, 2025

Uh oh!

mmwillet left a comment •

edited

Loading

Uh oh!

mmwillet Jun 4, 2025 •

edited

Loading

Uh oh!

danielzgtg Jun 5, 2025

Uh oh!

mmwillet Jun 4, 2025

Uh oh!

Uh oh!

mmwillet Jun 4, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

mmwillet Jun 4, 2025

Uh oh!

Uh oh!

mmwillet Jun 4, 2025

Uh oh!

danielzgtg Jun 5, 2025

Uh oh!

Uh oh!

mmwillet left a comment •

edited

Loading

Uh oh!

Uh oh!

	fprintf(stderr, "Wrong model for conditional prompt\n");
	fprintf(stderr, "The '%s' architecture does not support conditional generation.\n", SUPPORTED_ARCHITECTURES[runner->arch]);

args, server: Simplify and fix bugs #86

Are you sure you want to change the base?

args, server: Simplify and fix bugs #86

Uh oh!

Conversation

danielzgtg commented Jun 3, 2025

Uh oh!

mmwillet commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danielzgtg Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

mmwillet left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mmwillet Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

danielzgtg Jun 5, 2025

Choose a reason for hiding this comment

Uh oh!

mmwillet Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mmwillet Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mmwillet Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mmwillet Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

danielzgtg Jun 5, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mmwillet left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mmwillet commented Jun 3, 2025 •

edited

Loading

mmwillet left a comment •

edited

Loading

mmwillet Jun 4, 2025 •

edited

Loading

mmwillet Jun 4, 2025 •

edited

Loading

mmwillet left a comment •

edited

Loading