-
Notifications
You must be signed in to change notification settings - Fork 14
args, server: Simplify and fix bugs #86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
It's been a few weeks and no opposition surfaced. This change allows new code to be written faster and less cluttered.
kokoro_model was being freed twice, firstly by kokoro_duration_runner while in use by kokoro_runner, which then had a use-after-free when it tried to properly free it. kokoro_context was also double-freeing some backend data that its base class runner_context would later free again. There was a mismatched new/free in prepare_post_load. After removing the double-free, I chose to add unique_ptr to let this part hold on to RAII and ownership. reference_wrapper is instead used in another PR to indicate non-ownership. Nearby pointers have not been upgraded to unique_ptr, because they involve backends, the situation of which has been difficult to untangle. Before, there was a SIGSEGV on post-args-refactor server shutdown. After, there the server exits cleanly on Ctrl+C. This isn't visible on the main branch, which just leaks the memory.
constexpr also fixed ODR problems.
ac0db97
to
16471e5
Compare
@danielzgtg Thanks! this looks good. I won't have time to review this tonight, but will review it tomorrow. Please wait on review before merging (seeing as this is a big one). |
User-facing improvements * Bare `./tts-cli -p` no longer crashes * Merged --use-espeak into just omitting --phonemizer-path * Errors are now text/plain instead of JSON * We don't need to write so much error handling code * OpenAI compatibility isn't important for errors * Server startup page now only disappears after everything loads * Fixed buffer reuse race condition between audio generation and HTTP output * conditional_prompt is now a JSON key, no longer an HTTP endpoint * This lifts the limitation of 1 worker thread Internal improvements * struct arg is now pointer-free and like a regular std::variant * `args["my-arg"]` (+ cast rarely) should feel like Python * Deduplicated args.add, updated .md, unified descriptions * Converted many copies to pass-by-reference * Less related files are left for the code style PR * Cached the only 2 JSON success responses * The main thread is now simply the listener thread, not a worker * Timeouts are now also enforced on the HTTP threads * Removed simple_response_map to avoid collisions and thread contention * simple_text_prompt_task is pooled and no longer leaked every request * This is the first time we're freeing runner memory
}()}; | ||
|
||
constexpr str get_espeak_id_from_kokoro_voice(str voice) { | ||
return KOKORO_LANG_TO_ESPEAK_ID[voice[0]] ? KOKORO_LANG_TO_ESPEAK_ID[voice[0]] : "gmw/en-US"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of cast to unsigned may permit an out-of-bounds read with Unicode, which none of the voices are named using, so is minor. It'll be fixed in the next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work! Thanks for doing this. This is really excellent engineering work!
I've made some comments on a few minor issues that I've found while reviewing, but the only issue that is blocking is the change to no longer run the first worker on the main thread in server.cpp
. See my comments below and this PR for more information.
The term Nit:
prior to a pull request comment refers to the informal english verb nitpick
which means to find or point out minor faults in a fussy or pedantic way. As such, when I use the prefix Nit:
I am expressing a minor note that is often preferential and unnecessary. Feel free to ignore any Nit:
comment that you disagree with. It will not bother me in the slightest.
using namespace std; | ||
using namespace std::string_view_literals; | ||
typedef std::string_view sv; | ||
typedef const char * str; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit (see the review comment for an explanation of what I mean when I prefix a comment with Nit:
): I suspect it is more explicit and clear to just use const char *
instead of str.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will revert both aliases. I'll see if I can move them to my IDE's live templates, so that it doesn't feel like 3x the amount of typing again every time I use a string.
std::string voice = ""; | ||
bool sample = true; | ||
std::string espeak_voice_id = ""; | ||
bool use_cross_attn{true}; // TODO split out this load-time option from the rest of the generate-time configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this isn't a load time option because we compile the graph at inference. With that said, I am probably the only one interested in using this option. There is little to no function for it in most use cases. My vote would be we simply remove it.
cout << " (-" << abbreviation << ")"; | ||
} | ||
if (*description) { | ||
cout << (required ? ":\n (REQUIRED) " : ":\n (OPTIONAL) ") << description << ".\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it is strange to have the last period added here given that all other punctuation is added by the description.
void update_conditional_prompt(tts_runner * runner, str file_path, str prompt, bool cpu_only) { | ||
const auto parler{dynamic_cast<parler_tts_runner *>(runner)}; | ||
if (!parler) { | ||
fprintf(stderr, "Wrong model for conditional prompt\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be more explicit here.
fprintf(stderr, "Wrong model for conditional prompt\n"); | |
fprintf(stderr, "The '%s' architecture does not support conditional generation.\n", SUPPORTED_ARCHITECTURES[runner->arch]); |
void process_task(simple_text_prompt_task & task) { | ||
tts_runner * runner = &*runners[task.model]; | ||
if (*task.conditional_prompt) { | ||
TTS_ASSERT(*q.get().text_encoder_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't possible because of our request validation right (i.e. line 405)?
Additionally, I don't think we generally want to use TTS_ASSERT
inside the worker where it can be avoided, especially in response to request options, as that would give the end user a way of terminating a worker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove as unnecessary. Although I interpret TTS_ASSERT
as more like std::unreachable
than input validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per comments, I am fine with this being merged, assuming we fast follow with an improvement to the conditional prompt behavior.
add back missing header from resolving merge conflicts
User-facing improvements:
./tts-cli -p
no longer crashesInternal improvements:
args["my-arg"]
(+ cast rarely) should feel like PythonThis PR was split from the future of #58, and depends on and includes the PR #85.