online-transducer: reset the encoder together with 2 previous output symbols (non-blank) #2129
Conversation
I remember that for some streaming zipformer transducer models, if we also reset the encoder states, then the model has trouble recognizing the first word after the reset.
Aha, that could be an issue with masking out the initial left context, which is initialized to 0.0 matrices. In my model this is done inside the encoder model, based on the
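A minimal conceptual sketch of what such masking could look like (this is an assumption about the approach, not code from this repository or from the model discussed above): attention over the cached left-context frames only considers frames that have actually been written, and the zero-initialized placeholders are excluded.

#include <algorithm>
#include <vector>

// Build a boolean mask over the cached left-context frames: only frames that
// have been filled with real audio may be attended to; the zero-initialized
// placeholder frames are masked out.
std::vector<bool> LeftContextMask(int left_context_len, int num_valid_frames) {
  std::vector<bool> mask(left_context_len, false);
  int valid = std::min(num_valid_frames, left_context_len);
  // New frames are appended at the tail of the cache, so the last `valid`
  // positions are the real ones.
  for (int i = left_context_len - valid; i < left_context_len; ++i) {
    mask[i] = true;
  }
  return mask;
}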
Can we add a config parameter to let the user decide whether the encoder states should be reset on a detected endpoint?
Yes, that would be a good option fitting both ways. What would be the config parameter name?
Yes, I agree.
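A minimal sketch of what such an option could look like (the member name `reset_encoder` and its placement are still assumptions at this point in the thread; the PR later adds exactly such a boolean to OnlineRecognizerConfig):

// Sketch only: a boolean switch on the recognizer config.
struct OnlineRecognizerConfig {
  // ... existing fields (model config, decoding method, endpoint rules, ...) ...

  // If true, the encoder states are re-initialized when an endpoint is
  // detected, together with the 2 previous (non-blank) output symbols.
  // Defaults to false, i.e. the current behavior is kept.
  bool reset_encoder = false;
};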
Okay, then I also found an issue with
I am not sure how to solve this well. Either by tracking what was built. K.
How about something like

#if SHERPA_ONNX_ENABLE_TTS == 1
  PybindOfflineTts(&m);
#else
  m.attr("OfflineTts") = py::none();
  m.attr("OfflineTtsConfig") = py::none();
  m.attr("OfflineTtsModelConfig") = py::none();
#endif

or we can add something like

#define SHERPA_ONNX_DUMMY_IMPL(x) m.attr(#x) = py::none()

#if SHERPA_ONNX_ENABLE_TTS == 1
  PybindOfflineTts(&m);
#else
  SHERPA_ONNX_DUMMY_IMPL(OfflineTts);
  SHERPA_ONNX_DUMMY_IMPL(OfflineTtsConfig);
  SHERPA_ONNX_DUMMY_IMPL(OfflineTtsModelConfig);
#endif

You can also use
or you can assign any other expression, as long as
I have a question. Where should this be? In the pybind11 code? Currently, if, let's say, TTS is deactivated, then the pybind11 wrapper is also deactivated:
So, you suggest activating all the pybind11 wrappers, and some symbols will be strings like "Deactivated in cmake" or the None symbols. And it will be driven by conditioning according to the
Do I understand it correctly? K.
You can put it here: sherpa-onnx/sherpa-onnx/python/csrc/sherpa-onnx.cc, lines 76 to 79 (at 921c437), or anywhere you think is appropriate, as long as it is in the C++ code.
Yes, no need to change that. The main idea is to set the attribute of the module, so importing the symbol will not throw. In C++, if we use
m.attr("OfflineTts") = py::none();
it is equivalent to setting the OfflineTts attribute of the module to None on the Python side. We don't care what is assigned to the attribute; the import won't throw.
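A minimal sketch of the idea, following the snippets above (the surrounding module definition and the forward declaration are assumed for self-containedness): when TTS is disabled, the module attribute is simply bound to None, so importing the name from Python does not throw.

#include <pybind11/pybind11.h>

namespace py = pybind11;

void PybindOfflineTts(py::module *m);  // real binding, compiled only with TTS enabled

PYBIND11_MODULE(_sherpa_onnx, m) {
#if SHERPA_ONNX_ENABLE_TTS == 1
  PybindOfflineTts(&m);
#else
  // On the Python side this is equivalent to `_sherpa_onnx.OfflineTts = None`,
  // so `from _sherpa_onnx import OfflineTts` still succeeds; it only fails
  // later if the user actually tries to instantiate or call it.
  m.attr("OfflineTts") = py::none();
#endif
}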
I tried something in that direction, and they are enabled here in python/csrc/CMakeLists.txt. However, if I compile the project, it finishes correctly and installs the package.
Is there some other place where the "faked" wrappers should be enabled besides python/csrc/CMakeLists.txt? My belief is that the
I tried to inspect the symbols inside with
This is the example build log after changing the
Related issue: #1700. Btw, I might be wrong, but two blanks is not what the model expects; zipformer models take -1 and blank. Two blanks are significantly worse for the offline zipformer; I'm not sure about the streaming one.
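A small sketch of the two initial predictor contexts being compared here (context size 2 and the blank_id parameter are assumptions for illustration; this is not code from the PR diff):

#include <cstdint>
#include <vector>

// What the comment above says an (icefall-trained) zipformer decoder expects
// as the initial predictor context: a -1 placeholder followed by blank.
std::vector<int64_t> InitialContextMinusOneBlank(int32_t blank_id) {
  return {-1, blank_id};
}

// What a plain "reset to blanks" produces: two blank symbols.
std::vector<int64_t> InitialContextTwoBlanks(int32_t blank_id) {
  return {blank_id, blank_id};
}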
Sorry, the "close" was purely accidental.
Aha, this is worth checking too, a comparison with the
sherpa-onnx/python/csrc/faked-tts.cc (outdated)
#include "sherpa-onnx/python/csrc/offline-tts-kokoro-model-config.h"
#include "sherpa-onnx/python/csrc/offline-tts-matcha-model-config.h"
#include "sherpa-onnx/python/csrc/offline-tts-model-config.h"
#include "sherpa-onnx/python/csrc/offline-tts-vits-model-config.h"
can we remove these lines?
I am not sure. I tried to keep the same .h files and replace the internals of the .cc files. So these are irrelevant for the pybind11 API for Python?
I tried it, and rebuilt with a new build dir. The error is still the same:
ImportError: cannot import name 'FastClustering' from '_sherpa_onnx' (/mnt/matylda5/iveselyk/CNECT_TENDER/SHERPA_ONNX/CONDA_ENVIRONMENT/lib/python3.9/site-packages/_sherpa_onnx.cpython-39-x86_64-linux-gnu.so)
You need to call the functions you add in sherpa-onnx.cc. Please have a look at my comment above; it shows you where to put the code. Putting the .cc files in CMakeLists.txt is far from enough.
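A hedged sketch of what is meant (the `PybindFakedTts` name is hypothetical, introduced only for illustration): adding faked-tts.cc to python/csrc/CMakeLists.txt only compiles it; the registration function it defines must also be called from the module initialization in sherpa-onnx.cc, otherwise the placeholder symbols are never added and the ImportError remains.

#include <pybind11/pybind11.h>

namespace py = pybind11;

void PybindFakedTts(py::module *m);  // hypothetical: assigns py::none() placeholders

PYBIND11_MODULE(_sherpa_onnx, m) {
  // ... existing Pybind*() registration calls ...
#if SHERPA_ONNX_ENABLE_TTS == 0
  PybindFakedTts(&m);  // must be called here, not just compiled and linked
#endif
}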
Force-pushed from 89ca560 to 0cd9c61:
…ymbols (non-blank): added a `reset_encoder` boolean member to the OnlineRecognizerConfig class; by default the encoder is not reset.
Force-pushed from 0cd9c61 to 60b905a.
Okay, now it is fixed; the empty symbols are hard-coded in
(I should not try to program while being too tired after lunch ;-) ) Thank you for all the suggestions and patience,
Thanks! Left a minor comment. Otherwise, it looks great to me.
@@ -67,7 +67,7 @@ static void PybindOnlineRecognizerConfig(py::module *m) {
       py::arg("max_active_paths") = 4, py::arg("hotwords_file") = "",
       py::arg("hotwords_score") = 0, py::arg("blank_penalty") = 0.0,
       py::arg("temperature_scale") = 2.0, py::arg("rule_fsts") = "",
-      py::arg("rule_fars") = "")
+      py::arg("rule_fars") = "", py::arg("reset_encoder"))
Suggested change:
-      py::arg("rule_fars") = "", py::arg("reset_encoder"))
+      py::arg("rule_fars") = "", py::arg("reset_encoder") = false)
Can you give it a default value?
Okay, the default value is in place now.
Thank you for your contribution!
I've noticed a potential issue in the streaming-transducer code. In the case of an endpoint following an empty segment, the previous 2 tokens in the predictor are reset to 2x the blank symbol, but the encoder states are kept. It seems more logical to reset BOTH the encoder states and the previous blank symbols (i.e. to effectively start the decoding from scratch). Or, if the previous 2 tokens are kept, then the encoder states should also be kept. Would you agree? K.
// I've tested it with a custom encoder, recognizing a 16-minute-long Czech fairy tale from YouTube.
// It behaved normally, but I am not sure how often there were 2 endpoints in a row with no speech in between.
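Finally, a simplified, self-contained sketch of the behavior discussed in this PR (the stand-in types and function names are illustrative only, not the actual diff): on an endpoint, the predictor context is reset to blanks, and, when `reset_encoder` is enabled, the cached encoder states are re-initialized as well, so decoding effectively starts from scratch.

#include <cstdint>
#include <vector>

// Minimal stand-in types, for illustration only.
struct EncoderStates {
  std::vector<float> cache;  // cached left context, conv states, ...
};

struct StreamState {
  std::vector<int64_t> context;  // last 2 (non-blank) symbols for the predictor
  EncoderStates encoder_states;
};

void ResetOnEndpoint(StreamState *s, bool reset_encoder, int32_t blank_id,
                     const EncoderStates &initial_states) {
  // Current behavior: the predictor context goes back to two blanks.
  s->context.assign(2, blank_id);

  // Proposed option: also drop the cached encoder states, re-initializing
  // them as at the start of a new stream.
  if (reset_encoder) {
    s->encoder_states = initial_states;
  }
}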