Skip to content

online-transducer: reset the encoder toghter with 2 previous output symbols (non-blank) #2129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 24, 2025

Conversation

KarelVesely84
Copy link
Contributor

I've noticed a potential issue in streaming-transducer code.

In the case of an endpoint following and empty segment,
the prev. 2 tokens in predictor are reset to 2x blank symbol.
But, the encoder states are kept.

It seems more logical to reset BOTH the encoder states and the prev.
blank symbols (i.e. to effectively start the decoding from scratch).

If the prev. 2 tokens are kept, the encoder states are also kept.

Would you agree too ?
K.

// I've tested it with custom encoder, recognizing a 16min long czech fairy tale from youtube.
// It has behaved normally, but I am not sure, how often there were 2 endpoints in a row,
// with no speech in between.

@csukuangfj
Copy link
Collaborator

I remember that for some streaming zipformer transducer model, if we also reset the encoder states, then it has trouble with recognizing the first word after reset.

@KarelVesely84
Copy link
Contributor Author

I remember that for some streaming zipformer transducer model, if we also reset the encoder states, then it has trouble with recognizing the first word after reset.

aha, that could be an issue with masking-out the initial left context, initialized to 0.0 matices. in my model this is done inside the encoder model based on the processed_lens in streaming state. with zipformer, i think this was done externally in train.py , but i did not investigate into this in zipformer...

@csukuangfj
Copy link
Collaborator

Can we add a config parameter to let the user decide whether the encode states should be reset on a detected endpoint?
We can set it to false by default。

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Apr 17, 2025

yes, that would be a good option fitting both ways, what would be the config parameter name ?
reset_encoder=T/F ?

@csukuangfj
Copy link
Collaborator

yes, that would be a good option fitting both ways, what would be the config parameter name ? reset_encoder=T/F ?

Yes, I agree.

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Apr 17, 2025

Okay, the reset_encoder option in OnlineRecognizerConfig is now added and tested locally.

I found also an issue with __init__.py when some modules are disabled in cmake.
Those symbols are then missing in the _sherpa_onnx*.so file and an import sherpa_onnx in python fails.

I am not sure how to solve this well. Either by tracking what was build.
Or by a pre-lookup for symbols in that .so file, leaving out the missing ones.
Or by some try ... except around importing each symbol, but that would be ugly.

K.

@csukuangfj
Copy link
Collaborator

csukuangfj commented Apr 18, 2025

I am not sure how to solve this well. Either by tracking what was build.

How about something like

#if SHERPA_ONNX_ENABLE_TTS == 1
  PybindOfflineTts(&m);
#else
  m.attr("OfflineTts") = py::none();
  m.attr("OfflineTtsConfig") = py::none();
  m.attr("OfflineTtsModelConfig") = py::none();
#endif

or we can add something like

#define SHERPA_ONNX_DUMMY_IMPL(x) m.attr(#x) = py::none()

#if SHERPA_ONNX_ENABLE_TTS == 1
  PybindOfflineTts(&m);
#else
  SHERPA_ONNX_DUMMY_IMPL(OfflineTts);
  SHERPA_ONNX_DUMMY_IMPL(OfflineTtsConfig);
  SHERPA_ONNX_DUMMY_IMPL(OfflineTtsModelConfig);
#endif

You can also use

m.attr("OfflineTts") = "Not implemented yet";

or you can assign any other expression as long as m.attr('OfflineTts') is assigned.

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Apr 22, 2025

I have a question. Where should this be ? In the pybind11 code ?
Or directly in the __init__.py of the installed sherpa-onnx python module ?

Currently, if let's say TTS is deactivated, also the pybind11 wrapper is deactivated:

if(SHERPA_ONNX_ENABLE_TTS)

So, you suggest to activate all the pybind11 wrappers, and some symbols will be strings like "Deactivated in cmake" or the None symbols. And it will be driven by conditioning according to the SHERPA_ONNX_ENABLE_* macros.

And the __init__.py will stay as it is now.

Do I understand it correctly ?

K.

@csukuangfj
Copy link
Collaborator

I have a question. Where should this be ? In the pybind11 code ?

You can put it here

#if SHERPA_ONNX_ENABLE_TTS == 1
PybindOfflineTts(&m);
#endif

or anywhere you think is appropriate, as long as it is in the C++ code.


And the init.py will stay as it is now.

Yes, no need to change __init__.py.

The main idea is to set the attribute of the _sherpa module so that when you use

from _sherpa import xxx

it will not throw.


in c++, if we use

m.attr("xxx") = 'yyy'

it is equivalent to

_sherpa.xxx = 'yyy'

We don't care what is assigned to _sherpa.xxx. What matters is _sherpa has an attribute called xxx so when we use

from _sherpa import xxx

it won't throw.

@KarelVesely84
Copy link
Contributor Author

I tried something in that direction.
There are 2 new files:
sherpa-onnx/python/csrc/faked-tts.cc
sherpa-onnx/python/csrc/faked-diarization.cc

And they are enabled here in python/csrc/CMakeLists.txt

However, if I compile the project, it finished correctly and installed the package.
But, then when sourcing in python import sherpa_onnx i still get the error message:

ImportError: cannot import name 'FastClustering' from '_sherpa_onnx' (/mnt/matylda5/iveselyk/CNECT_TENDER/SHERPA_ONNX/CONDA_ENVIRONMENT/lib/python3.9/site-packages/_sherpa_onnx.cpython-39-x86_64-linux-gnu.so)

Is there some other place where the "faked" wrappers sholud be enabled besides the python/csrc/CMakeLists.txt ?

My belief is that the sherpa-onnx-core is first built independently of the pybind11 wrappers.
Next all the pybind11 wrappers are compiled, and these two components are forming the library _sherpa_onnx.cpython-39-x86_64-linux-gnu.so.

I tried to inspect the symbols inside with nm -gDC _sherpa_onnx.cpython-39-x86_64-linux-gnu.so, but I could not grep even those that are clearly there... (no OnlineRecognizer, no py:: in the grep's)

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Apr 23, 2025

This is the example build log after changing the faked-diarization.cc:

_sherpa_onnx.cpython-39-x86_64-linux-gnu.so -- pybind11 is downloaded to /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/temp.linux-x86_64-cpython-39/_deps/pybind11-src
-- pybind11 v2.12.0
-- PYTHON_EXECUTABLE: /mnt/matylda5/iveselyk/CNECT_TENDER/SHERPA_ONNX/CONDA_ENVIRONMENT/bin/python3
-- PYTHON_VERSION: 3.9
-- CMAKE_CXX_FLAGS:
-- CMAKE_CXX_FLAGS:
-- Configuring done (3.4s)
-- Generating done (1.8s)
-- Build files have been written to: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/temp.linux-x86_64-cpython-39
[  0%] Built target ssentencepiece_core
[  5%] Built target kaldi-native-fbank-core
[  9%] Built target fst
[ 10%] Built target fstfar
[ 14%] Built target kaldifst_core
[ 18%] Built target kaldi-decoder-core
[ 74%] Built target sherpa-onnx-core
[ 76%] Built target sherpa-onnx-c-api
[ 78%] Built target sherpa-onnx-cxx-api
[ 80%] Building CXX object sherpa-onnx/python/csrc/CMakeFiles/_sherpa_onnx.dir/faked-diarization.cc.o
[ 80%] Linking CXX shared module ../../../lib/_sherpa_onnx.cpython-39-x86_64-linux-gnu.so
/usr/local/bin/ld: skipping incompatible /usr/local/lib/gcc/x86_64-linux/11.5.0/../../../libc.so when searching for -lc
[100%] Built target _sherpa_onnx
Installing the project stripped...
-- Install configuration: "Debug"
-- Up-to-date: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libonnxruntime.so
-- Up-to-date: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/./sherpa-onnx.pc
-- Installing: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/../_sherpa_onnx.cpython-39-x86_64-linux-gnu.so
-- Set non-toolchain portion of runtime path of "/mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/../_sherpa_onnx.cpython-39-x86_64-linux-gnu.so" to "$ORIGIN"
-- Installing: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libsherpa-onnx-c-api.so
-- Set non-toolchain portion of runtime path of "/mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libsherpa-onnx-c-api.so" to "$ORIGIN"
-- Installing: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libsherpa-onnx-cxx-api.so
-- Set non-toolchain portion of runtime path of "/mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libsherpa-onnx-cxx-api.so" to "$ORIGIN"
-- Up-to-date: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/include/sherpa-onnx/c-api/c-api.h
-- Up-to-date: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/include/sherpa-onnx/c-api/cxx-api.h
/mnt/matylda5/iveselyk/CNECT_TENDER/SHERPA_ONNX/CONDA_ENVIRONMENT/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.

@nshmyrev
Copy link
Contributor

Related issue #1700

Btw, I might be wrong but two blanks is not what model expects, zipformer models take -1 and blank:

https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L590

Two blanks are significantly worse for offline zipformer, not sure about streaming one.

@KarelVesely84 KarelVesely84 reopened this Apr 23, 2025
@KarelVesely84
Copy link
Contributor Author

sorry, the "close" was purely accidental

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Apr 23, 2025

Related issue #1700

Btw, I might be wrong but two blanks is not what model expects, zipformer models take -1 and blank:

https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L590

Two blanks are significantly worse for offline zipformer, not sure about streaming one.

aha, this is worth checking too, a comparison with the decode.py / streaming_decode.py initial symbols of the predictor in the model... (the -1 could be a sentinel value for internal sanity check)

Comment on lines 5 to 8
#include "sherpa-onnx/python/csrc/offline-tts-kokoro-model-config.h"
#include "sherpa-onnx/python/csrc/offline-tts-matcha-model-config.h"
#include "sherpa-onnx/python/csrc/offline-tts-model-config.h"
#include "sherpa-onnx/python/csrc/offline-tts-vits-model-config.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove these lines?

Copy link
Contributor Author

@KarelVesely84 KarelVesely84 Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. I tried to keep the same .h files and replace the internals of .cc files.
So these are irrelevant for the pybind11 API for python ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it, and rebuilt with a new buid dir. The error is still the same:

ImportError: cannot import name 'FastClustering' from '_sherpa_onnx' (/mnt/matylda5/iveselyk/CNECT_TENDER/SHERPA_ONNX/CONDA_ENVIRONMENT/lib/python3.9/site-packages/_sherpa_onnx.cpython-39-x86_64-linux-gnu.so)

@csukuangfj
Copy link
Collaborator

You need to call the functions you add in sherpa-onnx.cc

Please have a look at my comment before.

It shows you where to put the code.

Put the .cc files in CMakeLists.txt is far not enough.

@KarelVesely84 KarelVesely84 force-pushed the tranducer_reset_encoder branch from 89ca560 to 0cd9c61 Compare April 23, 2025 14:33
…ymbols (non-blank)

- added `reset_encoder` boolean member into the OnlineRecognizerConfig class
- by default the encoder is not reset
@KarelVesely84 KarelVesely84 force-pushed the tranducer_reset_encoder branch from 0cd9c61 to 60b905a Compare April 23, 2025 14:34
@KarelVesely84
Copy link
Contributor Author

okay, now it is fixed, the empty symbols are hard-coded in sherpa-onnx/python/csrc/sherpa-onnx.cc
the import sherpa_onnx in python is now okay,
the decoding of 15min long recording with OnlineRecognizer was also okay.

(I should not try to program while being too tired after lunch ;-) )

Thank you for all the suggestions and patience,
K.

Copy link
Collaborator

@csukuangfj csukuangfj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Left a minor comment. Otherwise, it looks great to me.

@@ -67,7 +67,7 @@ static void PybindOnlineRecognizerConfig(py::module *m) {
py::arg("max_active_paths") = 4, py::arg("hotwords_file") = "",
py::arg("hotwords_score") = 0, py::arg("blank_penalty") = 0.0,
py::arg("temperature_scale") = 2.0, py::arg("rule_fsts") = "",
py::arg("rule_fars") = "")
py::arg("rule_fars") = "", py::arg("reset_encoder"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
py::arg("rule_fars") = "", py::arg("reset_encoder"))
py::arg("rule_fars") = "", py::arg("reset_encoder") = false)

Can you give it a default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, default value is in place now

@csukuangfj
Copy link
Collaborator

Thank you for your contribution!

@csukuangfj csukuangfj merged commit 6a1efd8 into k2-fsa:master Apr 24, 2025
151 of 219 checks passed
@KarelVesely84 KarelVesely84 deleted the tranducer_reset_encoder branch May 22, 2025 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants