online-transducer: reset the encoder together with 2 previous output symbols (non-blank) #2129

KarelVesely84 · 2025-04-17T09:23:45Z

I've noticed a potential issue in the streaming-transducer code.

When an endpoint follows an empty segment,
the previous 2 tokens in the predictor are reset to 2x blank symbol,
but the encoder states are kept.

It seems more logical to reset BOTH the encoder states and the previous
2 symbols to blank (i.e. to effectively start the decoding from scratch).

Conversely, when the previous 2 tokens are kept, the encoder states are kept too.

Would you agree?
K.

// I've tested it with a custom encoder, recognizing a 16-minute Czech fairy tale from YouTube.
// It behaved normally, but I am not sure how often there were 2 endpoints in a row
// with no speech in between.

csukuangfj · 2025-04-17T09:26:07Z

I remember that for some streaming zipformer transducer model, if we also reset the encoder states, then it has trouble with recognizing the first word after reset.

KarelVesely84 · 2025-04-17T09:33:21Z

> I remember that for some streaming zipformer transducer model, if we also reset the encoder states, then it has trouble with recognizing the first word after reset.

Aha, that could be an issue with masking out the initial left context, which is initialized to 0.0 matrices. In my model this is done inside the encoder, based on processed_lens in the streaming state. With Zipformer, I think this was done externally in train.py, but I did not investigate it for Zipformer...
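To make the masking idea concrete: right after a stream starts (or after an encoder reset), the cached left context holds only zeros, and attention should ignore those frames until real frames have been processed. Below is a minimal sketch. It assumes a per-stream count of processed frames (called `processed_len` here, in the spirit of the `processed_lens` state mentioned above) and a cache ordered oldest to newest. The function name is hypothetical and does not come from any of the codebases discussed.

```python
from typing import List


def left_context_padding_mask(processed_len: int, left_context_len: int) -> List[bool]:
    """Key-padding mask over a fixed-size cached left context.

    Assumes the cache is ordered oldest -> newest and zero-initialized at the
    start of the stream (or after an encoder reset). Only the newest
    `processed_len` cached frames hold real data. True = mask out this frame.
    """
    num_valid = min(max(processed_len, 0), left_context_len)
    num_padding = left_context_len - num_valid
    # The zero-filled (not yet written) frames sit at the oldest end of the cache.
    return [i < num_padding for i in range(left_context_len)]


if __name__ == "__main__":
    for n in (0, 2, 10):
        print(n, left_context_padding_mask(n, 4))
```

If an encoder skips this masking, the zero frames are attended to as if they were real audio. That would fit the hypothesis above about the first word after a reset.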

csukuangfj · 2025-04-17T09:35:08Z

Can we add a config parameter to let the user decide whether the encoder states should be reset on a detected endpoint?
We can set it to false by default.
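Below is a minimal sketch of the endpoint handling under discussion, written in Python for readability. `StreamState`, `initial_encoder_states()` and `reset_on_endpoint()` are hypothetical stand-ins, not sherpa-onnx's API; the real logic lives in the C++ transducer recognizer. The sketch assumes a context size of 2 and blank id 0. It follows the proposal from the first comment: with `reset_encoder=False` (the suggested default) only the predictor context is reset after an empty segment. With `True`, the encoder states are reset as well.

```python
from dataclasses import dataclass, field
from typing import List

BLANK_ID = 0      # assumption: the blank symbol has id 0
CONTEXT_SIZE = 2  # the predictor looks at the 2 previous output symbols


def initial_encoder_states() -> List[float]:
    """Hypothetical stand-in for the encoder's initial (zero) streaming states."""
    return [0.0, 0.0, 0.0]


@dataclass
class StreamState:
    """Hypothetical per-stream state; not a sherpa-onnx class."""
    tokens: List[int] = field(default_factory=list)  # output of current segment
    decoder_context: List[int] = field(
        default_factory=lambda: [BLANK_ID] * CONTEXT_SIZE)
    encoder_states: List[float] = field(default_factory=initial_encoder_states)


def reset_on_endpoint(s: StreamState, reset_encoder: bool = False) -> None:
    """Prepare a stream for the next segment after an endpoint is detected."""
    if len(s.tokens) >= CONTEXT_SIZE:
        # Segment produced output: keep its last tokens as predictor context
        # and keep the encoder states (the stream simply continues).
        s.decoder_context = s.tokens[-CONTEXT_SIZE:]
    else:
        # Empty (or too short) segment: predictor context falls back to blanks.
        s.decoder_context = [BLANK_ID] * CONTEXT_SIZE
        if reset_encoder:
            # Proposed option: restart the encoder as well, so decoding
            # effectively starts from scratch.
            s.encoder_states = initial_encoder_states()
    s.tokens = []
```

Keeping `False` as the default leaves the existing behavior untouched for models that, as noted above, struggle with the first word after an encoder reset.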

KarelVesely84 · 2025-04-17T09:59:26Z

Yes, that would be a good option fitting both ways. What would the config parameter name be?
reset_encoder=T/F?

csukuangfj · 2025-04-17T10:06:24Z

> Yes, that would be a good option fitting both ways. What would the config parameter name be? reset_encoder=T/F?

Yes, I agree.

KarelVesely84 · 2025-04-17T15:08:11Z

Okay, the reset_encoder option in OnlineRecognizerConfig is now added and tested locally.

I also found an issue with __init__.py when some modules are disabled in CMake.
Those symbols are then missing from the _sherpa_onnx*.so file, and import sherpa_onnx in Python fails.

I am not sure how to solve this well. Either by tracking what was built,
or by pre-scanning that .so file for symbols and leaving out the missing ones,
or by wrapping the import of each symbol in try ... except, but that would be ugly.
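The following is a self-contained illustration of the failure and of the third option (try ... except around each import). The module `_fake_sherpa_ext` is hypothetical. It is built with `types.ModuleType` to stand in for an extension compiled with diarization disabled.

```python
import sys
import types

# Hypothetical stand-in for an extension module compiled with diarization
# disabled: it defines OnlineRecognizer but not FastClustering.
ext = types.ModuleType("_fake_sherpa_ext")
ext.OnlineRecognizer = type("OnlineRecognizer", (), {})
sys.modules["_fake_sherpa_ext"] = ext

# A name that exists imports fine.
from _fake_sherpa_ext import OnlineRecognizer

# A missing name raises ImportError, which is what breaks `import sherpa_onnx`.
# Option 3: guard each such import and fall back to a placeholder.
try:
    from _fake_sherpa_ext import FastClustering
    import_error = None
except ImportError as e:
    import_error = e
    FastClustering = None

print(repr(import_error))
print(FastClustering)
```

This guard has to be repeated for every optional symbol in __init__.py, which is what makes the option unattractive.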

K.

csukuangfj · 2025-04-18T03:12:39Z

> I am not sure how to solve this well. Either by tracking what was built.

How about something like

#if SHERPA_ONNX_ENABLE_TTS == 1
  PybindOfflineTts(&m);
#else
  m.attr("OfflineTts") = py::none();
  m.attr("OfflineTtsConfig") = py::none();
  m.attr("OfflineTtsModelConfig") = py::none();
#endif

or we can add something like

#define SHERPA_ONNX_DUMMY_IMPL(x) m.attr(#x) = py::none()

#if SHERPA_ONNX_ENABLE_TTS == 1
  PybindOfflineTts(&m);
#else
  SHERPA_ONNX_DUMMY_IMPL(OfflineTts);
  SHERPA_ONNX_DUMMY_IMPL(OfflineTtsConfig);
  SHERPA_ONNX_DUMMY_IMPL(OfflineTtsModelConfig);
#endif

You can also use

m.attr("OfflineTts") = "Not implemented yet";

Or you can assign any other expression, as long as m.attr("OfflineTts") is assigned.

KarelVesely84 · 2025-04-22T12:04:12Z

I have a question. Where should this be? In the pybind11 code?
Or directly in the __init__.py of the installed sherpa-onnx Python module?

Currently, if, say, TTS is deactivated, its pybind11 wrapper is deactivated as well:

sherpa-onnx/sherpa-onnx/python/csrc/CMakeLists.txt

Line 60 in 921c437

if(SHERPA_ONNX_ENABLE_TTS)

So you suggest activating all the pybind11 wrappers, with some symbols being strings like "Deactivated in cmake" or None, driven by conditionals on the SHERPA_ONNX_ENABLE_* macros.

And the __init__.py will stay as it is now.

Do I understand it correctly?

K.

csukuangfj · 2025-04-22T14:12:51Z

> I have a question. Where should this be? In the pybind11 code?

You can put it here

sherpa-onnx/sherpa-onnx/python/csrc/sherpa-onnx.cc

Lines 76 to 79 in 921c437

#if SHERPA_ONNX_ENABLE_TTS == 1
  PybindOfflineTts(&m);
#endif

or anywhere you think is appropriate, as long as it is in the C++ code.

> And the __init__.py will stay as it is now.

Yes, no need to change __init__.py.

The main idea is to set the attribute of the _sherpa module so that when you use

from _sherpa import xxx

it will not throw.

in c++, if we use

m.attr("xxx") = "yyy";

it is equivalent to

_sherpa.xxx = 'yyy'

We don't care what is assigned to _sherpa.xxx. What matters is that _sherpa has an attribute called xxx, so when we use

from _sherpa import xxx

it won't throw.
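The same mechanism can be checked from pure Python. Below is a minimal sketch, with a hypothetical module `_sherpa_demo` created via `types.ModuleType` in place of the real extension.

```python
import sys
import types

# Hypothetical stand-in for the compiled _sherpa module.
_sherpa = types.ModuleType("_sherpa_demo")
sys.modules["_sherpa_demo"] = _sherpa

# Python equivalent of the C++ line  m.attr("xxx") = "yyy";
_sherpa.xxx = "yyy"
# Equivalent of SHERPA_ONNX_DUMMY_IMPL(OfflineTts), i.e. m.attr("OfflineTts") = py::none();
setattr(_sherpa, "OfflineTts", None)

# Both names now exist as attributes, so the import does not throw,
# regardless of what value was assigned.
from _sherpa_demo import xxx, OfflineTts

print(xxx, OfflineTts)  # yyy None
```

A caller that actually needs TTS could, for example, check `if OfflineTts is None` and report that the feature was disabled at build time.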

KarelVesely84 · 2025-04-23T11:37:37Z

I tried something in that direction.
There are 2 new files:
sherpa-onnx/python/csrc/faked-tts.cc
sherpa-onnx/python/csrc/faked-diarization.cc

They are enabled in python/csrc/CMakeLists.txt.

However, when I compile the project, it finishes correctly and installs the package.
But when I then run import sherpa_onnx in Python, I still get the error message:

ImportError: cannot import name 'FastClustering' from '_sherpa_onnx' (/mnt/matylda5/iveselyk/CNECT_TENDER/SHERPA_ONNX/CONDA_ENVIRONMENT/lib/python3.9/site-packages/_sherpa_onnx.cpython-39-x86_64-linux-gnu.so)

Is there some other place, besides python/csrc/CMakeLists.txt, where the "faked" wrappers should be enabled?

My understanding is that sherpa-onnx-core is built first, independently of the pybind11 wrappers.
Then all the pybind11 wrappers are compiled, and these two components together form the library _sherpa_onnx.cpython-39-x86_64-linux-gnu.so.

I tried to inspect the symbols inside with nm -gDC _sherpa_onnx.cpython-39-x86_64-linux-gnu.so, but I could not grep even those that are clearly there... (no OnlineRecognizer, no py:: in the grep results)
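A likely reason the grep comes up empty: pybind11 builds extension modules with hidden symbol visibility by default. As a result, the dynamic symbol table typically exports little beyond the `PyInit__sherpa_onnx` entry point; the install step in the log below also reports "Installing the project stripped". The Python-visible names live in the module's attribute table instead, so checking them from Python is more direct. Below is a minimal sketch. The helper name `missing_names` and the example names are illustrative assumptions, not taken from sherpa-onnx's __init__.py.

```python
import importlib
from typing import Iterable, List


def missing_names(module_name: str, expected: Iterable[str]) -> List[str]:
    """Return the names in `expected` that `module_name` does not define."""
    module = importlib.import_module(module_name)
    return [name for name in expected if not hasattr(module, name)]


if __name__ == "__main__":
    # Illustrative only. For the real extension one would pass "_sherpa_onnx"
    # (which imports fine on its own) and the names that the sherpa_onnx
    # package's __init__.py imports from it.
    print(missing_names("math", ["sqrt", "FastClustering"]))
```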

KarelVesely84 · 2025-04-23T11:40:01Z

Here is an example build log after changing faked-diarization.cc:

_sherpa_onnx.cpython-39-x86_64-linux-gnu.so -- pybind11 is downloaded to /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/temp.linux-x86_64-cpython-39/_deps/pybind11-src
-- pybind11 v2.12.0
-- PYTHON_EXECUTABLE: /mnt/matylda5/iveselyk/CNECT_TENDER/SHERPA_ONNX/CONDA_ENVIRONMENT/bin/python3
-- PYTHON_VERSION: 3.9
-- CMAKE_CXX_FLAGS:
-- CMAKE_CXX_FLAGS:
-- Configuring done (3.4s)
-- Generating done (1.8s)
-- Build files have been written to: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/temp.linux-x86_64-cpython-39
[  0%] Built target ssentencepiece_core
[  5%] Built target kaldi-native-fbank-core
[  9%] Built target fst
[ 10%] Built target fstfar
[ 14%] Built target kaldifst_core
[ 18%] Built target kaldi-decoder-core
[ 74%] Built target sherpa-onnx-core
[ 76%] Built target sherpa-onnx-c-api
[ 78%] Built target sherpa-onnx-cxx-api
[ 80%] Building CXX object sherpa-onnx/python/csrc/CMakeFiles/_sherpa_onnx.dir/faked-diarization.cc.o
[ 80%] Linking CXX shared module ../../../lib/_sherpa_onnx.cpython-39-x86_64-linux-gnu.so
/usr/local/bin/ld: skipping incompatible /usr/local/lib/gcc/x86_64-linux/11.5.0/../../../libc.so when searching for -lc
[100%] Built target _sherpa_onnx
Installing the project stripped...
-- Install configuration: "Debug"
-- Up-to-date: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libonnxruntime.so
-- Up-to-date: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/./sherpa-onnx.pc
-- Installing: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/../_sherpa_onnx.cpython-39-x86_64-linux-gnu.so
-- Set non-toolchain portion of runtime path of "/mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/../_sherpa_onnx.cpython-39-x86_64-linux-gnu.so" to "$ORIGIN"
-- Installing: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libsherpa-onnx-c-api.so
-- Set non-toolchain portion of runtime path of "/mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libsherpa-onnx-c-api.so" to "$ORIGIN"
-- Installing: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libsherpa-onnx-cxx-api.so
-- Set non-toolchain portion of runtime path of "/mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/lib/libsherpa-onnx-cxx-api.so" to "$ORIGIN"
-- Up-to-date: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/include/sherpa-onnx/c-api/c-api.h
-- Up-to-date: /mnt/matylda5/iveselyk/EU-ASR_TENDER/SHERPA_ONNX/src/sherpa-onnx/build/lib.linux-x86_64-cpython-39/sherpa_onnx/include/sherpa-onnx/c-api/cxx-api.h
/mnt/matylda5/iveselyk/CNECT_TENDER/SHERPA_ONNX/CONDA_ENVIRONMENT/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.

nshmyrev · 2025-04-23T11:41:50Z

Related issue #1700

Btw, I might be wrong, but two blanks are not what the model expects; Zipformer models take -1 and blank:

https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L590

Two blanks are significantly worse for offline zipformer, not sure about streaming one.

KarelVesely84 · 2025-04-23T11:45:57Z

Sorry, the "close" was purely accidental.

KarelVesely84 · 2025-04-23T11:48:29Z

> Related issue #1700
>
> Btw, I might be wrong, but two blanks are not what the model expects; Zipformer models take -1 and blank:
>
> https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L590
>
> Two blanks are significantly worse for offline zipformer, not sure about streaming one.

Aha, this is worth checking too: a comparison with the initial predictor symbols used in decode.py / streaming_decode.py... (the -1 could be a sentinel value for an internal sanity check)
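For comparison, here is a sketch of the two initial predictor contexts being discussed, assuming context size 2 and blank id 0. It also assumes that negative ids are clamped to 0 for the embedding lookup and that their embeddings are then zeroed. To my understanding, icefall's stateless decoder does this, in which case -1 acts as "no token" rather than as a sanity-check sentinel; this should be verified against the decoder your model uses. The tiny embedding table is made up for illustration.

```python
from typing import List

BLANK_ID = 0
CONTEXT_SIZE = 2

# Made-up 3-token embedding table (row k = embedding of token id k).
EMBEDDING = [[0.5, 0.5, 0.5], [1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]


def initial_context(negative_padding: bool) -> List[int]:
    """Predictor context at the start of decoding (or after a reset)."""
    if negative_padding:
        # Start described above for icefall: -1 padding followed by blank.
        return [-1] * (CONTEXT_SIZE - 1) + [BLANK_ID]
    # Two blanks, as in the endpoint reset discussed in this thread.
    return [BLANK_ID] * CONTEXT_SIZE


def embed(ys: List[int]) -> List[List[float]]:
    """Embedding lookup in which negative ids yield an all-zero vector."""
    out = []
    for y in ys:
        row = EMBEDDING[max(y, 0)]      # like y.clamp(min=0)
        scale = 1.0 if y >= 0 else 0.0  # like multiplying by (y >= 0)
        out.append([scale * v for v in row])
    return out


if __name__ == "__main__":
    print(embed(initial_context(True)))   # [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
    print(embed(initial_context(False)))  # [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
```

Under these assumptions, the predictor sees different inputs for the two starts. That may explain why a model trained with one start can decode worse with the other.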

csukuangfj · 2025-04-23T11:48:44Z

sherpa-onnx/python/csrc/faked-tts.cc

+#include "sherpa-onnx/python/csrc/offline-tts-kokoro-model-config.h"
+#include "sherpa-onnx/python/csrc/offline-tts-matcha-model-config.h"
+#include "sherpa-onnx/python/csrc/offline-tts-model-config.h"
+#include "sherpa-onnx/python/csrc/offline-tts-vits-model-config.h"


Can we remove these lines?

I am not sure. I tried to keep the same .h files and replace only the internals of the .cc files.
So these are irrelevant for the pybind11 API for Python?

I tried it and rebuilt with a new build dir. The error is still the same:

ImportError: cannot import name 'FastClustering' from '_sherpa_onnx' (/mnt/matylda5/iveselyk/CNECT_TENDER/SHERPA_ONNX/CONDA_ENVIRONMENT/lib/python3.9/site-packages/_sherpa_onnx.cpython-39-x86_64-linux-gnu.so)

csukuangfj · 2025-04-23T12:13:03Z

You need to call the functions you added in sherpa-onnx.cc.

Please have a look at my earlier comment.

It shows you where to put the code.

Putting the .cc files in CMakeLists.txt is not enough.
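The point is that compiling faked-tts.cc only defines a function; nothing reaches the module until the module's init code calls that function. Below is a Python analogue of that control flow. The function names (`pybind_offline_tts`, `pybind_faked_tts`, `init_module`) are hypothetical, and the `enable_tts` flag stands in for the `SHERPA_ONNX_ENABLE_TTS` macro.

```python
import types


def pybind_offline_tts(m: types.ModuleType) -> None:
    """Stand-in for the real TTS bindings (abridged): registers a usable class."""
    m.OfflineTts = type("OfflineTts", (), {})


def pybind_faked_tts(m: types.ModuleType) -> None:
    """Stand-in for faked-tts.cc: registers placeholders only."""
    for name in ("OfflineTts", "OfflineTtsConfig", "OfflineTtsModelConfig"):
        setattr(m, name, None)


def init_module(enable_tts: bool, call_faked: bool = True) -> types.ModuleType:
    """Stand-in for the module init code in sherpa-onnx.cc."""
    m = types.ModuleType("_sherpa_onnx_demo")
    if enable_tts:
        pybind_offline_tts(m)
    elif call_faked:
        pybind_faked_tts(m)
    # With call_faked=False the faked function exists but is never called,
    # which mirrors only adding the .cc file to CMakeLists.txt.
    return m
```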


KarelVesely84 · 2025-04-23T14:58:18Z

Okay, now it is fixed: the empty symbols are hard-coded in sherpa-onnx/python/csrc/sherpa-onnx.cc.
import sherpa_onnx in Python now works,
and decoding a 15-minute recording with OnlineRecognizer also went fine.

(I should not try to program while too tired after lunch ;-) )

Thank you for all the suggestions and patience,
K.

csukuangfj

Thanks! Left a minor comment. Otherwise, it looks great to me.

csukuangfj · 2025-04-23T15:25:42Z

sherpa-onnx/python/csrc/online-recognizer.cc

@@ -67,7 +67,7 @@ static void PybindOnlineRecognizerConfig(py::module *m) {
           py::arg("max_active_paths") = 4, py::arg("hotwords_file") = "",
           py::arg("hotwords_score") = 0, py::arg("blank_penalty") = 0.0,
           py::arg("temperature_scale") = 2.0, py::arg("rule_fsts") = "",
-           py::arg("rule_fars") = "")
+           py::arg("rule_fars") = "", py::arg("reset_encoder"))


Suggested change

py::arg("rule_fars") = "", py::arg("reset_encoder"))

py::arg("rule_fars") = "", py::arg("reset_encoder") = false)

Can you give it a default value?

Okay, the default value is in place now.
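Why the default matters: without "= false", reset_encoder becomes a required argument of the generated constructor. Existing code that builds an OnlineRecognizerConfig without it would then fail, whereas the default keeps the previous behavior (no encoder reset). Below is a Python analogue with hypothetical functions `make_config_required` and `make_config`, using the trailing parameters from the diff.

```python
def make_config_required(*, rule_fars: str = "", reset_encoder: bool) -> dict:
    """Like py::arg("reset_encoder") with no default: callers must pass it."""
    return {"rule_fars": rule_fars, "reset_encoder": reset_encoder}


def make_config(*, rule_fars: str = "", reset_encoder: bool = False) -> dict:
    """Like py::arg("reset_encoder") = false: old call sites keep working."""
    return {"rule_fars": rule_fars, "reset_encoder": reset_encoder}
```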

csukuangfj · 2025-04-24T00:17:55Z

Thank you for your contribution!

KarelVesely84 closed this Apr 23, 2025

KarelVesely84 reopened this Apr 23, 2025

csukuangfj reviewed Apr 23, 2025


KarelVesely84 force-pushed the tranducer_reset_encoder branch from 89ca560 to 0cd9c61 Compare April 23, 2025 14:33

KarelVesely84 added 2 commits April 23, 2025 16:34

online-transducer: reset the encoder together with 2 previous output symbols (non-blank) - added `reset_encoder` boolean member into the OnlineRecognizerConfig class - by default the encoder is not reset

5fd4d26

pybind11, adding empty symbols for disabled modules (tts, diarization)

60b905a

KarelVesely84 force-pushed the tranducer_reset_encoder branch from 0cd9c61 to 60b905a Compare April 23, 2025 14:34

csukuangfj reviewed Apr 23, 2025


reset_encoder, add default value (false) [pybind11]

13cee7a

csukuangfj merged commit 6a1efd8 into k2-fsa:master Apr 24, 2025
151 of 219 checks passed

KarelVesely84 deleted the tranducer_reset_encoder branch May 22, 2025 12:20
