Support replacing homophonic phrases #2153
Merged
Usage
Generate a `replace.fst`. You can find an example at https://colab.research.google.com/drive/1jEaS3s8FbRJIcVQJv2EQx19EM_mnuARi?usp=sharing
Use it with a Chinese ASR model. You can use any ASR model from sherpa-onnx as long as it outputs Chinese.
Example with SenseVoice
The output is given below:
If we don't use this PR, the following command
has the following output
Compare the results below:
现代测试名字识别,丹尼尔·波维林美丽、峤峤、球球、豆豆、橙橙、果果苗苗。
现代测试名字识别,丹尼尔波为林美丽、乔乔、球球、豆豆、晨晨、果果苗苗。
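The differences between the two result lines above are easy to miss by eye. As a minimal sketch (not part of this PR), the snippet below uses Python's standard `difflib` to list the spans where they differ. The variable names `first` and `second` and the helper `differing_spans` are hypothetical and simply follow the order in which the lines appear above.

```python
from difflib import SequenceMatcher

# The two result lines quoted above, in the order they appear.
first = "现代测试名字识别,丹尼尔·波维林美丽、峤峤、球球、豆豆、橙橙、果果苗苗。"
second = "现代测试名字识别,丹尼尔波为林美丽、乔乔、球球、豆豆、晨晨、果果苗苗。"


def differing_spans(a, b):
    """Return (span_in_a, span_in_b) pairs for every non-matching region."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


diffs = differing_spans(first, second)
for left, right in diffs:
    # An empty string on one side means the span exists only in the other line.
    print(repr(left), "vs", repr(right))
```

Each printed pair is a position where the two transcripts disagree, which is where the homophone replacement had an effect.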
If you don't have access to the Colab notebook, here is the code for generating `replace.fst`:
Note that you need to use
to install `pynini`.