
ASR incorrectly receives the TTS voice and repeats ... #379

@phoenixdna

Description


I've found it quite frustrating trying to use TTS together with Azure's ASR. For some reason, the TTS output is picked up (and misrecognized) by the ASR, even though I mute the microphone with PyAudio. Could someone please help?

The code:

import time

import azure.cognitiveservices.speech as speechsdk

asr_active = True  # read by the SpeechRecognizer.recognized callback


def text_to_speech(text):
    global asr_active

    speech_config2 = speechsdk.SpeechConfig(subscription='xxx', region='eastasia')
    audio_config2 = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

    # Mandarin Chinese neural voice used for all spoken output.
    speech_config2.speech_synthesis_voice_name = 'zh-CN-XiaoyiNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config2, audio_config=audio_config2)

    # Disable ASR and mute the microphone before speaking to the default speaker.
    # mute_microphone()/unmute_microphone() are my PyAudio helpers, defined elsewhere.
    asr_active = False
    mute_microphone()
    #speech_recognizer.stop_continuous_recognition()
    time.sleep(1)
    print("《《《《《《《《《《《tts=>>>", text)
    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
    print("tts=>》》》》》》》》》》》》》》》》》>>")

    # Wait a moment, then re-enable ASR and unmute the microphone.
    time.sleep(1)
    asr_active = True
    unmute_microphone()

    #speech_recognizer.start_continuous_recognition()
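The commented-out calls to speech_recognizer.stop_continuous_recognition() and start_continuous_recognition() point to an alternative to muting through PyAudio: pausing the recognizer itself while the synthesizer speaks. If the recognizer was created with speechsdk.audio.AudioConfig(use_default_microphone=True), the SDK likely captures the microphone on its own, so muting a separate PyAudio stream may not affect what it hears. Below is a minimal sketch, assuming `recognizer` is the existing SpeechRecognizer and `synthesizer` the SpeechSynthesizer; the helper name `speak_with_asr_paused` and the `settle_seconds` delay are hypothetical. It also assumes that speak_text_async(...).get() returns roughly when playback has finished, which may not hold on every platform, hence the extra delay.

```python
import time


def speak_with_asr_paused(recognizer, synthesizer, text, settle_seconds=0.5):
    """Speak `text` while continuous recognition is stopped (hypothetical helper).

    `recognizer` is assumed to be a speechsdk.SpeechRecognizer and
    `synthesizer` a speechsdk.SpeechSynthesizer; any objects exposing the
    same methods work, which keeps the helper easy to test.
    """
    # End the recognition session so the recognizer cannot hear the TTS audio.
    # The synchronous stop call returns once the session has ended.
    recognizer.stop_continuous_recognition()
    try:
        # Block until synthesis completes.
        result = synthesizer.speak_text_async(text).get()
        # Assumed grace period in case the speaker is still playing or
        # echoing when .get() returns.
        time.sleep(settle_seconds)
    finally:
        # Resume listening even if synthesis raised an exception.
        recognizer.start_continuous_recognition()
    return result
```

In text_to_speech, a call such as speak_with_asr_paused(speech_recognizer, speech_synthesizer, text) would take the place of the mute/unmute pair. Restarting the session may add a little latency before the next user utterance is recognized.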

System output

final response recived
model response: 很抱歉,我无法提供实时的日期信息。请您查看您的电子设备或询问您的语音助手来获取今天的日期。
microphone muted
《《《《《《《《《《《tts=>>> 很抱歉,我无法提供实时的日期信息。请您查看您的电子设备或询问您的语音助手来获取今天的日期。
tts=>》》》》》》》》》》》》》》》》》>>
Speech synthesized for text [很抱歉,我无法提供实时的日期信息。请您查看您的电子设备或询问您的语音助手来获取今天的日期。]
microphone UNmuted
RECOGNIZED: SpeechRecognitionEventArgs(session_id=cf14ed11f79a444faa988fab687687c0, result=SpeechRecognitionResult(result_id=96877e5119b74d2e9f7b394072741b87, text="很抱歉,我无法生气。", reason=ResultReason.RecognizedSpeech))
sent to model reached

Question:

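As you can see in the code, I mute the microphone for the whole duration of speech_synthesizer.speak_text_async(text).get(), yet speech_recognizer still receives some of the spoken words and transcribes them incorrectly:

RECOGNIZED: SpeechRecognitionEventArgs(session_id=cf14ed11f79a444faa988fab687687c0, result=SpeechRecognitionResult(result_id=96877e5119b74d2e9f7b394072741b87, text="很抱歉,我无法生气。",

The recognized text "很抱歉,我无法生气。" ("Sorry, I can't get angry.") is a garbled version of the start of the TTS output ("Sorry, I can't provide real-time date information. Please check your electronic device or ask your voice assistant for today's date.").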

I don't understand why the ASR can recognize the TTS voice even though the microphone is muted. I also check an active flag (asr_active) in the SpeechRecognizer.recognized callback, but that didn't help either. Any help would be appreciated. Thanks in advance.
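One possible reason the flag check has no effect: in the output log, the RECOGNIZED event only arrives after "microphone UNmuted", i.e. after asr_active is already True again, so checking the flag's current value inside the callback cannot reject it. A workaround is to compare the time span the recognized audio covers with the time TTS was playing. This is a sketch under several assumptions: `EchoGate` and `make_recognized_handler` are hypothetical helpers; result.offset and result.duration are taken to be the SDK's 100-ns tick values measured from the start of the recognition audio stream; and `stream_start` is approximated by time.monotonic() taken when continuous recognition starts, so a margin absorbs the error.

```python
import time
from dataclasses import dataclass, field

TICKS_PER_SECOND = 10_000_000  # Speech SDK offsets/durations use 100-ns ticks


@dataclass
class EchoGate:
    """Remembers when TTS was playing and rejects results overlapping it.

    Hypothetical helper: `stream_start` is assumed to be the time.monotonic()
    value taken when continuous recognition was started, so it only
    approximates the real start of the audio stream.
    """
    stream_start: float
    margin: float = 0.3  # seconds of slack added on both sides of each TTS window
    windows: list = field(default_factory=list)

    def tts_started(self, now=None):
        # Open a new window; its end stays None until tts_finished() is called.
        self.windows.append([time.monotonic() if now is None else now, None])

    def tts_finished(self, now=None):
        # Close the most recent window if one is still open.
        if self.windows and self.windows[-1][1] is None:
            self.windows[-1][1] = time.monotonic() if now is None else now

    def should_drop(self, offset_ticks, duration_ticks):
        # Convert the result's offset/duration into the monotonic clock.
        start = self.stream_start + offset_ticks / TICKS_PER_SECOND
        end = start + duration_ticks / TICKS_PER_SECOND
        for w_start, w_end in self.windows:
            if w_end is None:
                w_end = float("inf")  # TTS still playing
            # Standard interval-overlap test, widened by the margin.
            if start < w_end + self.margin and end > w_start - self.margin:
                return True
        return False


def make_recognized_handler(gate, handle_text):
    """Wrap a text handler so results overlapping TTS playback are ignored."""
    def on_recognized(evt):
        result = evt.result
        if gate.should_drop(result.offset, result.duration):
            return  # most likely the assistant's own voice
        handle_text(result.text)
    return on_recognized
```

With this approach, text_to_speech would call gate.tts_started() before speak_text_async and gate.tts_finished() after it, and the callback would be registered with speech_recognizer.recognized.connect(make_recognized_handler(gate, send_to_model)), where send_to_model stands in for whatever currently forwards text to the model.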
