
streamlit-webrtc doesn't work with faster-whisper model #252

@aqiao

Description


Hi, I'm working on a voice translator app and trying streamlit-webrtc with the faster-whisper model.
I noticed this example: https://github.com/whitphx/streamlit-webrtc/blob/main/pages/10_sendonly_audio.py
and found that it handles audio data in a while loop like the one below:

while True:
    if webrtc_ctx.audio_receiver:
        try:
            audio_frames = webrtc_ctx.audio_receiver.get_frames(timeout=1)
        except queue.Empty:
            logger.warning("Queue is empty. Abort.")
            break

        sound_chunk = pydub.AudioSegment.empty()
        for audio_frame in audio_frames:
            sound = pydub.AudioSegment(
                data=audio_frame.to_ndarray().tobytes(),
                sample_width=audio_frame.format.bytes,
                frame_rate=audio_frame.sample_rate,
                channels=len(audio_frame.layout.channels),
...
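For context, the `get_frames(timeout=1)` / `queue.Empty` pattern above mirrors the standard library's queue API: the call blocks for up to the timeout and then raises `queue.Empty` if nothing arrived. A minimal stdlib-only sketch of that timeout behavior (my own illustration, not code from the example):

```python
import queue

q = queue.Queue()

# get(timeout=...) blocks up to the timeout, then raises queue.Empty,
# which is the same exception get_frames(timeout=1) surfaces above.
try:
    q.get(timeout=0.05)
    timed_out = False
except queue.Empty:
    timed_out = True
```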


Doesn't this block the UI thread? As far as I can tell, the while loop runs in the script (UI) thread, and once I use the code below, the page hangs.
Here is my code; please help me figure out how to fix it:

import queue
import time

import numpy as np
import pydub
import streamlit as st
from faster_whisper import WhisperModel
from streamlit_webrtc import WebRtcMode, webrtc_streamer

# model_size, supports_gpu, and compute_type are defined elsewhere in my app
webrtc_ctx = webrtc_streamer(
    key="speech-to-text",
    mode=WebRtcMode.SENDONLY,
    audio_receiver_size=10240,
    rtc_configuration={"iceServers": [{"urls": ["stun:stun.l.google.com:19302"]}]},
    media_stream_constraints={"video": False, "audio": True},
)

status_indicator = st.empty()
text_output = st.empty()

while True:
    if webrtc_ctx.audio_receiver:
        sound_chunk = pydub.AudioSegment.empty()
        try:
            audio_frames = webrtc_ctx.audio_receiver.get_frames(timeout=1)
        except queue.Empty:
            time.sleep(0.1)
            status_indicator.write("No frame arrived.")
            continue  # audio_frames is undefined here, so skip this iteration

        status_indicator.write("Running. Say something!")

        for audio_frame in audio_frames:
            sound = pydub.AudioSegment(
                data=audio_frame.to_ndarray().tobytes(),
                sample_width=audio_frame.format.bytes,
                frame_rate=audio_frame.sample_rate,
                channels=len(audio_frame.layout.channels),
            )
            sound_chunk += sound

        if len(sound_chunk) > 0:
            sound_chunk = sound_chunk.set_channels(1).set_frame_rate(44100)
            buffer = np.array(sound_chunk.get_array_of_samples())

            segments, _ = WhisperModel(
                model_size,
                device="cuda" if supports_gpu else "cpu",
                compute_type=compute_type,
            ).transcribe(buffer)
            # segments is a generator, so consume it only once
            text = " ".join(segment.text for segment in segments)
            text_output.markdown(f"**Text:** {text}")
    else:
        status_indicator.write("AudioReceiver is not set. Abort.")
        break
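One thing I suspect (an assumption on my part, not something the example above states): when faster-whisper's `transcribe` is given a NumPy array, it expects float32 samples in [-1.0, 1.0], while pydub's `get_array_of_samples()` yields raw integer PCM (typically int16). A minimal normalization sketch with synthetic data:

```python
import numpy as np

# Simulated int16 PCM samples, like sound_chunk.get_array_of_samples()
pcm = np.array([0, 16384, -32768, 32767], dtype=np.int16)

# Normalize to float32 in [-1.0, 1.0) before passing to transcribe()
audio = pcm.astype(np.float32) / 32768.0
```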
