Skip to content

Conversation

@naveenJuspay
Copy link

@naveenJuspay naveenJuspay commented Sep 30, 2025

Summary by CodeRabbit

  • New Features
    • Smarter audio playback lifecycle: automatically pauses during tool/function activity and can resume after completion.
    • Instant audio interruption when the assistant or user starts speaking for snappier, less-overlapping responses.
    • User speaking detection controls audio: stops on voice start; resumes after confirmed transcription or short timeout.
    • Grace-period based, chunked audio streaming for smoother playback and quicker stops.
    • Conversation turns reset audio state on real user input for clearer, turn-based interactions.
  • Bug Fixes
    • Improved reliability of audio start/stop across rapid interactions and delayed transcriptions.

@coderabbitai
Copy link

coderabbitai bot commented Sep 30, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds a centralized AudioManager, integrates it with LLM and user-speaking events, and updates processing flow to stop/resume audio around user speech, LLM text, and function-call lifecycles. Inserts a new UserSpeakingAudioProcessor into the pipeline, updates LLM spy logic for immediate audio interruption, and resets audio state on genuine user messages.

Changes

Cohort / File(s) Summary
Audio Manager
app/agents/voice/automatic/audio/audio_manager.py
New AudioManager with 100ms chunk streaming, grace delay, start/stop/reset APIs, singleton helpers (initialize_*, get_*, set_*, reset_for_new_input).
Pipeline Integration
app/agents/voice/automatic/__init__.py
Initializes audio manager; wires audio lifecycle to LLM events; inserts UserSpeakingAudioProcessor into pipeline; exposes imports.
LLM Spy Processor
app/agents/voice/automatic/processors/llm_spy.py
Adds immediate audio stop on text frames and response end; manages function-call set start/stop; adjusts collection flow; uses AudioManager.
User Speaking Processor
app/agents/voice/automatic/processors/user_speaking_audio.py
New processor managing audio based on user speaking, PTT, and transcription timing; waits for transcription post-PTT; coordinates enable/start/stop with AudioManager.
Conversation Manager Hooks
app/agents/voice/automatic/utils/conversation_manager.py
Resets audio state for genuine user messages; skips reset for inferred voice inputs or function-related content; logs decisions.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Mic as PTT/STT
  participant USP as UserSpeakingAudioProcessor
  participant LLM as LLM Pipeline
  participant AM as AudioManager
  participant TTS as TTS Transport

  rect rgba(230,245,255,0.5)
    Note over User,Mic: User begins speaking
    User->>Mic: Voice input
    Mic->>USP: Transcription frames (partial/empty)
    USP->>AM: stop() and enable_input()
  end

  alt User stops speaking
    Mic-->>USP: PTT release
    USP->>USP: Wait for transcription (timeout)
    opt Transcription arrives (non-empty)
      USP->>AM: start() after grace
      AM->>TTS: queue_frame(audio chunks)
      loop 100ms chunks
        AM->>TTS: stream next chunk
      end
    end
  end

  rect rgba(255,245,230,0.5)
    Note over LLM,AM: LLM emits text
    LLM-->>USP: LLMTextFrame
    USP->>AM: stop() immediately
  end
Loading
sequenceDiagram
  autonumber
  participant LLM as LLM/Orchestrator
  participant Spy as LLMSpyProcessor
  participant AM as AudioManager

  rect rgba(240,255,240,0.6)
    Note over LLM,Spy: Function call lifecycle
    LLM-->>Spy: FunctionCallInProgressFrame (first)
    Spy->>AM: start(function_call_set)
  end

  LLM-->>Spy: LLMTextFrame (any non-empty text)
  Spy->>AM: stop() immediately

  LLM-->>Spy: FunctionCallResultFrame (possibly final)
  alt Turn completed (all calls done)
    Spy->>AM: stop() function_call_set
  end

  LLM-->>Spy: LLMFullResponseEndFrame
  Spy->>AM: stop() if any text output
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

Thump-thump goes my fluffy ear,
I pause the tune when words appear.
When functions hum, a gentle cue—
I nibble bytes and start anew.
In hundred-millisecond hops I play,
Then hush—user speaks—okay!
Carrots queued, the audio’s true.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The title “Origin/audio for waiting v2” is vague and does not convey the primary intent of the changes, which introduce a centralized AudioManager, event-driven audio lifecycle hooks, and a UserSpeakingAudioProcessor; it fails to summarize the main enhancements in a concise and descriptive way. Please update the title to clearly reflect the core changes, for example “Introduce AudioManager and UserSpeakingAudioProcessor for event-driven audio lifecycle management,” so that reviewers immediately understand the primary purpose of the PR.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 82.14% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (11)
app/agents/voice/automatic/utils/conversation_manager.py (2)

131-131: Consider moving the import to module-level.

The import of get_audio_manager is inside the method. For better performance and clarity, consider moving it to the top of the file with other imports.

 from app.agents.voice.automatic.features.charts.conversation import (
     ConversationDebugData,
     ConversationEvent,
     ConversationMessage,
     ConversationTurn,
     ToolCall,
     ToolResult,
 )
+from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
 from app.core.logger import logger

Then remove the import from line 131:

     def add_user_message(
         self, session_id: str, content: str, message_id: Optional[str] = None
     ) -> ConversationMessage:
         """Add a user message and start a new turn"""
         # MINIMAL QUEUE: Only reset audio for ACTUAL user input, not function calls
-        from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
         audio_manager = get_audio_manager()

135-143: Replace content-based heuristics with an explicit source flag
The logic in conversation_manager.py uses content.startswith("[Inferred from voice]") and a lowercase “function” check—tied to the default user_content value “[Inferred from voice]” in start_turn_with_events—which is brittle. Introduce an explicit enum or boolean on the message object to signal its origin instead of inferring it from the content.

app/agents/voice/automatic/__init__.py (3)

285-287: Remove unused function parameter.

The service parameter in on_llm_response_started is unused. Consider removing it or prefixing with underscore to indicate it's intentionally unused.

-    async def on_llm_response_started(service, function_calls):
+    async def on_llm_response_started(_service, function_calls):
         logger.info(f"Function calls started event triggered with {len(function_calls)} calls")

Or if the event handler signature requires it, use _ as the parameter name:

-    async def on_llm_response_started(service, function_calls):
+    async def on_llm_response_started(_service, _function_calls):
         logger.info(f"Function calls started event triggered with {len(_function_calls)} calls")

293-298: Consider encapsulating audio state manipulation.

The handler directly accesses and modifies audio_manager internal state (is_playing, loop_task). This breaks encapsulation and creates tight coupling. Consider adding a method to AudioManager like stop_without_disable() or pause_for_function_calls() to encapsulate this logic.

In audio_manager.py, add:

async def pause_for_function_calls(self):
    """Pause audio playback during function calls without disabling."""
    if self.is_playing:
        self.is_playing = False
        if self.loop_task and not self.loop_task.done():
            self.loop_task.cancel()
            try:
                await self.loop_task
            except asyncio.CancelledError:
                pass
        logger.info("Audio paused for function calls")

Then simplify the handler:

     async def on_function_calls_finished(_service):
-        # Stop audio when function calls finish but keep it enabled for potential resumption
         if not audio_manager:
             logger.warning("Audio manager not available, skipping audio control")
             return
             
-        if audio_manager.is_playing:
-            # Stop current audio but don't disable - allow resumption if no LLM text follows
-            audio_manager.is_playing = False
-            if audio_manager.loop_task and not audio_manager.loop_task.done():
-                audio_manager.loop_task.cancel()
-            logger.info("Function calls finished - stopped audio but kept enabled for resumption")
-        else:
-            logger.debug("Function calls finished - audio not playing, no action needed")
+        await audio_manager.pause_for_function_calls()

22-22: Remove unused import
The TTSSpeakFrame import on line 22 isn’t referenced or re-exported anywhere—remove it.

app/agents/voice/automatic/processors/user_speaking_audio.py (3)

40-46: Consider making transcription timeout configurable.

The _transcription_timeout is hardcoded to 3.0 seconds. Consider making this configurable via a constructor parameter or config file, especially since different STT services may have different latency characteristics.

-    def __init__(self, name: str = "UserSpeakingAudioProcessor"):
+    def __init__(self, name: str = "UserSpeakingAudioProcessor", transcription_timeout: float = 3.0):
         super().__init__(name=name)
         self._user_currently_speaking = False
         self._actual_speech_detected = False  # Track if transcription was received
         self._speech_start_time = None  # Track when speech started
         self._pending_audio_task = None  # Task waiting for transcription
-        self._transcription_timeout = 3.0  # Wait up to 3 seconds for transcription after PTT release
+        self._transcription_timeout = transcription_timeout  # Wait for transcription after PTT release

44-44: Unused state variable: _speech_start_time.

The _speech_start_time field is set on line 76 but never read. If you plan to use it for duration checks or logging, implement that logic; otherwise, remove it to reduce cognitive load.

         self._user_currently_speaking = False
         self._actual_speech_detected = False  # Track if transcription was received
-        self._speech_start_time = None  # Track when speech started
         self._pending_audio_task = None  # Task waiting for transcription

And remove the assignment on line 76:

             if not self._user_currently_speaking:
                 self._user_currently_speaking = True
                 self._actual_speech_detected = False  # Reset speech detection for new session
-                self._speech_start_time = time.time()  # Record when speech started
                 audio_manager = get_audio_manager()

61-62: Remove commented-out code.

Lines 62, 82, and 86 contain commented-out logger statements. Remove them to keep the code clean, or uncomment them if the logging is valuable for debugging.

                     # If we have a pending audio task waiting for transcription, start audio now
                     if self._pending_audio_task and not self._pending_audio_task.done():
-                        # logger.info("Starting audio due to delayed transcription")
                         audio_manager = get_audio_manager()
                     # First stop any currently playing audio immediately
                     if audio_manager.is_playing:
                         await audio_manager.stop_and_disable_audio()
-                        # logger.info("Stopped playing audio - user started speaking")
                     # Then enable for new input
                     audio_manager.set_user_input()
-                    # logger.info(f"Audio enabled - user started speaking ({type(frame).__name__})")
                 else:

Also applies to: 82-82, 86-86

app/agents/voice/automatic/audio/audio_manager.py (2)

105-136: Minor: Remove unnecessary f-string prefix.

The logic correctly implements the grace period and immediate start flows with proper cancellation handling.

At line 113, remove the f prefix since there are no placeholders:

-            logger.info(f"Grace period ended - starting audio")
+            logger.info("Grace period ended - starting audio")

174-178: Consider tracking the background task or adding error handling.

The set_bot_speaking method creates a fire-and-forget task. If stop_and_disable_audio raises an exception, it will be silently ignored, potentially leaving audio in an inconsistent state.

Store and optionally await the task, or add a done callback for error logging:

 def set_bot_speaking(self, speaking: bool):
     """Track when bot starts/stops speaking to prevent audio during speech."""
     if speaking and self.is_playing:
         logger.info("Bot started speaking - stopping audio")
-        asyncio.create_task(self.stop_and_disable_audio())
+        task = asyncio.create_task(self.stop_and_disable_audio())
+        task.add_done_callback(lambda t: logger.error(f"Error stopping audio: {t.exception()}") if t.exception() else None)
app/agents/voice/automatic/processors/llm_spy.py (1)

21-21: Optional: Remove unused import if not needed.

TTSSpeakFrame is imported but not referenced in the file. If it's not needed for future work, consider removing it to keep imports clean.

     UserStartedSpeakingFrame,
-    TTSSpeakFrame
 )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2660c99 and d86c661.

⛔ Files ignored due to path filters (1)
  • app/agents/voice/automatic/audio/waiting_6sec.wav is excluded by !**/*.wav
📒 Files selected for processing (5)
  • app/agents/voice/automatic/__init__.py (5 hunks)
  • app/agents/voice/automatic/audio/audio_manager.py (1 hunks)
  • app/agents/voice/automatic/processors/llm_spy.py (7 hunks)
  • app/agents/voice/automatic/processors/user_speaking_audio.py (1 hunks)
  • app/agents/voice/automatic/utils/conversation_manager.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
app/agents/voice/automatic/processors/llm_spy.py (3)
app/agents/voice/automatic/audio/audio_manager.py (6)
  • get_audio_manager (313-315)
  • stop_function_call_set_audio (197-218)
  • stop_and_disable_audio (138-163)
  • set_bot_speaking (174-178)
  • start_audio (101-103)
  • start_function_call_set_audio (180-195)
app/agents/voice/automatic/utils/conversation_manager.py (2)
  • start_turn_with_events (245-266)
  • add_llm_response_with_events (268-290)
app/agents/voice/automatic/rtvi/rtvi.py (1)
  • emit_rtvi_event (6-13)
app/agents/voice/automatic/processors/user_speaking_audio.py (1)
app/agents/voice/automatic/audio/audio_manager.py (4)
  • get_audio_manager (313-315)
  • start_audio (101-103)
  • stop_and_disable_audio (138-163)
  • set_user_input (72-75)
app/agents/voice/automatic/utils/conversation_manager.py (1)
app/agents/voice/automatic/audio/audio_manager.py (3)
  • get_audio_manager (313-315)
  • reset_for_new_input (332-336)
  • reset_function_call_set (220-223)
app/agents/voice/automatic/__init__.py (2)
app/agents/voice/automatic/processors/user_speaking_audio.py (1)
  • UserSpeakingAudioProcessor (29-131)
app/agents/voice/automatic/audio/audio_manager.py (1)
  • initialize_audio_manager (324-328)
🪛 Ruff (0.13.1)
app/agents/voice/automatic/audio/audio_manager.py

68-68: Do not catch blind exception: Exception

(BLE001)


113-113: f-string without any placeholders

Remove extraneous f prefix

(F541)


178-178: Store a reference to the return value of asyncio.create_task

(RUF006)


277-277: Do not catch blind exception: Exception

(BLE001)


305-305: Do not catch blind exception: Exception

(BLE001)

app/agents/voice/automatic/processors/llm_spy.py

46-46: Store a reference to the return value of asyncio.create_task

(RUF006)


50-50: Store a reference to the return value of asyncio.create_task

(RUF006)

app/agents/voice/automatic/processors/user_speaking_audio.py

130-130: Do not catch blind exception: Exception

(BLE001)

app/agents/voice/automatic/__init__.py

286-286: Unused function argument: service

(ARG001)


291-291: Unused function argument: service

(ARG001)

🔇 Additional comments (15)
app/agents/voice/automatic/__init__.py (2)

219-221: LGTM: Audio manager initialization.

The audio manager is correctly initialized early in the flow and passed the TTS service. This establishes the centralized audio control as intended.


329-331: LGTM: UserSpeakingAudioProcessor integration.

The processor is correctly instantiated and positioned in the pipeline after STT/PTT and before RTVI/context aggregation, which aligns with the intended flow for managing audio based on user speech events.

app/agents/voice/automatic/processors/user_speaking_audio.py (2)

73-88: LGTM: User started speaking flow.

The logic correctly stops any playing audio and enables the audio manager for new input. The null check and error logging provide good defensive programming.


113-131: Broad exception catch is acceptable here.

The static analysis tool flags catching Exception on line 130, but in this context it's appropriate because:

  • The method properly handles asyncio.CancelledError separately
  • Any unexpected error is logged with context
  • The error won't propagate and crash the processor
app/agents/voice/automatic/audio/audio_manager.py (6)

1-13: LGTM!

Imports and configuration are clean. The module uses pydub for audio manipulation—consider pinning to a maintained fork (e.g., pozalabs-pydub or pydub-ng) if you encounter compatibility issues on newer Python versions.


15-40: LGTM!

Constructor design is clean with minimal state. Synchronous audio loading in __init__ is acceptable for one-time setup.


42-70: LGTM!

Audio loading logic correctly chunks the audio into 100ms segments for fast interruption. The broad exception handler at line 68 is justified here—it provides a safe fallback for any audio loading failure while logging the error.


72-103: LGTM!

The grace period logic correctly guards against duplicate starts and properly manages task lifecycle. The static analysis warning about not storing the task reference is a false positive—the task is stored in self.delay_task at line 99.


180-280: LGTM!

Function call set audio methods correctly manage state and lifecycle. The streaming logic implements a clever timing strategy (80ms for first chunk, 100ms thereafter) to maintain a minimal 20ms buffer for seamless playback. The broad exception handler at line 277 is justified for robustness and always resets the playing state in the finally block.


282-337: LGTM!

Audio queue clearing is defensive and handles both sync/async interrupt methods correctly. The broad exception handler at line 305 is justified—it's logged at debug level and prevents clearing failures from propagating. Global singleton API is clean and straightforward.

app/agents/voice/automatic/processors/llm_spy.py (5)

182-197: LGTM!

TextFrame handling correctly stops audio immediately on non-empty bot output and notifies the audio manager that the bot is speaking. The logic properly preserves existing text sanitization flow.


203-226: LGTM!

LLM response handling correctly implements the instant-stop-on-text strategy. The response start frame prepares for collection without stopping audio, and text frames trigger immediate audio interruption on the first non-empty output. Chart accumulation is properly guarded.


228-256: LGTM!

Response end handling correctly captures text output before clearing state and implements smart audio resumption logic: if the LLM produced no text (e.g., function calls only) and the user has provided input, waiting audio resumes. This prevents silence during function execution.


258-309: LGTM!

Function call start handling correctly detects the first call in a function call set and initiates waiting audio. The check on is_function_call_set_active ensures audio starts only once per set. Existing tracing and event emission logic is properly preserved.


311-366: LGTM!

Function call result handling correctly detects turn completion (when all function calls finish) and stops the function call set audio. The detection via "turn_complete" event type is appropriate, and the guard on function_call_set_audio_playing prevents unnecessary stops. Existing result emission and chart logic is preserved.

Comment on lines 291 to 461
async def on_function_calls_finished(service):
# Stop audio when function calls finish but keep it enabled for potential resumption
if audio_manager.is_playing:
# Stop current audio but don't disable - allow resumption if no LLM text follows
audio_manager.is_playing = False
if audio_manager.loop_task and not audio_manager.loop_task.done():
audio_manager.loop_task.cancel()
logger.info("Function calls finished - stopped audio but kept enabled for resumption")
else:
logger.debug("Function calls finished - audio not playing, no action needed")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add null safety check and address unused parameter.

Two issues:

  1. The service parameter is unused (static analysis hint)
  2. audio_manager is accessed without verifying it was successfully initialized on line 220

If initialize_audio_manager fails or returns None, this code will raise an AttributeError.

-    async def on_function_calls_finished(service):
+    async def on_function_calls_finished(_service):
         # Stop audio when function calls finish but keep it enabled for potential resumption
+        if not audio_manager:
+            logger.warning("Audio manager not available, skipping audio control")
+            return
+            
         if audio_manager.is_playing:
             # Stop current audio but don't disable - allow resumption if no LLM text follows
             audio_manager.is_playing = False
             if audio_manager.loop_task and not audio_manager.loop_task.done():
                 audio_manager.loop_task.cancel()
             logger.info("Function calls finished - stopped audio but kept enabled for resumption")
         else:
             logger.debug("Function calls finished - audio not playing, no action needed")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def on_function_calls_finished(service):
# Stop audio when function calls finish but keep it enabled for potential resumption
if audio_manager.is_playing:
# Stop current audio but don't disable - allow resumption if no LLM text follows
audio_manager.is_playing = False
if audio_manager.loop_task and not audio_manager.loop_task.done():
audio_manager.loop_task.cancel()
logger.info("Function calls finished - stopped audio but kept enabled for resumption")
else:
logger.debug("Function calls finished - audio not playing, no action needed")
async def on_function_calls_finished(_service):
# Stop audio when function calls finish but keep it enabled for potential resumption
if not audio_manager:
logger.warning("Audio manager not available, skipping audio control")
return
if audio_manager.is_playing:
# Stop current audio but don't disable - allow resumption if no LLM text follows
audio_manager.is_playing = False
if audio_manager.loop_task and not audio_manager.loop_task.done():
audio_manager.loop_task.cancel()
logger.info("Function calls finished - stopped audio but kept enabled for resumption")
else:
logger.debug("Function calls finished - audio not playing, no action needed")
🧰 Tools
🪛 Ruff (0.13.1)

291-291: Unused function argument: service

(ARG001)

🤖 Prompt for AI Agents
In app/agents/voice/automatic/__init__.py around lines 291 to 300, the
on_function_calls_finished handler both declares an unused service parameter and
accesses audio_manager without null-safety; update the signature to either
remove or underscore the unused parameter (e.g., _service) to satisfy static
analysis, and guard all accesses to audio_manager with a None-check before
touching is_playing or loop_task (if audio_manager is None, log/debug and
return). Also ensure you check for the presence of loop_task before calling
done()/cancel() to avoid AttributeError.

Comment on lines 40 to 59
def _stop_audio_immediately(context: str = "unknown") -> bool:
"""INSTANT audio stopping using simplified AudioManager API."""
audio_manager = get_audio_manager()
if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period):
# Stop function call set audio if active
if audio_manager.function_call_set_audio_playing:
asyncio.create_task(audio_manager.stop_function_call_set_audio())
logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}")

# Use the simplified stop method for regular audio and grace period
asyncio.create_task(audio_manager.stop_and_disable_audio())

logger.info(f"INSTANT AUDIO STOP: {context}")
return True
return False
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Track background tasks to prevent silent failures.

The _stop_audio_immediately helper creates fire-and-forget tasks at lines 46 and 50. If these async methods raise exceptions, they will be silently ignored, potentially leaving audio in an inconsistent state.

Store task references and add error handling:

 def _stop_audio_immediately(context: str = "unknown") -> bool:
     """INSTANT audio stopping using simplified AudioManager API."""
     audio_manager = get_audio_manager()
     if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period):
         # Stop function call set audio if active
         if audio_manager.function_call_set_audio_playing:
-            asyncio.create_task(audio_manager.stop_function_call_set_audio())
+            task = asyncio.create_task(audio_manager.stop_function_call_set_audio())
+            task.add_done_callback(lambda t: logger.error(f"Error stopping function call audio: {t.exception()}") if t.exception() else None)
             logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}")
         
         # Use the simplified stop method for regular audio and grace period
-        asyncio.create_task(audio_manager.stop_and_disable_audio())
+        task = asyncio.create_task(audio_manager.stop_and_disable_audio())
+        task.add_done_callback(lambda t: logger.error(f"Error stopping audio: {t.exception()}") if t.exception() else None)
         
         logger.info(f"INSTANT AUDIO STOP: {context}")
         return True
     return False
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _stop_audio_immediately(context: str = "unknown") -> bool:
"""INSTANT audio stopping using simplified AudioManager API."""
audio_manager = get_audio_manager()
if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period):
# Stop function call set audio if active
if audio_manager.function_call_set_audio_playing:
asyncio.create_task(audio_manager.stop_function_call_set_audio())
logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}")
# Use the simplified stop method for regular audio and grace period
asyncio.create_task(audio_manager.stop_and_disable_audio())
logger.info(f"INSTANT AUDIO STOP: {context}")
return True
return False
def _stop_audio_immediately(context: str = "unknown") -> bool:
"""INSTANT audio stopping using simplified AudioManager API."""
audio_manager = get_audio_manager()
if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period):
# Stop function call set audio if active
if audio_manager.function_call_set_audio_playing:
task = asyncio.create_task(audio_manager.stop_function_call_set_audio())
task.add_done_callback(
lambda t: logger.error(f"Error stopping function call audio: {t.exception()}")
if t.exception() else None
)
logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}")
# Use the simplified stop method for regular audio and grace period
task = asyncio.create_task(audio_manager.stop_and_disable_audio())
task.add_done_callback(
lambda t: logger.error(f"Error stopping audio: {t.exception()}")
if t.exception() else None
)
logger.info(f"INSTANT AUDIO STOP: {context}")
return True
return False
🧰 Tools
🪛 Ruff (0.13.1)

46-46: Store a reference to the return value of asyncio.create_task

(RUF006)


50-50: Store a reference to the return value of asyncio.create_task

(RUF006)

Comment on lines 48 to 111
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process frames and manage audio based on user speaking events."""
await super().process_frame(frame, direction)


# Handle transcription frames - detect actual speech content
if isinstance(frame, (TranscriptionFrame, InterimTranscriptionFrame)):
# Accept transcription even after PTT release (delayed transcription)
if frame.text.strip():
if not self._actual_speech_detected:
self._actual_speech_detected = True

# If we have a pending audio task waiting for transcription, start audio now
if self._pending_audio_task and not self._pending_audio_task.done():
# logger.info("Starting audio due to delayed transcription")
audio_manager = get_audio_manager()
if audio_manager:
await audio_manager.start_audio()

# Handle user started speaking events
elif isinstance(frame, (
UserStartedSpeakingFrame,
VADUserStartedSpeakingFrame,
EmulateUserStartedSpeakingFrame
)):
if not self._user_currently_speaking:
self._user_currently_speaking = True
self._actual_speech_detected = False # Reset speech detection for new session
self._speech_start_time = time.time() # Record when speech started
audio_manager = get_audio_manager()
if audio_manager:
# First stop any currently playing audio immediately
if audio_manager.is_playing:
await audio_manager.stop_and_disable_audio()
# logger.info("Stopped playing audio - user started speaking")

# Then enable for new input
audio_manager.set_user_input()
# logger.info(f"Audio enabled - user started speaking ({type(frame).__name__})")
else:
logger.error("No audio manager found!")

# Handle user stopped speaking events
elif isinstance(frame, (
UserStoppedSpeakingFrame,
VADUserStoppedSpeakingFrame,
EmulateUserStoppedSpeakingFrame
)):
if self._user_currently_speaking:
self._user_currently_speaking = False
audio_manager = get_audio_manager()
if audio_manager:

# Check if transcription was already detected during speaking
if self._actual_speech_detected:
await audio_manager.start_audio()
else:
# Wait for delayed transcription (common case)
self._pending_audio_task = asyncio.create_task(self._wait_for_transcription())
else:
logger.error("No audio manager found!")

# Pass frame through to next processor
await self.push_frame(frame, direction)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add cleanup method to cancel pending tasks.

The processor creates async tasks but doesn't provide a cleanup mechanism. If the processor is destroyed or the pipeline shuts down, _pending_audio_task may remain active, leading to resource leaks or unexpected behavior.

Add a cleanup method:

async def cleanup(self):
    """Cancel any pending tasks on shutdown."""
    if self._pending_audio_task and not self._pending_audio_task.done():
        self._pending_audio_task.cancel()
        try:
            await self._pending_audio_task
        except asyncio.CancelledError:
            pass
    await super().cleanup()

Verify that pipecat's FrameProcessor base class calls cleanup() during pipeline shutdown, or manually invoke it in the shutdown flow.

🤖 Prompt for AI Agents
In app/agents/voice/automatic/processors/user_speaking_audio.py around lines 48
to 111, the processor creates an asyncio task in _pending_audio_task but lacks a
cleanup method to cancel it, which can leak resources on shutdown; add an async
cleanup(self) that cancels and awaits the pending task (handling
asyncio.CancelledError) and then calls await super().cleanup(); ensure the
method checks task.done() before canceling and that pipeline shutdown invokes
this cleanup (or document calling it from the shutdown flow).

@@ -0,0 +1,336 @@
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move wav file to assets/sounds

@naveenJuspay naveenJuspay force-pushed the origin/audio_for_waiting_v2 branch from d86c661 to 9c29dba Compare October 6, 2025 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants