Origin/audio for waiting v2 #279

naveenJuspay · 2025-09-30T12:41:53Z

Summary by CodeRabbit

New Features
- Smarter audio playback lifecycle: automatically pauses during tool/function activity and can resume after completion.
- Instant audio interruption when the assistant or user starts speaking for snappier, less-overlapping responses.
- User speaking detection controls audio: stops on voice start; resumes after confirmed transcription or short timeout.
- Grace-period based, chunked audio streaming for smoother playback and quicker stops.
- Conversation turns reset audio state on real user input for clearer, turn-based interactions.
Bug Fixes
- Improved reliability of audio start/stop across rapid interactions and delayed transcriptions.

coderabbitai · 2025-09-30T12:42:01Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds a centralized AudioManager, integrates it with LLM and user-speaking events, and updates processing flow to stop/resume audio around user speech, LLM text, and function-call lifecycles. Inserts a new UserSpeakingAudioProcessor into the pipeline, updates LLM spy logic for immediate audio interruption, and resets audio state on genuine user messages.

Changes

Cohort / File(s)	Summary
Audio Manager `app/agents/voice/automatic/audio/audio_manager.py`	New AudioManager with 100ms chunk streaming, grace delay, start/stop/reset APIs, singleton helpers (`initialize_*`, `get_*`, `set_*`, `reset_for_new_input`).
Pipeline Integration `app/agents/voice/automatic/__init__.py`	Initializes audio manager; wires audio lifecycle to LLM events; inserts UserSpeakingAudioProcessor into pipeline; exposes imports.
LLM Spy Processor `app/agents/voice/automatic/processors/llm_spy.py`	Adds immediate audio stop on text frames and response end; manages function-call set start/stop; adjusts collection flow; uses AudioManager.
User Speaking Processor `app/agents/voice/automatic/processors/user_speaking_audio.py`	New processor managing audio based on user speaking, PTT, and transcription timing; waits for transcription post-PTT; coordinates enable/start/stop with AudioManager.
Conversation Manager Hooks `app/agents/voice/automatic/utils/conversation_manager.py`	Resets audio state for genuine user messages; skips reset for inferred voice inputs or function-related content; logs decisions.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Mic as PTT/STT
  participant USP as UserSpeakingAudioProcessor
  participant LLM as LLM Pipeline
  participant AM as AudioManager
  participant TTS as TTS Transport

  rect rgba(230,245,255,0.5)
    Note over User,Mic: User begins speaking
    User->>Mic: Voice input
    Mic->>USP: Transcription frames (partial/empty)
    USP->>AM: stop() and enable_input()
  end

  alt User stops speaking
    Mic-->>USP: PTT release
    USP->>USP: Wait for transcription (timeout)
    opt Transcription arrives (non-empty)
      USP->>AM: start() after grace
      AM->>TTS: queue_frame(audio chunks)
      loop 100ms chunks
        AM->>TTS: stream next chunk
      end
    end
  end

  rect rgba(255,245,230,0.5)
    Note over LLM,AM: LLM emits text
    LLM-->>USP: LLMTextFrame
    USP->>AM: stop() immediately
  end

sequenceDiagram
  autonumber
  participant LLM as LLM/Orchestrator
  participant Spy as LLMSpyProcessor
  participant AM as AudioManager

  rect rgba(240,255,240,0.6)
    Note over LLM,Spy: Function call lifecycle
    LLM-->>Spy: FunctionCallInProgressFrame (first)
    Spy->>AM: start(function_call_set)
  end

  LLM-->>Spy: LLMTextFrame (any non-empty text)
  Spy->>AM: stop() immediately

  LLM-->>Spy: FunctionCallResultFrame (possibly final)
  alt Turn completed (all calls done)
    Spy->>AM: stop() function_call_set
  end

  LLM-->>Spy: LLMFullResponseEndFrame
  Spy->>AM: stop() if any text output

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

Thump-thump goes my fluffy ear,
I pause the tune when words appear.
When functions hum, a gentle cue—
I nibble bytes and start anew.
In hundred-millisecond hops I play,
Then hush—user speaks—okay!
Carrots queued, the audio’s true.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title Check	⚠️ Warning	The title “Origin/audio for waiting v2” is vague and does not convey the primary intent of the changes, which introduce a centralized AudioManager, event-driven audio lifecycle hooks, and a UserSpeakingAudioProcessor; it fails to summarize the main enhancements in a concise and descriptive way.	Please update the title to clearly reflect the core changes, for example “Introduce AudioManager and UserSpeakingAudioProcessor for event-driven audio lifecycle management,” so that reviewers immediately understand the primary purpose of the PR.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 82.14% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (11)

app/agents/voice/automatic/utils/conversation_manager.py (2)
131-131: Consider moving the import to module-level.

The import of get_audio_manager is inside the method. For better performance and clarity, consider moving it to the top of the file with other imports.
 from app.agents.voice.automatic.features.charts.conversation import (
     ConversationDebugData,
     ConversationEvent,
     ConversationMessage,
     ConversationTurn,
     ToolCall,
     ToolResult,
 )
+from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
 from app.core.logger import logger
Then remove the import from line 131:
     def add_user_message(
         self, session_id: str, content: str, message_id: Optional[str] = None
     ) -> ConversationMessage:
         """Add a user message and start a new turn"""
         # MINIMAL QUEUE: Only reset audio for ACTUAL user input, not function calls
-        from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
         audio_manager = get_audio_manager()
135-143: Replace content-based heuristics with an explicit source flag
The logic in conversation_manager.py uses content.startswith("[Inferred from voice]") and a lowercase “function” check—tied to the default user_content value “[Inferred from voice]” in start_turn_with_events—which is brittle. Introduce an explicit enum or boolean on the message object to signal its origin instead of inferring it from the content.
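A minimal sketch of what an explicit origin flag could look like. `MessageSource`, `TaggedMessage`, and `should_reset_audio` are hypothetical names used for illustration only; in the PR the field would presumably live on `ConversationMessage` itself.

```python
from dataclasses import dataclass
from enum import Enum


class MessageSource(Enum):
    """Hypothetical origin tag for a message that starts a turn."""
    USER_TYPED = "user_typed"
    VOICE_INFERRED = "voice_inferred"
    FUNCTION_RELATED = "function_related"


@dataclass
class TaggedMessage:
    """Stand-in for ConversationMessage carrying an explicit origin (assumption)."""
    content: str
    source: MessageSource = MessageSource.USER_TYPED


def should_reset_audio(message: TaggedMessage) -> bool:
    """Reset audio state only for genuine user input, regardless of the text."""
    return message.source is MessageSource.USER_TYPED
```

With a flag like this, a typed message that happens to contain the word "function" still resets audio state, which the content-based check would get wrong.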
app/agents/voice/automatic/__init__.py (3)
285-287: Remove unused function parameter.

The service parameter in on_llm_response_started is unused. Consider removing it or prefixing with underscore to indicate it's intentionally unused.
-    async def on_llm_response_started(service, function_calls):
+    async def on_llm_response_started(_service, function_calls):
         logger.info(f"Function calls started event triggered with {len(function_calls)} calls")
Or, if the event handler signature requires the parameter, name it _ and keep function_calls, which the log line still uses:
-    async def on_llm_response_started(service, function_calls):
+    async def on_llm_response_started(_, function_calls):
         logger.info(f"Function calls started event triggered with {len(function_calls)} calls")
293-298: Consider encapsulating audio state manipulation.

The handler directly accesses and modifies audio_manager internal state (is_playing, loop_task). This breaks encapsulation and creates tight coupling. Consider adding a method to AudioManager like stop_without_disable() or pause_for_function_calls() to encapsulate this logic.

In audio_manager.py, add:
async def pause_for_function_calls(self):
    """Pause audio playback during function calls without disabling."""
    if self.is_playing:
        self.is_playing = False
        if self.loop_task and not self.loop_task.done():
            self.loop_task.cancel()
            try:
                await self.loop_task
            except asyncio.CancelledError:
                pass
        logger.info("Audio paused for function calls")
Then simplify the handler:
     async def on_function_calls_finished(_service):
-        # Stop audio when function calls finish but keep it enabled for potential resumption
         if not audio_manager:
             logger.warning("Audio manager not available, skipping audio control")
             return
             
-        if audio_manager.is_playing:
-            # Stop current audio but don't disable - allow resumption if no LLM text follows
-            audio_manager.is_playing = False
-            if audio_manager.loop_task and not audio_manager.loop_task.done():
-                audio_manager.loop_task.cancel()
-            logger.info("Function calls finished - stopped audio but kept enabled for resumption")
-        else:
-            logger.debug("Function calls finished - audio not playing, no action needed")
+        await audio_manager.pause_for_function_calls()
22-22: Remove unused import
The TTSSpeakFrame import on line 22 isn’t referenced or re-exported anywhere—remove it.
app/agents/voice/automatic/processors/user_speaking_audio.py (3)
40-46: Consider making transcription timeout configurable.

The _transcription_timeout is hardcoded to 3.0 seconds. Consider making this configurable via a constructor parameter or config file, especially since different STT services may have different latency characteristics.
-    def __init__(self, name: str = "UserSpeakingAudioProcessor"):
+    def __init__(self, name: str = "UserSpeakingAudioProcessor", transcription_timeout: float = 3.0):
         super().__init__(name=name)
         self._user_currently_speaking = False
         self._actual_speech_detected = False  # Track if transcription was received
         self._speech_start_time = None  # Track when speech started
         self._pending_audio_task = None  # Task waiting for transcription
-        self._transcription_timeout = 3.0  # Wait up to 3 seconds for transcription after PTT release
+        self._transcription_timeout = transcription_timeout  # Wait for transcription after PTT release
44-44: Unused state variable: _speech_start_time.

The _speech_start_time field is set on line 76 but never read. If you plan to use it for duration checks or logging, implement that logic; otherwise, remove it to reduce cognitive load.
         self._user_currently_speaking = False
         self._actual_speech_detected = False  # Track if transcription was received
-        self._speech_start_time = None  # Track when speech started
         self._pending_audio_task = None  # Task waiting for transcription
And remove the assignment on line 76:
             if not self._user_currently_speaking:
                 self._user_currently_speaking = True
                 self._actual_speech_detected = False  # Reset speech detection for new session
-                self._speech_start_time = time.time()  # Record when speech started
                 audio_manager = get_audio_manager()
61-62: Remove commented-out code.

Lines 62, 82, and 86 contain commented-out logger statements. Remove them to keep the code clean, or uncomment them if the logging is valuable for debugging.
                     # If we have a pending audio task waiting for transcription, start audio now
                     if self._pending_audio_task and not self._pending_audio_task.done():
-                        # logger.info("Starting audio due to delayed transcription")
                         audio_manager = get_audio_manager()
                     # First stop any currently playing audio immediately
                     if audio_manager.is_playing:
                         await audio_manager.stop_and_disable_audio()
-                        # logger.info("Stopped playing audio - user started speaking")
                     # Then enable for new input
                     audio_manager.set_user_input()
-                    # logger.info(f"Audio enabled - user started speaking ({type(frame).__name__})")
                 else:
Also applies to: 82-82, 86-86
app/agents/voice/automatic/audio/audio_manager.py (2)
105-136: Minor: Remove unnecessary f-string prefix.

The logic correctly implements the grace period and immediate start flows with proper cancellation handling.

At line 113, remove the f prefix since there are no placeholders:
-            logger.info(f"Grace period ended - starting audio")
+            logger.info("Grace period ended - starting audio")
174-178: Consider tracking the background task or adding error handling.

The set_bot_speaking method creates a fire-and-forget task. If stop_and_disable_audio raises an exception, it will be silently ignored, potentially leaving audio in an inconsistent state.

Store and optionally await the task, or add a done callback for error logging:
 def set_bot_speaking(self, speaking: bool):
     """Track when bot starts/stops speaking to prevent audio during speech."""
     if speaking and self.is_playing:
         logger.info("Bot started speaking - stopping audio")
-        asyncio.create_task(self.stop_and_disable_audio())
+        task = asyncio.create_task(self.stop_and_disable_audio())
+        task.add_done_callback(lambda t: logger.error(f"Error stopping audio: {t.exception()}") if t.exception() else None)
app/agents/voice/automatic/processors/llm_spy.py (1)
21-21: Optional: Remove unused import if not needed.

TTSSpeakFrame is imported but not referenced in the file. If it's not needed for future work, consider removing it to keep imports clean.
     UserStartedSpeakingFrame,
-    TTSSpeakFrame
 )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2660c99 and d86c661.

⛔ Files ignored due to path filters (1)

app/agents/voice/automatic/audio/waiting_6sec.wav is excluded by !**/*.wav

📒 Files selected for processing (5)

app/agents/voice/automatic/__init__.py (5 hunks)
app/agents/voice/automatic/audio/audio_manager.py (1 hunks)
app/agents/voice/automatic/processors/llm_spy.py (7 hunks)
app/agents/voice/automatic/processors/user_speaking_audio.py (1 hunks)
app/agents/voice/automatic/utils/conversation_manager.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

app/agents/voice/automatic/processors/llm_spy.py (3)

app/agents/voice/automatic/audio/audio_manager.py (6)

get_audio_manager (313-315)

stop_function_call_set_audio (197-218)

stop_and_disable_audio (138-163)

set_bot_speaking (174-178)

start_audio (101-103)

start_function_call_set_audio (180-195)

app/agents/voice/automatic/utils/conversation_manager.py (2)

start_turn_with_events (245-266)

add_llm_response_with_events (268-290)

app/agents/voice/automatic/rtvi/rtvi.py (1)

emit_rtvi_event (6-13)

app/agents/voice/automatic/processors/user_speaking_audio.py (1)

app/agents/voice/automatic/audio/audio_manager.py (4)

get_audio_manager (313-315)

start_audio (101-103)

stop_and_disable_audio (138-163)

set_user_input (72-75)

app/agents/voice/automatic/utils/conversation_manager.py (1)

app/agents/voice/automatic/audio/audio_manager.py (3)

get_audio_manager (313-315)

reset_for_new_input (332-336)

reset_function_call_set (220-223)

app/agents/voice/automatic/__init__.py (2)

app/agents/voice/automatic/processors/user_speaking_audio.py (1)

UserSpeakingAudioProcessor (29-131)

app/agents/voice/automatic/audio/audio_manager.py (1)

initialize_audio_manager (324-328)

🪛 Ruff (0.13.1)

app/agents/voice/automatic/audio/audio_manager.py

68-68: Do not catch blind exception: Exception

(BLE001)

113-113: f-string without any placeholders

Remove extraneous f prefix

(F541)

178-178: Store a reference to the return value of asyncio.create_task

(RUF006)

277-277: Do not catch blind exception: Exception

(BLE001)

305-305: Do not catch blind exception: Exception

(BLE001)

app/agents/voice/automatic/processors/llm_spy.py

46-46: Store a reference to the return value of asyncio.create_task

(RUF006)

50-50: Store a reference to the return value of asyncio.create_task

(RUF006)

app/agents/voice/automatic/processors/user_speaking_audio.py

130-130: Do not catch blind exception: Exception

(BLE001)

app/agents/voice/automatic/__init__.py

286-286: Unused function argument: service

(ARG001)

291-291: Unused function argument: service

(ARG001)

🔇 Additional comments (15)

app/agents/voice/automatic/__init__.py (2)

219-221: LGTM: Audio manager initialization.

The audio manager is correctly initialized early in the flow and passed the TTS service. This establishes the centralized audio control as intended.

329-331: LGTM: UserSpeakingAudioProcessor integration.

The processor is correctly instantiated and positioned in the pipeline after STT/PTT and before RTVI/context aggregation, which aligns with the intended flow for managing audio based on user speech events.

app/agents/voice/automatic/processors/user_speaking_audio.py (2)

73-88: LGTM: User started speaking flow.

The logic correctly stops any playing audio and enables the audio manager for new input. The null check and error logging provide good defensive programming.

113-131: Broad exception catch is acceptable here.

The static analysis tool flags catching Exception on line 130, but in this context it's appropriate because:

The method properly handles asyncio.CancelledError separately

Any unexpected error is logged with context

The error won't propagate and crash the processor

app/agents/voice/automatic/audio/audio_manager.py (6)

1-13: LGTM!

Imports and configuration are clean. The module uses pydub for audio manipulation—consider pinning to a maintained fork (e.g., pozalabs-pydub or pydub-ng) if you encounter compatibility issues on newer Python versions.

15-40: LGTM!

Constructor design is clean with minimal state. Synchronous audio loading in __init__ is acceptable for one-time setup.

42-70: LGTM!

Audio loading logic correctly chunks the audio into 100ms segments for fast interruption. The broad exception handler at line 68 is justified here—it provides a safe fallback for any audio loading failure while logging the error.
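To illustrate the 100ms chunking described above, here is a rough sketch that works on raw PCM bytes rather than pydub segments. The function name, the 16-bit mono default, and the sample rates are assumptions; the PR's actual loader is not shown in this review.

```python
def split_pcm_into_chunks(
    pcm: bytes,
    sample_rate: int,
    channels: int = 1,
    sample_width: int = 2,
    chunk_ms: int = 100,
) -> list[bytes]:
    """Split raw PCM into chunk_ms-long pieces; the final piece may be shorter."""
    frame_size = channels * sample_width
    chunk_size = sample_rate * frame_size * chunk_ms // 1000
    # Keep chunk boundaries aligned to whole audio frames.
    chunk_size -= chunk_size % frame_size
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    return [pcm[i:i + chunk_size] for i in range(0, len(pcm), chunk_size)]
```

Smaller chunks mean a stop request takes effect sooner, because at most one short chunk is already queued when playback is interrupted.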

72-103: LGTM!

The grace period logic correctly guards against duplicate starts and properly manages task lifecycle. The static analysis warning about not storing the task reference is a false positive—the task is stored in self.delay_task at line 99.
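The guard-and-store pattern described above might look roughly like this. `GraceStarter` is a hypothetical reduction, and the grace duration is an assumed value; only the attribute names `delay_task`, `is_playing`, and `is_in_delay_period` come from the review.

```python
import asyncio
from typing import Optional


class GraceStarter:
    """Hypothetical reduction of a grace-period start (not the PR's class)."""

    def __init__(self, grace_seconds: float = 0.5) -> None:
        self.grace_seconds = grace_seconds  # assumed value
        self.is_playing = False
        self.delay_task: Optional[asyncio.Task] = None

    @property
    def is_in_delay_period(self) -> bool:
        return self.delay_task is not None and not self.delay_task.done()

    async def start_audio(self) -> None:
        # Guard against duplicate starts while playing or already waiting.
        if self.is_playing or self.is_in_delay_period:
            return
        # Keeping the reference lets stop() cancel the pending start.
        self.delay_task = asyncio.create_task(self._start_after_grace())

    async def _start_after_grace(self) -> None:
        await asyncio.sleep(self.grace_seconds)
        self.is_playing = True

    async def stop(self) -> None:
        task, self.delay_task = self.delay_task, None
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
        self.is_playing = False
```

Storing the task on the instance is what allows a stop during the grace period to prevent audio from ever starting.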

180-280: LGTM!

Function call set audio methods correctly manage state and lifecycle. The streaming logic implements a clever timing strategy (80ms for first chunk, 100ms thereafter) to maintain a minimal 20ms buffer for seamless playback. The broad exception handler at line 277 is justified for robustness and always resets the playing state in the finally block.
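The 80ms/100ms timing can be checked with a little arithmetic. The sketch below is not the PR's streaming loop; the helper names are hypothetical, and it assumes playback starts at t=0 with no jitter.

```python
def sleep_schedule_ms(num_chunks: int, chunk_ms: int = 100, lead_ms: int = 20) -> list[int]:
    """Sleep after queuing each chunk; the shorter first sleep builds the lead."""
    if num_chunks <= 0:
        return []
    return [chunk_ms - lead_ms] + [chunk_ms] * (num_chunks - 1)


def buffer_leads_ms(num_chunks: int, chunk_ms: int = 100, lead_ms: int = 20) -> list[int]:
    """How far ahead of the previous chunk's end each later chunk gets queued."""
    sleeps = sleep_schedule_ms(num_chunks, chunk_ms, lead_ms)
    leads = []
    queued_at = 0
    for i in range(1, num_chunks):
        queued_at += sleeps[i - 1]
        previous_chunk_ends_at = i * chunk_ms
        leads.append(previous_chunk_ends_at - queued_at)
    return leads
```

Under these assumptions every chunk after the first is queued 20ms before the previous one finishes, so the lead is established once and then held constant.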

282-337: LGTM!

Audio queue clearing is defensive and handles both sync/async interrupt methods correctly. The broad exception handler at line 305 is justified—it's logged at debug level and prevents clearing failures from propagating. Global singleton API is clean and straightforward.
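One common way to support both sync and async interrupt hooks is to call the hook and await the result only when it is awaitable. This is a sketch of that general pattern; `call_maybe_async` is a hypothetical helper, and the PR may do this differently.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_maybe_async(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func and await its result only if it returned an awaitable."""
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        return await result
    return result


if __name__ == "__main__":
    calls = []

    def sync_interrupt() -> None:
        calls.append("sync")

    async def async_interrupt() -> None:
        calls.append("async")

    async def main() -> None:
        await call_maybe_async(sync_interrupt)
        await call_maybe_async(async_interrupt)

    asyncio.run(main())
    print(calls)
```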

app/agents/voice/automatic/processors/llm_spy.py (5)

182-197: LGTM!

TextFrame handling correctly stops audio immediately on non-empty bot output and notifies the audio manager that the bot is speaking. The logic properly preserves existing text sanitization flow.

203-226: LGTM!

LLM response handling correctly implements the instant-stop-on-text strategy. The response start frame prepares for collection without stopping audio, and text frames trigger immediate audio interruption on the first non-empty output. Chart accumulation is properly guarded.

228-256: LGTM!

Response end handling correctly captures text output before clearing state and implements smart audio resumption logic: if the LLM produced no text (e.g., function calls only) and the user has provided input, waiting audio resumes. This prevents silence during function execution.

258-309: LGTM!

Function call start handling correctly detects the first call in a function call set and initiates waiting audio. The check on is_function_call_set_active ensures audio starts only once per set. Existing tracing and event emission logic is properly preserved.

311-366: LGTM!

Function call result handling correctly detects turn completion (when all function calls finish) and stops the function call set audio. The detection via "turn_complete" event type is appropriate, and the guard on function_call_set_audio_playing prevents unnecessary stops. Existing result emission and chart logic is preserved.
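The start-once, stop-when-all-done behavior described in the two comments above can be sketched as a small tracker. Note the swap: this version counts call ids, whereas the review says the PR detects completion through a "turn_complete" event. `FunctionCallSetTracker` and its methods are hypothetical.

```python
class FunctionCallSetTracker:
    """Hypothetical tracker for function calls that are still in flight."""

    def __init__(self) -> None:
        self._pending: set[str] = set()

    @property
    def is_active(self) -> bool:
        return bool(self._pending)

    def on_call_started(self, call_id: str) -> bool:
        """Return True when this call opens a new set (start waiting audio)."""
        opens_set = not self._pending
        self._pending.add(call_id)
        return opens_set

    def on_call_result(self, call_id: str) -> bool:
        """Return True when this result closes the set (stop waiting audio)."""
        if call_id not in self._pending:
            return False
        self._pending.discard(call_id)
        return not self._pending
```

A caller would start waiting audio only when `on_call_started` returns True and stop it only when `on_call_result` returns True, so parallel calls share one audio session.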

coderabbitai · 2025-09-30T12:46:37Z

app/agents/voice/automatic/__init__.py

+    async def on_function_calls_finished(service):
+        # Stop audio when function calls finish but keep it enabled for potential resumption
+        if audio_manager.is_playing:
+            # Stop current audio but don't disable - allow resumption if no LLM text follows
+            audio_manager.is_playing = False
+            if audio_manager.loop_task and not audio_manager.loop_task.done():
+                audio_manager.loop_task.cancel()
+            logger.info("Function calls finished - stopped audio but kept enabled for resumption")
+        else:
+            logger.debug("Function calls finished - audio not playing, no action needed")


⚠️ Potential issue | 🟠 Major

Add null safety check and address unused parameter.

Two issues:

The service parameter is unused (static analysis hint)

audio_manager is accessed without verifying it was successfully initialized on line 220

If initialize_audio_manager fails or returns None, this code will raise an AttributeError.

-    async def on_function_calls_finished(service):
+    async def on_function_calls_finished(_service):
         # Stop audio when function calls finish but keep it enabled for potential resumption
+        if not audio_manager:
+            logger.warning("Audio manager not available, skipping audio control")
+            return
+
         if audio_manager.is_playing:
             # Stop current audio but don't disable - allow resumption if no LLM text follows
             audio_manager.is_playing = False
             if audio_manager.loop_task and not audio_manager.loop_task.done():
                 audio_manager.loop_task.cancel()
             logger.info("Function calls finished - stopped audio but kept enabled for resumption")
         else:
             logger.debug("Function calls finished - audio not playing, no action needed")


🧰 Tools

🪛 Ruff (0.13.1)

291-291: Unused function argument: service

(ARG001)

🤖 Prompt for AI Agents

In app/agents/voice/automatic/__init__.py around lines 291 to 300, the on_function_calls_finished handler both declares an unused service parameter and accesses audio_manager without null-safety; update the signature to either remove or underscore the unused parameter (e.g., _service) to satisfy static analysis, and guard all accesses to audio_manager with a None-check before touching is_playing or loop_task (if audio_manager is None, log/debug and return). Also ensure you check for the presence of loop_task before calling done()/cancel() to avoid AttributeError.

coderabbitai · 2025-09-30T12:46:38Z

app/agents/voice/automatic/processors/llm_spy.py

+def _stop_audio_immediately(context: str = "unknown") -> bool:
+    """INSTANT audio stopping using simplified AudioManager API."""
+    audio_manager = get_audio_manager()
+    if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period):
+        # Stop function call set audio if active
+        if audio_manager.function_call_set_audio_playing:
+            asyncio.create_task(audio_manager.stop_function_call_set_audio())
+            logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}")
+
+        # Use the simplified stop method for regular audio and grace period
+        asyncio.create_task(audio_manager.stop_and_disable_audio())
+
+        logger.info(f"INSTANT AUDIO STOP: {context}")
+        return True
+    return False


🛠️ Refactor suggestion | 🟠 Major

Track background tasks to prevent silent failures.

The _stop_audio_immediately helper creates fire-and-forget tasks at lines 46 and 50. If these async methods raise exceptions, they will be silently ignored, potentially leaving audio in an inconsistent state.

Store task references and add error handling:

 def _stop_audio_immediately(context: str = "unknown") -> bool:
     """INSTANT audio stopping using simplified AudioManager API."""
     audio_manager = get_audio_manager()
     if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period):
         # Stop function call set audio if active
         if audio_manager.function_call_set_audio_playing:
-            asyncio.create_task(audio_manager.stop_function_call_set_audio())
+            task = asyncio.create_task(audio_manager.stop_function_call_set_audio())
+            task.add_done_callback(lambda t: logger.error(f"Error stopping function call audio: {t.exception()}") if t.exception() else None)
             logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}")

         # Use the simplified stop method for regular audio and grace period
-        asyncio.create_task(audio_manager.stop_and_disable_audio())
+        task = asyncio.create_task(audio_manager.stop_and_disable_audio())
+        task.add_done_callback(lambda t: logger.error(f"Error stopping audio: {t.exception()}") if t.exception() else None)

         logger.info(f"INSTANT AUDIO STOP: {context}")
         return True
     return False


🧰 Tools

🪛 Ruff (0.13.1)

46-46: Store a reference to the return value of asyncio.create_task

(RUF006)

50-50: Store a reference to the return value of asyncio.create_task

(RUF006)

coderabbitai · 2025-09-30T12:46:38Z

app/agents/voice/automatic/processors/user_speaking_audio.py

+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        """Process frames and manage audio based on user speaking events."""
+        await super().process_frame(frame, direction)
+
+
+        # Handle transcription frames - detect actual speech content
+        if isinstance(frame, (TranscriptionFrame, InterimTranscriptionFrame)):
+            # Accept transcription even after PTT release (delayed transcription)
+            if frame.text.strip():
+                if not self._actual_speech_detected:
+                    self._actual_speech_detected = True
+
+                    # If we have a pending audio task waiting for transcription, start audio now
+                    if self._pending_audio_task and not self._pending_audio_task.done():
+                        # logger.info("Starting audio due to delayed transcription")
+                        audio_manager = get_audio_manager()
+                        if audio_manager:
+                            await audio_manager.start_audio()
+
+        # Handle user started speaking events
+        elif isinstance(frame, (
+            UserStartedSpeakingFrame,
+            VADUserStartedSpeakingFrame,
+            EmulateUserStartedSpeakingFrame
+        )):
+            if not self._user_currently_speaking:
+                self._user_currently_speaking = True
+                self._actual_speech_detected = False  # Reset speech detection for new session
+                self._speech_start_time = time.time()  # Record when speech started
+                audio_manager = get_audio_manager()
+                if audio_manager:
+                    # First stop any currently playing audio immediately
+                    if audio_manager.is_playing:
+                        await audio_manager.stop_and_disable_audio()
+                        # logger.info("Stopped playing audio - user started speaking")
+
+                    # Then enable for new input
+                    audio_manager.set_user_input()
+                    # logger.info(f"Audio enabled - user started speaking ({type(frame).__name__})")
+                else:
+                    logger.error("No audio manager found!")
+
+        # Handle user stopped speaking events
+        elif isinstance(frame, (
+            UserStoppedSpeakingFrame,
+            VADUserStoppedSpeakingFrame,
+            EmulateUserStoppedSpeakingFrame
+        )):
+            if self._user_currently_speaking:
+                self._user_currently_speaking = False
+                audio_manager = get_audio_manager()
+                if audio_manager:
+
+                    # Check if transcription was already detected during speaking
+                    if self._actual_speech_detected:
+                        await audio_manager.start_audio()
+                    else:
+                        # Wait for delayed transcription (common case)
+                        self._pending_audio_task = asyncio.create_task(self._wait_for_transcription())
+                else:
+                    logger.error("No audio manager found!")
+
+        # Pass frame through to next processor
+        await self.push_frame(frame, direction)


🛠️ Refactor suggestion | 🟠 Major

Add cleanup method to cancel pending tasks.

The processor creates async tasks but doesn't provide a cleanup mechanism. If the processor is destroyed or the pipeline shuts down, _pending_audio_task may remain active, leading to resource leaks or unexpected behavior.

Add a cleanup method:

async def cleanup(self):
    """Cancel any pending tasks on shutdown."""
    if self._pending_audio_task and not self._pending_audio_task.done():
        self._pending_audio_task.cancel()
        try:
            await self._pending_audio_task
        except asyncio.CancelledError:
            pass
    await super().cleanup()

Verify that pipecat's FrameProcessor base class calls cleanup() during pipeline shutdown, or manually invoke it in the shutdown flow.

🤖 Prompt for AI Agents

In app/agents/voice/automatic/processors/user_speaking_audio.py around lines 48 to 111, the processor creates an asyncio task in _pending_audio_task but lacks a cleanup method to cancel it, which can leak resources on shutdown; add an async cleanup(self) that cancels and awaits the pending task (handling asyncio.CancelledError) and then calls await super().cleanup(); ensure the method checks task.done() before canceling and that pipeline shutdown invokes this cleanup (or document calling it from the shutdown flow).
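The cancel-and-await pattern suggested above can be tried in isolation, without pipecat. The following is only a minimal sketch under stated assumptions: `PendingTaskOwner`, `schedule_wait`, `audio_started`, and the 0.5 s timeout are hypothetical stand-ins, not names from this PR, and the body of `_wait_for_transcription` is a placeholder because the real one is not shown in this hunk.

```python
import asyncio
from typing import Optional


class PendingTaskOwner:
    """Hypothetical stand-in for the processor; not pipecat's FrameProcessor."""

    def __init__(self) -> None:
        self._pending_audio_task: Optional[asyncio.Task] = None
        self.audio_started = False

    async def _wait_for_transcription(self, timeout: float = 0.5) -> None:
        # Placeholder for the real wait; the timeout value is an assumption.
        await asyncio.sleep(timeout)
        self.audio_started = True

    def schedule_wait(self) -> None:
        # Cancel any earlier wait so that at most one task is pending.
        if self._pending_audio_task and not self._pending_audio_task.done():
            self._pending_audio_task.cancel()
        self._pending_audio_task = asyncio.create_task(
            self._wait_for_transcription()
        )

    async def cleanup(self) -> None:
        # Drop the reference first, then cancel and await the task so the
        # cancellation has actually been processed before we return.
        task = self._pending_audio_task
        self._pending_audio_task = None
        if task and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass


async def demo() -> bool:
    owner = PendingTaskOwner()
    owner.schedule_wait()
    await asyncio.sleep(0)  # give the pending task a chance to start
    await owner.cleanup()   # cancels the wait before its timeout elapses
    return owner.audio_started


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the wait is cancelled before its timeout elapses, so the flag that stands in for "audio started" is never set. Whether the real base class invokes `cleanup()` on shutdown still needs to be verified, as noted above.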

app/agents/voice/automatic/processors/user_speaking_audio.py

badri-singhal · 2025-10-03T05:30:12Z

app/agents/voice/automatic/audio/audio_manager.py

@@ -0,0 +1,336 @@
+"""


Move wav file to assets/sounds

coderabbitai bot reviewed Sep 30, 2025

badri-singhal reviewed Oct 3, 2025

naveenJuspay added 11 commits October 6, 2025 15:28

initial - audio for waiting

4313e1f

refactored code

a95b3a4

minor change for ptt to fix audio processing bug

409485b

removed unwanted logs

a930b1d

removed commented code

3d0fa4b

removed unused code

9bfb682

minor change

0ec3871

set audio for function calls

2a7c1e9

to 500ms

3ecefa6

100ms chunks for audio

eda19c1

added 500ms delay for initial audio play

9c29dba

naveenJuspay force-pushed the origin/audio_for_waiting_v2 branch from d86c661 to 9c29dba Compare October 6, 2025 10:36

resolved comments

fe7c255
