Origin/audio for waiting #256

naveenJuspay · 2025-09-24T13:09:44Z

Added waiting audio for AI chat.

Summary by CodeRabbit

New Features
- Event-driven “waiting” audio that starts after you speak and loops seamlessly.
- Instant stop of waiting audio when the assistant starts responding.
- Smarter handling of speech start/stop and late transcriptions to avoid premature audio.
Bug Fixes
- Prevents waiting audio from continuing during assistant speech.
- Removes intrusive tool-call sounds during function calls.
- Reduces edge cases where waiting audio starts without real user input.
Chores
- Improved logging and diagnostics around audio lifecycle and confirmations.

coderabbitai · 2025-09-24T13:09:52Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title "Origin/audio for waiting" clearly references the main feature in the changeset—adding waiting audio—but includes an extraneous branch-style prefix ("Origin/") and is slightly informal; it is related to the PR's purpose but could be clearer and more concise.


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

app/agents/voice/automatic/__init__.py (1)

486-501: Guard against NameError: ptt_vad_filter may be undefined.

ptt_vad_filter is created only under config.DISABLE_VAD_FOR_PTT, but referenced unconditionally here. If the flag is false, this will crash at runtime on PTT messages.

@@
-                elif message_type == "ptt-start":
+                elif message_type == "ptt-start":
                     # Handle PTT start event
                     logger.debug("PTT started - activating VAD filter")
-                    ptt_vad_filter.set_ptt_active(True)
+                    if 'ptt_vad_filter' in locals() and ptt_vad_filter:
+                        ptt_vad_filter.set_ptt_active(True)
+                    else:
+                        logger.debug("PTT VAD filter not enabled; ignoring ptt-start")
                     # Send emulated user started speaking frame
                     await task.queue_frames([EmulateUserStartedSpeakingFrame()])
@@
-                elif message_type == "ptt-end":
+                elif message_type == "ptt-end":
                     # Handle PTT end event
                     logger.debug(
                         "PTT ended - deactivating VAD filter and sending stop frame"
                     )
-                    ptt_vad_filter.set_ptt_active(False)
+                    if 'ptt_vad_filter' in locals() and ptt_vad_filter:
+                        ptt_vad_filter.set_ptt_active(False)
+                    else:
+                        logger.debug("PTT VAD filter not enabled; ignoring ptt-end")
@@
-                elif message_type == "ptt-sync":
+                elif message_type == "ptt-sync":
                     # Handle PTT state synchronization from client
                     client_ptt_state = message.get("data", {}).get("ptt_active", False)
-                    current_state = ptt_vad_filter._ptt_active
+                    current_state = getattr(ptt_vad_filter, "_ptt_active", False) if 'ptt_vad_filter' in locals() and ptt_vad_filter else False
@@
-                    if client_ptt_state != current_state:
+                    if ('ptt_vad_filter' in locals() and ptt_vad_filter) and (client_ptt_state != current_state):
                         logger.warning(
                             f"PTT state mismatch! client: {client_ptt_state}, server: {current_state}"
                         )
                         # Sync to client state (client is authoritative)
                         ptt_vad_filter.set_ptt_active(client_ptt_state)
                         logger.info(f"PTT state synchronized to: {client_ptt_state}")
@@
-                    else:
+                    else:
                         logger.debug(
                             f"PTT state sync: states match (current_state: {current_state})"
                         )
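A less fragile alternative to the repeated `locals()` checks is to bind the name unconditionally and test it against `None`. The sketch below is only an illustration under assumptions: `PTTVADFilter` is a stand-in class, `build_filter` and `handle_ptt_message` are hypothetical helpers, and only `set_ptt_active` is taken from the diff above.

```python
from typing import Optional


class PTTVADFilter:
    """Stand-in for the real filter; only set_ptt_active mirrors the PR."""

    def __init__(self) -> None:
        self.ptt_active = False

    def set_ptt_active(self, active: bool) -> None:
        self.ptt_active = active


def build_filter(disable_vad_for_ptt: bool) -> Optional[PTTVADFilter]:
    # Always return a value so the name is bound even when the flag is off.
    return PTTVADFilter() if disable_vad_for_ptt else None


def handle_ptt_message(ptt_vad_filter: Optional[PTTVADFilter], message_type: str) -> bool:
    """Apply a ptt-start/ptt-end message; return True if the filter was updated."""
    if ptt_vad_filter is None:
        # Flag disabled: nothing to toggle, and no NameError either.
        return False
    if message_type == "ptt-start":
        ptt_vad_filter.set_ptt_active(True)
        return True
    if message_type == "ptt-end":
        ptt_vad_filter.set_ptt_active(False)
        return True
    return False
```

With the name always bound, each branch needs a single `is None` check, and the `ptt-sync` branch can read the state through a public accessor rather than `_ptt_active`.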

🧹 Nitpick comments (20)

app/agents/voice/automatic/__init__.py (6)

5-6: Remove duplicate argparse import.

Second import is redundant and flagged by Ruff (F811).
-import argparse
-from dotenv import load_dotenv
+from dotenv import load_dotenv
10-11: Drop unused imports (pydub, audioop).

Neither AudioSegment nor audioop is used in this module.
-from pydub import AudioSegment
-import audioop
+
27-30: Fix duplicate/possibly unused frame imports.

OutputAudioRawFrame is already imported on Line 25 (F811).

LLMFullResponseStartFrame, FunctionCallInProgressFrame, FunctionCallResultFrame appear unused in this file. If unused, remove to keep imports tight.
-    OutputAudioRawFrame,
-    LLMFullResponseStartFrame,
-    FunctionCallInProgressFrame,
-    FunctionCallResultFrame
+    LLMFullResponseStartFrame,
+    FunctionCallInProgressFrame,
+    FunctionCallResultFrame
If the three frames are unused here, remove them as well:
-    LLMFullResponseStartFrame,
-    FunctionCallInProgressFrame,
-    FunctionCallResultFrame
+
71-82: Remove duplicated/unused re-imports.

These re-imports are duplicates of earlier imports and flagged by Ruff (F811). Also, get_audio_manager is not used in this file.
-from .stt import get_stt_service
-from .audio.audio_manager import initialize_audio_manager, get_audio_manager
-from app.agents.voice.automatic.processors.llm_spy import handle_confirmation_response
-from app.agents.voice.automatic.types import (
-    TTSProvider,
-    Mode,
-    decode_tts_provider,
-    decode_voice_name,
-    decode_mode,
-)
-from opentelemetry import trace
-from langfuse import get_client
+from .audio.audio_manager import initialize_audio_manager
91-91: Consolidate load_dotenv import.

load_dotenv is imported twice (Lines 6 and 13). Keep one.
-load_dotenv(override=True)
+load_dotenv(override=True)
Remove the duplicate import on Line 13 (no code diff shown here as it's not a changed line).

93-103: Remove dead code: tool_call_sound is loaded but never used.

This loads and parses a WAV file with no subsequent use, adding startup I/O without value.
-# Load tool call sound
-tool_call_sound = None
-if config.ENABLE_TOOL_CALL_SOUND and os.path.exists(config.TOOL_CALL_SOUND_FILE):
-    with wave.open(config.TOOL_CALL_SOUND_FILE) as audio_file:
-        tool_call_sound = OutputAudioRawFrame(
-            audio_file.readframes(-1),
-            audio_file.getframerate(),
-            audio_file.getnchannels(),
-        )
+

app/agents/voice/automatic/processors/user_speaking_audio.py (2)

84-91: Stop playing then re-enable input — tighten log and cancel pending wait.

Remove extraneous f-prefix (F541).
Cancel any pending _pending_audio_task when a new speaking session starts to avoid stray timers.

                     if audio_manager.is_playing:
                         await audio_manager.stop_and_disable_audio()
-                        logger.info(f"🛑 Stopped playing audio - user started speaking")
+                        logger.info("🛑 Stopped playing audio - user started speaking")
                     
                     # Then enable for new input
+                    if self._pending_audio_task and not self._pending_audio_task.done():
+                        self._pending_audio_task.cancel()
                     audio_manager.set_user_input()
                     logger.info(f"✅ Audio enabled - user started speaking ({type(frame).__name__})")
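The cancel-before-reschedule idea can be shown in isolation. This is a minimal sketch, assuming a single outstanding timer per processor; `PendingWait`, `schedule`, and `cancel_pending` are hypothetical names, and only `_pending_audio_task` comes from the PR.

```python
import asyncio
from typing import Optional, Tuple


class PendingWait:
    """Hypothetical holder for the single pending transcription-wait task."""

    def __init__(self) -> None:
        self._pending_audio_task: Optional[asyncio.Task] = None

    def cancel_pending(self) -> bool:
        """Cancel the pending task if it is still running."""
        task = self._pending_audio_task
        if task is not None and not task.done():
            task.cancel()
            return True
        return False

    def schedule(self, delay: float) -> asyncio.Task:
        # Replace any earlier timer so only one wait is ever outstanding.
        self.cancel_pending()
        self._pending_audio_task = asyncio.create_task(asyncio.sleep(delay))
        return self._pending_audio_task


async def demo() -> Tuple[bool, bool]:
    holder = PendingWait()
    first = holder.schedule(10.0)   # stale timer from an earlier session
    second = holder.schedule(0.0)   # new session replaces it
    await asyncio.gather(first, second, return_exceptions=True)
    return first.cancelled(), second.cancelled()
```

Running `asyncio.run(demo())` shows the first timer cancelled and the second completing normally.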

127-145: Avoid broad exception; log specifics.

Catching Exception is acceptable here, but consider narrowing to asyncio.TimeoutError/CancelledError and unexpected Exception separately (you already do CancelledError). Also remove extraneous f-prefix on static message if any.

-        except Exception as e:
-            logger.error(f"❌ Error in transcription wait: {e}")
+        except Exception as e:
+            logger.error(f"❌ Error in transcription wait: {e!r}")
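One way to separate the expected timeout from unexpected failures is sketched below. It assumes the wait is built on `asyncio.wait_for` and an `asyncio.Event`, which the review does not confirm; `wait_for_transcription` and the logger name are hypothetical.

```python
import asyncio
import logging

logger = logging.getLogger("waiting_audio_sketch")


async def wait_for_transcription(event: asyncio.Event, timeout: float) -> str:
    """Wait for a transcription signal; return a label describing the outcome."""
    try:
        await asyncio.wait_for(event.wait(), timeout=timeout)
        return "transcribed"
    except asyncio.TimeoutError:
        # Expected path: no transcription arrived in time.
        logger.debug("No transcription within %.1fs", timeout)
        return "timeout"
    except asyncio.CancelledError:
        # Let cancellation propagate so the caller's cancel() takes effect.
        raise
    except Exception as exc:
        # Unexpected failure: log the repr so the exception type is visible.
        logger.error("Error in transcription wait: %r", exc)
        return "error"
```

Keeping the timeout branch at debug level avoids error-level noise for what is a normal outcome.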

app/agents/voice/automatic/processors/llm_spy.py (8)

21-22: Remove unused TTSSpeakFrame import.

This symbol is not used in this file.

-    UserStartedSpeakingFrame,
-    TTSSpeakFrame
+    UserStartedSpeakingFrame,

38-49: Avoid duplicating audio-stop helper; use module helper or store task handle.

You already import stop_audio_immediately but implemented a parallel _stop_audio_immediately using create_task. Prefer the module helper for consistency, or at least store the task to satisfy RUF006.

Option A (preferred): use the module helper

-from app.agents.voice.automatic.audio.audio_manager import get_audio_manager, stop_audio_immediately
-
-def _stop_audio_immediately(context: str = "unknown") -> bool:
-    """INSTANT audio stopping using simplified AudioManager API."""
-    audio_manager = get_audio_manager()
-    if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing):
-        # Use the simplified stop method
-        asyncio.create_task(audio_manager.stop_and_disable_audio())
-        
-        logger.info(f"🚨 INSTANT AUDIO STOP: {context} - simplified stop triggered")
-        return True
-    return False
+from app.agents.voice.automatic.audio.audio_manager import stop_audio_immediately
+
+async def _stop_audio_immediately(context: str = "unknown") -> bool:
+    """INSTANT audio stopping using simplified AudioManager API."""
+    await stop_audio_immediately()
+    logger.info(f"🚨 INSTANT AUDIO STOP: {context} - simplified stop triggered")
+    return True

Then await the helper at call sites.

Option B: keep local helper but store the task

-        asyncio.create_task(audio_manager.stop_and_disable_audio())
+        task = asyncio.create_task(audio_manager.stop_and_disable_audio())
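+        # Optional: log failures explicitly (a bare t.exception() call would only mark them as retrieved)
+        task.add_done_callback(
+            lambda t: logger.error(f"Audio stop task failed: {t.exception()!r}")
+            if not t.cancelled() and t.exception() else None
+        )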

177-186: Remove f-prefix from constant strings and keep single responsibility.

Lines 185/others: f-strings without placeholders (F541). Drop the f.
The set_bot_speaking(True) call is good; keep it.

-                        logger.info(f"🚨 INSTANT STOP: TextFrame detected - audio interrupted immediately")
+                        logger.info("🚨 INSTANT STOP: TextFrame detected - audio interrupted immediately")

198-213: Remove f-prefix in static log.

(F541)

-            logger.debug(f"🤖 LLM processing started - audio continues, preparing for instant stop on text output")
+            logger.debug("🤖 LLM processing started - audio continues, preparing for instant stop on text output")

216-221: Remove f-prefix in static log.

(F541)

-                logger.info(f"🚨 INSTANT STOP: LLMTextFrame detected - audio interrupted with zero delay")
+                logger.info("🚨 INSTANT STOP: LLMTextFrame detected - audio interrupted with zero delay")

227-239: Remove f-prefix in static log and ensure await if you switched helper to async.

Lines 234: static log (F541).
If you chose Option A above (async helper), remember to await _stop_audio_immediately here.

-                _stop_audio_immediately("Response Complete - Final Stop")
-                logger.info(f"🚨 FINAL STOP: LLM response complete with text output - audio fully stopped")
+                _stop_audio_immediately("Response Complete - Final Stop")
+                logger.info("🚨 FINAL STOP: LLM response complete with text output - audio fully stopped")

246-254: Remove f-prefix in static logs.

(F541)

-                    logger.info(f"🔍 LLM response ended with no text output - audio may resume if stopped")
+                    logger.info("🔍 LLM response ended with no text output - audio may resume if stopped")
@@
-                logger.debug(f"🤖 LLM response ended - bot marked as not speaking")
+                logger.debug("🤖 LLM response ended - bot marked as not speaking")

304-307: No change needed; the f-prefix is required here.

             logger.debug(f"🔧 Function result: {frame.function_name} - audio continues (no interruption)")

This line interpolates frame.function_name, so the f-prefix must stay. If Ruff reported F541 in this range, it refers to a different line.

app/agents/voice/automatic/audio/audio_manager.py (4)

94-94: Remove f-prefix in static log.

(F541)

-        logger.info(f"🎵 Started seamless audio - max 3 loops (18s total)")
+        logger.info("🎵 Started seamless audio - max 3 loops (18s total)")

129-134: Keep a reference to created task.

Store the task to satisfy RUF006 and optionally attach a callback to surface exceptions.

-            asyncio.create_task(self.stop_and_disable_audio())
+            task = asyncio.create_task(self.stop_and_disable_audio())
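+            task.add_done_callback(
+                lambda t: logger.error(f"Audio stop task failed: {t.exception()!r}")
+                if not t.cancelled() and t.exception() else None
+            )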

33-61: Narrow exception when loading audio.

Catching broad Exception masks issues. Consider narrowing to (FileNotFoundError, OSError, Exception from pydub) or keep broad but log repr.

-        except Exception as e:
-            logger.error(f"Failed to load waiting_{int(AUDIO_LENGTH_SECONDS)}sec.wav: {e}")
+        except Exception as e:
+            logger.error(f"Failed to load waiting_{int(AUDIO_LENGTH_SECONDS)}sec.wav: {e!r}")
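A narrower version of the loader might look like the sketch below. It assumes the file is read with the standard-library `wave` module; the PR may use pydub instead, in which case the exception types would differ. `load_wav` is a hypothetical name.

```python
import logging
import wave
from typing import Optional, Tuple

logger = logging.getLogger("waiting_audio_sketch")


def load_wav(path: str) -> Optional[Tuple[bytes, int, int]]:
    """Return (pcm_bytes, sample_rate, channels), or None if loading fails."""
    try:
        with wave.open(path, "rb") as audio_file:
            return (
                audio_file.readframes(audio_file.getnframes()),
                audio_file.getframerate(),
                audio_file.getnchannels(),
            )
    except FileNotFoundError:
        logger.error("Waiting audio not found: %s", path)
    except (wave.Error, EOFError, OSError) as exc:
        # wave.Error covers malformed headers; EOFError covers truncated files.
        logger.error("Failed to load %s: %r", path, exc)
    return None
```

Returning `None` keeps the caller's existing "no chunks loaded" precondition usable as the failure signal.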

198-225: Avoid broad exceptions; keep logs actionable.

These blocks are fine operationally; prefer narrowing but at minimum log repr.

-        except Exception as e:
-            logger.error(f"❌ Error in audio streaming: {e}")
+        except Exception as e:
+            logger.error(f"❌ Error in audio streaming: {e!r}")
@@
-        except Exception as e:
-            logger.debug(f"Audio queue clearing failed: {e}")
+        except Exception as e:
+            logger.debug(f"Audio queue clearing failed: {e!r}")

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 615bcc9 and bcc5a10.

⛔ Files ignored due to path filters (5)

app/agents/voice/automatic/audio/waiting_1sec.wav is excluded by !**/*.wav
app/agents/voice/automatic/audio/waiting_2sec.wav is excluded by !**/*.wav
app/agents/voice/automatic/audio/waiting_3sec.wav is excluded by !**/*.wav
app/agents/voice/automatic/audio/waiting_4sec.wav is excluded by !**/*.wav
app/agents/voice/automatic/audio/waiting_6sec.wav is excluded by !**/*.wav

📒 Files selected for processing (5)

app/agents/voice/automatic/__init__.py (6 hunks)
app/agents/voice/automatic/audio/audio_manager.py (1 hunks)
app/agents/voice/automatic/processors/llm_spy.py (5 hunks)
app/agents/voice/automatic/processors/user_speaking_audio.py (1 hunks)
app/agents/voice/automatic/utils/conversation_manager.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

app/agents/voice/automatic/processors/user_speaking_audio.py (2)

app/agents/voice/automatic/audio/audio_manager.py (4)

get_audio_manager (231-233)

start_audio (68-94)

stop_and_disable_audio (96-116)

set_user_input (63-66)

app/agents/voice/automatic/processors/llm_spy.py (1)

process_frame (173-352)

app/agents/voice/automatic/processors/llm_spy.py (3)

app/agents/voice/automatic/audio/audio_manager.py (4)

get_audio_manager (231-233)

stop_audio_immediately (250-254)

stop_and_disable_audio (96-116)

set_bot_speaking (129-133)

app/agents/voice/automatic/utils/conversation_manager.py (2)

start_turn_with_events (243-264)

add_llm_response_with_events (266-288)

app/agents/voice/automatic/rtvi/rtvi.py (1)

emit_rtvi_event (6-13)

app/agents/voice/automatic/utils/conversation_manager.py (1)

app/agents/voice/automatic/audio/audio_manager.py (2)

get_audio_manager (231-233)

reset_for_new_input (285-289)

app/agents/voice/automatic/__init__.py (6)

app/agents/voice/automatic/processors/user_speaking_audio.py (1)

UserSpeakingAudioProcessor (29-145)

app/agents/voice/automatic/tts/__init__.py (1)

get_tts_service (15-67)

app/agents/voice/automatic/audio/audio_manager.py (3)

initialize_audio_manager (242-246)

get_audio_manager (231-233)

stop_and_disable_audio (96-116)

app/agents/voice/automatic/processors/llm_spy.py (1)

handle_confirmation_response (115-130)

app/agents/voice/automatic/types/models.py (2)

TTSProvider (8-10)

Mode (19-21)

app/agents/voice/automatic/types/decoders.py (3)

decode_tts_provider (4-11)

decode_voice_name (14-21)

decode_mode (24-31)

🪛 Ruff (0.13.1)

app/agents/voice/automatic/processors/user_speaking_audio.py

87-87: f-string without any placeholders

Remove extraneous f prefix

(F541)

144-144: Do not catch blind exception: Exception

(BLE001)

app/agents/voice/automatic/audio/audio_manager.py

59-59: Do not catch blind exception: Exception

(BLE001)

94-94: f-string without any placeholders

Remove extraneous f prefix

(F541)

133-133: Store a reference to the return value of asyncio.create_task

(RUF006)

192-192: Do not catch blind exception: Exception

(BLE001)

223-223: Do not catch blind exception: Exception

(BLE001)

app/agents/voice/automatic/processors/llm_spy.py

45-45: Store a reference to the return value of asyncio.create_task

(RUF006)

185-185: f-string without any placeholders

Remove extraneous f prefix

(F541)

200-200: f-string without any placeholders

Remove extraneous f prefix

(F541)

220-220: f-string without any placeholders

Remove extraneous f prefix

(F541)

234-234: f-string without any placeholders

Remove extraneous f prefix

(F541)

250-250: f-string without any placeholders

Remove extraneous f prefix

(F541)

253-253: f-string without any placeholders

Remove extraneous f prefix

(F541)

app/agents/voice/automatic/__init__.py

5-5: Redefinition of unused argparse from line 1

Remove definition: argparse

(F811)

27-27: Redefinition of unused OutputAudioRawFrame from line 25

Remove definition: OutputAudioRawFrame

(F811)

71-71: Redefinition of unused get_stt_service from line 68

Remove definition: get_stt_service

(F811)

73-73: Redefinition of unused handle_confirmation_response from line 42

Remove definition: handle_confirmation_response

(F811)

75-75: Redefinition of unused TTSProvider from line 51

Remove definition: TTSProvider

(F811)

76-76: Redefinition of unused Mode from line 50

Remove definition: Mode

(F811)

77-77: Redefinition of unused decode_tts_provider from line 53

Remove definition: decode_tts_provider

(F811)

78-78: Redefinition of unused decode_voice_name from line 54

Remove definition: decode_voice_name

(F811)

79-79: Redefinition of unused decode_mode from line 52

Remove definition: decode_mode

(F811)

81-81: Redefinition of unused trace from line 15

Remove definition: trace

(F811)

82-82: Redefinition of unused get_client from line 14

Remove definition: get_client

(F811)

304-304: Unused function argument: service

(ARG001)

334-334: Unused function argument: service

(ARG001)

345-345: Unused function argument: service

(ARG001)

🔇 Additional comments (7)

app/agents/voice/automatic/__init__.py (4)

66-66: Good: Add UserSpeakingAudioProcessor to pipeline.

Placing it after STT/PTT and before RTVI is correct for reacting to speaking/transcription frames.

237-239: Initialize audio manager once — LGTM.

Initialization early with the TTS instance aligns with the new audio lifecycle.

333-356: Stop audio on function-call completion — LGTM.

Immediate stop via audio_manager.stop_and_disable_audio() on completion is correct.

384-389: Pipeline wiring — LGTM.

UserSpeakingAudioProcessor placement and logging are appropriate.

app/agents/voice/automatic/processors/user_speaking_audio.py (1)

49-70: LGTM: Transcription-triggered start.

Starting audio when transcription arrives (even post-PTT) is aligned with the intended UX.

app/agents/voice/automatic/processors/llm_spy.py (1)

258-261: Narration-only logs — OK to keep as debug.

No functional issues; the “audio continues” decision for function calls matches PR intent.

app/agents/voice/automatic/audio/audio_manager.py (1)

68-95: LGTM: Preconditions and idempotent start.

Checks for user_has_input, chunks presence, and is_playing are solid.

coderabbitai · 2025-09-24T13:16:34Z

app/agents/voice/automatic/utils/conversation_manager.py

+        # MINIMAL QUEUE: Only reset audio for ACTUAL user input, not function calls
+        from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
+        audio_manager = get_audio_manager()
+        if audio_manager:
+            # Check if this is a real user input or just function call processing
+            is_real_user_input = not content.startswith("[Inferred from voice]") and "function" not in content.lower()
+
+            if is_real_user_input:
+                audio_manager.reset_for_new_input()
+                logger.debug(f"MINIMAL QUEUE: Reset audio for real user input: {content[:50]}...")
+            else:
+                logger.debug(f"MINIMAL QUEUE: Skipping reset for function/inferred content: {content[:50]}...")
+


⚠️ Potential issue

Fix AttributeError: calling non-existent method reset_for_new_input() on AudioManager.

AudioManager exposes reset() as an instance method. reset_for_new_input() is a module-level helper (not on the instance). Current code will raise AttributeError.

-                audio_manager.reset_for_new_input()
+                audio_manager.reset()

Alternatively, call the module-level helper:

-        from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
+        from app.agents.voice.automatic.audio.audio_manager import get_audio_manager, reset_for_new_input
@@
-                audio_manager.reset_for_new_input()
+                reset_for_new_input()
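The difference between the instance method and the module-level helper can be illustrated with a stripped-down stand-in. This is a sketch under assumptions: the real `AudioManager`, `initialize_audio_manager`, and `reset()` take different arguments and do more work, and the state fields shown are only illustrative.

```python
from typing import Optional


class AudioManager:
    """Stand-in with only the state needed for this illustration."""

    def __init__(self) -> None:
        self.user_has_input = True
        self.is_playing = True

    def reset(self) -> None:
        # Assumed behavior: clear per-turn state.
        self.user_has_input = False
        self.is_playing = False


_audio_manager: Optional[AudioManager] = None


def initialize_audio_manager() -> AudioManager:
    """Simplified: the real initializer receives the TTS instance."""
    global _audio_manager
    _audio_manager = AudioManager()
    return _audio_manager


def get_audio_manager() -> Optional[AudioManager]:
    return _audio_manager


def reset_for_new_input() -> None:
    """Module-level helper: safe to call even before initialization."""
    manager = get_audio_manager()
    if manager is not None:
        manager.reset()
```

Here `audio_manager.reset_for_new_input()` would raise `AttributeError`, while both `audio_manager.reset()` and the module-level `reset_for_new_input()` work.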

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

         # MINIMAL QUEUE: Only reset audio for ACTUAL user input, not function calls
         from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
         audio_manager = get_audio_manager()
         if audio_manager:
             # Check if this is a real user input or just function call processing
             is_real_user_input = not content.startswith("[Inferred from voice]") and "function" not in content.lower()

             if is_real_user_input:
-                audio_manager.reset_for_new_input()
+                audio_manager.reset()
                 logger.debug(f"MINIMAL QUEUE: Reset audio for real user input: {content[:50]}...")
             else:
                 logger.debug(f"MINIMAL QUEUE: Skipping reset for function/inferred content: {content[:50]}...")

🤖 Prompt for AI Agents

In app/agents/voice/automatic/utils/conversation_manager.py around lines 130 to 142, the code calls audio_manager.reset_for_new_input() which doesn't exist on the AudioManager instance and raises AttributeError; replace that call with the actual instance method audio_manager.reset(), or if the intent was to use the module-level helper, import and call reset_for_new_input() from app.agents.voice.automatic.audio.audio_manager at the top and use that function instead, ensuring the chosen approach matches other callers and updating the corresponding import/usage accordingly.

Copilot

Pull Request Overview

This PR adds an event-driven waiting audio system for AI chat that provides user feedback during processing. The system starts audio after user speech ends and stops immediately when the assistant begins responding, with smart handling to avoid audio during function calls.

- Introduces seamless looping waiting audio (6-second clips with chunked playback)
- Implements immediate audio interruption when bot starts speaking
- Adds intelligent filtering to prevent audio during function calls and tool usage
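The chunked, looped playback with immediate stop described above can be sketched roughly as follows. This is not the PR's implementation: `split_into_chunks`, `play_looped`, the list used as a sink, and the `asyncio.Event` stop signal are all assumptions made for illustration; only the three-loop cap echoes the log message quoted in the review.

```python
import asyncio
from typing import List


def split_into_chunks(pcm: bytes, chunk_size: int) -> List[bytes]:
    """Split raw audio bytes into fixed-size chunks (last one may be shorter)."""
    return [pcm[i:i + chunk_size] for i in range(0, len(pcm), chunk_size)]


async def play_looped(chunks: List[bytes], stop_event: asyncio.Event,
                      sink: List[bytes], max_loops: int = 3) -> int:
    """Send chunks to sink, looping up to max_loops; return chunks sent."""
    sent = 0
    for _ in range(max_loops):
        for chunk in chunks:
            # Checking before every chunk bounds stop latency to one chunk.
            if stop_event.is_set():
                return sent
            sink.append(chunk)
            sent += 1
            await asyncio.sleep(0)  # yield so a stop request can be observed
    return sent
```

Smaller chunks shorten the delay between a stop request and silence, at the cost of more frames queued per second.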

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`app/agents/voice/automatic/audio/audio_manager.py`	Core audio management with chunked playback and immediate stop capabilities
`app/agents/voice/automatic/processors/user_speaking_audio.py`	Processor to handle audio based on user speaking events with transcription timing
`app/agents/voice/automatic/processors/llm_spy.py`	Enhanced LLM processor with instant audio stopping on text output
`app/agents/voice/automatic/utils/conversation_manager.py`	Audio reset logic for real user input vs function calls
`app/agents/voice/automatic/__init__.py`	Integration of audio manager and processors into pipeline


Copilot · 2025-09-25T08:33:13Z

app/agents/voice/automatic/processors/user_speaking_audio.py

+    def __init__(self, name: str = "UserSpeakingAudioProcessor"):
+        super().__init__(name=name)
+        self._user_currently_speaking = False
+        self._actual_speech_detected = False  # Track if transcription was received
+        self._speech_start_time = None  # Track when speech started
+        self._min_speech_duration = 2.0  # Minimum duration (seconds) for fallback (only for very long holds)
+        self._pending_audio_task = None  # Task waiting for transcription
+        self._transcription_timeout = 3.0  # Wait up to 3 seconds for transcription after PTT release


This hardcoded timeout value should be configurable. Consider making it a class parameter or configuration constant to allow tuning without code changes.

Suggested change

-    def __init__(self, name: str = "UserSpeakingAudioProcessor"):
+    def __init__(self, name: str = "UserSpeakingAudioProcessor", transcription_timeout: float = 3.0):
         super().__init__(name=name)
         self._user_currently_speaking = False
         self._actual_speech_detected = False  # Track if transcription was received
         self._speech_start_time = None  # Track when speech started
         self._min_speech_duration = 2.0  # Minimum duration (seconds) for fallback (only for very long holds)
         self._pending_audio_task = None  # Task waiting for transcription
-        self._transcription_timeout = 3.0  # Wait up to 3 seconds for transcription after PTT release
+        self._transcription_timeout = transcription_timeout  # Wait up to transcription_timeout seconds for transcription after PTT release

Copilot · 2025-09-25T08:33:13Z

app/agents/voice/automatic/processors/user_speaking_audio.py

+        self._user_currently_speaking = False
+        self._actual_speech_detected = False  # Track if transcription was received
+        self._speech_start_time = None  # Track when speech started
+        self._min_speech_duration = 2.0  # Minimum duration (seconds) for fallback (only for very long holds)


This minimum speech duration is defined but never used in the code. Either implement the duration check or remove this unused variable.
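If the duration check is kept rather than removed, it could look something like the sketch below. `SpeechTimer` and its injectable `clock` are hypothetical; only the 2.0-second default and the `_speech_start_time` and `_min_speech_duration` names come from the PR.

```python
import time
from typing import Callable, Optional


class SpeechTimer:
    """Hypothetical helper that tracks how long the user has been speaking."""

    def __init__(self, min_speech_duration: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_speech_duration = min_speech_duration
        self._clock = clock
        self._speech_start_time: Optional[float] = None

    def start(self) -> None:
        self._speech_start_time = self._clock()

    def stop(self) -> bool:
        """End the session; return True if it met the minimum duration."""
        if self._speech_start_time is None:
            return False
        elapsed = self._clock() - self._speech_start_time
        self._speech_start_time = None
        return elapsed >= self._min_speech_duration
```

Using a monotonic clock avoids false results if the system time is adjusted mid-session.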

Copilot · 2025-09-25T08:33:13Z

app/agents/voice/automatic/processors/llm_spy.py

+def _stop_audio_immediately(context: str = "unknown") -> bool:
+    """INSTANT audio stopping using simplified AudioManager API."""
+    audio_manager = get_audio_manager()
+    if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing):
+        # Use the simplified stop method
+        asyncio.create_task(audio_manager.stop_and_disable_audio())
+
+        logger.info(f"INSTANT AUDIO STOP: {context}")
+        return True
+    return False


Creating a fire-and-forget task with asyncio.create_task() in a non-async function can lead to race conditions and unhandled exceptions. Consider making this function async or properly awaiting the task in the caller.

Copilot · 2025-09-25T08:33:14Z

app/agents/voice/automatic/audio/audio_manager.py

+from app.core.logger import logger
+
+# Configurable audio length constant
+AUDIO_LENGTH_SECONDS = 6  # Default duration in seconds


This global constant should be configurable through the configuration system rather than hardcoded. Consider moving it to the config module or making it a parameter.
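One possible shape for a configurable value is sketched here. The environment variable name `WAITING_AUDIO_LENGTH_SECONDS` and the function `read_audio_length_seconds` are hypothetical, and the supported lengths are inferred from the waiting_*sec.wav files listed in this PR rather than confirmed by the code.

```python
import os
from typing import Mapping, Optional

# Inferred from the bundled waiting_{N}sec.wav files; an assumption.
SUPPORTED_LENGTHS = (1, 2, 3, 4, 6)
DEFAULT_LENGTH = 6


def read_audio_length_seconds(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the clip length from the environment, falling back to the default."""
    source = os.environ if env is None else env
    raw = source.get("WAITING_AUDIO_LENGTH_SECONDS")
    if raw is None:
        return DEFAULT_LENGTH
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_LENGTH
    # Reject lengths with no matching audio file.
    return value if value in SUPPORTED_LENGTHS else DEFAULT_LENGTH
```

Validating against the known clip lengths avoids a load failure at startup when the setting points at a file that does not exist.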

Copilot · 2025-09-25T08:33:14Z

app/agents/voice/automatic/__init__.py

+    # LLM response started handler - DO NOT start audio here
+    # Audio should only start when user stops speaking, not when LLM processes
+    @llm.event_handler("on_llm_response_started")
+    async def on_llm_response_started(service,function_calls):


Missing space after comma in function parameter list. Should be service, function_calls.

Suggested change

-    async def on_llm_response_started(service,function_calls):
+    async def on_llm_response_started(service, function_calls):

Copilot · 2025-09-25T08:33:14Z

app/agents/voice/automatic/__init__.py

+from pydub import AudioSegment
+import audioop

 from dotenv import load_dotenv


Duplicate import of load_dotenv - it's imported again on line 13 and was already imported earlier. Remove the duplicate import.

Suggested change

from dotenv import load_dotenv

naveenJuspay added 3 commits September 24, 2025 18:14

initial - audio for waiting

cf9afad

refactored code

702ece2

minor change for ptt to fix audio processing bug

bcc5a10

coderabbitai bot reviewed Sep 24, 2025


naveenJuspay added 2 commits September 25, 2025 13:33

removed unwanted logs

3a1de70

removed commented code

2c7ed25

murdore requested a review from Copilot September 25, 2025 08:31

Copilot AI reviewed Sep 25, 2025


naveenJuspay added 2 commits September 26, 2025 13:16

removed unused code

e55280a

minor change

bd96a7e
