@naveenJuspay naveenJuspay commented Sep 24, 2025

Added waiting audio for AI chat.

Summary by CodeRabbit

  • New Features
    • Event-driven “waiting” audio that starts after you speak and loops seamlessly.
    • Instant stop of waiting audio when the assistant starts responding.
    • Smarter handling of speech start/stop and late transcriptions to avoid premature audio.
  • Bug Fixes
    • Prevents waiting audio from continuing during assistant speech.
    • Removes intrusive tool-call sounds during function calls.
    • Reduces edge cases where waiting audio starts without real user input.
  • Chores
    • Improved logging and diagnostics around audio lifecycle and confirmations.

coderabbitai bot commented Sep 24, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "Origin/audio for waiting" clearly references the main feature in the changeset (adding waiting audio) but includes an extraneous branch-style prefix ("Origin/") and is slightly informal; it is related to the PR's purpose but could be clearer and more concise. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/agents/voice/automatic/__init__.py (1)

486-501: Guard against NameError: ptt_vad_filter may be undefined.

ptt_vad_filter is created only under config.DISABLE_VAD_FOR_PTT, but referenced unconditionally here. If the flag is false, this will crash at runtime on PTT messages.

@@
-                elif message_type == "ptt-start":
+                elif message_type == "ptt-start":
                     # Handle PTT start event
                     logger.debug("PTT started - activating VAD filter")
-                    ptt_vad_filter.set_ptt_active(True)
+                    if 'ptt_vad_filter' in locals() and ptt_vad_filter:
+                        ptt_vad_filter.set_ptt_active(True)
+                    else:
+                        logger.debug("PTT VAD filter not enabled; ignoring ptt-start")
                     # Send emulated user started speaking frame
                     await task.queue_frames([EmulateUserStartedSpeakingFrame()])
@@
-                elif message_type == "ptt-end":
+                elif message_type == "ptt-end":
                     # Handle PTT end event
                     logger.debug(
                         "PTT ended - deactivating VAD filter and sending stop frame"
                     )
-                    ptt_vad_filter.set_ptt_active(False)
+                    if 'ptt_vad_filter' in locals() and ptt_vad_filter:
+                        ptt_vad_filter.set_ptt_active(False)
+                    else:
+                        logger.debug("PTT VAD filter not enabled; ignoring ptt-end")
@@
-                elif message_type == "ptt-sync":
+                elif message_type == "ptt-sync":
                     # Handle PTT state synchronization from client
                     client_ptt_state = message.get("data", {}).get("ptt_active", False)
-                    current_state = ptt_vad_filter._ptt_active
+                    current_state = getattr(ptt_vad_filter, "_ptt_active", False) if 'ptt_vad_filter' in locals() and ptt_vad_filter else False
@@
-                    if client_ptt_state != current_state:
+                    if ('ptt_vad_filter' in locals() and ptt_vad_filter) and (client_ptt_state != current_state):
                         logger.warning(
                             f"PTT state mismatch! client: {client_ptt_state}, server: {current_state}"
                         )
                         # Sync to client state (client is authoritative)
                         ptt_vad_filter.set_ptt_active(client_ptt_state)
                         logger.info(f"PTT state synchronized to: {client_ptt_state}")
@@
-                    else:
+                    else:
                         logger.debug(
                             f"PTT state sync: states match (current_state: {current_state})"
                         )
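
The repeated `'ptt_vad_filter' in locals()` checks in the diff above do guard against the NameError, but they are verbose. A minimal alternative sketch, assuming the filter can be bound to `None` up front when `config.DISABLE_VAD_FOR_PTT` is false. `PTTVADFilter`, `make_ptt_filter`, and `handle_ptt_message` are hypothetical stand-ins for illustration, not the PR's actual classes or handlers:

```python
from typing import Optional


class PTTVADFilter:
    """Hypothetical stand-in for the PR's PTT VAD filter."""

    def __init__(self) -> None:
        self._ptt_active = False

    def set_ptt_active(self, active: bool) -> None:
        self._ptt_active = active


def make_ptt_filter(disable_vad_for_ptt: bool) -> Optional[PTTVADFilter]:
    # Always return a value so the name is bound on every code path.
    return PTTVADFilter() if disable_vad_for_ptt else None


def handle_ptt_message(ptt_vad_filter: Optional[PTTVADFilter], message_type: str) -> str:
    # A single "is None" check replaces the repeated locals() lookups.
    if ptt_vad_filter is None:
        return f"ignored {message_type}: PTT VAD filter not enabled"
    if message_type == "ptt-start":
        ptt_vad_filter.set_ptt_active(True)
    elif message_type == "ptt-end":
        ptt_vad_filter.set_ptt_active(False)
    return f"handled {message_type}"
```

With the name always bound, the message handlers need only one explicit `None` check each, and a static checker can verify the optional type.
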
🧹 Nitpick comments (20)
app/agents/voice/automatic/__init__.py (6)

5-6: Remove duplicate argparse import.

Second import is redundant and flagged by Ruff (F811).

-import argparse
-from dotenv import load_dotenv
+from dotenv import load_dotenv

10-11: Drop unused imports (pydub, audioop).

Neither AudioSegment nor audioop is used in this module.

-from pydub import AudioSegment
-import audioop
+

27-30: Fix duplicate/possibly unused frame imports.

  • OutputAudioRawFrame is already imported on Line 25 (F811).
  • LLMFullResponseStartFrame, FunctionCallInProgressFrame, FunctionCallResultFrame appear unused in this file. If unused, remove to keep imports tight.
-    OutputAudioRawFrame,
-    LLMFullResponseStartFrame,
-    FunctionCallInProgressFrame,
-    FunctionCallResultFrame
+    LLMFullResponseStartFrame,
+    FunctionCallInProgressFrame,
+    FunctionCallResultFrame

If the three frames are unused here, remove them as well:

-    LLMFullResponseStartFrame,
-    FunctionCallInProgressFrame,
-    FunctionCallResultFrame
+

71-82: Remove duplicated/unused re-imports.

These re-imports are duplicates of earlier imports and flagged by Ruff (F811). Also, get_audio_manager is not used in this file.

-from .stt import get_stt_service
-from .audio.audio_manager import initialize_audio_manager, get_audio_manager
-from app.agents.voice.automatic.processors.llm_spy import handle_confirmation_response
-from app.agents.voice.automatic.types import (
-    TTSProvider,
-    Mode,
-    decode_tts_provider,
-    decode_voice_name,
-    decode_mode,
-)
-from opentelemetry import trace
-from langfuse import get_client
+from .audio.audio_manager import initialize_audio_manager

91-91: Consolidate load_dotenv import.

load_dotenv is imported twice (Lines 6 and 13). Keep one.

-load_dotenv(override=True)
+load_dotenv(override=True)

Remove the duplicate import on Line 13 (no code diff shown here as it's not a changed line).


93-103: Remove dead code: tool_call_sound is loaded but never used.

This loads and parses a WAV file with no subsequent use, adding startup I/O without value.

-# Load tool call sound
-tool_call_sound = None
-if config.ENABLE_TOOL_CALL_SOUND and os.path.exists(config.TOOL_CALL_SOUND_FILE):
-    with wave.open(config.TOOL_CALL_SOUND_FILE) as audio_file:
-        tool_call_sound = OutputAudioRawFrame(
-            audio_file.readframes(-1),
-            audio_file.getframerate(),
-            audio_file.getnchannels(),
-        )
+
app/agents/voice/automatic/processors/user_speaking_audio.py (2)

84-91: Stop playing then re-enable input — tighten log and cancel pending wait.

  • Remove extraneous f-prefix (F541).
  • Cancel any pending _pending_audio_task when a new speaking session starts to avoid stray timers.
                     if audio_manager.is_playing:
                         await audio_manager.stop_and_disable_audio()
-                        logger.info(f"🛑 Stopped playing audio - user started speaking")
+                        logger.info("🛑 Stopped playing audio - user started speaking")
                     
                     # Then enable for new input
+                    if self._pending_audio_task and not self._pending_audio_task.done():
+                        self._pending_audio_task.cancel()
                     audio_manager.set_user_input()
                     logger.info(f"✅ Audio enabled - user started speaking ({type(frame).__name__})")

127-145: Avoid broad exception; log specifics.

Catching Exception is acceptable here, but consider narrowing to asyncio.TimeoutError/CancelledError and unexpected Exception separately (you already do CancelledError). Also remove extraneous f-prefix on static message if any.

-        except Exception as e:
-            logger.error(f"❌ Error in transcription wait: {e}")
+        except Exception as e:
+            logger.error(f"❌ Error in transcription wait: {e!r}")
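
As an illustration of the narrowing suggested above, the sketch below separates the expected outcomes (timeout, cancellation) from unexpected errors. It is not the PR's code: `wait_for_transcription` and the use of an `asyncio.Event` as the transcription signal are assumptions made for the example.

```python
import asyncio
import logging

logger = logging.getLogger("waiting_audio_sketch")


async def wait_for_transcription(event: asyncio.Event, timeout: float) -> str:
    """Wait for a transcription signal and report how the wait ended."""
    try:
        await asyncio.wait_for(event.wait(), timeout=timeout)
        return "transcribed"
    except asyncio.TimeoutError:
        # Expected outcome: no transcription arrived in time.
        logger.debug("No transcription within %.1fs", timeout)
        return "timeout"
    except asyncio.CancelledError:
        # Expected outcome: a new speaking session superseded this wait.
        logger.debug("Transcription wait cancelled")
        raise
    except Exception as e:
        # Unexpected: log the repr so the exception type is visible.
        logger.error("Error in transcription wait: %r", e)
        return "error"
```

Re-raising `CancelledError` lets the caller that cancelled the task see the cancellation complete normally.
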
app/agents/voice/automatic/processors/llm_spy.py (8)

21-22: Remove unused TTSSpeakFrame import.

This symbol is not used in this file.

-    UserStartedSpeakingFrame,
-    TTSSpeakFrame
+    UserStartedSpeakingFrame,

38-49: Avoid duplicating audio-stop helper; use module helper or store task handle.

You already import stop_audio_immediately but implemented a parallel _stop_audio_immediately using create_task. Prefer the module helper for consistency, or at least store the task to satisfy RUF006.

Option A (preferred): use the module helper

-from app.agents.voice.automatic.audio.audio_manager import get_audio_manager, stop_audio_immediately
-
-def _stop_audio_immediately(context: str = "unknown") -> bool:
-    """INSTANT audio stopping using simplified AudioManager API."""
-    audio_manager = get_audio_manager()
-    if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing):
-        # Use the simplified stop method
-        asyncio.create_task(audio_manager.stop_and_disable_audio())
-        
-        logger.info(f"🚨 INSTANT AUDIO STOP: {context} - simplified stop triggered")
-        return True
-    return False
+from app.agents.voice.automatic.audio.audio_manager import stop_audio_immediately
+
+async def _stop_audio_immediately(context: str = "unknown") -> bool:
+    """INSTANT audio stopping using simplified AudioManager API."""
+    await stop_audio_immediately()
+    logger.info(f"🚨 INSTANT AUDIO STOP: {context} - simplified stop triggered")
+    return True

Then await the helper at call sites.

Option B: keep local helper but store the task

-        asyncio.create_task(audio_manager.stop_and_disable_audio())
+        task = asyncio.create_task(audio_manager.stop_and_disable_audio())
+        # Optional: attach a no-op done callback to surface exceptions in logs
+        task.add_done_callback(lambda t: t.exception())

177-186: Remove f-prefix from constant strings and keep single responsibility.

  • Lines 185/others: f-strings without placeholders (F541). Drop the f.
  • The set_bot_speaking(True) call is good; keep it.
-                        logger.info(f"🚨 INSTANT STOP: TextFrame detected - audio interrupted immediately")
+                        logger.info("🚨 INSTANT STOP: TextFrame detected - audio interrupted immediately")

198-213: Remove f-prefix in static log.

(F541)

-            logger.debug(f"🤖 LLM processing started - audio continues, preparing for instant stop on text output")
+            logger.debug("🤖 LLM processing started - audio continues, preparing for instant stop on text output")

216-221: Remove f-prefix in static log.

(F541)

-                logger.info(f"🚨 INSTANT STOP: LLMTextFrame detected - audio interrupted with zero delay")
+                logger.info("🚨 INSTANT STOP: LLMTextFrame detected - audio interrupted with zero delay")

227-239: Remove f-prefix in static log and ensure await if you switched helper to async.

  • Line 234: static log (F541).
  • If you chose Option A above (async helper), remember to await _stop_audio_immediately here.
-                _stop_audio_immediately("Response Complete - Final Stop")
-                logger.info(f"🚨 FINAL STOP: LLM response complete with text output - audio fully stopped")
+                _stop_audio_immediately("Response Complete - Final Stop")
+                logger.info("🚨 FINAL STOP: LLM response complete with text output - audio fully stopped")

246-254: Remove f-prefix in static logs.

(F541)

-                    logger.info(f"🔍 LLM response ended with no text output - audio may resume if stopped")
+                    logger.info("🔍 LLM response ended with no text output - audio may resume if stopped")
@@
-                logger.debug(f"🤖 LLM response ended - bot marked as not speaking")
+                logger.debug("🤖 LLM response ended - bot marked as not speaking")

304-307: Remove f-prefix in static log.

(F541)

-            logger.debug(f"🔧 Function result: {frame.function_name} - audio continues (no interruption)")
+            logger.debug(f"🔧 Function result: {frame.function_name} - audio continues (no interruption)")

Note: This line interpolates frame.function_name, so keep the f-prefix here. Disregard the Ruff hint if it flagged a different line.

app/agents/voice/automatic/audio/audio_manager.py (4)

94-94: Remove f-prefix in static log.

(F541)

-        logger.info(f"🎵 Started seamless audio - max 3 loops (18s total)")
+        logger.info("🎵 Started seamless audio - max 3 loops (18s total)")

129-134: Keep a reference to created task.

Store the task to satisfy RUF006 and optionally attach a callback to surface exceptions.

-            asyncio.create_task(self.stop_and_disable_audio())
+            task = asyncio.create_task(self.stop_and_disable_audio())
+            task.add_done_callback(lambda t: t.exception())

33-61: Narrow exception when loading audio.

Catching broad Exception masks issues. Consider narrowing to (FileNotFoundError, OSError, Exception from pydub) or keep broad but log repr.

-        except Exception as e:
-            logger.error(f"Failed to load waiting_{int(AUDIO_LENGTH_SECONDS)}sec.wav: {e}")
+        except Exception as e:
+            logger.error(f"Failed to load waiting_{int(AUDIO_LENGTH_SECONDS)}sec.wav: {e!r}")

198-225: Avoid broad exceptions; keep logs actionable.

These blocks are fine operationally; prefer narrowing but at minimum log repr.

-        except Exception as e:
-            logger.error(f"❌ Error in audio streaming: {e}")
+        except Exception as e:
+            logger.error(f"❌ Error in audio streaming: {e!r}")
@@
-        except Exception as e:
-            logger.debug(f"Audio queue clearing failed: {e}")
+        except Exception as e:
+            logger.debug(f"Audio queue clearing failed: {e!r}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 615bcc9 and bcc5a10.

⛔ Files ignored due to path filters (5)
  • app/agents/voice/automatic/audio/waiting_1sec.wav is excluded by !**/*.wav
  • app/agents/voice/automatic/audio/waiting_2sec.wav is excluded by !**/*.wav
  • app/agents/voice/automatic/audio/waiting_3sec.wav is excluded by !**/*.wav
  • app/agents/voice/automatic/audio/waiting_4sec.wav is excluded by !**/*.wav
  • app/agents/voice/automatic/audio/waiting_6sec.wav is excluded by !**/*.wav
📒 Files selected for processing (5)
  • app/agents/voice/automatic/__init__.py (6 hunks)
  • app/agents/voice/automatic/audio/audio_manager.py (1 hunks)
  • app/agents/voice/automatic/processors/llm_spy.py (5 hunks)
  • app/agents/voice/automatic/processors/user_speaking_audio.py (1 hunks)
  • app/agents/voice/automatic/utils/conversation_manager.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
app/agents/voice/automatic/processors/user_speaking_audio.py (2)
app/agents/voice/automatic/audio/audio_manager.py (4)
  • get_audio_manager (231-233)
  • start_audio (68-94)
  • stop_and_disable_audio (96-116)
  • set_user_input (63-66)
app/agents/voice/automatic/processors/llm_spy.py (1)
  • process_frame (173-352)
app/agents/voice/automatic/processors/llm_spy.py (3)
app/agents/voice/automatic/audio/audio_manager.py (4)
  • get_audio_manager (231-233)
  • stop_audio_immediately (250-254)
  • stop_and_disable_audio (96-116)
  • set_bot_speaking (129-133)
app/agents/voice/automatic/utils/conversation_manager.py (2)
  • start_turn_with_events (243-264)
  • add_llm_response_with_events (266-288)
app/agents/voice/automatic/rtvi/rtvi.py (1)
  • emit_rtvi_event (6-13)
app/agents/voice/automatic/utils/conversation_manager.py (1)
app/agents/voice/automatic/audio/audio_manager.py (2)
  • get_audio_manager (231-233)
  • reset_for_new_input (285-289)
app/agents/voice/automatic/__init__.py (6)
app/agents/voice/automatic/processors/user_speaking_audio.py (1)
  • UserSpeakingAudioProcessor (29-145)
app/agents/voice/automatic/tts/__init__.py (1)
  • get_tts_service (15-67)
app/agents/voice/automatic/audio/audio_manager.py (3)
  • initialize_audio_manager (242-246)
  • get_audio_manager (231-233)
  • stop_and_disable_audio (96-116)
app/agents/voice/automatic/processors/llm_spy.py (1)
  • handle_confirmation_response (115-130)
app/agents/voice/automatic/types/models.py (2)
  • TTSProvider (8-10)
  • Mode (19-21)
app/agents/voice/automatic/types/decoders.py (3)
  • decode_tts_provider (4-11)
  • decode_voice_name (14-21)
  • decode_mode (24-31)
🪛 Ruff (0.13.1)
app/agents/voice/automatic/processors/user_speaking_audio.py

87-87: f-string without any placeholders

Remove extraneous f prefix

(F541)


144-144: Do not catch blind exception: Exception

(BLE001)

app/agents/voice/automatic/audio/audio_manager.py

59-59: Do not catch blind exception: Exception

(BLE001)


94-94: f-string without any placeholders

Remove extraneous f prefix

(F541)


133-133: Store a reference to the return value of asyncio.create_task

(RUF006)


192-192: Do not catch blind exception: Exception

(BLE001)


223-223: Do not catch blind exception: Exception

(BLE001)

app/agents/voice/automatic/processors/llm_spy.py

45-45: Store a reference to the return value of asyncio.create_task

(RUF006)


185-185: f-string without any placeholders

Remove extraneous f prefix

(F541)


200-200: f-string without any placeholders

Remove extraneous f prefix

(F541)


220-220: f-string without any placeholders

Remove extraneous f prefix

(F541)


234-234: f-string without any placeholders

Remove extraneous f prefix

(F541)


250-250: f-string without any placeholders

Remove extraneous f prefix

(F541)


253-253: f-string without any placeholders

Remove extraneous f prefix

(F541)

app/agents/voice/automatic/__init__.py

5-5: Redefinition of unused argparse from line 1

Remove definition: argparse

(F811)


27-27: Redefinition of unused OutputAudioRawFrame from line 25

Remove definition: OutputAudioRawFrame

(F811)


71-71: Redefinition of unused get_stt_service from line 68

Remove definition: get_stt_service

(F811)


73-73: Redefinition of unused handle_confirmation_response from line 42

Remove definition: handle_confirmation_response

(F811)


75-75: Redefinition of unused TTSProvider from line 51

Remove definition: TTSProvider

(F811)


76-76: Redefinition of unused Mode from line 50

Remove definition: Mode

(F811)


77-77: Redefinition of unused decode_tts_provider from line 53

Remove definition: decode_tts_provider

(F811)


78-78: Redefinition of unused decode_voice_name from line 54

Remove definition: decode_voice_name

(F811)


79-79: Redefinition of unused decode_mode from line 52

Remove definition: decode_mode

(F811)


81-81: Redefinition of unused trace from line 15

Remove definition: trace

(F811)


82-82: Redefinition of unused get_client from line 14

Remove definition: get_client

(F811)


304-304: Unused function argument: service

(ARG001)


334-334: Unused function argument: service

(ARG001)


345-345: Unused function argument: service

(ARG001)

🔇 Additional comments (7)
app/agents/voice/automatic/__init__.py (4)

66-66: Good: Add UserSpeakingAudioProcessor to pipeline.

Placing it after STT/PTT and before RTVI is correct for reacting to speaking/transcription frames.


237-239: Initialize audio manager once — LGTM.

Initialization early with the TTS instance aligns with the new audio lifecycle.


333-356: Stop audio on function-call completion — LGTM.

Immediate stop via audio_manager.stop_and_disable_audio() on completion is correct.


384-389: Pipeline wiring — LGTM.

UserSpeakingAudioProcessor placement and logging are appropriate.

app/agents/voice/automatic/processors/user_speaking_audio.py (1)

49-70: LGTM: Transcription-triggered start.

Starting audio when transcription arrives (even post-PTT) is aligned with the intended UX.

app/agents/voice/automatic/processors/llm_spy.py (1)

258-261: Narration-only logs — OK to keep as debug.

No functional issues; the “audio continues” decision for function calls matches PR intent.

app/agents/voice/automatic/audio/audio_manager.py (1)

68-95: LGTM: Preconditions and idempotent start.

Checks for user_has_input, chunks presence, and is_playing are solid.

Comment on lines +130 to +142
        # MINIMAL QUEUE: Only reset audio for ACTUAL user input, not function calls
        from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
        audio_manager = get_audio_manager()
        if audio_manager:
            # Check if this is a real user input or just function call processing
            is_real_user_input = not content.startswith("[Inferred from voice]") and "function" not in content.lower()

            if is_real_user_input:
                audio_manager.reset_for_new_input()
                logger.debug(f"MINIMAL QUEUE: Reset audio for real user input: {content[:50]}...")
            else:
                logger.debug(f"MINIMAL QUEUE: Skipping reset for function/inferred content: {content[:50]}...")

⚠️ Potential issue

Fix AttributeError: calling non-existent method reset_for_new_input() on AudioManager.

AudioManager exposes reset() as an instance method. reset_for_new_input() is a module-level helper (not on the instance). Current code will raise AttributeError.

-                audio_manager.reset_for_new_input()
+                audio_manager.reset()

Alternatively, call the module-level helper:

-        from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
+        from app.agents.voice.automatic.audio.audio_manager import get_audio_manager, reset_for_new_input
@@
-                audio_manager.reset_for_new_input()
+                reset_for_new_input()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         # MINIMAL QUEUE: Only reset audio for ACTUAL user input, not function calls
         from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
         audio_manager = get_audio_manager()
         if audio_manager:
             # Check if this is a real user input or just function call processing
             is_real_user_input = not content.startswith("[Inferred from voice]") and "function" not in content.lower()
             if is_real_user_input:
-                audio_manager.reset_for_new_input()
+                audio_manager.reset()
                 logger.debug(f"MINIMAL QUEUE: Reset audio for real user input: {content[:50]}...")
             else:
                 logger.debug(f"MINIMAL QUEUE: Skipping reset for function/inferred content: {content[:50]}...")
🤖 Prompt for AI Agents
In app/agents/voice/automatic/utils/conversation_manager.py around lines 130 to
142, the code calls audio_manager.reset_for_new_input() which doesn't exist on
the AudioManager instance and raises AttributeError; replace that call with the
actual instance method audio_manager.reset(), or if the intent was to use the
module-level helper, import and call reset_for_new_input() from
app.agents.voice.automatic.audio.audio_manager at the top and use that function
instead, ensuring the chosen approach matches other callers and updating the
corresponding import/usage accordingly.

@murdore murdore requested a review from Copilot September 25, 2025 08:31

Copilot AI left a comment

Pull Request Overview

This PR adds an event-driven waiting audio system for AI chat that provides user feedback during processing. The system starts audio after user speech ends and stops immediately when the assistant begins responding, with smart handling to avoid audio during function calls.

  • Introduces seamless looping waiting audio (6-second clips with chunked playback)
  • Implements immediate audio interruption when bot starts speaking
  • Adds intelligent filtering to prevent audio during function calls and tool usage
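
To illustrate the looping behaviour described in this overview, here is a minimal sketch. It is not the PR's `AudioManager`. It assumes raw PCM bytes split into fixed-size chunks, a cap on the number of loops (a log line quoted in the CodeRabbit review mentions a maximum of 3 loops), and an `asyncio.Event` as the stop signal. `chunk_audio`, `play_waiting_audio`, and `send_chunk` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List


def chunk_audio(pcm: bytes, chunk_size: int) -> List[bytes]:
    """Split a PCM buffer into fixed-size chunks (the last may be shorter)."""
    return [pcm[i:i + chunk_size] for i in range(0, len(pcm), chunk_size)]


async def play_waiting_audio(
    chunks: List[bytes],
    send_chunk: Callable[[bytes], Awaitable[None]],
    stop: asyncio.Event,
    max_loops: int = 3,
) -> int:
    """Loop over the chunks until `stop` is set or `max_loops` is reached.

    Returns the number of chunks actually sent.
    """
    sent = 0
    for _ in range(max_loops):
        for chunk in chunks:
            # Checking before every chunk bounds the stop latency to one chunk.
            if stop.is_set():
                return sent
            await send_chunk(chunk)
            sent += 1
    return sent
```

Smaller chunks give a faster stop at the cost of more scheduling overhead, which is presumably why the PR plays the clip in chunks rather than as a single frame.
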

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| app/agents/voice/automatic/audio/audio_manager.py | Core audio management with chunked playback and immediate stop capabilities |
| app/agents/voice/automatic/processors/user_speaking_audio.py | Processor to handle audio based on user speaking events with transcription timing |
| app/agents/voice/automatic/processors/llm_spy.py | Enhanced LLM processor with instant audio stopping on text output |
| app/agents/voice/automatic/utils/conversation_manager.py | Audio reset logic for real user input vs function calls |
| app/agents/voice/automatic/__init__.py | Integration of audio manager and processors into pipeline |

Comment on lines 40 to 47
    def __init__(self, name: str = "UserSpeakingAudioProcessor"):
        super().__init__(name=name)
        self._user_currently_speaking = False
        self._actual_speech_detected = False # Track if transcription was received
        self._speech_start_time = None # Track when speech started
        self._min_speech_duration = 2.0 # Minimum duration (seconds) for fallback (only for very long holds)
        self._pending_audio_task = None # Task waiting for transcription
        self._transcription_timeout = 3.0 # Wait up to 3 seconds for transcription after PTT release

Copilot AI Sep 25, 2025

This hardcoded timeout value should be configurable. Consider making it a class parameter or configuration constant to allow tuning without code changes.

Suggested change
-    def __init__(self, name: str = "UserSpeakingAudioProcessor"):
+    def __init__(self, name: str = "UserSpeakingAudioProcessor", transcription_timeout: float = 3.0):
         super().__init__(name=name)
         self._user_currently_speaking = False
         self._actual_speech_detected = False # Track if transcription was received
         self._speech_start_time = None # Track when speech started
         self._min_speech_duration = 2.0 # Minimum duration (seconds) for fallback (only for very long holds)
         self._pending_audio_task = None # Task waiting for transcription
-        self._transcription_timeout = 3.0 # Wait up to 3 seconds for transcription after PTT release
+        self._transcription_timeout = transcription_timeout # Wait up to transcription_timeout seconds for transcription after PTT release

Copilot uses AI. Check for mistakes.
        self._user_currently_speaking = False
        self._actual_speech_detected = False # Track if transcription was received
        self._speech_start_time = None # Track when speech started
        self._min_speech_duration = 2.0 # Minimum duration (seconds) for fallback (only for very long holds)

Copilot AI Sep 25, 2025

This minimum speech duration is defined but never used in the code. Either implement the duration check or remove this unused variable.
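
If the duration check is to be implemented rather than removed, one possible shape is sketched below. This is an assumption about intent, not code from the PR: `SpeechTimer` and its methods are hypothetical, and the sketch uses `time.monotonic()` so wall-clock adjustments do not affect the measurement.

```python
import time
from typing import Callable, Optional


class SpeechTimer:
    """Hypothetical helper: tracks how long the user has been speaking."""

    def __init__(self, min_speech_duration: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_speech_duration = min_speech_duration
        self._clock = clock  # injectable so tests need not sleep
        self._speech_start_time: Optional[float] = None

    def on_speech_started(self) -> None:
        self._speech_start_time = self._clock()

    def on_speech_stopped(self) -> bool:
        """Return True if the hold was long enough to count as speech."""
        if self._speech_start_time is None:
            return False
        elapsed = self._clock() - self._speech_start_time
        self._speech_start_time = None
        return elapsed >= self._min_speech_duration
```

The processor could consult the returned flag as a fallback when no transcription arrives, which matches the "fallback (only for very long holds)" wording in the original comment.
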

Comment on lines +40 to +49
def _stop_audio_immediately(context: str = "unknown") -> bool:
    """INSTANT audio stopping using simplified AudioManager API."""
    audio_manager = get_audio_manager()
    if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing):
        # Use the simplified stop method
        asyncio.create_task(audio_manager.stop_and_disable_audio())

        logger.info(f"INSTANT AUDIO STOP: {context}")
        return True
    return False

Copilot AI Sep 25, 2025

Creating a fire-and-forget task with asyncio.create_task() in a non-async function can lead to race conditions and unhandled exceptions. Consider making this function async or properly awaiting the task in the caller.

from app.core.logger import logger

# Configurable audio length constant
AUDIO_LENGTH_SECONDS = 6 # Default duration in seconds

Copilot AI Sep 25, 2025

This global constant should be configurable through the configuration system rather than hardcoded. Consider moving it to the config module or making it a parameter.
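
One way this could look, as a sketch only: read the value from an environment variable with a validated fallback. The variable name `WAITING_AUDIO_LENGTH_SECONDS` and the function `read_audio_length_seconds` are hypothetical, and the project's own config module may prefer a different mechanism.

```python
import os


def read_audio_length_seconds(default: int = 6) -> int:
    """Read the clip length from the environment, falling back to a default.

    WAITING_AUDIO_LENGTH_SECONDS is a hypothetical variable name.
    """
    raw = os.environ.get("WAITING_AUDIO_LENGTH_SECONDS")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    # Reject non-positive values rather than loading a nonsensical clip.
    return value if value > 0 else default
```

Since the repository ships `waiting_1sec.wav` through `waiting_6sec.wav` and the loader builds the file name from `AUDIO_LENGTH_SECONDS`, a configured value would also need to match one of the available clips.
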

# LLM response started handler - DO NOT start audio here
# Audio should only start when user stops speaking, not when LLM processes
@llm.event_handler("on_llm_response_started")
async def on_llm_response_started(service,function_calls):

Copilot AI Sep 25, 2025

Missing space after comma in function parameter list. Should be service, function_calls.

Suggested change
-async def on_llm_response_started(service,function_calls):
+async def on_llm_response_started(service, function_calls):

from pydub import AudioSegment
import audioop

from dotenv import load_dotenv

Copilot AI Sep 25, 2025

Duplicate import of load_dotenv - it's imported again on line 13 and was already imported earlier. Remove the duplicate import.

Suggested change
-from dotenv import load_dotenv
