Origin/audio for waiting #256
base: release
Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the repository configuration. You can disable this status message in the settings.
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Tip 👮 Agentic pre-merge checks are now available in preview! Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/agents/voice/automatic/__init__.py (1)
486-501: Guard against NameError: ptt_vad_filter may be undefined.
ptt_vad_filter is created only under config.DISABLE_VAD_FOR_PTT, but referenced unconditionally here. If the flag is false, this will crash at runtime on PTT messages.

@@
-            elif message_type == "ptt-start":
+            elif message_type == "ptt-start":
                 # Handle PTT start event
                 logger.debug("PTT started - activating VAD filter")
-                ptt_vad_filter.set_ptt_active(True)
+                if 'ptt_vad_filter' in locals() and ptt_vad_filter:
+                    ptt_vad_filter.set_ptt_active(True)
+                else:
+                    logger.debug("PTT VAD filter not enabled; ignoring ptt-start")
                 # Send emulated user started speaking frame
                 await task.queue_frames([EmulateUserStartedSpeakingFrame()])
@@
-            elif message_type == "ptt-end":
+            elif message_type == "ptt-end":
                 # Handle PTT end event
                 logger.debug(
                     "PTT ended - deactivating VAD filter and sending stop frame"
                 )
-                ptt_vad_filter.set_ptt_active(False)
+                if 'ptt_vad_filter' in locals() and ptt_vad_filter:
+                    ptt_vad_filter.set_ptt_active(False)
+                else:
+                    logger.debug("PTT VAD filter not enabled; ignoring ptt-end")
@@
-            elif message_type == "ptt-sync":
+            elif message_type == "ptt-sync":
                 # Handle PTT state synchronization from client
                 client_ptt_state = message.get("data", {}).get("ptt_active", False)
-                current_state = ptt_vad_filter._ptt_active
+                current_state = getattr(ptt_vad_filter, "_ptt_active", False) if 'ptt_vad_filter' in locals() and ptt_vad_filter else False
@@
-                if client_ptt_state != current_state:
+                if ('ptt_vad_filter' in locals() and ptt_vad_filter) and (client_ptt_state != current_state):
                     logger.warning(
                         f"PTT state mismatch! client: {client_ptt_state}, server: {current_state}"
                     )
                     # Sync to client state (client is authoritative)
                     ptt_vad_filter.set_ptt_active(client_ptt_state)
                     logger.info(f"PTT state synchronized to: {client_ptt_state}")
@@
-                else:
+                else:
                     logger.debug(
                         f"PTT state sync: states match (current_state: {current_state})"
                     )
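A simpler alternative to the locals() check, if the surrounding code permits, is to bind ptt_vad_filter to None unconditionally and test for None in each handler. A minimal, self-contained sketch of that pattern (PttVadFilter, DISABLE_VAD_FOR_PTT, and handle_message are illustrative stand-ins, not the repository's real names):

from typing import Optional


class PttVadFilter:  # hypothetical stand-in for the real VAD filter class
    def __init__(self) -> None:
        self._ptt_active = False

    def set_ptt_active(self, active: bool) -> None:
        self._ptt_active = active


DISABLE_VAD_FOR_PTT = False  # stand-in for config.DISABLE_VAD_FOR_PTT

# Bind the name unconditionally so later handlers can test it safely.
ptt_vad_filter: Optional[PttVadFilter] = PttVadFilter() if DISABLE_VAD_FOR_PTT else None


def handle_message(message_type: str) -> None:
    if message_type == "ptt-start" and ptt_vad_filter is not None:
        ptt_vad_filter.set_ptt_active(True)
    elif message_type == "ptt-end" and ptt_vad_filter is not None:
        ptt_vad_filter.set_ptt_active(False)


handle_message("ptt-start")  # no NameError even when the filter is disabled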
🧹 Nitpick comments (20)
app/agents/voice/automatic/__init__.py (6)
5-6: Remove duplicate argparse import.
The second import is redundant and flagged by Ruff (F811).

-import argparse
-from dotenv import load_dotenv
+from dotenv import load_dotenv
10-11: Drop unused imports (pydub, audioop).
Neither AudioSegment nor audioop is used in this module.

-from pydub import AudioSegment
-import audioop
+
27-30: Fix duplicate/possibly unused frame imports.
- OutputAudioRawFrame is already imported on Line 25 (F811).
- LLMFullResponseStartFrame, FunctionCallInProgressFrame, FunctionCallResultFrame appear unused in this file. If unused, remove to keep imports tight.
-    OutputAudioRawFrame,
-    LLMFullResponseStartFrame,
-    FunctionCallInProgressFrame,
-    FunctionCallResultFrame
+    LLMFullResponseStartFrame,
+    FunctionCallInProgressFrame,
+    FunctionCallResultFrame

If the three frames are unused here, remove them as well:

-    LLMFullResponseStartFrame,
-    FunctionCallInProgressFrame,
-    FunctionCallResultFrame
+
71-82: Remove duplicated/unused re-imports.
These re-imports are duplicates of earlier imports and flagged by Ruff (F811). Also, get_audio_manager is not used in this file.

-from .stt import get_stt_service
-from .audio.audio_manager import initialize_audio_manager, get_audio_manager
-from app.agents.voice.automatic.processors.llm_spy import handle_confirmation_response
-from app.agents.voice.automatic.types import (
-    TTSProvider,
-    Mode,
-    decode_tts_provider,
-    decode_voice_name,
-    decode_mode,
-)
-from opentelemetry import trace
-from langfuse import get_client
+from .audio.audio_manager import initialize_audio_manager
91-91: Consolidate load_dotenv import.
load_dotenv is imported twice (Lines 6 and 13). Keep one.

-load_dotenv(override=True)
+load_dotenv(override=True)

Remove the duplicate import on Line 13 (no code diff shown here as it's not a changed line).
93-103: Remove dead code: tool_call_sound is loaded but never used.
This loads and parses a WAV file with no subsequent use, adding startup I/O without value.

-# Load tool call sound
-tool_call_sound = None
-if config.ENABLE_TOOL_CALL_SOUND and os.path.exists(config.TOOL_CALL_SOUND_FILE):
-    with wave.open(config.TOOL_CALL_SOUND_FILE) as audio_file:
-        tool_call_sound = OutputAudioRawFrame(
-            audio_file.readframes(-1),
-            audio_file.getframerate(),
-            audio_file.getnchannels(),
-        )
+

app/agents/voice/automatic/processors/user_speaking_audio.py (2)
84-91: Stop playing then re-enable input — tighten log and cancel pending wait.
- Remove extraneous f-prefix (F541).
- Cancel any pending _pending_audio_task when a new speaking session starts to avoid stray timers.
 if audio_manager.is_playing:
     await audio_manager.stop_and_disable_audio()
-    logger.info(f"🛑 Stopped playing audio - user started speaking")
+    logger.info("🛑 Stopped playing audio - user started speaking")

 # Then enable for new input
+if self._pending_audio_task and not self._pending_audio_task.done():
+    self._pending_audio_task.cancel()
 audio_manager.set_user_input()
 logger.info(f"✅ Audio enabled - user started speaking ({type(frame).__name__})")
127-145: Avoid broad exception; log specifics.
Catching Exception is acceptable here, but consider handling asyncio.TimeoutError/CancelledError and unexpected Exception separately (you already handle CancelledError). Also remove the extraneous f-prefix on static messages, if any.

-        except Exception as e:
-            logger.error(f"❌ Error in transcription wait: {e}")
+        except Exception as e:
+            logger.error(f"❌ Error in transcription wait: {e!r}")

app/agents/voice/automatic/processors/llm_spy.py (8)
21-22: Remove unused TTSSpeakFrame import.
This symbol is not used in this file.

-    UserStartedSpeakingFrame,
-    TTSSpeakFrame
+    UserStartedSpeakingFrame,
38-49: Avoid duplicating audio-stop helper; use module helper or store task handle.
You already import stop_audio_immediately but implemented a parallel _stop_audio_immediately using create_task. Prefer the module helper for consistency, or at least store the task to satisfy RUF006.
Option A (preferred): use the module helper
-from app.agents.voice.automatic.audio.audio_manager import get_audio_manager, stop_audio_immediately
-
-def _stop_audio_immediately(context: str = "unknown") -> bool:
-    """INSTANT audio stopping using simplified AudioManager API."""
-    audio_manager = get_audio_manager()
-    if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing):
-        # Use the simplified stop method
-        asyncio.create_task(audio_manager.stop_and_disable_audio())
-
-        logger.info(f"🚨 INSTANT AUDIO STOP: {context} - simplified stop triggered")
-        return True
-    return False
+from app.agents.voice.automatic.audio.audio_manager import stop_audio_immediately
+
+async def _stop_audio_immediately(context: str = "unknown") -> bool:
+    """INSTANT audio stopping using simplified AudioManager API."""
+    await stop_audio_immediately()
+    logger.info(f"🚨 INSTANT AUDIO STOP: {context} - simplified stop triggered")
+    return True

Then await the helper at call sites.
Option B: keep local helper but store the task
-        asyncio.create_task(audio_manager.stop_and_disable_audio())
+        task = asyncio.create_task(audio_manager.stop_and_disable_audio())
+        # Optional: attach a no-op done callback to surface exceptions in logs
+        task.add_done_callback(lambda t: t.exception())
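If this fire-and-forget pattern recurs, a small shared helper that keeps strong references and reports failures can replace ad-hoc create_task calls. A self-contained sketch under that assumption (the helper name spawn_logged and the module-level task set are illustrative, not existing project APIs):

import asyncio

_background_tasks: set = set()


def spawn_logged(coro) -> asyncio.Task:
    """Create a task, keep a strong reference, and report any exception it raises."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)

    def _done(t: asyncio.Task) -> None:
        _background_tasks.discard(t)
        if not t.cancelled() and t.exception() is not None:
            print(f"background task failed: {t.exception()!r}")

    task.add_done_callback(_done)
    return task


async def main() -> None:
    async def work() -> None:
        await asyncio.sleep(0.01)
        raise RuntimeError("boom")

    spawn_logged(work())
    await asyncio.sleep(0.05)  # give the task time to finish and report


asyncio.run(main())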
177-186: Remove f-prefix from constant strings and keep single responsibility.
- Lines 185/others: f-strings without placeholders (F541). Drop the f.
- The set_bot_speaking(True) call is good; keep it.
- logger.info(f"🚨 INSTANT STOP: TextFrame detected - audio interrupted immediately") + logger.info("🚨 INSTANT STOP: TextFrame detected - audio interrupted immediately")
198-213: Remove f-prefix in static log. (F541)

-    logger.debug(f"🤖 LLM processing started - audio continues, preparing for instant stop on text output")
+    logger.debug("🤖 LLM processing started - audio continues, preparing for instant stop on text output")
216-221: Remove f-prefix in static log. (F541)

-    logger.info(f"🚨 INSTANT STOP: LLMTextFrame detected - audio interrupted with zero delay")
+    logger.info("🚨 INSTANT STOP: LLMTextFrame detected - audio interrupted with zero delay")
227-239: Remove f-prefix in static log and ensure await if you switched helper to async.
- Line 234: static log (F541).
- If you chose Option A above (async helper), remember to await _stop_audio_immediately here.
-    _stop_audio_immediately("Response Complete - Final Stop")
-    logger.info(f"🚨 FINAL STOP: LLM response complete with text output - audio fully stopped")
+    _stop_audio_immediately("Response Complete - Final Stop")
+    logger.info("🚨 FINAL STOP: LLM response complete with text output - audio fully stopped")
246-254: Remove f-prefix in static logs. (F541)

-    logger.info(f"🔍 LLM response ended with no text output - audio may resume if stopped")
+    logger.info("🔍 LLM response ended with no text output - audio may resume if stopped")
@@
-    logger.debug(f"🤖 LLM response ended - bot marked as not speaking")
+    logger.debug("🤖 LLM response ended - bot marked as not speaking")
304-307: Remove f-prefix in static log. (F541)

-    logger.debug(f"🔧 Function result: {frame.function_name} - audio continues (no interruption)")
+    logger.debug(f"🔧 Function result: {frame.function_name} - audio continues (no interruption)")

Note: This line does interpolate frame.function_name, so keep the f-prefix here; disregard the Ruff hint if it flagged a different line.
app/agents/voice/automatic/audio/audio_manager.py (4)
94-94: Remove f-prefix in static log. (F541)

-    logger.info(f"🎵 Started seamless audio - max 3 loops (18s total)")
+    logger.info("🎵 Started seamless audio - max 3 loops (18s total)")
129-134: Keep a reference to the created task.
Store the task to satisfy RUF006 and optionally attach a callback to surface exceptions.

-            asyncio.create_task(self.stop_and_disable_audio())
+            task = asyncio.create_task(self.stop_and_disable_audio())
+            task.add_done_callback(lambda t: t.exception())
33-61: Narrow the exception when loading audio.
Catching broad Exception masks issues. Consider narrowing to (FileNotFoundError, OSError, pydub decode errors), or keep it broad but log the repr.

-    except Exception as e:
-        logger.error(f"Failed to load waiting_{int(AUDIO_LENGTH_SECONDS)}sec.wav: {e}")
+    except Exception as e:
+        logger.error(f"Failed to load waiting_{int(AUDIO_LENGTH_SECONDS)}sec.wav: {e!r}")
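A sketch of the narrowed handler, assuming the file is loaded with pydub (the function name and print-based logging are placeholders; if the module reads the file with wave instead, substitute wave.Error):

from pydub import AudioSegment
from pydub.exceptions import CouldntDecodeError


def load_waiting_audio(path: str):
    """Return the decoded segment, or None if the file is missing or undecodable."""
    try:
        return AudioSegment.from_wav(path)
    except (FileNotFoundError, OSError, CouldntDecodeError) as e:
        print(f"Failed to load {path}: {e!r}")
        return None


print(load_waiting_audio("waiting_6sec.wav"))  # None if the file is absent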
198-225: Avoid broad exceptions; keep logs actionable.
These blocks are fine operationally; prefer narrowing, but at minimum log the repr.

-        except Exception as e:
-            logger.error(f"❌ Error in audio streaming: {e}")
+        except Exception as e:
+            logger.error(f"❌ Error in audio streaming: {e!r}")
@@
-            except Exception as e:
-                logger.debug(f"Audio queue clearing failed: {e}")
+            except Exception as e:
+                logger.debug(f"Audio queue clearing failed: {e!r}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)
app/agents/voice/automatic/audio/waiting_1sec.wav is excluded by !**/*.wav
app/agents/voice/automatic/audio/waiting_2sec.wav is excluded by !**/*.wav
app/agents/voice/automatic/audio/waiting_3sec.wav is excluded by !**/*.wav
app/agents/voice/automatic/audio/waiting_4sec.wav is excluded by !**/*.wav
app/agents/voice/automatic/audio/waiting_6sec.wav is excluded by !**/*.wav
📒 Files selected for processing (5)
app/agents/voice/automatic/__init__.py (6 hunks)
app/agents/voice/automatic/audio/audio_manager.py (1 hunks)
app/agents/voice/automatic/processors/llm_spy.py (5 hunks)
app/agents/voice/automatic/processors/user_speaking_audio.py (1 hunks)
app/agents/voice/automatic/utils/conversation_manager.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
app/agents/voice/automatic/processors/user_speaking_audio.py (2)
  app/agents/voice/automatic/audio/audio_manager.py (4)
    get_audio_manager (231-233)
    start_audio (68-94)
    stop_and_disable_audio (96-116)
    set_user_input (63-66)
  app/agents/voice/automatic/processors/llm_spy.py (1)
    process_frame (173-352)
app/agents/voice/automatic/processors/llm_spy.py (3)
  app/agents/voice/automatic/audio/audio_manager.py (4)
    get_audio_manager (231-233)
    stop_audio_immediately (250-254)
    stop_and_disable_audio (96-116)
    set_bot_speaking (129-133)
  app/agents/voice/automatic/utils/conversation_manager.py (2)
    start_turn_with_events (243-264)
    add_llm_response_with_events (266-288)
  app/agents/voice/automatic/rtvi/rtvi.py (1)
    emit_rtvi_event (6-13)
app/agents/voice/automatic/utils/conversation_manager.py (1)
  app/agents/voice/automatic/audio/audio_manager.py (2)
    get_audio_manager (231-233)
    reset_for_new_input (285-289)
app/agents/voice/automatic/__init__.py (6)
  app/agents/voice/automatic/processors/user_speaking_audio.py (1)
    UserSpeakingAudioProcessor (29-145)
  app/agents/voice/automatic/tts/__init__.py (1)
    get_tts_service (15-67)
  app/agents/voice/automatic/audio/audio_manager.py (3)
    initialize_audio_manager (242-246)
    get_audio_manager (231-233)
    stop_and_disable_audio (96-116)
  app/agents/voice/automatic/processors/llm_spy.py (1)
    handle_confirmation_response (115-130)
  app/agents/voice/automatic/types/models.py (2)
    TTSProvider (8-10)
    Mode (19-21)
  app/agents/voice/automatic/types/decoders.py (3)
    decode_tts_provider (4-11)
    decode_voice_name (14-21)
    decode_mode (24-31)
🪛 Ruff (0.13.1)
app/agents/voice/automatic/processors/user_speaking_audio.py
87-87: f-string without any placeholders
Remove extraneous f prefix
(F541)
144-144: Do not catch blind exception: Exception
(BLE001)
app/agents/voice/automatic/audio/audio_manager.py
59-59: Do not catch blind exception: Exception
(BLE001)
94-94: f-string without any placeholders
Remove extraneous f prefix
(F541)
133-133: Store a reference to the return value of asyncio.create_task
(RUF006)
192-192: Do not catch blind exception: Exception
(BLE001)
223-223: Do not catch blind exception: Exception
(BLE001)
app/agents/voice/automatic/processors/llm_spy.py
45-45: Store a reference to the return value of asyncio.create_task
(RUF006)
185-185: f-string without any placeholders
Remove extraneous f prefix
(F541)
200-200: f-string without any placeholders
Remove extraneous f prefix
(F541)
220-220: f-string without any placeholders
Remove extraneous f prefix
(F541)
234-234: f-string without any placeholders
Remove extraneous f prefix
(F541)
250-250: f-string without any placeholders
Remove extraneous f prefix
(F541)
253-253: f-string without any placeholders
Remove extraneous f prefix
(F541)
app/agents/voice/automatic/__init__.py
5-5: Redefinition of unused argparse from line 1
Remove definition: argparse
(F811)
27-27: Redefinition of unused OutputAudioRawFrame from line 25
Remove definition: OutputAudioRawFrame
(F811)
71-71: Redefinition of unused get_stt_service from line 68
Remove definition: get_stt_service
(F811)
73-73: Redefinition of unused handle_confirmation_response from line 42
Remove definition: handle_confirmation_response
(F811)
75-75: Redefinition of unused TTSProvider from line 51
Remove definition: TTSProvider
(F811)
76-76: Redefinition of unused Mode from line 50
Remove definition: Mode
(F811)
77-77: Redefinition of unused decode_tts_provider from line 53
Remove definition: decode_tts_provider
(F811)
78-78: Redefinition of unused decode_voice_name from line 54
Remove definition: decode_voice_name
(F811)
79-79: Redefinition of unused decode_mode from line 52
Remove definition: decode_mode
(F811)
81-81: Redefinition of unused trace from line 15
Remove definition: trace
(F811)
82-82: Redefinition of unused get_client from line 14
Remove definition: get_client
(F811)
304-304: Unused function argument: service
(ARG001)
334-334: Unused function argument: service
(ARG001)
345-345: Unused function argument: service
(ARG001)
🔇 Additional comments (7)
app/agents/voice/automatic/__init__.py (4)
66-66: Good: Add UserSpeakingAudioProcessor to pipeline.
Placing it after STT/PTT and before RTVI is correct for reacting to speaking/transcription frames.
237-239: Initialize audio manager once — LGTM.
Initialization early with the TTS instance aligns with the new audio lifecycle.
333-356: Stop audio on function-call completion — LGTM.
Immediate stop via audio_manager.stop_and_disable_audio() on completion is correct.
384-389: Pipeline wiring — LGTM.
UserSpeakingAudioProcessor placement and logging are appropriate.
app/agents/voice/automatic/processors/user_speaking_audio.py (1)
49-70: LGTM: Transcription-triggered start.
Starting audio when transcription arrives (even post-PTT) is aligned with the intended UX.
app/agents/voice/automatic/processors/llm_spy.py (1)
258-261: Narration-only logs — OK to keep as debug.
No functional issues; the “audio continues” decision for function calls matches PR intent.
app/agents/voice/automatic/audio/audio_manager.py (1)
68-95: LGTM: Preconditions and idempotent start.
Checks for user_has_input, chunks presence, and is_playing are solid.
# MINIMAL QUEUE: Only reset audio for ACTUAL user input, not function calls
from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
audio_manager = get_audio_manager()
if audio_manager:
    # Check if this is a real user input or just function call processing
    is_real_user_input = not content.startswith("[Inferred from voice]") and "function" not in content.lower()

    if is_real_user_input:
        audio_manager.reset_for_new_input()
        logger.debug(f"MINIMAL QUEUE: Reset audio for real user input: {content[:50]}...")
    else:
        logger.debug(f"MINIMAL QUEUE: Skipping reset for function/inferred content: {content[:50]}...")
Fix AttributeError: calling non-existent method reset_for_new_input() on AudioManager.
AudioManager exposes reset() as an instance method. reset_for_new_input() is a module-level helper (not on the instance). Current code will raise AttributeError.
-    audio_manager.reset_for_new_input()
+    audio_manager.reset()

Alternatively, call the module-level helper:
- from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
+ from app.agents.voice.automatic.audio.audio_manager import get_audio_manager, reset_for_new_input
@@
- audio_manager.reset_for_new_input()
+    reset_for_new_input()

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# MINIMAL QUEUE: Only reset audio for ACTUAL user input, not function calls
from app.agents.voice.automatic.audio.audio_manager import get_audio_manager
audio_manager = get_audio_manager()
if audio_manager:
    # Check if this is a real user input or just function call processing
    is_real_user_input = not content.startswith("[Inferred from voice]") and "function" not in content.lower()

    if is_real_user_input:
        audio_manager.reset()
        logger.debug(f"MINIMAL QUEUE: Reset audio for real user input: {content[:50]}...")
    else:
        logger.debug(f"MINIMAL QUEUE: Skipping reset for function/inferred content: {content[:50]}...")
🤖 Prompt for AI Agents
In app/agents/voice/automatic/utils/conversation_manager.py around lines 130 to
142, the code calls audio_manager.reset_for_new_input() which doesn't exist on
the AudioManager instance and raises AttributeError; replace that call with the
actual instance method audio_manager.reset(), or if the intent was to use the
module-level helper, import and call reset_for_new_input() from
app.agents.voice.automatic.audio.audio_manager at the top and use that function
instead, ensuring the chosen approach matches other callers and updating the
corresponding import/usage accordingly.
Pull Request Overview
This PR adds an event-driven waiting audio system for AI chat that provides user feedback during processing. The system starts audio after user speech ends and stops immediately when the assistant begins responding, with smart handling to avoid audio during function calls. A minimal illustrative sketch of this flow appears after the list below.
- Introduces seamless looping waiting audio (6-second clips with chunked playback)
- Implements immediate audio interruption when bot starts speaking
- Adds intelligent filtering to prevent audio during function calls and tool usage
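The following is a self-contained sketch of that lifecycle, not the PR's actual AudioManager: short audio chunks loop while the LLM is working, and an asyncio event cuts playback immediately when the first response text arrives. Class and method names here are assumptions made for the sketch only.

import asyncio
from typing import Optional


class WaitingAudioSketch:
    """Loop short chunks of waiting audio until told to stop immediately."""

    def __init__(self, chunks: list, chunk_seconds: float = 0.2) -> None:
        self._chunks = chunks
        self._chunk_seconds = chunk_seconds
        self._stop = asyncio.Event()
        self._task: Optional[asyncio.Task] = None

    async def _loop(self) -> None:
        while not self._stop.is_set():
            for chunk in self._chunks:
                if self._stop.is_set():
                    return
                # A real implementation would push `chunk` to the output transport here.
                await asyncio.sleep(self._chunk_seconds)

    def start(self) -> None:
        self._stop.clear()
        self._task = asyncio.create_task(self._loop())

    async def stop_immediately(self) -> None:
        self._stop.set()
        if self._task:
            await self._task


async def demo() -> None:
    player = WaitingAudioSketch(chunks=[b"\x00" * 3200] * 5)
    player.start()                    # user stopped speaking -> start waiting audio
    await asyncio.sleep(0.5)          # ...LLM is thinking...
    await player.stop_immediately()   # first text frame arrives -> cut audio


asyncio.run(demo())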
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| app/agents/voice/automatic/audio/audio_manager.py | Core audio management with chunked playback and immediate stop capabilities |
| app/agents/voice/automatic/processors/user_speaking_audio.py | Processor to handle audio based on user speaking events with transcription timing |
| app/agents/voice/automatic/processors/llm_spy.py | Enhanced LLM processor with instant audio stopping on text output |
| app/agents/voice/automatic/utils/conversation_manager.py | Audio reset logic for real user input vs function calls |
| app/agents/voice/automatic/__init__.py | Integration of audio manager and processors into pipeline |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
def __init__(self, name: str = "UserSpeakingAudioProcessor"):
    super().__init__(name=name)
    self._user_currently_speaking = False
    self._actual_speech_detected = False  # Track if transcription was received
    self._speech_start_time = None  # Track when speech started
    self._min_speech_duration = 2.0  # Minimum duration (seconds) for fallback (only for very long holds)
    self._pending_audio_task = None  # Task waiting for transcription
    self._transcription_timeout = 3.0  # Wait up to 3 seconds for transcription after PTT release
Copilot AI (Sep 25, 2025)
This hardcoded timeout value should be configurable. Consider making it a class parameter or configuration constant to allow tuning without code changes.
-    def __init__(self, name: str = "UserSpeakingAudioProcessor"):
+    def __init__(self, name: str = "UserSpeakingAudioProcessor", transcription_timeout: float = 3.0):
         super().__init__(name=name)
         self._user_currently_speaking = False
         self._actual_speech_detected = False  # Track if transcription was received
         self._speech_start_time = None  # Track when speech started
         self._min_speech_duration = 2.0  # Minimum duration (seconds) for fallback (only for very long holds)
         self._pending_audio_task = None  # Task waiting for transcription
-        self._transcription_timeout = 3.0  # Wait up to 3 seconds for transcription after PTT release
+        self._transcription_timeout = transcription_timeout  # Wait up to transcription_timeout seconds for transcription after PTT release
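If the timeout were then sourced from the environment rather than hardcoded, the wiring could look like this sketch; PTT_TRANSCRIPTION_TIMEOUT is an assumed variable name, and Processor is a stand-in for UserSpeakingAudioProcessor so the snippet stands alone:

import os


class Processor:  # stand-in for UserSpeakingAudioProcessor
    def __init__(self, transcription_timeout: float = 3.0) -> None:
        self._transcription_timeout = transcription_timeout


timeout = float(os.getenv("PTT_TRANSCRIPTION_TIMEOUT", "3.0"))
processor = Processor(transcription_timeout=timeout)
print(processor._transcription_timeout)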
self._user_currently_speaking = False
self._actual_speech_detected = False  # Track if transcription was received
self._speech_start_time = None  # Track when speech started
self._min_speech_duration = 2.0  # Minimum duration (seconds) for fallback (only for very long holds)
Copilot AI (Sep 25, 2025)
This minimum speech duration is defined but never used in the code. Either implement the duration check or remove this unused variable.
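One way to actually use the fallback threshold, sketched as a small self-contained tracker (the class name, time source, and print are illustrative; the real processor would fold this into its frame handling):

import time
from typing import Optional


class SpeechDurationTracker:
    def __init__(self, min_speech_duration: float = 2.0) -> None:
        self._min_speech_duration = min_speech_duration
        self._speech_start_time: Optional[float] = None

    def on_user_started_speaking(self) -> None:
        self._speech_start_time = time.monotonic()

    def should_use_fallback(self) -> bool:
        """Only fall back to starting audio when the hold was long enough."""
        if self._speech_start_time is None:
            return False
        return (time.monotonic() - self._speech_start_time) >= self._min_speech_duration


tracker = SpeechDurationTracker()
tracker.on_user_started_speaking()
print(tracker.should_use_fallback())  # False right after speech starts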
def _stop_audio_immediately(context: str = "unknown") -> bool:
    """INSTANT audio stopping using simplified AudioManager API."""
    audio_manager = get_audio_manager()
    if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing):
        # Use the simplified stop method
        asyncio.create_task(audio_manager.stop_and_disable_audio())

        logger.info(f"INSTANT AUDIO STOP: {context}")
        return True
    return False
Copilot AI (Sep 25, 2025)
Creating a fire-and-forget task with asyncio.create_task() in a non-async function can lead to race conditions and unhandled exceptions. Consider making this function async or properly awaiting the task in the caller.
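A self-contained sketch of the async-and-await option the comment suggests (AudioManagerStub stands in for the real AudioManager so the snippet runs on its own; the module-level helper discussed earlier in the review is the other route):

import asyncio


class AudioManagerStub:
    def __init__(self) -> None:
        self.is_playing = True
        self.user_has_input = True

    async def stop_and_disable_audio(self) -> None:
        self.is_playing = False
        self.user_has_input = False


async def stop_audio_immediately(manager: AudioManagerStub, context: str = "unknown") -> bool:
    """Await the stop so exceptions propagate to the caller instead of being lost."""
    if manager.user_has_input or manager.is_playing:
        await manager.stop_and_disable_audio()
        print(f"INSTANT AUDIO STOP: {context}")
        return True
    return False


async def main() -> None:
    manager = AudioManagerStub()
    stopped = await stop_audio_immediately(manager, "TextFrame detected")
    print(stopped, manager.is_playing)


asyncio.run(main())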
from app.core.logger import logger

# Configurable audio length constant
AUDIO_LENGTH_SECONDS = 6  # Default duration in seconds
Copilot AI (Sep 25, 2025)
This global constant should be configurable through the configuration system rather than hardcoded. Consider moving it to the config module or making it a parameter.
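A sketch of one way to make this configurable, assuming an environment-variable override is acceptable (the variable name WAITING_AUDIO_LENGTH_SECONDS is an assumption, not an existing setting; the file-name pattern follows the waiting_*sec.wav naming seen elsewhere in this review):

import os

# Fall back to 6 seconds when no override is provided.
AUDIO_LENGTH_SECONDS = int(os.getenv("WAITING_AUDIO_LENGTH_SECONDS", "6"))
WAITING_AUDIO_FILE = f"waiting_{AUDIO_LENGTH_SECONDS}sec.wav"

print(AUDIO_LENGTH_SECONDS, WAITING_AUDIO_FILE)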
# LLM response started handler - DO NOT start audio here
# Audio should only start when user stops speaking, not when LLM processes
@llm.event_handler("on_llm_response_started")
async def on_llm_response_started(service,function_calls):
Copilot AI (Sep 25, 2025)
Missing space after comma in function parameter list. Should be service, function_calls.
-async def on_llm_response_started(service,function_calls):
+async def on_llm_response_started(service, function_calls):
from pydub import AudioSegment
import audioop

from dotenv import load_dotenv
Copilot AI (Sep 25, 2025)
Duplicate import of load_dotenv - it's imported again on line 13 and was already imported earlier. Remove the duplicate import.
-from dotenv import load_dotenv
Added waiting audio for AI chat.
Summary by CodeRabbit