-
Notifications
You must be signed in to change notification settings - Fork 46
Origin/audio for waiting v2 #279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release
Are you sure you want to change the base?
Origin/audio for waiting v2 #279
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds a centralized AudioManager, integrates it with LLM and user-speaking events, and updates processing flow to stop/resume audio around user speech, LLM text, and function-call lifecycles. Inserts a new UserSpeakingAudioProcessor into the pipeline, updates LLM spy logic for immediate audio interruption, and resets audio state on genuine user messages. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Mic as PTT/STT
participant USP as UserSpeakingAudioProcessor
participant LLM as LLM Pipeline
participant AM as AudioManager
participant TTS as TTS Transport
rect rgba(230,245,255,0.5)
Note over User,Mic: User begins speaking
User->>Mic: Voice input
Mic->>USP: Transcription frames (partial/empty)
USP->>AM: stop() and enable_input()
end
alt User stops speaking
Mic-->>USP: PTT release
USP->>USP: Wait for transcription (timeout)
opt Transcription arrives (non-empty)
USP->>AM: start() after grace
AM->>TTS: queue_frame(audio chunks)
loop 100ms chunks
AM->>TTS: stream next chunk
end
end
end
rect rgba(255,245,230,0.5)
Note over LLM,AM: LLM emits text
LLM-->>USP: LLMTextFrame
USP->>AM: stop() immediately
end
sequenceDiagram
autonumber
participant LLM as LLM/Orchestrator
participant Spy as LLMSpyProcessor
participant AM as AudioManager
rect rgba(240,255,240,0.6)
Note over LLM,Spy: Function call lifecycle
LLM-->>Spy: FunctionCallInProgressFrame (first)
Spy->>AM: start(function_call_set)
end
LLM-->>Spy: LLMTextFrame (any non-empty text)
Spy->>AM: stop() immediately
LLM-->>Spy: FunctionCallResultFrame (possibly final)
alt Turn completed (all calls done)
Spy->>AM: stop() function_call_set
end
LLM-->>Spy: LLMFullResponseEndFrame
Spy->>AM: stop() if any text output
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (11)
app/agents/voice/automatic/utils/conversation_manager.py (2)
131-131: Consider moving the import to module-level.The import of
get_audio_manageris inside the method. For better performance and clarity, consider moving it to the top of the file with other imports.from app.agents.voice.automatic.features.charts.conversation import ( ConversationDebugData, ConversationEvent, ConversationMessage, ConversationTurn, ToolCall, ToolResult, ) +from app.agents.voice.automatic.audio.audio_manager import get_audio_manager from app.core.logger import loggerThen remove the import from line 131:
def add_user_message( self, session_id: str, content: str, message_id: Optional[str] = None ) -> ConversationMessage: """Add a user message and start a new turn""" # MINIMAL QUEUE: Only reset audio for ACTUAL user input, not function calls - from app.agents.voice.automatic.audio.audio_manager import get_audio_manager audio_manager = get_audio_manager()
135-143: Replace content-based heuristics with an explicit source flag
The logic in conversation_manager.py uses content.startswith("[Inferred from voice]") and a lowercase “function” check—tied to the default user_content value “[Inferred from voice]” in start_turn_with_events—which is brittle. Introduce an explicit enum or boolean on the message object to signal its origin instead of inferring it from the content.app/agents/voice/automatic/__init__.py (3)
285-287: Remove unused function parameter.The
serviceparameter inon_llm_response_startedis unused. Consider removing it or prefixing with underscore to indicate it's intentionally unused.- async def on_llm_response_started(service, function_calls): + async def on_llm_response_started(_service, function_calls): logger.info(f"Function calls started event triggered with {len(function_calls)} calls")Or if the event handler signature requires it, use
_as the parameter name:- async def on_llm_response_started(service, function_calls): + async def on_llm_response_started(_service, _function_calls): logger.info(f"Function calls started event triggered with {len(_function_calls)} calls")
293-298: Consider encapsulating audio state manipulation.The handler directly accesses and modifies
audio_managerinternal state (is_playing,loop_task). This breaks encapsulation and creates tight coupling. Consider adding a method toAudioManagerlikestop_without_disable()orpause_for_function_calls()to encapsulate this logic.In
audio_manager.py, add:async def pause_for_function_calls(self): """Pause audio playback during function calls without disabling.""" if self.is_playing: self.is_playing = False if self.loop_task and not self.loop_task.done(): self.loop_task.cancel() try: await self.loop_task except asyncio.CancelledError: pass logger.info("Audio paused for function calls")Then simplify the handler:
async def on_function_calls_finished(_service): - # Stop audio when function calls finish but keep it enabled for potential resumption if not audio_manager: logger.warning("Audio manager not available, skipping audio control") return - if audio_manager.is_playing: - # Stop current audio but don't disable - allow resumption if no LLM text follows - audio_manager.is_playing = False - if audio_manager.loop_task and not audio_manager.loop_task.done(): - audio_manager.loop_task.cancel() - logger.info("Function calls finished - stopped audio but kept enabled for resumption") - else: - logger.debug("Function calls finished - audio not playing, no action needed") + await audio_manager.pause_for_function_calls()
22-22: Remove unused import
TheTTSSpeakFrameimport on line 22 isn’t referenced or re-exported anywhere—remove it.app/agents/voice/automatic/processors/user_speaking_audio.py (3)
40-46: Consider making transcription timeout configurable.The
_transcription_timeoutis hardcoded to 3.0 seconds. Consider making this configurable via a constructor parameter or config file, especially since different STT services may have different latency characteristics.- def __init__(self, name: str = "UserSpeakingAudioProcessor"): + def __init__(self, name: str = "UserSpeakingAudioProcessor", transcription_timeout: float = 3.0): super().__init__(name=name) self._user_currently_speaking = False self._actual_speech_detected = False # Track if transcription was received self._speech_start_time = None # Track when speech started self._pending_audio_task = None # Task waiting for transcription - self._transcription_timeout = 3.0 # Wait up to 3 seconds for transcription after PTT release + self._transcription_timeout = transcription_timeout # Wait for transcription after PTT release
44-44: Unused state variable: _speech_start_time.The
_speech_start_timefield is set on line 76 but never read. If you plan to use it for duration checks or logging, implement that logic; otherwise, remove it to reduce cognitive load.self._user_currently_speaking = False self._actual_speech_detected = False # Track if transcription was received - self._speech_start_time = None # Track when speech started self._pending_audio_task = None # Task waiting for transcriptionAnd remove the assignment on line 76:
if not self._user_currently_speaking: self._user_currently_speaking = True self._actual_speech_detected = False # Reset speech detection for new session - self._speech_start_time = time.time() # Record when speech started audio_manager = get_audio_manager()
61-62: Remove commented-out code.Lines 62, 82, and 86 contain commented-out logger statements. Remove them to keep the code clean, or uncomment them if the logging is valuable for debugging.
# If we have a pending audio task waiting for transcription, start audio now if self._pending_audio_task and not self._pending_audio_task.done(): - # logger.info("Starting audio due to delayed transcription") audio_manager = get_audio_manager()# First stop any currently playing audio immediately if audio_manager.is_playing: await audio_manager.stop_and_disable_audio() - # logger.info("Stopped playing audio - user started speaking")# Then enable for new input audio_manager.set_user_input() - # logger.info(f"Audio enabled - user started speaking ({type(frame).__name__})") else:Also applies to: 82-82, 86-86
app/agents/voice/automatic/audio/audio_manager.py (2)
105-136: Minor: Remove unnecessary f-string prefix.The logic correctly implements the grace period and immediate start flows with proper cancellation handling.
At line 113, remove the
fprefix since there are no placeholders:- logger.info(f"Grace period ended - starting audio") + logger.info("Grace period ended - starting audio")
174-178: Consider tracking the background task or adding error handling.The
set_bot_speakingmethod creates a fire-and-forget task. Ifstop_and_disable_audioraises an exception, it will be silently ignored, potentially leaving audio in an inconsistent state.Store and optionally await the task, or add a done callback for error logging:
def set_bot_speaking(self, speaking: bool): """Track when bot starts/stops speaking to prevent audio during speech.""" if speaking and self.is_playing: logger.info("Bot started speaking - stopping audio") - asyncio.create_task(self.stop_and_disable_audio()) + task = asyncio.create_task(self.stop_and_disable_audio()) + task.add_done_callback(lambda t: logger.error(f"Error stopping audio: {t.exception()}") if t.exception() else None)app/agents/voice/automatic/processors/llm_spy.py (1)
21-21: Optional: Remove unused import if not needed.
TTSSpeakFrameis imported but not referenced in the file. If it's not needed for future work, consider removing it to keep imports clean.UserStartedSpeakingFrame, - TTSSpeakFrame )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
app/agents/voice/automatic/audio/waiting_6sec.wavis excluded by!**/*.wav
📒 Files selected for processing (5)
app/agents/voice/automatic/__init__.py(5 hunks)app/agents/voice/automatic/audio/audio_manager.py(1 hunks)app/agents/voice/automatic/processors/llm_spy.py(7 hunks)app/agents/voice/automatic/processors/user_speaking_audio.py(1 hunks)app/agents/voice/automatic/utils/conversation_manager.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
app/agents/voice/automatic/processors/llm_spy.py (3)
app/agents/voice/automatic/audio/audio_manager.py (6)
get_audio_manager(313-315)stop_function_call_set_audio(197-218)stop_and_disable_audio(138-163)set_bot_speaking(174-178)start_audio(101-103)start_function_call_set_audio(180-195)app/agents/voice/automatic/utils/conversation_manager.py (2)
start_turn_with_events(245-266)add_llm_response_with_events(268-290)app/agents/voice/automatic/rtvi/rtvi.py (1)
emit_rtvi_event(6-13)
app/agents/voice/automatic/processors/user_speaking_audio.py (1)
app/agents/voice/automatic/audio/audio_manager.py (4)
get_audio_manager(313-315)start_audio(101-103)stop_and_disable_audio(138-163)set_user_input(72-75)
app/agents/voice/automatic/utils/conversation_manager.py (1)
app/agents/voice/automatic/audio/audio_manager.py (3)
get_audio_manager(313-315)reset_for_new_input(332-336)reset_function_call_set(220-223)
app/agents/voice/automatic/__init__.py (2)
app/agents/voice/automatic/processors/user_speaking_audio.py (1)
UserSpeakingAudioProcessor(29-131)app/agents/voice/automatic/audio/audio_manager.py (1)
initialize_audio_manager(324-328)
🪛 Ruff (0.13.1)
app/agents/voice/automatic/audio/audio_manager.py
68-68: Do not catch blind exception: Exception
(BLE001)
113-113: f-string without any placeholders
Remove extraneous f prefix
(F541)
178-178: Store a reference to the return value of asyncio.create_task
(RUF006)
277-277: Do not catch blind exception: Exception
(BLE001)
305-305: Do not catch blind exception: Exception
(BLE001)
app/agents/voice/automatic/processors/llm_spy.py
46-46: Store a reference to the return value of asyncio.create_task
(RUF006)
50-50: Store a reference to the return value of asyncio.create_task
(RUF006)
app/agents/voice/automatic/processors/user_speaking_audio.py
130-130: Do not catch blind exception: Exception
(BLE001)
app/agents/voice/automatic/__init__.py
286-286: Unused function argument: service
(ARG001)
291-291: Unused function argument: service
(ARG001)
🔇 Additional comments (15)
app/agents/voice/automatic/__init__.py (2)
219-221: LGTM: Audio manager initialization.The audio manager is correctly initialized early in the flow and passed the TTS service. This establishes the centralized audio control as intended.
329-331: LGTM: UserSpeakingAudioProcessor integration.The processor is correctly instantiated and positioned in the pipeline after STT/PTT and before RTVI/context aggregation, which aligns with the intended flow for managing audio based on user speech events.
app/agents/voice/automatic/processors/user_speaking_audio.py (2)
73-88: LGTM: User started speaking flow.The logic correctly stops any playing audio and enables the audio manager for new input. The null check and error logging provide good defensive programming.
113-131: Broad exception catch is acceptable here.The static analysis tool flags catching
Exceptionon line 130, but in this context it's appropriate because:
- The method properly handles
asyncio.CancelledErrorseparately- Any unexpected error is logged with context
- The error won't propagate and crash the processor
app/agents/voice/automatic/audio/audio_manager.py (6)
1-13: LGTM!Imports and configuration are clean. The module uses
pydubfor audio manipulation—consider pinning to a maintained fork (e.g.,pozalabs-pyduborpydub-ng) if you encounter compatibility issues on newer Python versions.
15-40: LGTM!Constructor design is clean with minimal state. Synchronous audio loading in
__init__is acceptable for one-time setup.
42-70: LGTM!Audio loading logic correctly chunks the audio into 100ms segments for fast interruption. The broad exception handler at line 68 is justified here—it provides a safe fallback for any audio loading failure while logging the error.
72-103: LGTM!The grace period logic correctly guards against duplicate starts and properly manages task lifecycle. The static analysis warning about not storing the task reference is a false positive—the task is stored in
self.delay_taskat line 99.
180-280: LGTM!Function call set audio methods correctly manage state and lifecycle. The streaming logic implements a clever timing strategy (80ms for first chunk, 100ms thereafter) to maintain a minimal 20ms buffer for seamless playback. The broad exception handler at line 277 is justified for robustness and always resets the playing state in the finally block.
282-337: LGTM!Audio queue clearing is defensive and handles both sync/async interrupt methods correctly. The broad exception handler at line 305 is justified—it's logged at debug level and prevents clearing failures from propagating. Global singleton API is clean and straightforward.
app/agents/voice/automatic/processors/llm_spy.py (5)
182-197: LGTM!TextFrame handling correctly stops audio immediately on non-empty bot output and notifies the audio manager that the bot is speaking. The logic properly preserves existing text sanitization flow.
203-226: LGTM!LLM response handling correctly implements the instant-stop-on-text strategy. The response start frame prepares for collection without stopping audio, and text frames trigger immediate audio interruption on the first non-empty output. Chart accumulation is properly guarded.
228-256: LGTM!Response end handling correctly captures text output before clearing state and implements smart audio resumption logic: if the LLM produced no text (e.g., function calls only) and the user has provided input, waiting audio resumes. This prevents silence during function execution.
258-309: LGTM!Function call start handling correctly detects the first call in a function call set and initiates waiting audio. The check on
is_function_call_set_activeensures audio starts only once per set. Existing tracing and event emission logic is properly preserved.
311-366: LGTM!Function call result handling correctly detects turn completion (when all function calls finish) and stops the function call set audio. The detection via
"turn_complete"event type is appropriate, and the guard onfunction_call_set_audio_playingprevents unnecessary stops. Existing result emission and chart logic is preserved.
| async def on_function_calls_finished(service): | ||
| # Stop audio when function calls finish but keep it enabled for potential resumption | ||
| if audio_manager.is_playing: | ||
| # Stop current audio but don't disable - allow resumption if no LLM text follows | ||
| audio_manager.is_playing = False | ||
| if audio_manager.loop_task and not audio_manager.loop_task.done(): | ||
| audio_manager.loop_task.cancel() | ||
| logger.info("Function calls finished - stopped audio but kept enabled for resumption") | ||
| else: | ||
| logger.debug("Function calls finished - audio not playing, no action needed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add null safety check and address unused parameter.
Two issues:
- The
serviceparameter is unused (static analysis hint) audio_manageris accessed without verifying it was successfully initialized on line 220
If initialize_audio_manager fails or returns None, this code will raise an AttributeError.
- async def on_function_calls_finished(service):
+ async def on_function_calls_finished(_service):
# Stop audio when function calls finish but keep it enabled for potential resumption
+ if not audio_manager:
+ logger.warning("Audio manager not available, skipping audio control")
+ return
+
if audio_manager.is_playing:
# Stop current audio but don't disable - allow resumption if no LLM text follows
audio_manager.is_playing = False
if audio_manager.loop_task and not audio_manager.loop_task.done():
audio_manager.loop_task.cancel()
logger.info("Function calls finished - stopped audio but kept enabled for resumption")
else:
logger.debug("Function calls finished - audio not playing, no action needed")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def on_function_calls_finished(service): | |
| # Stop audio when function calls finish but keep it enabled for potential resumption | |
| if audio_manager.is_playing: | |
| # Stop current audio but don't disable - allow resumption if no LLM text follows | |
| audio_manager.is_playing = False | |
| if audio_manager.loop_task and not audio_manager.loop_task.done(): | |
| audio_manager.loop_task.cancel() | |
| logger.info("Function calls finished - stopped audio but kept enabled for resumption") | |
| else: | |
| logger.debug("Function calls finished - audio not playing, no action needed") | |
| async def on_function_calls_finished(_service): | |
| # Stop audio when function calls finish but keep it enabled for potential resumption | |
| if not audio_manager: | |
| logger.warning("Audio manager not available, skipping audio control") | |
| return | |
| if audio_manager.is_playing: | |
| # Stop current audio but don't disable - allow resumption if no LLM text follows | |
| audio_manager.is_playing = False | |
| if audio_manager.loop_task and not audio_manager.loop_task.done(): | |
| audio_manager.loop_task.cancel() | |
| logger.info("Function calls finished - stopped audio but kept enabled for resumption") | |
| else: | |
| logger.debug("Function calls finished - audio not playing, no action needed") |
🧰 Tools
🪛 Ruff (0.13.1)
291-291: Unused function argument: service
(ARG001)
🤖 Prompt for AI Agents
In app/agents/voice/automatic/__init__.py around lines 291 to 300, the
on_function_calls_finished handler both declares an unused service parameter and
accesses audio_manager without null-safety; update the signature to either
remove or underscore the unused parameter (e.g., _service) to satisfy static
analysis, and guard all accesses to audio_manager with a None-check before
touching is_playing or loop_task (if audio_manager is None, log/debug and
return). Also ensure you check for the presence of loop_task before calling
done()/cancel() to avoid AttributeError.
| def _stop_audio_immediately(context: str = "unknown") -> bool: | ||
| """INSTANT audio stopping using simplified AudioManager API.""" | ||
| audio_manager = get_audio_manager() | ||
| if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period): | ||
| # Stop function call set audio if active | ||
| if audio_manager.function_call_set_audio_playing: | ||
| asyncio.create_task(audio_manager.stop_function_call_set_audio()) | ||
| logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}") | ||
|
|
||
| # Use the simplified stop method for regular audio and grace period | ||
| asyncio.create_task(audio_manager.stop_and_disable_audio()) | ||
|
|
||
| logger.info(f"INSTANT AUDIO STOP: {context}") | ||
| return True | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Track background tasks to prevent silent failures.
The _stop_audio_immediately helper creates fire-and-forget tasks at lines 46 and 50. If these async methods raise exceptions, they will be silently ignored, potentially leaving audio in an inconsistent state.
Store task references and add error handling:
def _stop_audio_immediately(context: str = "unknown") -> bool:
"""INSTANT audio stopping using simplified AudioManager API."""
audio_manager = get_audio_manager()
if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period):
# Stop function call set audio if active
if audio_manager.function_call_set_audio_playing:
- asyncio.create_task(audio_manager.stop_function_call_set_audio())
+ task = asyncio.create_task(audio_manager.stop_function_call_set_audio())
+ task.add_done_callback(lambda t: logger.error(f"Error stopping function call audio: {t.exception()}") if t.exception() else None)
logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}")
# Use the simplified stop method for regular audio and grace period
- asyncio.create_task(audio_manager.stop_and_disable_audio())
+ task = asyncio.create_task(audio_manager.stop_and_disable_audio())
+ task.add_done_callback(lambda t: logger.error(f"Error stopping audio: {t.exception()}") if t.exception() else None)
logger.info(f"INSTANT AUDIO STOP: {context}")
return True
return False📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _stop_audio_immediately(context: str = "unknown") -> bool: | |
| """INSTANT audio stopping using simplified AudioManager API.""" | |
| audio_manager = get_audio_manager() | |
| if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period): | |
| # Stop function call set audio if active | |
| if audio_manager.function_call_set_audio_playing: | |
| asyncio.create_task(audio_manager.stop_function_call_set_audio()) | |
| logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}") | |
| # Use the simplified stop method for regular audio and grace period | |
| asyncio.create_task(audio_manager.stop_and_disable_audio()) | |
| logger.info(f"INSTANT AUDIO STOP: {context}") | |
| return True | |
| return False | |
| def _stop_audio_immediately(context: str = "unknown") -> bool: | |
| """INSTANT audio stopping using simplified AudioManager API.""" | |
| audio_manager = get_audio_manager() | |
| if audio_manager and (audio_manager.user_has_input or audio_manager.is_playing or audio_manager.is_in_delay_period): | |
| # Stop function call set audio if active | |
| if audio_manager.function_call_set_audio_playing: | |
| task = asyncio.create_task(audio_manager.stop_function_call_set_audio()) | |
| task.add_done_callback( | |
| lambda t: logger.error(f"Error stopping function call audio: {t.exception()}") | |
| if t.exception() else None | |
| ) | |
| logger.info(f"INSTANT STOP: Function call set audio stopped due to {context}") | |
| # Use the simplified stop method for regular audio and grace period | |
| task = asyncio.create_task(audio_manager.stop_and_disable_audio()) | |
| task.add_done_callback( | |
| lambda t: logger.error(f"Error stopping audio: {t.exception()}") | |
| if t.exception() else None | |
| ) | |
| logger.info(f"INSTANT AUDIO STOP: {context}") | |
| return True | |
| return False |
🧰 Tools
🪛 Ruff (0.13.1)
46-46: Store a reference to the return value of asyncio.create_task
(RUF006)
50-50: Store a reference to the return value of asyncio.create_task
(RUF006)
| async def process_frame(self, frame: Frame, direction: FrameDirection): | ||
| """Process frames and manage audio based on user speaking events.""" | ||
| await super().process_frame(frame, direction) | ||
|
|
||
|
|
||
| # Handle transcription frames - detect actual speech content | ||
| if isinstance(frame, (TranscriptionFrame, InterimTranscriptionFrame)): | ||
| # Accept transcription even after PTT release (delayed transcription) | ||
| if frame.text.strip(): | ||
| if not self._actual_speech_detected: | ||
| self._actual_speech_detected = True | ||
|
|
||
| # If we have a pending audio task waiting for transcription, start audio now | ||
| if self._pending_audio_task and not self._pending_audio_task.done(): | ||
| # logger.info("Starting audio due to delayed transcription") | ||
| audio_manager = get_audio_manager() | ||
| if audio_manager: | ||
| await audio_manager.start_audio() | ||
|
|
||
| # Handle user started speaking events | ||
| elif isinstance(frame, ( | ||
| UserStartedSpeakingFrame, | ||
| VADUserStartedSpeakingFrame, | ||
| EmulateUserStartedSpeakingFrame | ||
| )): | ||
| if not self._user_currently_speaking: | ||
| self._user_currently_speaking = True | ||
| self._actual_speech_detected = False # Reset speech detection for new session | ||
| self._speech_start_time = time.time() # Record when speech started | ||
| audio_manager = get_audio_manager() | ||
| if audio_manager: | ||
| # First stop any currently playing audio immediately | ||
| if audio_manager.is_playing: | ||
| await audio_manager.stop_and_disable_audio() | ||
| # logger.info("Stopped playing audio - user started speaking") | ||
|
|
||
| # Then enable for new input | ||
| audio_manager.set_user_input() | ||
| # logger.info(f"Audio enabled - user started speaking ({type(frame).__name__})") | ||
| else: | ||
| logger.error("No audio manager found!") | ||
|
|
||
| # Handle user stopped speaking events | ||
| elif isinstance(frame, ( | ||
| UserStoppedSpeakingFrame, | ||
| VADUserStoppedSpeakingFrame, | ||
| EmulateUserStoppedSpeakingFrame | ||
| )): | ||
| if self._user_currently_speaking: | ||
| self._user_currently_speaking = False | ||
| audio_manager = get_audio_manager() | ||
| if audio_manager: | ||
|
|
||
| # Check if transcription was already detected during speaking | ||
| if self._actual_speech_detected: | ||
| await audio_manager.start_audio() | ||
| else: | ||
| # Wait for delayed transcription (common case) | ||
| self._pending_audio_task = asyncio.create_task(self._wait_for_transcription()) | ||
| else: | ||
| logger.error("No audio manager found!") | ||
|
|
||
| # Pass frame through to next processor | ||
| await self.push_frame(frame, direction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add cleanup method to cancel pending tasks.
The processor creates async tasks but doesn't provide a cleanup mechanism. If the processor is destroyed or the pipeline shuts down, _pending_audio_task may remain active, leading to resource leaks or unexpected behavior.
Add a cleanup method:
async def cleanup(self):
"""Cancel any pending tasks on shutdown."""
if self._pending_audio_task and not self._pending_audio_task.done():
self._pending_audio_task.cancel()
try:
await self._pending_audio_task
except asyncio.CancelledError:
pass
await super().cleanup()Verify that pipecat's FrameProcessor base class calls cleanup() during pipeline shutdown, or manually invoke it in the shutdown flow.
🤖 Prompt for AI Agents
In app/agents/voice/automatic/processors/user_speaking_audio.py around lines 48
to 111, the processor creates an asyncio task in _pending_audio_task but lacks a
cleanup method to cancel it, which can leak resources on shutdown; add an async
cleanup(self) that cancels and awaits the pending task (handling
asyncio.CancelledError) and then calls await super().cleanup(); ensure the
method checks task.done() before canceling and that pipeline shutdown invokes
this cleanup (or document calling it from the shutdown flow).
| @@ -0,0 +1,336 @@ | |||
| """ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move wav file to assets/sounds
d86c661 to
9c29dba
Compare
Summary by CodeRabbit