
Feat: real-time Voice Input + Transcript on Rebrowse Recorder #89

@zk1tty

Description


Problem

Inferring the intent behind a user's actions is very hard. Because of that, testing and monitoring the initial workflows keeps taking more and more time. I already built a recorder that captures the user's behavior (mouse input, keyboard input, clicks), but it still has no way to capture the intent behind that behavior.

Solution

I want to add a voice recorder or voice agent so the user's intent can be captured directly. Users can then say what they are looking for in their own words, which reduces the development and testing time needed to build deterministic workflows.

Options

  1. Voice memo during recording → transcribe on stop
  2. Live caption while recording
  3. Real-time conversation with a voice AI
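Option 2 means the sidepanel has to handle a stream of interim results that the speech service keeps revising before it commits them. A minimal sketch of that merge logic, assuming a hypothetical `CaptionMessage` shape with an `isFinal` flag; AssemblyAI and other providers define their own message formats, so these field names would need to be mapped onto the real API.

```typescript
// Hypothetical message shape: map the real streaming API's events onto this.
interface CaptionMessage {
  text: string;
  isFinal: boolean; // true once the service will no longer revise this text
  startMs: number;  // offset from the start of the recording
}

interface CaptionState {
  finalized: CaptionMessage[];    // committed segments, in order
  partial: CaptionMessage | null; // in-progress segment, replaced on every update
}

function applyCaption(state: CaptionState, msg: CaptionMessage): CaptionState {
  if (msg.isFinal) {
    // A final result supersedes the pending partial and is appended permanently.
    return { finalized: [...state.finalized, msg], partial: null };
  }
  // A partial result overwrites the previous partial instead of appending.
  return { finalized: state.finalized, partial: msg };
}

// Text shown in the transcript panel: committed text followed by the live partial.
function renderCaption(state: CaptionState): string {
  const parts = state.finalized.map((m) => m.text);
  if (state.partial) parts.push(state.partial.text);
  return parts.join(" ");
}

// Usage
let state: CaptionState = { finalized: [], partial: null };
state = applyCaption(state, { text: "open the", isFinal: false, startMs: 0 });
state = applyCaption(state, { text: "open the pricing page", isFinal: true, startMs: 0 });
state = applyCaption(state, { text: "then", isFinal: false, startMs: 2100 });
console.log(renderCaption(state)); // "open the pricing page then"
```

Keeping the partial separate from the finalized list means only finalized segments need to be persisted or uploaded; the partial is display-only.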

Current Recording Pipeline

  1. Sidepanel UI
  • User clicks “Start Recording” in InitialView → startRecording() sends START_RECORDING.
  • User clicks “Stop Recording” in RecordingView → stopRecording() sends STOP_RECORDING.
  2. Background (service worker)
  • On START_RECORDING: clears sessionLogs/tabInfo, sets isRecordingEnabled=true, broadcasts SET_RECORDING_STATUS to tabs and recording_status_updated to sidepanel.
  • On STOP_RECORDING: sets isRecordingEnabled=false, broadcasts status (and notifies server with RECORDING_STOPPED).
  • On events from content: receives RRWEB_EVENT and CUSTOM_* events, attaches screenshots, stores in sessionLogs[tabId], rebuilds Workflow via convertStoredEventsToSteps, and exposes it via GET_RECORDING_DATA.
  3. Content script (per tab)
  • On SET_RECORDING_STATUS=true: starts rrweb recorder + custom listeners; emits RRWEB_EVENT, CUSTOM_CLICK_EVENT, CUSTOM_INPUT_EVENT, CUSTOM_KEY_EVENT, etc.
  • On SET_RECORDING_STATUS=false: stops recorder and removes listeners.
  4. Sidepanel data flow
  • Polls GET_RECORDING_DATA during recording; listens to recording_status_updated. Renders EventViewer from workflow.steps. Transitions to StoppedView after stop for upload.
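The background's part of this pipeline behaves like a state machine over message types. The sketch below models it as a pure function so the transitions can be tested without Chrome. The message names come from the list above; the payload fields, the `Broadcast` type, and `handleMessage` itself are hypothetical, and screenshot attachment, `convertStoredEventsToSteps`, `GET_RECORDING_DATA`, and the other `CUSTOM_*` events are left out for brevity.

```typescript
// Message names follow the pipeline description; payload fields are assumptions.
type Message =
  | { type: "START_RECORDING" }
  | { type: "STOP_RECORDING" }
  | { type: "RRWEB_EVENT"; tabId: number; event: unknown }
  | { type: "CUSTOM_CLICK_EVENT"; tabId: number; event: unknown };

interface BackgroundState {
  isRecordingEnabled: boolean;
  sessionLogs: Record<number, unknown[]>; // recorded events per tab
  tabInfo: Record<number, unknown>;
}

// Outgoing messages. In the extension these would be sent with chrome.tabs.sendMessage
// (tabs) and chrome.runtime.sendMessage (sidepanel); the server call is elided here.
type Broadcast =
  | { target: "tabs"; type: "SET_RECORDING_STATUS"; value: boolean }
  | { target: "sidepanel"; type: "recording_status_updated"; value: boolean }
  | { target: "server"; type: "RECORDING_STOPPED" };

function statusBroadcasts(value: boolean): Broadcast[] {
  return [
    { target: "tabs", type: "SET_RECORDING_STATUS", value },
    { target: "sidepanel", type: "recording_status_updated", value },
  ];
}

// Pure transition function: returns the next state plus the messages to send.
function handleMessage(
  state: BackgroundState,
  msg: Message,
): { state: BackgroundState; out: Broadcast[] } {
  switch (msg.type) {
    case "START_RECORDING": {
      // Start from a clean slate: previous logs and tab info are discarded.
      const next: BackgroundState = { isRecordingEnabled: true, sessionLogs: {}, tabInfo: {} };
      return { state: next, out: statusBroadcasts(true) };
    }
    case "STOP_RECORDING":
      return {
        state: { ...state, isRecordingEnabled: false },
        out: [...statusBroadcasts(false), { target: "server", type: "RECORDING_STOPPED" }],
      };
    case "RRWEB_EVENT":
    case "CUSTOM_CLICK_EVENT": {
      // Append the event to this tab's log without mutating the previous state.
      const existing = state.sessionLogs[msg.tabId] ?? [];
      const sessionLogs = { ...state.sessionLogs, [msg.tabId]: [...existing, msg.event] };
      return { state: { ...state, sessionLogs }, out: [] };
    }
    default: {
      // Compile-time check that every Message variant is handled.
      const unhandled: never = msg;
      throw new Error(`Unhandled message: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Separating the transition logic from the `chrome.*` calls keeps the message contract in one place, which should make it easier to extend when mic or transcript status messages are added.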

Must-have UI

  • We can safely add a Mic toggle in InitialView (beside “Start Recording”)

  • Show a transcript panel at the bottom of RecordingView.

  • Transcript data can live in sidepanel state and be uploaded alongside the workflow or after it.

  • Where to store the transcript during recording: in-memory state mirrored to localStorage, keyed by runId.

  • Where to store the transcript after recording: the backend writes it as a JSON transcript file, kept separate from the rrweb recording data.

  • AssemblyAI credential management: a simple approach is fine for dev (which I need by tomorrow); think about production later.
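A minimal sketch of the storage plan above: the transcript lives in memory, every change is mirrored to `localStorage` under a key derived from `runId`, and the upload payload contains only transcript data. `TranscriptStore`, `TranscriptSegment`, and the `rebrowse:transcript:` key prefix are hypothetical names. The storage object is injected so the same class works with `window.localStorage` in the sidepanel or with a stub in tests.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface TranscriptSegment {
  text: string;
  startMs: number; // offset from recording start, so segments can be lined up with workflow steps
}

// Hypothetical key scheme; any prefix works as long as it is unique per run.
const storageKey = (runId: string): string => `rebrowse:transcript:${runId}`;

class TranscriptStore {
  private readonly runId: string;
  private readonly storage: KeyValueStorage;
  private segments: TranscriptSegment[] = [];

  constructor(runId: string, storage: KeyValueStorage) {
    this.runId = runId;
    this.storage = storage;
    // Restore a transcript left behind if the sidepanel was reopened mid-recording.
    const saved = storage.getItem(storageKey(runId));
    if (saved !== null) this.segments = JSON.parse(saved) as TranscriptSegment[];
  }

  append(segment: TranscriptSegment): void {
    this.segments.push(segment);
    // Mirror every change so in-memory state and localStorage stay in sync.
    this.storage.setItem(storageKey(this.runId), JSON.stringify(this.segments));
  }

  all(): readonly TranscriptSegment[] {
    return this.segments;
  }

  // Body for the separate transcript upload; deliberately excludes rrweb events.
  toUploadJson(): string {
    return JSON.stringify({ runId: this.runId, segments: this.segments });
  }

  // Call after the backend confirms it has written the transcript file.
  clear(): void {
    this.segments = [];
    this.storage.removeItem(storageKey(this.runId));
  }
}
```

In the sidepanel this would be constructed as `new TranscriptStore(runId, window.localStorage)`, and `toUploadJson()` sent to the backend in its own request so the transcript file never mixes with the rrweb data.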
