A Speech-to-Speech (S2S) WebRTC sample that integrates Amazon Nova Sonic on Amazon Bedrock, Amazon Kinesis Video Streams (KVS) with WebRTC, and real-time audio processing.
The sample solution architecture:
- Real-time WebRTC Communication: Low-latency audio streaming using Amazon KVS WebRTC
- AI-Powered Speech Processing: Integration with Amazon Bedrock Nova Sonic for speech-to-speech capabilities
- Cross-Platform Support: Works seamlessly on Windows, macOS, and Linux
- Production Ready: Optimized for both development and production environments
- Modular Architecture: Separate Python backend and React frontend for flexibility
- WebRTC Audio Processing: High-quality audio capture, processing, and playback
- AWS Integration: Seamless integration with AWS services (KVS, Bedrock, S3)
- Agent Integration: Support for MCP (Model Context Protocol) and Strands agents
- Performance Monitoring: Built-in performance tracking and optimization
- Configurable Logging: Comprehensive logging with adjustable levels
- CPU: Multi-core processor (Intel i5/AMD Ryzen 5 or better recommended)
- RAM: Minimum 4GB, 8GB recommended for optimal performance
- Storage: At least 2GB free space for dependencies and build files
- Network: Stable internet connection with low latency for real-time communication
- Audio: Microphone and speakers/headphones for testing
- Windows: Windows 10 or later
- macOS: macOS 10.15 (Catalina) or later (Intel and Apple Silicon)
- Linux: Ubuntu 18.04+, CentOS 7+, or equivalent distributions
- Python: 3.8 or higher (3.9+ recommended)
- Conda: Miniconda or Anaconda (recommended for cross-platform compatibility)
- Alternative: Python venv with manual system dependencies
- Node.js: 16.0 or higher (18.x LTS recommended)
- npm: 8.0 or higher (comes with Node.js)
- Browser: Modern browser with WebRTC support (Chrome 80+, Firefox 75+, Safari 14+, Edge 80+)
- AWS Account: Active AWS account with appropriate permissions
- AWS Services Access:
  - Amazon Kinesis Video Streams
  - Amazon Bedrock (Nova Sonic model access)
  - IAM permissions for KVS and Bedrock
sample-nova-sonic-speech2speech-webrtc/
├── README.md                  # This file
├── start-python-server.sh     # Python server launcher script
├── start-react-client.sh      # React client launcher script
├── python-webrtc-server/      # Python WebRTC backend
│   ├── webrtc_server.py       # Main server application
│   ├── requirements.txt       # Python dependencies
│   ├── .env.template          # Environment configuration template
│   ├── webrtc/                # WebRTC modules
│   ├── integration/           # AWS and agent integrations
│   └── server_test_audio/     # Test audio files
├── react-webrtc-client/       # React frontend application
│   ├── src/                   # React source code
│   ├── public/                # Static assets
│   ├── package.json           # Node.js dependencies
│   └── .env.template          # Frontend environment template
└── docs/                      # Additional documentation
    ├── troubleshooting.md     # Comprehensive troubleshooting guide
    ├── architecture.md        # System architecture
    ├── api-reference.md       # API documentation
    └── deployment.md          # Deployment guide
macOS:
# Using Homebrew (easiest)
brew install --cask miniconda
# Or download installer
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh # Intel
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh # Apple Silicon
bash Miniconda3-latest-MacOSX-*.sh
Linux:
# Download and install
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Or use package manager
sudo apt install miniconda3 # Ubuntu/Debian
sudo yum install miniconda3 # CentOS/RHEL
Windows:
# Using Windows Package Manager
winget install Anaconda.Miniconda3
# Or using Chocolatey
choco install miniconda3
# Or download installer from: https://repo.anaconda.com/miniconda/
- Download from nodejs.org (LTS version recommended)
- Or use version managers like nvm
# Navigate to the project directory
cd sample-nova-sonic-speech2speech-webrtc/
# Make scripts executable (Linux/macOS)
chmod +x *.sh
# Verify prerequisites
python3 --version # Should be 3.8+
node --version # Should be 16.0+
conda --version # Should show conda version
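The same prerequisite check can be scripted so it behaves identically on Windows, macOS, and Linux. The following is a minimal sketch, not part of the sample itself: the function names (parse_version, tool_version, check_prerequisites) are hypothetical, and the minimum versions are the ones listed in the prerequisites above.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the prerequisites listed in this README.
MIN_PYTHON = (3, 8)
MIN_NODE = (16, 0)


def parse_version(text):
    """Extract the first dotted version from text such as 'v18.17.0' or 'conda 23.5.0'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def tool_version(command):
    """Run '<command> --version' and return the parsed version, or None if not installed."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some tools print their version to stderr instead of stdout.
    return parse_version(result.stdout or result.stderr)


def check_prerequisites():
    """Return a dict mapping each tool to True (ok) or False (missing or too old)."""
    node = tool_version("node")
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "node": node is not None and node[:2] >= MIN_NODE,
        "conda": tool_version("conda") is not None,
    }


if __name__ == "__main__":
    for tool, ok in check_prerequisites().items():
        print(f"{tool}: {'ok' if ok else 'missing or too old'}")
```

Run it with the interpreter you plan to use for the server, since the Python check inspects the running interpreter rather than whatever `python3` resolves to on the PATH.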
- Create an AWS account if you don't have one
- Create an IAM user with programmatic access
- Attach the required policies:
  - AmazonKinesisVideoStreamsFullAccess
  - AmazonBedrockFullAccess
Before running the application, you must create the KVS WebRTC signaling channel:
Option 1: Using AWS Console (Recommended)
- Open the Amazon Kinesis Video Streams Console
- Navigate to Signaling channels in the left sidebar
- Click Create signaling channel
- Enter channel name: nova-s2s-webrtc-test
- Leave other settings as default
- Click Create signaling channel
Option 2: Using AWS CLI
# Create the signaling channel
aws kinesisvideo create-signaling-channel \
--channel-name nova-s2s-webrtc-test \
--region ap-northeast-1
# Verify the channel was created
aws kinesisvideo list-signaling-channels \
--region ap-northeast-1 \
--query 'ChannelInfoList[?ChannelName==`nova-s2s-webrtc-test`]'
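If you prefer to verify the channel from a script, the JSON returned by the list command above can be inspected directly. This is a minimal sketch that assumes the AWS CLI is installed and configured; find_channel and channel_is_active are hypothetical helper names, while ChannelInfoList, ChannelName, and ChannelStatus are fields of the CLI response.

```python
import json
import subprocess


def find_channel(cli_output, channel_name):
    """Return the ChannelInfo entry for channel_name from list-signaling-channels JSON, or None."""
    payload = json.loads(cli_output)
    for info in payload.get("ChannelInfoList", []):
        if info.get("ChannelName") == channel_name:
            return info
    return None


def channel_is_active(channel_name, region):
    """Call the AWS CLI and report whether the channel exists and its status is ACTIVE."""
    result = subprocess.run(
        ["aws", "kinesisvideo", "list-signaling-channels",
         "--region", region, "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    info = find_channel(result.stdout, channel_name)
    return info is not None and info.get("ChannelStatus") == "ACTIVE"
```

For example, `channel_is_active("nova-s2s-webrtc-test", "ap-northeast-1")` should return True once the channel created above has finished provisioning.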
Important Notes:
- The channel name must match the KVS_CHANNEL_NAME in your environment configuration
- The channel must be created in the same AWS region as specified in your configuration
- If using a different channel name, update the KVS_CHANNEL_NAME variable in your .env files
Python Backend (.env):
# Copy and edit environment template
cp python-webrtc-server/.env.template python-webrtc-server/.env
nano python-webrtc-server/.env # Edit with your values
Required variables:
# AWS Configuration
AWS_REGION=ap-northeast-1
AWS_ACCESS_KEY_ID=your_access_key_here
AWS_SECRET_ACCESS_KEY=your_secret_key_here
# KVS WebRTC Configuration
KVS_CHANNEL_NAME=nova-s2s-webrtc-test
# Bedrock Configuration
BEDROCK_MODEL_ID=amazon.nova-sonic-v1:0
# Logging Configuration
LOGLEVEL=INFO
React Frontend (.env):
# Copy and edit environment template
cp react-webrtc-client/.env.template react-webrtc-client/.env
nano react-webrtc-client/.env # Edit with your values
Required variables:
# AWS Configuration (embedded in client-side code)
REACT_APP_AWS_REGION=ap-northeast-1
REACT_APP_AWS_ACCESS_KEY_ID=your_access_key_here
REACT_APP_AWS_SECRET_ACCESS_KEY=your_secret_key_here
# KVS WebRTC Configuration
REACT_APP_KVS_CHANNEL_NAME=nova-s2s-webrtc-test
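Because the backend and frontend each carry their own copy of the region and channel name, a mismatch between the two .env files is an easy mistake to make. The sketch below is a hypothetical helper (parse_env and find_problems are not part of the sample) that flags placeholder values and mismatches. It assumes simple KEY=value lines without inline comments, and it treats LOGLEVEL as optional.

```python
# Backend variables listed as required above (LOGLEVEL is treated as optional here).
REQUIRED_BACKEND = [
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "KVS_CHANNEL_NAME",
    "BEDROCK_MODEL_ID",
]

# (backend key, frontend key) pairs that should hold the same value.
SHARED = [
    ("AWS_REGION", "REACT_APP_AWS_REGION"),
    ("KVS_CHANNEL_NAME", "REACT_APP_KVS_CHANNEL_NAME"),
]


def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_problems(backend, frontend):
    """Return a list of human-readable problems found in the two configurations."""
    problems = []
    for key in REQUIRED_BACKEND:
        value = backend.get(key, "")
        # The templates use placeholders such as 'your_access_key_here'.
        if not value or value.endswith("_here"):
            problems.append(f"backend: {key} is missing or still a placeholder")
    for backend_key, frontend_key in SHARED:
        if backend.get(backend_key) != frontend.get(frontend_key):
            problems.append(f"mismatch: {backend_key} != {frontend_key}")
    return problems
```

A typical use is to read python-webrtc-server/.env and react-webrtc-client/.env, pass their contents through parse_env, and print whatever find_problems returns before starting either process.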
Terminal 1 - Python Backend:
# This script handles conda environment creation, dependency installation, and server startup
./start-python-server.sh
# Available options:
# ./start-python-ser
# ./start-python-server.sh --region us-west-2
# ./start-python-server.sh --skip-deps # Skip dependency installation
Terminal 2 - React Frontend:
# This script handles npm installation and client startup
./start-react-client.sh
# Available options:
# ./start-react-client.sh --port 3001
# ./start-react-client.sh --build # Production build
# ./start-react-client.sh --serve # Serve production build
Python Backend:
cd python-webrtc-server
# Create and activate conda environment
conda env create -f environment.yml
conda activate nova-s2s-webrtc
# Or use venv if conda is not available
python3 -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# If you have already run start-python-server.sh successfully, the Conda environment "nova-s2s-webrtc" exists,
# so you can start the Python server manually as shown below.
conda activate nova-s2s-webrtc
# Configure AWS credentials and Kinesis Video Streams signaling channel name
export AWS_ACCESS_KEY_ID=your_access_key_here
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=ap-northeast-1
export KVS_CHANNEL_NAME=nova-s2s-webrtc-test
# Start server
python webrtc_server.py
# Available options:
python webrtc_server.py --agent mcp
React Frontend:
cd react-webrtc-client
# Install dependencies
npm install
# Start development server
npm start
- Frontend: Open http://localhost:3000 in your browser
- Grant Permissions: Allow microphone access when prompted
- Test Connection: Click "Start Session" and speak into your microphone
The React app includes a built-in WebRTC testing feature that verifies your complete setup:
# 1. Start the Python server
./start-python-server.sh
# 2. Start the React client
./start-react-client.sh
# 3. In browser (http://localhost:3000):
# - Click the Settings icon (⚙️) in the top-right corner
# - Scroll down and click "Test WebRTC Configuration"
# - Grant microphone and camera permissions when prompted
# - You should see your video feed and hear test scale audio tones
# - The Python server will save the captured audio/video files in the logs folder
What this test does:
- Establishes WebRTC peer connection with Python server
- Captures audio from microphone and video from camera
- Transmits real-time audio/video data to Python server
- Server saves captured media files in the logs/media_test/ folder for verification
- Plays back test scale audio tones to verify audio pipeline
- Confirms end-to-end WebRTC functionality
Files created during test:
- logs/media_test/webrtc_test_*.mp4: Captured video from your camera and microphone
- Check these files to verify audio/video quality and synchronization
Note: This test requires the Python server to be running and uses the full WebRTC pipeline including server-side processing.
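To confirm quickly that the test produced usable recordings, you can list the newest files that match the pattern above. This is a minimal sketch with hypothetical helper names (latest_test_recordings, empty_recordings). It only inspects file sizes and does not validate the media itself; an empty file merely suggests that no media reached the server.

```python
from pathlib import Path


def latest_test_recordings(log_dir="logs/media_test", limit=5):
    """Return (path, size_in_bytes) for the newest webrtc_test_*.mp4 files, newest first."""
    files = sorted(
        Path(log_dir).glob("webrtc_test_*.mp4"),
        key=lambda path: path.stat().st_mtime,
        reverse=True,
    )
    return [(path, path.stat().st_size) for path in files[:limit]]


def empty_recordings(log_dir="logs/media_test"):
    """Return recordings with a size of zero bytes."""
    return [path for path, size in latest_test_recordings(log_dir, limit=1000) if size == 0]


if __name__ == "__main__":
    for path, size in latest_test_recordings():
        print(f"{path}  {size} bytes")
```

Run it from the directory that contains the logs folder, or pass the full path to logs/media_test as log_dir.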
The Python server supports both Master and Viewer modes for KVS WebRTC signaling channels. Viewer mode allows the server to join an existing WebRTC session as a participant rather than initiating it.
# Navigate to server directory and activate conda environment
cd sample-nova-sonic-speech2speech-webrtc/python-webrtc-server
conda activate nova-s2s-webrtc
# Configure AWS credentials and region
export AWS_ACCESS_KEY_ID=your_access_key_here
export AWS_SECRET_ACCESS_KEY=your_secret_access_key_here
export AWS_REGION=ap-northeast-1
export KVS_CHANNEL_NAME=nova-s2s-webrtc-test
# Optional: Knowledge Base integration
export KB_ID="your_knowledge_base_id"
export KB_REGION="ap-northeast-1"
# Configure server logging level
export LOGLEVEL="DEBUG" # or "INFO" for production
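The LOGLEVEL variable above selects how verbose the server logs are. How webrtc_server.py consumes it is not shown in this README; a minimal sketch, assuming the value is a standard Python logging level name, could look like this (resolve_log_level and configure_logging are hypothetical names):

```python
import logging
import os


def resolve_log_level(name, default=logging.INFO):
    """Map a LOGLEVEL string such as 'debug' or 'INFO' to a logging constant."""
    level = getattr(logging, str(name).strip().upper(), None)
    # Fall back to the default for unknown names or non-level attributes.
    return level if isinstance(level, int) else default


def configure_logging():
    """Configure the root logger from the LOGLEVEL environment variable."""
    level = resolve_log_level(os.environ.get("LOGLEVEL", "INFO"))
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return level
```

Falling back to INFO for an unrecognized value keeps a typo in the environment from silencing the logs or crashing the process at startup.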
Master Mode (Default):
# Basic master mode - initiates WebRTC signaling
python webrtc_server.py
python webrtc_server.py --webrtc-role Master
# Master mode with MCP agent integration
python webrtc_server.py --webrtc-role Master --agent mcp
Viewer Mode:
# Basic viewer mode - joins existing WebRTC session
python webrtc_server.py --webrtc-role Viewer
# Viewer mode with MCP agent integration
python webrtc_server.py --webrtc-role Viewer --agent mcp
Mode Differences:
- Master Mode: Initiates and manages the signaling channel, designed for integration with the React frontend application (as the Viewer)
- Viewer Mode: Joins existing signaling channels as a participant, operates independently and supports integration with KVS WebRTC test page and KVS WebRTC SDK applications as the Master.
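The two flags used in the commands above can be modeled with argparse. The sketch below is illustrative only and is not the server's actual argument handling: build_parser and expected_peer_role are hypothetical names, and the only --agent value this README shows is mcp, so the sketch accepts any string there.

```python
import argparse


def build_parser():
    """Build a parser for the two options shown in this README."""
    parser = argparse.ArgumentParser(description="Sketch of the server's command-line options")
    parser.add_argument(
        "--webrtc-role",
        choices=["Master", "Viewer"],
        default="Master",  # Master is the documented default
        help="Master initiates the session; Viewer joins an existing one",
    )
    parser.add_argument(
        "--agent",
        default=None,
        help="Optional agent integration, for example 'mcp'",
    )
    return parser


def expected_peer_role(server_role):
    """Return the role the other side must take for a given server role."""
    return "Viewer" if server_role == "Master" else "Master"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"server role: {args.webrtc_role}, peer must be: {expected_peer_role(args.webrtc_role)}")
```

The expected_peer_role helper restates the mode differences above: when the server runs as Master, the React frontend connects as the Viewer, and when it runs as Viewer, the KVS WebRTC test page or an SDK application acts as the Master.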
# macOS/Linux Terminal
./start-python-server.sh
# Windows Git Bash (Recommended)
./start-python-server.sh
# Windows PowerShell
bash ./start-python-server.sh
# Windows Command Prompt
bash start-python-server.sh
Feature | Conda (Recommended) | Venv |
---|---|---|
Cross-platform | Excellent | |
aiortc installation | Easy | Complex, requires system deps |
System dependencies | Handled automatically | Manual installation required |
Binary packages | Pre-compiled | May require compilation |
Environment isolation | Complete | |
# Basic usage
./start-python-server.sh
# Custom AWS region and signaling channel configuration
./start-python-server.sh \
--region us-west-2 \
--channel my-test-channel
# Testing and development
./start-python-server.sh --skip-deps # Skip dependency installation
./start-python-server.sh --test-only # Test environment setup only
# Development server
./start-react-client.sh
# Production build and deployment
./start-react-client.sh --build # Build for production
./start-react-client.sh --serve # Serve production build
./start-react-client.sh --port 3001 # Custom port
# List environments
conda env list
# Activate/deactivate
conda activate nova-s2s-webrtc
conda deactivate
# Update environment
conda env update -n nova-s2s-webrtc -f environment.yml
# Remove environment
conda env remove -n nova-s2s-webrtc
macOS:
# Install Xcode Command Line Tools
xcode-select --install
# Install dependencies via Homebrew
brew install ffmpeg pkg-config
Linux (Ubuntu/Debian):
sudo apt update
sudo apt install -y \
build-essential \
pkg-config \
ffmpeg \
libavformat-dev \
libavcodec-dev \
libavdevice-dev \
libavfilter-dev \
libavutil-dev \
libswscale-dev \
libswresample-dev \
libasound2-dev \
portaudio19-dev
Windows:
# Install Visual Studio Build Tools
# Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Install FFmpeg
choco install ffmpeg # Using Chocolatey
# Or download from: https://ffmpeg.org/download.html
- Start Normal Mode: ./start-python-server.sh
- Open Browser: Navigate to http://localhost:3000
- Grant Permissions: Allow microphone access
- Test Speech: Click "Start Session" and speak
- Verify AI Response: Wait for Nova Sonic AI response
- Microphone Test: Use built-in browser microphone test
- Test Audio Files: Use provided files in server_test_audio/
- Latency Monitoring: Check browser console for timing metrics
- Audio Levels: Verify input/output audio levels in interface
# Monitor system resources during testing
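top -p "$(pgrep -d, -f 'python.*webrtc')" # Linux (comma-separated PID list)
top -pid "$(pgrep -f 'python.*webrtc' | head -n 1)" # macOS (first matching PID)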
# Task Manager on Windows
# Check memory usage
ps aux | grep -E "(python|node)" | grep -v grep
# Network connectivity test
ping your-aws-region.amazonaws.com
# System health check
ps aux | grep -E "(python|node)" | grep -v grep
# Check port availability
netstat -tulpn | grep -E "(3000|8765)" # Linux
lsof -i :3000,8765 # macOS
netstat -an | findstr "3000" # Windows
# Check system resources
free -h # Linux
vm_stat # macOS
# Task Manager > Performance tab (Windows)
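The port checks above use a different command on each platform. A single cross-platform alternative is to attempt a TCP connection from Python. This is a minimal sketch in which port_in_use is a hypothetical helper and ports 3000 and 8765 are the ones checked in the commands above. It only detects listeners reachable on the given host (127.0.0.1 by default), so a service bound only to an IPv6 address would not be detected.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (3000, 8765):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

Before starting the servers both ports should report as free; once they are running, a free port indicates that the corresponding process did not start.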
# aiortc installation fails
conda install -c conda-forge aiortc # Recommended approach
# Or install system dependencies first (if using venv)
# See "Manual System Dependencies" section above
# Check AWS credentials
aws configure list
echo $AWS_ACCESS_KEY_ID
# Test AWS connectivity
aws sts get-caller-identity
# Verify KVS signaling channel exists
aws kinesisvideo list-signaling-channels --region ap-northeast-1
aws kinesisvideo describe-signaling-channel --channel-name nova-s2s-webrtc-test --region ap-northeast-1
# Common KVS channel issues:
# Error: "Signaling channel not found" - Create the channel first (see AWS Configuration section)
# Error: "Access denied" - Check IAM permissions for KinesisVideoStreams
# Error: "Invalid region" - Ensure channel exists in the correct region
# Use the built-in Test WebRTC Configuration first (see Testing section above)
# Check logs/media_test/ folder for saved test files to verify data transmission
# Check browser console for errors:
# - "getUserMedia failed" - Check microphone permissions
# - "ICE connection failed" - Check network/firewall
# - "WebSocket connection failed" - Check server status
# Find and kill processes using ports
# Linux/macOS:
lsof -ti:3000 | xargs kill -9
# Windows:
netstat -ano | findstr :3000
taskkill /PID <PID> /F
# Or use different port for React client:
./start-react-client.sh --port 3001
macOS:
# Update Xcode Command Line Tools
xcode-select --install
# Apple Silicon specific
conda config --add channels conda-forge
conda config --set channel_priority strict
Linux:
# Permission issues (never use sudo with conda)
conda config --set auto_activate_base false
# Update system packages
sudo apt update && sudo apt upgrade # Ubuntu/Debian
Windows:
# Initialize conda for different shells
conda init bash # Git Bash
conda init powershell # PowerShell
# Enable long paths (Windows 10+)
# Windows Settings > Update & Security > For developers > Developer Mode
# High CPU usage - check processing load
top -p "$(pgrep -d, -f 'python.*webrtc')" # Linux; on macOS use: top -pid <PID>
# Memory leaks - monitor over time
watch -n 1 'ps aux | grep python | grep webrtc'
# Audio quality issues - check sample rates and buffer sizes
# See docs/troubleshooting.md for detailed audio optimization
- Check Logs:
  - Python: logs/webrtc_server.log
  - Browser: Developer Tools Console
- Test WebRTC: Use "Test WebRTC Configuration" in React app Settings
  - Verify test files are created in the logs/media_test/ folder
  - Listen to captured audio and check video quality
- Detailed Troubleshooting: See docs/troubleshooting.md
- docs/troubleshooting.md: Comprehensive troubleshooting guide
- docs/architecture.md: System architecture and design
- docs/api-reference.md: API endpoints and WebSocket events
- docs/deployment.md: Production deployment guide
For production deployment:
- Security: Use IAM roles instead of access keys where possible
- Scaling: Consider load balancing for multiple server instances
- Monitoring: Implement comprehensive logging and monitoring
- SSL/TLS: Use HTTPS for production deployments
See docs/deployment.md for detailed production setup instructions.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.